
Commit

Remove redundant description of web-spider demo.
ajdavis committed Apr 15, 2015
1 parent 41724ba commit aed1fc4
Showing 2 changed files with 7 additions and 19 deletions.
13 changes: 0 additions & 13 deletions demos/webspider/webspider.py
@@ -1,16 +1,3 @@
"""A trivial web-spider that crawls all the pages in http://tornadoweb.org.
``spider()`` downloads the page at `base_url` and any pages it links to,
recursively. It ignores pages that are not beneath `base_url` hierarchically.
This function demonstrates `queues.Queue`, especially its methods
`~queues.Queue.join` and `~queues.Queue.task_done`.
The queue begins containing only
`base_url`, and each discovered URL is added to it. We wait for
`~queues.Queue.join` to complete before exiting. This ensures that
the function as a whole ends when all URLs have been downloaded.
"""

# start-file
import HTMLParser
import time
13 changes: 7 additions & 6 deletions docs/guide/queues.rst
@@ -14,12 +14,13 @@ until there is room for another item.
A `~Queue` maintains a count of unfinished tasks, which begins at zero.
`~Queue.put` increments the count; `~Queue.task_done` decrements it.

In the web-spider example here, when a worker fetches a page it parses the
links and puts new ones in the queue, then calls `~Queue.task_done` to
decrement the counter once. Eventually, a worker fetches a page whose URLs have
all been seen before, and there is also no work left in the queue. Thus that
worker's call to `~Queue.task_done` decrements the counter to zero. The main
coroutine, which is waiting for `~Queue.join`, is unpaused and finishes.
In the web-spider example here, the queue begins containing only base_url. When
a worker fetches a page it parses the links and puts new ones in the queue,
then calls `~Queue.task_done` to decrement the counter once. Eventually, a
worker fetches a page whose URLs have all been seen before, and there is also
no work left in the queue. Thus that worker's call to `~Queue.task_done`
decrements the counter to zero. The main coroutine, which is waiting for
`~Queue.join`, is unpaused and finishes.

.. literalinclude:: ../../demos/webspider/webspider.py
:start-after: # start-file

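The pattern the revised paragraph describes can be summarized in a short sketch. This is not the demo code that the literalinclude above pulls in; it is a simplified illustration assuming Tornado's tornado.queues and tornado.gen APIs, written with native coroutines rather than the @gen.coroutine/yield style of the 2015-era demo, and fetch_links, base_url, and concurrency are hypothetical stand-ins for the real fetch-and-parse logic in demos/webspider/webspider.py:

# A simplified sketch, not the demo itself: fetch_links, base_url and
# concurrency are assumed stand-ins for the real webspider code.
from tornado import gen, ioloop, queues

base_url = 'http://www.tornadoweb.org/en/stable/'  # assumed seed URL
concurrency = 10                                   # assumed worker count


async def fetch_links(url):
    """Placeholder for the real download-and-parse step in webspider.py."""
    return []


async def main():
    q = queues.Queue()
    seen = set()

    async def worker():
        while True:
            url = await q.get()              # wait for an item
            try:
                if url is None:              # sentinel: shut down
                    return
                if url not in seen:
                    seen.add(url)
                    for link in await fetch_links(url):
                        await q.put(link)    # each put() bumps the unfinished count
            finally:
                q.task_done()                # each task_done() decrements it

    await q.put(base_url)                    # the count starts at one
    workers = gen.multi([worker() for _ in range(concurrency)])
    await q.join()                           # resumes when the count reaches zero
    for _ in range(concurrency):
        await q.put(None)                    # unblock the idle workers so they exit
    await workers


if __name__ == '__main__':
    ioloop.IOLoop.current().run_sync(main)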