kokani/tarantula
© Amit Sawant, Sept 2016

# tarantula
Read clinic information

COMMENTS
--------
- The results of my test run are also in the folder, in the file groupon_dental_clinics.txt. Some records on those pages are not actual clinics but product deals, and these have been skipped. The count of 263 shown for the dental category is therefore incorrect; the real count is close to what is in the results file.
- This can be done without Selenium, using plain HTTP requests, which would be faster. But Selenium offers a couple of benefits that HTTP requests don't:
  1. The crawler also works for sites that build the HTML content of the page by running JavaScript.
  2. Debugging is easier: you can see the page loaded in a browser, so when the crawler crashes it is easier to determine the cause.
- To crawl more sites, add a new folder in the sites folder and customize the three files from the groupon folder for that site.

LIBRARIES
---------
python2.7, Selenium, BeautifulSoup4, Chrome browser

SETTINGS
--------
Set PYTHONPATH: PYTHONPATH=....PARENT_FOLDER\tarantula

HOW TO RUN
----------
1. cd to the tarantula folder
2. Run this command: python crawler.py -s groupon
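EXAMPLE
-------
The run command and the per-site folders suggest that the -s value picks a folder under sites. The sketch below shows one way such a lookup could work. It is not the actual code in crawler.py: the function names and the module name `pages` are hypothetical, and the real names of the three files in the groupon folder are not listed in this README.

```python
import argparse
import importlib


def parse_args(argv=None):
    """Parse the command line; -s names the site to crawl (e.g. groupon)."""
    parser = argparse.ArgumentParser(
        description="Crawl clinic information for one site.")
    parser.add_argument("-s", dest="site", required=True,
                        help="name of a folder under sites, e.g. groupon")
    return parser.parse_args(argv)


def site_module_name(site, module="pages"):
    """Build a dotted module path for a site folder.

    Assumption: the layout 'sites.<site>.<module>' and the default
    module name 'pages' are made up for illustration.
    """
    return "sites.{0}.{1}".format(site, module)


def load_site_module(site, module="pages"):
    """Import the site-specific module.

    This only works when the tarantula folder is on PYTHONPATH and the
    site folder really contains a module with the given name.
    """
    return importlib.import_module(site_module_name(site, module))
```

With this layout, `parse_args(["-s", "groupon"]).site` gives the folder name, and `site_module_name` turns it into the module path to import. A new site would then only need its own folder under sites, with no change to the crawler entry point.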