anhnd1995/Data-Crawler

Steam Data Crawler

CSCE 470 Team project - A crawler program to retrieve essential information from Steam.com

Authors

Background

The goal of this project is to build a data crawler that retrieves data from the Steam website. The crawled data is stored in our own database format and will be used by another program to build a search engine.

Installation Instructions

The steps below cover what you need to install and how to set up the crawler.

The crawler is stored in the Steam Crawler folder.

Step 1: Download the crawler directory to your system and cd into it.

Step 2: Set up a Python 3.6 environment:

virtualenv -p python3.6 env
source env/bin/activate
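To confirm that Step 2 worked, a small check like the one below can be run with the environment's Python. It is a sketch, not part of the repository: the helper names `in_virtualenv` and `python_version_ok` are hypothetical, and the check relies on the standard `sys.prefix` / `sys.base_prefix` convention used by venv and recent virtualenv releases (older virtualenv releases set `sys.real_prefix` instead).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    venv and virtualenv 20+ point sys.prefix at the environment directory while
    sys.base_prefix keeps pointing at the base installation. Older virtualenv
    releases instead define sys.real_prefix.
    """
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def python_version_ok(required=(3, 6)) -> bool:
    """Return True when the interpreter's major.minor version equals `required`."""
    return tuple(sys.version_info[:2]) == tuple(required)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python", ".".join(map(str, sys.version_info[:3])), "is 3.6:", python_version_ok())
```

If the first line prints False, the activation command most likely ran in a subshell instead of being sourced into the current shell.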

Step 3: Install the required packages listed in requirements.txt:

pip install -r requirements.txt

Step 4: Create a folder named 'output'. This folder will contain the crawled data.

Step 5: Crawl the data using the command below. Crawled items are retrieved in real time and written to output/products_all.jl, using the item format defined in steam/spider/product_spider.py:

scrapy crawl products -o output/products_all.jl --logfile=output/products_all.log --loglevel=INFO -s JOBDIR=output/products_all_job -s HTTPCACHE_ENABLED=False
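Scrapy's .jl feed format is JSON Lines: each crawled item is written as one JSON object on its own line. A minimal sketch for loading the results, assuming only that file format (the helper name `read_jsonl` is hypothetical, and the fields inside each item depend on the product spider's item definition):

```python
import json
from pathlib import Path
from typing import Dict, Iterator


def read_jsonl(path) -> Iterator[Dict]:
    """Yield one crawled item (a dict) per non-empty line of a JSON Lines file.

    Streaming line by line avoids loading the whole crawl into memory at once.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)


if __name__ == "__main__":
    results = Path("output/products_all.jl")
    if results.exists():
        products = list(read_jsonl(results))
        print(len(products), "products crawled")
        if products:
            # Key names come from the product spider's item definition.
            print("fields in first item:", sorted(products[0]))
    else:
        print("No results yet: run the crawl command first.")
```

Because the command passes -s JOBDIR=output/products_all_job, Scrapy keeps the crawl state in that directory, so an interrupted crawl can be resumed by running the same command again.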

Review Data Crawling

First, extract all review URLs from the crawled products and export them to a text file using the split_review_url script in the script folder. Once the URL text file has been generated, run Scrapy to start crawling the review data:

scrapy crawl reviews -o review.jl ...
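The split_review_url script itself is not reproduced here. As a rough illustration of the extraction step, the sketch below streams output/products_all.jl, collects one review URL per product, and writes the URLs to plain text files, one per line, split into fixed-size chunks. Everything specific in it is an assumption: the helper name `export_review_urls`, the item key `reviews_url`, the chunk size, and the `review_urls_NNNN.txt` file naming are illustrative, and the real script's output may differ. How the reviews spider is pointed at these files is elided in the command above and is not assumed here.

```python
import json
from pathlib import Path
from typing import List


def export_review_urls(products_file, out_dir, chunk_size=1000, field="reviews_url") -> List[Path]:
    """Hypothetical helper: write review URLs from crawled products to text files.

    `field` is an ASSUMED item key. Output files are named review_urls_0000.txt,
    review_urls_0001.txt, ... with at most `chunk_size` URLs each, one per line.
    Returns the paths that were written.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    urls, seen = [], set()
    with Path(products_file).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            url = json.loads(line).get(field)
            if url and url not in seen:  # skip products without the key and duplicates
                seen.add(url)
                urls.append(url)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, start in enumerate(range(0, len(urls), chunk_size)):
        path = out / "review_urls_{:04d}.txt".format(index)
        path.write_text("\n".join(urls[start:start + chunk_size]) + "\n", encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    products = Path("output/products_all.jl")
    if products.exists():
        for path in export_review_urls(products, "output/review_urls"):
            print("wrote", path)
```

Splitting into several smaller files is one design choice that keeps each review crawl short enough to restart independently.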

Timeline

  • 02/05/2018 - Project Initiated
  • 02/18/2018 - Project Proposal Submitted
  • 02/18/2018 - Data Crawler Completed - Data Retrieved

Built With

Acknowledgments

  • Big thanks to ANDRE PERUNICIC for making the crawler available for reference.
  • Scrapy Tool
  • Steam
