Steam Data Crawler

CSCE 470 team project - A crawler program that retrieves essential product and review information from the Steam store

Authors

Juan Duran - Contributor - Juan's Github
Anh Nguyen - Contributor - Anh's Github
Han Hong - Contributor - Hong's Github

Background

The goal of this project is to build a data crawler that retrieves data from the Steam website. The data is kept in our own database format and is used by another program to build a search engine.

Installation Instructions

What you need to install the software, and how to install it: Python 3.6, virtualenv, and pip.

The crawler is stored in the Steam Crawler folder.

Step 1: Download the crawler directory to your system and cd into that folder.

Step 2: Set up a Python 3.6 environment:

virtualenv -p python3.6 env
source env/bin/activate

Step 3: Install the package requirements listed in requirements.txt:

pip install -r requirements.txt

Step 4: Create a folder named 'output'. This folder will contain the crawled data.

Step 5: Crawl the data using the command below. Crawled items are written in real time to output/products_all.jl, in the format listed in steam/spider/product_spirder.py. The log goes to output/products_all.log, and the crawl state is kept in output/products_all_job.

scrapy crawl products -o output/products_all.jl --logfile=output/products_all.log --loglevel=INFO -s JOBDIR=output/products_all_job -s HTTPCACHE_ENABLED=False
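
Because the output file uses the .jl extension, Scrapy writes it as JSON Lines: one JSON object per line. A minimal sketch for loading such a file back into Python is shown here. The helper name load_json_lines is only illustrative and is not part of this repository.

```python
import json
from pathlib import Path


def load_json_lines(path):
    """Return a list of dicts, one per non-empty line of a JSON Lines file.

    Illustrative helper, not part of the crawler itself.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            records.append(json.loads(line))
    return records
```

For example, products = load_json_lines("output/products_all.jl") gives a list of product dictionaries once the crawl has written some items. Reading line by line also works on a file that is still being appended to, as long as the last line is complete.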

Review Data Crawling

First, extract the review URLs of all crawled products and export them to a text file using the split_review_url script in the script folder. Once the URL text file has been generated, call scrapy to start crawling the review data:

scrapy crawl reviews -o review.jl ...
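
The URL extraction step described above could look roughly like the following sketch. It is not the project's split_review_url script. The function name extract_review_urls and the field name "reviews_url" are assumptions made for illustration, so check the item definition in the spider for the real key before using it.

```python
import json


def extract_review_urls(products_path, urls_path, field="reviews_url"):
    """Write one unique review URL per line and return how many were written.

    `field` is a hypothetical key name; the real product items may use
    a different one.
    """
    seen = set()
    with open(products_path, encoding="utf-8") as src, \
            open(urls_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            url = json.loads(line).get(field)
            if url and url not in seen:  # drop missing and duplicate URLs
                seen.add(url)
                dst.write(url + "\n")
    return len(seen)
```

A call such as extract_review_urls("output/products_all.jl", "output/review_urls.txt") would produce the text file; the output file name here is also just an example.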

Timeline

02/05/2018 - Project Initiated
02/18/2018 - Project Proposal Submitted
02/18/2018 - Data Crawler Completed - Data Retrieved

Built With

Python 3.6 - Python Development and Environment

Acknowledgments

Big thanks to ANDRE PERUNICIC for making the crawler available for reference.
Scrapy Tool
Steam
