This set of scripts crawls the STEAM website to download game reviews.
These scripts are aimed at students who want to experiment with text mining on review data.
The scripts must be run in the following order:
- steam-game-crawler.py downloads the pages that list games into ./data/games/
- steam-game-extractor.py extracts game ids from the downloaded pages, saving them into ./data/games.csv
- steam-review-crawler.py uses the above list to download game review pages into ./data/reviews. This process can take a long time (it is a lot of data, and the script sleeps between requests to be fair to the server). When the script is stopped and restarted, it skips games for which all reviews were downloaded in the previous run (it does not download new reviews for such games).
- steam-review-extractor.py extracts reviews and other info from the downloaded pages, saving them into ./data/reviews.csv
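The four steps can also be chained from a small driver script. This is a minimal sketch, assuming each script runs without command-line arguments from the repository root (an assumption, not something stated here). The helper names `pipeline_commands` and `run_pipeline` are hypothetical and not part of these scripts.

```python
import subprocess
import sys

# Script names in the order of execution described above.
PIPELINE = [
    "steam-game-crawler.py",
    "steam-game-extractor.py",
    "steam-review-crawler.py",
    "steam-review-extractor.py",
]


def pipeline_commands(python=sys.executable):
    """Build one command per script, in execution order.

    Assumption: the scripts take no command-line arguments.
    """
    return [[python, script] for script in PIPELINE]


def run_pipeline():
    """Run each step in turn, stopping at the first failure."""
    for command in pipeline_commands():
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a later step never runs on incomplete input.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` from the repository root would run all four steps. Because the review crawler can take a long time, running that step by hand may be more practical.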
Columns in the reviews.csv file:
- game id
- number of people that found the review to be useful
- number of people that found the review to be funny
- username of the reviewer
- number of games owned by the reviewer
- number of reviews written by the reviewer
- 1=recommended, -1=not recommended
- hours played by the reviewer on the game
- date of creation of the review
- text of the review
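The column list above maps naturally onto a typed record. The following is a sketch only: it assumes the file is comma-separated, has no header row, and stores the columns in the order listed. None of this is confirmed here, so check it against an actual reviews.csv. The names `Review` and `read_reviews` are hypothetical, and the sample row is made up.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Review(NamedTuple):
    game_id: str
    useful: int           # people who found the review useful
    funny: int            # people who found the review funny
    username: str
    games_owned: int
    reviews_written: int
    recommended: int      # 1 = recommended, -1 = not recommended
    hours_played: float
    date: str             # kept as text; the date format is not specified
    text: str


def read_reviews(lines) -> Iterator[Review]:
    """Yield Review records from an iterable of CSV lines.

    Assumptions: comma delimiter, no header row, columns in listed order.
    """
    for row in csv.reader(lines):
        yield Review(
            game_id=row[0],
            useful=int(row[1]),
            funny=int(row[2]),
            username=row[3],
            games_owned=int(row[4]),
            reviews_written=int(row[5]),
            recommended=int(row[6]),
            hours_played=float(row[7]),
            date=row[8],
            text=row[9],
        )


# Made-up sample row, for illustration only.
sample = io.StringIO(
    '570,12,3,alice,150,8,1,42.5,2018-03-01,"Great game, would play again"\n'
)
reviews = list(read_reviews(sample))
```

For the real file, pass an open file object instead of `sample`, for example `open("./data/reviews.csv", newline="", encoding="utf-8")`. The encoding is also an assumption.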
The last script, steam-reviews-stats.py, is a sample script that processes the reviews.csv file and outputs some basic info and stats in JSON files:
- ./data/games.json: number of reviews and played hours for every game.
- ./data/users.json: number of games owned (as reported by the user's badge on STEAM) and number of played hours.
- ./data/summary.json: number of reviews, number of played hours, number of users, number of games.
As of March 15, 2018, the statistics in summary.json were:
- reviews: 6614765
- played hours: 554702535
- users: 2720777
- games: 26677
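Per-game and summary counts of this kind could be derived from the review records roughly as follows. This is a sketch, not necessarily how steam-reviews-stats.py computes them: it assumes users are identified by username and that played hours are summed across reviews. The function name `summarize` is hypothetical, and the sample rows are made up.

```python
from collections import defaultdict


def summarize(rows):
    """Aggregate (game_id, username, hours_played) tuples.

    Returns a per-game dict and an overall summary dict.
    Assumption: one row per review; hours are summed over reviews.
    """
    games = defaultdict(lambda: {"reviews": 0, "hours": 0.0})
    users = set()
    total_reviews = 0
    total_hours = 0.0
    for game_id, username, hours in rows:
        games[game_id]["reviews"] += 1
        games[game_id]["hours"] += hours
        users.add(username)
        total_reviews += 1
        total_hours += hours
    summary = {
        "reviews": total_reviews,
        "played hours": total_hours,
        "users": len(users),
        "games": len(games),
    }
    return dict(games), summary


# Made-up rows, for illustration only.
rows = [("10", "alice", 2.0), ("10", "bob", 3.5), ("20", "alice", 1.0)]
per_game, summary = summarize(rows)
```

Both results are plain dictionaries, so they can be written out with `json.dump` if a file is wanted.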