nairaland_scraping

Web scraper that extracts data from the popular Nigerian website nairaland.com into a JSON Lines formatted text file.

run

To run the scraper yourself, you need Scrapy (https://scrapy.org/). Then use:

scrapy crawl nairaland -o nairaland.jl --logfile nairaland.log --loglevel INFO

extracted data

The data extraction pipeline was developed to collect aggregate data (e.g. the number of images in a post, the number of links, etc.) rather than the full text of the website. The format of the extracted data is:

{
    _id - unique id of post/comment
    retrieved - timestamp of article scraping
    article_id - unique article id
    forum - article forum (e.g. politics, business)
    links - number of links in comment/post
    posted - timestamp of comment/post
    quote - true or false representing if this comment quotes another one
    shares - number of times shared
    likes - number of times liked
    images - number of images attached
    page_no - page number of the post in the article
    user - username of post writer
}
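As a rough sketch of how the format above could be consumed, the snippet below reads JSON Lines text and keeps only records that carry every listed field. The names `iter_records` and `EXPECTED_FIELDS` are hypothetical helpers introduced here for illustration; they are not part of the scraper.

```python
import json
from typing import Iterable, Iterator

# Field names copied from the format description above.
EXPECTED_FIELDS = {
    "_id", "retrieved", "article_id", "forum", "links", "posted",
    "quote", "shares", "likes", "images", "page_no", "user",
}


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSON line, skipping incomplete records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        # Keep the record only if every expected key is present.
        if EXPECTED_FIELDS.issubset(record):
            yield record
```

Assuming the output file name from the run command, usage might look like `with open("nairaland.jl", encoding="utf-8") as fh: records = list(iter_records(fh))`.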

sample:

A sample of the extracted data can be downloaded here. A line looks like this (pretty-printed here for readability):

{
  "posted": "6:19am, Jul 05", 
  "links": 0, 
  "forum": "business", 
  "retrieved": "2017-08-21 02:13:03.562000", 
  "shares": 0, 
  "user": "StylixSVC", 
  "quote": false, 
  "images": 2, 
  "_id": "58136409", 
  "article_id": "3901301", 
  "page_no": 0, 
  "likes": 0
}

The data is deliberately left partly unprocessed, both to keep parsing fast and to give the user some data cleaning experience.
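One cleaning step the sample suggests is turning the `posted` and `retrieved` strings into `datetime` objects. The sketch below handles only the two formats visible in the sample record. The `posted` value carries no year, so the caller must supply one; that `assumed_year` parameter, like the function names, is an assumption made for this example and not something the scraper provides.

```python
from datetime import datetime


def parse_retrieved(value: str) -> datetime:
    """Parse a 'retrieved' value such as '2017-08-21 02:13:03.562000'."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")


def parse_posted(value: str, assumed_year: int) -> datetime:
    """Parse a 'posted' value such as '6:19am, Jul 05'.

    The year is not in the data, so it is appended before parsing.
    Month names and am/pm markers are matched using the current locale.
    """
    return datetime.strptime(f"{value} {assumed_year}", "%I:%M%p, %b %d %Y")
```

A possible choice for `assumed_year` is the year of the record's `retrieved` timestamp, though that would be wrong for posts written in an earlier year than the one in which they were scraped.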

possible analysis

There are a number of questions that can be answered using the data:

What is the busiest hour of the day (and day of the week) in terms of posts?
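A minimal sketch of how the busiest-hour question could be approached, assuming every `posted` value follows the `"6:19am, Jul 05"` shape seen in the sample. The function name `busiest_hours` is hypothetical. Only the time-of-day part is used here; the day-of-week question would also need a year, which the `posted` field does not contain.

```python
from collections import Counter
from datetime import datetime


def busiest_hours(posted_values, top=3):
    """Count posts per hour of day (0-23) and return the most common hours."""
    hours = Counter()
    for value in posted_values:
        # Keep only the time part before the comma, e.g. "6:19am".
        time_part = value.split(",")[0].strip()
        hours[datetime.strptime(time_part, "%I:%M%p").hour] += 1
    return hours.most_common(top)
```

The result is a list of `(hour, count)` pairs, so `busiest_hours(r["posted"] for r in records)` would give the three most active hours for a list of loaded records.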

edeas123/nairaland_scaping
