Scraping Active Foreign Principals

This is a Scrapy project to scrape data from the DOJ Foreign Agents Registration Act website:

https://www.fara.gov/quick-search.html

The goal is to scrape the all Active Foreign Principals present in the site.

Extracted data

The following sample shows actual data extracted from the Active Foreign Principals site:

{
    "country": "RUSSIA",
    "foreign_principal": "Uranium One, Inc.",
    "foreign_principal_reg_date": "2019-02-18 00:00:00",
    "address": "333 Bay Street #12--",
    "state": "",
    "registrant": "Tenam Corporation",
    "reg_num": "6638",
    "date": "2019-02-18 00:00:00",
    "url": "https://efile.fara.gov/ords/f?p=1381:200:5265451153848::NO:RP,200:P200_REG_NUMBER,P200_DOC_TYPE,P200_COUNTRY:6638,Exhibit%20AB,RUSSIA",
    "exhibit_urls": [
        "https://efile.fara.gov/docs/6638-Exhibit-AB-20190517-2.pdf",
        "https://efile.fara.gov/docs/6638-Exhibit-AB-20190218-1.pdf"
    ]
}

Requirements

Python 3.8.x
Scrapy 2.1.0

Install requirements

$ cd scraping
$ pip install -r requirements.txt

How to run scrapy spiders

You can run the spider using:

$ cd scraping
$ scrapy crawl foreign

All the data that was scraped from the Active Foreign Principals will be saved in the items.json

Run the tests

$ cd scraping
$ python -m unittest

Name	Name	Last commit message	Last commit date
Latest commit p-stannis final adjusments May 26, 2020 defe3de · May 26, 2020 History 5 Commits
.idea	.idea	ppushing scrapy project	May 24, 2020
foreignprincipal	foreignprincipal	final adjusments	May 26, 2020
tests	tests	final adjusments	May 26, 2020
.DS_Store	.DS_Store	ppushing scrapy project	May 24, 2020
.gitignore	.gitignore	ppushing scrapy project	May 24, 2020
README.md	README.md	final adjusments	May 26, 2020
items.json	items.json	final adjusments	May 26, 2020
requirements.txt	requirements.txt	WIP: Creating tests	May 26, 2020
scrapy.cfg	scrapy.cfg	Creating tests	May 25, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Scraping Active Foreign Principals

Extracted data

Requirements

Install requirements

How to run scrapy spiders

Run the tests

About

Releases

Packages

Contributors 2

Languages

p-stannis/scraping

Folders and files

Latest commit

History

Repository files navigation

Scraping Active Foreign Principals

Extracted data

Requirements

Install requirements

How to run scrapy spiders

Run the tests

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages