storyteller-data

storyteller-data is the pipeline that collects data for the machine learning portion. Custom crawlers can be written to fit into the pipeline so that their data is inserted into the warehouse.
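
As a rough illustration of how a custom crawler might plug in, the sketch below assumes a hypothetical interface: a `Crawler` base class whose `crawl()` method yields dictionaries, and a `run_pipeline` function that hands each record to a sink. None of these names come from the repository, and the real `crawler` package may be organized differently.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, Iterator, List


class Crawler(ABC):
    """Hypothetical base class; not the repository's actual interface."""

    @abstractmethod
    def crawl(self) -> Iterator[Dict[str, str]]:
        """Yield one dictionary per collected document."""


class StaticCrawler(Crawler):
    """Toy crawler that reads from an in-memory list instead of the web."""

    def __init__(self, pages: List[Dict[str, str]]) -> None:
        self.pages = pages

    def crawl(self) -> Iterator[Dict[str, str]]:
        for page in self.pages:
            # Skip records that have no text to store.
            if page.get("text"):
                yield {"title": page.get("title", ""), "text": page["text"]}


def run_pipeline(
    crawlers: Iterable[Crawler],
    sink: Callable[[Dict[str, str]], None],
) -> int:
    """Send every record from every crawler to `sink`; return the record count."""
    inserted = 0
    for crawler in crawlers:
        for record in crawler.crawl():
            sink(record)
            inserted += 1
    return inserted


if __name__ == "__main__":
    warehouse: List[Dict[str, str]] = []  # stand-in for a MongoDB collection
    pages = [
        {"title": "A", "text": "Once upon a time"},
        {"title": "B", "text": ""},
    ]
    count = run_pipeline([StaticCrawler(pages)], warehouse.append)
    print(count, warehouse)
```

Passing the sink as a callable keeps the crawler independent of the storage backend, so the same crawler could be pointed at a list during testing and at a database insert function in production.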

Getting Started

This scraper requires MongoDB to be installed so that documents can be inserted.

Setup

Mongo needs to be set up. Docker is recommended.

A Mongo container can be started with

docker run --name some-name -d -p 27017:27017 mongo
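
To confirm that the container is accepting connections before running the pipeline, a small check like the following may help. It is a minimal sketch using only the standard library; the helper name `mongo_port_open` is made up for illustration, and the check only verifies that something is listening on the port, not that it is MongoDB.

```python
import socket


def mongo_port_open(host: str = "localhost", port: int = 27017, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if mongo_port_open() else "not reachable"
    print(f"MongoDB port 27017 is {state}")
```

The default port matches the `-p 27017:27017` mapping in the `docker run` command above.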

Next, clone the repo.

git clone git@github.com:unit-00/storyteller-data.git

storyteller-data is written for Python 3.7.9 and uses pip-tools for dependency setup:

pip install pip-tools && pip-sync

Set up a virtual environment before installing to keep the dependencies isolated.

Usage

Bash scripts have been prepared for ease of running.

. tasks/run_pipeline.sh

A test script has been prepared as well:

. tasks/test_functionality.sh

After tasks/run_pipeline.sh has been run, the pipeline will have inserted the collected information into the database.
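
One way to verify the result is to count the stored documents and look at a sample. The sketch below assumes `pymongo` is installed and uses placeholder database and collection names (`storyteller`, `documents`); the names the pipeline actually writes to are not stated here and should be substituted.

```python
from typing import Any, Dict


def summarize(collection: Any) -> Dict[str, Any]:
    """Return the document count and one sample document from a collection."""
    return {
        "count": collection.count_documents({}),
        "sample": collection.find_one(),
    }


def connect_and_summarize(
    uri: str = "mongodb://localhost:27017",
    db_name: str = "storyteller",   # placeholder, not taken from the repo
    coll_name: str = "documents",   # placeholder, not taken from the repo
) -> Dict[str, Any]:
    """Connect to MongoDB and summarize one collection."""
    # Imported lazily so summarize() stays usable without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=2000)
    try:
        return summarize(client[db_name][coll_name])
    finally:
        client.close()
```

Calling `connect_and_summarize()` with the real database and collection names returns a dictionary with the number of documents and one example; a count of zero suggests the pipeline wrote elsewhere or did not run to completion.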
