ETL for historical data and development database
- docker >= 17.12.0
- docker-compose
docker-compose up
If running for the first time without the sql.dump file, you will need to obtain the data set files from another team member. Then create the staging directory (mkdir -p etl/data/tmp) and copy the data set files into it.
The core file used at this time is etl-db-env/etl/data/tmp/PA2459713_Philadelphia_CaseData_Deliverable.xlsx
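The first-time staging steps above can be sketched as follows; the copy source path is a placeholder for wherever you saved the files you received:

```shell
# Create the staging directory the ETL expects (path from this README).
mkdir -p etl/data/tmp

# Copy the data set files obtained from a team member into it; the source
# path below is a placeholder, adjust it to your download location.
# cp ~/Downloads/PA2459713_Philadelphia_CaseData_Deliverable.xlsx etl/data/tmp/

# Confirm the staging directory exists and list its contents.
ls etl/data/tmp
```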
- localhost:8888/?token=<token>

For now you will have to grab the token value from the docker-compose up stdout:
```
notebook_container | To access the notebook, open this file in a browser:
notebook_container |     file:///home/jovyan/.local/share/jupyter/runtime/nbserver-7-open.html
notebook_container | Or copy and paste one of these URLs:
notebook_container |     http://e643afef477d:8888/?token=54f3bf34463f369b2bc2b52be882930dfc9ead5f88da4cd1
notebook_container |  or http://127.0.0.1:8888/?token=54f3bf34463f369b2bc2b52be882930dfc9ead5f88da4cd1
```
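Instead of scanning the stdout by eye, you can grep the token out of the container logs. The container name comes from the output above; the grep pattern assumes the token is a lowercase hex string:

```shell
# Pull the first token out of the notebook container's logs (sketch; needs
# the stack running):
# docker logs notebook_container 2>&1 | grep -m1 -o 'token=[0-9a-f]*'

# The same pattern, demonstrated on a log line captured above:
line='notebook_container | or http://127.0.0.1:8888/?token=54f3bf34463f369b2bc2b52be882930dfc9ead5f88da4cd1'
echo "$line" | grep -m1 -o 'token=[0-9a-f]*'
```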
Open a shell in the toolbox container (docker exec, not SSH):
docker exec -it etl-toolbox_container /bin/bash
- URL: localhost:5432
- Username: postgres (default)
- Password: changeme (default)
To start an interactive Postgres terminal session:
docker exec -it postgres_container psql -U postgres
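For scripted access, the same defaults can be assembled into a libpq-style connection URL usable with psql, SQLAlchemy, and similar tools; using postgres as the database name is an assumption (the default maintenance database):

```shell
# Defaults from this README.
PGUSER=postgres
PGPASSWORD=changeme

# libpq-style connection URL; "postgres" as the database name is an assumption.
url="postgresql://${PGUSER}:${PGPASSWORD}@localhost:5432/postgres"
echo "$url"

# Run a one-off query instead of an interactive session (sketch; needs the
# stack running):
# docker exec postgres_container psql -U postgres -c 'SELECT version();'
```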
The pgAdmin web UI:
- URL: localhost:5050
- Username: [email protected] (default)
- Password: admin (default)
When registering the Postgres server in pgAdmin, use:
- Host name/address: postgres
- Port: 5432
- Username: POSTGRES_USER, by default postgres
- Password: POSTGRES_PASSWORD, by default changeme
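The same connection values can also be preloaded into pgAdmin 4 via its servers.json import format; mounting the file into the pgAdmin container is an assumption about your compose setup, and the server name "etl-db" is a placeholder:

```shell
# Write a pgAdmin servers.json with the values listed above (sketch).
cat > servers.json <<'EOF'
{
  "Servers": {
    "1": {
      "Name": "etl-db",
      "Group": "Servers",
      "Host": "postgres",
      "Port": 5432,
      "Username": "postgres",
      "MaintenanceDB": "postgres",
      "SSLMode": "prefer"
    }
  }
}
EOF
```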
If you have not already, create the directory (mkdir -p etl/data/tmp/), then extract sample_dockets.zip into that directory. Ask a team member for the password to unzip.
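The docket staging step above as commands; the archive location and the password variable are placeholders:

```shell
# Create the staging directory if it does not exist yet.
mkdir -p etl/data/tmp/

# Extract the password-protected archive into it; the sample_dockets.zip
# location and $DOCKET_ZIP_PASSWORD are placeholders -- ask a team member
# for the actual password.
# unzip -P "$DOCKET_ZIP_PASSWORD" sample_dockets.zip -d etl/data/tmp/
```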
The current state of the docket scraping process can be found in the Jupyter notebook image. Two workflows exist: one using pdfminer.six and another using PyPDF2.