Builds a postgres database of NYC housing data, containing the following datasets:
- Pluto
- Department of Buildings Job Filings
- HPD Violations
- HPD Registrations
- Department of Finance Rolling Sales
- Tax bills - Rent Stabilization Unit Counts (John Krauss's data)
Requirements
On Debian & Ubuntu, issue this command to install the requirements:
sudo apt-get install build-essential wget python3 python3-dev python3-psycopg2 python3.4-venv postgresql-client unzip git
On Debian Testing use python3-venv
instead of python3.4-venv
Setup
Clone the repo: git clone https://github.com/aepyornis/nyc-db.git --recursive
Create a pg database if you don't have one setup already: createdb nycdb
Rename env.sh.sample to env.sh and modify it according to how postgres is setup. env.sh defines two functions -- excute_sql and execute_sql_cmd -- which simply passes your postgres connection settings to psql and stores a connection string used by python.
Example env.sh:
export PGPASSWORD=YOURPGPASSWORD
execute_sql () {
psql -h 127.0.0.1 -d nycdb -U postgres -f $1
}
execute_sql_cmd () {
psql -h 127.0.0.1 -d nycdb -U postgres --command "$1"
}
NYCDB_CONNECTION_STRING="dbname=nycdb user=postgres password=YOURPGPASSWORD host=127.0.0.1"
Additionally, take a look at sample_setup.sh for a rough idea of how to setup up a debian or ubuntu server with the database.
These must be run from the root of the repo. Expect this whole process to take an hour.
Download the source files: make download
Build the database: make nyc-db
By default it only includes the most recent pluto. If you you'd like to include 14 (!) versions of pluto for historic analysis run the download and db script with the flag --pluto-all: ./scripts/download.sh all --pluto-all
and ./scripts/nyc_db.sh --pluto-all
Or use the make commands: make download-pluto-all
and make nyc-db-pluto-all
Notes:
- The scripts will drop existing tables of the same name from the database and re-populate them. This means you can re-run the scripts when new data is released
- Some but not all of the tables have indexes.
Individual datasets If you want only one dataset or if you prefer to import the datasets one-at-a-time, you can run the download script with the name of the dataset and then execute the script for the corresponding dataset.
For example, to import only DOF sales data do: ./scripts/download.sh dofsales
and ./scripts/dofsales.sh
The scripts to insert the data for each datasets are stored in separate repos and are are kept as submodules in the modules folder:
- PLUTO
- Department of Building's Job Filings
- HPD Violations
- HPD Registrations
- DOF Sales
- Rent Stabilization Unit Counts
This requires docker, docker-compose, git, wget, make, and unzip
Clone the repo: git clone https://github.com/aepyornis/nyc-db.git --recursive
In root of repo:
Setup docker: make docker-setup
Download the data files: make download
Create the database: make docker-run
After the database is built:
Enter a psql shell: make docker-shell
Make database dump: make docker-dump
Run the database standalone: make docker-db-standalone
pluto
- pluto_03c
- pluto_04c
- pluto_05d
- pluto_06c
- pluto_07c
- pluto_09v2
- pluto_10v2
- pluto_11v2
- pluto_12v2
- pluto_13v2
- pluto_14v2
- pluto_15v1
- pluto_16v1
- pluto_16v2
dob
- dobjobs
hpd violations
- violations
- uniq_violations
- open_violations
- all_violations
hpd registrations
- hpd.contacts nn - hpd.corporate_owners
- hpd.registrations
- hpd.registrations_grouped_by_bbl
dof
- dof_sales
tax bills
- rentstab
- ACRIS
- hpd complaints
- selected 311 complaints
NYC-DB - Postgres database of NYC housing data
Copyright (C) 2016 Ziggy Mintz
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.