forked from sfu-db/dataprep

Dataprep: Data Preparation in Python


DataPrep

Documentation | Mail List & Forum

Dataprep is a collection of functions that help you accomplish the tasks that come before building a predictive model.

Implementation Status

Currently, you can use dataprep to:

  • Collect data from common data sources (through dataprep.data_connector)
  • Do your exploratory data analysis (through dataprep.eda)
  • ...

Installation

pip install dataprep

dataprep is currently in its alpha stage, so please specify the version number manually when installing.
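
Because the version has to be pinned by hand, it can be useful to confirm which version actually ended up in your environment. The following is a minimal sketch using only the Python standard library (3.8+); the helper name `installed_version` is made up for this example and is not part of dataprep.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "dataprep" is the distribution name used in the pip command above.
    found = installed_version("dataprep")
    if found is None:
        print("dataprep is not installed")
    else:
        print(f"dataprep {found} is installed")
```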

Examples & Usage

More detailed examples can be found in the examples folder.

Data Connector

You can download Yelp business search results into a pandas DataFrame with two lines of code, without digging deep into the Yelp documentation!

from dataprep.data_connector import Connector

dc = Connector("yelp", auth_params={"access_token":"<Your yelp access token>"})
df = dc.query("businesses", term="ramen", location="vancouver")
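
If you would rather not hard-code the access token, one option is to read it from an environment variable. This is only a sketch: the variable name `YELP_ACCESS_TOKEN` and the helpers `yelp_auth_params` and `search_ramen` are assumptions made for illustration, while the `Connector` and `query` calls are the same ones shown above.

```python
import os


def yelp_auth_params(env_var="YELP_ACCESS_TOKEN"):
    """Build the auth_params dict from an environment variable.

    The variable name is an assumption made for this sketch.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"access_token": token}


def search_ramen():
    # Imported lazily so the helper above works without dataprep installed.
    from dataprep.data_connector import Connector

    dc = Connector("yelp", auth_params=yelp_auth_params())
    return dc.query("businesses", term="ramen", location="vancouver")
```

Calling `search_ramen()` requires a valid token and network access; `yelp_auth_params()` on its own does not.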


EDA

Some tasks come up again and again during the exploratory data analysis stage, such as taking a quick look at column distributions or understanding the correlations between columns.

The EDA module groups these tasks into functions, so that you can finish each one with a single function call.

  • Want to understand the distributions for each DataFrame column? Use plot.
from dataprep.eda import plot

df = ...

plot(df)

  • Want to understand the correlation between columns? Use plot_correlation.
from dataprep.eda import plot_correlation

df = ...

plot_correlation(df)

  • Or, if you want to understand the impact of the missing values for each column, use plot_missing.
from dataprep.eda import plot_missing

df = ...

plot_missing(df)

  • You can even drill down to get more information by giving plot, plot_correlation and plot_missing a column name.
from dataprep.eda import plot_missing

df = ...

plot_missing(df, x="some_column_name")
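
To try the calls above without a dataset of your own, you could stand in a small synthetic DataFrame for the `df = ...` placeholder. The sketch below is illustrative only: the column names, value ranges and the helpers `make_sample_columns` and `run_eda` are invented for this example, and it assumes pandas is installed and that you are working in a notebook where the plots are displayed.

```python
import random


def make_sample_columns(n_rows=200, missing_rate=0.1, seed=0):
    """Return plain-Python columns, with None marking missing values."""
    rng = random.Random(seed)
    price = [round(rng.uniform(5, 30), 2) for _ in range(n_rows)]
    # Rating loosely follows price so that a correlation is visible.
    rating = [min(5.0, max(1.0, p / 6 + rng.gauss(0, 0.5))) for p in price]
    city = [rng.choice(["vancouver", "burnaby", "richmond"]) for _ in range(n_rows)]
    # Blank out a share of the ratings to give plot_missing something to show.
    rating = [None if rng.random() < missing_rate else r for r in rating]
    return {"price": price, "rating": rating, "city": city}


def run_eda(columns):
    # Imported here so the data helper above needs only the standard library.
    import pandas as pd
    from dataprep.eda import plot, plot_correlation, plot_missing

    df = pd.DataFrame(columns)
    plot(df)                      # distribution of every column
    plot_correlation(df)          # correlations between numeric columns
    plot_missing(df)              # overview of missing values
    plot_missing(df, x="rating")  # drill down into one column
```

Running `run_eda(make_sample_columns())` goes through the same four calls described in the list above.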

Don't forget to check out the examples folder for detailed demonstrations!

Contribution

Contributions are always welcome. If you want to contribute to dataprep, be sure to read the contribution guidelines.
