Dataprep lets you prepare your data using a single library with a few lines of code.
Currently, you can use dataprep to:
- Collect data from common data sources (through dataprep.data_connector)
- Perform exploratory data analysis (through dataprep.eda)
- ...and more modules are coming
Install dataprep from PyPI:

```bash
pip install dataprep
```
The following documentation and case studies give an overview of what dataprep can do:
- Documentation: Data Connector
- Documentation: EDA
- EDA Case Study: Titanic
- EDA Case Study: House Price
The exploratory data analysis stage involves common tasks such as taking a quick look at the distribution of each column or understanding the correlations between columns.
The EDA module organizes these tasks into functions, so you can complete each one with a single function call.
- Want to understand the distribution of each DataFrame column? Use plot.
- Want to understand the correlations between columns? Use plot_correlation.
- Or, if you want to understand the impact of missing values in each column, use plot_missing.
- You can drill down for more information by passing a column name to plot, plot_correlation, or plot_missing. For example, giving plot_missing a column name shows how the missing values in that column affect the other columns.
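As a rough, non-visual illustration of the quantities behind these three plots, here is a plain pandas sketch. It is not dataprep's implementation, and the small DataFrame is made up for the example: describe() stands in for per-column distributions, Pearson correlations between numeric columns stand in for plot_correlation (dataprep may show other correlation measures as well), and the share of missing values per column stands in for plot_missing.

```python
import numpy as np
import pandas as pd

# A small made-up DataFrame, used only for illustration.
df = pd.DataFrame(
    {
        "age": [22, 38, 26, 35, np.nan, 54],
        "fare": [7.25, 71.28, 7.92, 53.10, 8.46, 51.86],
        "embarked": ["S", "C", "S", "S", None, "S"],
    }
)

# Roughly what plot(df) summarises: per-column distribution statistics.
summary = df.describe(include="all")

# Roughly what plot_correlation(df) summarises: pairwise Pearson
# correlations between the numeric columns only.
corr = df.select_dtypes("number").corr()

# Roughly what plot_missing(df) summarises: the fraction of missing values per column.
missing_ratio = df.isna().mean()

print(summary)
print(corr.round(3))
print(missing_ratio)
```

The dataprep functions render these as interactive plots instead of printing tables, which is the main convenience over doing it by hand.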
Don't forget to check out the examples folder for detailed demonstrations!
You can download Yelp business search results into a pandas DataFrame with just two lines of code, without digging into the Yelp API documentation!
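```python
from dataprep.data_connector import Connector

# Create a Yelp connector; replace the placeholder with your own access token.
dc = Connector("yelp", auth_params={"access_token": "<Your yelp access token>"})
# Search for businesses matching "korean" in Seattle; the result is a pandas DataFrame.
df = dc.query("businesses", term="korean", location="seattle")
```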
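For comparison, here is roughly what the query above involves without the connector. This is a minimal hand-written sketch, not dataprep's implementation, assuming the Yelp Fusion business search endpoint (https://api.yelp.com/v3/businesses/search) with a bearer token in the Authorization header. The helper names build_search_request, businesses_to_frame and search_businesses are hypothetical.

```python
import json
import urllib.parse
import urllib.request

import pandas as pd

# Yelp Fusion business search endpoint (assumed for this sketch).
SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_request(access_token: str, term: str, location: str) -> urllib.request.Request:
    """Hypothetical helper: build an authenticated GET request for a business search."""
    query = urllib.parse.urlencode({"term": term, "location": location})
    return urllib.request.Request(
        f"{SEARCH_URL}?{query}",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def businesses_to_frame(payload: dict) -> pd.DataFrame:
    """Hypothetical helper: flatten the 'businesses' list of a response into a DataFrame."""
    # Nested objects such as "location" become dotted columns, e.g. "location.city".
    return pd.json_normalize(payload.get("businesses", []))


def search_businesses(access_token: str, term: str, location: str) -> pd.DataFrame:
    """Fetch a single page of results (needs network access and a valid token)."""
    request = build_search_request(access_token, term, location)
    with urllib.request.urlopen(request) as response:
        return businesses_to_frame(json.load(response))
```

This sketch fetches only the first page of results and does no error handling or pagination, which is exactly the kind of bookkeeping a connector is meant to hide.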
There are many ways to contribute to Dataprep.
- Report bugs and help us verify fixes as they are checked in.
- Review source code changes.
- Engage with other Dataprep users and developers on Stack Overflow.
- Help others in the Dataprep Community Discord and the Mail List & Forum.
- Contribute bug fixes.
- Provide use cases and write about your user experience.
Please take a look at our wiki for development documentation!