Codecov and Census dataset

jeanho1 · Jul 27, 2019 · 87b62af
1 parent 76e685e

Showing 10 changed files with 382 additions and 8 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -20,9 +20,9 @@ install:
   - pip install -e .
 
 script:
-  - if [ $TEST == 'unit' ]; then pytest --cov=./ --sanitize-with tests/sanitize-notebook.cfg tests/unit/; fi
-  - if [ $TEST == 'issue' ]; then pytest --cov=./ --sanitize-with tests/sanitize-notebook.cfg tests/issues/; fi
-  - if [ $TEST == 'examples' ]; then pytest --nbval --cov=./ --sanitize-with tests/sanitize-notebook.cfg examples/; fi
+  - if [ $TEST == 'unit' ]; then pytest --cov=$TEST --sanitize-with tests/sanitize-notebook.cfg tests/unit/; fi
+  - if [ $TEST == 'issue' ]; then pytest --cov=$TEST --sanitize-with tests/sanitize-notebook.cfg tests/issues/; fi
+  - if [ $TEST == 'examples' ]; then pytest --nbval --cov=$TEST --sanitize-with tests/sanitize-notebook.cfg examples/; fi
   # Our well-behaved Unix-style command-line tool exits with code 0 unless an internal error occurred
   - if [ $TEST == 'console' ]; then pandas_profiling -h; fi
 

diff --git a/README.md b/README.md
@@ -22,6 +22,7 @@ For each column the following statistics - if relevant for the column type - are
 
 The following examples can give you an impression of what the package can do:
 
+* [Census Income](http://pandas-profiling.github.io/pandas-profiling/examples/census/census_report.html) (US Adult Census data relating to income)
 * [NASA Meteorites](http://pandas-profiling.github.io/pandas-profiling/examples/meteorites/meteorites_report.html) (comprehensive set of meteorite landings)
 * [Titanic](http://pandas-profiling.github.io/pandas-profiling/examples/titanic/titanic_report.html) (the "Wonderwall" of datasets)
 * [NZA](http://pandas-profiling.github.io/pandas-profiling/examples/nza/nza_report.html) (open data from the Dutch Healthcare Authority)

diff --git a/docs/index.html b/docs/index.html
@@ -42,6 +42,7 @@ <h1 id="pandas-profiling">Pandas Profiling</h1>
 <h2 id="examples">Examples</h2>
 <p>The following examples can give you an impression of what the package can do:</p>
 <ul>
+<li><a href="http://pandas-profiling.github.io/pandas-profiling/examples/census/census_report.html">Census Income</a> (US Adult Census data relating to income)</li>
 <li><a href="http://pandas-profiling.github.io/pandas-profiling/examples/meteorites/meteorites_report.html">NASA Meteorites</a> (comprehensive set of meteorite landings)</li>
 <li><a href="http://pandas-profiling.github.io/pandas-profiling/examples/titanic/titanic_report.html">Titanic</a> (the "Wonderwall" of datasets)</li>
 <li><a href="http://pandas-profiling.github.io/pandas-profiling/examples/nza/nza_report.html">NZA</a> (open data from the Dutch Healthcare Authority)</li>

diff --git a/examples/census/census.py b/examples/census/census.py
@@ -0,0 +1,44 @@
+from pathlib import Path
+
+import pandas as pd
+import numpy as np
+import requests
+
+import pandas_profiling
+
+if __name__ == "__main__":
+    file_name = Path("census_train.csv")
+    if not file_name.exists():
+        data = requests.get(
+            "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
+        )
+        file_name.write_bytes(data.content)
+
+    # Names based on https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.names
+    df = pd.read_csv(
+        file_name,
+        header=None,
+        index_col=False,
+        names=[
+            "age",
+            "workclass",
+            "fnlwgt",
+            "education",
+            "education-num",
+            "marital-status",
+            "occupation",
+            "relationship",
+            "race",
+            "sex",
+            "capital-gain",
+            "capital-loss",
+            "hours-per-week",
+            "native-country",
+        ],
+    )
+
+    # Prepare missing values
+    df = df.replace("\\?", np.nan, regex=True)
+
+    profile = df.profile_report(title="Census Dataset")
+    profile.to_file(output_file=Path("./census_report.html"))
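Each row of the Adult file has 15 comma-separated fields: the 14 attributes documented in `adult.names`, followed by the `>50K` / `<=50K` income label. The `names` list in `census.py` has only 14 entries. With `index_col=False`, pandas is therefore likely to drop that trailing label column, and newer versions emit a `ParserWarning` about the lost data. The sketch below shows an alternative loader under the assumption that the label should be kept. `load_adult` and the column name `income` are hypothetical choices, not part of pandas-profiling. The sketch also uses `skipinitialspace=True` to strip the space after each comma, so `na_values="?"` can mark missing entries at read time instead of through a regex replacement afterwards.

```python
from pathlib import Path
from typing import IO, Union

import pandas as pd

# The 14 attribute names from adult.names, plus a name for the class label.
# "income" is a hypothetical choice; adult.names only lists the label values.
ADULT_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]


def load_adult(source: Union[str, Path, IO[str]]) -> pd.DataFrame:
    """Hypothetical helper: read UCI Adult data, keeping the income label."""
    return pd.read_csv(
        source,
        header=None,             # the raw file has no header row
        names=ADULT_COLUMNS,     # 15 names for 15 fields, so nothing is dropped
        skipinitialspace=True,   # fields are separated by ", "
        na_values="?",           # "?" marks a missing value
    )
```

Once `pandas_profiling` is imported, profiling proceeds as in the script, for example `load_adult("census_train.csv").profile_report(title="Census Dataset")`.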
diff --git a/examples/census/census_report.html b/examples/census/census_report.html
diff --git a/examples/meteorites/meteorites_report.html b/examples/meteorites/meteorites_report.html
diff --git a/examples/nza/nza_report.html b/examples/nza/nza_report.html
diff --git a/examples/stata_auto/stata_auto_report.html b/examples/stata_auto/stata_auto_report.html
diff --git a/examples/titanic/titanic_report.html b/examples/titanic/titanic_report.html
diff --git a/examples/website_inaccessibility/website_inaccessibility_report.html b/examples/website_inaccessibility/website_inaccessibility_report.html