Commit

Move some comment from source code to description
bpesquet committed Sep 26, 2024
1 parent 1ddb8c1 commit 6f36e4b
Showing 2 changed files with 3 additions and 4 deletions.
4 changes: 3 additions & 1 deletion mlcourse/project_workflow/README.md
@@ -6,10 +6,12 @@ This [example](test_project_workflow.py) demonstrates how to apply the Machine L

The dataset is based on data from the 1990 California census. The raw CSV file is available [here](https://raw.githubusercontent.com/bpesquet/mlcourse/main/datasets/california_housing.csv). It is a slightly modified version of the [original dataset](https://www.dcc.fc.up.pt/%7Eltorgo/Regression/cal_housing.html).

9 features are numerical, one (`ocean_proximity`) is categorical. One feature (`total_bedrooms`) has missing values. `median_house_value` is the target feature (value to predict).
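
These properties can be checked directly with pandas. The snippet below is a minimal sketch on a made-up three-row frame that reuses only the column names mentioned above. The values are illustrative and the variable names (`sample`, `numerical`, `categorical`, `with_missing`) are hypothetical. The same calls should apply to the full dataset once it is loaded.

```python
import numpy as np
import pandas as pd

# Made-up rows reusing column names from the dataset (values are illustrative only)
sample = pd.DataFrame(
    {
        "total_bedrooms": [129.0, np.nan, 190.0],
        "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY"],
        "median_house_value": [452600.0, 358500.0, 352100.0],
    }
)

# Split columns by type: numerical vs. everything else (here, categorical)
numerical = sample.select_dtypes(include="number").columns.tolist()
categorical = sample.select_dtypes(exclude="number").columns.tolist()

# Columns containing at least one missing value
with_missing = sample.columns[sample.isna().any()].tolist()

print(numerical, categorical, with_missing)
```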

Data preprocessing is done through a series of sequential operations on data:

- handling missing values;
- scaling data:
- scaling data;
- encoding categorical features.

A scikit-learn [pipeline](https://scikit-learn.org/stable/modules/compose.html#pipeline) streamlines these operations. This is useful to prevent mistakes and oversights when preprocessing new data.
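
The three preprocessing steps listed above can be sketched as follows. This is a minimal illustration, not the code from `test_project_workflow.py`: median imputation, `StandardScaler` and `OneHotEncoder` are assumed choices, the sample values are made up, and names such as `num_pipeline` and `preprocessor` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows reusing two column names from the dataset (values are illustrative only)
sample = pd.DataFrame(
    {
        "total_bedrooms": [129.0, np.nan, 190.0, 235.0],
        "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "INLAND"],
    }
)

num_features = ["total_bedrooms"]
cat_features = ["ocean_proximity"]

# Numerical columns: fill missing values, then scale (assumed strategies)
num_pipeline = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ]
)

# Apply the numerical pipeline and the categorical encoder to their respective columns
preprocessor = ColumnTransformer(
    transformers=[
        ("num", num_pipeline, num_features),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_features),
    ],
    sparse_threshold=0,  # always return a dense array
)

x_prepared = preprocessor.fit_transform(sample)
print(x_prepared.shape)  # (4, 3): one scaled column plus two one-hot columns
```

Once fitted, the same `preprocessor` can be reused on new data through `preprocessor.transform(...)`, so the new data goes through exactly the same steps as the training data.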
3 changes: 0 additions & 3 deletions mlcourse/project_workflow/test_project_workflow.py
@@ -31,9 +31,6 @@ def load_dataset(url):
    print(f"Dataset shape: {dataset.shape}")

    # Print a concise summary of the dataset
    # 9 attributes are numerical, one ("ocean_proximity") is categorical
    # "median_house_value" is the target attribute
    # One attribute ("total_bedrooms") has missing values
    dataset.info()

    # Show 10 random samples of the dataset