Move some comment from source code to description

bpesquet · Sep 26, 2024 · 6f36e4b
1 parent 1ddb8c1
Showing 2 changed files with 3 additions and 4 deletions.
diff --git a/mlcourse/project_workflow/README.md b/mlcourse/project_workflow/README.md
@@ -6,10 +6,12 @@ This [example](test_project_workflow.py) demonstrates how to apply the Machine L
 
 The dataset is based on data from the 1990 California census. The raw CSV file is available [here](https://raw.githubusercontent.com/bpesquet/mlcourse/main/datasets/california_housing.csv). It is a slightly modified version of the [original dataset](https://www.dcc.fc.up.pt/%7Eltorgo/Regression/cal_housing.html).
 
+9 features are numerical, one (`ocean_proximity`) is categorical. One feature (`total_bedrooms`) has missing values. `median_house_value` is the target feature (value to predict).
+
 Data preprocessing is done through a series of sequential operations on data:
 
 - handling missing values;
-- scaling data:
+- scaling data;
 - encoding categorical features.
 
 A scikit-learn [pipeline](https://scikit-learn.org/stable/modules/compose.html#pipeline) streamlines these operations. This is useful to prevent mistakes and oversights when preprocessing new data.
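
Note: the three preprocessing steps listed in this README hunk could be combined roughly as follows. This is only a sketch, not the code in `test_project_workflow.py`: the toy DataFrame, the `median_income` column, the median imputation strategy, and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame(
    {
        "total_bedrooms": [100.0, np.nan, 300.0, 200.0],  # contains a missing value
        "median_income": [2.0, 4.0, 6.0, 8.0],
        "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "NEAR OCEAN"],
    }
)

num_features = ["total_bedrooms", "median_income"]
cat_features = ["ocean_proximity"]

# Numerical features: fill missing values (assumed strategy: median), then scale
num_pipeline = Pipeline(
    [
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ]
)

# Apply the numerical pipeline and the one-hot encoder to their own columns
preprocessor = ColumnTransformer(
    [
        ("num", num_pipeline, num_features),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_features),
    ]
)

prepared = preprocessor.fit_transform(toy)
print(prepared.shape)
```

Calling `preprocessor.transform(new_data)` on later data reuses the medians, scaling statistics, and categories learned during `fit_transform`, which is how a pipeline helps avoid inconsistencies when preprocessing new data.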
diff --git a/mlcourse/project_workflow/test_project_workflow.py b/mlcourse/project_workflow/test_project_workflow.py
@@ -31,9 +31,6 @@ def load_dataset(url):
     print(f"Dataset shape: {dataset.shape}")
 
     # Print a concise summary of the dataset
-    # 9 attributes are numerical, one ("ocean_proximity") is categorical
-    # "median_house_value" is the target attribute
-    # One attribute ("total_bedrooms") has missing values
     dataset.info()
 
     # Show 10 random samples of the dataset