This project focuses on analyzing house price data for King County, Washington. The goal is to explore the dataset, perform data cleaning, feature engineering, and build a machine learning model to predict house prices.
Dataset: https://www.kaggle.com/datasets/harlfoxem/housesalesprediction
The dataset used in this project contains information about homes sold in King County, including house features such as square footage, number of bedrooms, location, and more. This project includes the following steps:
- Exploratory Data Analysis (EDA):
  - Visualizing the data to understand trends, distributions, and correlations.
  - Identifying missing values and outliers in the data.
- Data Preprocessing:
  - Cleaning the data by handling missing values, encoding categorical variables, and scaling numerical features.
- Feature Engineering:
  - Creating new features that can improve model performance.
- Modeling:
  - Building machine learning models (such as linear regression and decision trees) to predict house prices.
- Model Evaluation:
  - Evaluating models using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared.
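The modeling and evaluation steps can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the feature subset, the 80/20 split, the `train_and_evaluate` helper name, and the choice of a scaled linear regression are all assumptions made for the example.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed feature subset; the notebook may select different columns.
FEATURES = ["bedrooms", "bathrooms", "sqft_living", "sqft_lot", "floors", "grade"]


def train_and_evaluate(df, features=FEATURES, target="price", seed=42):
    """Fit a scaled linear regression and report MAE, MSE, and R-squared."""
    X = df[features]
    y = df[target]
    # Hold out 20% of the rows for evaluation (an assumed split ratio).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Scaling happens inside the pipeline so it is fitted on training rows only.
    model = make_pipeline(StandardScaler(), LinearRegression())
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    metrics = {
        "mae": mean_absolute_error(y_test, predictions),
        "mse": mean_squared_error(y_test, predictions),
        "r2": r2_score(y_test, predictions),
    }
    return model, metrics
```

Assuming the Kaggle CSV has been loaded into a DataFrame with `pandas.read_csv`, calling `train_and_evaluate(df)` returns the fitted pipeline and a dictionary of the three metrics. Swapping `LinearRegression` for `sklearn.tree.DecisionTreeRegressor` would give a decision tree baseline for comparison.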
The dataset used in this project contains information about houses sold in King County, WA. It includes the following features:
- id: Unique ID for each house
- date: Date when the house was sold
- price: Price of the house
- bedrooms: Number of bedrooms
- bathrooms: Number of bathrooms
- sqft_living: Square footage of the house
- sqft_lot: Square footage of the lot
- floors: Number of floors in the house
- waterfront: Whether the house has a waterfront view (1 if true, 0 if false)
- view: Quality of the view (a rating from 0 to 4)
- condition: Condition of the house (a rating from 1 to 5)
- grade: Grade of the house (a rating from 1 to 13)
- sqft_above: Square footage of the house above ground
- sqft_basement: Square footage of the basement
- yr_built: Year the house was built
- yr_renovated: Year the house was renovated
- zipcode: ZIP code of the house's location
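As an illustration of how these columns might feed the EDA and feature engineering steps, here is a small sketch. The helper names and derived columns (`sale_year`, `house_age`, `was_renovated`, `has_basement`) are hypothetical and not taken from the notebook. The sketch also assumes that `date` is stored as a string such as `20141013T000000` and that `yr_renovated` is 0 for houses that were never renovated.

```python
import pandas as pd


def summarize_missing(df):
    """Return the number of missing values per column, largest first."""
    return df.isna().sum().sort_values(ascending=False)


def add_basic_features(df, date_format="%Y%m%dT%H%M%S"):
    """Return a copy of df with a few illustrative derived columns."""
    out = df.copy()
    # Parse the sale date; the default format is an assumption about the raw file.
    out["date"] = pd.to_datetime(out["date"], format=date_format)
    out["sale_year"] = out["date"].dt.year
    # Age of the house at the time of sale.
    out["house_age"] = out["sale_year"] - out["yr_built"]
    # Assumes yr_renovated is 0 when the house was never renovated.
    out["was_renovated"] = (out["yr_renovated"] > 0).astype(int)
    # A basement is treated as present when its square footage is positive.
    out["has_basement"] = (out["sqft_basement"] > 0).astype(int)
    return out
```

Because `add_basic_features` works on a copy, the original DataFrame stays unchanged and the derived columns can be dropped or revised without reloading the data.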
The project uses the following Python libraries:
- pandas: For data manipulation and analysis.
- numpy: For numerical operations.
- matplotlib and seaborn: For data visualization.
- scikit-learn: For machine learning models and evaluation.
To run the analysis on your local machine, follow these steps:
- Clone or download the repository:
  git clone https://github.com/elfgk/KC-House-Data-Analysis.git
- Install the required Python libraries.
- Open the kc-house.ipynb Jupyter notebook and follow the steps.
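Before opening the notebook, it can help to confirm that the required libraries are installed. The check below is a hypothetical helper, not part of the repository. It relies only on the standard library and on the fact that scikit-learn is imported under the module name `sklearn`.

```python
from importlib.util import find_spec

# Maps each import name to its pip package name; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "sklearn": "scikit-learn",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose modules cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; try: pip install " + " ".join(missing))
    else:
        print("All required libraries are available.")
```

Running the script prints either a suggested `pip install` command for whatever is missing or a confirmation that everything is available.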