This machine learning project predicts house prices on the Belgian market using data scraped from Immoweb.
Make sure the following are installed on your machine:
- PostgreSQL
- Python 3.8
- pip
- Docker
- Docker Compose
- Apache Airflow
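The prerequisites above can be checked from Python before going further. This is only a minimal sketch: the executable names in `REQUIRED_TOOLS` are assumptions (for example, Docker Compose v2 ships as a `docker compose` plugin rather than a `docker-compose` binary), and treating 3.8 as a minimum version is also an assumption, not something this project states.

```python
import shutil
import sys

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = ["psql", "pip", "docker", "docker-compose", "airflow"]


def find_missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_version_ok(version_info=sys.version_info, minimum=(3, 8)):
    """Check that the interpreter version is at least `minimum` (assumed floor)."""
    return tuple(version_info[:2]) >= minimum
```

Calling `find_missing_tools(REQUIRED_TOOLS)` returns an empty list when every tool is on your PATH, and `python_version_ok()` reports whether the current interpreter meets the assumed minimum.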
Create a .env file with the following credentials for your PostgreSQL database:
DB_NAME=YourDB
DB_HOST=YourHost
DB_PORT=YourPort
DB_USER=YourUser
DB_PASS=YourPassword
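To illustrate how these credentials might be consumed, here is a minimal standard-library sketch that reads a .env file, checks that all five keys are present, and formats them as a libpq-style connection string. The function names `parse_env`, `load_db_settings`, and `build_dsn` are hypothetical; the project's own scripts may load the file differently (for example with a dotenv library).

```python
from pathlib import Path

# The five keys listed above.
REQUIRED_KEYS = ("DB_NAME", "DB_HOST", "DB_PORT", "DB_USER", "DB_PASS")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_db_settings(path=".env"):
    """Read the .env file and fail early if a credential is missing."""
    settings = parse_env(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise KeyError(f"Missing keys in {path}: {', '.join(missing)}")
    return settings


def build_dsn(settings):
    """Format the settings as a keyword/value connection string.

    Values containing spaces would need quoting, which this sketch omits.
    """
    return (
        f"dbname={settings['DB_NAME']} host={settings['DB_HOST']} "
        f"port={settings['DB_PORT']} user={settings['DB_USER']} "
        f"password={settings['DB_PASS']}"
    )
```

A PostgreSQL driver such as psycopg2 accepts a string of this shape in its `connect` call, so `build_dsn(load_db_settings())` is one way to turn the file into a connection.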
You can also run the project with Docker. Build the images, passing your environment variables as build arguments:
docker build --build-arg DB_NAME=YourDB --build-arg DB_HOST=YourHost --build-arg DB_PORT=YourPort --build-arg DB_USER=YourUser --build-arg DB_PASS=YourPassword -t immo-hrequest -f docker-hrequest .
docker build --build-arg DB_NAME=YourDB --build-arg DB_HOST=YourHost --build-arg DB_PORT=YourPort --build-arg DB_USER=YourUser --build-arg DB_PASS=YourPassword -t immo-most-expensive -f docker-most-expensive .
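Because both commands repeat the same five build arguments, they can be assembled programmatically. The sketch below rebuilds the two `docker build` commands above from a settings dictionary; the helper names are hypothetical, and the `run` parameter exists only so the commands can be inspected without invoking Docker.

```python
import subprocess

# Image tag -> Dockerfile name, taken from the two commands above.
IMAGES = {
    "immo-hrequest": "docker-hrequest",
    "immo-most-expensive": "docker-most-expensive",
}
BUILD_ARG_KEYS = ("DB_NAME", "DB_HOST", "DB_PORT", "DB_USER", "DB_PASS")


def docker_build_command(tag, dockerfile, settings, context="."):
    """Return the argument list for one `docker build` invocation."""
    command = ["docker", "build"]
    for key in BUILD_ARG_KEYS:
        command += ["--build-arg", f"{key}={settings[key]}"]
    command += ["-t", tag, "-f", dockerfile, context]
    return command


def build_images(settings, run=subprocess.run):
    """Build every image in IMAGES, raising on the first failed build."""
    for tag, dockerfile in IMAGES.items():
        run(docker_build_command(tag, dockerfile, settings), check=True)
```

Passing the arguments as a list avoids shell quoting problems when a password contains spaces or special characters.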
- Create a virtual environment with Python 3.8:
python3 -m venv venv
- Activate the virtual environment:
source venv/bin/activate
- Install the requirements:
pip install -r requirements.txt
You can run the project with Airflow. Make sure the Airflow scheduler and webserver are running, then copy the DAG files from the dags folder into your Airflow DAGs folder. Each DAG runs in a Docker container, scrapes data from Immoweb, and stores it in your PostgreSQL database.
- immo_most_expensive_dag.py: This DAG scrapes data from the Immoweb website and stores it in your PostgreSQL database.
- immo_hrequest_dag.py: This DAG scrapes data from the individual Immoweb pages and stores it in your PostgreSQL database.
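The copy step can be scripted. This sketch assumes the default Airflow layout, where the DAGs folder is `$AIRFLOW_HOME/dags` and `AIRFLOW_HOME` falls back to `~/airflow`; if your `airflow.cfg` sets a different `dags_folder`, pass that path as `target` instead. The function names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def airflow_dags_folder(environ=os.environ):
    """Guess the DAGs folder: $AIRFLOW_HOME/dags, else ~/airflow/dags."""
    home = environ.get("AIRFLOW_HOME") or str(Path.home() / "airflow")
    return Path(home) / "dags"


def copy_dags(source="dags", target=None):
    """Copy every .py file from `source` into the Airflow DAGs folder."""
    target = Path(target) if target is not None else airflow_dags_folder()
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for dag_file in sorted(Path(source).glob("*.py")):
        shutil.copy2(dag_file, target / dag_file.name)
        copied.append(dag_file.name)
    return copied
```

Running `copy_dags()` from the repository root returns the names of the files it copied, which should include the two DAG files listed above.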
You can run the machine learning pipeline by executing the following commands in order:
python3 ml/preprocess.py
python3 ml/train.py
python3 ml/predict.py
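Since each step depends on the previous one, it can help to run the three scripts from a single entry point that stops at the first failure. The following is a minimal sketch with a hypothetical `run_pipeline` helper; it assumes it is launched from the repository root with the virtual environment active.

```python
import subprocess
import sys

# Script order taken from the commands above.
PIPELINE = ["ml/preprocess.py", "ml/train.py", "ml/predict.py"]


def run_pipeline(scripts=PIPELINE, run=subprocess.run):
    """Run each script with the current interpreter, in order."""
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so training never starts after a failed preprocessing step.
        run([sys.executable, script], check=True)
```

Using `sys.executable` keeps every step on the same interpreter, and therefore the same installed requirements, as the one that started the pipeline.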