This repository contains a Python script for analyzing and forecasting stock prices from historical data. The script uses NumPy, Matplotlib, Seaborn, Scikit-Learn, Pandas, and Keras. It reads its input file paths from a JSON configuration file and expects historical stock price data in CSV format.
- Requirements
- Usage
- Introduction
- Exploratory Data Analysis (EDA)
- Data Preprocessing
- Building and Training a Deep Learning Model
- Model Evaluation
- Making Predictions
- Root Mean Squared Error (RMSE)
- Contributing
- License
Before using this script, make sure you have the following libraries installed:
- NumPy
- Matplotlib
- Seaborn
- Scikit-Learn
- Pandas
- Keras
You can install the required packages using pip (Keras 3 and later also needs a backend such as TensorFlow, JAX, or PyTorch installed separately):
pip install numpy matplotlib seaborn scikit-learn pandas keras
- Create a JSON configuration file (e.g., config.json) with the following content:
{
"train_data_path": "your_train_data.csv",
"test_data_path": "your_test_data.csv"
}
Replace "your_train_data.csv" and "your_test_data.csv" with the paths to your historical stock price data files.
- Use the provided Python script to analyze and predict stock prices:
# ... (script content)
# Make sure to replace 'your_train_data.csv' and 'your_test_data.csv' with the actual file paths in the JSON configuration.
# ... (script content)
Please replace 'your_train_data.csv' and 'your_test_data.csv' with the actual paths to your training and testing data files in the JSON configuration file.
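As a rough illustration of how the configuration file can be consumed, the sketch below reads it with the standard library and checks that both keys are present. The helper name `load_config` and the validation step are assumptions made for this example, not necessarily what the script does.

```python
import json
from pathlib import Path

# The two keys shown in the example config.json above.
REQUIRED_KEYS = ("train_data_path", "test_data_path")


def load_config(path="config.json"):
    """Read the JSON configuration and check that both data paths are present.

    Hypothetical helper for illustration only.
    """
    with Path(path).open(encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"Missing configuration keys: {', '.join(missing)}")
    return config
```

With a valid file in place, `load_config()["train_data_path"]` yields the path that can be passed on to `pandas.read_csv`.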
This script provides a comprehensive workflow for stock price prediction:
- Exploratory Data Analysis (EDA): It begins with an exploratory data analysis to understand the dataset's characteristics.
- Data Preprocessing: The script preprocesses the data, filtering and cleaning it, and performs feature selection.
- Building and Training a Deep Learning Model: It builds a deep learning model using Keras, a popular deep learning library.
- Model Evaluation: The script evaluates the model's performance on a testing dataset and provides key metrics.
- Making Predictions: After training, the model is used to make predictions on new data.
- Root Mean Squared Error (RMSE): The script calculates and displays the Root Mean Squared Error, a crucial metric for regression tasks.
The script begins by loading historical stock price data from a CSV file and provides an initial overview, including the dataset's shape, basic statistics, and information about the columns. It also performs visualizations like pairplots and identifies top dates with the highest closing stock prices.
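A minimal sketch of this overview step with Pandas is shown here. The column names `Date` and `Close`, the helper names, and the tiny in-memory table are assumptions for illustration; the real data would come from the CSV file named in the configuration.

```python
import pandas as pd


def summarize(df):
    """Print the shape, column information, and basic statistics."""
    print("Shape:", df.shape)
    df.info()
    print(df.describe())


def top_closing_dates(df, n=5, date_col="Date", close_col="Close"):
    """Return the n rows with the highest closing price, highest first.

    The default column names are assumptions, not taken from the script.
    """
    return df.nlargest(n, close_col)[[date_col, close_col]]


# Made-up sample rows standing in for pd.read_csv(<train_data_path>).
sample = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"],
    "Close": [101.5, 99.0, 104.25, 102.0],
})
summarize(sample)
print(top_closing_dates(sample, n=2))
```

For the pairplot mentioned above, `seaborn.pairplot(df)` can be called on the same DataFrame once the numeric columns are loaded.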
Data preprocessing includes filtering rows based on specific conditions, standardizing the data using a scaler, and splitting the data into training and testing sets. A deep learning model for regression is then defined.
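The following sketch shows one way these preprocessing steps might fit together with Scikit-Learn. The filter condition, the helper name `preprocess`, and its parameters are hypothetical. The sketch also splits before fitting the scaler, a common precaution against leaking test-set statistics into training; the script itself may order these steps differently.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df, feature_cols, target_col, test_size=0.2):
    """Filter, split, and standardize a price table (illustrative only)."""
    cols = list(feature_cols) + [target_col]
    # Hypothetical filter: drop incomplete rows and non-positive targets.
    clean = df.dropna(subset=cols)
    clean = clean[clean[target_col] > 0]

    X = clean[list(feature_cols)].to_numpy(dtype=float)
    y = clean[target_col].to_numpy(dtype=float)

    # shuffle=False keeps row order, so the test rows are the last ones.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, shuffle=False
    )

    # Fit the scaler on training rows only, then reuse it on the test rows.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test, scaler
```

Returning the fitted scaler lets the same transformation be applied later to any new data passed to the model.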
A deep learning model is built using Keras. It consists of multiple dense layers with different numbers of units and uses the mean squared error (MSE) loss function for training. The model is trained using early stopping to prevent overfitting.
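A model of the kind described here could look roughly like the sketch below. The layer sizes, the Adam optimizer, the validation split, and the patience value are illustrative assumptions, as are the helper names `build_model` and `train`; only the use of dense layers, MSE loss, and early stopping comes from the description.

```python
import keras
from keras import layers


def build_model(n_features):
    """Small fully connected regression network; layer sizes are assumed."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(16, activation="relu"),
        layers.Dense(1),  # single linear output for the predicted value
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


def train(model, X_train, y_train, epochs=100, patience=10):
    """Fit the model, stopping early when validation loss stops improving."""
    early_stop = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=patience, restore_best_weights=True
    )
    return model.fit(
        X_train, y_train,
        validation_split=0.2,  # hold out part of the training data
        epochs=epochs,
        callbacks=[early_stop],
        verbose=0,
    )
```

`restore_best_weights=True` rolls the model back to the epoch with the lowest validation loss, so the extra epochs run while waiting out the patience window do not degrade the final weights.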
The model is evaluated on the testing dataset, and its performance is assessed using the Mean Squared Error (MSE). The training history is visualized to understand the model's learning progress.
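To illustrate the training-history plot, the sketch below draws training and validation loss per epoch with Matplotlib. It assumes the dictionary that Keras exposes as `history.history` after `model.fit`; the function name `plot_history` and the output file name are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def plot_history(history, path="training_history.png"):
    """Plot loss per epoch from a Keras-style history dict and save it."""
    fig, ax = plt.subplots()
    ax.plot(history["loss"], label="training loss")
    if "val_loss" in history:
        ax.plot(history["val_loss"], label="validation loss")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("MSE")
    ax.legend()
    fig.savefig(path)
    return fig
```

A validation curve that flattens or rises while the training curve keeps falling is the usual sign of overfitting, which is what early stopping is meant to catch. The test-set MSE itself can be obtained with `model.evaluate(X_test, y_test)` when the model is compiled with MSE as its loss.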
After training, the model is used to make predictions on the test data.
The script calculates and displays the Root Mean Squared Error, a standard metric for measuring the model's predictive accuracy.
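RMSE is the square root of the mean squared error, which puts the error back in the same units as the price. A minimal NumPy sketch follows; the function name `rmse` and the sample values are made up for illustration.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equally sized sequences."""
    # ravel() flattens (n, 1) model output so it lines up with (n,) targets
    # instead of silently broadcasting to an (n, n) matrix.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Every prediction is off by exactly 2, so the RMSE is 2.0.
print(rmse([100.0, 102.0, 104.0, 106.0], [98.0, 104.0, 102.0, 108.0]))  # 2.0
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.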
Feel free to contribute to this project by opening issues, suggesting improvements, or submitting pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.
Get started with stock price prediction using this script and start making informed investment decisions today!