Mercury-Invest is a financial analytics platform that automates US stock market data ingestion, forecasting, and portfolio optimization. Using Apache Spark, Delta Lake, Airflow, Docker, and ML frameworks like Scikit-learn or TensorFlow within a Medallion Architecture (Bronze → Silver → Gold), it showcases production-level data engineering combined with quantitative finance capabilities.
- Real-World Architecture: Automates data ingestion, transformations, and analytics using Apache Airflow, Spark, and Delta Lake.
- Integrated Analytics: Combines machine learning models for stock forecasting and Modern Portfolio Theory for optimized allocations.
- Scalable & Reproducible: Docker-based environment ensures seamless scaling, deployment, and maintenance.
- Flexible Extensibility: Supports additional data sources, advanced ML models, and new analytics requirements.
Mercury-Invest automates data pipelines, prediction modeling, and portfolio management through the following steps:
- Ingestion: Airflow schedules daily/weekly data pulls from:
  - yfinance for stock market data.
  - FRED for macroeconomic indicators.
- Data is stored in the Medallion Architecture:
- Bronze: Raw data.
- Silver: Cleaned & joined data.
- Gold: Analytics-ready datasets.
- Spark performs large-scale ETL and feature engineering.
- Delta Lake ensures ACID compliance and versioning for tracking changes.
- Models: Predict stock returns or outperformance using:
- Scikit-learn for traditional ML models (e.g., Random Forest).
- TensorFlow for time-series models like LSTM.
- Applies time-series validation during training to avoid lookahead bias.
- Implements Mean-Variance Optimization (MVO):
- Assigns weights to maximize returns while minimizing portfolio risk.
- Tracks metrics: Sharpe Ratio, drawdown, and volatility (a metrics sketch follows this list).
- Rebalancing frequencies: Monthly/quarterly.
- Docker Containerization:
  - Packages the pipelines, transformations, and ML models for reproducible testing and production.
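As an illustration, the snippet below shows one way to compute these metrics from a daily returns series. The helper names and the 252-trading-day annualization convention are assumptions for this sketch, not code from the repository.

```python
# Illustrative portfolio-metric helpers (Sharpe Ratio, max drawdown, volatility).
# Names and the 252-day annualization are assumed for this sketch.
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed annualization convention

def annualized_volatility(daily_returns: pd.Series) -> float:
    # Scale the daily standard deviation up to an annual figure.
    return daily_returns.std() * np.sqrt(TRADING_DAYS)

def sharpe_ratio(daily_returns: pd.Series, risk_free_rate: float = 0.0) -> float:
    # Annualized excess return per unit of risk.
    excess = daily_returns - risk_free_rate / TRADING_DAYS
    return excess.mean() / excess.std() * np.sqrt(TRADING_DAYS)

def max_drawdown(daily_returns: pd.Series) -> float:
    # Largest peak-to-trough decline of the cumulative return curve (negative).
    cumulative = (1 + daily_returns).cumprod()
    running_peak = cumulative.cummax()
    return (cumulative / running_peak - 1).min()

# Random returns stand in for a real portfolio series.
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0005, 0.01, 500))
print(sharpe_ratio(returns), annualized_volatility(returns), max_drawdown(returns))
```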
| Component | Purpose |
|---|---|
| Apache Airflow | Schedules automated data workflows for ingestion and transformation. |
| Apache Spark | Handles massive data transformations and feature extraction. |
| Delta Lake | Provides transactional consistency (Bronze → Silver → Gold layers). |
| Scikit-learn | Builds predictive ML models for forecasting. |
| TensorFlow | Supports advanced time-series forecasting (e.g., LSTM). |
| Docker | Ensures consistent, reproducible environments. |
| Databricks (Optional) | Scalable Spark environment for distributed data processing. |
| Power BI (Planned) | Creates real-time dashboards for performance monitoring. |
To replicate or extend the Mercury-Invest workflow:
- **Automated Data Ingestion**
  - Use Airflow DAGs to schedule daily/weekly data ingestion.
  - Pull data from yfinance (stocks) and FRED (macro); a minimal DAG sketch follows.
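A minimal sketch of such a DAG, assuming Airflow 2.x with yfinance for prices and pandas_datareader for FRED series. The tickers, FRED series ID, and output paths are placeholders, not the repository's actual configuration.

```python
# Sketch of an ingestion DAG: pull stock and macro data into the Bronze layer.
from datetime import datetime

import pandas_datareader.data as web
import yfinance as yf
from airflow import DAG
from airflow.operators.python import PythonOperator

def pull_stock_data():
    # Daily OHLCV bars for a sample ticker list (assumed).
    prices = yf.download(["AAPL", "MSFT"], period="5d")
    prices.to_csv("/data/bronze/prices.csv")  # Bronze: raw, unmodified data

def pull_macro_data():
    # A macroeconomic indicator from FRED (assumed series ID: CPI).
    cpi = web.DataReader("CPIAUCSL", "fred", start=datetime(2020, 1, 1))
    cpi.to_csv("/data/bronze/cpi.csv")

with DAG(
    dag_id="mercury_invest_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # use schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    stocks = PythonOperator(task_id="pull_stocks", python_callable=pull_stock_data)
    macro = PythonOperator(task_id="pull_macro", python_callable=pull_macro_data)
    stocks >> macro  # run the macro pull after the stock pull
```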
- **Data Transformation**
  - Clean and enrich data using Apache Spark.
  - Use Delta Lake to track historical versions (time travel); see the sketch below.
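A sketch of the Bronze → Silver step with PySpark and Delta Lake. Paths, column names, and the session configuration are illustrative assumptions; the delta-spark package must be installed for the Delta format to be available.

```python
# Read raw Bronze data, clean it, and write a versioned Silver Delta table.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("mercury-silver")
    # Enable Delta Lake (requires the delta-spark package).
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

bronze = spark.read.csv("/data/bronze/prices.csv", header=True, inferSchema=True)

silver = (
    bronze.dropDuplicates(["Date"])          # drop duplicate trading days
          .na.drop(subset=["Close"])         # drop rows with missing prices
          .withColumn("Date", F.to_date("Date"))
)

# Every write creates a new, ACID-compliant version of the Delta table.
silver.write.format("delta").mode("overwrite").save("/data/silver/prices")

# Time travel: reload the table exactly as it was at an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/data/silver/prices")
```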
- **Machine Learning**
  - Train ML models for forecasting; options include:
    - Scikit-learn-based predictive models.
    - TensorFlow-based models for time-series forecasting.
  - See the walk-forward training sketch below.
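For instance, a Scikit-learn Random Forest evaluated with walk-forward splits, so every fold tests strictly on future data and lookahead bias is avoided. The feature and target column names and the Gold-layer path are assumptions for illustration.

```python
# Walk-forward training/evaluation of a Random Forest return forecaster.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

df = pd.read_parquet("/data/gold/features.parquet")   # assumed Gold-layer table
X = df[["return_1d", "return_5d", "volatility_20d"]]  # assumed feature columns
y = df["forward_return_5d"]                           # assumed target column

model = RandomForestRegressor(n_estimators=200, random_state=42)

# TimeSeriesSplit keeps each training window strictly before its test window.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    preds = model.predict(X.iloc[test_idx])
    print("fold MSE:", mean_squared_error(y.iloc[test_idx], preds))
```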
- **Portfolio Optimization**
  - Run Mean-Variance Optimization and record rebalancing metrics; see the sketch below.
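One common MVO formulation maximizes the Sharpe Ratio under long-only, fully-invested constraints. The sketch below solves it with scipy on toy inputs; the actual expected returns, covariance matrix, and optimizer may differ in the project.

```python
# Maximum-Sharpe Mean-Variance Optimization on toy inputs.
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.08, 0.12, 0.10])      # expected annual returns (assumed)
cov = np.array([[0.04, 0.01, 0.00],    # covariance matrix (assumed)
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.06]])
risk_free = 0.02

def neg_sharpe(w):
    # Maximizing the Sharpe Ratio == minimizing its negative.
    ret = w @ mu
    vol = np.sqrt(w @ cov @ w)
    return -(ret - risk_free) / vol

n = len(mu)
result = minimize(
    neg_sharpe,
    x0=np.full(n, 1 / n),                  # start from equal weights
    bounds=[(0.0, 1.0)] * n,               # long-only: no short positions
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}],  # fully invested
)
print("optimal weights:", result.x.round(3))
```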
- **Containerized Execution**
  - Build and deploy the entire pipeline using Docker for consistency.
| Expansion Area | Description |
|---|---|
| CI/CD Integration | Automate testing, linting, and container builds via GitHub Actions. |
| Sector Data Inclusion | Leverage sector classification for greater analysis depth. |
| Advanced ML Models | Explore LSTM/Transformers for time-series predictions. |
| BI Dashboards | Enable real-time insights through Power BI dashboards. |
| Microsoft Fabric Migration | Unified analytics platform with improved integration capabilities. |
To illustrate the architecture visually, add a diagram like the following (created with tools such as Lucidchart or PowerPoint):
Ingestion (Airflow) --> Bronze --> Cleaning (Spark) --> Silver --> ML Models / Optimization --> Gold
This project does not constitute financial advice. It is intended as a learning-oriented implementation of end-to-end data analytics and financial optimization pipelines. For inquiries or contributions, please create a GitHub Issue.