This data science project is structured to support robust data analysis and machine learning workflows using Python and Jupyter Notebooks. Below is an overview of the project structure and instructions on how to get started.
- raw/: Contains raw data files exactly as obtained from the source. These files are immutable and must never be edited.
- processed/: Contains cleaned and manipulated data, ready for analysis.
- external/: Data from third-party sources.
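As a rough sketch of how the raw-to-processed convention might look in code, the function below reads a CSV from raw/ and writes a cleaned copy to processed/ without touching the input. The function name `clean_raw_csv`, the `data_root` parameter, and the cleaning rule (drop incomplete rows) are assumptions made for illustration; they are not part of this project.

```python
import csv
from pathlib import Path


def clean_raw_csv(data_root, filename):
    """Copy a CSV from raw/ to processed/, dropping incomplete rows.

    Hypothetical helper: the raw file is only read, never modified.
    """
    data_root = Path(data_root)
    raw_path = data_root / "raw" / filename
    processed_dir = data_root / "processed"
    processed_dir.mkdir(parents=True, exist_ok=True)
    processed_path = processed_dir / filename

    with raw_path.open(newline="") as src, processed_path.open("w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            # Empty input: leave an empty output file behind.
            return processed_path
        writer.writerow(header)
        for row in reader:
            # Keep only rows as wide as the header with no blank cells.
            if len(row) == len(header) and all(cell.strip() for cell in row):
                writer.writerow(row)
    return processed_path
```

Writing to a separate directory rather than overwriting in place keeps raw/ reproducible: any processed file can be regenerated from its raw counterpart.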
- exploratory/: Contains Jupyter Notebooks for initial data exploration and analysis.
- report/: Contains finalized notebooks for reporting and presentation purposes.
This directory houses all Python scripts organized by their functionality:
- data/: Scripts for data acquisition and generation.
- features/: Scripts for feature engineering.
- models/: Scripts for model training and prediction.
- visualization/: Scripts for generating visualizations.
- figures/: Stores generated graphical content for use in reports.
- logs/: Contains output logs from scripts and models, useful for debugging and tracking experiments.
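One way scripts could send their output to logs/ is with the standard `logging` module. This is a minimal sketch, not the project's actual setup: the helper name `get_file_logger`, the one-file-per-logger layout, and the message format are all assumptions.

```python
import logging
from pathlib import Path


def get_file_logger(name, log_dir="logs"):
    """Return a logger that appends to <log_dir>/<name>.log (hypothetical helper)."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Avoid attaching duplicate handlers when called more than once.
    if not logger.handlers:
        handler = logging.FileHandler(log_dir / f"{name}.log")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A training script might then call `get_file_logger("train").info("epoch=1 loss=0.5")`, which appends a timestamped line to logs/train.log.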
requirements.txt lists all Python libraries required to run the project, which keeps the environment consistent across different setups.
- Install Requirements:
pip install -r requirements.txt
- Install Requirements with Conda:
conda env create --name data_science_project_env --file environment.yaml
conda activate data_science_project_env
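After installing, it can be useful to confirm that the environment actually contains what the requirements file names. The following is a simplified sketch using the standard library's `importlib.metadata`. The function name `missing_requirements` is hypothetical, and the parser assumes plain lines such as `name`, `name==1.0`, or `name>=1.0`; it does not check version constraints or handle URLs and extras.

```python
import re
from importlib import metadata


def missing_requirements(lines):
    """Return package names from requirements-style lines that are not installed.

    Simplified sketch: skips comments, blank lines, and pip options
    such as '-r other.txt'; version specifiers are ignored.
    """
    missing = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_requirements(open("requirements.txt"))` would return an empty list when every named package is present in the active environment.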
.gitignore specifies intentionally untracked files that Git should ignore.
Makes the project pip-installable, allowing its modules to be easily imported across different parts of the project.
- Clone the repository:
git clone [email protected]:ncdingari/data-science-project.git
cd data-science-project