Template to give structure to new projects from the start

MiqG/project_template


Data Science Project Template

This is a personal way of structuring projects, built through observation and experience, that has helped me plan and scale up projects without getting lost.

.
├── config.yml
├── data
├── envs
├── LICENSE
├── README.md
├── reports
├── results
├── src
├── start_project.sh
└── workflows

Rationale

  • config.yml: hand-curated list of external files and parameters required for the project.
  • data: keep raw and preprocessed data organized.
  • envs: conda environment files to run the main project; add more environments if necessary.
  • LICENSE: the project's license.
  • reports: discuss your insights with a project webpage created with jupyter-book.
  • results: store files and final plots for every experiment.
  • src: project modules.
  • workflows: pipelines to download, preprocess, and analyze your data.
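To illustrate keeping project-wide code in src/, the sketch below shows one way a module such as src/python/your_project_name/config.py could centralize the folder layout so that workflows never hard-code paths. The template's own config.py is not shown here, so this is an assumption: the PROJECT_ROOT environment variable and the experiment_dirs helper are hypothetical names.

```python
"""Hypothetical sketch of a project-wide paths module (e.g. your_project_name/config.py)."""
import os
from pathlib import Path

# Assumption: the project root comes from a PROJECT_ROOT environment variable,
# falling back to the current working directory.
PROJECT_ROOT = Path(os.environ.get("PROJECT_ROOT", Path.cwd())).resolve()

# Top-level folders described in the Rationale section.
DATA_DIR = PROJECT_ROOT / "data"
RAW_DIR = DATA_DIR / "raw"
PREP_DIR = DATA_DIR / "prep"
REFERENCES_DIR = DATA_DIR / "references"
RESULTS_DIR = PROJECT_ROOT / "results"
REPORTS_DIR = PROJECT_ROOT / "reports"


def experiment_dirs(experiment: str, root: Path = PROJECT_ROOT) -> dict:
    """Create (if missing) and return the files/ and plots/ folders of one experiment."""
    base = Path(root) / "results" / experiment
    dirs = {"files": base / "files", "plots": base / "plots"}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

A workflow script could then write to experiment_dirs("new_experiment")["files"] / "output_example.tsv", mirroring the results/new_experiment/files/ layout of the template.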

Typical workflow

  1. Modify config.yml to your taste, adding variables that could be useful project-wide.
  2. Create the workflows to download and preprocess your project's data in workflows/download/ and workflows/preprocess/, respectively. Distinguish between code that can be used project-wide (place it in the project's modules in src/ and call its functions from your workflow) and code that is only used in that specific part of the project (place it in the workflow's scripts/ subdirectory).
  3. Analyse your data by creating experiments as subdirectories of workflows/analyses/; each experiment takes its inputs from data/ and writes its outputs to results/your_experiment_name/.
  4. Commit your work, and consider adding README files.
  5. Inspect and explore the results by creating Jupyter notebooks in reports/notebooks/, which can be rendered into static webpages with jupyter-book. Structure your project's book by modifying reports/_toc.yml.
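Steps 2 and 3 can be made concrete with a minimal sketch of a workflow-specific script such as workflows/analyses/new_experiment/scripts/workflow_step.py. It assumes a tab-separated input with a header row and a numeric column named value (both hypothetical), and takes its input and output paths as command-line arguments so that a workflow manager such as snakemake can supply them.

```python
"""Hypothetical workflow step: summarise one numeric column of a TSV file."""
import argparse
import csv
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--input_file", required=True, help="e.g. data/prep/table.tsv")
    parser.add_argument("--output_file", required=True,
                        help="e.g. results/new_experiment/files/summary.tsv")
    parser.add_argument("--column", default="value", help="numeric column to summarise")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Read the input table; each row becomes a dict keyed by the header names.
    with open(args.input_file, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        values = [float(row[args.column]) for row in reader]

    # Create results/<experiment>/files/ if needed, then write a one-row summary.
    output = Path(args.output_file)
    output.parent.mkdir(parents=True, exist_ok=True)
    with open(output, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["n", "mean"])
        writer.writerow([len(values), sum(values) / len(values) if values else "NA"])


if __name__ == "__main__":
    main()
```

Passing paths as arguments, rather than hard-coding them, keeps the script reusable: a snakefile rule could, for instance, call it from a shell directive with its {input} and {output} placeholders.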

Requirements (for this use case)

  • an environment manager: e.g. conda
  • a workflow manager: e.g. snakemake
  • (optional) a webpage builder: e.g. jupyter-book

Installation

# clone repository
git clone https://github.com/MiqG/project_template
cd project_template

# remove the git remote
bash start_project.sh

# remove start_project.sh
rm start_project.sh

Structure

.
├── config.yml
├── data
│   ├── prep
│   ├── raw
│   └── references
├── envs
│   └── main.yml
├── LICENSE
├── README.md
├── reports
│   ├── _config.yml
│   ├── images
│   │   └── logo.png
│   ├── notebooks
│   │   ├── example_notebook.md
│   │   ├── intro.md
│   │   └── README.md
│   ├── README.md
│   └── _toc.yml
├── results
│   ├── new_experiment
│   │   ├── files
│   │   │   └── output_example.tsv
│   │   └── plots
│   │       └── output_example.pdf
│   └── README.md
├── src
│   └── python
│       ├── setup.py
│       └── your_project_name
│           └── config.py
├── start_project.sh
└── workflows
    ├── analyses
    │   └── new_experiment
    │       ├── README.md
    │       ├── run_all.sh
    │       ├── scripts
    │       │   └── workflow_step.py
    │       └── snakefile
    ├── download
    │   ├── README.md
    │   ├── run_all.sh
    │   ├── scripts
    │   │   └── workflow_step.py
    │   └── snakefile
    ├── preprocess
    │   ├── README.md
    │   ├── run_all.sh
    │   ├── scripts
    │   │   └── workflow_step.py
    │   └── snakefile
    └── README.md
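To reproduce the same folder skeleton in an existing directory without cloning, a minimal pathlib sketch could look like the following. It is hypothetical (create_skeleton is not part of the template), creates folders only, and leaves files such as config.yml, the snakefiles, and the READMEs to you.

```python
from pathlib import Path

# Folders from the structure tree above; files are intentionally left out.
SKELETON = [
    "data/prep",
    "data/raw",
    "data/references",
    "envs",
    "reports/images",
    "reports/notebooks",
    "results",
    "src/python/your_project_name",
    "workflows/analyses",
    "workflows/download/scripts",
    "workflows/preprocess/scripts",
]


def create_skeleton(root: str = ".") -> list:
    """Create every SKELETON folder under root; return the entries that did not exist before."""
    created = []
    for relative in SKELETON:
        path = Path(root) / relative
        if not path.exists():
            # parents=True also creates intermediate folders such as data/ or workflows/.
            path.mkdir(parents=True)
            created.append(path)
    return created
```

Running it twice is safe: the second call finds every folder already present and returns an empty list.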

Have fun!
