(Heavily based on the leggedrobotics/plr-exercise repository)
Note
In this laboratory session, we will refine your previous ML project. For reference, consider the previous mini-projects.
- A github.com account
- A computer with GPU or a Google Account for Colaboratory
- An existing machine learning project
Before proceeding, you must define the development environment for your Python-based project. There are two main approaches:
- Containerization - Create a Dockerfile to define an image with all dependencies.
- Virtual Environment - Set up a virtual environment to isolate Python packages from the OS and other projects.
If you have access to a local machine with admin privileges, containerization (e.g., with Docker or Podman) is recommended.
If you are a standard user on the local machine, please proceed with a virtual environment:
# Create a folder for virtual environments
mkdir ~/venv
# Create the virtual environment
python3 -m venv ~/venv/mldevops
# Test the virtual environment
source ~/venv/mldevops/bin/activate
which python
# Create an alias for easier sourcing (edit the ~/.bashrc file)
nano ~/.bashrc
# Add the following line at the end of the file and save it
alias venv_mldevops="source ~/venv/mldevops/bin/activate"
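As a complement to `which python`, the check can also be done from inside Python. The following is a minimal sketch that assumes the environment was created with `python3 -m venv`; the helper name `in_virtualenv` is hypothetical and chosen only for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

After sourcing the activate script, the first line should report `True` and the prefix should point into `~/venv/mldevops`.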
In cases where the local machine is not prepared for project development and installing new software is not possible, it can be used as an SSH terminal to access a remote development environment.
Alternatively, you can use a web browser to access remote IDEs such as a JupyterLab instance or Google Colaboratory. Note, however, that working exclusively in Jupyter notebooks is generally not recommended: hidden state from out-of-order cell execution makes results hard to reproduce, and notebooks are difficult to diff and review in Git.
The following exercises can be completed using only Google Colaboratory, though this should be viewed as a temporary solution rather than a best practice.
You will need to interact with git using commands like the following:
!git config --global user.email "you@example.com"
!git config --global user.name "student"
!git clone https://github.com/vision-agh/mldevops_exercise.git
!cd mldevops_exercise && git status
!cd mldevops_exercise && git add filename.txt
!cd mldevops_exercise && git commit -m "message"
!cd mldevops_exercise && git push
To begin, use Git for version control on your project:
- Create a fork of this project on GitHub.
- Clone the repository via SSH:
mkdir ~/ws
cd ~/ws
git clone git@github.com:vision-agh/mldevops_exercise.git
(Replace vision-agh with your GitHub username.)
Important
Cloning via HTTPS allows only pulling the code unless you configure a personal access token. For both pulling and pushing, use the SSH protocol. Instructions for setting up SSH keys and adding them to your GitHub account are available here.
- Copy your previous ML project files (.py and .ipynb) to the root of the cloned repository.
cd ~/ws/mldevops_exercise
# Copy here your project
- Commit and push your changes to the origin (i.e. GitHub) repository.
# We use the dot to add all files. Note that this is not typical practice.
git add .
git commit -m "initial project commit"
git push
- Enable the Issues feature in your GitHub repository:
  - Open https://github.com/GITHUB_USERNAME/mldevops_exercise.
  - Go to Settings on the right.
  - Enable the Issues feature (the other features can be disabled).
- Secure your default branch (main/master) from accidental commits:
  - Open https://github.com/GITHUB_USERNAME/mldevops_exercise.
  - Go to Settings on the right.
  - Select Branches on the left menu.
  - Click on the Add branch ruleset button.
  - Name the ruleset "default", set the enforcement status to "enabled", and configure the Targets with the Add target dropdown by selecting the "Include default branch" option.
  - For the options, enable the following: Restrict deletions, Require a pull request before merging, and Block force pushes.
  - Finish by clicking the Create button.
After completing the setup, you are ready to proceed with the exercises.
- For each task, create a branch named feature/task_X.
- Commit all changes (and only those changes) related to the specific task to its branch, then push them to GitHub.
- To complete a task, create a pull request (PR) from feature/task_X to main. Set the PR title to the task description (see below).
- Do not delete the branches after merging the PR.
Tasks:
- Task 1: Improve formatting using black.
- Task 2: Set up Pre-commit to automate formatting.
- Task 3: Create a Python package for your project.
- Task 4: Add an online logging framework.
- Task 5: Use Optuna to perform hyperparameter search.
- Task 6: Add docstrings and type annotations to every Python file.
Your first task is to install black and format the code. For more information, visit: https://github.com/psf/black.
pip3 install black
black --line-length 120 ~/ws/mldevops_exercise
Now everything should look well-formatted.
While it's possible to run black manually, remembering to do so before every commit is unreliable. Fortunately, automation with Pre-commit makes this easier.
Begin by following the official quick-setup guide.
pip3 install pre-commit
pre-commit --version
In the repository, there is an already-prepared .pre-commit-config.yaml file. Inspect it; this file contains the necessary configuration for Pre-commit.
Next, register the pre-commit command as a Git hook, which will automatically run each time you use git commit:
# Register the hook
pre-commit install
# Run the pre-commit on all files
pre-commit run --all-files
You may notice many changes, particularly regarding whitespace in your code. It should now appear much cleaner (at least from git's perspective).
To further extend automation, add tools like black (for formatting), codespell (to fix typos), and pyupgrade (to update syntax to Python 3.10) to the .pre-commit-config.yaml file:
  # Black formatter
  - repo: https://github.com/psf/black
    rev: 24.4.0
    hooks:
      - id: black
        args: ["--line-length=120"]
  # Codespell - Fix common misspellings in text files.
  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.6
    hooks:
      - id: codespell
        args: [--write-changes]
  # Pyupgrade - automatically upgrade syntax for newer versions of the language.
  - repo: https://github.com/asottile/pyupgrade
    rev: v3.15.2
    hooks:
      - id: pyupgrade
        args: [--py310-plus]
There are two main systems for dependency management and package building in Python: setuptools and poetry. As a rule of thumb, if your project requires complex builds (e.g., with Python bindings for dynamic C/C++ libraries), setuptools is a suitable choice. However, for many modern Python projects, poetry offers simpler configuration, making both dependency management (which can simplify Dockerfiles) and package building easier.
Explore the poetry Introduction and Basic Usage to set up dependency management and enable package building.
python3 -m pip install --user pipx
python3 -m pipx ensurepath
pipx install poetry
Remember to update the .gitignore file to exclude files that should not be tracked by Git (for example, build artifacts such as dist/ and __pycache__/).
Add the Weights & Biases (wandb) logger to track and visualize your experiments.
pip3 install wandb
- Follow the official wandb guide.
- Log training_loss, validation_loss, and your code as an artifact.
- Capture a screenshot of a run showing the loss curve and the uploaded artifact.
- Commit this screenshot to the repository.
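The logging steps above might look roughly like the sketch below. It is not the official guide's code: the project name `mldevops_exercise`, the artifact name `source-code`, the helper `log_losses`, and the loss values are all placeholders assumed for illustration, and the script expects that `wandb login` has already been run.

```python
from typing import Any, Iterable


def log_losses(run: Any, train_losses: Iterable[float], val_losses: Iterable[float]) -> int:
    """Log one (training_loss, validation_loss) pair per epoch and return the epoch count."""
    epochs = 0
    for epoch, (train, val) in enumerate(zip(train_losses, val_losses)):
        # Each call creates one step on the loss charts in the W&B dashboard.
        run.log({"epoch": epoch, "training_loss": train, "validation_loss": val})
        epochs += 1
    return epochs


def main() -> None:
    # Imported here so that log_losses stays usable without wandb installed.
    import wandb

    run = wandb.init(project="mldevops_exercise")  # hypothetical project name

    # Placeholder values; in your project these come from the training loop.
    log_losses(run, [0.9, 0.6, 0.4], [1.0, 0.7, 0.5])

    # Upload this script as a code artifact (use add_dir for a whole folder).
    artifact = wandb.Artifact(name="source-code", type="code")  # hypothetical name
    artifact.add_file(__file__)
    run.log_artifact(artifact)

    run.finish()
```

Calling `main()` from your training script should produce a run whose page shows both loss curves and the artifact, which is what the screenshot needs to capture.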
Use optuna to find the best hyperparameters (e.g., learning rate or epochs).
pip3 install optuna
Refer to the official examples and conduct a hyperparameter search. Here is a small example:
import optuna

def objective(trial):
    x = trial.suggest_float('x', -10, 10)
    return (x - 2) ** 2

study = optuna.create_study()
study.optimize(objective, n_trials=100)
study.best_params  # E.g. {'x': 2.002108042}
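To move from the toy example toward the hyperparameters mentioned above, a search over learning rate and epochs could be sketched as follows. The function `validation_loss` is a hypothetical stand-in for your own training and evaluation code, and the search ranges are assumptions rather than recommendations.

```python
import math


def validation_loss(lr: float, epochs: int) -> float:
    """Hypothetical stand-in: replace with code that trains and evaluates your model."""
    return (math.log10(lr) + 3.0) ** 2 + 1.0 / epochs


def objective(trial) -> float:
    # Learning rates are usually sampled on a log scale.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    epochs = trial.suggest_int("epochs", 1, 20)
    return validation_loss(lr, epochs)


def run_search(n_trials: int = 50) -> dict:
    # Imported here so the functions above can be inspected without optuna installed.
    import optuna

    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params
```

Calling `run_search()` returns a dictionary with the keys `lr` and `epochs`. Because each trial trains a model in a real project, keep `n_trials` small at first.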
- Add docstrings to all public or non-trivial classes and functions. Refer to PEP 257 (Docstring Conventions) for guidance: https://peps.python.org/pep-0257/
- Add type annotations to every function and method in your project. This article provides a good introduction: https://realpython.com/python-type-checking/#hello-types
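As an illustration of both points, here is a small annotated function with a docstring. The function `mean_loss` is a hypothetical example, and the Args/Returns/Raises layout is one common convention rather than something PEP 257 mandates.

```python
from collections.abc import Sequence


def mean_loss(losses: Sequence[float]) -> float:
    """Return the arithmetic mean of per-batch losses.

    Args:
        losses: Loss values collected over one epoch; must not be empty.

    Returns:
        The mean of ``losses``.

    Raises:
        ValueError: If ``losses`` is empty.
    """
    if not losses:
        raise ValueError("losses must not be empty")
    return sum(losses) / len(losses)


print(mean_loss([0.5, 0.25, 0.75]))  # 0.5
print(mean_loss.__annotations__)     # the annotations are available at runtime
```

Annotations are not enforced at runtime; a static checker such as mypy can be run separately to verify them.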
- Linting
- Automatic testing
- Automation (GitHub Actions)
- CI/CD (Continuous Integration/Continuous Delivery)
- github.com/leggedrobotics/plr-exercise by @JonasFrey96