Paraffin, derived from the Latin phrase parum affinis, meaning "little related", is a Python package designed to run DVC stages in parallel. While DVC does not currently support this directly, Paraffin provides an effective workaround. For more details, refer to the DVC documentation on parallel stage execution.
Warning
paraffin is still very experimental. Do not use it for production workflows.
Install Paraffin via pip:
pip install paraffin
To run up to 4 DVC stages in parallel, use:
paraffin -n 4 <stage names>
If you have dash installed (pip install dash), you can also access the dashboard by running:
paraffin --dashboard <stage names>
For more information, run:
paraffin --help
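If you want to launch Paraffin from a Python script rather than a shell, a minimal sketch could look like the following. It only uses the -n option shown above; the helper names build_paraffin_command and run_stages are hypothetical and not part of Paraffin, and the sketch assumes the paraffin executable is on your PATH and that the script runs inside a DVC repository.

```python
import subprocess
from typing import Sequence


def build_paraffin_command(stages: Sequence[str], max_workers: int = 4) -> list[str]:
    """Build the argument list for `paraffin -n <max_workers> <stage names>`."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return ["paraffin", "-n", str(max_workers), *stages]


def run_stages(stages: Sequence[str], max_workers: int = 4) -> None:
    """Invoke the paraffin CLI; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_paraffin_command(stages, max_workers), check=True)
```

For example, run_stages(["train", "evaluate"]) would execute paraffin -n 4 train evaluate, where train and evaluate are placeholder stage names.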
You can run paraffin in multiple processes (e.g. on different hardware with a shared file system). To specify where a stage should run, you can assign labels to each worker.
paraffin --labels GPU # on a GPU node
paraffin --labels CPU intel # on a CPU node
To configure which labels each stage requires, create a paraffin.yaml file as follows:
labels:
  GPU_TASK:
    - GPU
  CPU_TASK:
    - CPU
  SPECIAL_CPU_TASK:
    - CPU
    - intel
Stages that are not listed in paraffin.yaml can run on any available worker.
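To make the label idea concrete, here is a small sketch of one plausible matching rule. It assumes a worker may pick up a stage only if the worker carries every label the stage lists; this rule is an assumption for illustration and is not confirmed to be how Paraffin decides. STAGE_LABELS mirrors the paraffin.yaml example, and can_run is a hypothetical helper.

```python
# Mirrors the `labels` mapping from the paraffin.yaml example.
STAGE_LABELS = {
    "GPU_TASK": {"GPU"},
    "CPU_TASK": {"CPU"},
    "SPECIAL_CPU_TASK": {"CPU", "intel"},
}


def can_run(stage: str, worker_labels: set[str]) -> bool:
    """Assumed rule: the worker must carry every label the stage lists."""
    # Stages missing from the mapping require no labels, so any worker qualifies.
    required = STAGE_LABELS.get(stage, set())
    return required <= worker_labels
```

Under this assumed rule, a worker labelled CPU and intel could take CPU_TASK and SPECIAL_CPU_TASK but not GPU_TASK, while a worker labelled only CPU could not take SPECIAL_CPU_TASK.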
Tip
If you are building Python-based workflows with DVC, consider trying our other project ZnTrack for a more Pythonic way to define workflows.