This repo contains reference examples for training ML models quickly and to high accuracy. It's designed to be easily forked and modified.
It currently features examples for ResNet50, DeepLabV3, GPT-style language models, and BERT.
To get started, either clone or fork this repo and install whichever example(s) you're interested in. For instance, to get started training GPT-style language models, run:
git clone https://github.com/mosaicml/examples.git
cd examples
pip install -e ".[llm]" # or pip install -e ".[llm-cpu]" if no NVIDIA GPU
cd examples/llm
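If you script the install step, the choice between the two extras above can be automated. The sketch below is a minimal, hypothetical helper: `pick_llm_extra` is not part of this repo, and treating a working `nvidia-smi` as proof of an NVIDIA GPU is an assumption, not something the repo documents.

```shell
# Hypothetical helper (not part of this repo): choose between the "llm" and
# "llm-cpu" extras named in the install command above.
# Assumption: a working nvidia-smi means an NVIDIA GPU is available.
pick_llm_extra() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo "llm"
    else
        echo "llm-cpu"
    fi
}

# Usage, from the repo root:
#   pip install -e ".[$(pick_llm_extra)]"
```

The function only prints the extra's name, so you can inspect its choice before installing anything.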
If you already have the dependencies for a given example installed, you can lint and test it by running:
pre-commit run --all-files # autoformatting
pyright . # type checking
pytest tests/ # run tests
from the example's directory.
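The three checks above can be wrapped into one function that runs them in order from an example's directory and stops at the first failure. This is only a sketch: `check_example` and the `DRY_RUN` switch are hypothetical names introduced here, and it assumes `pre-commit`, `pyright`, and `pytest` are already on your PATH.

```shell
# Hypothetical helper (not part of this repo): run the three per-example
# checks inside the given example directory, stopping at the first failure.
# Set DRY_RUN=1 to print the commands instead of executing them.
check_example() {
    local example_dir="$1"
    local cmd
    for cmd in "pre-commit run --all-files" "pyright ." "pytest tests/"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "(cd $example_dir && $cmd)"
        else
            # $cmd is left unquoted on purpose so it splits into words.
            (cd "$example_dir" && $cmd) || return 1
        fi
    done
}

# Usage, from the repo root:
#   check_example examples/llm
```

Each command runs in a subshell, so your current working directory is left unchanged.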
To run the full suite of tests for all examples, invoke make test in the project's root directory. Similarly, invoke make lint to autoformat your code and detect type issues throughout the whole codebase. Both commands are much slower than linting or testing just one example because they install all the dependencies for each example from scratch in a fresh virtual environment.
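To make the cost of the full-suite run concrete, here is a rough sketch of the per-example work described above: create a fresh virtual environment, install that example's dependencies, then run its tests. It does not reproduce the repo's Makefile. The function names, the `DRY_RUN` switch, and the venv location are hypothetical, and it assumes each extras name matches a directory under `examples/` and that the extras pull in `pytest`.

```shell
# Sketch only: this does not reproduce the repo's actual Makefile targets.
# run_step prints the command when DRY_RUN=1, otherwise executes it.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: test one example in a fresh virtual environment.
# Assumptions: the extras name (e.g. "llm") matches examples/<name>, and
# installing the extras also installs pytest.
test_example_in_fresh_venv() {
    local name="$1"
    local venv_dir="${TMPDIR:-/tmp}/venv-$name"   # hypothetical location
    run_step python3 -m venv --clear "$venv_dir" || return 1
    run_step "$venv_dir/bin/pip" install -e ".[$name]" || return 1
    run_step "$venv_dir/bin/python" -m pytest "examples/$name/tests/" || return 1
}

# Usage, from the repo root:
#   DRY_RUN=1 test_example_in_fresh_venv llm   # preview the steps
#   test_example_in_fresh_venv llm             # actually run them
```

Repeating the environment creation and install for every example is what makes the full run so much slower than checking a single example whose dependencies are already installed.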
This repo features the following examples, each in its own subdirectory:
Figure 1: Comparison of MosaicML recipes against other results, all measured on 8x A100s on MosaicML Cloud.
Train the MosaicML ResNet, the fastest ResNet50 implementation, which yields a ✨ 7x ✨ faster time-to-train compared to a strong baseline. See our blog for more details and recipes. Our recipes were also demonstrated at MLPerf, a cross-industry ML benchmark.
🚀 Get started with the code here.
Train the MosaicML DeepLabV3 that yields a ✨5x✨ faster time-to-train compared to a strong baseline. See our blog for more details and recipes.
🚀 Get started with the code here.
A simple yet feature-complete implementation of GPT that scales to 70B parameters while maintaining high performance on GPU clusters. The code is flexible, written in vanilla PyTorch, and uses PyTorch FSDP and some recent efficiency improvements.
🚀 Get started with the code here.
This benchmark covers both pre-training and fine-tuning a BERT model. With this starter code, you'll be able to do Masked Language Modeling (MLM) pre-training on the C4 dataset and classification fine-tuning on GLUE benchmark tasks.
We also provide the source code and recipe behind our Mosaic BERT model, which you can train yourself using this repo.
🚀 Get started with the code here.