This repository accompanies the blog post Train a Rust 1.5B Coder LLM with Reinforcement Learning (GRPO). It is organized as a set of Marimo notebooks to make the experiments reproducible and easy to run.
The goal of this repository is to fine-tune a small language model (1.5B parameters) with reinforcement learning to become better at Rust programming. The idea is to use the `cargo` build tool as feedback for the model: it runs `cargo build`, `cargo clippy`, and `cargo test` on LLM-generated Rust programs to improve the model over time. The first experiment we ran took `Qwen2.5-Coder-1.5B-Instruct` as a starting point and fine-tuned it with GRPO.
All of the data and results were saved to the ox/Rust repository on Oxen.ai. After a single epoch, the model's ability to write code that compiles improved by 20%, and its ability to write code that passes unit tests improved by 15%.
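To make the cargo-as-feedback idea concrete, here is a minimal sketch of how tool outcomes could be turned into a scalar reward. The weights and helper names (`REWARD_WEIGHTS`, `run_cargo`, `cargo_reward`) are illustrative assumptions, not the actual reward shaping used in the training notebook.

```python
import subprocess

# Hypothetical weights for combining tool results into one reward;
# the real notebook's reward functions may be shaped differently.
REWARD_WEIGHTS = {"build": 1.0, "clippy": 0.5, "test": 2.0}

def run_cargo(subcommand: str, project_dir: str) -> bool:
    """Run a cargo subcommand in a scratch crate; True if it exits cleanly."""
    proc = subprocess.run(
        ["cargo", subcommand],
        cwd=project_dir,
        capture_output=True,
        text=True,
        timeout=120,
    )
    return proc.returncode == 0

def cargo_reward(build_ok: bool, clippy_ok: bool, tests_ok: bool) -> float:
    """Combine cargo tool outcomes into a single scalar reward."""
    results = {"build": build_ok, "clippy": clippy_ok, "test": tests_ok}
    return sum(weight for tool, weight in REWARD_WEIGHTS.items() if results[tool])
```

In practice each boolean would come from calling `run_cargo` on a scratch crate containing the generated program, and the scalar would feed into the GRPO advantage computation.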
There are 4 main notebooks used to train, evaluate, and monitor the results.
- `train.py` - Train the model using GRPO
- `viz.py` - Visualize the reward functions
- `inference.py` - Run inference on a model checkpoint
- `eval.py` - Run the results of inference through the Rust toolchain to compute accuracy
You can use Marimo to run the notebooks:

```bash
pip install marimo
marimo edit train.py
```
The `train.py` notebook lets you specify a base model, an Oxen.ai repository to pull data from, and an Oxen.ai repository to write results back to. If you want to run it, simply create an account here, create a repo, and set up your API key locally.
This will write the logs to a branch within Oxen.ai as jsonl files so that we can inspect the generated data and plot the rewards later.
For example, here are the cargo build reward logs.
The logs from the training run can then be easily visualized with the `viz.py` notebook, which pulls the results from the training script's output repository given a branch name.
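As a rough sketch of what the visualization step does, the jsonl logs can be aggregated into a per-step average reward before plotting. The field names `"step"` and `"reward"` are assumptions about the log schema; the actual columns written by the training notebook may differ.

```python
import json

def mean_reward_per_step(jsonl_lines):
    """Average the reward at each training step across log records.

    Assumes each jsonl record carries at least a "step" and a
    "reward" field (an assumed schema, not the confirmed one).
    """
    totals, counts = {}, {}
    for line in jsonl_lines:
        record = json.loads(line)
        step = record["step"]
        totals[step] = totals.get(step, 0.0) + record["reward"]
        counts[step] = counts.get(step, 0) + 1
    return {step: totals[step] / counts[step] for step in totals}
```

The resulting dict maps training steps to mean rewards and can be handed to any plotting library to draw the reward curves.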
Once you have a trained model, you can run inference on it with the `inference.py` notebook.
Finally, once you have an output file from the inference script, you can evaluate it with the `eval.py` notebook. This produces a graph similar to the ones at the top, with the % passed for each cargo tool.
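The accuracy numbers behind that graph amount to a pass rate per cargo tool. Here is a minimal sketch of that aggregation, assuming one result dict per generated program with boolean `"build"`, `"clippy"`, and `"test"` entries (an assumed schema, not the notebook's actual one).

```python
def pass_rates(results):
    """Fraction of generated programs that pass each cargo tool.

    `results` is a list of dicts with boolean "build", "clippy",
    and "test" entries, one per generated program.
    """
    tools = ("build", "clippy", "test")
    if not results:
        return {tool: 0.0 for tool in tools}
    n = len(results)
    return {
        tool: sum(1 for r in results if r[tool]) / n
        for tool in tools
    }
```

Each fraction corresponds to one bar in the "% passed" chart.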
If you are curious about the GPU requirements, feel free to read the following blog post, where we ran different experiments. There are also libraries such as Unsloth that reduce the VRAM requirements, but we have not tried them yet.
Blog: GRPO VRAM Requirements For the GPU Poor
This project is powered by Oxen.ai 🐂 🌾
Oxen.ai provides open-source tools to track, iterate on, collaborate on, and discover large datasets in any format. The Oxen.ai toolchain includes a lightning-fast data version control tool, data visualization, notebooks, and the ability to run models on your data. You can learn more at https://oxen.ai.