A joint project by: Sean McLeish, John Kirchenbauer, David Yu Miller, Siddharth Singh, Abhinav Bhatele, Micah Goldblum, Ashwinee Panda and Tom Goldstein.
To cite our work, please use the following BibTeX entry:
@article{mcleish2024gemstones,
title={Gemstones: A Model Suite for Multi-Faceted Scaling Laws},
author={Sean McLeish and John Kirchenbauer and David Yu Miller and Siddharth Singh and Abhinav Bhatele and Micah Goldblum and Ashwinee Panda and Tom Goldstein},
journal={arXiv preprint arXiv:2502.06857},
year={2025},
url={https://arxiv.org/abs/2502.06857},
}
We developed with Python 3.10.4. To install, run:
git clone [email protected]:mcleish7/gemstone-scaling-laws
cd gemstone-scaling-laws
pip install .
All of our training runs were completed on Frontier at Oak Ridge National Laboratory. We train in two-hour intervals over multiple nodes of AMD MI250X GPUs, logging to wandb. We extract data from wandb using wandb_data_extraction.py, where we stitch the two-hour chunks back into complete runs. However, our wandb space is currently private, so we provide the intermediate dataframe produced after we process the models; this is close to raw form apart from the runs being grouped.
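The stitching step above can be sketched in pure Python. The record fields ("run", "step", "loss") are assumptions for illustration; the actual dataframe schema produced by wandb_data_extraction.py may differ.

```python
# Sketch of stitching two-hour wandb run chunks back into complete runs:
# sort the logged records, deduplicate steps repeated across restarts,
# and return one ordered sequence per run.
def stitch_chunks(records):
    """Merge chunked logs of the same run, dedup restarts, sort by step."""
    records = sorted(records, key=lambda r: (r["run"], r["step"]))
    merged = {}
    for r in records:
        merged[(r["run"], r["step"])] = r  # later chunks win on restarts
    return sorted(merged.values(), key=lambda r: (r["run"], r["step"]))

chunks = [
    {"run": "50m", "step": 200, "loss": 3.0},
    {"run": "50m", "step": 100, "loss": 3.2},
    {"run": "50m", "step": 200, "loss": 3.0},  # duplicate across a restart
    {"run": "50m", "step": 300, "loss": 2.8},
]
runs = stitch_chunks(chunks)
```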
We provide bash commands to run all fitting code in shells/fitting.sh. Because fitting is a compute-intensive process, we also provide the outputs in JSON form in the ./parameters folders.
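Loading a fitted law back from those JSON files is straightforward. The schema below (keys "E", "A", "alpha", "B", "beta") is an assumption for illustration; check the actual files in parameters/ for the exact field names.

```python
# Sketch of reading a fitted law from the JSON outputs in parameters/.
import json
import os
import tempfile

def load_law(path):
    """Load one fitted scaling law's parameters from a JSON file."""
    with open(path) as f:
        return json.load(f)

# Round-trip demo with a made-up parameter set (not the paper's fitted values).
fake = {"E": 1.7, "A": 400.0, "alpha": 0.34, "B": 410.0, "beta": 0.28}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(fake, f)
    path = f.name
law = load_law(path)
os.remove(path)
```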
We use approach_1.py to fit approach 1 laws. This is a quick process so we also plot at the same time.
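To illustrate what an approach-1 fit involves, here is a toy version assuming the Chinchilla-style functional form loss(N, D) = E + A / N^alpha + B / D^beta. The fitting in approach_1.py is more sophisticated; this sketch only grid-searches the two exponents on synthetic data, holding the other parameters fixed.

```python
# Toy approach-1 fit: recover the exponents of an assumed power law
# loss(N, D) = E + A / N**alpha + B / D**beta by brute-force grid search.
import itertools

def predict(N, D, E, A, alpha, B, beta):
    return E + A / N**alpha + B / D**beta

# Synthetic observations generated from known (made-up) parameters.
true = dict(E=1.7, A=400.0, alpha=0.34, B=410.0, beta=0.28)
data = [(N, D, predict(N, D, **true))
        for N in (1e8, 1e9, 1e10) for D in (1e9, 1e10, 1e11)]

def fit(data):
    """Grid-search alpha and beta, holding E, A, B at their true values."""
    grid = [x / 100 for x in range(20, 50)]
    best = None
    for alpha, beta in itertools.product(grid, repeat=2):
        err = sum(
            (predict(N, D, true["E"], true["A"], alpha, true["B"], beta) - L) ** 2
            for N, D, L in data
        )
        if best is None or err < best[0]:
            best = (err, alpha, beta)
    return best[1], best[2]
```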
We use depth_width.py to fit approach 3 laws. We provide our outputs in parameters/, parameters_delta-3/ and parameters_delta-4/.
We provide bash commands to run all plotting code in plotting.sh. Due to the large compute requirements of the grid searches in many parts of this code, we provide our cache files here; please read the README there for how to use them. The cache should be placed as follows:
gemstone-scaling-laws
└── plotters
└── data_cache
We use approach_3_brute_force.py to plot the output of approach 3 width-depth laws using brute-force search.
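The brute-force idea can be sketched as follows: given a fitted width/depth law, scan a grid of (width, depth) shapes at a fixed FLOP budget and keep the shape with the lowest predicted loss. The law below and the parameter-count and FLOP approximations are illustrative assumptions, not the paper's fitted values.

```python
# Brute-force sketch: find the best (width, depth) shape under a FLOP budget.
def params(width, depth):
    # Rough transformer parameter count (attention + MLP blocks).
    return 12 * depth * width**2

def predicted_loss(width, depth, tokens):
    # Assumed separable power law in width, depth, and tokens (made-up values).
    return 1.7 + 80.0 / width**0.6 + 3.0 / depth**0.4 + 400.0 / tokens**0.3

def best_shape(flop_budget, widths, depths):
    """Exhaustively scan the shape grid, returning (loss, width, depth)."""
    best = None
    for w in widths:
        for d in depths:
            tokens = flop_budget / (6 * params(w, d))  # C ~= 6 * N * D
            loss = predicted_loss(w, d, tokens)
            if best is None or loss < best[0]:
                best = (loss, w, d)
    return best

result = best_shape(1e21, widths=range(512, 4097, 512), depths=range(4, 65, 4))
```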
- The rainbow of scaling laws is plotted in rainbow.py. This requires the correct approach 1 and approach 3 laws to have been created.
- Plotting of overtraining parabolas is done in overtrain_parabola.py. This requires the relevant part of approach_3_brute_force.py to have been run beforehand to cache outputs correctly. Caution: this is currently hard-coded to point only to the files we use in the paper.
- Overspending analysis is done in approach_1.py.
- Chinchilla Reduced Sampling is visualised in chinchilla_reduced_sampling.py.
- Analysis of delta and grid search sizes is done in slope_analysis.py.
- Plotting for feasible model shapes is done in plot_feasible_model_shapes_paper_plots.ipynb.
- Plotting for μP is done in plot_mup.py.
- Loss curves are plotted in wandb_data_plot.py.
Please feel free to contact us with any questions, or open an issue on GitHub.
We used Resolving Discrepancies in Compute-Optimal Scaling of Language Models to guide the format of this code base.
We use the Epoch AI Analyzing Chinchilla data, stored in data/.