
mcleish7/gemstone-scaling-laws


Gemstones: A Model Suite for Multi-Faceted Scaling Laws

A joint project by: Sean McLeish, John Kirchenbauer, David Yu Miller, Siddharth Singh, Abhinav Bhatele, Micah Goldblum, Ashwinee Panda and Tom Goldstein.


Citing Our Work

To cite our work, please use this bibtex.

@article{mcleish2024gemstones,
    title={Gemstones: A Model Suite for Multi-Faceted Scaling Laws}, 
    author={Sean McLeish and John Kirchenbauer and David Yu Miller and Siddharth Singh and Abhinav Bhatele and Micah Goldblum and Ashwinee Panda and Tom Goldstein},
    journal={arXiv preprint arXiv:2502.06857},
    year={2025},
    url={https://arxiv.org/abs/2502.06857},
}

Getting Started

We developed with Python 3.10.4. To install, run:

git clone git@github.com:mcleish7/gemstone-scaling-laws
cd gemstone-scaling-laws
pip install .

Training Data

All of our training runs were completed on Frontier at Oak Ridge National Laboratory. We train in two-hour intervals across multiple nodes of AMD MI250X GPUs, logging to wandb. We extract the data from wandb using wandb_data_extraction.py, which stitches the two-hour chunks back into complete runs. However, our wandb space is currently private, so we instead provide the intermediate dataframe produced after we process the models; it is close to raw form, except that the runs have been grouped.
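To illustrate what "stitching" two-hour chunks involves, here is a minimal sketch that is not taken from wandb_data_extraction.py. It assumes a hypothetical per-chunk schema (`Chunk` with a run name, logged steps, and losses) and assumes that when a resumed chunk re-logs a step, the later chunk's value should win.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One two-hour training segment as logged to wandb (hypothetical schema)."""
    run_name: str
    steps: list[int]
    losses: list[float]


def stitch_runs(chunks: list[Chunk]) -> dict[str, list[tuple[int, float]]]:
    """Group chunks by run name and join them into one sorted (step, loss) series per run.

    Chunks are ordered by their first logged step. If a later chunk re-logs a step
    (e.g. after resuming from a checkpoint), its value overwrites the earlier one.
    """
    by_run: dict[str, list[Chunk]] = {}
    for chunk in chunks:
        by_run.setdefault(chunk.run_name, []).append(chunk)

    stitched: dict[str, list[tuple[int, float]]] = {}
    for name, run_chunks in by_run.items():
        run_chunks.sort(key=lambda c: c.steps[0] if c.steps else 0)
        merged: dict[int, float] = {}
        for chunk in run_chunks:
            for step, loss in zip(chunk.steps, chunk.losses):
                merged[step] = loss  # later chunks overwrite overlapping steps
        stitched[name] = sorted(merged.items())
    return stitched


# Usage with a hypothetical run name; the chunks arrive out of order.
example = [
    Chunk("gem-50M", [200, 300], [3.1, 3.0]),
    Chunk("gem-50M", [0, 100, 200], [5.0, 4.0, 3.2]),
]
print(stitch_runs(example)["gem-50M"])  # [(0, 5.0), (100, 4.0), (200, 3.1), (300, 3.0)]
```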

Fitting

We provide bash commands to run all of the necessary code in shells/fitting.sh. Because fitting is compute intensive, we also provide the outputs in JSON form in the ./parameters folders.
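As a minimal sketch, assuming each file in a parameters folder is a standalone JSON object (the real file names and keys may differ), the precomputed fits could be loaded into one dictionary. The function name `load_fitted_parameters` is hypothetical and not part of the repository.

```python
import json
from pathlib import Path
from typing import Union


def load_fitted_parameters(folder: Union[str, Path]) -> dict[str, dict]:
    """Load every *.json file in `folder` into a dict keyed by file stem.

    Assumes each file holds a single JSON object of fitted coefficients;
    non-JSON files in the folder are ignored.
    """
    params: dict[str, dict] = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open() as f:
            params[path.stem] = json.load(f)
    return params


# Usage (the folder name comes from this README; its contents are assumed):
# fits = load_fitted_parameters("parameters")
```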

Approach 1

We use approach_1.py to fit approach 1 laws. This is a quick process, so we also plot at the same time.
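Approach 1, in the sense of Hoffmann et al.'s Chinchilla paper, picks the lowest-loss model size at each compute budget and fits a power law N_opt = k * C**a to those optima. The sketch below is a simplified illustration, not the logic in approach_1.py: it assumes runs are grouped at exactly shared compute budgets instead of interpolating a loss envelope over training curves, and the function names are hypothetical.

```python
import math


def fit_power_law(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares on log(y) = exponent * log(x) + log(coef).

    Returns (coef, exponent).
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    exponent = sxy / sxx
    coef = math.exp(my - exponent * mx)
    return coef, exponent


def approach_1(points: list[tuple[float, float, float]]) -> tuple[float, float]:
    """Simplified Approach 1 on (compute, params, loss) triples.

    For each compute budget, keep the parameter count with the lowest loss,
    then fit N_opt = coef * C**exponent across budgets.
    """
    best: dict[float, tuple[float, float]] = {}
    for compute, params, loss in points:
        if compute not in best or loss < best[compute][1]:
            best[compute] = (params, loss)
    budgets = sorted(best)
    return fit_power_law(budgets, [best[c][0] for c in budgets])
```

The same fit applied to the token counts at each optimum would give the data-scaling exponent.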

Approach 3

We use depth_width.py to fit approach 3 laws. We provide our outputs in parameters/, parameters_delta-3/ and parameters_delta-4/.
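Approach 3 fits a parametric loss law directly to all runs. The width-depth parameterization used here lives in depth_width.py and is not reproduced; as an assumption-labeled illustration, the sketch below writes out the objective from the original Chinchilla Approach 3, L(N, D) = E + A/N**alpha + B/D**beta, scored with a Huber loss on log-space residuals. Hoffmann et al. used delta = 1e-3 and minimized this with L-BFGS from a grid of initializations; only the objective is shown, so any optimizer could be plugged in.

```python
import math


def log_sum_exp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def huber(r: float, delta: float) -> float:
    """Huber loss: quadratic for |r| <= delta, linear beyond."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def approach_3_objective(theta, data, delta: float = 1e-3) -> float:
    """Chinchilla-style Approach 3 objective (not the Gemstones width-depth form).

    theta = (a, b, e, alpha, beta) with A = exp(a), B = exp(b), E = exp(e).
    data holds (params N, tokens D, observed loss) triples. The predicted log-loss
    log(E + A/N**alpha + B/D**beta) is computed via log-sum-exp for stability.
    """
    a, b, e, alpha, beta = theta
    total = 0.0
    for n, d, loss in data:
        pred_log = log_sum_exp([a - alpha * math.log(n), b - beta * math.log(d), e])
        total += huber(pred_log - math.log(loss), delta)
    return total
```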

Plotting

We provide bash commands to run all of the necessary code in plotting.sh. Because the grid searches in many parts of this code have large compute requirements, we provide our cache files here; please read the README there for how to use them. The cache should be placed as follows:

gemstone-scaling-laws
└── plotters
    └── data_cache
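The cache matters because each grid search is expensive to rerun. As a hypothetical illustration of the pattern (the real cache format is documented in the README shipped with the cache files, and `cached` is not a function from this repository), a result can be keyed by a hash of its search settings and reused on later runs:

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Union


def cached(cache_dir: Union[str, Path], key: dict[str, Any], compute: Callable[[], Any]) -> Any:
    """Return the JSON result stored for `key`, computing and saving it on a miss.

    The file name is a SHA-256 hash of the key serialized with sorted keys,
    so the same search settings always map to the same cache file.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(json.dumps(key, sort_keys=True).encode()).hexdigest()
    path = cache_dir / f"{digest}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = compute()  # the expensive grid search runs only on a cache miss
    path.write_text(json.dumps(result))
    return result
```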

Approach 3

We use approach_3_brute_force.py to plot the output of approach 3 width-depth laws using brute force search.
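Brute-force search here means evaluating a fitted law at every candidate model shape and keeping the best one. The sketch below is hypothetical and not taken from approach_3_brute_force.py: it assumes the common approximations N ~ 12 * depth * width**2 non-embedding parameters and C ~ 6 * N * D training FLOPs, and uses a made-up `toy_law` in place of a real fitted width-depth law.

```python
from typing import Callable, Optional


def brute_force_optimum(
    compute: float,
    widths: list[int],
    depths: list[int],
    law: Callable[[int, int, float], float],
) -> tuple[int, int, float, float]:
    """Try every (width, depth) pair at a fixed compute budget; return the lowest predicted loss.

    The token count for each shape follows from C = 6 * N * D with
    N = 12 * depth * width**2. Returns (width, depth, tokens, loss).
    """
    best: Optional[tuple[int, int, float, float]] = None
    for w in widths:
        for d in depths:
            params = 12 * d * w**2
            tokens = compute / (6 * params)
            loss = law(w, d, tokens)
            if best is None or loss < best[3]:
                best = (w, d, tokens, loss)
    if best is None:
        raise ValueError("widths and depths must be non-empty")
    return best


def toy_law(width: int, depth: int, tokens: float) -> float:
    """Hypothetical stand-in for a fitted law, Chinchilla-shaped in N and D."""
    params = 12 * depth * width**2
    return 1.7 + 400 / params**0.34 + 1000 / tokens**0.28


w, d, t, loss = brute_force_optimum(1e20, [256, 512, 1024, 2048], [4, 8, 16, 32], toy_law)
print(w, d, t, loss)
```

Sweeping `compute` over a range of budgets and plotting the returned shapes gives a picture of how the optimal width and depth move with compute under the assumed law.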

Other plots

Contact

Please feel free to contact us with any questions, or open an issue on GitHub.

Acknowledgements

We used Resolving Discrepancies in Compute-Optimal Scaling of Language Models to guide the format of this code base. We use the data from Epoch AI's Analyzing Chinchilla, stored in data/.
