TrainingFramework

16 repositories

gpt-neox
Public
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Python • Apache License 2.0 • 1k • 0 • 0 • 0 • Updated Jan 24, 2025
wandb
Public
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
Python • MIT License • 693 • 0 • 0 • 0 • Updated Dec 30, 2024
unilm
Public
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Python • MIT License • 2.6k • 0 • 0 • 0 • Updated Dec 29, 2024
torchtune
Public
PyTorch native finetuning library
Python • BSD 3-Clause "New" or "Revised" License • 506 • 0 • 0 • 0 • Updated Oct 17, 2024
maxtext
Public
A simple, performant and scalable Jax LLM!
Python • Apache License 2.0 • 311 • 0 • 0 • 0 • Updated Oct 17, 2024
DeepSpeed
Public
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Python • Apache License 2.0 • 4.2k • 0 • 0 • 0 • Updated Oct 17, 2024
Megatron-LM
Public
Ongoing research training transformer models at scale
Python • Other • 2.5k • 0 • 0 • 0 • Updated Oct 16, 2024
torchtitan
Public
A native PyTorch Library for large model training
Python • BSD 3-Clause "New" or "Revised" License • 257 • 0 • 0 • 0 • Updated Oct 16, 2024
Megatron-DeepSpeed
Public
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Python • Other • 2.5k • 0 • 0 • 0 • Updated Oct 8, 2024
horovod
Public
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Python • Other • 2.2k • 1 • 0 • 0 • Updated Aug 31, 2024
PARL
Public
A high-performance distributed training framework for Reinforcement Learning
Python • Apache License 2.0 • 815 • 0 • 0 • 0 • Updated Jul 30, 2024
BMTrain
Public
Efficient Training (including pre-training and fine-tuning) for Big Models
Python • Apache License 2.0 • 79 • 0 • 0 • 0 • Updated Jul 22, 2024
LLM-Training-Puzzles
Public
What would you do with 1000 H100s...
Jupyter Notebook • MIT License • 58 • 0 • 0 • 0 • Updated Jan 10, 2024
trlx
Public
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
Python • MIT License • 474 • 0 • 0 • 0 • Updated Jan 8, 2024
alpa
Public
Training and serving large-scale neural networks with auto parallelization.
Python • Apache License 2.0 • 361 • 0 • 0 • 0 • Updated Dec 9, 2023
mesh
Public
Mesh TensorFlow: Model Parallelism Made Easier
Python • Apache License 2.0 • 254 • 0 • 0 • 0 • Updated Nov 17, 2023