Stars
Staging ground for release notes for PyTorch
mcarilli / FlameGraph
Forked from brendangregg/FlameGraph
Stack trace visualizer
NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to fa…
World's Smallest Nintendo Wii, using a trimmed motherboard and custom stacked PCBs
Zero Bubble Pipeline Parallelism
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
You like pytorch? You like micrograd? You love tinygrad! ❤️
The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in bot…
Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimentation and parallelization, and has demonstrated industry lead…
GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Running large language models on a single GPU for throughput-oriented scenarios.
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
An open-source efficient deep learning framework/compiler, written in Python.