Stars
Implementing DeepSeek R1's GRPO algorithm from scratch (the group-relative advantage step is sketched after this list)
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Official repository for our work on micro-budget training of large-scale diffusion models.
A PyTorch native library for large-scale model training
Efficient Triton Kernels for LLM Training
An MLX port of FLUX based on the Hugging Face Diffusers implementation.
Official inference repo for FLUX.1 models
The Scott CPU from "But How Do It Know?" by J. Clark Scott
Run PyTorch LLMs locally on servers, desktop and mobile
A lightweight library for portable low-level GPU computation using WebGPU.
A simple Byte Pair Encoding (BPE) tokenizer written purely in C (the core merge loop is sketched after this list)
[ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Implementation of Diffusion Transformer (DiT) in JAX
Schedule-Free Optimization in PyTorch (the update rule is sketched after this list)
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 16+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
Perplexica is an AI-powered search engine; it is an open-source alternative to Perplexity AI.
Tile primitives for speedy kernels
A minimal GPU design in Verilog to learn how GPUs work from the ground up
A lightweight, standalone C++ inference engine for Google's Gemma models.
Distribute and run LLMs with a single file.
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
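For the two GRPO/R1-Zero entries above: a minimal Python sketch of the group-relative advantage computation that gives GRPO its name. Normalizing each reward against the other completions sampled for the same prompt removes the need for a learned value function; the linked repositories' actual implementations (policy-ratio clipping, KL penalty, batching) go well beyond this.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: normalize each completion's reward
    against the group of completions sampled for the same prompt.

    rewards: (num_prompts, group_size) scalar reward per completion.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)  # eps guards degenerate groups

# Example: one prompt with a group of 4 sampled completions.
rewards = torch.tensor([[1.0, 0.0, 0.5, 1.0]])
print(grpo_advantages(rewards))
```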
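For the BPE-in-C entry: the linked repo is written in C, but the core merge loop is easiest to show compactly in Python. A minimal sketch only; real tokenizers add a regex pre-split, special tokens, and a decoder.

```python
from collections import Counter

def bpe_train(ids: list[int], num_merges: int) -> dict[tuple[int, int], int]:
    """Minimal byte-pair encoding: repeatedly replace the most frequent
    adjacent pair of token ids with a freshly allocated token id."""
    merges = {}
    next_id = 256  # byte values 0..255 form the base vocabulary
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair = pairs.most_common(1)[0][0]
        merges[pair] = next_id
        # Rewrite the sequence, replacing every occurrence of the pair.
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
        next_id += 1
    return merges

print(bpe_train(list("aaabdaaabac".encode("utf-8")), num_merges=3))
```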
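For the schedule-free entry: a sketch of the schedule-free SGD update after Defazio et al. (2024), where gradients are taken at an interpolation y of a fast iterate z and a running average x, so no learning-rate schedule is needed. The `quadratic_grad` toy objective and the constants are illustrative assumptions; the library itself ships drop-in torch.optim-style optimizer classes rather than this hand-rolled loop.

```python
import torch

def quadratic_grad(w):             # toy objective: f(w) = 0.5 * ||w||^2
    return w

x = z = torch.ones(3)              # x: running average (deploy point), z: fast iterate
lr, beta = 0.1, 0.9
for t in range(1, 101):
    y = (1 - beta) * z + beta * x  # gradient evaluation point
    z = z - lr * quadratic_grad(y)
    c = 1.0 / (t + 1)
    x = (1 - c) * x + c * z        # running weighted average of the z iterates
print(x)                           # approaches the minimizer at 0
```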