Stars
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Real Time (WebRTC & WebTransport) Proxy for LLM WebSocket APIs
This repository is based on Mellanox/gpu_direct_rdma_access. Some errors in the code have been fixed, some methods have been optimized, and some features have been added
BentoDiffusion: A collection of diffusion models served with BentoML
A throughput-oriented high-performance serving framework for LLMs
A generative speech model for daily dialogue.
This is a Shopify product scraper. The script retrieves data from a Shopify shop's products.json file. Then, for each product, it makes an additional query to the product page to retrieve data fr…
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Fast and memory-efficient exact attention
Building a quick conversation-based search demo with Lepton AI.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
Hacky repo to see what the Copilot extension sends to the server
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Train transformer language models with reinforcement learning.
Reverse-engineered API of Microsoft's Bing Chat AI
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
An unnecessarily tiny implementation of GPT-2 in NumPy.
Code and documentation to train Stanford's Alpaca models, and generate the data.
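Of the entries above, the Shopify scraper is the one that describes a concrete flow: read the shop's public products.json catalog, then query each product's page for extra fields. A minimal sketch of that flow, assuming a standard Shopify shop with the public /products.json endpoint enabled (the example domain, the `handle` field usage, and the helper names are illustrative assumptions, not the scraper's actual code):

```python
# Sketch of the products.json scraping flow; shops may paginate results,
# cap `limit` at 250, or disable the endpoint entirely.
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def products_json_url(shop_base: str, page: int = 1, limit: int = 250) -> str:
    """Build the public catalog endpoint URL for a Shopify shop."""
    return urljoin(shop_base, f"/products.json?limit={limit}&page={page}")


def extract_handles(payload: dict) -> list[str]:
    """Pull each product's handle (its URL slug) out of a products.json payload."""
    return [p["handle"] for p in payload.get("products", [])]


def product_page_url(shop_base: str, handle: str) -> str:
    """URL of the individual product page queried for the additional fields."""
    return urljoin(shop_base, f"/products/{handle}")


def fetch_product_page_urls(shop_base: str) -> list[str]:
    """Fetch products.json (network call) and return per-product page URLs."""
    with urlopen(products_json_url(shop_base)) as resp:
        payload = json.load(resp)
    return [product_page_url(shop_base, h) for h in extract_handles(payload)]
```

The catalog fetch and the per-product queries are kept as separate helpers so the JSON parsing can be exercised offline, without hitting a live shop.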