Stars
BentoDiffusion: A collection of diffusion models served with BentoML
A throughput-oriented high-performance serving framework for LLMs
A generative speech model for daily dialogue.
A Shopify product scraper. The script retrieves data from the products.json file of a Shopify shop. Then, for each product, it makes an additional query to the product page to retrieve data fr…
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Fast and memory-efficient exact attention
Building a quick conversation-based search demo with Lepton AI.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
Hacky repo to see what the Copilot extension sends to the server
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Train transformer language models with reinforcement learning.
Reverse engineered API of Microsoft's Bing Chat AI
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
An unnecessarily tiny implementation of GPT-2 in NumPy.
Code and documentation to train Stanford's Alpaca models, and generate the data.
Stepper motor with a multi-function interface and closed-loop control.
Mechaduino hardware design files. Project logs:
STM32 bootloader example that can jump to 2 apps.