jcao-ai

Organizations

@leptonai


An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Python 4,419 441 Updated Feb 8, 2025

This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data

Python 2,281 171 Updated Feb 7, 2025

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 9,170 1,194 Updated Feb 1, 2025

Real Time (WebRTC & WebTransport) Proxy for LLM WebSocket APIs

Python 23 2 Updated Jan 17, 2025

This repository is based on Mellanox/gpu_direct_rdma_access. Some errors in the code have been fixed, some methods have been optimized, and some features have been added

C 2 1 Updated Sep 5, 2024

GIL-powered* locking library for Python

Python 20 2 Updated Feb 8, 2025

BentoDiffusion: A collection of diffusion models served with BentoML

Python 348 25 Updated Dec 23, 2024

A throughput-oriented high-performance serving framework for LLMs

Cuda 724 28 Updated Sep 21, 2024

A generative speech model for daily dialogue.

Python 34,178 3,697 Updated Jan 25, 2025

This is a Shopify product scraper. The script retrieves data from the products.json file of a Shopify shop. Then, for each product, it makes an additional query to the product page to retrieve data fr…

Python 18 3 Updated Nov 24, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 697 56 Updated Sep 4, 2024
Python 4,104 544 Updated Mar 19, 2024

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Python 652 29 Updated Dec 2, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 6,168 1,057 Updated Feb 7, 2025

Fast and memory-efficient exact attention

Python 15,359 1,446 Updated Feb 8, 2025

Building a quick conversation-based search demo with Lepton AI.

TypeScript 7,971 1,018 Updated Jan 14, 2025

Serving multiple LoRA finetuned LLM as one

Python 1,022 47 Updated May 8, 2024

Mamba SSM architecture

Python 13,906 1,200 Updated Jan 18, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,350 1,091 Updated Feb 8, 2025

Hacky repo to see what the Copilot extension sends to the server

JavaScript 656 73 Updated Apr 21, 2023

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 8,174 499 Updated May 3, 2024

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,405 163 Updated Jun 25, 2024

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 5,491 481 Updated Feb 7, 2025

Train transformer language models with reinforcement learning.

Python 11,279 1,507 Updated Feb 8, 2025
Python 267 21 Updated Jan 6, 2025

Reverse engineered API of Microsoft's Bing Chat AI

Python 8,033 904 Updated Aug 3, 2023

[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333

Python 1,078 66 Updated Jan 11, 2024

An unnecessarily tiny implementation of GPT-2 in NumPy.

Python 3,305 428 Updated Apr 24, 2023

Code and documentation to train Stanford's Alpaca models, and generate the data.

Python 29,790 4,058 Updated Jul 17, 2024

Inference code for Llama models

Python 57,530 9,693 Updated Jan 26, 2025