jcao-ai

BentoDiffusion: A collection of diffusion models served with BentoML

Python · 331 stars · 25 forks · Updated Aug 26, 2024

A throughput-oriented high-performance serving framework for LLMs

Cuda · 568 stars · 23 forks · Updated Sep 21, 2024

A generative speech model for daily dialogue.

Python · 31,211 stars · 3,387 forks · Updated Sep 21, 2024

This is a Shopify products scraper. The script retrieves data from the products.json file of a Shopify shop. Then, for each product, it makes an additional query to the product page to retrieve data fr…

Python · 17 stars · Updated Apr 8, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python · 575 stars · 45 forks · Updated Sep 4, 2024

Python · 4,078 stars · 515 forks · Updated Mar 19, 2024

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Python · 561 stars · 21 forks · Updated Aug 17, 2024

CUDA Templates for Linear Algebra Subroutines

C++ · 5,457 stars · 924 forks · Updated Sep 25, 2024

Fast and memory-efficient exact attention

Python · 13,644 stars · 1,250 forks · Updated Oct 5, 2024

Building a quick conversation-based search demo with Lepton AI.

TypeScript · 7,762 stars · 988 forks · Updated Sep 18, 2024

Serving multiple LoRA-finetuned LLMs as one

Python · 964 stars · 45 forks · Updated May 8, 2024

Mamba SSM architecture

Python · 12,743 stars · 1,074 forks · Updated Sep 26, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ · 8,336 stars · 936 forks · Updated Oct 1, 2024

Hacky repo to see what the Copilot extension sends to the server

JavaScript · 624 stars · 71 forks · Updated Apr 21, 2023

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python · 7,711 stars · 454 forks · Updated May 3, 2024

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook · 2,234 stars · 153 forks · Updated Jun 25, 2024

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python · 4,344 stars · 390 forks · Updated Sep 28, 2024

Train transformer language models with reinforcement learning.

Python · 9,608 stars · 1,207 forks · Updated Oct 5, 2024

Python · 251 stars · 21 forks · Updated Nov 22, 2023

Reverse engineered API of Microsoft's Bing Chat AI

Python · 8,077 stars · 910 forks · Updated Aug 3, 2023

[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333

Python · 1,029 stars · 62 forks · Updated Jan 11, 2024

An unnecessarily tiny implementation of GPT-2 in NumPy.

Python · 3,204 stars · 412 forks · Updated Apr 24, 2023

Code and documentation to train Stanford's Alpaca models, and generate the data.

Python · 29,387 stars · 4,031 forks · Updated Jul 17, 2024

Inference code for Llama models

Python · 55,851 stars · 9,512 forks · Updated Aug 18, 2024

Smart NFC & ink-Display Card

C · 7,315 stars · 1,798 forks · Updated Jan 10, 2021

Stepper motor with multi-function interface and closed-loop function.

C · 1,240 stars · 444 forks · Updated May 12, 2024

Mechaduino hardware design files. Project logs:

Eagle · 345 stars · 124 forks · Updated May 4, 2017

STM32 bootloader example that can jump to 2 apps.

C · 251 stars · 66 forks · Updated Jul 27, 2021

Closed-loop stepper motor controller

C++ · 24 stars · 6 forks · Updated May 28, 2022

Stepper feedback controller

C++ · 423 stars · 179 forks · Updated Apr 27, 2024