Xu-Chen
A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ 801 90 Updated Jan 20, 2025

A Fast TTS Engine

Python 407 28 Updated Jan 10, 2025

My learning notes/codes for ML SYS.

Python 376 14 Updated Jan 20, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,172 1,072 Updated Jan 20, 2025

PearAI: Open Source AI Code Editor (Fork of VSCode). The PearAI Submodule (https://github.com/trypear/pearai-submodule) is a fork of Continue.

TypeScript 400 116 Updated Jan 18, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 8,132 624 Updated Jan 20, 2025

Composable building blocks to build Llama Apps

Python 6,068 752 Updated Jan 19, 2025
Python 241 12 Updated Jan 11, 2025

An open-source RAG-based tool for chatting with your documents.

Python 20,485 1,583 Updated Jan 20, 2025
TypeScript 9,076 491 Updated Jan 20, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 34,051 5,241 Updated Jan 20, 2025

A package for parsing PDFs and analyzing their content using LLMs.

Python 255 7 Updated Aug 6, 2024

Microsoft's GraphRAG + AutoGen + Ollama + Chainlit = Fully Local & Free Multi-Agent RAG Superbot

Python 594 116 Updated Jul 20, 2024

SearchGPT / Perplexity clone, but personalised for you.

TypeScript 1,003 141 Updated Aug 5, 2024

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

JavaScript 5,740 590 Updated Jan 8, 2025

An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents

Python 5,409 431 Updated Sep 26, 2024
Python 79 8 Updated Sep 9, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 3 Updated Aug 19, 2024

Agentic components of the Llama Stack APIs

4,078 623 Updated Jan 20, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 7,451 720 Updated Jan 20, 2025

SearchGPT / Perplexity Pages clone, but personalised for you.

Python 230 26 Updated Aug 31, 2024

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

C++ 219 8 Updated Jan 17, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 1,812 182 Updated Jan 20, 2025

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Cuda 235 24 Updated Nov 22, 2024

TLLM_QMM extracts the quantized kernel implementations from NVIDIA's TensorRT-LLM, removes the NVInfer dependency, and exposes an easy-to-use PyTorch module. We modified the dequantization and weight preproc…

C++ 15 2 Updated Jul 5, 2024

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.

Python 1,223 69 Updated Nov 27, 2024

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 864 72 Updated Jan 20, 2025

GPTModels - a multi-model, window-based LLM AI plugin for Neovim, with an emphasis on stability and clean code

Lua 64 2 Updated Jan 20, 2025

Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

Python 219 34 Updated Jan 20, 2025

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Python 94 8 Updated Dec 5, 2024