Xu-Chen

4 followers · 41 following

Achievements

Stars

zhihu / ZhiLight

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ 801 90 Updated Jan 20, 2025

astramind-ai / Auralis

A Fast TTS Engine

Python 407 28 Updated Jan 10, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes/codes for ML SYS.

Python 376 14 Updated Jan 20, 2025

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,172 1,072 Updated Jan 20, 2025

trypear / pearai-app

Forked from microsoft/vscode

PearAI: Open Source AI Code Editor (Fork of VSCode). The PearAI Submodule (https://github.com/trypear/pearai-submodule) is a fork of Continue.

TypeScript 400 116 Updated Jan 18, 2025

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 8,132 624 Updated Jan 20, 2025

meta-llama / llama-stack

Composable building blocks to build Llama Apps

Python 6,068 752 Updated Jan 19, 2025

lmarena / copilot-arena

Python 241 12 Updated Jan 11, 2025

Cinnamon / kotaemon

An open-source RAG-based tool for chatting with your documents.

Python 20,485 1,583 Updated Jan 20, 2025

voideditor / void

TypeScript 9,076 491 Updated Jan 20, 2025

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 34,051 5,241 Updated Jan 20, 2025

lazyFrogLOL / llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

Python 255 7 Updated Aug 6, 2024

karthik-codex / Autogen_GraphRAG_Ollama

Microsoft's GraphRAG + AutoGen + Ollama + Chainlit = Fully Local & Free Multi-Agent RAG Superbot

Python 594 116 Updated Jul 20, 2024

supermemoryai / opensearch-ai

SearchGPT / Perplexity clone, but personalised for you.

TypeScript 1,003 141 Updated Aug 5, 2024

InternLM / MindSearch

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

JavaScript 5,740 590 Updated Jan 8, 2025

aiwaves-cn / agents

An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents

Python 5,409 431 Updated Sep 26, 2024

AlibabaPAI / FLASHNN

Python 79 8 Updated Sep 9, 2024

izhuhaoran / vllm

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 3 Updated Aug 19, 2024

meta-llama / llama-stack-apps

Agentic components of the Llama Stack APIs

4,078 623 Updated Jan 20, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 7,451 720 Updated Jan 20, 2025

alexfazio / OpenPlexity-Pages

SearchGPT / Perplexity Pages clone, but personalised for you.

Python 230 26 Updated Aug 31, 2024

HanGuo97 / flute

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

C++ 219 8 Updated Jan 17, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 1,812 182 Updated Jan 20, 2025

mit-han-lab / Quest

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Cuda 235 24 Updated Nov 22, 2024

zhihu / TLLM_QMM

TLLM_QMM strips the implementation of quantized kernels out of Nvidia's TensorRT-LLM, removing the NVInfer dependency and exposing an easy-to-use PyTorch module. We modified the dequantization and weight preproc…

C++ 15 2 Updated Jul 5, 2024

facebookresearch / MobileLLM

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.

Python 1,223 69 Updated Nov 27, 2024

vllm-project / llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 864 72 Updated Jan 20, 2025

aaronik / GPTModels.nvim

GPTModels - a multi-model, window-based LLM AI plugin for neovim, with an emphasis on stability and clean code

Lua 64 2 Updated Jan 20, 2025

ModelCloud / GPTQModel

Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

Python 219 34 Updated Jan 20, 2025

HandH1998 / QQQ

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Python 94 8 Updated Dec 5, 2024