
Starred repositories
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
An Open-source RL System from ByteDance Seed and Tsinghua AIR
The official Python SDK for Model Context Protocol servers and clients
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
A C++ header-only HTTP/HTTPS server and client library
An extremely fast Python package and project manager, written in Rust.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
ZeroMQ core engine in C++, implements ZMTP/3.1
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
Official electron build of draw.io
AISystem refers primarily to AI systems, covering the full low-level AI stack: AI chips, AI compilers, and AI inference and training frameworks
Mirror clone of https://gitee.com/gsls200808/chinese-opensource-mirror-site, since the README.md on that repository has been filtered.
A highly optimized LLM inference acceleration engine for Llama and its variants.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
This repository contains demos I made with the Transformers library by HuggingFace.
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali.
A blazing-fast inference solution for text embedding models
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model