Cody Yu comaniac

MLSys, LLM Serving, Deep Learning Compiler

348 followers · 11 following

Lists (5)

Stars

deepseek-ai / EPLB

Expert Parallelism Load Balancer

Python · 1,040 stars · 151 forks · Updated Feb 27, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,700 stars · 195 forks · Updated Mar 4, 2025

NVIDIA / kvpress

LLM KV cache compression made easy

Python · 424 stars · 28 forks · Updated Mar 5, 2025

Theia-4869 / FasterVLM

Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.

Python · 52 stars · 1 fork · Updated Dec 14, 2024

mlc-ai / xgrammar

Fast, Flexible and Portable Structured Generation

C++ · 759 stars · 53 forks · Updated Mar 7, 2025

vllm-project / llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python · 1,068 stars · 94 forks · Updated Mar 7, 2025

LMCache / LMCache

10x Faster Long-Context LLM By Smart KV Cache Optimizations

Python · 568 stars · 57 forks · Updated Mar 9, 2025

xdit-project / xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python · 1,458 stars · 133 forks · Updated Mar 9, 2025

hao-ai-lab / Consistency_LLM

[ICML 2024] CLLMs: Consistency Large Language Models

Python · 383 stars · 18 forks · Updated Nov 16, 2024

BobMcDear / attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Python · 520 stars · 26 forks · Updated Feb 19, 2025

xai-org / grok-1

Grok open release

Python · 50,230 stars · 8,369 forks · Updated Aug 30, 2024

emmett-framework / granian

A Rust HTTP server for Python applications

Rust · 3,209 stars · 92 forks · Updated Mar 4, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python · 11,642 stars · 1,185 forks · Updated Mar 10, 2025

dair-ai / Prompt-Engineering-Guide

🐙 Guides, papers, lecture, notebooks and resources for prompt engineering

MDX · 53,975 stars · 5,269 forks · Updated Jan 21, 2025

ray-project / llmperf

LLMPerf is a library for validating and benchmarking LLMs

Python · 802 stars · 137 forks · Updated Dec 9, 2024

deepspeedai / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Python · 1,978 stars · 177 forks · Updated Feb 25, 2025

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ · 9,642 stars · 1,137 forks · Updated Mar 7, 2025

kemingy / vllm-env

setup the env for vllm users

Dockerfile · 16 stars · 1 fork · Updated Oct 31, 2023

Hannibal046 / Awesome-LLM

Awesome-LLM: a curated list of Large Language Model

21,962 stars · 1,794 forks · Updated Mar 4, 2025

joonspk-research / generative_agents

Generative Agents: Interactive Simulacra of Human Behavior

18,560 stars · 2,455 forks · Updated Aug 5, 2024

ModelTC / lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python · 2,994 stars · 233 forks · Updated Mar 9, 2025