- San Francisco, CA
- https://comaniac.github.io
- https://orcid.org/0000-0002-9298-6254
- https://www.linkedin.com/in/cody-hao-yu
Stars
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Official code for the paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
10x Faster Long-Context LLM By Smart KV Cache Optimizations
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
[ICML 2024] CLLMs: Consistency Large Language Models
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
A Rust HTTP server for Python applications
SGLang is a fast serving framework for large language models and vision language models.
🐙 Guides, papers, lectures, notebooks, and resources for prompt engineering
LLMPerf is a library for validating and benchmarking LLMs
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
Awesome-LLM: a curated list of Large Language Models
Generative Agents: Interactive Simulacra of Human Behavior
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Large Language Model Text Generation Inference
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
StableLM: Stability AI Language Models
Development repository for the Triton language and compiler
Enabling PyTorch on XLA Devices (e.g. Google TPU)
Training and serving large-scale neural networks with auto parallelization.