Organizations

@UCLA-VAST

Expert Parallelism Load Balancer

Python 1,040 151 Updated Feb 27, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,700 195 Updated Mar 4, 2025

LLM KV cache compression made easy

Python 424 28 Updated Mar 5, 2025

Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.

Python 52 1 Updated Dec 14, 2024

Fast, Flexible and Portable Structured Generation

C++ 759 53 Updated Mar 7, 2025

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 1,068 94 Updated Mar 7, 2025

10x Faster Long-Context LLM By Smart KV Cache Optimizations

Python 568 57 Updated Mar 9, 2025

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 1,458 133 Updated Mar 9, 2025

[ICML 2024] CLLMs: Consistency Large Language Models

Python 383 18 Updated Nov 16, 2024

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Python 520 26 Updated Feb 19, 2025

Grok open release

Python 50,230 8,369 Updated Aug 30, 2024

A Rust HTTP server for Python applications

Rust 3,209 92 Updated Mar 4, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 11,642 1,185 Updated Mar 10, 2025

🐙 Guides, papers, lecture, notebooks and resources for prompt engineering

MDX 53,975 5,269 Updated Jan 21, 2025

LLMPerf is a library for validating and benchmarking LLMs

Python 802 137 Updated Dec 9, 2024

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Python 1,978 177 Updated Feb 25, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,642 1,137 Updated Mar 7, 2025

Set up the environment for vLLM users

Dockerfile 16 1 Updated Oct 31, 2023

Awesome-LLM: a curated list of Large Language Model

21,962 1,794 Updated Mar 4, 2025

Generative Agents: Interactive Simulacra of Human Behavior

18,560 2,455 Updated Aug 5, 2024

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,994 233 Updated Mar 9, 2025

Large Language Model Text Generation Inference

Python 9,861 1,162 Updated Mar 7, 2025

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.

Python 7,443 529 Updated Sep 18, 2024

StableLM: Stability AI Language Models

Jupyter Notebook 15,836 1,033 Updated Apr 8, 2024

A schedule language for large model training

Python 145 16 Updated Jun 18, 2024

Development repository for the Triton language and compiler

MLIR 14,782 1,849 Updated Mar 10, 2025

Enabling PyTorch on XLA Devices (e.g. Google TPU)

Python 2,543 505 Updated Mar 10, 2025
C++ 141 21 Updated Jan 30, 2025
C++ 23 12 Updated Nov 25, 2024

Training and serving large-scale neural networks with auto parallelization.

Python 3,113 361 Updated Dec 9, 2023