Tsinghua University
Starred repositories
Domain-specific language designed to streamline the development of high-performance GPU/CPU kernels
Accelerated First Order Parallel Associative Scan
🚀 Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
Unified KV Cache Compression Methods for Auto-Regressive Models
Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Helpful tools and examples for working with flex-attention
A very fast and expressive template engine.
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
A web-based collaborative LaTeX editor
Python interface for MLIR - the Multi-Level Intermediate Representation
A throughput-oriented high-performance serving framework for LLMs
🔥 Top-Rated Web-Based Linux Server Management Tool. 1Panel features an intuitive web interface that seamlessly integrates server management and monitoring, container management, database administra…
A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
OneDiff: An out-of-the-box acceleration library for diffusion models.
A fast communication-overlapping library for tensor parallelism on GPUs.
The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings.
Experimental projects related to TensorRT
PyTorch native quantization and sparsity for training and inference
A family of header-only, very fast and memory-friendly hashmap and btree containers.
A lightweight and highly efficient training framework for accelerating diffusion tasks.