Stars
Tutel MoE: An Optimized Mixture-of-Experts Implementation (a generic gating sketch follows this list)
MSCCL++: A GPU-driven communication stack for scalable AI applications
A throughput-oriented high-performance serving framework for LLMs
Synchronization and asynchronous computation package for Go
[ICLR 2025] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Borgo is a statically typed language that compiles to Go.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
GLake: optimizing GPU memory management and IO transmission.
A fast inference library for running LLMs locally on modern consumer-class GPUs
Ring attention implementation with flash attention (a single-process sketch of the idea follows this list)
Implementation of MagViT2 Tokenizer in Pytorch
Chat凉宫春日 (Chat Haruhi Suzumiya): an open-source role-playing chatbot, by Cheng Li, Ziang Leng, and others.
XVERSE-13B: A multilingual large language model developed by XVERSE Technology Inc.
Development repository for the Triton language and compiler (a minimal kernel example follows this list)
Hackable and optimized Transformers building blocks, supporting a composable construction.
Implementation of a Transformer, but completely in Triton
Unsupervised text tokenizer focused on computational efficiency (a usage sketch follows this list)
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
Efficient cache for gigabytes of data, written in Go.
A library to analyze PyTorch traces.
asyncio is a C++20 library for writing concurrent code using the async/await syntax.
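
The Tutel entry above is about mixture-of-experts routing. As a point of reference, here is a generic top-2 gating sketch in plain PyTorch; it illustrates the technique only and is not Tutel's API (`top2_gate` and all tensor shapes are made up for illustration).

```python
# Generic top-2 MoE gating in plain PyTorch -- an illustrative sketch,
# NOT Tutel's API. Each token picks its two highest-scoring experts.
import torch
import torch.nn.functional as F

def top2_gate(x, w_gate):
    """x: [tokens, d_model], w_gate: [d_model, n_experts] (hypothetical shapes)."""
    logits = x @ w_gate                                 # per-token expert scores
    probs = F.softmax(logits, dim=-1)
    weights, experts = probs.topk(2, dim=-1)            # top-2 experts per token
    weights = weights / weights.sum(-1, keepdim=True)   # renormalize the pair
    return weights, experts                             # used to dispatch/combine

weights, experts = top2_gate(torch.randn(4, 16), torch.randn(16, 8))
```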
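For the ring-attention entry, the sketch below simulates the core idea in one process: K/V are split into shards, each step attends to one shard, and partial results are merged with a numerically stable running softmax. In the real algorithm the shards sit on different devices and rotate around a ring so communication overlaps compute; `ring_attention` is a hypothetical name, not the repo's API.

```python
# Single-process sketch of ring attention's streaming-softmax merge.
# Hypothetical illustration; the actual repo distributes shards across GPUs.
import torch

def ring_attention(q, kv_shards, scale):
    acc = torch.zeros_like(q)                        # running weighted sum of V
    denom = torch.zeros(q.shape[0], 1)               # running softmax denominator
    m = torch.full((q.shape[0], 1), float("-inf"))   # running max logit
    for k, v in kv_shards:                           # one "ring step" per shard
        s = (q @ k.T) * scale
        m_new = torch.maximum(m, s.max(-1, keepdim=True).values)
        correction = torch.exp(m - m_new)            # rescale earlier partials
        p = torch.exp(s - m_new)
        acc = acc * correction + p @ v
        denom = denom * correction + p.sum(-1, keepdim=True)
        m = m_new
    return acc / denom

q = torch.randn(4, 8)
shards = [(torch.randn(6, 8), torch.randn(6, 8)) for _ in range(3)]
out = ring_attention(q, shards, scale=8 ** -0.5)     # matches full attention
```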
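For the Triton entry, this is the canonical vector-add kernel from Triton's public tutorials, trimmed to the essentials; it requires the `triton` package and a CUDA-capable GPU.

```python
# Minimal Triton kernel: elementwise vector add, one program per 1024-element block.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                      # guard out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)                   # number of program instances
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```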
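The tokenizer entry reads like Google's SentencePiece; assuming that is the repo, training and encoding look roughly like this (`corpus.txt`, the vocab size, and the model prefix are placeholders).

```python
# Hedged usage sketch assuming the repo is SentencePiece (pip install sentencepiece).
import sentencepiece as spm

# Train a small BPE model on a plain-text corpus; one sentence per line.
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="m", vocab_size=8000, model_type="bpe"
)

sp = spm.SentencePieceProcessor(model_file="m.model")
print(sp.encode("Hello world", out_type=str))        # subword pieces
print(sp.encode("Hello world"))                      # integer ids
```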