tanzelin430
  • University of Science and Technology of China
  • Hefei, China

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,153 163 Updated Feb 28, 2025

🤱🏻 Turn any webpage into a desktop app with Rust. 🤱🏻 Easily build lightweight cross-platform desktop apps with Rust.

Rust 35,366 6,355 Updated Feb 23, 2025

FlashMLA: Efficient MLA Decoding Kernel for Hopper GPUs

C++ 10,710 695 Updated Feb 27, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 11,923 774 Updated Feb 28, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

5,537 103 Updated Feb 28, 2025

MoBA: Mixture of Block Attention for Long-Context LLMs

Python 1,556 81 Updated Feb 22, 2025

My learning notes/codes for ML SYS.

Python 1,146 54 Updated Feb 27, 2025

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 473 51 Updated Aug 19, 2024

Efficient and easy multi-instance LLM serving

Python 305 24 Updated Feb 28, 2025

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…

Python 2 Updated Jan 15, 2025

Materials for learning SGLang

298 19 Updated Feb 26, 2025

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 430 26 Updated Feb 10, 2025

A modular graph-based Retrieval-Augmented Generation (RAG) system

Python 22,962 2,278 Updated Feb 28, 2025

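The official worldwide GitHub of Runxue (润学, the study of "running", i.e. emigrating): collects Runxue's purpose, program, theory, and assorted real-world cases of running; addresses three questions: why to run, where to run, and how to run; and aims to become the core religion and core belief of the new Chinese people.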

31,912 2,619 Updated Jul 31, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 11,060 1,106 Updated Feb 28, 2025

A flexible package manager that supports multiple versions, configurations, platforms, and compilers.

Python 4,572 2,348 Updated Feb 28, 2025

A low-latency & high-throughput serving engine for LLMs

Python 316 40 Updated Jan 31, 2025

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 372 31 Updated Feb 7, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 5,719 502 Updated Feb 27, 2025

Dynamic Memory Management for Serving LLMs without PagedAttention

C 296 23 Updated Feb 20, 2025

Nightly Build for LMDeploy

PowerShell 10 Updated Jan 28, 2025

Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).

Python 906 171 Updated Feb 28, 2025

A good project for campus, fall, and spring recruiting and for internships! Implement a high-performance deep learning inference library from scratch, step by step, with inference support for models such as llama2, Unet, Yolov5, and Resnet.

C++ 2,738 308 Updated Oct 26, 2024

High performance Transformer implementation in C++.

C++ 103 14 Updated Jan 18, 2025

Knowledge Agents and Management in the Cloud

Python 3,732 364 Updated Feb 28, 2025

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…

2,815 321 Updated Aug 14, 2024

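Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments.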

Python 20,279 5,994 Updated Feb 12, 2025

Large Language Model (LLM) Systems Paper List

789 30 Updated Feb 27, 2025

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉

3,527 242 Updated Feb 27, 2025