- University of Science and Technology of China
- Hefei, China
Stars
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
🤱🏻 Turn any webpage into a desktop app with Rust. 🤱🏻 Easily build lightweight, multi-platform desktop apps with Rust.
FlashMLA: Efficient MLA Decoding Kernel for Hopper GPUs
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
MoBA: Mixture of Block Attention for Long-Context LLMs
My learning notes/codes for ML SYS.
Disaggregated serving system for Large Language Models (LLMs).
Efficient and easy multi-instance LLM serving
yinfan98 / PaddleSpeech
Forked from PaddlePaddle/PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
A modular graph-based Retrieval-Augmented Generation (RAG) system
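The globally designated official GitHub of "Run-ology" (润学, the study of emigrating): collects its purpose, program, theory, and assorted real-world examples of "running"; addresses the three questions of why to leave, where to go, and how to leave; and aims to become the core religion and core belief of the new Chinese people.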
SGLang is a fast serving framework for large language models and vision language models.
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
A low-latency & high-throughput serving engine for LLMs
A tool for bandwidth measurements on NVIDIA GPUs.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Dynamic Memory Management for Serving LLMs without PagedAttention
Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
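A good project for campus recruiting, autumn recruiting, spring recruiting, and internships! Walks you through implementing a high-performance deep learning inference library from scratch, supporting inference for models such as the large model llama2, Unet, Yolov5, and Resnet. Implement a high-performance deep learning inference library step by step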
High performance Transformer implementation in C++.
Knowledge Agents and Management in the Cloud
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…
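Xiaohongshu notes | comment crawler, Douyin videos | comment crawler, Kuaishou videos | comment crawler, Bilibili videos | comment crawler, Weibo posts | comment crawler, Baidu Tieba posts | Baidu Tieba comment-reply crawler, Zhihu Q&A articles | comment crawler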
Large Language Model (LLM) Systems Paper List
📖 A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉