Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…

Python 2 Updated Jan 15, 2025

sgl-project / sgl-learning-materials

Materials for learning SGLang

298 19 Updated Feb 26, 2025

mit-han-lab / duo-attention

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 430 26 Updated Feb 10, 2025

microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system

Python 22,962 2,278 Updated Feb 28, 2025

The-Run-Philosophy-Organization / run

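The official worldwide GitHub of Run Philosophy (润学): collects the aims, program, theory, and assorted real-world examples of "running" (emigrating); addresses the three big questions of why to run, where to run to, and how to run; and aims to become the core religion and core belief of the new Chinese people.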

31,912 2,619 Updated Jul 31, 2024

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 11,060 1,106 Updated Feb 28, 2025

spack / spack

A flexible package manager that supports multiple versions, configurations, platforms, and compilers.

Python 4,572 2,348 Updated Feb 28, 2025

microsoft / sarathi-serve

A low-latency & high-throughput serving engine for LLMs

Python 316 40 Updated Jan 31, 2025

NVIDIA / nvbandwidth

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 372 31 Updated Feb 7, 2025

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 5,719 502 Updated Feb 27, 2025

microsoft / vattention

Dynamic Memory Management for Serving LLMs without PagedAttention

C 296 23 Updated Feb 20, 2025

zhyncs / lmdeploy-build

Nightly Build for LMDeploy

PowerShell 10 Updated Jan 28, 2025

facebookresearch / generative-recommenders

Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).

Python 906 171 Updated Feb 28, 2025

zjhellofss / KuiperInfer

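A great project for campus recruiting, autumn and spring hiring seasons, and internships! Implement a high-performance deep learning inference library step by step, from scratch, with support for inference on models such as the large model llama2, Unet, Yolov5, and Resnet.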

C++ 2,738 308 Updated Oct 26, 2024

Howardzhangdqs / jumping-latex-ocr

HTML 123 3 Updated Jun 26, 2024

LLMServe / SwiftTransformer

High performance Transformer implementation in C++.

C++ 103 14 Updated Jan 18, 2025

run-llama / llama_cloud_services

Knowledge Agents and Management in the Cloud

Python 3,732 364 Updated Feb 28, 2025

HuaizhengZhang / AI-System-School

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…

2,815 321 Updated Aug 14, 2024

NanmiCoder / MediaCrawler

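Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments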

Python 20,279 5,994 Updated Feb 12, 2025

AmberLJC / LLMSys-PaperList

Large Language Model (LLM) Systems Paper List

789 30 Updated Feb 27, 2025

DefTruth / Awesome-LLM-Inference

📖 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉

3,527 242 Updated Feb 27, 2025