Stars
Accessible large language models via k-bit quantization for PyTorch.
Python packaging and dependency management made easy
SGLang is a fast serving framework for large language models and vision language models.
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥
A Gradio web UI for Large Language Models with support for multiple inference backends.
[ICLR 2025] Agent S: an open agentic framework that uses computers like a human
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
This repository offers a comprehensive collection of tutorials and implementations for Prompt Engineering techniques, ranging from fundamental concepts to advanced strategies. It serves as an essen…
Machine Learning Engineering Open Book
Knowledge Agents and Management in the Cloud
A modular graph-based Retrieval-Augmented Generation (RAG) system
A playbook for systematically maximizing the performance of deep learning models.
Fast and memory-efficient exact attention
Practical GPU Sharing Without Memory Size Constraints
A high-throughput and memory-efficient inference and serving engine for LLMs
Build high-quality LLM apps - from prototyping and testing to production deployment and monitoring.
GLM-4 series: Open Multilingual Multimodal Chat LMs
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
ChatGLM-6B: An Open Bilingual Dialogue Language Model