Stars
Applied AI experiments and examples for PyTorch
A throughput-oriented high-performance serving framework for LLMs
GitHub mirror of the triton-lang/triton repo.
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features, and more.
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
SGLang is a fast serving framework for large language models and vision language models.
Master programming by recreating your favorite technologies from scratch.
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x over FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Many container image registries, such as gcr, are hosted overseas, making downloads slow in China and in need of acceleration. This project aims to provide a stable, reliable, and secure container image service that connects the whole world.
Checks regexes for overlaps. Based on greenery by @qntm.
An LR(1) parser generator and visualizer created for educational purposes.
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
🔥🔥 Over 1,000 classic computer science books, personal notes, and resources referenced in my articles across various platforms. Book topics include C/C++, Java, Python, Go, data structures and algorithms, operating systems, backend architecture, computer systems, databases, computer networking, design patterns, frontend, assembly, and interview experiences from campus and industry hiring.
Data validation using Python type hints
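The data-validation-via-type-hints idea above (as implemented by pydantic) can be sketched with the standard library alone. This is a minimal illustration of the concept, not pydantic's actual API:

```python
# Sketch: validate a dict against a class's type annotations.
# pydantic itself adds coercion, nested models, and much more.
from typing import get_type_hints


class User:
    name: str
    age: int


def validate(cls, data: dict) -> dict:
    hints = get_type_hints(cls)
    for field, expected in hints.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise TypeError(f"{field} must be {expected.__name__}")
    return data


validate(User, {"name": "Ada", "age": 36})      # passes
# validate(User, {"name": "Ada", "age": "36"})  # raises TypeError
```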
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Python bindings for general-sam and some utilities
A general suffix automaton implementation in Rust with Python bindings
The easiest, laziest way to build multi-agent LLM applications.
🌟 Wiki of OI / ICPC for everyone. (An online strategy guide for a certain large-scale game, featuring dazzling arithmetic magic.)
Chat with your notes & see links to related content with AI embeddings. Use local models or 100+ via APIs like Claude, Gemini, ChatGPT & Llama 3
[NeurIPS 2024 Spotlight] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".