Stars
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
No fortress, purely open ground. OpenManus is Coming.
How to optimize some algorithms in CUDA.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Header-only C++/Python library for fast approximate nearest neighbors
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
OLMoE: Open Mixture-of-Experts Language Models
Developer-friendly, embedded retrieval engine for multimodal AI. Search More; Manage Less.
Everything we actually know about the Apple Neural Engine (ANE)
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
A natural language interface for computers
A powerful framework for building realtime voice AI agents 🤖🎙️📹
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality
A generative speech model for daily dialogue.
Minimal container for Chrome's headless shell, useful for automating / driving the web
OpenGFW is a flexible, easy-to-use, open source implementation of GFW (Great Firewall of China) on Linux
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Make images smaller using best-in-class codecs, right in the browser.
A blazing fast inference solution for text embeddings models