Stars
OCR Annotations from Amazon Textract for Industry Documents Library
OLMoE: Open Mixture-of-Experts Language Models
registor / awesome-TikZ-1
Forked from maphy-psd/awesome-TikZ. A curated list of awesome TikZ packages and resources
Oracle Bone Script data collected by VLRLab of HUST
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
[NeurIPS'24] Efficient and accurate memory-saving method for W4A4 large multi-modal models.
Extends OpenRLHF to support RL training of LMMs, reproducing DeepSeek-R1 on multimodal tasks.
MoBA: Mixture of Block Attention for Long-Context LLMs
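As a rough illustration of the block-attention idea (a conceptual sketch only, not MoBA's implementation; `block_attention` and the block/top-k sizes are names and values I am assuming for the example): keys are grouped into fixed-size blocks, each query scores a mean-pooled summary of every block, and full attention runs only within the query's top-k blocks.

```python
import torch
import torch.nn.functional as F

# Conceptual sketch of block-sparse attention in the spirit of MoBA
# (not the repo's code). Each query scores mean-pooled block keys and
# attends only over the keys/values of its top-k blocks.
def block_attention(q, k, v, block_size=16, topk=2):
    T, d = k.shape
    k_blocks = k.view(T // block_size, block_size, d).mean(dim=1)  # block summaries
    idx = (q @ k_blocks.T).topk(topk, dim=-1).indices              # chosen blocks per query
    out = torch.zeros_like(q)
    for i in range(q.shape[0]):
        sel = torch.cat([torch.arange(b * block_size, (b + 1) * block_size)
                         for b in idx[i].tolist()])
        attn = F.softmax(q[i] @ k[sel].T / d ** 0.5, dim=-1)
        out[i] = attn @ v[sel]
    return out

q, k, v = torch.randn(4, 64), torch.randn(128, 64), torch.randn(128, 64)
print(block_attention(q, k, v).shape)  # torch.Size([4, 64])
```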
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation…
Automating the Search for Artificial Life with Foundation Models!
An Open Large Reasoning Model for Real-World Solutions
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Low Precision Arithmetic Simulation in PyTorch
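Simulating low-precision arithmetic generally means snapping fp32 values onto a lower-precision grid while keeping fp32 storage; a minimal sketch of fixed-point emulation follows (my own illustration, not this library's API; `fixed_point`, `wl`, and `fl` are assumed names).

```python
import torch

# Hedged sketch of fixed-point emulation (not the library's API): values
# stay in fp32 storage but are rounded and clamped to what an integer
# format with `wl` total bits and `fl` fractional bits could represent.
def fixed_point(x: torch.Tensor, wl: int = 8, fl: int = 4) -> torch.Tensor:
    step = 2.0 ** -fl             # smallest representable increment
    bound = 2.0 ** (wl - fl - 1)  # magnitude limit of the format
    return torch.clamp(torch.round(x / step) * step, -bound, bound - step)

x = torch.randn(5)
print(x)
print(fixed_point(x))  # same values snapped onto the 8-bit fixed-point grid
```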
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
Making large AI models cheaper, faster and more accessible
Evaluating text-to-image/video/3D models with VQAScore
High-speed Large Language Model Serving for Local Deployment
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
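The "distributing outliers via transformation" idea can be sketched in a few lines (a conceptual illustration of the rotation component only, not DuQuant's code, which also uses permutations): an orthogonal rotation leaves the matmul result unchanged but spreads per-channel outliers across channels, so both rotated factors quantize with less error.

```python
import torch

# Conceptual sketch (not DuQuant's code): an orthogonal rotation R keeps
# x @ W == (x @ R) @ (R.T @ W), but spreads a per-channel outlier across
# all channels, flattening the value range before quantization.
torch.manual_seed(0)
x = torch.randn(8, 256)
x[:, 3] *= 50.0                                # inject a channel outlier
W = torch.randn(256, 256)

R, _ = torch.linalg.qr(torch.randn(256, 256))  # random orthogonal matrix
x_rot, W_rot = x @ R, R.T @ W

print(x.abs().max().item(), x_rot.abs().max().item())  # outlier flattened
print((x @ W - x_rot @ W_rot).abs().max().item())      # ~0, numerical noise
```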
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Flux diffusion model implementation using quantized fp8 matmuls; the remaining layers use faster half-precision accumulation, making it ~2x faster on consumer devices.
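As a rough sketch of the fp8-weights pattern this description refers to (my own simplified illustration, not the repo's code; real implementations fuse the dequantize into scaled-matmul kernels rather than casting explicitly): weights are stored in fp8 e4m3 with a scale, and the matmul runs in half precision.

```python
import torch

# Hedged sketch (not this repo's code): store weights in fp8 e4m3 with a
# per-tensor scale, then run the matmul in fp16. Requires PyTorch >= 2.1
# for the float8_e4m3fn dtype.
def quantize_fp8(w: torch.Tensor):
    scale = (w.abs().max() / 448.0).to(torch.float16)  # 448 = e4m3 max normal
    w_fp8 = (w / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

def fp8_linear(x: torch.Tensor, w_fp8: torch.Tensor, scale: torch.Tensor):
    return x @ (w_fp8.to(torch.float16) * scale).T  # half-precision accumulate

w = torch.randn(64, 32)
x = torch.randn(4, 32, dtype=torch.float16)
w_fp8, s = quantize_fp8(w)
print(fp8_linear(x, w_fp8, s).shape)  # torch.Size([4, 64])
```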
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
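For reference, the fake-quantization step common to QAT methods looks roughly like this (a generic sketch of QAT with a straight-through estimator, not EfficientQAT's specific training recipe): weights are rounded to a low-bit grid in the forward pass, while gradients flow through the rounding to the full-precision weights.

```python
import torch

# Generic QAT sketch (not EfficientQAT's recipe): round weights to a
# low-bit grid in the forward pass; the straight-through estimator lets
# gradients reach the underlying full-precision master weights.
def fake_quant(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()  # straight-through estimator

w = torch.randn(16, 16, requires_grad=True)
x = torch.randn(4, 16)
loss = (x @ fake_quant(w)).pow(2).mean()
loss.backward()                    # grads reach the fp32 master weights
print(w.grad.abs().sum().item())
```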