Stars
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
🛋 The AI and Generative Art platform for everyone
(experimental) modification of gistemp4.0 to run on the latest Python. Utilities to generate, inspect, and approximate an ERSSTv5 SBBX.
StableSwarmUI, A Modular Stable Diffusion Web-User-Interface, with an emphasis on making powertools easily accessible, high performance, and extensibility.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
⏩ Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks
Instant voice cloning by MIT and MyShell. Audio foundation model.
A fast inference library for running LLMs locally on modern consumer-class GPUs
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
High-speed Large Language Model Serving for Local Deployment
[EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.
adefossez / demucs
Forked from facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Webui for using XTTS and for finetuning it
🔊 Text-Prompted Generative Audio Model
Custom C++ implementation of deep learning based OCR
Stable Diffusion and Flux in pure C/C++
a state-of-the-art-level open visual language model | multimodal pre-trained model
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Easily create a dataset from a list of YouTube links
A multi-voice TTS system trained with an emphasis on quality
CLIP inference in plain C/C++ with no extra dependencies