Stars
Cosmos is a world model development platform that consists of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at robotics and AV labs. C…
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
[arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Fay is an open-source digital human framework integrating language models and digital characters. It offers retail, assistant, and agent versions for diverse applications like virtual shopping guid…
Official inference repo for FLUX.1 models
Understand Human Behavior to Align True Needs
Use an API to call suno.ai's music generation AI and easily integrate it into agents such as GPTs.
A generative speech model for daily dialogue.
Even 1 minute of voice data can train a good TTS model! (few-shot voice cloning)
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
Scrapy, a fast high-level web crawling & scraping framework for Python.
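🚀🚀🚀feapder is an easy-to-use, powerful Python crawler framework. It has four built-in spider types (AirSpider, Spider, TaskSpider, and BatchSpider) to cover different scenarios, and supports resumable crawling, monitoring and alerting, browser rendering, and deduplication of massive datasets. There is also feaplat, a powerful crawler management system that provides…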
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Open-source file format designed for high-quality, customizable singing synthesis.
Generative models for conditional audio generation
Solving algorithm problems is all about patterns, and labuladong is all you need! English version supported! Crack LeetCode, not only how, but also why.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
A feature-rich command-line audio/video downloader