Stars
An Open-Sourced LLM-empowered Foundation TTS System
A small seq2seq punctuator tool based on DistilBERT
This is the official repo of our work titled "The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio".
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin or Pinyin (INTERSPEECH 2022)
Real-time face swap and one-click video deepfake with only a single image
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Multilingual Voice Understanding Model
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
A feature-rich command-line audio/video downloader
Simple text to phones converter for multiple languages
📖 A curated list of resources dedicated to talking face.
Speech, Language, Audio, Music Processing with Large Language Model
A generative speech model for daily dialogue.
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Open-Sora: Democratizing Efficient Video Production for All
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks
[CSUR] A Survey on Video Diffusion Models
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
PyTorch code and models for V-JEPA self-supervised learning from video.
Daily updated LLM papers. Subscriptions welcome 👏 If you like it, please give it a 🌟
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.