An Open-Sourced LLM-empowered Foundation TTS System

Python 226 12 Updated Sep 25, 2024
Python 6,066 453 Updated Oct 4, 2024

A small seq2seq punctuator tool based on DistilBERT

Python 50 7 Updated Sep 8, 2024

This is the official repo of our work titled "The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio".

Python 34 3 Updated Sep 28, 2024

A simple DNN-based VAD (voice activity detector)

C++ 70 49 Updated Dec 2, 2018

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 2,756 254 Updated Sep 25, 2024

Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.

38 1 Updated Sep 11, 2024

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 102 9 Updated Sep 29, 2024

Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin or Pinyin (INTERSPEECH 2022)

Python 278 38 Updated Jun 16, 2024

Real-time face swap and one-click video deepfake with only a single image

Python 37,786 5,400 Updated Oct 6, 2024

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 5,339 551 Updated Sep 29, 2024

Multilingual Voice Understanding Model

Python 2,844 268 Updated Sep 25, 2024

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Python 151 18 Updated May 29, 2024

A feature-rich command-line audio/video downloader

Python 84,201 6,564 Updated Oct 7, 2024
Python 476 39 Updated Jun 7, 2024

Simple text to phones converter for multiple languages

Python 1,209 168 Updated Sep 26, 2024

📖 A curated list of resources dedicated to talking face.

1,290 109 Updated Oct 3, 2024

Speech, Language, Audio, Music Processing with Large Language Model

Python 517 45 Updated Oct 5, 2024

A generative speech model for daily dialogue.

Python 31,261 3,386 Updated Sep 21, 2024

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Python 25,437 5,270 Updated Oct 8, 2024

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 7,534 739 Updated Jun 24, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 21,776 2,111 Updated Aug 9, 2024

MiniSora: a community that aims to explore the implementation path and future development direction of Sora.

Python 1,180 149 Updated Sep 25, 2024

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Python 11,296 1,009 Updated Oct 6, 2024

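AISystem mainly refers to AI systems, including full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks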

Jupyter Notebook 10,683 1,539 Updated Sep 29, 2024

[CSUR] A Survey on Video Diffusion Models

1,734 88 Updated Oct 8, 2024

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

2,104 187 Updated Aug 20, 2024

PyTorch code and models for V-JEPA self-supervised learning from video.

Python 2,632 251 Updated Aug 9, 2024

Daily updated LLM papers. Subscriptions welcome 👏 If you like it, please give it a 🌟

940 35 Updated Jul 31, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 4,888 373 Updated Aug 7, 2024