macroustc

0 followers · 4 following

moshi Public
Forked from kyutai-labs/moshi

Python Apache License 2.0 Updated Sep 25, 2024
FireRedTTS Public
Forked from FireRedTeam/FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System

Python Updated Sep 25, 2024
Draw-an-Audio-Code Public
Forked from yannqi/Draw-an-Audio-Code

Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.

Apache License 2.0 Updated Sep 11, 2024
S3Tokenizer Public
Forked from xingchensong/S3Tokenizer

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python Apache License 2.0 Updated Sep 10, 2024
mini-omni Public
Forked from gpt-omni/mini-omni

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python MIT License Updated Sep 9, 2024
punctuator Public
Forked from FerdinandZhong/punctuator

A small seq2seq punctuator tool based on DistilBERT

Python Apache License 2.0 Updated Sep 8, 2024
Deep-Live-Cam Public
Forked from hacksider/Deep-Live-Cam

real time face swap and one-click video deepfake with only a single image

Python GNU Affero General Public License v3.0 Updated Aug 21, 2024
SenseVoice Public
Forked from FunAudioLLM/SenseVoice

Multilingual Voice Understanding Model

Python MIT License Updated Jul 5, 2024
CosyVoice Public
Forked from FunAudioLLM/CosyVoice

LLM based TTS model, providing inference/training/deployment full-stack ability.

Python Apache License 2.0 Updated Jul 5, 2024
yt-dlp Public
Forked from yt-dlp/yt-dlp

A feature-rich command-line audio/video downloader

Python The Unlicense Updated Jun 17, 2024
g2pW Public
Forked from GitYCC/g2pW

Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin or Pinyin (INTERSPEECH 2022)

Python Apache License 2.0 Updated Jun 16, 2024
TeleSpeech-ASR Public
Forked from Tele-AI/TeleSpeech-ASR

Python Updated Jun 7, 2024
Awesome-Talking-Face Public
Forked from JosephPai/Awesome-Talking-Face

📖 A curated list of resources dedicated to talking face.

MIT License Updated Jun 4, 2024
SLAM-LLM Public
Forked from X-LANCE/SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Python MIT License Updated Jun 2, 2024
Diff-Foley Public
Forked from luosiallen/Diff-Foley

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Python Apache License 2.0 Updated May 29, 2024
ChatTTS Public
Forked from 2noise/ChatTTS

ChatTTS is a generative speech model for daily dialogue.

Jupyter Notebook Other Updated May 29, 2024
Codecfake Public
Forked from xieyuankun/Codecfake

This is the official repo of our work titled "The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio".

Python Updated May 16, 2024
diffusers Public
Forked from huggingface/diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Python Apache License 2.0 Updated Apr 19, 2024
VoiceCraft Public
Forked from jasonppy/VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Python Other Updated Mar 25, 2024
Open-Sora Public
Forked from hpcaitech/Open-Sora

Building your own video generation model like OpenAI's Sora

Python Apache License 2.0 Updated Mar 6, 2024
minisora Public
Forked from mini-sora/minisora

The Mini Sora project aims to explore the implementation path and future development direction of Sora.

Python Apache License 2.0 Updated Mar 4, 2024
Open-Sora-Plan Public
Forked from PKU-YuanGroup/Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model), but we have only limited resources. We sincerely hope the whole open-source community can contribute to this project.

Jupyter Notebook Other Updated Mar 4, 2024
DeepLearningSystem Public
Forked from chenzomi12/AISystem

Deep Learning System core principles introduction.

Jupyter Notebook Apache License 2.0 Updated Mar 3, 2024
Awesome-Video-Diffusion-Models Public
Forked from ChenHsing/Awesome-Video-Diffusion-Models

[Arxiv] A Survey on Video Diffusion Models

Updated Mar 2, 2024
Awesome-Text-to-Image Public
Forked from Yutong-Zhou-cv/Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

MIT License Updated Mar 1, 2024
jepa Public
Forked from facebookresearch/jepa

PyTorch code and models for V-JEPA self-supervised learning from video.

Python Other Updated Feb 20, 2024
llm-paper-daily Public
Forked from xianshang33/llm-paper-daily

Daily updated LLM papers. Subscriptions welcome 👏 If you like it, please give it a 🌟

Updated Jan 26, 2024
Qwen-VL Public
Forked from QwenLM/Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python Other Updated Jan 22, 2024
GPT-SoVITS Public
Forked from RVC-Boss/GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python MIT License Updated Jan 17, 2024
fish-speech Public
Forked from fishaudio/fish-speech

Brand new TTS solution

Python BSD 3-Clause "New" or "Revised" License Updated Jan 15, 2024
