  • moshi Public

    Forked from kyutai-labs/moshi
    Python Apache License 2.0 Updated Sep 25, 2024
  • An Open-Sourced LLM-empowered Foundation TTS System

    Python Updated Sep 25, 2024
  • Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.

    Apache License 2.0 Updated Sep 11, 2024
  • Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

    Python Apache License 2.0 Updated Sep 10, 2024
  • mini-omni Public

    Forked from gpt-omni/mini-omni

    An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

    Python MIT License Updated Sep 9, 2024
  • A small seq2seq punctuator tool based on DistilBERT

    Python Apache License 2.0 Updated Sep 8, 2024
  • Real-time face swap and one-click video deepfake from a single image

    Python GNU Affero General Public License v3.0 Updated Aug 21, 2024
  • Multilingual Voice Understanding Model

    Python MIT License Updated Jul 5, 2024
  • CosyVoice Public

    Forked from FunAudioLLM/CosyVoice

    LLM-based TTS model providing full-stack inference, training, and deployment capabilities.

    Python Apache License 2.0 Updated Jul 5, 2024
  • yt-dlp Public

    Forked from yt-dlp/yt-dlp

    A feature-rich command-line audio/video downloader

    Python The Unlicense Updated Jun 17, 2024
  • g2pW Public

    Forked from GitYCC/g2pW

    Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin or Pinyin (INTERSPEECH 2022)

    Python Apache License 2.0 Updated Jun 16, 2024
  • Python Updated Jun 7, 2024
  • 📖 A curated list of resources dedicated to talking face.

    MIT License Updated Jun 4, 2024
  • SLAM-LLM Public

    Forked from X-LANCE/SLAM-LLM

    Speech, Language, Audio, Music Processing with Large Language Model

    Python MIT License Updated Jun 2, 2024
  • Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

    Python Apache License 2.0 Updated May 29, 2024
  • ChatTTS Public

    Forked from 2noise/ChatTTS

    ChatTTS is a generative speech model for daily dialogue.

    Jupyter Notebook Other Updated May 29, 2024
  • Codecfake Public

    Forked from xieyuankun/Codecfake

    This is the official repo of our work titled "The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio".

    Python Updated May 16, 2024
  • diffusers Public

    Forked from huggingface/diffusers

    🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

    Python Apache License 2.0 Updated Apr 19, 2024
  • VoiceCraft Public

    Forked from jasonppy/VoiceCraft

    Zero-Shot Speech Editing and Text-to-Speech in the Wild

    Python Other Updated Mar 25, 2024
  • Open-Sora Public

    Forked from hpcaitech/Open-Sora

    Building your own video generation model like OpenAI's Sora

    Python Apache License 2.0 Updated Mar 6, 2024
  • minisora Public

    Forked from mini-sora/minisora

    The Mini Sora project aims to explore the implementation path and future development direction of Sora.

    Python Apache License 2.0 Updated Mar 4, 2024
  • This project aims to reproduce Sora (OpenAI's T2V model), but we have only limited resources. We sincerely hope the whole open-source community can contribute to this project.

    Jupyter Notebook Other Updated Mar 4, 2024
  • An introduction to the core principles of deep learning systems.

    Jupyter Notebook Apache License 2.0 Updated Mar 3, 2024
  • [Arxiv] A Survey on Video Diffusion Models

    Updated Mar 2, 2024
  • (ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

    MIT License Updated Mar 1, 2024
  • jepa Public

    Forked from facebookresearch/jepa

    PyTorch code and models for V-JEPA self-supervised learning from video.

    Python Other Updated Feb 20, 2024
  • Daily updated LLM papers. Subscriptions welcome 👏 If you like it, please give it a star 🌟

    Updated Jan 26, 2024
  • Qwen-VL Public

    Forked from QwenLM/Qwen-VL

    The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

    Python Other Updated Jan 22, 2024
  • GPT-SoVITS Public

    Forked from RVC-Boss/GPT-SoVITS

    Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).

    Python MIT License Updated Jan 17, 2024
  • Brand new TTS solution

    Python BSD 3-Clause "New" or "Revised" License Updated Jan 15, 2024