Zhizheng Wu (zhizhengwu)

A/Prof @ CUHK-Shenzhen, Founder@Mel Lab SG, Advisor@Sanas-ai, ex-Meta, ex-Apple, ex-Microsoft. Research audio and speech AIGC. Founder of Amphion

111 followers · 6 following

Chinese University of Hong Kong, Shenzhen
Shenzhen
https://drwuz.com/
@drwuz

Lists (1)

Research

Stars

amphionspace / Awesome-Zero-Shot-TTS-Papers

7 Updated Sep 2, 2024

Yuan-ManX / ai-audio-datasets

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…

477 33 Updated Sep 6, 2024

amphionspace / SD-Eval

[NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Python 38 1 Updated Jun 25, 2024

modelscope / FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.

Python 353 30 Updated Jan 25, 2024

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 4,499 387 Updated Sep 23, 2024

JJTech0130 / pypush

[being rewritten] Cross-platform iMessage POC

Python 3,560 396 Updated Jun 3, 2024

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 19,638 2,161 Updated Aug 12, 2024

AdvSV / AdvSV.github.io

AdvSV stands as the first dataset developed specifically for evaluating Speaker Verification (SV) systems against adversarial attacks. It aims to benchmark the robustness of ASV models in the face…

HTML 11 Updated Nov 21, 2023

Stability-AI / StableStudio

Community interface for generative AI

TypeScript 8,713 864 Updated Apr 30, 2024

OpenMOSS / MOSS

An open-source tool-augmented conversational language model from Fudan University

Python 11,926 1,145 Updated Jul 13, 2024

nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++ 69,790 7,634 Updated Oct 8, 2024

haoheliu / AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Python 2,405 221 Updated Jun 2, 2024

csun22 / LibriVoc-Dataset

LibriVoc is a new open-source, large-scale dataset for vocoder artifact detection. LibriVoc is derived from the LibriTTS speech corpus, which is widely used in text-to-speech research. The LibriTT…

Rich Text Format 16 1 Updated Jan 24, 2023

NVIDIA / BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 856 97 Updated Sep 5, 2024

AllenDowney / ThinkDSP

Think DSP: Digital Signal Processing in Python, by Allen B. Downey.

Jupyter Notebook 3,942 3,205 Updated May 10, 2024

karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 36,593 5,766 Updated Aug 19, 2024

Hello-SimpleAI / chatgpt-comparison-detection

Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥

Python 1,251 119 Updated Dec 1, 2023

allenai / csqa2

Python 36 1 Updated Mar 26, 2024

microsoft / muzic

Muzic: Music Understanding and Generation with Artificial Intelligence

Python 4,492 439 Updated Oct 8, 2024

coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 34,437 4,166 Updated Aug 16, 2024

andabi / deep-voice-conversion

Deep neural networks for voice conversion (voice style transfer) in Tensorflow

Python 3,918 843 Updated Sep 30, 2022

apache / tvm

Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators

Python 11,672 3,451 Updated Oct 8, 2024

CorentinJ / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Python 52,336 8,752 Updated Aug 14, 2024

G-Wang / WaveRNN-Pytorch

Fatchord's Alternative WaveRNN (Faster training)

Python 126 72 Updated Mar 29, 2019

fatchord / WaveRNN

WaveRNN Vocoder + TTS

Python 2,131 697 Updated Jul 2, 2022

nailperry-zd / The-Economist

The Economist (经济学人), continuously updated

3,592 544 Updated Jun 23, 2023

NVIDIA / tacotron2

Tacotron 2 - PyTorch implementation with faster-than-realtime inference

Jupyter Notebook 5,058 1,378 Updated Jun 12, 2024

TadasBaltrusaitis / OpenFace

OpenFace – a state-of-the-art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

MATLAB 6,881 1,842 Updated Jun 1, 2024

Kyubyong / tacotron

A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model

Python 1,828 436 Updated Jan 17, 2022

martinarjovsky / WassersteinGAN

Python 3,205 725 Updated Dec 26, 2018
