- Chinese University of Hong Kong, Shenzhen
- Shenzhen
- https://drwuz.com/
- @drwuz
Stars
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
[NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis and music generation.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
[being rewritten] Cross-platform iMessage POC
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
AdvSV stands as the first dataset developed specifically for evaluating Speaker Verification (SV) systems against adversarial attacks. It aims to benchmark the robustness of ASV models in the face…
Community interface for generative AI
An open-source tool-augmented conversational language model from Fudan University
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
AudioLDM: Generate speech, sound effects, music and beyond, with text.
LibriVoc is a new open-source, large-scale dataset for vocoder artifact detection. LibriVoc is derived from the LibriTTS speech corpus, which is widely used in text-to-speech research. The LibriTT…
Official PyTorch implementation of BigVGAN (ICLR 2023)
Think DSP: Digital Signal Processing in Python, by Allen B. Downey.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥
Muzic: Music Understanding and Generation with Artificial Intelligence
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Deep neural networks for voice conversion (voice style transfer) in TensorFlow
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Fatcord's Alternative WaveRNN (Faster training)
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
OpenFace – a state-of-the-art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model