VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
Speech synthesis model/inference GUI repo for galgame characters based on Tacotron2, Hifigan, VITS and Diff-svc
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time
Audio style transfer with shallow random parameters CNN.
Deep neural networks for voice conversion (voice style transfer) in TensorFlow
[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
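🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot / smart speaker project. It supports ChatGPT multi-turn conversation and may also be the first open-source smart speaker project to support brain-computer interaction.
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models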
Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)
Code for Dual Contrastive Learning for Unsupervised Image-to-Image Translation, NTIRE, CVPRW 2021, oral.
Affordable WiFi hacking platform for testing and learning
Intergalactic serial monitor for ESP8266 Deauther
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
The world's simplest facial recognition api for Python and the command line
Simple, practical processing and editing of video and audio with Python+FFmpeg.
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
Out of time: automated lip sync in the wild
Disentangled Speech Embeddings using Cross-Modal Self-Supervision