Starred repositories
ShareX is a free and open source program that lets you capture or record any area of your screen and share it with a single press of a key. It also allows uploading images, text or other types of f…
[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Real-time face swap and one-click video deepfake with only a single image
Fast, unopinionated, minimalist web framework for node.
Inpaint anything using Segment Anything and inpainting models.
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
An open source library for face detection in images. The face detection speed can reach 1000 FPS.
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
Modified from MemoryModule; supports exceptions.
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a ca…
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
🔊 Text-Prompted Generative Audio Model
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
A library for efficient similarity search and clustering of dense vectors.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
A helper script for unpacking and decompiling EXEs compiled from Python code.
High-efficiency floating-point neural network inference operators for mobile, server, and Web
A Python package to analyze and compare voices with deep learning
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.