Stars
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Official PyTorch implementation of the TIP paper "Generating Visually Aligned Sound from Videos" and the corresponding Visually Aligned Sound (VAS) dataset.
A curated list of different papers and datasets in various areas of audio-visual processing
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Clone a voice in 5 seconds to generate arbitrary speech in real-time
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A collection of implementations of adversarial domain adaptation algorithms
CookiePPP / tacotron2
Forked from bfs18/tacotron2
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
bfs18 / tacotron2
Forked from NVIDIA/tacotron2
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
[ECCV 2020] XingGAN for Person Image Generation
[BMVC 2020 Oral] Bipartite Graph Reasoning GANs for Person Image Generation
Given a portrait face, move the gaze up (ACM MM 2020)
[CVPR2022 oral] A Simple and Effective Baseline for Text-to-Image Synthesis
TensorFlow implementation of the Gradient Reversal layer from https://arxiv.org/abs/1505.07818
Forked from NVIDIA/tacotron2 and merged with Rayhane-mamah/Tacotron-2
The Python implementation of the paper "Towards Discriminative Representation Learning for Speech Emotion Recognition" in IJCAI-2019
DeepMind's Tacotron-2 TensorFlow implementation