Stars
🎥 Python and OpenCV-based scene cut/transition detection program & library.
yihuitang / StyleTTS_Mandarin
Forked from yl4579/StyleTTS
Implementation of StyleTTS for Mandarin
Foundational model for human-like, expressive TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Implementation of Voicebox, new SOTA text-to-speech network from MetaAI, in PyTorch
UT-Sarulab MOS prediction system using SSL models
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Fast and memory-efficient exact attention
The official repository of Dynamic-SUPERB.
wav2vec2 audio classification for prosodic boundary detection and other tasks
A python package to build AI-powered real-time audio applications
ModelScope: bring the notion of Model-as-a-Service to life.
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Repository for fine-tuning BEATs and using BEATs as feature extractor in a prototypical network. This repository has been used to complete the DCASE2023 challenge on few-shot bioacoustic events.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Some comprehensive papers about speaker diarization
This repository contains a set of codes to run (i.e., train, perform inference with, evaluate) a diarization method called EEND-vector-clustering.
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
An awesome spoken LID repository. (Work in progress)
PyTorch implementation of "spectro-temporal attention-based voice activity detection"
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
Reading list for research topics in Sound AI
Conditional Diffusion Probabilistic Model for Speech Enhancement
Morse Code detection with eyes using Computer Vision