Starred repositories
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
Speech, Language, Audio, Music Processing with Large Language Model
A toolkit for making real world machine learning and data analysis applications in C++
An end-to-end chorus detection model DeepChorus.
Transformer based on a variant of attention that has linear complexity with respect to sequence length
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Chinese speech pretrained models
SpeechGPT Series: Speech Large Language Models
Code for "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose" (arXiv 2020) and "Predicting Personalized Head Movement From Short Video and Speech Signal" (TMM 2022)
Phoneme Recognition using pre-trained models Wav2vec2, HuBERT and WavLM. Throughout this project, we compared specifically three different self-supervised models, Wav2vec (2019, 2020), HuBERT (2021…
An implementation of the Contrastive Predictive Coding (CPC) method to train audio features in an unsupervised fashion.
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
Audio fingerprinting and recognition in Python
Olaf: Overly Lightweight Acoustic Fingerprinting is a portable acoustic fingerprinting system.
A repository for my MSc thesis in Data Science & Machine Learning @ NTUA. A deep learning approach to audio fingerprinting for recognizing songs in real time through the microphone.
Capture Screen, Audio, Cursor, Mouse Clicks and Keystrokes
eyeD3 is a Python module and command line program for processing ID3 tags. Information about mp3 files (i.e., bit rate, sample frequency, play time, etc.) is also provided. The formats supported are …
A deep learning project for automated chorus detection in songs, featuring a command-line interface (CLI) tool that allows users to input a YouTube link and utilize a pre-trained CRNN model to dete…
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Machine Learning Journal for Intermediate to Advanced Topics.
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
THIS REPOSITORY IS JUST A MIRROR! The main development repository is https://codeberg.org/Freedium-cfd/web