- Nanyang Technological University
- Singapore
- https://scholar.google.com/citations?user=iAT_5-kAAAAJ&hl=en
Stars
AcademiCodec: An Open Source Audio Codec Model for Academic Research
Code for paper "Noise-aware Speech Enhancement using Diffusion Probabilistic Model"
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.
Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄). Read online at https://datawhalechina.github.io/easy-rl/
[TPAMI 2024 & CVPR 2023] PyTorch code for DGM4: Detecting and Grounding Multi-Modal Media Manipulation and beyond
Variational Bayes HMM over x-vectors diarization
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Adapting Self-Supervised Models to Multi-Talker Speech Recognition Using Speaker Embeddings
A torch implementation of a recursion which turns out to be useful for RNN-T.
Robust Speech Recognition via Large-Scale Weak Supervision
A curated list of awesome Speech Enhancement papers, libraries, datasets, and other resources.
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
This repo contains my attempt to create a Speaker Recognition and Verification system using SideKit-1.3.1
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Source code for: Efficient Self-supervised Learning Representations for Spoken Language Identification
[NeurIPS 2023] Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective
Final project for the Speaker Recognition course on Udemy, 机器之心, 深蓝学院 and 语音之家
LeetCode Solutions: A Record of My Problem-Solving Journey.
PHO-LID: A Unified Model to Incorporate Acoustic-Phonetic and Phonotactic Information for Language Identification