- Shanghai, China
- https://jiaxin-ye.github.io/
Stars
LayoutDM: Discrete Diffusion Model for Controllable Layout Generation [Inoue+, CVPR2023]
A Framework of Small-scale Large Multimodal Models
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
AI powered speech denoising and enhancement
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
Noise suppression using deep filtering
EEG-Audio-Video Dataset for Emotion Recognition in Conversations
This is the code for SpeechTokenizer, presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
An unofficial PyTorch implementation of the audio LM VALL-E
[CVPR 2024🔥] EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection
Commonly used Bilibili API calls. Supports video, anime series, user, channel, audio, and other features. Original repository: https://github.com/MoyuScript/bilibili-api
🔎 Search for YouTube videos, channels & playlists. Get 🎞 video & 📑 playlist info using link. Get search suggestions. WITHOUT YouTube Data API v3.
A relatively powerful web mind map.
Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis
Official source code for the paper: "Reading Between the Frames: Multi-Modal Non-Verbal Depression Detection in Videos"
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
This is the GitHub page for publicly available emotional speech data.
Automatic Depression Detection: A GRU/BiLSTM-based Model and an Emotional Audio-Textual Corpus