🍉
summer

Audio Captioning datasets for PyTorch.

Python 114 6 Updated Nov 4, 2024
Python 3,433 265 Updated Feb 25, 2025

Align Anything: Training All-modality Model with Feedback

Python 2,309 327 Updated Feb 19, 2025

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 10,672 1,372 Updated Feb 1, 2025

Sample Repository for the AlibabaCloud Bailian Speech SDK

92 6 Updated Feb 14, 2025

Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models

Python 108 6 Updated Feb 25, 2025

Chinese-language Twitter users worth following

Python 935 37 Updated Jan 8, 2025

so-vits-svc fork with realtime support, improved interface and more features.

Python 8,910 1,181 Updated Feb 24, 2025
15 1 Updated Jul 4, 2024

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 11,067 1,087 Updated Feb 25, 2025

An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

Python 1 Updated Aug 29, 2023

An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

Python 400 57 Updated Aug 29, 2023

VoiceBench: Benchmarking LLM-Based Voice Assistants

Python 122 8 Updated Feb 25, 2025

Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deplo…

C 234 14 Updated Dec 16, 2024

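[Three Years of Interviews, Five Years of Mock Exams] An interview handbook for AI algorithm engineers. Covers interview and written-test experience and practical knowledge from across the AI industry, including AIGC, traditional deep learning, autonomous driving, machine learning, computer vision, natural language processing, reinforcement learning, embodied intelligence, the metaverse, and AGI.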

1,140 170 Updated Feb 25, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 7,554 607 Updated Feb 25, 2025

SALMONN: Speech Audio Language Music Open Neural Network

Python 1,156 91 Updated Dec 12, 2024

LLM101n: Let's build a Storyteller

31,964 1,735 Updated Aug 1, 2024

Sample code for the Microsoft Cognitive Services Speech SDK

C# 3,075 1,904 Updated Feb 21, 2025

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python 11,062 861 Updated Aug 20, 2024

Unofficial implementation of Megatts2

Python 276 36 Updated Mar 23, 2024

🔊 Text-prompted Generative Audio Model - With the ability to clone voices

Jupyter Notebook 3,247 434 Updated Jun 12, 2024
Python 348 53 Updated Sep 3, 2024

A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 1 Updated Mar 13, 2024

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

Python 3,483 333 Updated Feb 21, 2025

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 3,737 211 Updated Feb 25, 2025

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation

Python 870 211 Updated Mar 10, 2024

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Python 3,749 311 Updated Jan 8, 2025

Manipulate audio with a simple and easy high-level interface

Python 9,194 1,073 Updated Jul 25, 2024