View SWivid's full-sized avatar

Python 4,576 311 Updated Apr 12, 2025
Python 5 Updated Mar 23, 2025
Python 104 9 Updated Apr 11, 2025

ComfyUI node for F5-Text To Speech

Python 164 21 Updated Apr 5, 2025

Speech-to-text server framework with next-gen Kaldi

C++ 658 115 Updated Apr 3, 2025
Python 1,073 331 Updated Apr 10, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 459 29 Updated Apr 7, 2025

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 4,794 522 Updated Apr 7, 2025

F5-TTS inference acceleration, roughly 4x faster!

Python 77 10 Updated Jan 6, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python 24,750 2,171 Updated Apr 15, 2025

Turn any common eBook file into an HQ Audiobook with F5-TTS (Easy Install)

Python 22 1 Updated Mar 30, 2025

A Survey of Spoken Dialogue Models (60 pages)

288 16 Updated Nov 28, 2024

UTokyo-SaruLab MOS Prediction System

Python 168 16 Updated Apr 3, 2025

Easy-to-Use Speech MOS predictors

Python 276 16 Updated Oct 24, 2023

Implementation of F5-TTS in MLX

Python 519 54 Updated Mar 19, 2025

first base model for full-duplex conversational audio

Python 1,730 111 Updated Jan 5, 2025
JavaScript 60 21 Updated Mar 25, 2025

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 2,834 235 Updated Dec 5, 2024

Running the F5-TTS by ONNX Runtime

Python 147 22 Updated Apr 6, 2025

AI powered speech denoising and enhancement

Python 1,740 198 Updated Dec 3, 2024

GUI for a Vocal Remover that uses Deep Neural Networks.

Python 20,232 1,488 Updated Mar 13, 2025

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 1,304 89 Updated Apr 15, 2025

Collection of Open Source Speech Data

153 6 Updated Nov 8, 2024

Multilingual G2P in 100 languages

Jupyter Notebook 320 25 Updated May 26, 2023

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 13,644 2,791 Updated Apr 16, 2025

eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.

C 4,949 991 Updated Apr 13, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 11,214 1,556 Updated Apr 14, 2025

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,924 243 Updated Apr 14, 2025

Text to speech alignment using CTC forced alignment

Python 262 47 Updated Mar 24, 2025