Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)

Python 52 2 Updated Nov 1, 2024

SOTA Open Source TTS

Python 18,937 1,434 Updated Feb 3, 2025

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,122 270 Updated Nov 5, 2024

This repository contains demos I made with the Transformers library by HuggingFace.

Jupyter Notebook 9,903 1,504 Updated Jan 13, 2025

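A curated collection of authoritative audio/video streaming-media resources: 500+ articles, papers, videos, hands-on projects, protocols, and a list of leading industry experts.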

5,552 1,243 Updated May 20, 2024

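Latest 2021 roundup of recommended reading for engineers: books on computer science, software technology, entrepreneurship, ideas, mathematics, and biographies.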

9,520 2,900 Updated Jun 11, 2024

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 10,302 994 Updated Feb 8, 2025

A version of CosyVoice for use on Windows.

Python 578 84 Updated Nov 19, 2024

Officially recommended ChatTTS resource collection project, gathering related resources from across the web and frequently asked questions.

1,429 87 Updated Jul 3, 2024

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 37,458 4,676 Updated Aug 16, 2024

A generative speech model for daily dialogue.

Python 34,178 3,697 Updated Jan 25, 2025

🐙 Guides, papers, lecture, notebooks and resources for prompt engineering

MDX 53,117 5,178 Updated Jan 21, 2025

A 10000+ hours dataset for Chinese speech recognition

Shell 516 49 Updated Jul 3, 2023

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Python 590 76 Updated Dec 27, 2023

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024)

Python 53 4 Updated Jun 20, 2024

Alignment files of LibriTTS.

61 7 Updated Mar 16, 2020

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 8,444 646 Updated Feb 3, 2025

[UNMAINTAINED] A reverse engineered Python API wrapper for Quora's Poe, which provides free access to ChatGPT, GPT-4, and Claude.

Python 2,508 318 Updated Sep 18, 2023

A Python wrapper for mkvmerge. It provides support for muxing, splitting, linking, chapters, tags, and attachments through the use of mkvmerge.

Python 76 28 Updated May 13, 2024

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)

Python 315 39 Updated Jul 22, 2024

A fully working pytorch implementation of NaturalSpeech (Tan et al., 2022)

Python 473 68 Updated Feb 7, 2024

SoftVC VITS Singing Voice Conversion

Python 26,469 4,907 Updated Nov 11, 2023

Command line utility for forced alignment using Kaldi

Python 1,392 252 Updated Dec 2, 2024

PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech

Python 331 36 Updated Feb 17, 2022

This is the experimental description of MnTTS2.

Jupyter Notebook 9 3 Updated Apr 11, 2024

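Provides a practical interactive interface for LLMs such as GPT/GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…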

Python 67,355 8,262 Updated Feb 8, 2025

Upload a photo of your room to generate your dream room with AI.

TypeScript 10,229 1,400 Updated Apr 20, 2024

Let us control diffusion models!

Python 31,383 2,806 Updated Feb 25, 2024

FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP'2024)

Python 22 5 Updated Feb 22, 2024

MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline. (Accepted by IALP'2022)

Jupyter Notebook 16 4 Updated Dec 5, 2022