Inner Mongolia University (IMU)
- Hohhot, China
- https://orcid.org/0009-0008-2276-1456
Stars
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
This repository contains demos I made with the Transformers library by HuggingFace.
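A curated collection of authoritative audio/video streaming media resources: 500+ articles, papers, videos, hands-on projects, protocols, and a list of leading industry experts.
A 2021 roundup of recommended reading for engineers: books on computer science, software technology, entrepreneurship, ideas, mathematics, and biographies.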
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Officially recommended ChatTTS resource collection project, gathering related resources from across the web and frequently asked questions.
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
A generative speech model for daily dialogue.
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
A 10,000+ hour dataset for Chinese speech recognition
AcademiCodec: An Open Source Audio Codec Model for Academic Research
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024)
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
[UNMAINTAINED] A reverse engineered Python API wrapper for Quora's Poe, which provides free access to ChatGPT, GPT-4, and Claude.
A Python wrapper for mkvmerge. It provides support for muxing, splitting, linking, chapters, tags, and attachments through the use of mkvmerge.
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)
A fully working PyTorch implementation of NaturalSpeech (Tan et al., 2022)
SoftVC VITS Singing Voice Conversion
Command line utility for forced alignment using Kaldi
PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech
This is the experimental description of MnTTS2.
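A practical interactive interface for large language models such as GPT and GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins, project analysis and self-explanation for Python, C++, and other codebases, PDF/LaTeX paper translation and summarization, parallel queries to multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…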
Upload a photo of your room to generate your dream room with AI.
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP'2024)
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline. (Accepted by IALP'2022)