Stars
TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
An open-source invisible desktop application to help you pass your technical interviews.
Real-time interactive streaming digital human
Awesome Digital Human
Digital Human Resource Collection: 2D/3D/4D human modeling, avatar generation & animation, clothed people digitalization, virtual try-on, and others.
⏰ Collaboratively track the deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet) / If you find it useful, please star this project, thanks~
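【三年面试五年模拟】(Three Years of Interviews, Five Years of Mock Exams): an interview handbook for AI algorithm engineers, covering interview and written-test experience and practical knowledge across the AI industry, including AIGC, traditional deep learning, autonomous driving, machine learning, computer vision, natural language processing, SLAM, embodied intelligence, the metaverse, and AGI.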
My ComfyUI workflow collection
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
🛠 Watt Toolkit is an open-source, cross-platform, multi-purpose Steam toolbox.
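《开源大模型食用指南》(A Cookbook for Open-Source Large Models): tutorials tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment.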
A PixiJS plugin to display Live2D models of any kind.
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
A wrapper around the speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, and SISDR
Python library for calculating the mean opinion score and 95% confidence interval of the standard deviation of text-to-speech ratings according to Ribeiro et al. (2011).
JS library to estimate the Mean Opinion Score (MOS) for real-time audio and video communications
This project uses a variety of advanced voiceprint recognition models, such as EcapaTdnn, ResNetSE, ERes2Net, and CAM++, and more models may be supported in the future. At the …
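Free, commercially usable, one-stop Java AI solution that lightens your workload and speeds up product development. Project categories include a Java PyTorch training engine, AI SDKs, web applications, and more, for a collection of over 100 projects in total. | Artificial Intelligence Accelerator Kit.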
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
A Python module to repair invalid JSON, commonly used to parse the output of LLMs
Provides best practices for LMOps, along with elegant and convenient access to the features of the Qianfan MaaS Platform.
Easy-to-use speech toolkit including self-supervised learning models, SOTA/streaming ASR with punctuation, streaming TTS with a text frontend, a speaker verification system, End-to-End Speech Translatio…
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Text-to-speech using an autoregressive transformer and VITS
ASR/NLP/TTS deep learning inference library for NVIDIA Jetson using PyTorch and TensorRT