Starred repositories
Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI subtitle team for video localization.
🎤 Microphone sound source localization using SRP-PHAT and other numerical methods.
A simple library for theoretical research on direction-of-arrival (DOA) estimation in array signal processing.
A set of MATLAB functions for direction-of-arrival (DOA) estimation in array signal processing.
Some traditional algorithms used for DOA estimation in sound source localization of speech signals.
Cross-platform C++ library providing a simple API to read and write INI-style configuration files
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
This is the audio sample repository for the speech separation model "MossFormer2".
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
High-quality, streaming speech-to-speech interactive agent in a single file: a streaming, full-duplex voice interaction prototype agent implemented in just one file!
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
CoreMark® is an industry-standard benchmark that measures the performance of central processing units (CPU) and embedded microcontrollers (MCU).
Easy-to-use beamformers for multi-channel speech separation/enhancement
Simple delay-and-sum, MVDR, and CGMM-MVDR beamformers
Implementation of CGMM-MVDR beamforming (for the Python version, please refer to https://github.com/funcwj/setk)
Noise suppression plugin based on Xiph's RNNoise
Fast and accurate automatic speech recognition (ASR) for edge devices
The baseline system for the ICASSP2024 ICMC-ASR Challenge.
Multilingual Voice Understanding Model
A header-only INI library for modern C++ that supports reading and writing, including writing comments. It is cross-platform and can be used on multiple operating systems. MIT license.
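📺 IPTV live TV source update project『✨instant-play experience🚀』: supports IPv4/IPv6; supports custom channels; supports local sources, multicast sources, hotel sources, subscription sources, and keyword search; updates automatically twice a day, with results usable in players such as TVBox; can be run via workflow, Docker (amd64/arm64/arm v7), command line, or GUI.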
PortAudio is a cross-platform, open-source C language library for real-time audio input and output.
Jarvis: a voice-controlled intelligent assistant for macOS. A Chinese-language Jarvis voice assistant (Siri for the desktop).
The Art World in Your Pocket or Your Trendy Tech Company's Tote, Artsy's mobile app.
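Streamer-Sales (Sales Champion): a sales-livestreamer LLM 🛒🎁 that, given a product's features, presents the product in a way designed to stimulate the user's desire to buy. 🚀⭐ Includes a detailed data generation pipeline❗ 📦 Also integrates LMDeploy accelerated inference🚀, RAG (retrieval-augmented generation) 📚, TTS (text-to-speech)🔊, digital human generation 🦸, an Agent that queries the web for real-time information🌐, ASR (speech-to-text)🎙️, a front end built on the Vue ecosystem🍍, FastAPI …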
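This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and deploying large-model applications in practice).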
Production-first, neural-network-based on-device signal processing toolkit.