Organizations

@wenet-e2e

Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.

Jupyter Notebook 2,427 177 Updated Apr 11, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 44,298 6,781 Updated Apr 11, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 13,105 1,486 Updated Apr 11, 2025

Fully open reproduction of DeepSeek-R1

Python 23,874 2,175 Updated Apr 11, 2025

Ready-to-use SRT / WebRTC / RTSP / RTMP / LL-HLS media server and media proxy that can read, publish, proxy, record and play back video and audio streams.

Go 13,906 1,703 Updated Apr 11, 2025

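Bailing (百聆) is a GPT-4o-like voice conversation bot built from ASR + LLM + TTS. It integrates strong large models such as DeepSeek R1, reaches latency as low as 800 ms, runs on low-spec machines such as Macs, and supports interruption.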

Python 1,096 192 Updated Mar 15, 2025

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

C 324 69 Updated Jan 31, 2025

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

Python 1,055 80 Updated Aug 24, 2024

Build your own AI friend

JavaScript 540 225 Updated Jan 26, 2025

Build your own AI friend

C++ 11,188 2,117 Updated Apr 10, 2025

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 8,789 1,160 Updated Apr 24, 2024

Noise suppression using deep filtering

Python 2,969 275 Updated Oct 17, 2024

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 2,558 196 Updated Mar 28, 2025

TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaki…

Python 5,581 629 Updated Apr 10, 2025

SOTA Open Source TTS

Python 20,593 1,626 Updated Apr 7, 2025

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 2,823 234 Updated Dec 5, 2024

SALMONN: Speech Audio Language Music Open Neural Network

Python 1,201 96 Updated Mar 4, 2025

Speech, Language, Audio, Music Processing with Large Language Model

Python 776 77 Updated Mar 11, 2025

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement

Python 152 7 Updated Mar 14, 2025

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Jupyter Notebook 28,397 3,544 Updated Jul 23, 2024

Ongoing research training transformer models at scale

Python 12,054 2,700 Updated Apr 11, 2025

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Python 836 67 Updated Aug 27, 2024

Deezer source separation library including pretrained models.

Python 26,668 2,916 Updated Apr 2, 2025

An extremely simple tool for separating vocals and background music, operated entirely locally through a web interface with no internet connection required, using 2stems/4stems/5stems models.

Python 1,512 174 Updated Nov 26, 2024

Label Studio is a multi-type data labeling and annotation tool with standardized output format

JavaScript 21,557 2,666 Updated Apr 11, 2025

Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.

Python 4,402 506 Updated Mar 11, 2025

Instant voice cloning by MIT and MyShell. Audio foundation model.

Python 31,690 3,227 Updated Jan 7, 2025

Fast and memory-efficient exact attention

Python 16,821 1,599 Updated Apr 11, 2025

The official Meta Llama 3 GitHub site

Python 28,606 3,357 Updated Jan 26, 2025