fanlu

33 followers · 10 following

Stars

QwenLM / Qwen2.5-Omni

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 2,427 177 Updated Apr 11, 2025

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 44,298 6,781 Updated Apr 11, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 13,105 1,486 Updated Apr 11, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 23,874 2,175 Updated Apr 11, 2025

deepseek-ai / DeepSeek-V3

Python 95,426 15,475 Updated Apr 9, 2025

bluenviron / mediamtx

Ready-to-use SRT / WebRTC / RTSP / RTMP / LL-HLS media server and media proxy that lets you read, publish, proxy, record, and play back video and audio streams.

Go 13,906 1,703 Updated Apr 11, 2025

wwbin2017 / bailing

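Bailing (百聆) is a GPT-4o-style voice conversation bot built from ASR + LLM + TTS. It integrates strong large models such as DeepSeek R1, reaches latency as low as 800 ms, runs on low-spec machines such as Macs, and supports interruption.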

Python 1,096 192 Updated Mar 15, 2025

gkonovalov / android-vad

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

C 324 69 Updated Jan 31, 2025

ictnlp / StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

Python 1,055 80 Updated Aug 24, 2024

78 / xiaozhi

Build your own AI friend

JavaScript 540 225 Updated Jan 26, 2025

78 / xiaozhi-esp32

Build your own AI friend

C++ 11,188 2,117 Updated Apr 10, 2025

facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 8,789 1,160 Updated Apr 24, 2024

Rikorose / DeepFilterNet

Noise suppression using deep filtering

Python 2,969 275 Updated Oct 17, 2024

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 2,558 196 Updated Mar 28, 2025

TEN-framework / TEN-Agent

TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaki…

Python 5,581 629 Updated Apr 10, 2025

fishaudio / fish-speech

SOTA Open Source TTS

Python 20,593 1,626 Updated Apr 7, 2025

THUDM / GLM-4-Voice

GLM-4-Voice | End-to-end Chinese-English speech conversation model

Python 2,823 234 Updated Dec 5, 2024

bytedance / SALMONN

SALMONN: Speech Audio Language Music Open Neural Network

Python 1,201 96 Updated Mar 4, 2025

X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Python 776 77 Updated Mar 11, 2025

SpeechColab / GigaSpeech2

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement

Python 152 7 Updated Mar 14, 2025

openai / CLIP

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 28,397 3,544 Updated Jul 23, 2024

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python 12,054 2,700 Updated Apr 11, 2025

OpenMOSS / AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Python 836 67 Updated Aug 27, 2024

deezer / spleeter

Deezer source separation library including pretrained models.

Python 26,668 2,916 Updated Apr 2, 2025

jianchang512 / vocal-separate

An extremely simple tool for separating vocals from background music. It runs entirely locally through a web interface, needs no internet connection, and uses 2stems/4stems/5stems models.

Python 1,512 174 Updated Nov 26, 2024

HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

JavaScript 21,557 2,666 Updated Apr 11, 2025

modelscope / FunClip

Open-source, accurate, and easy-to-use video speech recognition and clipping tool with integrated LLM-based AI clipping.

Python 4,402 506 Updated Mar 11, 2025

myshell-ai / OpenVoice

Instant voice cloning by MIT and MyShell. Audio foundation model.

Python 31,690 3,227 Updated Jan 7, 2025

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 16,821 1,599 Updated Apr 11, 2025

meta-llama / llama3

The official Meta Llama 3 GitHub site

Python 28,606 3,357 Updated Jan 26, 2025
