Starred repositories
TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
[arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Official GitHub repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Implementation of a TTS model based on the NVIDIA P-Flow TTS paper
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Everything about the SmolLM & SmolLM2 family of models
Proof-of-concept conversation orchestrator with a speech-language model
Joint speech-language model - respond directly to audio!
Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances human-computer interaction through real-time spoken dialogue…
Ultra-low bitrate neural audio codec (0.31-1.40 kbps) with better semantics in the latent space.
First base model for full-duplex conversational audio
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
An Open-source Streaming High-fidelity Neural Audio Codec
Realtime Video and Audio Streaming with WebRTC and Gradio
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Minimalistic large language model 3D-parallelism training
Fast and accurate automatic speech recognition (ASR) for edge devices
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.