Starred repositories
A quick guide (especially) for trending instruction finetuning datasets
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Sky-T1: Train your own O1 preview model within $450
A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
✨✨Latest Advances on Multimodal Large Language Models
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
The repo provides information about KeSpeech dataset.
A generative world for general-purpose robotics & embodied AI learning.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
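⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. No need to purchase the Whisper API: it runs inference with a locally hosted Whisper model, supports multi-GPU concurrency, and is designed for distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless media processing across multiple social platforms and providing a powerful, scalable solution for automated processing of media content data.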
Let's build better datasets, together!
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Ongoing research training transformer models at scale
Convert any PDF into a podcast episode!
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
A high-throughput and memory-efficient inference and serving engine for LLMs
o1-engineer is a command-line tool designed to assist developers in managing and interacting with their projects efficiently. Leveraging the power of OpenAI's API, this tool provides functionalitie…
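Prompt Engineering Guide, based on the English version, with an added section on AIGC prompts; translated and updated to lower the learning barrier for students.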
A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour