Stars
Qwen2.5 is the large language model series developed by the Qwen team at Alibaba Cloud.
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
Official implementation of X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models
A Collection of Variational Autoencoders (VAE) in PyTorch.
Course slides and assignments for Hung-yi Lee's Spring 2021/2022/2023 machine learning courses.
Inference and training library for high-quality TTS models.
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
SGLang is a fast serving framework for large language models and vision language models.
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction".
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Official Implementation of paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion"
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
OpenEQA: Embodied Question Answering in the Era of Foundation Models
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
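Extract WeChat chat history and export it to HTML, Word, or Excel documents for permanent storage; analyze the chat history to generate an annual chat report; and use the chat data to train a personal AI chat assistant.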
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊
An open-source implementation for training LLaVA-NeXT.
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Awesome papers & datasets specifically focused on long-term videos.
800,000 step-level correctness labels on LLM solutions to MATH problems
SpeechGPT Series: Speech Large Language Models