Starred repositories
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Use PEFT or full-parameter training to fine-tune 400+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek3, ...) and 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, Inter…
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Curated list of datasets and tools for post-training.
An Open Large Reasoning Model for Real-World Solutions
Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
A modular graph-based Retrieval-Augmented Generation (RAG) system
🔍 An LLM-based Multi-agent Framework for a Web Search Engine (like Perplexity.ai Pro and SearchGPT)
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
✨✨Latest Advances on Multimodal Large Language Models
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
HunyuanVideo: A Systematic Framework For Large Video Generation Model
A Self-Training Framework for Vision-Language Reasoning
ACL 2024: LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks
WWW 2025 Multimodal Intent Recognition for Dialogue Systems Challenge
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
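Curated learning materials for AI fields such as machine learning, NLP, image recognition, and deep learning, plus compiled resources on the architecture and algorithms of search, recommendation, and advertising systems. A collection of notes from top algorithm experts.
Dedicated to strategies for landing internships, campus hires, and experienced hires at big tech companies: computer science fundamentals, learning roadmaps for C++, Java, and algorithms, with a focus on programming strategy!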
[NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
MICCAI 2024 - Loose Lesion Location Self-supervision Enhanced Colorectal Cancer Diagnosis
GPT-4V-level open-source multimodal model based on Llama3-8B
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Knowledge, code, and approaches for data mining, computer vision, natural language processing, and recommender system competitions