Stars
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Witness the aha moment of VLMs for less than $3.
R1-onevision, a visual language model capable of deep CoT reasoning.
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning'
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Solve Visual Understanding with Reinforced VLMs
A Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄). Read it online at https://datawhalechina.github.io/easy-rl/
YOLO-UniOW: Efficient Universal Open-World Object Detection
Awesome-LLM: a curated list of Large Language Models
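LLM knowledge that anyone can understand: a must-read before spring/autumn LLM job interviews, so you can talk confidently with your interviewer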
✨✨Latest Advances on Multimodal Large Language Models
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Quick exploration into fine-tuning Florence-2
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
Everything about the SmolLM2 and SmolVLM family of models
This is the official code for the MobileSAM project, which makes SAM lightweight for mobile applications and beyond!
Implementation of the paper "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information"
A curated list of reinforcement learning with human feedback resources (continually updated)
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical LLMs, implementing incremental pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.
A treasure chest for visual classification and recognition powered by PaddlePaddle
AirLLM: 70B inference with a single 4GB GPU
OpenShot Video Library (libopenshot) is a free, open-source project dedicated to delivering high quality video editing, animation, and playback solutions to the world. API currently supports C++, P…
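VNN is a high-performance, lightweight neural network deployment framework from Joyy Inc. It currently powers more than 20 AI capabilities in apps such as Hago, VOO, VFly, and 马克相机, covering pan-entertainment scenarios (live streaming, short video, video editing) as well as engineering scenarios.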