Depth2World
  • University of Science and Technology of China

Stars

VLM-M

VLMs for multi-image input · 9 repositories

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook · 10,257 stars · 1,002 forks · Updated Nov 18, 2024

Official code for the paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]

Python · 199 stars · 17 forks · Updated Feb 17, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python · 21,512 stars · 2,364 forks · Updated Aug 12, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python · 5,490 stars · 419 forks · Updated Aug 7, 2024

Inference code for Llama models

Python · 57,667 stars · 9,708 forks · Updated Jan 26, 2025

Python · 3,424 stars · 313 forks · Updated Feb 13, 2025

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python · 2,929 stars · 237 forks · Updated Feb 10, 2025

Official repository for the paper "MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning" (https://arxiv.org/abs/2406.17770).

Python · 151 stars · 4 forks · Updated Sep 27, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python · 7,062 stars · 538 forks · Updated Dec 25, 2024