Starred repositories

Python 40 1 Updated Oct 3, 2024
Python 968 125 Updated Oct 3, 2022
Python 342 13 Updated Jul 29, 2024

[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering

Python 119 14 Updated Sep 8, 2024

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 9,752 955 Updated Oct 11, 2024

The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models".

Python 147 9 Updated Sep 19, 2024

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Python 6,504 667 Updated Aug 12, 2024

API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series

Python 740 22 Updated Aug 9, 2024
Python 8,373 491 Updated Oct 9, 2024

Awesome-LLM: a curated list of Large Language Models

18,195 1,464 Updated Oct 7, 2024

Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"

Python 43 1 Updated Aug 23, 2024

A brief introduction to quaternions and their applications in 3D geometry.

HTML 1,721 273 Updated Sep 9, 2021

Mathematical Visual Instruction Tuning for Multi-modal Large Language Models

101 1 Updated Aug 5, 2024

GRUtopia: Dream General Robots in a City at Scale

Python 482 23 Updated Sep 5, 2024

[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model

Python 326 13 Updated Oct 7, 2024
C++ 598 72 Updated Oct 11, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,723 113 Updated Sep 19, 2024

PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"

Python 512 33 Updated Jan 7, 2024

😎 up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources.

66 1 Updated Oct 2, 2024

Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Jupyter Notebook 187 19 Updated Sep 26, 2024
JavaScript 2,426 858 Updated Jun 21, 2024

Kolmogorov Arnold Networks

Jupyter Notebook 14,813 1,356 Updated Sep 15, 2024

OpenEQA: Embodied Question Answering in the Era of Foundation Models

Jupyter Notebook 212 20 Updated Sep 20, 2024

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Python 5,717 370 Updated Mar 14, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python 5,721 447 Updated Sep 19, 2024
Python 281 7 Updated Jan 27, 2024

Code for "Biological Sequence Design with GFlowNets", 2022

Python 70 16 Updated Mar 17, 2023
Python 563 27 Updated Feb 15, 2024

Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world

1,055 73 Updated Oct 8, 2024