Python 49 Updated Dec 20, 2024

M3GPT: An advanced multimodal, multitask framework for motion comprehension and generation.

Python 12 Updated Dec 12, 2024

Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024)

Python 131 9 Updated Jan 19, 2025

SOTA Re-identification Methods and Toolbox

Python 3,517 845 Updated Jul 30, 2024

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"

Python 137 4 Updated Sep 10, 2024

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Python 2,894 264 Updated Jun 4, 2024

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

449 18 Updated Dec 14, 2024
Python 19 3 Updated Jun 28, 2024
Python 15 1 Updated May 13, 2024
Python 17 Updated Oct 8, 2024

XBM: Cross-Batch Memory for Embedding Learning

Python 307 39 Updated Dec 27, 2022

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,123 223 Updated Dec 3, 2024
Python 136 21 Updated Oct 31, 2024
27 Updated Jul 8, 2024

Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)

Python 39 Updated Apr 15, 2024

Fine-Grained Egocentric Hand-Object Segmentation, ECCV 2022

Python 99 12 Updated Feb 26, 2024

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Python 1,692 134 Updated Jan 23, 2025

Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024.

Python 56 2 Updated Sep 13, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance

Python 6,889 526 Updated Dec 25, 2024

GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)

Python 311 30 Updated Jan 8, 2024

[BMVC2022, IJCV2023, Best Student Paper, Spotlight] Official codes for the paper "In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation".

Python 21 3 Updated Aug 23, 2024

[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Python 1,080 86 Updated Oct 21, 2024

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

Python 184 22 Updated Jan 14, 2024
Python 3,316 297 Updated Oct 16, 2024

Pandora: Towards General World Model with Natural Language Actions and Video States

Python 494 35 Updated Sep 23, 2024

[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.

Python 261 10 Updated Jul 17, 2024

This is the official code for MIME: Human-Aware 3D Scene Generation (CVPR2023)

Python 94 8 Updated Jun 22, 2023

Repository of TRUMANS

Python 139 7 Updated Jan 19, 2025

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

HTML 415 23 Updated Jan 18, 2025

Resolving 3D Human Pose Ambiguities with 3D Scene Constraints https://prox.is.tue.mpg.de

Python 221 18 Updated Jul 13, 2021