Zhao Jiahe ZJHTerry18

THU BEng'22 | UCAS MSc'25

1 follower · 2 following

Stars

shunlinlu / ScaMo_code

Python 49 Updated Dec 20, 2024

luomingshuang / M3GPT

M3GPT: An advanced multimodal, multitask framework for motion comprehension and generation.

Python 12 Updated Dec 12, 2024

ZzZZCHS / Chat-Scene

Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024)

Python 131 9 Updated Jan 19, 2025

JDAI-CV / fast-reid

SOTA Re-identification Methods and Toolbox

Python 3,517 845 Updated Jul 30, 2024

TencentARC / ST-LLM

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"

Python 137 4 Updated Sep 10, 2024

DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Python 2,894 264 Updated Jun 4, 2024

BradyFU / Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

449 18 Updated Dec 14, 2024

ZhangCYG / U-RED

Python 19 3 Updated Jun 28, 2024

ZhangCYG / DDFHO

Python 15 1 Updated May 13, 2024

ZhangCYG / MOHO

Python 17 Updated Oct 8, 2024

msight-tech / research-xbm

XBM: Cross-Batch Memory for Embedding Learning

Python 307 39 Updated Dec 27, 2022

PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,123 223 Updated Dec 3, 2024

RifleZhang / LLaVA-Hound-DPO

Python 136 21 Updated Oct 31, 2024

yaolinli / DeCo

27 Updated Jul 8, 2024

facebookresearch / ego4d-goalstep

Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)

Python 39 Updated Apr 15, 2024

owenzlz / EgoHOS

Fine-Grained Egocentric Hand-Object Segmentation, ECCV 2022

Python 99 12 Updated Feb 26, 2024

Yuliang-Liu / Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Python 1,692 134 Updated Jan 23, 2025

Becomebright / GroundVQA

Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024.

Python 56 2 Updated Sep 13, 2024

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python 6,889 526 Updated Dec 25, 2024

JialianW / GRiT

GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)

Python 311 30 Updated Jan 8, 2024

BolinLai / GLC

[BMVC2022, IJCV2023, Best Student Paper, Spotlight] Official codes for the paper "In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation".

Python 21 3 Updated Aug 23, 2024

FoundationVision / GLEE

[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Python 1,080 86 Updated Oct 21, 2024

Yui010206 / SeViLA

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

Python 184 22 Updated Jan 14, 2024

LLaVA-VL / LLaVA-NeXT

Python 3,316 297 Updated Oct 16, 2024

maitrix-org / Pandora

Pandora: Towards General World Model with Natural Language Actions and Video States

Python 494 35 Updated Sep 23, 2024

Open3DA / LL3DA

[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.

Python 261 10 Updated Jul 17, 2024