HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

92 3 Updated Jan 27, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 15,822 2,083 Updated Feb 1, 2025

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 18,430 1,319 Updated Feb 11, 2025

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 10,051 1,297 Updated Feb 1, 2025

An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.

Python 261 34 Updated Feb 13, 2025
Python 12 Updated Jul 18, 2024

Understanding the Transformer from its underlying mechanisms

26 1 Updated Mar 4, 2022

[ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation"

Python 302 16 Updated Sep 4, 2024

Official PyTorch implementation of "BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework"

Python 782 108 Updated Apr 5, 2023

[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

Python 2,450 443 Updated Jul 31, 2024
Jupyter Notebook 14 2 Updated Apr 23, 2024

SECOND for KITTI/NuScenes object detection

Python 1,732 720 Updated Oct 14, 2022

[ECCV 2024] OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection

Python 63 Updated Sep 26, 2024

The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".

495 20 Updated Mar 21, 2024

[AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference

Python 266 8 Updated Jan 8, 2025

[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.

Python 3,549 571 Updated Aug 15, 2024

[CVPR 2022] PointCLIP: Point Cloud Understanding by CLIP

Python 356 33 Updated Nov 24, 2022

[CVPR 2023] Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection

Python 67 5 Updated May 11, 2023

[AAAI 2024] BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios

Python 55 5 Updated Jan 7, 2025

[ECCV 2024] Embodied Understanding of Driving Scenarios

Python 174 12 Updated Jan 2, 2025

HEDNet (NeurIPS 2023) & SAFDNet (CVPR 2024 Oral)

Python 137 11 Updated Sep 28, 2024

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 27,346 3,437 Updated Jul 23, 2024

🤖 PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+

Python 1,226 323 Updated Sep 7, 2022

🤖 PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+

Python 1 Updated Sep 7, 2022

[ICCV 2021] Official PyTorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning. SOTA results on NUS-WIDE and OpenImages.

Python 60 11 Updated Jan 4, 2022

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 10,236 1,000 Updated Nov 18, 2024

Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer".

Python 124 6 Updated Nov 7, 2024

[ICCV 2023] SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection

Python 208 22 Updated Aug 16, 2024