Code for the paper "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web" (ECCV 2020)

Python 55 12 Updated Oct 7, 2022

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Jupyter Notebook 16,147 1,476 Updated Sep 5, 2024

Official implementation of Layout-aware Dreamer for Embodied Referring Expression Grounding (AAAI'23).

Python 17 1 Updated Apr 13, 2023
Python 33 1 Updated Apr 2, 2024

VMamba: Visual State Space Models; code is based on Mamba

Python 2,544 173 Updated Mar 7, 2025

[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Python 1,301 68 Updated Mar 29, 2025

Visualizer for neural network, deep learning and machine learning models

JavaScript 29,961 2,884 Updated Apr 18, 2025
C++ 10 Updated Mar 5, 2025

A curated list of research papers in Vision-Language Navigation (VLN)

204 32 Updated Apr 17, 2024

Reading list for research topics in embodied vision

598 75 Updated Feb 10, 2025

Fully open reproduction of DeepSeek-R1

Python 24,020 2,196 Updated Apr 18, 2025
Jupyter Notebook 17 3 Updated Oct 19, 2024

[ICCV 2023 Oral]: Scaling Data Generation in Vision-and-Language Navigation

Python 171 5 Updated Oct 8, 2024

Official Code for "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"

Jupyter Notebook 272 33 Updated May 16, 2022

This is the official repository for MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation Learning towards Efficient Vision-and-Language Navigation

Python 10 Updated Jun 6, 2024

[ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Python 159 10 Updated Sep 20, 2024

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch

Python 1,129 88 Updated Dec 12, 2023
Python 33 2 Updated Aug 19, 2023

[CVPR 2024] The code for paper 'Towards Learning a Generalist Model for Embodied Navigation'

Python 177 12 Updated Jun 18, 2024

[CVPR 2025] RoomTour3D - Geometry-aware, cheap and automatic data from web videos for embodied navigation

Python 40 3 Updated Mar 17, 2025

OpenMMLab Detection Toolbox and Benchmark

Python 30,831 9,635 Updated Aug 21, 2024

Repository for Vision-and-Language Navigation via Causal Learning (Accepted by CVPR 2024)

Python 68 8 Updated Dec 4, 2024

[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Python 982 57 Updated Oct 24, 2024

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Python 2,635 243 Updated Mar 25, 2025

This is the official implementation of Deep Orthogonal Hypersphere Compression for Anomaly Detection, ICLR 2024 (Spotlight).

Python 9 Updated Oct 7, 2024

Downloading a dataset from Airbnb

Python 19 11 Updated Oct 23, 2022

Official repository for "Revisiting Weakly Supervised Pre-Training of Visual Perception Models". https://arxiv.org/abs/2201.08371.

Jupyter Notebook 179 9 Updated Apr 17, 2022

[IEEE SPL 2023] CPR-CLIP: Multimodal Pre-training for Composite Error Recognition in CPR Training.

Python 8 Updated Sep 13, 2023

Code and Data for Paper: PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation

Python 76 5 Updated May 31, 2023