Stars
Code for the paper "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web" (ECCV 2020)
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
Official implementation of Layout-aware Dreamer for Embodied Referring Expression Grounding (AAAI'23).
VMamba: Visual State Space Models; code is based on Mamba
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Visualizer for neural network, deep learning and machine learning models
A curated list of research papers in Vision-Language Navigation (VLN)
Reading list for research topics in embodied vision
Fully open reproduction of DeepSeek-R1
[ICCV 2023 Oral]: Scaling Data Generation in Vision-and-Language Navigation
Official Code for "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"
This is the official repository for MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation Learning towards Efficient Vision-and-Language Navigation
[ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
[CVPR 2024] The code for the paper "Towards Learning a Generalist Model for Embodied Navigation"
[CVPR 2025] RoomTour3D - Geometry-aware, cheap, and automatic data from web videos for embodied navigation
OpenMMLab Detection Toolbox and Benchmark
Repository for Vision-and-Language Navigation via Causal Learning (Accepted by CVPR 2024)
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
This is the official implementation of Deep Orthogonal Hypersphere Compression for Anomaly Detection, ICLR 2024 (Spotlight).
Official repository for "Revisiting Weakly Supervised Pre-Training of Visual Perception Models" (https://arxiv.org/abs/2201.08371)
[IEEE SPL 2023] CPR-CLIP: Multimodal Pre-training for Composite Error Recognition in CPR Training.
Code and Data for Paper: PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation