Stars
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
A web client for ScreenAgent: Let Large Models Control Your Desktop
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
No fortress, purely open ground. OpenManus is Coming.
A curated list of recent diffusion models for video generation, editing, and various other applications.
PyTorch implementation of Pointnet2/Pointnet++
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
[ECCV2024] UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
3D Occupancy Prediction Benchmark in Autonomous Driving
Implementation of Deformable Attention in Pytorch from the paper "Vision Transformer with Deformable Attention"
Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (ECCV 2020)
CVPR 2023: Official code for "Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting"
[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
An open-source overseas graduate application information-sharing platform for ShanghaiTech University
Vector (and Scalar) Quantization, in Pytorch
A collection of papers on World Models for Autonomous Driving (and Robotics).
[CVPR 2024 Oral, Best Paper Award Candidate] Official repository of "PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness"
⏰ AI conference deadline countdowns
A list of papers and datasets about point cloud analysis (processing) since 2017. Updated daily!
awesome-autonomous-driving
[Information Fusion 2025] A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective
An easy calibration toolbox for VECtor Benchmark