
- University of Oxford
- Oxford, United Kingdom
Stars
[CVPR 2025] VGGT: Visual Geometry Grounded Transformer
[CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
[ICLR 2025 Spotlight] Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation
Official implementation of "ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis"
🐍 Geometric Computer Vision Library for Spatial AI
Official PyTorch Implementation for "SceneScape: Text-Driven Consistent Scene Generation"
[ACCV 2024] Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
The Arcade Learning Environment (ALE) -- a platform for AI research.
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
MambaOut: Do We Really Need Mamba for Vision? (CVPR 2025)
Official Implementation of Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors
[3DV 2025] 🐱🐶🐲🐮🐷 Official Implementation of DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
A pure PyTorch implementation of 3D Gaussian Splatting
[CVPR 2024 (Highlight)] RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D. Live Demo: https://modelscope.cn/studios/Damo_XR_Lab/3D_AIGC
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Rembg is a tool to remove image backgrounds