Curated by Xianzheng Ma and Yash Bhalgat
🔥 Here is a curated list of papers about 3D-related tasks empowered by Large Language Models (LLMs). It covers a range of tasks, including 3D understanding, reasoning, generation, and embodied agents.
## 3D Reasoning

ID | keywords | Institute (first) | Paper | Publication | Others |
---|---|---|---|---|---|
1 | 3D-CLR | UCLA | 3D Concept Learning and Reasoning from Multi-View Images | CVPR'2023 | github |
2 | Transcribe3D | TTI, Chicago | Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning | CoRL'2023 | github |
## 3D Generation

ID | keywords | Institute | Paper | Publication | Others |
---|---|---|---|---|---|
1 | 3D-GPT | ANU | 3D-GPT: Procedural 3D Modeling with Large Language Models | Arxiv | github |
2 | MeshGPT | TUM | MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers | Arxiv | project |
3 | ShapeGPT | Fudan University | ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model | Arxiv | github |
4 | DreamLLM | MEGVII & Tsinghua | DreamLLM: Synergistic Multimodal Comprehension and Creation | Arxiv | github |
5 | LLMR | MIT, RPI & Microsoft | LLMR: Real-time Prompting of Interactive Worlds using Large Language Models | Arxiv | github |
6 | ChatAvatar | Deemos Tech | DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance | ACM TOG | website |
## 3D Embodied Agent

ID | keywords | Institute | Paper | Publication | Others |
---|---|---|---|---|---|
1 | RT-1 | Google | RT-1: Robotics Transformer for Real-World Control at Scale | Arxiv | github |
2 | RT-2 | Google-DeepMind | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Arxiv | github |
3 | SayPlan | QUT Centre for Robotics | SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning | CoRL'2023 | github |
4 | UniHSI | Shanghai AI Lab | Unified Human-Scene Interaction via Prompted Chain-of-Contacts | Arxiv | github |
5 | LLM-Planner | The Ohio State University | LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | ICCV'2023 | github |
6 | STEVE | ZJU & UW | See and Think: Embodied Agent in Virtual Environment | Arxiv | github |
7 | SceneDiffuser | BIGAI | Diffusion-based Generation, Optimization, and Planning in 3D Scenes | Arxiv | github |
8 | LEO | BIGAI | An Embodied Generalist Agent in 3D World | Arxiv | github |
## 3D Benchmarks

ID | keywords | Institute | Paper | Publication | Others |
---|---|---|---|---|---|
1 | ScanQA | RIKEN AIP | ScanQA: 3D Question Answering for Spatial Scene Understanding | CVPR'2022 | github |
2 | ScanRefer | TUM | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language | ECCV'2020 | github |
3 | Scan2Cap | TUM | Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | CVPR'2021 | github |
4 | SQA3D | BIGAI | SQA3D: Situated Question Answering in 3D Scenes | ICLR'2023 | github |
5 | - | DeepMind & UCL | Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects | Arxiv | github |
This is an active repository and your contributions are always welcome!
I will keep some pull requests open if I'm not sure whether they are a good fit for this list; you can vote for them by adding 👍 to them.
If you have any questions about this opinionated list, please get in touch at [email protected].
This repo is inspired by Awesome-LLM.