Stars
Refine high-quality datasets and visual AI models
[CVPR'24] Interactive3D: Create What You Want by Interactive 3D Generation
CustomDiffusion360: Customizing Text-to-Image Diffusion with Camera Viewpoint Control
[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
This repository provides the code and model checkpoints for the AIMv1 and AIMv2 research projects.
Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
[ECCV 2024] Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
Implementation of MeshGPT, SOTA mesh generation using attention, in PyTorch
[CVPR'24] Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers
Single Image to 3D using Cross-Domain Diffusion for 3D Generation
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
🪐 Objaverse-XL is a Universe of 10M+ 3D Objects. Contains API Scripts for Downloading and Processing!
[ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.
Official implementation of "Ghost on the Shell: An Expressive Representation of General 3D Shapes" (ICLR 2024 Oral)
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
A collaboration friendly studio for NeRFs
NeRFshop: Interactive Editing of Neural Radiance Fields
A custom extension for sd-webui that adds 3D modeling features (add/edit basic elements, load your custom model, modify the scene, and so on), then sends a screenshot to txt2img or img2img as your ControlN…
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Kandinsky 2 — multilingual text2image latent diffusion model