We propose an open-source multi-view latent diffusion model (MV-LDM) trained on the RealEstate10K dataset. The architecture follows a structure similar to CAT3D.
This is an extension of the codebase for MET3R: Measuring Multi-View Consistency in Generated Images by Mohammad Asim, Christopher Wewer, Thomas Wimmer, Bernt Schiele, and Jan Eric Lenssen. Check out the project website here.
To get started, create a conda environment and install the required packages:
conda create -n diffsplat python=3.10
conda activate diffsplat
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
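As an optional sanity check, you can confirm that PyTorch was installed with CUDA support:
# Optional: should print the installed PyTorch version and True
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"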
Please move all dataset directories into a newly created datasets folder in the project root, or modify the root paths in the dataset config files under config/dataset.
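For example, assuming the preprocessed RealEstate10K chunks live at /path/to/re10k (a placeholder path; the subdirectory name re10k is also an assumption and should match the root set in config/dataset), the layout could be prepared like this:
# Illustrative only: create the datasets folder and link the preprocessed data into it
mkdir -p datasets
ln -s /path/to/re10k datasets/re10k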
For experiments on RealEstate10K, we use the same dataset version and preprocessing into chunks as pixelSplat. Please refer to their codebase here for information on how to obtain the data.
A trained checkpoint of MV-LDM for RealEstate10K is available on Hugging Face at asimbluemoon/mvldm-1.0.
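One way to fetch it, assuming the huggingface_hub CLI is installed (the target directory below is just an example), is:
# Download the MV-LDM checkpoint from Hugging Face
huggingface-cli download asimbluemoon/mvldm-1.0 --local-dir checkpoints/mvldm-1.0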
The main entry point is src/scripts/generate_mvldm.py. Call it via:
Important
Sampling requires a GPU with at least 16 GB of VRAM.
python -m src.scripts.generate_mvldm +experiment=baseline mode=test \
  dataset.root="<root-path-to-re10k-dataset>" \
  scene_id="<scene-id>" \
  checkpointing.load="<path-to-checkpoint>" \
  dataset/view_sampler=evaluation \
  dataset.view_sampler.index_path=assets/evaluation_index/re10k_video.json \
  test.sampling_mode=anchored \
  test.num_anchors_views=4 \
  test.output_dir=./outputs/mvldm
Note
scene_id="<scene-id>" either specifies an integer index referring to a scene in the order given by assets/evaluation_index/re10k_video.json, or a sequence ID passed as a string, e.g. "2d3f982ada31489c":
scene_id=25
scene_id="2d3f982ada31489c"
To limit the number of frames in a given sequence, append the test.limit_frames argument (an integer) to the above command, e.g.:
test.limit_frames=80
To set the number of DDIM sampling steps, use the model.scheduler.num_inference_steps argument, e.g.:
model.scheduler.num_inference_steps=25
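Putting these options together, a single-scene sampling call could look like the sketch below; the angle-bracket paths are placeholders, and the concrete values for scene_id, test.limit_frames, and model.scheduler.num_inference_steps are only examples:
python -m src.scripts.generate_mvldm +experiment=baseline mode=test \
  dataset.root="<root-path-to-re10k-dataset>" \
  checkpointing.load="<path-to-checkpoint>" \
  dataset/view_sampler=evaluation \
  dataset.view_sampler.index_path=assets/evaluation_index/re10k_video.json \
  test.sampling_mode=anchored test.num_anchors_views=4 \
  test.output_dir=./outputs/mvldm \
  scene_id=25 test.limit_frames=80 model.scheduler.num_inference_steps=25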
Our code supports multi-GPU training; note that the batch size set via data_loader.train.batch_size is the per-GPU batch size.
Important
Training requires a GPU with at least 40 GB of VRAM.
python -m src.main +experiment=baseline \
  mode=train \
  dataset.root="<root-path-to-re10k-dataset>" \
  hydra.run.dir="<runtime-dir>" \
  hydra.job.name=train
Warning
In case of memory issues during training, we recommend lowering the batch size by appending data_loader.train.batch_size="<batch-size>" to the above command.
When running training as a job chain on SLURM, or when resuming training, always set the correct path in hydra.run.dir="<runtime-dir>" for each task, as in the sketch below.
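A minimal SLURM job script for this could look as follows; the #SBATCH directives and paths are placeholders to adapt to your cluster, and the batch-size override mentioned in the comment is only illustrative:
#!/bin/bash
#SBATCH --job-name=mvldm-train
#SBATCH --gres=gpu:1
#SBATCH --time=24:00:00

# Reuse the same runtime directory in every chained job so training resumes from its latest state.
python -m src.main +experiment=baseline \
  mode=train \
  dataset.root="<root-path-to-re10k-dataset>" \
  hydra.run.dir="<runtime-dir>" \
  hydra.job.name=train
# Optionally append data_loader.train.batch_size=4 (example value) to lower memory usage.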
If you are planning to use MV-LDM in your work, please consider citing it as follows:
@misc{asim24met3r,
    title = {MET3R: Measuring Multi-View Consistency in Generated Images},
    author = {Asim, Mohammad and Wewer, Christopher and Wimmer, Thomas and Schiele, Bernt and Lenssen, Jan Eric},
    booktitle = {arXiv},
    year = {2024},
}