[ CVPR2024 Highlight ] MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

[ TPAMI2024 ] MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis

[MIGC Paper] [MIGC++ Paper] [Project Page] [ZhiHu(知乎)]

🔥🔥🔥 News

2024-07-03: Iterative editing mode "Consistent-MIG" in MIGC++ is available!
2024-11-24: Our paper "MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis" has been accepted by TPAMI.
2024-12-23: We have released the pretrained weights of MIGC++, which can simultaneously use masks and boxes to specify instance locations.

To Do List

Gallery

Installation

Conda environment setup

conda create -n MIGC_diffusers python=3.9 -y
conda activate MIGC_diffusers
pip install -r requirement.txt
pip install -e .

Checkpoints

Download the MIGC_SD14.ckpt (219M) and put it under the 'pretrained_weights' folder.

├── pretrained_weights
│   ├── MIGC_SD14.ckpt
├── migc
│   ├── ...
├── bench_file
│   ├── ...

If you want to use MIGC++, download MIGC++_SD14.ckpt (191M) and put it under the 'pretrained_weights' folder. Note: at our collaborator's request, I can't release the original weights. The released weights are a re-implementation, trained with a smaller batch size.

├── pretrained_weights
│   ├── MIGC++_SD14.ckpt
├── migc
│   ├── ...
├── bench_file
│   ├── ...
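Before running inference, it can help to confirm that the checkpoint is where the scripts expect it. The following is a minimal sketch, not part of the repository: the helper name find_checkpoint and the CHECKPOINTS mapping are hypothetical, and only the file and folder names come from the layout above.

```python
from pathlib import Path

# File names are taken from the directory trees above.
# The mapping and the helper are illustrative, not part of the MIGC code base.
CHECKPOINTS = {
    "MIGC": "MIGC_SD14.ckpt",
    "MIGC++": "MIGC++_SD14.ckpt",
}


def find_checkpoint(variant: str, root: Path = Path("pretrained_weights")) -> Path:
    """Return the checkpoint path for `variant`, or raise if it is missing."""
    if variant not in CHECKPOINTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(CHECKPOINTS)}")
    path = root / CHECKPOINTS[variant]
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; download it and place it under {root}/")
    return path
```

Calling find_checkpoint("MIGC") from the repository root either returns the path or fails early with a message naming the missing file.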

Single Image Generation

By using the following command, you can quickly generate an image with MIGC.

CUDA_VISIBLE_DEVICES=0 python inference_single_image.py

The following is an example of the generated image based on stable diffusion v1.4.

By using the following command, you can quickly generate an image with MIGC++, where both the box and mask are used to control the instance location.

CUDA_VISIBLE_DEVICES=0 python migc_plus_inference_single_image.py

The following are examples of the generated images using MIGC++.

🚀 Enhanced Attribute Control: For finer control over attributes, try the inferencev2_single_image.py script. This version, InferenceV2, reduces attribute leakage: at the cost of a slight increase in inference time, it raises the Instance Success Rate on the COCO-MIG Benchmark from 66% to 68%. Increasing NaiveFuserSteps in inferencev2_single_image.py can also give stronger attribute control.

💡 Versatile Image Generation: MIGC stands out as a plug-and-play controller, enabling the creation of images with unparalleled variety and quality. By simply swapping out different base generator weights, you can achieve results akin to those showcased in our Gallery. For instance:

🌆 RV60B1: Ideal for those seeking lifelike detail, RV60B1 specializes in generating images with stunning realism.
🎨 Cetus-Mix and Ghost: These robust base models excel in crafting animated content.

[New] 🌈 Iterative Editing Mode: The Consistent-MIG algorithm improves MIGC's iterative multi-instance generation (MIG): it lets you modify selected instances while keeping unmodified regions consistent and maximizing the identity (ID) consistency of the modified instances. See the inference_consistent_mig.py script for usage.

Training

Due to company requirements, we are unable to release the MIGC training code. For now, the best we can do is provide the script we use to process the COCO dataset (i.e., to obtain each instance's box and caption). The relevant code is in the 'data_preparation' folder. If this changes in the future, for example if we are granted permission, we will open-source the training code.
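To give a rough idea of what "obtaining each instance's box and caption" involves, here is a minimal sketch that groups standard COCO annotations by image. It is not the script in 'data_preparation': the function name is hypothetical, the category name is used as a stand-in for the instance caption, and normalizing boxes to [x0, y0, x1, y1] in the 0 to 1 range is an assumption.

```python
from collections import defaultdict


def instances_per_image(coco: dict) -> dict:
    """Group COCO annotations by image id.

    Each instance becomes {"caption": category name, "box": [x0, y0, x1, y1]}
    with coordinates normalized by the image width and height.
    """
    names = {c["id"]: c["name"] for c in coco["categories"]}
    sizes = {im["id"]: (im["width"], im["height"]) for im in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        width, height = sizes[ann["image_id"]]
        x, y, box_w, box_h = ann["bbox"]  # COCO stores boxes as [x, y, width, height] in pixels
        grouped[ann["image_id"]].append({
            "caption": names[ann["category_id"]],  # assumption: category name as caption
            "box": [x / width, y / height, (x + box_w) / width, (y + box_h) / height],
        })
    return dict(grouped)
```

The input is the dictionary obtained by loading a COCO instances annotation file with json.load.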

COCO-MIG Bench

To validate the model's performance in position and attribute control, we designed the COCO-MIG benchmark for evaluation and validation.
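Position control on this benchmark is reported as mIoU. As a reminder of the underlying quantity, the sketch below computes the intersection over union of two axis-aligned boxes and averages it over pairs. It assumes boxes given as (x0, y0, x1, y1); the benchmark's exact protocol (for example, how detected boxes are matched to target boxes, or how failed instances are counted) is not described here and may differ.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(pairs):
    """Average IoU over (target_box, detected_box) pairs; 0.0 for an empty list."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(box_iou(target, detected) for target, detected in pairs) / len(pairs)
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.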

Use the following command to run inference with our method on the COCO-MIG bench:

CUDA_VISIBLE_DEVICES=0 python inference_mig_benchmark.py

We sampled 800 images and compared MIGC with InstanceDiffusion, GLIGEN, and other methods. The results on the COCO-MIG Benchmark are shown below.

ISR = Instance Success Rate. Higher is better (↑) for both metrics.

Method	mIoU↑ L2	mIoU↑ L3	mIoU↑ L4	mIoU↑ L5	mIoU↑ L6	mIoU↑ Avg	ISR↑ L2	ISR↑ L3	ISR↑ L4	ISR↑ L5	ISR↑ L6	ISR↑ Avg	Model Type	Publication
Box-Diffusion	0.37	0.33	0.25	0.23	0.23	0.26	0.28	0.24	0.14	0.12	0.13	0.16	Training-free	ICCV2023
GLIGEN	0.37	0.29	0.253	0.26	0.26	0.27	0.42	0.32	0.27	0.27	0.28	0.30	Adapter	CVPR2023
ReCo	0.55	0.48	0.49	0.47	0.49	0.49	0.63	0.53	0.55	0.52	0.55	0.55	Full model tuning	CVPR2023
InstanceDiffusion	0.52	0.48	0.50	0.42	0.42	0.46	0.58	0.52	0.55	0.47	0.47	0.51	Adapter	CVPR2024
Ours	0.64	0.58	0.57	0.54	0.57	0.56	0.74	0.67	0.67	0.63	0.66	0.66	Adapter	CVPR2024
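To read the table at a glance, the short sketch below computes the absolute gain of our method over each baseline from the two Avg columns. The numbers are copied from the table above; the helper name gains_over is purely illustrative.

```python
# (average mIoU, average Instance Success Rate), copied from the Avg columns above.
avg_scores = {
    "Box-Diffusion": (0.26, 0.16),
    "GLIGEN": (0.27, 0.30),
    "ReCo": (0.49, 0.55),
    "InstanceDiffusion": (0.46, 0.51),
    "Ours": (0.56, 0.66),
}


def gains_over(baseline, ours="Ours"):
    """Absolute gain of `ours` over `baseline` as (mIoU gain, success-rate gain)."""
    return tuple(
        round(our_score - base_score, 2)
        for our_score, base_score in zip(avg_scores[ours], avg_scores[baseline])
    )


# Strongest baseline by average Instance Success Rate.
best_baseline = max(
    (name for name in avg_scores if name != "Ours"),
    key=lambda name: avg_scores[name][1],
)
```

Here best_baseline is "ReCo", and gains_over("ReCo") gives (0.07, 0.11): 0.07 in average mIoU and 0.11 in average Instance Success Rate.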

MIGC-GUI

We have combined MIGC and GLIGEN-GUI to make art creation more convenient for users. 🔔This GUI is still being optimized. If you have any questions or suggestions, please contact me at [email protected].

Start with MIGC-GUI

Step 1: Download the MIGC_SD14.ckpt and place it in pretrained_weights/MIGC_SD14.ckpt. 🚨If you have already completed this step during the Installation phase, feel free to skip it.

Step 2: Download the CLIPTextModel and place it in migc_gui_weights/clip/text_encoder/pytorch_model.bin.

Step 3: Download the CetusMix model and place it in migc_gui_weights/sd/cetusMix_Whalefall2.safetensors. Alternatively, you can visit civitai to download other models of your preference and place them in migc_gui_weights/sd/.

├── pretrained_weights
│   ├── MIGC_SD14.ckpt
├── migc_gui_weights
│   ├── sd
│   │   ├── cetusMix_Whalefall2.safetensors
│   ├── clip
│   │   ├── text_encoder
│   │   │   ├── pytorch_model.bin
├── migc_gui
│   ├── app.py

Step 4: cd migc_gui

Step 5: Launch the application by running python app.py --port=3344. You can now access the MIGC GUI through http://localhost:3344/. Feel free to switch the port as per your convenience.
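If the GUI fails to start, a misplaced weight file is a likely cause. The sketch below is a hypothetical helper, not part of the repository: it creates the weight folders from Steps 1 to 3 if they do not exist and lists the expected files that are still missing. The paths are the ones given in the steps above; if you use a different base model in migc_gui_weights/sd/, adjust the list accordingly.

```python
from pathlib import Path

# Paths from Steps 1-3 above, relative to the repository root.
EXPECTED_FILES = [
    "pretrained_weights/MIGC_SD14.ckpt",
    "migc_gui_weights/clip/text_encoder/pytorch_model.bin",
    "migc_gui_weights/sd/cetusMix_Whalefall2.safetensors",
]


def prepare_gui_dirs(repo_root: Path = Path(".")) -> list:
    """Create the weight folders and return the expected files that are missing."""
    missing = []
    for rel in EXPECTED_FILES:
        target = repo_root / rel
        # Make sure the destination folder exists so the download has a place to go.
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.is_file():
            missing.append(rel)
    return missing
```

Running prepare_gui_dirs() from the repository root before Step 4 returns an empty list once all three files are in place.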

Consistent-MIG in MIGC-GUI

Tick the EditMode checkbox in the IMAGE DIMENSIONS area and try it!

MIGC + LoRA

MIGC retains its powerful attribute-and-position control when combined with LoRA. 🚀 We will integrate this function into MIGC-GUI in the future, so stay tuned! 🌟👀

Ethical Considerations

The broad range of images that MIGC can create may raise ethical concerns comparable to those of many other text-to-image generation methods.

🏫About us

Thank you for your interest in this project. The project is supervised by the ReLER Lab at Zhejiang University's College of Computer Science and Technology and HUAWEI. ReLER was established by Yi Yang, a Qiu Shi Distinguished Professor at Zhejiang University. Our team of contributors includes Dewei Zhou, You Li, Ji Xie, Fan Ma, Zongxin Yang, and Yi Yang.

Contact us

If you have any questions, feel free to contact me via email [email protected]

Acknowledgements

Our work is based on stable diffusion, diffusers, CLIP, and GLIGEN-GUI. We appreciate their outstanding contributions.

Citation

If you find this repository useful, please use the following BibTeX entry for citation.

@inproceedings{zhou2024migc,
  title={Migc: Multi-instance generation controller for text-to-image synthesis},
  author={Zhou, Dewei and Li, You and Ma, Fan and Zhang, Xiaoting and Yang, Yi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={6818--6828},
  year={2024}
}

@article{zhou2024migc++,
  title={Migc++: Advanced multi-instance generation controller for image synthesis},
  author={Zhou, Dewei and Li, You and Ma, Fan and Yang, Zongxin and Yang, Yi},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  publisher={IEEE}
}
