This repository is for fine-tuning DreamBooth-Depth2Img, which was originally introduced in stabilityai/stable-diffusion-2-depth.
The implementation builds on several other open-source repositories.
For 512-resolution depth map estimation, we use a pretrained MiDaS model.
$ cd depth/monodepth/weights
$ wget https://github.com/isl-org/MiDaS/releases/download/v3_1/dpt_beit_large_512.pt
The command above saves the pretrained MiDaS model into the appropriate folder.
$ cd depth/monodepth
$ python estimate_depth.py --input_path ${SOURCE_DIR} --output_path ${DEPTH_DIR} --grayscale
The command above estimates depth maps for the images in ${SOURCE_DIR}.
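For reference, MiDaS depth estimation is a single resize-normalize-forward pass. A minimal sketch using the official torch.hub entry points (estimate_depth.py's internals may differ):

import cv2
import torch

# Load MiDaS v3.1 (BEiT-Large 512) and its matching input transform.
model = torch.hub.load("intel-isl/MiDaS", "DPT_BEiT_L_512")
model.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").beit512_transform

img = cv2.cvtColor(cv2.imread("face.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    pred = model(transform(img))
    # MiDaS predicts relative inverse depth; resize back to the input size.
    depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
    ).squeeze().cpu().numpy()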
To add edge information to the depth maps, you can use a Canny edge detector.
$ cd depth/depth/edge
$ python estimate_edge.py --input_path ${SOURCE_DIR} --output_path ${EDGE_DIR}
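Canny edge detection itself is a one-liner in OpenCV. A minimal sketch (the thresholds are illustrative; estimate_edge.py may use different defaults):

import cv2

gray = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
cv2.imwrite("face_edge.png", edges)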
If you are training on facial data, you can additionally incorporate face parsing information into the depth map and generate the final depth image.
$ python edit_depth.py --input_path ${SOURCE_DIR} --depth_path ${DEPTH_DIR} --edge_path ${EDGE_DIR} --output_path ${FINAL_DEPTH_DIR}
The command above combines the original depth and edge images into the final depth map.
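One plausible way to merge the two signals (a hypothetical sketch, not necessarily edit_depth.py's exact logic) is to add a scaled edge term onto the depth map so object boundaries stay sharp:

import cv2
import numpy as np

depth = cv2.imread("face_depth.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
edges = cv2.imread("face_edge.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
alpha = 0.3  # illustrative edge weight
final = np.clip(depth + alpha * edges, 0, 255).astype(np.uint8)
cv2.imwrite("face_final_depth.png", final)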
You can fine-tune the model using the command below. First, prepare the few-shot images and the corresponding instance prompt.
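For example, you might set (illustrative values; "sks" is the rare identifier token commonly used with DreamBooth):
$ export FEWSHOT_IMAGES_DIR=./data/my_subject
$ export FEWSHOT_OBJECT_PROMPT="a photo of sks person"
Then run: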
$ python train_dreambooth.py \
--mixed_precision "fp16" \
--pretrained_model_name_or_path stabilityai/stable-diffusion-2-depth \
--pretrained_txt2img_model_name_or_path "stabilityai/stable-diffusion-2-1-base" \
--train_text_encoder \
--instance_data_dir ${FEWSHOT_IMAGES_DIR} \
--output_dir ${CHECKPOINT_SAVE_DIR} \
--instance_prompt ${FEWSHOT_OBJECT_PROMPT} \
--resolution 512 \
--train_batch_size 4 \
--gradient_accumulation_steps 1 \
--learning_rate 1e-6 \
--lr_scheduler "constant" \
--lr_warmup_steps 0 \
--max_train_steps 500 \
--use_8bit_adam
The command above may overfit and bias the text encoder's embedding space, since it has no prior-preservation term. Use the command below to enable prior preservation.
$ python train_dreambooth.py \
--mixed_precision="fp16" \
--pretrained_model_name_or_path=stabilityai/stable-diffusion-2-depth \
--pretrained_txt2img_model_name_or_path="stabilityai/stable-diffusion-2-1-base" \
--train_text_encoder \
--instance_data_dir ${FEWSHOT_IMAGES_DIR} \
--class_data_dir ${PRIOR_IMAGES_DIR} \
--output_dir ${CHECKPOINT_SAVE_DIR} \
--with_prior_preservation \
--prior_loss_weight 1.0 \
--instance_prompt ${FEWSHOT_OBJECT_PROMPT} \
--class_prompt ${PRIOR_IMAGE_PROMPT} \
--resolution 512 \
--train_batch_size 4 \
--gradient_accumulation_steps 1 \
--learning_rate 1e-6 \
--lr_scheduler "constant" \
--lr_warmup_steps 0 \
--num_class_images 200 \
--max_train_steps 300 \
--use_8bit_adam
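With prior preservation, the script first fills ${PRIOR_IMAGES_DIR} with up to --num_class_images images generated from ${PRIOR_IMAGE_PROMPT} (if not enough are already present), then trains on instance and class batches jointly. Conceptually the objective is
loss = instance_loss + prior_loss_weight * class_loss
so the class term keeps the text encoder and U-Net anchored to the prior while the instance term learns the new subject. (This mirrors the standard diffusers DreamBooth script; the details here may differ.)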
Using the fine-tuned model, you can run inference with the command below.
$ python inference.py \
--ckpt ${CHECKPOINT_DIR} \
--source_dir ${IMAGES_FOR_INFERENCE} \
--save_dir ${IMAGE_SAVE_DIR} \
--seed 0 \
--positive_prompt ${POSITIVE_PROMPT} \
--negative_prompt ${NEGATIVE_PROMPT} \
--output_only
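Under the hood, inference.py wraps a depth-conditioned img2img pipeline. A minimal sketch of the same idea using diffusers directly (the script's actual arguments and internals may differ):

import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

# Load the fine-tuned checkpoint (illustrative path).
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "./checkpoints/dreambooth-depth", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("input.png").convert("RGB")
result = pipe(
    prompt="a photo of sks person",
    negative_prompt="blurry, low quality",
    image=init_image,
    strength=0.8,  # how far to move away from the input image
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
result.save("output.png")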
If you want to transfer FFHQ data, I recommend using a Gaussian-truncated version of generated FFHQ images.
You can download the pretrained model here and convert it into .pt format using this method.
Put the converted .pt file into a location such as ffhq/pretrain_models/stylegan2-ffhq-config-f.pt.
$ cd ffhq
$ python create_ffhq.py --num ${NUM_TO_GENERATE} --model_path ${PRETRAINED_MODEL_PATH} --output_path ${IMAGE_SAVE_DIR}
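Here "Gaussian-truncated" means sampling latents only from the high-density region of the Gaussian prior, so the generated faces stay close to the FFHQ mode. A hypothetical sketch of such truncated sampling (create_ffhq.py's actual implementation may differ):

import torch

def truncated_randn(shape, threshold=1.0):
    # Draw from a standard normal, resampling any component whose
    # magnitude exceeds the threshold (a truncated Gaussian).
    z = torch.randn(shape)
    mask = z.abs() > threshold
    while mask.any():
        z[mask] = torch.randn(int(mask.sum().item()))
        mask = z.abs() > threshold
    return z

z = truncated_randn((4, 512))  # e.g. 512-dim z latents for StyleGAN2-FFHQ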