
Dreambooth-Depth2Img

This repository provides code for fine-tuning Dreambooth-Depth2Img, a depth-conditioned Dreambooth model built on stabilityai/stable-diffusion-2-depth.

This implementation builds on the following repositories:

  1. epitaque/dreambooth_depth2img
  2. isl-org/MiDaS
  3. zllrunning/face-parsing.PyTorch

Depthmap Generation

Monocular Depth Estimation

For 512-resolution depth map estimation, we use a pretrained MiDaS model.

$ cd depth/monodepth/weights
$ wget https://github.com/isl-org/MiDaS/releases/download/v3_1/dpt_beit_large_512.pt

The commands above download the pretrained MiDaS weights into the expected folder.

$ cd depth/monodepth
$ python estimate_depth.py --input_path ${SOURCE_DIR} --output_path ${DEPTH_DIR} --grayscale

The command above estimates depth maps for the images in ${SOURCE_DIR} and saves them to ${DEPTH_DIR}.
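
For reference, the same 512 model can also be run standalone through torch.hub. Below is a minimal sketch; the hub entry point DPT_BEiT_L_512 and the beit512_transform follow the MiDaS v3.1 release, so verify them against your installed version:

import cv2
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the dpt_beit_large_512 checkpoint and its matching preprocessing.
midas = torch.hub.load("intel-isl/MiDaS", "DPT_BEiT_L_512").to(device).eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.beit512_transform

img = cv2.cvtColor(cv2.imread("input.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    pred = midas(transform(img).to(device))
    # MiDaS predicts inverse relative depth; resize back to input resolution.
    pred = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
    ).squeeze()

# Normalize to 8-bit grayscale, as with the --grayscale flag above.
depth = pred.cpu().numpy()
depth = (255 * (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)).astype("uint8")
cv2.imwrite("depth.png", depth)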

Edge Estimation

To add edge information to the depth maps, we use a Canny edge detector.

$ cd depth/edge
$ python estimate_edge.py --input_path ${SOURCE_DIR} --output_path ${EDGE_DIR}
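
The edge extraction itself amounts to OpenCV's Canny operator on a denoised grayscale image; a minimal sketch follows (the blur kernel and hysteresis thresholds are illustrative, not the script's actual values):

import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
# Blur first to suppress noise, then detect edges with hysteresis thresholds.
blurred = cv2.GaussianBlur(img, (5, 5), 0)
edges = cv2.Canny(blurred, 100, 200)
cv2.imwrite("edge.png", edges)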

(Optional) Face Parsing

If you are training on facial data, you can add face parsing information to the depth map and generate the final depth image.

$ python edit_depth.py --input_path ${SOURCE_DIR} --depth_path ${DEPTH_DIR} --edge_path ${EDGE_DIR} --output_path ${FINAL_DEPTH_DIR}

The command above combines the original depth and edge images into the final depth map.
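
The exact blending in edit_depth.py is not documented here; the sketch below shows one plausible composition, overlaying edges onto the depth map only inside the face region given by a parsing mask (the file names and the 0.3 weight are hypothetical):

import cv2
import numpy as np

depth = cv2.imread("depth.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
edge = cv2.imread("edge.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
face = cv2.imread("face_mask.png", cv2.IMREAD_GRAYSCALE) > 0  # binary parsing mask

# Emphasize facial structure by adding edge intensity inside the face region.
final = depth.copy()
final[face] = np.clip(depth[face] + 0.3 * edge[face], 0, 255)
cv2.imwrite("final_depth.png", final.astype(np.uint8))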

Fine Tuning

You can fine-tune the model with the command below. First, prepare the few-shot images and a corresponding prompt.

$ python train_dreambooth.py  \
--mixed_precision "fp16" \
--pretrained_model_name_or_path stabilityai/stable-diffusion-2-depth  \
--pretrained_txt2img_model_name_or_path "stabilityai/stable-diffusion-2-1-base" \
--train_text_encoder \
--instance_data_dir ${FEWSHOT_IMAGES_DIR} \
--output_dir ${CHECKPOINT_SAVE_DIR} \
--instance_prompt ${FEWSHOT_OBJECT_PROMPT} \
--resolution 512 \
--train_batch_size 4 \
--gradient_accumulation_steps 1 \
--learning_rate 1e-6 \
--lr_scheduler "constant" \
--lr_warmup_steps 0 \
--max_train_steps 500 \
--use_8bit_adam

The command above tends to produce an overfitted, biased text encoder embedding space, since it has no prior preservation term. Use the command below to train with prior preservation.

$ python train_dreambooth.py  \
--mixed_precision "fp16" \
--pretrained_model_name_or_path stabilityai/stable-diffusion-2-depth  \
--pretrained_txt2img_model_name_or_path "stabilityai/stable-diffusion-2-1-base" \
--train_text_encoder  \
--instance_data_dir ${FEWSHOT_IMAGES_DIR} \
--class_data_dir ${PRIOR_IMAGES_DIR} \
--output_dir ${CHECKPOINT_SAVE_DIR} \
--with_prior_preservation \
--prior_loss_weight 1.0 \
--instance_prompt ${FEWSHOT_OBJECT_PROMPT} \
--class_prompt ${PRIOR_IMAGE_PROMPT} \
--resolution 512 \
--train_batch_size 4 \
--gradient_accumulation_steps 1 \
--learning_rate 1e-6 \
--lr_scheduler "constant" \
--lr_warmup_steps 0 \
--num_class_images 200 \
--max_train_steps 300 \
--use_8bit_adam
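
For reference, --with_prior_preservation adds a class-image reconstruction term to the instance loss. Here is a sketch of the objective, mirroring the standard diffusers Dreambooth training loop (variable names are illustrative):

import torch
import torch.nn.functional as F

def dreambooth_loss(model_pred, target, prior_loss_weight=1.0):
    # Instance and class (prior) examples are stacked along the batch dimension.
    pred_inst, pred_prior = torch.chunk(model_pred, 2, dim=0)
    tgt_inst, tgt_prior = torch.chunk(target, 2, dim=0)
    instance_loss = F.mse_loss(pred_inst.float(), tgt_inst.float())
    # The prior term keeps the model close to what it generates for the
    # generic class prompt, weighted by --prior_loss_weight.
    prior_loss = F.mse_loss(pred_prior.float(), tgt_prior.float())
    return instance_loss + prior_loss_weight * prior_loss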

Inference

Using the fine-tuned model, you can run inference with the command below.

$ python inference.py  \
--ckpt ${CHECKPOINT_DIR} \
--source_dir ${IMAGES_FOR_INFERENCE} \
--save_dir ${IMAGE_SAVE_DIR} \
--seed 0 \
--positive_prompt ${POSITIVE_PROMPT} \
--negative_prompt ${NEGATIVE_PROMPT} \
--output_only
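
If the checkpoint is saved in diffusers format, inference can also be run directly with diffusers' StableDiffusionDepth2ImgPipeline. A minimal sketch follows; inference.py may differ in details such as depth-map handling, and the prompts below are placeholders:

import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "path/to/checkpoint", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("source.png").convert("RGB")
result = pipe(
    prompt="a photo of sks person",         # illustrative positive prompt
    negative_prompt="blurry, low quality",  # illustrative negative prompt
    image=init_image,
    strength=0.8,
).images[0]
result.save("result.png")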

(Optional) Create Truncated FFHQ

If you want to transfer FFHQ data, we recommend using a Gaussian-truncated version of generated FFHQ images. You can download the pretrained model here and convert it into .pt format using this method. Place the converted .pt file at ffhq/pretrain_models/stylegan2-ffhq-config-f.pt.

$ cd ffhq
$ python create_ffhq.py --num ${NUM_TO_GENERATE} --model_path ${PRETRAINED_MODEL_PATH} --output_path ${IMAGE_SAVE_DIR}
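
Whether create_ffhq.py truncates the z latents directly or applies StyleGAN2's w-space truncation trick is not shown here; the sketch below illustrates truncated-Gaussian latent sampling as one common approach (the function name, dimensions, and defaults are all assumptions):

import numpy as np
import torch
from scipy.stats import truncnorm

def truncated_z(batch_size, z_dim=512, threshold=2.0, seed=0):
    # Draw z from a standard normal clipped to [-threshold, threshold].
    rng = np.random.RandomState(seed)
    z = truncnorm.rvs(-threshold, threshold, size=(batch_size, z_dim), random_state=rng)
    return torch.from_numpy(z).float()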
