Commit 4d7b6e8: add colorization and inpainting.

sczhou committed Apr 9, 2023
1 parent 50489a6
Showing 24 changed files with 258 additions and 41 deletions.
51 changes: 32 additions & 19 deletions README.md
@@ -22,22 +22,17 @@ S-Lab, Nanyang Technological University

**[<font color=#d1585d>News</font>]**: :whale: *We regret to inform you that the release of our code will be postponed beyond its original schedule. Nevertheless, we assure you that it will be made available **by the end of this April**. Thank you for your understanding and patience. Our apologies for any inconvenience this may cause.*
### Update
- **2023.04.09**: Add features of inpainting and colorization for cropped and aligned face images.
- **2023.02.10**: Include `dlib` as a new face detector option; it produces more accurate face identity.
- **2022.10.05**: Support video input `--input_path [YOUR_VIDOE.mp4]`. Try it to enhance your videos! :clapper:
- **2022.10.05**: Support video input `--input_path [YOUR_VIDEO.mp4]`. Try it to enhance your videos! :clapper:
- **2022.09.14**: Integrated to :hugs: [Hugging Face](https://huggingface.co/spaces). Try out online demo! [![Hugging Face](https://img.shields.io/badge/Demo-%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/sczhou/CodeFormer)
- **2022.09.09**: Integrated to :rocket: [Replicate](https://replicate.com/explore). Try out online demo! [![Replicate](https://img.shields.io/badge/Demo-%F0%9F%9A%80%20Replicate-blue)](https://replicate.com/sczhou/codeformer)
- **2022.09.04**: Add face upsampling `--face_upsample` for high-resolution AI-created face enhancement.
- **2022.08.23**: Some modifications on face detection and fusion for better AI-created face enhancement.
- **2022.08.07**: Integrate [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) to support background image enhancement.
- **2022.07.29**: Integrate new face detectors of `['RetinaFace'(default), 'YOLOv5']`.
- **2022.07.17**: Add Colab demo of CodeFormer. <a href="https://colab.research.google.com/drive/1m52PNveE4PBhYrecj34cnpEeiHcC5LTb?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="google colab logo"></a>
- **2022.07.16**: Release inference code for face restoration. :blush:
- **2022.06.21**: This repo is created.
- [**More**](docs/history_changelog.md)

### TODO
- [ ] Add checkpoint for face inpainting
- [ ] Add checkpoint for face colorization
- [ ] Add training code and config files
- [x] Add checkpoint and script for face inpainting
- [x] Add checkpoint and script for face colorization
- [x] ~~Add background image enhancement~~

#### :panda_face: Try Enhancing Old Photos / Fixing AI-arts
@@ -75,34 +70,41 @@ conda activate codeformer
# install python dependencies
pip3 install -r requirements.txt
python basicsr/setup.py develop
conda install -c conda-forge dlib (only for dlib face detector)
conda install -c conda-forge dlib (only for face detection or cropping with dlib)
```
<!-- conda install -c conda-forge dlib -->

### Quick Inference

#### Download Pre-trained Models:
Download the facelib and dlib pretrained models from [[Google Drive](https://drive.google.com/drive/folders/1b_3qwrzY_kTQh0-SnBoGBgOrJ_PLZSKm?usp=sharing) | [OneDrive](https://entuedu-my.sharepoint.com/:f:/g/personal/s200094_e_ntu_edu_sg/EvDxR7FcAbZMp_MA9ouq7aQB8XTppMb3-T0uGZ_2anI2mg?e=DXsJFo)] to the `weights/facelib` folder. You can manually download the pretrained models OR download by running the following command.
Download the facelib and dlib pretrained models from [[Releases](https://github.com/sczhou/CodeFormer/releases) | [Google Drive](https://drive.google.com/drive/folders/1b_3qwrzY_kTQh0-SnBoGBgOrJ_PLZSKm?usp=sharing) | [OneDrive](https://entuedu-my.sharepoint.com/:f:/g/personal/s200094_e_ntu_edu_sg/EvDxR7FcAbZMp_MA9ouq7aQB8XTppMb3-T0uGZ_2anI2mg?e=DXsJFo)] to the `weights/facelib` folder. You can manually download the pretrained models OR download by running the following command:
```
python scripts/download_pretrained_models.py facelib
python scripts/download_pretrained_models.py dlib (only for dlib face detector)
```

Download the CodeFormer pretrained models from [[Google Drive](https://drive.google.com/drive/folders/1CNNByjHDFt0b95q54yMVp6Ifo5iuU6QS?usp=sharing) | [OneDrive](https://entuedu-my.sharepoint.com/:f:/g/personal/s200094_e_ntu_edu_sg/EoKFj4wo8cdIn2-TY2IV6CYBhZ0pIG4kUOeHdPR_A5nlbg?e=AO8UN9)] to the `weights/CodeFormer` folder. You can manually download the pretrained models OR download by running the following command.
Download the CodeFormer pretrained models from [[Releases](https://github.com/sczhou/CodeFormer/releases) | [Google Drive](https://drive.google.com/drive/folders/1CNNByjHDFt0b95q54yMVp6Ifo5iuU6QS?usp=sharing) | [OneDrive](https://entuedu-my.sharepoint.com/:f:/g/personal/s200094_e_ntu_edu_sg/EoKFj4wo8cdIn2-TY2IV6CYBhZ0pIG4kUOeHdPR_A5nlbg?e=AO8UN9)] to the `weights/CodeFormer` folder. You can manually download the pretrained models OR download by running the following command:
```
python scripts/download_pretrained_models.py CodeFormer
```

#### Prepare Testing Data:
You can put the testing images in the `inputs/TestWhole` folder. If you would like to test on cropped and aligned faces, you can put them in the `inputs/cropped_faces` folder.
You can put the testing images in the `inputs/TestWhole` folder. If you would like to test on cropped and aligned faces, you can put them in the `inputs/cropped_faces` folder. You can get the cropped and aligned faces by running the following command:
```
# you may need to install dlib via: conda install -c conda-forge dlib
python scripts/crop_align_face.py -i [input folder] -o [output folder]
```


#### Testing on Face Restoration:
#### Testing:
[Note] If you want to compare with CodeFormer in your paper, please run the following command with `--has_aligned` (for cropped and aligned faces): the whole-image command involves a face-background fusion step that may damage the hair texture at the boundary, which would lead to an unfair comparison.

Fidelity weight *w* lies in [0, 1]. Generally, smaller *w* tends to produce a higher-quality result, while larger *w* yields a higher-fidelity result. The results will be saved in the `results` folder.


🧑🏻 Face Restoration (cropped and aligned face)
```
# For cropped and aligned faces
# For cropped and aligned faces (512x512)
python inference_codeformer.py -w 0.5 --has_aligned --input_path [image folder]|[image path]
```
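
If you are unsure which fidelity weight suits your images, one practical check is to run the restoration script several times with different `-w` values and compare the outputs. The sketch below is not part of this commit; it assumes the repo's example `inputs/cropped_faces` folder and that `inference_codeformer.py` accepts a `--suffix` flag like the new colorization and inpainting scripts, so each run's results stay distinguishable.
```
# Sketch (Python): sweep the fidelity weight w and tag each run's outputs with a suffix.
# The input folder and weight values are illustrative assumptions.
import subprocess

input_path = "inputs/cropped_faces"
for w in (0.3, 0.5, 0.7, 1.0):
    subprocess.run(
        ["python", "inference_codeformer.py",
         "-w", str(w),
         "--has_aligned",        # inputs are already cropped and aligned
         "--suffix", f"w{w}",    # keep runs from overwriting each other
         "--input_path", input_path],
        check=True,
    )
```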

@@ -121,14 +123,25 @@ conda install -c conda-forge ffmpeg
```
```
# For video clips
# video path should end with '.mp4'|'.mov'|'.avi'
# Video path should end with '.mp4'|'.mov'|'.avi'
python inference_codeformer.py --bg_upsampler realesrgan --face_upsample -w 1.0 --input_path [video path]
```

🌈 Face Colorization (cropped and aligned face)
```
# For cropped and aligned faces (512x512)
# Colorize black-and-white or faded photos
python inference_colorization.py --input_path [image folder]|[image path]
```

Fidelity weight *w* lies in [0, 1]. Generally, smaller *w* tends to produce a higher-quality result, while larger *w* yields a higher-fidelity result.
🎨 Face Inpainting (cropped and aligned face)
```
# For cropped and aligned faces (512x512)
# Inputs could be masked by white brush using an image editing app, e.g., Photoshop
# (check out the examples in inputs/masked_faces)
python inference_inpainting.py --input_path [image folder]|[image path]
```
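
If you prefer to prepare masked inputs programmatically rather than painting them in an image editor, a minimal sketch (not part of this commit, assuming OpenCV and illustrative file names) is shown below. The inpainting script treats pure-white pixels as the region to fill, so the sketch simply paints a filled white rectangle onto an aligned 512x512 face.
```
# Sketch (Python): mark a region for inpainting by painting it pure white (255, 255, 255),
# matching the white-brush convention used by inference_inpainting.py.
import cv2

img = cv2.imread("face_512.png")        # assumed 512x512 cropped and aligned face
assert img is not None and img.shape[:2] == (512, 512)

x0, y0, x1, y1 = 160, 320, 352, 420     # arbitrary example region to remove
cv2.rectangle(img, (x0, y0), (x1, y1), color=(255, 255, 255), thickness=-1)

cv2.imwrite("inputs/masked_faces/face_512_masked.png", img)
# then: python inference_inpainting.py --input_path inputs/masked_faces
```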

The results will be saved in the `results` folder.

### Citation
If our work is useful for your research, please consider citing:
8 changes: 4 additions & 4 deletions basicsr/utils/logger.py
@@ -67,10 +67,10 @@ def __call__(self, log_vars):
message += f'{k}: {v:.4e} '
# tensorboard logger
if self.use_tb_logger:
if k.startswith('l_'):
self.tb_logger.add_scalar(f'losses/{k}', v, current_iter)
else:
self.tb_logger.add_scalar(k, v, current_iter)
# if k.startswith('l_'):
# self.tb_logger.add_scalar(f'losses/{k}', v, current_iter)
# else:
self.tb_logger.add_scalar(k, v, current_iter)
self.logger.info(message)


14 changes: 14 additions & 0 deletions docs/history_changelog.md
@@ -0,0 +1,14 @@
# History of Changelog

- **2023.04.09**: Add features of inpainting and colorization for cropped face images.
- **2023.02.10**: Include `dlib` as a new face detector option; it produces more accurate face identity.
- **2022.10.05**: Support video input `--input_path [YOUR_VIDEO.mp4]`. Try it to enhance your videos! :clapper:
- **2022.09.14**: Integrated to :hugs: [Hugging Face](https://huggingface.co/spaces). Try out online demo! [![Hugging Face](https://img.shields.io/badge/Demo-%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/sczhou/CodeFormer)
- **2022.09.09**: Integrated to :rocket: [Replicate](https://replicate.com/explore). Try out online demo! [![Replicate](https://img.shields.io/badge/Demo-%F0%9F%9A%80%20Replicate-blue)](https://replicate.com/sczhou/codeformer)
- **2022.09.04**: Add face upsampling `--face_upsample` for high-resolution AI-created face enhancement.
- **2022.08.23**: Some modifications on face detection and fusion for better AI-created face enhancement.
- **2022.08.07**: Integrate [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) to support background image enhancement.
- **2022.07.29**: Integrate new face detectors of `['RetinaFace'(default), 'YOLOv5']`.
- **2022.07.17**: Add Colab demo of CodeFormer. <a href="https://colab.research.google.com/drive/1m52PNveE4PBhYrecj34cnpEeiHcC5LTb?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="google colab logo"></a>
- **2022.07.16**: Release inference code for face restoration. :blush:
- **2022.06.21**: This repo is created.
86 changes: 86 additions & 0 deletions inference_colorization.py
@@ -0,0 +1,86 @@
import os
import cv2
import argparse
import glob
import torch
from torchvision.transforms.functional import normalize
from basicsr.utils import imwrite, img2tensor, tensor2img
from basicsr.utils.download_util import load_file_from_url
from basicsr.utils.misc import get_device
from basicsr.utils.registry import ARCH_REGISTRY

pretrain_model_url = 'https://github.com/sczhou/CodeFormer/releases/download/v0.1.0/codeformer_colorization.pth'

if __name__ == '__main__':
# device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device = get_device()
parser = argparse.ArgumentParser()

parser.add_argument('-i', '--input_path', type=str, default='./inputs/gray_faces',
help='Input image or folder. Default: inputs/gray_faces')
parser.add_argument('-o', '--output_path', type=str, default=None,
help='Output folder. Default: results/<input_name>')
parser.add_argument('--suffix', type=str, default=None,
help='Suffix of the restored faces. Default: None')
args = parser.parse_args()

# ------------------------ input & output ------------------------
print('[NOTE] The input face images should be aligned and cropped to a resolution of 512x512.')
if args.input_path.endswith(('jpg', 'jpeg', 'png', 'JPG', 'JPEG', 'PNG')): # input single img path
input_img_list = [args.input_path]
result_root = f'results/test_colorization_img'
else: # input img folder
if args.input_path.endswith('/'): # solve when path ends with /
args.input_path = args.input_path[:-1]
# scan all the jpg and png images
input_img_list = sorted(glob.glob(os.path.join(args.input_path, '*.[jpJP][pnPN]*[gG]')))
result_root = f'results/{os.path.basename(args.input_path)}'

if not args.output_path is None: # set output path
result_root = args.output_path

test_img_num = len(input_img_list)

# ------------------ set up CodeFormer restorer -------------------
net = ARCH_REGISTRY.get('CodeFormer')(dim_embd=512, codebook_size=1024, n_head=8, n_layers=9,
connect_list=['32', '64', '128']).to(device)

# ckpt_path = 'weights/CodeFormer/codeformer.pth'
ckpt_path = load_file_from_url(url=pretrain_model_url,
model_dir='weights/CodeFormer', progress=True, file_name=None)
checkpoint = torch.load(ckpt_path)['params_ema']
net.load_state_dict(checkpoint)
net.eval()

# -------------------- start to processing ---------------------
for i, img_path in enumerate(input_img_list):
img_name = os.path.basename(img_path)
basename, ext = os.path.splitext(img_name)
print(f'[{i+1}/{test_img_num}] Processing: {img_name}')
input_face = cv2.imread(img_path)
assert input_face.shape[:2] == (512, 512), 'Input resolution must be 512x512 for colorization.'
# input_face = cv2.resize(input_face, (512, 512), interpolation=cv2.INTER_LINEAR)
input_face = img2tensor(input_face / 255., bgr2rgb=True, float32=True)
normalize(input_face, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5), inplace=True)
input_face = input_face.unsqueeze(0).to(device)
try:
with torch.no_grad():
# w is fixed to 0 since we didn't train the Stage III for colorization
output_face = net(input_face, w=0, adain=True)[0]
save_face = tensor2img(output_face, rgb2bgr=True, min_max=(-1, 1))
del output_face
torch.cuda.empty_cache()
except Exception as error:
print(f'\tFailed inference for CodeFormer: {error}')
save_face = tensor2img(input_face, rgb2bgr=True, min_max=(-1, 1))

save_face = save_face.astype('uint8')

# save face
if args.suffix is not None:
basename = f'{basename}_{args.suffix}'
save_restore_path = os.path.join(result_root, f'{basename}.png')
imwrite(save_face, save_restore_path)

print(f'\nAll results are saved in {result_root}')
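
Note that the script above asserts a 512x512 input, so arbitrary old photos should first be cropped and aligned (for example with `scripts/crop_align_face.py`) or at least resized. A minimal resize-only sketch follows, assuming OpenCV and an illustrative file name; plain resizing without alignment may give weaker results than proper cropping.
```
# Sketch (Python): force an arbitrary grayscale photo into the 512x512 BGR format
# that inference_colorization.py asserts on. File names are illustrative assumptions.
import cv2

img = cv2.imread("old_photo.jpg", cv2.IMREAD_COLOR)  # gray images load as 3-channel BGR
img = cv2.resize(img, (512, 512), interpolation=cv2.INTER_LINEAR)
cv2.imwrite("inputs/gray_faces/old_photo_512.png", img)
# then: python inference_colorization.py --input_path inputs/gray_faces
```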

91 changes: 91 additions & 0 deletions inference_inpainting.py
@@ -0,0 +1,91 @@
import os
import cv2
import argparse
import glob
import torch
from torchvision.transforms.functional import normalize
from basicsr.utils import imwrite, img2tensor, tensor2img
from basicsr.utils.download_util import load_file_from_url
from basicsr.utils.misc import get_device
from basicsr.utils.registry import ARCH_REGISTRY

pretrain_model_url = 'https://github.com/sczhou/CodeFormer/releases/download/v0.1.0/codeformer_inpainting.pth'

if __name__ == '__main__':
# device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device = get_device()
parser = argparse.ArgumentParser()

parser.add_argument('-i', '--input_path', type=str, default='./inputs/masked_faces',
help='Input image or folder. Default: inputs/masked_faces')
parser.add_argument('-o', '--output_path', type=str, default=None,
help='Output folder. Default: results/<input_name>')
parser.add_argument('--suffix', type=str, default=None,
help='Suffix of the restored faces. Default: None')
args = parser.parse_args()

# ------------------------ input & output ------------------------
print('[NOTE] The input face images should be aligned and cropped to a resolution of 512x512.')
if args.input_path.endswith(('jpg', 'jpeg', 'png', 'JPG', 'JPEG', 'PNG')): # input single img path
input_img_list = [args.input_path]
result_root = f'results/test_inpainting_img'
else: # input img folder
if args.input_path.endswith('/'): # solve when path ends with /
args.input_path = args.input_path[:-1]
# scan all the jpg and png images
input_img_list = sorted(glob.glob(os.path.join(args.input_path, '*.[jpJP][pnPN]*[gG]')))
result_root = f'results/{os.path.basename(args.input_path)}'

if not args.output_path is None: # set output path
result_root = args.output_path

test_img_num = len(input_img_list)

# ------------------ set up CodeFormer restorer -------------------
net = ARCH_REGISTRY.get('CodeFormer')(dim_embd=512, codebook_size=512, n_head=8, n_layers=9,
connect_list=['32', '64', '128']).to(device)

# ckpt_path = 'weights/CodeFormer/codeformer.pth'
ckpt_path = load_file_from_url(url=pretrain_model_url,
model_dir='weights/CodeFormer', progress=True, file_name=None)
checkpoint = torch.load(ckpt_path)['params_ema']
net.load_state_dict(checkpoint)
net.eval()

# -------------------- start to processing ---------------------
for i, img_path in enumerate(input_img_list):
img_name = os.path.basename(img_path)
basename, ext = os.path.splitext(img_name)
print(f'[{i+1}/{test_img_num}] Processing: {img_name}')
input_face = cv2.imread(img_path)
assert input_face.shape[:2] == (512, 512), 'Input resolution must be 512x512 for inpainting.'
# input_face = cv2.resize(input_face, (512, 512), interpolation=cv2.INTER_LINEAR)
input_face = img2tensor(input_face / 255., bgr2rgb=True, float32=True)
normalize(input_face, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5), inplace=True)
input_face = input_face.unsqueeze(0).to(device)
try:
with torch.no_grad():
mask = torch.zeros(512, 512)
m_ind = torch.sum(input_face[0], dim=0)
mask[m_ind==3] = 1.0
mask = mask.view(1, 1, 512, 512).to(device)
# w is fixed to 1, adain=False for inpainting
output_face = net(input_face, w=1, adain=False)[0]
output_face = (1-mask)*input_face + mask*output_face
save_face = tensor2img(output_face, rgb2bgr=True, min_max=(-1, 1))
del output_face
torch.cuda.empty_cache()
except Exception as error:
print(f'\tFailed inference for CodeFormer: {error}')
save_face = tensor2img(input_face, rgb2bgr=True, min_max=(-1, 1))

save_face = save_face.astype('uint8')

# save face
if args.suffix is not None:
basename = f'{basename}_{args.suffix}'
save_restore_path = os.path.join(result_root, f'{basename}.png')
imwrite(save_face, save_restore_path)

print(f'\nAll results are saved in {result_root}')
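
The mask here is derived from pure-white pixels: after `img2tensor` and normalization to [-1, 1], a white (255, 255, 255) pixel has all three channels equal to 1, so its channel sum is 3. Only those pixels are replaced by the network output; everything else keeps the original values via `(1-mask)*input + mask*output`. A small, repo-independent sketch (assuming OpenCV and NumPy) to sanity-check how much of a masked input will actually be inpainted:
```
# Sketch (Python): report the fraction of pixels that inference_inpainting.py will treat
# as masked, using the same pure-white convention on an example image from this commit.
import cv2
import numpy as np

img = cv2.imread("inputs/masked_faces/00105.png")
white = np.all(img == 255, axis=2)      # True where the white brush was applied
print(f"{white.mean():.1%} of the image will be inpainted")
```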

Binary file added inputs/gray_faces/067_David_Beckham_00.png
Binary file added inputs/gray_faces/089_Miley_Cyrus_00.png
Binary file added inputs/gray_faces/099_Victoria_Beckham_00.png
Binary file added inputs/gray_faces/111_Alexa_Chung_00.png
Binary file added inputs/gray_faces/132_Robert_Downey_Jr_00.png
Binary file added inputs/gray_faces/158_Jimmy_Fallon_00.png
Binary file added inputs/gray_faces/161_Zac_Efron_00.png
Binary file added inputs/gray_faces/169_John_Lennon_00.png
Binary file added inputs/gray_faces/170_Marilyn_Monroe_00.png
Binary file added inputs/gray_faces/Einstein01.png
Binary file added inputs/gray_faces/Einstein02.png
Binary file added inputs/gray_faces/Hepburn01.png
Binary file added inputs/gray_faces/Hepburn02.png
Binary file added inputs/masked_faces/00105.png
Binary file added inputs/masked_faces/00108.png
Binary file added inputs/masked_faces/00169.png
Binary file added inputs/masked_faces/00588.png
Binary file added inputs/masked_faces/00664.png
0 comments on commit 4d7b6e8