
This is the repository to the paper "Flow based video matting" by Zijian Kuang and Xinran Tie. We proposed to use optical flow and UNet to generate a refined mask for video matting purpose.


kuangzijian/Flow-Based-Video-Segmentation

 
 


Flow-based-Video-Segmentation-Algorithm

We propose a novel flow-based encoder-decoder network that detects a human head and shoulders in a video and removes the background, producing clean footage for videoconferencing and virtual reality applications.

This is the repository for the paper "Flow-based Video Segmentation for Human Head and Shoulders" by Zijian Kuang and Xinran Tie.

Paper

Getting Started

You will need Python 3.6 and the packages specified in requirements.txt. We recommend setting up a virtual environment with pip and installing the packages there. The correlation layer is implemented in CUDA using CuPy, which is why CuPy is a required dependency. It can be installed using pip install cupy or alternatively using one of the provided binary packages as outlined in the CuPy repository.

Install packages with:

$ pip install -r requirements.txt

On Windows, install PyTorch first, as described on the official PyTorch site, and then install the remaining packages:

$ pip install torch===1.6.0 torchvision===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
$ pip install -r requirements.txt
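
After installing, it can help to confirm that the packages named above are visible to the interpreter. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module list covers only the packages this README mentions (requirements.txt may list more).

```python
import importlib.util
import sys

# Top-level import names mentioned in this README; requirements.txt may list more.
REQUIRED_MODULES = ("torch", "torchvision", "cupy")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

The check only looks the modules up and does not import them, so it will not detect a CuPy build that fails to match the installed CUDA version.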

Dataset

We created our own video segmentation dataset from four green-screen videos in an online-conference style. We extracted frames from each video, generated a ground truth mask for each character, and then applied a virtual background to the frames to form our training and testing data. You can download the dataset from this link. Example data are shown below:

[Images: input image 1, input image 2, ground truth 1, ground truth 2]

To generate more video segmentation data and ground truth with our code, use the functions in dataset_generator.py.
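
The dataset pipeline described above (green-screen frame, ground truth mask, virtual background) can be illustrated with a small dependency-free sketch. This is not the API of dataset_generator.py: the function names, the `margin` parameter, and the "green exceeds red and blue" rule are all assumptions made for illustration. A real pipeline would more likely use NumPy or OpenCV arrays than nested lists.

```python
def is_green(pixel, margin=40):
    """Treat a pixel as green screen when green exceeds red and blue by `margin`."""
    r, g, b = pixel
    return g - max(r, b) > margin


def make_mask(frame, margin=40):
    """Return a mask with 255 for foreground pixels and 0 for green-screen pixels."""
    return [[0 if is_green(p, margin) else 255 for p in row] for row in frame]


def composite(frame, mask, background):
    """Keep foreground pixels from `frame`; take the rest from `background`."""
    return [
        [fg if m else bg for fg, m, bg in zip(frame_row, mask_row, bg_row)]
        for frame_row, mask_row, bg_row in zip(frame, mask, background)
    ]


# A 2x2 toy frame: two skin-toned pixels and two green-screen pixels.
frame = [[(200, 160, 140), (10, 220, 20)],
         [(15, 230, 25), (190, 150, 130)]]
# Hypothetical virtual background: solid blue.
background = [[(0, 0, 255)] * 2 for _ in range(2)]

mask = make_mask(frame)
result = composite(frame, mask, background)
print(mask)
print(result)
```

Here `mask` plays the role of the ground truth and `result` the role of a training frame with a virtual background applied.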

Configure and Run the Code

To train our model:

  1. Create the folder structure shown in the picture below, then put the training data in the original_training folder and the ground truth data in the ground_truth_training folder:

[Image: example folder structure]

  2. Run the training code:
python funet_train.py

optional arguments:
  -h, --help            show this help message and exit
  -e E, --epochs E      Number of epochs (default: 10)
  -b [B], --batch-size [B]
                        Batch size (default: 1)
  -l [LR], --learning-rate [LR]
                        Learning rate (default: 0.0001)
  -f LOAD, --load LOAD  Load model from a .pth file (default: False)
  -s SCALE, --scale SCALE
                        Downscaling factor of the dataset (default: 1)
  -v VAL, --validation VAL
                        Percent of the data that is used as validation (0-100)
                        (default: 20.0)
  -g GPU, --gpu GPU     Set the gpu for cuda (default: 0)
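
The folder layout from step 1 can also be created with a short script. This is a sketch under assumptions: the folder names come from this README, but placing the training folders under dataset/ is inferred from the prediction defaults (dataset/original_testing/ and so on), and `create_layout` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Folder names taken from this README. Putting the training folders under
# dataset/ mirrors the prediction defaults and is an assumption.
DATASET_FOLDERS = (
    "original_training",
    "ground_truth_training",
    "original_testing",
    "ground_truth_testing",
    "mask_output",
)


def create_layout(root="."):
    """Create dataset/<name> under `root` for each folder name; return the paths."""
    folders = [Path(root) / "dataset" / name for name in DATASET_FOLDERS]
    for folder in folders:
        # exist_ok=True makes the call safe to repeat on an existing layout.
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling `create_layout()` from the repository root creates any missing folders and leaves existing ones untouched.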

To predict using our model:

  1. Dump the testing data into the original_testing folder, and dump the ground truth data into the ground_truth_testing folder.
  2. Run the predicting code:
python funet_predict.py

optional arguments:
  -h, --help            show this help message and exit
  --model FILE, -m FILE
                        Specify the file in which the model is stored
                        (default: checkpoints/CP_epoch7.pth)
  --img INPUT [INPUT ...], -img INPUT [INPUT ...]
                        Path of original image dataset (default:
                        dataset/original_testing/)
  --mask INPUT [INPUT ...], -mask INPUT [INPUT ...]
                        Path of ground truth mask dataset (default:
                        dataset/ground_truth_testing/)
  --output INPUT [INPUT ...], -o INPUT [INPUT ...]
                        path of ouput dataset (default: dataset/mask_output/)
  --no-viz, -v          No visualize the dataset as they are processed
                        (default: False)
  --no-save, -n         Do not save the output masks (default: False)
  --no-eval, -e         Do not run evaluation. (default: False)
  --mask-threshold MASK_THRESHOLD, -t MASK_THRESHOLD
                        Minimum probability value to consider a mask pixel
                        white (default: 0.5)
  --scale SCALE, -s SCALE
                        Scale factor for the input dataset (default: 1)
  -g GPU, --gpu GPU     Set the gpu for cuda (default: 0)
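
The --mask-threshold option turns per-pixel probabilities into a black-and-white mask. The sketch below shows that step on a toy probability map. It is not taken from funet_predict.py: the function name `binarize` is hypothetical, and the strict `>` comparison is an assumption, since the help text does not say how a value exactly equal to the threshold is treated.

```python
def binarize(probabilities, threshold=0.5):
    """Map each probability to 255 (white) if it exceeds `threshold`, else 0."""
    # Strict comparison is an assumption; the script may use >= instead.
    return [[255 if p > threshold else 0 for p in row] for row in probabilities]


# Toy 2x3 probability map standing in for the network output.
probs = [[0.1, 0.5, 0.9],
         [0.7, 0.3, 0.51]]

default_mask = binarize(probs)        # threshold 0.5, as in the CLI default
strict_mask = binarize(probs, 0.8)    # a higher threshold keeps fewer pixels
print(default_mask)
print(strict_mask)
```

Raising the threshold keeps only the pixels the model is most confident about, which shrinks the white region of the mask.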

Credits

We thank sniklaus for pytorch-pwc, which we use to estimate optical flow in our project.

Citation

[1]  @inproceedings{Sun_CVPR_2018,
         author = {Deqing Sun and Xiaodong Yang and Ming-Yu Liu and Jan Kautz},
         title = {{PWC-Net}: {CNNs} for Optical Flow Using Pyramid, Warping, and Cost Volume},
         booktitle = {IEEE Conference on Computer Vision and Pattern Recognition},
         year = {2018}
     }
[2]  @misc{pytorch-pwc,
         author = {Simon Niklaus},
         title = {A Reimplementation of {PWC-Net} Using {PyTorch}},
         year = {2018},
         howpublished = {\url{https://github.com/sniklaus/pytorch-pwc}}
    }
[3]  @misc{U-Net,
         author = {Olaf Ronneberger and Philipp Fischer and Thomas Brox},
         title = {U-Net: Convolutional Networks for Biomedical Image Segmentation},
         year = {2015},
         howpublished = {\url{https://arxiv.org/abs/1505.04597}}
    }

License

This project is licensed under the MIT License.
