Commit 8e46ab1, committed by tanguis on May 25, 2023 (0 parents): 79 changed files with 12,285 additions and 0 deletions.
# Video Face Manipulation Detection Through Ensemble of CNNs
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/video-face-manipulation-detection-through/deepfake-detection-on-dfdc)](https://paperswithcode.com/sota/deepfake-detection-on-dfdc?p=video-face-manipulation-detection-through)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/video-face-manipulation-detection-through/deepfake-detection-on-faceforensics-1)](https://paperswithcode.com/sota/deepfake-detection-on-faceforensics-1?p=video-face-manipulation-detection-through)
[![Build Status](https://travis-ci.org/polimi-ispl/icpr2020dfdc.svg?branch=master)](https://travis-ci.org/polimi-ispl/icpr2020dfdc)

![](assets/faces_attention.png)

<p align='center'>
<img src='assets/mqzvfufzoq_face.gif'/>
<img src='assets/mqzvfufzoq_face_att.gif'/>
</p>

This is the official repository of **Video Face Manipulation Detection Through Ensemble of CNNs**,
presented at [ICPR2020](https://www.micc.unifi.it/icpr2020/) and currently available on [IEEExplore](https://ieeexplore.ieee.org/document/9412711) and [arXiv](https://arxiv.org/abs/2004.07676).
If you use this repository for your research, please consider citing our paper. Refer to the [How to cite](https://github.com/polimi-ispl/icpr2020dfdc#how-to-cite) section for the correct entry for your bibliography.

We participated as the **ISPL** team in the [Kaggle Deepfake Detection Challenge](https://www.kaggle.com/c/deepfake-detection-challenge/).
With this implementation, we reached 41st place out of 2116 teams (**top 2%**) on the [private leaderboard](https://www.kaggle.com/c/deepfake-detection-challenge/leaderboard).

This repository is currently under maintenance; if you experience any problems, please open an [issue](https://github.com/polimi-ispl/icpr2020dfdc/issues).

## Getting started

### Prerequisites
- Install [conda](https://docs.conda.io/en/latest/miniconda.html)
- Create the `icpr2020` environment from *environment.yml*
```bash
$ conda env create -f environment.yml
$ conda activate icpr2020
```
- Download and unzip the [datasets](#datasets)

### Quick run
If you just want to test the pre-trained models against your own videos or images:
- [Video prediction notebook](https://github.com/polimi-ispl/icpr2020dfdc/blob/master/notebook/Video%20prediction.ipynb) <a target="_blank" href="https://colab.research.google.com/drive/12WnvmerHBNbJ49HdoH1lli_O8SwaFPjv?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg"></a>
- [Image prediction notebook](https://github.com/polimi-ispl/icpr2020dfdc/blob/master/notebook/Image%20prediction.ipynb) <a target="_blank" href="https://colab.research.google.com/drive/19oVKlzEr58VZfRnSq-nW8kFYuxkh3GM8?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg"></a>
- [Image prediction with attention](notebook/Image%20prediction%20and%20attention.ipynb) <a target="_blank" href="https://colab.research.google.com/drive/1zcglis2Qx2vtJhrogn8aKA-mbUotLZLK?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg"></a>

### The whole pipeline
You need to preprocess the datasets in order to index all the samples and extract the faces. Just run the script [make_dataset.sh](scripts/make_dataset.sh):

```bash
$ ./scripts/make_dataset.sh
```

Please note that we use only 32 frames per video; you can easily tweak this parameter in [extract_faces.py](extract_faces.py).
Also, please note that **for the DFDC** we rely on _the training split_ exclusively!
In `scripts/make_dataset.sh` the value of `DFDC_SRC` should point to the directory containing the DFDC train split.

### Celeb-DF (v2)
Although **we did not use this dataset in the paper**, we provide a script [index_celebdf.py](index_celebdf.py) to index the videos similarly to
DFDC and FF++. Once you have the index, you can proceed with the pipeline starting from [extract_faces.py](extract_faces.py). You can also use the
split `celebdf` during training/testing.

### Train
In [train_all.sh](scripts/train_all.sh) you can find a comprehensive list of all the commands to train the models presented in the paper.
Please refer to the comments in the script for hints on their usage.

#### Training a single model
If you want to train some models without launching the script:
- for the **non-siamese** architectures (e.g. EfficientNetB4, EfficientNetB4Att), simply specify the model in [train_binclass.py](train_binclass.py) with the *--net* parameter;
- for the **siamese** architectures (e.g. EfficientNetB4ST, EfficientNetB4AttST), you have to:
  1. train the architecture as a feature extractor first, using the [train_triplet.py](train_triplet.py) script and taking care to specify its name with the *--net* parameter **without** the ST suffix. For instance, to train EfficientNetB4ST you first run `python train_triplet.py --net EfficientNetB4 --otherparams`;
  2. fine-tune the model using [train_binclass.py](train_binclass.py), this time taking care to specify the architecture's name **with** the ST suffix and to pass as the *--init* argument the path to the weights of the feature extractor trained in the previous step. You will end up running something like `python train_binclass.py --net EfficientNetB4ST --init path/to/EfficientNetB4/weights/trained/with/train_triplet/weights.pth --otherparams`

### Test
In [test_all.sh](scripts/test_all.sh) you can find a comprehensive list of all the commands for testing the models presented in the paper.

#### Pretrained weights
We also provide pretrained weights for all the architectures presented in the paper.
Please refer to this [Dropbox link](https://www.dropbox.com/sh/cesamx5ytd5j08c/AADG_eEmhskliMaT0Gbk-yHDa?dl=0).
Each directory is named `$NETWORK_$DATASET`, where `$NETWORK` is the architecture name and `$DATASET` is the training dataset.
In each directory you can find `bestval.pth`, the network weights that performed best on the validation set.
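
As a reference, here is a minimal loading sketch in PyTorch. The checkpoint path and the `EfficientNetB4` class name are placeholders based on the `--net` values above, and the import from `architectures.fornet` is an assumption about this repository's layout; the checkpoint may store either a bare `state_dict` or a dict wrapping it under a `'net'` key, so both cases are handled:

```python
import torch

from architectures import fornet  # assumed location of the model classes

# Hypothetical path: pick the $NETWORK_$DATASET directory you downloaded.
ckpt_path = 'weights/EfficientNetB4_DFDC/bestval.pth'

net = fornet.EfficientNetB4()  # class name assumed from the --net values above
state = torch.load(ckpt_path, map_location='cpu')
# Unwrap the state dict if the checkpoint stores extra training metadata.
net.load_state_dict(state['net'] if 'net' in state else state)
net.eval()
```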

Additionally, you can find Jupyter notebooks for computing the results in the [notebook](notebook) folder.

## Datasets
- [Facebook's DeepFake Detection Challenge (DFDC) train dataset](https://www.kaggle.com/c/deepfake-detection-challenge/data) | [arXiv paper](https://arxiv.org/abs/2006.07397)
- [FaceForensics++](https://github.com/ondyari/FaceForensics/blob/master/dataset/README.md) | [arXiv paper](https://arxiv.org/abs/1901.08971)
- [Celeb-DF (v2)](http://www.cs.albany.edu/~lsw/celeb-deepfakeforensics.html) | [arXiv paper](https://arxiv.org/abs/1909.12962) (**Just for reference, not used in the paper**)

## References
- [EfficientNet PyTorch](https://github.com/lukemelas/EfficientNet-PyTorch)
- [Xception PyTorch](https://github.com/tstandley/Xception-PyTorch)

## How to cite
Plain text:
```
N. Bonettini, E. D. Cannas, S. Mandelli, L. Bondi, P. Bestagini and S. Tubaro, "Video Face Manipulation Detection Through Ensemble of CNNs," 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 5012-5019, doi: 10.1109/ICPR48806.2021.9412711.
```

Bibtex:
```bibtex
@INPROCEEDINGS{9412711,
  author={Bonettini, Nicolò and Cannas, Edoardo Daniele and Mandelli, Sara and Bondi, Luca and Bestagini, Paolo and Tubaro, Stefano},
  booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
  title={Video Face Manipulation Detection Through Ensemble of CNNs},
  year={2021},
  volume={},
  number={},
  pages={5012-5019},
  doi={10.1109/ICPR48806.2021.9412711}}
```

## Credits
[Image and Sound Processing Lab - Politecnico di Milano](http://ispl.deib.polimi.it/)
- Nicolò Bonettini
- Edoardo Daniele Cannas
- Sara Mandelli
- Luca Bondi
- Paolo Bestagini
from .xception import xception
""" | ||
Ported to pytorch thanks to [tstandley](https://github.com/tstandley/Xception-PyTorch) | ||
@author: tstandley | ||
Adapted by cadene | ||
Creates an Xception Model as defined in: | ||
Francois Chollet | ||
Xception: Deep Learning with Depthwise Separable Convolutions | ||
https://arxiv.org/pdf/1610.02357.pdf | ||
This weights ported from the Keras implementation. Achieves the following performance on the validation set: | ||
Loss:0.9173 Prec@1:78.892 Prec@5:94.292 | ||
REMEMBER to set your image size to 3x299x299 for both test and validation | ||
normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], | ||
std=[0.5, 0.5, 0.5]) | ||
The resize parameter of the validation transform should be 333, and make sure to center crop at 299x299 | ||
""" | ||
from __future__ import print_function, division, absolute_import

import torch.nn as nn
import torch.nn.functional as F
import torch.utils.model_zoo as model_zoo

__all__ = ['xception']

pretrained_settings = {
    'xception': {
        'imagenet': {
            'url': 'http://data.lip6.fr/cadene/pretrainedmodels/xception-43020ad28.pth',
            'input_space': 'RGB',
            'input_size': [3, 299, 299],
            'input_range': [0, 1],
            'mean': [0.5, 0.5, 0.5],
            'std': [0.5, 0.5, 0.5],
            'num_classes': 1000,
            'scale': 0.8975
            # The resize parameter of the validation transform should be 333,
            # and make sure to center crop at 299x299.
        }
    }
}


class SeparableConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0, dilation=1, bias=False):
        super(SeparableConv2d, self).__init__()

        # Depthwise convolution: one filter per input channel (groups=in_channels).
        self.conv1 = nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, dilation,
                               groups=in_channels, bias=bias)
        # Pointwise 1x1 convolution mixes the channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, 1, 0, 1, 1, bias=bias)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pointwise(x)
        return x


class Block(nn.Module):
    def __init__(self, in_filters, out_filters, reps, strides=1, start_with_relu=True, grow_first=True):
        super(Block, self).__init__()

        # 1x1 projection on the skip path whenever the shape changes.
        if out_filters != in_filters or strides != 1:
            self.skip = nn.Conv2d(in_filters, out_filters, 1, stride=strides, bias=False)
            self.skipbn = nn.BatchNorm2d(out_filters)
        else:
            self.skip = None

        rep = []

        filters = in_filters
        if grow_first:
            # Grow the channel count in the first separable conv of the block.
            rep.append(nn.ReLU(inplace=True))
            rep.append(SeparableConv2d(in_filters, out_filters, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(out_filters))
            filters = out_filters

        for i in range(reps - 1):
            rep.append(nn.ReLU(inplace=True))
            rep.append(SeparableConv2d(filters, filters, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(filters))

        if not grow_first:
            # Grow the channel count in the last separable conv instead.
            rep.append(nn.ReLU(inplace=True))
            rep.append(SeparableConv2d(in_filters, out_filters, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(out_filters))

        if not start_with_relu:
            rep = rep[1:]
        else:
            rep[0] = nn.ReLU(inplace=False)

        if strides != 1:
            rep.append(nn.MaxPool2d(3, strides, 1))
        self.rep = nn.Sequential(*rep)

    def forward(self, inp):
        x = self.rep(inp)

        if self.skip is not None:
            skip = self.skip(inp)
            skip = self.skipbn(skip)
        else:
            skip = inp

        # Residual connection.
        x += skip
        return x


class Xception(nn.Module):
    """
    Xception optimized for the ImageNet dataset, as specified in
    https://arxiv.org/pdf/1610.02357.pdf
    """

    def __init__(self, num_classes=1000):
        """ Constructor
        Args:
            num_classes: number of classes
        """
        super(Xception, self).__init__()
        self.num_classes = num_classes

        # Entry flow.
        self.conv1 = nn.Conv2d(3, 32, 3, 2, 0, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.relu1 = nn.ReLU(inplace=True)

        self.conv2 = nn.Conv2d(32, 64, 3, bias=False)
        self.bn2 = nn.BatchNorm2d(64)
        self.relu2 = nn.ReLU(inplace=True)
        # do relu here

        self.block1 = Block(64, 128, 2, 2, start_with_relu=False, grow_first=True)
        self.block2 = Block(128, 256, 2, 2, start_with_relu=True, grow_first=True)
        self.block3 = Block(256, 728, 2, 2, start_with_relu=True, grow_first=True)

        # Middle flow: eight shape-preserving blocks at 728 channels.
        self.block4 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block5 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block6 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block7 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)

        self.block8 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block9 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block10 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block11 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)

        # Exit flow.
        self.block12 = Block(728, 1024, 2, 2, start_with_relu=True, grow_first=False)

        self.conv3 = SeparableConv2d(1024, 1536, 3, 1, 1)
        self.bn3 = nn.BatchNorm2d(1536)
        self.relu3 = nn.ReLU(inplace=True)

        # do relu here
        self.conv4 = SeparableConv2d(1536, 2048, 3, 1, 1)
        self.bn4 = nn.BatchNorm2d(2048)

        # NOTE: the `xception()` factory below renames `fc` to `last_linear`,
        # which is the attribute `logits()` actually uses.
        self.fc = nn.Linear(2048, num_classes)

        # #------- init weights --------
        # for m in self.modules():
        #     if isinstance(m, nn.Conv2d):
        #         n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        #         m.weight.data.normal_(0, math.sqrt(2. / n))
        #     elif isinstance(m, nn.BatchNorm2d):
        #         m.weight.data.fill_(1)
        #         m.bias.data.zero_()
        # #-----------------------------

    def features(self, input):
        x = self.conv1(input)
        x = self.bn1(x)
        x = self.relu1(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu2(x)

        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.block5(x)
        x = self.block6(x)
        x = self.block7(x)
        x = self.block8(x)
        x = self.block9(x)
        x = self.block10(x)
        x = self.block11(x)
        x = self.block12(x)

        x = self.conv3(x)
        x = self.bn3(x)
        x = self.relu3(x)

        x = self.conv4(x)
        x = self.bn4(x)
        return x

    def logits(self, features):
        x = nn.ReLU(inplace=True)(features)

        x = F.adaptive_avg_pool2d(x, (1, 1))
        x = x.view(x.size(0), -1)
        x = self.last_linear(x)
        return x

    def forward(self, input):
        x = self.features(input)
        x = self.logits(x)
        return x


def xception(num_classes=1000, pretrained='imagenet'):
    model = Xception(num_classes=num_classes)
    if pretrained:
        settings = pretrained_settings['xception'][pretrained]
        assert num_classes == settings['num_classes'], \
            "num_classes should be {}, but is {}".format(settings['num_classes'], num_classes)

        model.load_state_dict(model_zoo.load_url(settings['url']))

        model.input_space = settings['input_space']
        model.input_size = settings['input_size']
        model.input_range = settings['input_range']
        model.mean = settings['mean']
        model.std = settings['std']

    # TODO: ugly
    # Rename `fc` to `last_linear` (the attribute `logits()` expects) in both the
    # pretrained and the randomly initialized case; doing it only when pretrained
    # weights are loaded would break forward() otherwise.
    model.last_linear = model.fc
    del model.fc
    return model
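

if __name__ == '__main__':
    # Minimal usage sketch: requires network access to download the pretrained
    # weights; pass pretrained=None for a randomly initialized model instead.
    import torch

    model = xception(num_classes=1000, pretrained='imagenet')
    model.eval()
    dummy = torch.randn(1, 3, 299, 299)  # the model expects 3x299x299 inputs
    with torch.no_grad():
        out = model(dummy)
    print(out.shape)  # torch.Size([1, 1000])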