Commit 8e46ab1: first commit
tanguis committed May 25, 2023
Showing 79 changed files with 12,285 additions and 0 deletions.

674 changes: 674 additions & 0 deletions LICENSE

120 changes: 120 additions & 0 deletions README.md
@@ -0,0 +1,120 @@
# Video Face Manipulation Detection Through Ensemble of CNNs
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/video-face-manipulation-detection-through/deepfake-detection-on-dfdc)](https://paperswithcode.com/sota/deepfake-detection-on-dfdc?p=video-face-manipulation-detection-through)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/video-face-manipulation-detection-through/deepfake-detection-on-faceforensics-1)](https://paperswithcode.com/sota/deepfake-detection-on-faceforensics-1?p=video-face-manipulation-detection-through)
[![Build Status](https://travis-ci.org/polimi-ispl/icpr2020dfdc.svg?branch=master)](https://travis-ci.org/polimi-ispl/icpr2020dfdc)

![](assets/faces_attention.png)

<p align='center'>
<img src='assets/mqzvfufzoq_face.gif'/>
<img src='assets/mqzvfufzoq_face_att.gif'/>
</p>

This is the official repository of **Video Face Manipulation Detection Through Ensemble of CNNs**,
presented at [ICPR2020](https://www.micc.unifi.it/icpr2020/) and currently available on [IEEExplore](https://ieeexplore.ieee.org/document/9412711) and [arXiv](https://arxiv.org/abs/2004.07676).
If you use this repository in your research, please consider citing our paper; see the [How to cite](https://github.com/polimi-ispl/icpr2020dfdc#how-to-cite) section for the correct bibliography entry.

We participated as the **ISPL** team in the [Kaggle Deepfake Detection Challenge](https://www.kaggle.com/c/deepfake-detection-challenge/).
With this implementation, we reached 41st place out of 2116 teams (**top 2%**) on the [private leaderboard](https://www.kaggle.com/c/deepfake-detection-challenge/leaderboard).

This repository is currently under maintenance. If you experience any problems, please open an [issue](https://github.com/polimi-ispl/icpr2020dfdc/issues).

## Getting started

### Prerequisites
- Install [conda](https://docs.conda.io/en/latest/miniconda.html)
- Create the `icpr2020` environment with *environment.yml*
```bash
$ conda env create -f environment.yml
$ conda activate icpr2020
```
- Download and unzip the [datasets](#datasets)

### Quick run
If you just want to test the pre-trained models against your own videos or images:
- [Video prediction notebook](https://github.com/polimi-ispl/icpr2020dfdc/blob/master/notebook/Video%20prediction.ipynb) <a target="_blank" href="https://colab.research.google.com/drive/12WnvmerHBNbJ49HdoH1lli_O8SwaFPjv?usp=sharing">
<img src="https://colab.research.google.com/assets/colab-badge.svg">
</a>

- [Image prediction notebook](https://github.com/polimi-ispl/icpr2020dfdc/blob/master/notebook/Image%20prediction.ipynb) <a target="_blank" href="https://colab.research.google.com/drive/19oVKlzEr58VZfRnSq-nW8kFYuxkh3GM8?usp=sharing">
<img src="https://colab.research.google.com/assets/colab-badge.svg">
</a>

- [Image prediction with attention](notebook/Image%20prediction%20and%20attention.ipynb) <a target="_blank" href="https://colab.research.google.com/drive/1zcglis2Qx2vtJhrogn8aKA-mbUotLZLK?usp=sharing">
<img src="https://colab.research.google.com/assets/colab-badge.svg">
</a>
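
If you prefer plain Python to a notebook, the sketch below shows the same flow for a single image. It is a minimal sketch, not the notebooks' exact code: the `fornet` module and `EfficientNetB4` class names, the 224x224 input size, the ImageNet normalization, and the checkpoint path are all assumptions to adapt to your setup (see [Pretrained weights](#pretrained-weights)).

```python
# Minimal sketch: score one pre-cropped face with a trained binary classifier.
# Module/class names, input size, normalization and checkpoint path are assumptions.
import torch
from PIL import Image
from torchvision import transforms

from architectures import fornet  # assumed module layout

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = fornet.EfficientNetB4().eval().to(device)
net.load_state_dict(torch.load('weights/EfficientNetB4_DFDC/bestval.pth',
                               map_location=device))

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                    # assumed face size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # assumed ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])

face = Image.open('face.jpg').convert('RGB')          # an already-cropped face
with torch.no_grad():
    logit = net(preprocess(face).unsqueeze(0).to(device))
    score = torch.sigmoid(logit).item()               # assumed: higher = more likely fake
print(f'fake probability: {score:.3f}')
```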

### The whole pipeline
You need to preprocess the datasets in order to index all the samples and extract the faces. Just run the script [make_dataset.sh](scripts/make_dataset.sh):

```bash
$ ./scripts/make_dataset.sh
```

Please note that we use only 32 frames per video; you can easily tweak this parameter in [extract_faces.py](extract_faces.py).
Also note that **for DFDC** we use _the training split_ exclusively.
In `scripts/make_dataset.sh` the value of `DFDC_SRC` should point to the directory containing the DFDC train split, as in the sketch below.
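
A quick way to set that from the command line, assuming `DFDC_SRC` is assigned at the top level of the script (equivalently, just edit the variable by hand):

```bash
# Point the script at your local copy of the DFDC train split (placeholder path),
# then run the whole preprocessing pipeline (indexing + face extraction).
$ sed -i 's|^DFDC_SRC=.*|DFDC_SRC=/path/to/dfdc/train|' scripts/make_dataset.sh
$ ./scripts/make_dataset.sh
```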


### Celeb-DF (v2)
Although **we did not use this dataset in the paper**, we provide a script, [index_celebdf.py](index_celebdf.py), to index the videos in the same way as for DFDC and FF++. Once you have the index, you can proceed with the pipeline starting from [extract_faces.py](extract_faces.py). You can also use the `celebdf` split during training/testing.
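
The order of operations would look something like the sketch below. The flag names are hypothetical (check each script's `--help` for the real ones), and `--otherparams` stands for the remaining options, as elsewhere in this README:

```bash
# Hypothetical flags, shown only to illustrate the order of the pipeline steps.
$ python index_celebdf.py --source /path/to/celebdf                # 1. index the videos
$ python extract_faces.py --source /path/to/celebdf --otherparams  # 2. extract faces
$ python train_binclass.py --net EfficientNetB4 --traindb celebdf --otherparams  # 3. use the celebdf split
```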

### Train
In [train_all.sh](scripts/train_all.sh) you can find a comprehensive list of all the commands to train the models presented in the paper.
Please refer to the comments in the script for hints on their usage.

#### Training a single model
If you want to train some models without launching the script:
- for the **non-siamese** architectures (e.g. EfficientNetB4, EfficientNetB4Att), you can simply specify the model in [train_binclass.py](train_binclass.py) with the *--net* parameter;
- for the **siamese** architectures (e.g. EfficientNetB4ST, EfficientNetB4AttST), you have to:
  1. train the architecture as a feature extractor first, using the [train_triplet.py](train_triplet.py) script and taking care to specify its name with the *--net* parameter **without** the ST suffix. For instance, to train EfficientNetB4ST you first run `python train_triplet.py --net EfficientNetB4 --otherparams`;
  2. fine-tune the model using [train_binclass.py](train_binclass.py), this time specifying the architecture's name **with** the ST suffix and passing as the *--init* argument the path to the weights of the feature extractor trained in the previous step. You end up running something like `python train_binclass.py --net EfficientNetB4ST --init path/to/EfficientNetB4/weights/trained/with/train_triplet/weights.pth --otherparams`. The full recipe is sketched below.
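
Putting the two steps together (commands taken from the steps above; `--otherparams` stands for the remaining options):

```bash
# Step 1: train the feature extractor with the triplet strategy (no ST suffix).
$ python train_triplet.py --net EfficientNetB4 --otherparams
# Step 2: fine-tune the binary classifier (ST suffix), initialized from step 1.
$ python train_binclass.py --net EfficientNetB4ST \
    --init path/to/EfficientNetB4/weights/trained/with/train_triplet/weights.pth \
    --otherparams
```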

### Test
In [test_all.sh](scripts/test_all.sh) you can find a comprehensive list of all the commands for testing the models presented in the paper.

#### Pretrained weights
We also provide pretrained weights for all the architectures presented in the paper.
Please refer to this [Dropbox link](https://www.dropbox.com/sh/cesamx5ytd5j08c/AADG_eEmhskliMaT0Gbk-yHDa?dl=0).
Each directory is named `$NETWORK_$DATASET` where `$NETWORK` is the architecture name and `$DATASET` is the training dataset.
In each directory you can find `bestval.pth`, which contains the best network weights according to the validation set.
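
For instance, assuming the Dropbox folder was downloaded to a local `weights/` directory (a placeholder root), loading a checkpoint follows the usual PyTorch pattern:

```python
import torch

# $NETWORK_$DATASET layout from above; 'weights/' is a placeholder local root.
ckpt = torch.load('weights/EfficientNetB4_DFDC/bestval.pth', map_location='cpu')
# If the file is a plain state_dict the keys are parameter names; if it wraps
# the state_dict (e.g. under a 'net' key), unwrap it before load_state_dict().
print(list(ckpt)[:5])
```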


Additionally, you can find Jupyter notebooks for computing the results in the [notebook](notebook) folder.


## Datasets
- [Facebook's DeepFake Detection Challenge (DFDC) train dataset](https://www.kaggle.com/c/deepfake-detection-challenge/data) | [arXiv paper](https://arxiv.org/abs/2006.07397)
- [FaceForensics++](https://github.com/ondyari/FaceForensics/blob/master/dataset/README.md) | [arXiv paper](https://arxiv.org/abs/1901.08971)
- [Celeb-DF (v2)](http://www.cs.albany.edu/~lsw/celeb-deepfakeforensics.html) | [arXiv paper](https://arxiv.org/abs/1909.12962) (**Just for reference, not used in the paper**)

## References
- [EfficientNet PyTorch](https://github.com/lukemelas/EfficientNet-PyTorch)
- [Xception PyTorch](https://github.com/tstandley/Xception-PyTorch)

## How to cite
Plain text:
```
N. Bonettini, E. D. Cannas, S. Mandelli, L. Bondi, P. Bestagini and S. Tubaro, "Video Face Manipulation Detection Through Ensemble of CNNs," 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 5012-5019, doi: 10.1109/ICPR48806.2021.9412711.
```

Bibtex:
```bibtex
@INPROCEEDINGS{9412711,
author={Bonettini, Nicolò and Cannas, Edoardo Daniele and Mandelli, Sara and Bondi, Luca and Bestagini, Paolo and Tubaro, Stefano},
booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
title={Video Face Manipulation Detection Through Ensemble of CNNs},
year={2021},
volume={},
number={},
pages={5012-5019},
doi={10.1109/ICPR48806.2021.9412711}}
```
## Credits
[Image and Sound Processing Lab - Politecnico di Milano](http://ispl.deib.polimi.it/)
- Nicolò Bonettini
- Edoardo Daniele Cannas
- Sara Mandelli
- Luca Bondi
- Paolo Bestagini
Empty file added architectures/__init__.py
1 change: 1 addition & 0 deletions architectures/externals/__init__.py
@@ -0,0 +1 @@
from .xception import xception
236 changes: 236 additions & 0 deletions architectures/externals/xception.py
@@ -0,0 +1,236 @@
"""
Ported to pytorch thanks to [tstandley](https://github.com/tstandley/Xception-PyTorch)
@author: tstandley
Adapted by cadene
Creates an Xception Model as defined in:
Francois Chollet
Xception: Deep Learning with Depthwise Separable Convolutions
https://arxiv.org/pdf/1610.02357.pdf
This weights ported from the Keras implementation. Achieves the following performance on the validation set:
Loss:0.9173 Prec@1:78.892 Prec@5:94.292
REMEMBER to set your image size to 3x299x299 for both test and validation
normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5])
The resize parameter of the validation transform should be 333, and make sure to center crop at 299x299
"""
from __future__ import print_function, division, absolute_import

import torch.nn as nn
import torch.nn.functional as F
import torch.utils.model_zoo as model_zoo

__all__ = ['xception']

pretrained_settings = {
'xception': {
'imagenet': {
'url': 'http://data.lip6.fr/cadene/pretrainedmodels/xception-43020ad28.pth',
'input_space': 'RGB',
'input_size': [3, 299, 299],
'input_range': [0, 1],
'mean': [0.5, 0.5, 0.5],
'std': [0.5, 0.5, 0.5],
'num_classes': 1000,
'scale': 0.8975
# The resize parameter of the validation transform should be 333, and make sure to center crop at 299x299
}
}
}


class SeparableConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0, dilation=1, bias=False):
        super(SeparableConv2d, self).__init__()

        # Depthwise convolution: one spatial filter per input channel (groups=in_channels)
        self.conv1 = nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, dilation,
                               groups=in_channels, bias=bias)
        # Pointwise 1x1 convolution: mixes information across channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, 1, 0, 1, 1, bias=bias)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pointwise(x)
        return x


class Block(nn.Module):
def __init__(self, in_filters, out_filters, reps, strides=1, start_with_relu=True, grow_first=True):
super(Block, self).__init__()

        # Shortcut branch: a strided 1x1 conv (+ BN) aligns the residual with the
        # main path whenever the channel count or spatial resolution changes
        if out_filters != in_filters or strides != 1:
            self.skip = nn.Conv2d(in_filters, out_filters, 1, stride=strides, bias=False)
            self.skipbn = nn.BatchNorm2d(out_filters)
        else:
            self.skip = None

rep = []

filters = in_filters
if grow_first:
rep.append(nn.ReLU(inplace=True))
rep.append(SeparableConv2d(in_filters, out_filters, 3, stride=1, padding=1, bias=False))
rep.append(nn.BatchNorm2d(out_filters))
filters = out_filters

for i in range(reps - 1):
rep.append(nn.ReLU(inplace=True))
rep.append(SeparableConv2d(filters, filters, 3, stride=1, padding=1, bias=False))
rep.append(nn.BatchNorm2d(filters))

if not grow_first:
rep.append(nn.ReLU(inplace=True))
rep.append(SeparableConv2d(in_filters, out_filters, 3, stride=1, padding=1, bias=False))
rep.append(nn.BatchNorm2d(out_filters))

if not start_with_relu:
rep = rep[1:]
else:
rep[0] = nn.ReLU(inplace=False)

if strides != 1:
rep.append(nn.MaxPool2d(3, strides, 1))
self.rep = nn.Sequential(*rep)

def forward(self, inp):
x = self.rep(inp)

if self.skip is not None:
skip = self.skip(inp)
skip = self.skipbn(skip)
else:
skip = inp

x += skip
return x


class Xception(nn.Module):
"""
Xception optimized for the ImageNet dataset, as specified in
https://arxiv.org/pdf/1610.02357.pdf
"""

def __init__(self, num_classes=1000):
""" Constructor
Args:
num_classes: number of classes
"""
super(Xception, self).__init__()
self.num_classes = num_classes

self.conv1 = nn.Conv2d(3, 32, 3, 2, 0, bias=False)
self.bn1 = nn.BatchNorm2d(32)
self.relu1 = nn.ReLU(inplace=True)

self.conv2 = nn.Conv2d(32, 64, 3, bias=False)
self.bn2 = nn.BatchNorm2d(64)
self.relu2 = nn.ReLU(inplace=True)
# do relu here

self.block1 = Block(64, 128, 2, 2, start_with_relu=False, grow_first=True)
self.block2 = Block(128, 256, 2, 2, start_with_relu=True, grow_first=True)
self.block3 = Block(256, 728, 2, 2, start_with_relu=True, grow_first=True)

self.block4 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
self.block5 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
self.block6 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
self.block7 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)

self.block8 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
self.block9 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
self.block10 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
self.block11 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)

self.block12 = Block(728, 1024, 2, 2, start_with_relu=True, grow_first=False)

self.conv3 = SeparableConv2d(1024, 1536, 3, 1, 1)
self.bn3 = nn.BatchNorm2d(1536)
self.relu3 = nn.ReLU(inplace=True)

# do relu here
self.conv4 = SeparableConv2d(1536, 2048, 3, 1, 1)
self.bn4 = nn.BatchNorm2d(2048)

self.fc = nn.Linear(2048, num_classes)

# #------- init weights --------
# for m in self.modules():
# if isinstance(m, nn.Conv2d):
# n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
# m.weight.data.normal_(0, math.sqrt(2. / n))
# elif isinstance(m, nn.BatchNorm2d):
# m.weight.data.fill_(1)
# m.bias.data.zero_()
# #-----------------------------

def features(self, input):
x = self.conv1(input)
x = self.bn1(x)
x = self.relu1(x)

x = self.conv2(x)
x = self.bn2(x)
x = self.relu2(x)

x = self.block1(x)
x = self.block2(x)
x = self.block3(x)
x = self.block4(x)
x = self.block5(x)
x = self.block6(x)
x = self.block7(x)
x = self.block8(x)
x = self.block9(x)
x = self.block10(x)
x = self.block11(x)
x = self.block12(x)

x = self.conv3(x)
x = self.bn3(x)
x = self.relu3(x)

x = self.conv4(x)
x = self.bn4(x)
return x

    def logits(self, features):
        # NB: `last_linear` does not exist on a bare Xception instance; the
        # xception() factory below renames `fc` to `last_linear` after loading
        x = nn.ReLU(inplace=True)(features)

x = F.adaptive_avg_pool2d(x, (1, 1))
x = x.view(x.size(0), -1)
x = self.last_linear(x)
return x

def forward(self, input):
x = self.features(input)
x = self.logits(x)
return x


def xception(num_classes=1000, pretrained='imagenet'):
    model = Xception(num_classes=num_classes)
    if pretrained:
        settings = pretrained_settings['xception'][pretrained]
        assert num_classes == settings['num_classes'], \
            "num_classes should be {}, but is {}".format(settings['num_classes'], num_classes)

        model.load_state_dict(model_zoo.load_url(settings['url']))

        model.input_space = settings['input_space']
        model.input_size = settings['input_size']
        model.input_range = settings['input_range']
        model.mean = settings['mean']
        model.std = settings['std']

    # TODO: ugly
    model.last_linear = model.fc
    del model.fc
    return model