Commit 8e46ab1: first commit
tanguis committed May 25, 2023
Showing 79 changed files with 12,285 additions and 0 deletions.

674 changes: 674 additions & 0 deletions LICENSE

120 changes: 120 additions & 0 deletions README.md
@@ -0,0 +1,120 @@
# Video Face Manipulation Detection Through Ensemble of CNNs
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/video-face-manipulation-detection-through/deepfake-detection-on-dfdc)](https://paperswithcode.com/sota/deepfake-detection-on-dfdc?p=video-face-manipulation-detection-through)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/video-face-manipulation-detection-through/deepfake-detection-on-faceforensics-1)](https://paperswithcode.com/sota/deepfake-detection-on-faceforensics-1?p=video-face-manipulation-detection-through)
[![Build Status](https://travis-ci.org/polimi-ispl/icpr2020dfdc.svg?branch=master)](https://travis-ci.org/polimi-ispl/icpr2020dfdc)

![](assets/faces_attention.png)

<p align='center'>
<img src='assets/mqzvfufzoq_face.gif'/>
<img src='assets/mqzvfufzoq_face_att.gif'/>
</p>

This is the official repository of **Video Face Manipulation Detection Through Ensemble of CNNs**,
presented at [ICPR2020](https://www.micc.unifi.it/icpr2020/) and currently available on [IEEExplore](https://ieeexplore.ieee.org/document/9412711) and [arXiv](https://arxiv.org/abs/2004.07676).
If you use this repository in your research, please consider citing our paper; see the [How to cite](https://github.com/polimi-ispl/icpr2020dfdc#how-to-cite) section for the correct bibliography entry.

We participated as the **ISPL** team in the [Kaggle Deepfake Detection Challenge](https://www.kaggle.com/c/deepfake-detection-challenge/).
With this implementation, we reached 41st place out of 2116 teams (**top 2%**) on the [private leaderboard](https://www.kaggle.com/c/deepfake-detection-challenge/leaderboard).

This repository is currently under maintenance. If you experience any problems, please open an [issue](https://github.com/polimi-ispl/icpr2020dfdc/issues).

## Getting started

### Prerequisites
- Install [conda](https://docs.conda.io/en/latest/miniconda.html)
- Create the `icpr2020` environment with *environment.yml*
```bash
$ conda env create -f environment.yml
$ conda activate icpr2020
```
- Download and unzip the [datasets](#datasets)

### Quick run
If you just want to test the pre-trained models against your own videos or images:
- [Video prediction notebook](https://github.com/polimi-ispl/icpr2020dfdc/blob/master/notebook/Video%20prediction.ipynb) <a target="_blank" href="https://colab.research.google.com/drive/12WnvmerHBNbJ49HdoH1lli_O8SwaFPjv?usp=sharing">
<img src="https://colab.research.google.com/assets/colab-badge.svg">
</a>

- [Image prediction notebook](https://github.com/polimi-ispl/icpr2020dfdc/blob/master/notebook/Image%20prediction.ipynb) <a target="_blank" href="https://colab.research.google.com/drive/19oVKlzEr58VZfRnSq-nW8kFYuxkh3GM8?usp=sharing">
<img src="https://colab.research.google.com/assets/colab-badge.svg">
</a>

- [Image prediction with attention](notebook/Image%20prediction%20and%20attention.ipynb) <a target="_blank" href="https://colab.research.google.com/drive/1zcglis2Qx2vtJhrogn8aKA-mbUotLZLK?usp=sharing">
<img src="https://colab.research.google.com/assets/colab-badge.svg">
</a>
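
If you prefer plain Python to a notebook, the sketch below shows the same flow for a single image. It is a minimal sketch, not the notebooks' exact code: the `fornet` module and `EfficientNetB4` class names, the 224x224 input size, the ImageNet normalization, and the checkpoint path are all assumptions to adapt to your setup (see [Pretrained weights](#pretrained-weights)).

```python
# Minimal sketch: score one pre-cropped face with a trained binary classifier.
# Module/class names, input size, normalization and checkpoint path are assumptions.
import torch
from PIL import Image
from torchvision import transforms

from architectures import fornet  # assumed module layout

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = fornet.EfficientNetB4().eval().to(device)
net.load_state_dict(torch.load('weights/EfficientNetB4_DFDC/bestval.pth',
                               map_location=device))

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                    # assumed face size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # assumed ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])

face = Image.open('face.jpg').convert('RGB')          # an already-cropped face
with torch.no_grad():
    logit = net(preprocess(face).unsqueeze(0).to(device))
    score = torch.sigmoid(logit).item()               # assumed: higher = more likely fake
print(f'fake probability: {score:.3f}')
```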

### The whole pipeline
You need to preprocess the datasets in order to index all the samples and extract the faces. Just run the script [make_dataset.sh](scripts/make_dataset.sh):

```bash
$ ./scripts/make_dataset.sh
```

Please note that we use only 32 frames per video; you can easily tweak this parameter in [extract_faces.py](extract_faces.py).
Also note that **for DFDC** we use _the training split_ exclusively.
In `scripts/make_dataset.sh` the value of `DFDC_SRC` should point to the directory containing the DFDC train split, as in the sketch below.
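
A quick way to set that from the command line, assuming `DFDC_SRC` is assigned at the top level of the script (equivalently, just edit the variable by hand):

```bash
# Point the script at your local copy of the DFDC train split (placeholder path),
# then run the whole preprocessing pipeline (indexing + face extraction).
$ sed -i 's|^DFDC_SRC=.*|DFDC_SRC=/path/to/dfdc/train|' scripts/make_dataset.sh
$ ./scripts/make_dataset.sh
```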


### Celeb-DF (v2)
Although **we did not use this dataset in the paper**, we provide a script, [index_celebdf.py](index_celebdf.py), to index the videos in the same way as for DFDC and FF++. Once you have the index, you can proceed with the pipeline starting from [extract_faces.py](extract_faces.py). You can also use the `celebdf` split during training/testing.
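
The order of operations would look something like the sketch below. The flag names are hypothetical (check each script's `--help` for the real ones), and `--otherparams` stands for the remaining options, as elsewhere in this README:

```bash
# Hypothetical flags, shown only to illustrate the order of the pipeline steps.
$ python index_celebdf.py --source /path/to/celebdf                # 1. index the videos
$ python extract_faces.py --source /path/to/celebdf --otherparams  # 2. extract faces
$ python train_binclass.py --net EfficientNetB4 --traindb celebdf --otherparams  # 3. use the celebdf split
```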

### Train
In [train_all.sh](scripts/train_all.sh) you can find a comprehensive list of all the commands to train the models presented in the paper.
Please refer to the comments in the script for hints on their usage.

#### Training a single model
If you want to train some models without launching the script:
- for the **non-siamese** architectures (e.g. EfficientNetB4, EfficientNetB4Att), you can simply specify the model in [train_binclass.py](train_binclass.py) with the *--net* parameter;
- for the **siamese** architectures (e.g. EfficientNetB4ST, EfficientNetB4AttST), you have to:
  1. train the architecture as a feature extractor first, using the [train_triplet.py](train_triplet.py) script and taking care to specify its name with the *--net* parameter **without** the ST suffix. For instance, to train EfficientNetB4ST you first run `python train_triplet.py --net EfficientNetB4 --otherparams`;
  2. fine-tune the model using [train_binclass.py](train_binclass.py), this time specifying the architecture's name **with** the ST suffix and passing as the *--init* argument the path to the weights of the feature extractor trained in the previous step. You end up running something like `python train_binclass.py --net EfficientNetB4ST --init path/to/EfficientNetB4/weights/trained/with/train_triplet/weights.pth --otherparams`. The full recipe is sketched below.
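
Putting the two steps together (commands taken from the steps above; `--otherparams` stands for the remaining options):

```bash
# Step 1: train the feature extractor with the triplet strategy (no ST suffix).
$ python train_triplet.py --net EfficientNetB4 --otherparams
# Step 2: fine-tune the binary classifier (ST suffix), initialized from step 1.
$ python train_binclass.py --net EfficientNetB4ST \
    --init path/to/EfficientNetB4/weights/trained/with/train_triplet/weights.pth \
    --otherparams
```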

### Test
In [test_all.sh](scripts/test_all.sh) you can find a comprehensive list of all the commands for testing the models presented in the paper.

#### Pretrained weights
We also provide pretrained weights for all the architectures presented in the paper.
Please refer to this [Dropbox link](https://www.dropbox.com/sh/cesamx5ytd5j08c/AADG_eEmhskliMaT0Gbk-yHDa?dl=0).
Each directory is named `$NETWORK_$DATASET` where `$NETWORK` is the architecture name and `$DATASET` is the training dataset.
In each directory you can find `bestval.pth`, which contains the best network weights according to the validation set.
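
For instance, assuming the Dropbox folder was downloaded to a local `weights/` directory (a placeholder root), loading a checkpoint follows the usual PyTorch pattern:

```python
import torch

# $NETWORK_$DATASET layout from above; 'weights/' is a placeholder local root.
ckpt = torch.load('weights/EfficientNetB4_DFDC/bestval.pth', map_location='cpu')
# If the file is a plain state_dict the keys are parameter names; if it wraps
# the state_dict (e.g. under a 'net' key), unwrap it before load_state_dict().
print(list(ckpt)[:5])
```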


Additionally, you can find Jupyter notebooks for computing the results in the [notebook](notebook) folder.


## Datasets
- [Facebook's DeepFake Detection Challenge (DFDC) train dataset](https://www.kaggle.com/c/deepfake-detection-challenge/data) | [arXiv paper](https://arxiv.org/abs/2006.07397)
- [FaceForensics++](https://github.com/ondyari/FaceForensics/blob/master/dataset/README.md) | [arXiv paper](https://arxiv.org/abs/1901.08971)
- [Celeb-DF (v2)](http://www.cs.albany.edu/~lsw/celeb-deepfakeforensics.html) | [arXiv paper](https://arxiv.org/abs/1909.12962) (**Just for reference, not used in the paper**)

## References
- [EfficientNet PyTorch](https://github.com/lukemelas/EfficientNet-PyTorch)
- [Xception PyTorch](https://github.com/tstandley/Xception-PyTorch)

## How to cite
Plain text:
```
N. Bonettini, E. D. Cannas, S. Mandelli, L. Bondi, P. Bestagini and S. Tubaro, "Video Face Manipulation Detection Through Ensemble of CNNs," 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 5012-5019, doi: 10.1109/ICPR48806.2021.9412711.
```

Bibtex:
```bibtex
@INPROCEEDINGS{9412711,
author={Bonettini, Nicolò and Cannas, Edoardo Daniele and Mandelli, Sara and Bondi, Luca and Bestagini, Paolo and Tubaro, Stefano},
booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
title={Video Face Manipulation Detection Through Ensemble of CNNs},
year={2021},
volume={},
number={},
pages={5012-5019},
doi={10.1109/ICPR48806.2021.9412711}}
```
## Credits
[Image and Sound Processing Lab - Politecnico di Milano](http://ispl.deib.polimi.it/)
- Nicolò Bonettini
- Edoardo Daniele Cannas
- Sara Mandelli
- Luca Bondi
- Paolo Bestagini
Empty file added architectures/__init__.py
1 change: 1 addition & 0 deletions architectures/externals/__init__.py
@@ -0,0 +1 @@
from .xception import xception
236 changes: 236 additions & 0 deletions architectures/externals/xception.py
@@ -0,0 +1,236 @@
"""
Ported to pytorch thanks to [tstandley](https://github.com/tstandley/Xception-PyTorch)
@author: tstandley
Adapted by cadene
Creates an Xception Model as defined in:
Francois Chollet
Xception: Deep Learning with Depthwise Separable Convolutions
https://arxiv.org/pdf/1610.02357.pdf
This weights ported from the Keras implementation. Achieves the following performance on the validation set:
Loss:0.9173 Prec@1:78.892 Prec@5:94.292
REMEMBER to set your image size to 3x299x299 for both test and validation
normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5])
The resize parameter of the validation transform should be 333, and make sure to center crop at 299x299
"""
from __future__ import print_function, division, absolute_import

import torch.nn as nn
import torch.nn.functional as F
import torch.utils.model_zoo as model_zoo

__all__ = ['xception']

pretrained_settings = {
'xception': {
'imagenet': {
'url': 'http://data.lip6.fr/cadene/pretrainedmodels/xception-43020ad28.pth',
'input_space': 'RGB',
'input_size': [3, 299, 299],
'input_range': [0, 1],
'mean': [0.5, 0.5, 0.5],
'std': [0.5, 0.5, 0.5],
'num_classes': 1000,
'scale': 0.8975
# The resize parameter of the validation transform should be 333, and make sure to center crop at 299x299
}
}
}


class SeparableConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0, dilation=1, bias=False):
        super(SeparableConv2d, self).__init__()

        # Depthwise convolution: one spatial filter per input channel (groups=in_channels)
        self.conv1 = nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, dilation,
                               groups=in_channels, bias=bias)
        # Pointwise 1x1 convolution: mixes information across channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, 1, 0, 1, 1, bias=bias)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pointwise(x)
        return x


class Block(nn.Module):
def __init__(self, in_filters, out_filters, reps, strides=1, start_with_relu=True, grow_first=True):
super(Block, self).__init__()

        # Shortcut branch: a strided 1x1 conv (+ BN) aligns the residual with the
        # main path whenever the channel count or spatial resolution changes
        if out_filters != in_filters or strides != 1:
            self.skip = nn.Conv2d(in_filters, out_filters, 1, stride=strides, bias=False)
            self.skipbn = nn.BatchNorm2d(out_filters)
        else:
            self.skip = None

rep = []

filters = in_filters
if grow_first:
rep.append(nn.ReLU(inplace=True))
rep.append(SeparableConv2d(in_filters, out_filters, 3, stride=1, padding=1, bias=False))
rep.append(nn.BatchNorm2d(out_filters))
filters = out_filters

for i in range(reps - 1):
rep.append(nn.ReLU(inplace=True))
rep.append(SeparableConv2d(filters, filters, 3, stride=1, padding=1, bias=False))
rep.append(nn.BatchNorm2d(filters))

if not grow_first:
rep.append(nn.ReLU(inplace=True))
rep.append(SeparableConv2d(in_filters, out_filters, 3, stride=1, padding=1, bias=False))
rep.append(nn.BatchNorm2d(out_filters))

if not start_with_relu:
rep = rep[1:]
else:
rep[0] = nn.ReLU(inplace=False)

if strides != 1:
rep.append(nn.MaxPool2d(3, strides, 1))
self.rep = nn.Sequential(*rep)

def forward(self, inp):
x = self.rep(inp)

if self.skip is not None:
skip = self.skip(inp)
skip = self.skipbn(skip)
else:
skip = inp

x += skip
return x


class Xception(nn.Module):
"""
Xception optimized for the ImageNet dataset, as specified in
https://arxiv.org/pdf/1610.02357.pdf
"""

def __init__(self, num_classes=1000):
""" Constructor
Args:
num_classes: number of classes
"""
super(Xception, self).__init__()
self.num_classes = num_classes

self.conv1 = nn.Conv2d(3, 32, 3, 2, 0, bias=False)
self.bn1 = nn.BatchNorm2d(32)
self.relu1 = nn.ReLU(inplace=True)

self.conv2 = nn.Conv2d(32, 64, 3, bias=False)
self.bn2 = nn.BatchNorm2d(64)
self.relu2 = nn.ReLU(inplace=True)
# do relu here

self.block1 = Block(64, 128, 2, 2, start_with_relu=False, grow_first=True)
self.block2 = Block(128, 256, 2, 2, start_with_relu=True, grow_first=True)
self.block3 = Block(256, 728, 2, 2, start_with_relu=True, grow_first=True)

self.block4 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
self.block5 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
self.block6 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
self.block7 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)

self.block8 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
self.block9 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
self.block10 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
self.block11 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)

self.block12 = Block(728, 1024, 2, 2, start_with_relu=True, grow_first=False)

self.conv3 = SeparableConv2d(1024, 1536, 3, 1, 1)
self.bn3 = nn.BatchNorm2d(1536)
self.relu3 = nn.ReLU(inplace=True)

# do relu here
self.conv4 = SeparableConv2d(1536, 2048, 3, 1, 1)
self.bn4 = nn.BatchNorm2d(2048)

self.fc = nn.Linear(2048, num_classes)

# #------- init weights --------
# for m in self.modules():
# if isinstance(m, nn.Conv2d):
# n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
# m.weight.data.normal_(0, math.sqrt(2. / n))
# elif isinstance(m, nn.BatchNorm2d):
# m.weight.data.fill_(1)
# m.bias.data.zero_()
# #-----------------------------

def features(self, input):
x = self.conv1(input)
x = self.bn1(x)
x = self.relu1(x)

x = self.conv2(x)
x = self.bn2(x)
x = self.relu2(x)

x = self.block1(x)
x = self.block2(x)
x = self.block3(x)
x = self.block4(x)
x = self.block5(x)
x = self.block6(x)
x = self.block7(x)
x = self.block8(x)
x = self.block9(x)
x = self.block10(x)
x = self.block11(x)
x = self.block12(x)

x = self.conv3(x)
x = self.bn3(x)
x = self.relu3(x)

x = self.conv4(x)
x = self.bn4(x)
return x

    def logits(self, features):
        # NB: `last_linear` does not exist on a bare Xception instance; the
        # xception() factory below renames `fc` to `last_linear` after loading
        x = nn.ReLU(inplace=True)(features)

x = F.adaptive_avg_pool2d(x, (1, 1))
x = x.view(x.size(0), -1)
x = self.last_linear(x)
return x

def forward(self, input):
x = self.features(input)
x = self.logits(x)
return x


def xception(num_classes=1000, pretrained='imagenet'):
    model = Xception(num_classes=num_classes)
    if pretrained:
        settings = pretrained_settings['xception'][pretrained]
        assert num_classes == settings['num_classes'], \
            "num_classes should be {}, but is {}".format(settings['num_classes'], num_classes)

        model.load_state_dict(model_zoo.load_url(settings['url']))

        model.input_space = settings['input_space']
        model.input_size = settings['input_size']
        model.input_range = settings['input_range']
        model.mean = settings['mean']
        model.std = settings['std']

    # TODO: ugly
    model.last_linear = model.fc
    del model.fc
    return model