saraferreirascf/Photos-Videos-Manipulations-Dataset: Dataset for multimedia manipulation detection

Authors

Sara Ferreira - Department of Computer Science, Faculty of Sciences, University of Porto, Porto, Portugal
Mário Antunes - Computer Science and Communication Research Centre (CIIC), School of Technology and Management, Polytechnic of Leiria, Leiria, Portugal
INESC TEC, CRACS, Porto, Portugal
Manuel E. Correia - Department of Computer Science, Faculty of Sciences, University of Porto, Porto, Portugal
INESC TEC, CRACS, Porto, Portugal

Objects and faces manipulations dataset

This dataset is a compilation of several datasets containing real and forged photos and video frames. The forged photos and video frames include several types of manipulation, such as copy-move, splicing and deepfakes.

Source dataset	Fake	Real
CelebA-HQ dataset	-	10000
Flickr-Faces-HQ dataset	-	10000
100K Faces Project	10000	-
This person does not exist	10000	-
COVERAGE dataset	97	97
Columbia Image Splicing Dataset	180	183
Dataset created by us	14	14
Celeb-DFv1*	795	158

*This dataset contains only videos. Frames were extracted from each video at a rate of 3 to 4 frames per second and added to the final dataset.
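The footnote gives only the sampling rate. A minimal sketch of how frames could be sampled at roughly 3 to 4 fps, assuming OpenCV (`cv2`) is used for decoding; the function names, the output file naming and the PNG format are hypothetical and not taken from the repository's scripts.

```python
import os


def frame_indices_to_keep(total_frames, native_fps, target_fps):
    """Return the indices of the frames to keep so that roughly
    `target_fps` frames are sampled per second of video."""
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= native_fps:
        # Cannot sample faster than the video itself: keep every frame.
        return list(range(total_frames))
    step = native_fps / target_fps  # native frames between two kept frames
    indices = []
    k = 0
    while int(k * step) < total_frames:
        indices.append(int(k * step))
        k += 1
    return indices


def extract_frames(video_path, out_dir, target_fps=3):
    """Hypothetical helper: decode `video_path` with OpenCV and save the
    sampled frames as PNG files in `out_dir`. Returns the saved paths."""
    import cv2  # imported lazily so frame_indices_to_keep works without OpenCV

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"cannot open {video_path}")
    native_fps = cap.get(cv2.CAP_PROP_FPS)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # may be an estimate
    keep = set(frame_indices_to_keep(total, native_fps, target_fps))

    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(video_path))[0]
    saved = []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in keep:
            path = os.path.join(out_dir, f"{stem}_{idx:05d}.png")
            cv2.imwrite(path, frame)
            saved.append(path)
        idx += 1
    cap.release()
    return saved
```

For a 30 fps video, `target_fps=3` keeps every tenth frame.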

The final labeled dataset is available here.

Features

Feature extraction based on the Discrete Fourier Transform (DFT)
Classification of images and videos with an SVM-based model
Combines both object and face manipulations
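The DFT-based feature extraction listed above can be sketched in the spirit of the "Unmasking DeepFakes with simple Features" approach: take the 2-D DFT of the image, convert it to a log-magnitude spectrum and reduce it to a 1-D profile by azimuthal averaging. This is a minimal numpy sketch, assuming a grayscale image already loaded as a 2-D array; the function names, the 300-point length and the linear interpolation are illustrative choices, not necessarily those of the repository's script.

```python
import numpy as np


def azimuthal_average(spectrum):
    """Average a 2-D spectrum over rings of equal integer radius around
    the zero-frequency point, giving a 1-D profile indexed by radius."""
    h, w = spectrum.shape
    y, x = np.indices((h, w))
    # np.fft.fftshift moves the zero-frequency term to (h // 2, w // 2).
    r = np.hypot(y - h // 2, x - w // 2).astype(int)
    sums = np.bincount(r.ravel(), weights=spectrum.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)  # guard against empty rings


def dft_features(gray, n_points=300, eps=1e-8):
    """1-D spectral feature vector for a grayscale image (2-D array)."""
    f = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    magnitude = 20 * np.log(np.abs(f) + eps)  # log-magnitude spectrum
    profile = azimuthal_average(magnitude)
    # Resample to a fixed length so images of any size give vectors of
    # the same dimension (linear interpolation in this sketch).
    xp = np.linspace(0.0, 1.0, num=profile.size)
    x = np.linspace(0.0, 1.0, num=n_points)
    features = np.interp(x, xp, profile)
    return features / features[0]  # normalise by the radius-0 (DC) value
```

Because every image maps to a vector of the same length regardless of its resolution, the resulting vectors can be fed directly to a classifier such as an SVM.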

Experimental setup

Some pre-processing is needed to turn the raw dataset into a labeled one. The goal is to use the photos and videos in this dataset to classify other photos and videos. To achieve this, the first step is to extract features from each file. These features can then be compared with the features of the multimedia content under investigation to infer whether it has been manipulated. Features are extracted with the method described in "Unmasking DeepFakes with simple Features".

A Python script was created to automate this feature extraction. It takes the folder containing the files to process and the number of files to analyze (normally the size of the smaller of the two classes). After the features of a photo or video frame are extracted, the file is labeled according to the folder it is in: every file in the "fake" folder is labeled 0 and every file in the "real" folder is labeled 1. After iterating through all the files, extracting their features and labeling them, the result is a fully labeled dataset.
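The labeling step described above can be sketched as follows, assuming the images sit directly inside `fake/` and `real/` subfolders of a root directory and that the feature extractor (for example a DFT-based one) is passed in as a function of the file path. The function names, the accepted file extensions and the CSV layout (feature values followed by the label) are hypothetical, not the repository's actual script.

```python
import csv
import os

import numpy as np

# Label convention from the text: files under "fake" -> 0, under "real" -> 1.
LABELS = {"fake": 0, "real": 1}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def list_images(folder):
    """Sorted list of image files directly inside `folder`."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    )


def build_labeled_dataset(root, extract_features, max_files=None):
    """Walk root/fake and root/real, extract a feature vector per file and
    attach the label of its folder. `max_files` caps each class; by default
    it is the size of the smaller class, so both classes are balanced."""
    files = {cls: list_images(os.path.join(root, cls)) for cls in LABELS}
    if max_files is None:
        max_files = min(len(paths) for paths in files.values())
    rows = []
    for cls, label in LABELS.items():
        for path in files[cls][:max_files]:
            features = np.asarray(extract_features(path), dtype=float)
            rows.append((features.tolist(), label))
    return rows


def write_csv(rows, out_path):
    """Write one row per file: feature values followed by the label."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        for features, label in rows:
            writer.writerow(features + [label])
```

Capping each class at the size of the smaller one mirrors the text's suggestion of using the minimum between the two classes, which keeps the labeled dataset balanced.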

Publications

Ferreira, S., Antunes, M., & Correia, M. E. (2021). Forensic analysis of tampered digital photos. 25th Iberoamerican Congress on Pattern Recognition (CIARP), May 2021, Porto, Portugal. To be published in Springer Lecture Notes in Computer Science.
- The paper is available in the conference proceedings at https://ciarp25.org/wp-content/uploads/sites/10/2021/05/CIARP25-Papers.pdf (accessed on 16 June 2021), pp. 402-411.
Ferreira, S., Antunes, M., & Correia, M. E. (2021). Exposing Manipulated Photos and Videos in Digital Forensics Analysis. Journal of Imaging, 7(7), 102. doi:10.3390/jimaging7070102
Ferreira, S., Antunes, M., & Correia, M. E. (2021). A Dataset of Photos and Videos for Digital Forensics Analysis Using Machine Learning Processing. Data, 6(8), 87. doi:10.3390/data6080087
