This is the official implementation of the paper "Revisiting Tampered Scene Text Detection in the Era of Generative AI" (paper).
The Open-Set Text Forensics (OSTF) dataset is now publicly available at Google Drive and Baidu Drive.
Researchers are welcome 😃 to apply for this dataset by sending an email to [email protected] (from an institutional email address) that introduces:
- Who you are and your institution.
- Who is your supervisor/mentor.
- Apply for, download, and unzip the OSTF dataset.
- Move all 18 *.pk files from the mmacc_pks dir into the mmacc dir.
- Move the mmacc dir into this main dir. After the above 3 steps, this main dir should have the following structure:
FBCNN---...
|
configs---...
|
mmcv_custom---...
|
mmdet---...
|
tools---...
|
mmacc---srnet---...
        |
        srnet_train.pk
        |
        srnet_test.pk
        |
        anytext---...
        |
        anytext_train.pk
        |
        anytext_test.pk
        |
        ...
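If you want to peek inside the annotation files, the *.pk files can be opened directly from Python. This is only a sketch and assumes the *.pk files are standard Python pickles; the exact record layout is whatever the dataset loaders in this repo expect.

```python
# Quick inspection of an OSTF annotation file. Assumes the *.pk files are
# ordinary Python pickles; the structure of the loaded object is defined by
# the dataset loaders in this repo.
import pickle

with open('mmacc/srnet_train.pk', 'rb') as f:
    ann = pickle.load(f)

print(type(ann))
# Peek at the first entry to see the record layout.
if isinstance(ann, dict):
    first_key = next(iter(ann))
    print(first_key, ann[first_key])
elif isinstance(ann, (list, tuple)) and ann:
    print(ann[0])
```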
- Download and unzip pretrain_pk.zip in this dir. After unzipping, you will get a new dir named "pretrain" with 7 sub-dirs (ArT, ICDAR2013, ICDAR2015, ICDAR2017-MLT, LSVT, ReCTS, TextOCR).
- Download and unzip msk.zip inside the new "pretrain" dir. After unzipping, you will get 7 new dirs, all named 'msk', under the above 7 sub-dirs.
- Download the training-set images of each dataset from ArT, ICDAR2013 (Task 2.4: End to End (2015 edition)), ICDAR2015, ICDAR2017-MLT, LSVT (train_full_images_0/1.tar.gz, 4.1 GB), ReCTS, and TextOCR.
- Move each of the 7 downloaded image dirs to an "img" dir under the corresponding sub-dir. For example, "mv [Your downloaded ArT train images] pretrain/ArT/img" and "mv [Your downloaded ReCTS train images] pretrain/ReCTS/img".
- Make a new dir named "revjpegs" in this main dir and recreate the "pretrain" sub-dir structure inside it, so that "revjpegs" mirrors the "pretrain" dir. For example, it should have the dirs "revjpegs/pretrain/ArT/img" and "revjpegs/pretrain/ReCTS/img", etc., corresponding to "pretrain/ArT/img" and "pretrain/ReCTS/img" respectively (a scripted version of this step is sketched after the directory tree below).
- Download fbcnn_color.pth following this Readme.md. In the FBCNN dir, run the command to create reverse-JPEG images for each of the 7 sub-dirs of the pretrain dir. For example, run "CUDA_VISIBLE_DEVICES=0 python app.py --inp pretrain/ArT/img/ --out revjpegs/ArT/img/" and "CUDA_VISIBLE_DEVICES=0 python app.py --inp pretrain/ReCTS/img/ --out revjpegs/ReCTS/img/".
Finally, after the above steps, this main dir should have the following structure:
FBCNN---...
|
configs---...
|
pretrain---ArT---img---....
|          |     |
|          |     train.pk
|          |
|          ICDAR2015---img---...
|          |           |
|          |           train.pk
|          |
|          ...
|
revjpegs---pretrain---ArT---img---....
|          |          |     |
|          |          |     train.pk
|          |          |
|          |          ICDAR2015---img---...
|          |          |           |
|          |          |           train.pk
|          |          |
|          |          ...
|
mmcv_custom---...
|
mmdet---...
|
tools---...
|
mmacc---...
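The empty "revjpegs" mirror structure mentioned in the steps above does not have to be created by hand. Below is a minimal sketch; it assumes the "revjpegs/pretrain/<dataset>/img" layout shown in the tree and only creates the dirs, so the FBCNN command from the last step still has to be run for each dataset.

```python
# Create the empty "revjpegs" mirror of the pretrain image dirs.
# Assumes the revjpegs/pretrain/<dataset>/img layout described above.
import os

DATASETS = ['ArT', 'ICDAR2013', 'ICDAR2015', 'ICDAR2017-MLT',
            'LSVT', 'ReCTS', 'TextOCR']

for name in DATASETS:
    os.makedirs(os.path.join('revjpegs', 'pretrain', name, 'img'), exist_ok=True)
```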
The Texture Jitter method is implemented as "TextureSG" in the "txt_pipeline" of the config files (e.g. here); its source code is here. The key function for the Texture Jitter method is "img_tamper" at Line450.
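For readers who just want a concrete picture of what a texture-level perturbation looks like, below is a minimal, hypothetical sketch. It is not the repo's "img_tamper"; the region format, the resampling step, and the extra JPEG round trip are illustrative assumptions about how a text region's texture statistics can be jittered to create pseudo-tampered training samples.

```python
# Minimal sketch of a texture-jitter-style augmentation, NOT the repo's
# "img_tamper": it only illustrates perturbing the texture inside a text
# region so a detector can learn to spot texture inconsistency.
import cv2
import numpy as np

def jitter_text_region(img: np.ndarray, box, jpeg_quality: int = 60, scale: float = 0.5):
    """Re-sample and re-compress the pixels inside `box` = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = map(int, box)
    patch = img[y1:y2, x1:x2].copy()

    # Down/up-sampling blurs the fine texture of the region.
    h, w = patch.shape[:2]
    small = cv2.resize(patch, (max(1, int(w * scale)), max(1, int(h * scale))))
    patch = cv2.resize(small, (w, h))

    # An extra JPEG round trip changes the local compression statistics.
    ok, buf = cv2.imencode('.jpg', patch, [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
    if ok:
        patch = cv2.imdecode(buf, cv2.IMREAD_COLOR)

    out = img.copy()
    out[y1:y2, x1:x2] = patch

    # Binary mask marking the perturbed area, usable as a training label.
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    mask[y1:y2, x1:x2] = 1
    return out, mask
```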
The DAF framework is implemented as DFPNCMap3 and CascadeCMap3 for Faster R-CNN and Cascade R-CNN, respectively.
Key points of the DAF implementation (taking the Faster R-CNN based DAF as an example):
- Line47 Authentic Kernel implementation. The variable "self.sgl" implements the Authentic Kernel (the variable "self.C" in Line17) and its loss function (this forward function in Line21).
- Line379 Authentic Kernel Modulation implements the modulation between the Authentic Kernel (the variable "self.sgl.C") and the global features (the variable "gloabl_feats"). In this line, the resulting variable "gloabl_feats" is the modulated authentic kernel.
- Line324 Training the model to learn real/fake classification from the feature difference. During training, the feature difference between each RoI vector (the variable "mskf" in this line) and the modulated authentic kernel (the variable "glb_feats" in this line) is obtained by "mskf - glb_feats[gt_valid]". This difference vector is then fed into a fully-connected layer for the final real/fake prediction as "self.fc(mskf - glb_feats[gt_valid])". In this Line324, the loss between the model prediction "self.fc(mskf - glb_feats[gt_valid])" and the ground truth "gt_label[gt_valid].long()" is computed to help the model learn real/fake classification.
- Line548 Model predicts real/fake with the feature difference. In this line, the modulated authentic kernel is held in the variables "g" and "glb_feats", and the input RoI feature vectors in "m" and "mask_feats". The feature difference is obtained by "(self.convert(m)-g)", and the final classification score is obtained by feeding it to the final binary classifier and softmax layer, "F.softmax(self.fc(self.convert(m)-g),1)".
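To make the data flow above easier to follow, here is a heavily simplified, self-contained PyTorch sketch of the feature-difference idea. The feature dimension, the element-wise modulation, and the `convert` projection are assumptions; the actual DFPNCMap3/CascadeCMap3 modules are more involved.

```python
# Simplified sketch of the DAF feature-difference classification, assuming
# 256-d RoI features; the real DFPNCMap3 / CascadeCMap3 modules differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DAFSketch(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Learnable "authentic kernel" (cf. self.sgl.C in the repo).
        self.C = nn.Parameter(torch.randn(feat_dim))
        # Projection applied to RoI features before the subtraction (cf. self.convert).
        self.convert = nn.Linear(feat_dim, feat_dim)
        # Binary real/fake classifier applied to the feature difference.
        self.fc = nn.Linear(feat_dim, 2)

    def forward(self, roi_feats: torch.Tensor, global_feats: torch.Tensor):
        # Modulate the authentic kernel with the per-image global features
        # (element-wise product here is an assumption about the operator).
        modulated = global_feats * self.C            # (N, feat_dim)
        diff = self.convert(roi_feats) - modulated   # feature difference
        return F.softmax(self.fc(diff), dim=1)       # real/fake scores

# Usage: scores = DAFSketch()(torch.randn(8, 256), torch.randn(8, 256))
```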
Environment based on Python 3.9.12
pip install -r requirements.txt
bash tools/dist_train.sh [your config_py_file] [number of GPUs, which should be consistent with dist_train.sh]
The config files are in the config dir. All config files follow this naming rule: ModelType_AblationType+TrainData.py. For the AblationType, "o" denotes the original model without Texture Jitter pre-training and "x" denotes the model with Texture Jitter pre-training. For example, fasterrcnn_xsrnet is the Faster R-CNN model pre-trained with Texture Jitter and fine-tuned with the SR-Net training data and the Texture Jitter method.
Given a config file that contains the model definition you want to pre-train (e.g. Cascade R-CNN):
- Modify the datasets line of the train dataloader to delete the fine-tune data. For example, in cascade_xsrnet.py, modify the Line435 from "datasets = [ptdatas, ftdatas]," into "datasets = [ptdatas,],".
- Modify the datasets line of the "pt_data". For example, in cascade_xsrnet.py, modify the Line412 from "datasets = [ic13,ic15,ic17]," into "datasets = [ic13,ic15,ic17,art,rects,lsvt,textocrpt],".
- Modify the pre-trained weight used for initialization. We use the official COCO-pretrained backbone and detection modules (RPN, RoI Heads) for initialization in the pre-training stage. For example, in cascade_xsrnet.py, modify the Line602 from 'cascade.pth' into your initial weights path (e.g. "rcnn_swin.pth"). The initial weights for Cascade R-CNN can be downloaded from another repo of mine ("rcnn_swin.pth" in the baseline zip file); the initial weights for Faster R-CNN can be downloaded from here.
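As a hedged summary, the three edits above amount to roughly the following. The dataset variable names follow the README, while the use of mmdetection's load_from key (and the exact structure around each line) is an assumption about cascade_xsrnet.py.

```python
# Illustrative summary of the pre-training edits to cascade_xsrnet.py.
# Placeholder dicts stand in for the dataset configs defined earlier in the
# real file; treat this as a sketch, not the actual config.
ic13 = ic15 = ic17 = art = rects = lsvt = textocrpt = dict()
ptdatas = dict()

# Edit 1 (around Line412): pre-train on all 7 synthetic sets.
pt_data = dict(datasets=[ic13, ic15, ic17, art, rects, lsvt, textocrpt])

# Edit 2 (around Line435): the train dataloader keeps only the pre-training data.
train = dict(datasets=[ptdatas])

# Edit 3 (around Line602): initialize from the COCO-pretrained detector weights
# (assuming the path is set via a load_from-style entry).
load_from = 'rcnn_swin.pth'   # was 'cascade.pth'
```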
For fine-tuning, you only need to change the pre-trained weight to your own pre-trained weights; the official ones are here. For example, in cascade_xsrnet.py, modify the Line602 from 'cascade.pth' into your pre-trained weights.
Here is a tiny implementation of the training code with prepared data for a quick try. The playground training and test data are all prepared. This is Cascade R-CNN trained only with the Texture Jitter ICDAR2013 data and tested on Tampered IC13. Running the code takes only 3 steps:
- Download and unzip the file. In the new dir, rename rcnn_swin.pth ("rcnn_swin.pth" in the baseline zip file) to 'cascade.pth'.
- Modify "CUDA_VISIBLE_DEVICES=6,7" in tools/dist_train.sh to your own GPU ids.
- Run "bash tools/dist_train.sh cascade_debug.py 2".
I have kept almost all the trained models, but the Google Drive space is not enough to hold all of them. So I provide the Texture Jitter pre-trained models and the SR-Net fine-tuned models of Cascade R-CNN and Faster R-CNN trained with our methods in this file. If you need more model weights, you can contact me via an educational email address to get them.
"mv mmdet mmdet_train; unzip mmdet_test.zip
Then modify the config file you used in training:
- Find the line with "max_per_img=100)))" and modify this into "max_per_img=1000)))".
- Find the line with "FTIC15" and modify this into "FTIC15PK"
- (Optional) Find the line "datasets = [test_srnet, test_stefann, test_mostel, test_derend, test_diffste, test_anytext, test_udifftext],", and modify it into your test dataset (may be a single one such as 'datasets = test_srnet,').
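As a hedged summary of these edits (the key names and surrounding structure are assumptions based on typical mmdetection test configs, not the actual file):

```python
# Illustrative summary of the testing edits; placeholder dicts stand in for
# the dataset configs defined in the real config file.
test_srnet = test_stefann = test_mostel = test_derend = dict()
test_diffste = test_anytext = test_udifftext = dict()

# Edit 1: raise the per-image detection cap in the test cfg.
rcnn = dict(max_per_img=1000)          # was max_per_img=100

# Edit 2: switch the dataset type, presumably so predictions are dumped
# as a .pk file (see the next step).
dataset_type = 'FTIC15PK'              # was 'FTIC15'

# Edit 3 (optional): restrict evaluation to a subset of the test sets.
test = dict(datasets=[test_srnet])     # or keep the full list of 7 sets
```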
bash tools/dist_test.sh [your config file] [model weights to evaluate] 1
Then you will get a new .pk file in a newly created dir named "results".
After inference, the model prediction is converted into .txt files, zipped and evaluated following the same official Tampered-IC13 evaluation tools and methods.
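For reference, a minimal sketch of this conversion step is shown below. The layout of the results .pk file (a pickle mapping each image name to a list of [x1, y1, x2, y2, score] boxes) and the output naming are assumptions, not the repo's actual converter; only the general txt-plus-zip packaging follows the Tampered-IC13 convention.

```python
# Sketch: convert a results .pk file into per-image .txt files and zip them
# for the Tampered-IC13 evaluation tools. The pickle layout assumed here
# (image name -> list of [x1, y1, x2, y2, score]) is an assumption.
import os
import pickle
import zipfile

with open('results/predictions.pk', 'rb') as f:   # hypothetical file name
    preds = pickle.load(f)

os.makedirs('results/txt', exist_ok=True)
for img_name, boxes in preds.items():
    stem = os.path.splitext(os.path.basename(img_name))[0]
    with open(f'results/txt/res_{stem}.txt', 'w') as out:
        for x1, y1, x2, y2, score in boxes:
            out.write(f'{int(x1)},{int(y1)},{int(x2)},{int(y2)},{score:.4f}\n')

with zipfile.ZipFile('results/submit.zip', 'w') as zf:
    for name in os.listdir('results/txt'):
        zf.write(os.path.join('results/txt', name), arcname=name)
```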