Revisiting Tampered Scene Text Detection in the Era of Generative AI [AAAI2025]

This is the official implementation of the paper Revisiting Tampered Scene Text Detection in the Era of Generative AI (paper).


The Open-Set Text Forensics (OSTF) dataset is now publicly available at Google Drive and Baidu Drive.

Researchers are welcome 😃 to apply for this dataset by sending an email to [email protected] (from an institutional email address) that introduces:

  1. Who you are and your institution.
  2. Who your supervisor/mentor is.

OSTF Train data preparation

  1. Apply for, download, and unzip the OSTF dataset.
  2. Move all 18 *.pk files from the mmacc_pks dir into the mmacc dir.
  3. Move the mmacc dir into this main dir. After the above 3 steps, this main dir will have the following structure (a short script sketch of steps 2 and 3 follows the tree):
FBCNN---...
  |
configs---...
  |
mmcv_custom---...
  |
mmdet---...
  |
tools---...
  |
mmacc---srnet---...
          |
        srnet_train.pk
          |
        srnet_test.pk
          |
        anytext---...
          |
        anytext_train.pk
          |
        anytext_test.pk
          |
         ...
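If a script is more convenient than moving things by hand, here is a minimal Python sketch of steps 2 and 3. The paths are assumptions (it assumes the OSTF archive was unzipped into this main dir); adjust them to your setup.

```python
# Minimal sketch of OSTF steps 2-3 (paths are assumptions; adapt to your setup).
from pathlib import Path
import shutil

ostf_root = Path(".")                                      # where OSTF was unzipped
mmacc = ostf_root / "mmacc"
for pk in sorted((ostf_root / "mmacc_pks").glob("*.pk")):  # the 18 *.pk files
    shutil.move(str(pk), str(mmacc / pk.name))
# If OSTF was unzipped outside this main dir, also move the mmacc dir here:
# shutil.move(str(mmacc), "mmacc")
```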

Texture Jitter train data preparation

  1. Download and unzip pretrain_pk.zip in this dir. After unzipping, you will get a new dir named "pretrain" with 7 sub-dirs (ArT, ICDAR2013, ICDAR2015, ICDAR2017-MLT, LSVT, ReCTS, TextOCR).
  2. Download and unzip msk.zip in the new "pretrain" dir. After unzipping, you will get 7 new dirs, each named "msk", under the above 7 sub-dirs.
  3. Download the training-set images from ArT, ICDAR2013 (Task 2.4: End to End (2015 edition)), ICDAR2015, ICDAR2017-MLT, LSVT (train_full_images_0/1.tar.gz 4.1G), ReCTS, and TextOCR.
  4. Rename each of the 7 downloaded image dirs to an "img" dir under the corresponding sub-dir. For example, "mv [Your downloaded ArT train images] pretrain/ArT/img" and "mv [Your downloaded ReCTS train images] pretrain/ReCTS/img".
  5. Make a new dir named "revjpegs" in this main dir, and recreate the "pretrain" dir and its sub-dirs inside it so that "revjpegs" has the same sub-dir structure as "pretrain". For example, it should have the dirs "revjpegs/pretrain/ArT/img" and "revjpegs/pretrain/ReCTS/img", etc., corresponding to "pretrain/ArT/img" and "pretrain/ReCTS/img" respectively (a short sketch for creating these mirrored dirs follows the tree below).
  6. Download fbcnn_color.pth following this Readme.md. In the FBCNN dir, run the command to create reverse-JPEG images for each of the 7 sub-dirs of the pretrain dir. For example, run "CUDA_VISIBLE_DEVICES=0 python app.py --inp pretrain/ArT/img/ --out revjpegs/ArT/img/" and "CUDA_VISIBLE_DEVICES=0 python app.py --inp pretrain/ReCTS/img/ --out revjpegs/ReCTS/img/".

Finally, after the above 6 steps, this main dir will have the following structure:

FBCNN---...
  |
configs---...
  |
pretrain---ArT---img---....
  |         |     |
  |         |   train.pk
  |         |
  |        ICDAR2015---img---...
  |         |           |
  |         |         train.pk
  |         |
  |        ...
  |
revjpegs---pretrain---ArT---img---....
  |                    |     |
  |                    |   train.pk
  |                    |
  |                    ICDAR2015---img---...
  |                    |           |
  |                    |         train.pk
  |                    |
  |                   ...
  |
mmcv_custom---...
  |
mmdet---...
  |
tools---...
  |
mmacc---...

The Texture Jitter method is implemented as "TextureSG" in the "txt_pipeline" of the config files (e.g. here); its source code is here. The key function of the Texture Jitter method is "img_tamper" at Line450.

The DAF framework is implemented as DFPNCMap3 and CascadeCMap3 for Faster R-CNN and Cascade R-CNN respectively.

Key parts of the DAF implementation (taking the Faster R-CNN based DAF as an example; a simplified sketch follows this list):

  1. Line47 Authentic Kernel implementation. The variable "self.sgl" implements the Authentic Kernel (the variable "self.C" at Line17) and its loss function (the forward function at Line21).
  2. Line379 Authentic Kernel Modulation. This line implements the modulation between the Authentic Kernel (the variable "self.sgl.C") and the global features (the variable "gloabl_feats"); the resulting variable "gloabl_feats" is the modulated authentic kernel.
  3. Line324 Training the model to learn real/fake classification from the feature difference. During training, the feature difference between each RoI vector (the variable "mskf" in this line) and the modulated authentic kernel (the variable "glb_feats" in this line) is computed as "mskf - glb_feats[gt_valid]". This difference vector is then fed into a fully-connected layer for the final real/fake prediction, "self.fc(mskf - glb_feats[gt_valid])". At this Line324, the loss between the model prediction "self.fc(mskf - glb_feats[gt_valid])" and the ground truth "gt_label[gt_valid].long()" is computed to make the model learn real/fake classification.
  4. Line548 Model predicts real/fake from the feature difference. In this line, the modulated authentic kernel is held in the variables "g" and "glb_feats", and the input RoI feature vectors are "m" and "mask_feats". The feature difference is obtained as "(self.convert(m)-g)", and the final classification score is obtained by feeding it into the final binary classifier and softmax layer, "F.softmax(self.fc(self.convert(m)-g),1)".
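For readers who want the gist without opening the source, below is a heavily simplified sketch of the feature-difference classification described above. It is not the repository's code: class and argument names (AuthenticKernelHead, roi_feats, global_feats, gt_labels) are placeholders, and the real implementation lives in DFPNCMap3 / CascadeCMap3.

```python
# Simplified, illustrative sketch of the DAF real/fake head (NOT the repo code):
# a learnable authentic kernel is modulated by global image features, and each
# RoI is classified from the difference between its feature and that kernel.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AuthenticKernelHead(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.C = nn.Parameter(torch.randn(1, dim))  # learnable authentic kernel
        self.convert = nn.Linear(dim, dim)          # projects RoI features
        self.fc = nn.Linear(dim, 2)                 # binary real/fake classifier

    def forward(self, roi_feats, global_feats, gt_labels=None):
        # Modulate the authentic kernel with the image-level (global) features.
        glb_feats = self.C * global_feats                  # modulated kernel, (1, dim)
        # Classify each RoI from its difference to the modulated kernel.
        diff = self.convert(roi_feats) - glb_feats         # (num_rois, dim)
        logits = self.fc(diff)
        if gt_labels is not None:                          # training: real/fake CE loss
            return F.cross_entropy(logits, gt_labels.long())
        return F.softmax(logits, dim=1)                    # inference: per-RoI scores
```

At training time the difference vector is supervised with a cross-entropy loss against per-RoI real/fake labels (cf. Line324); at test time the softmax over the classified difference gives the final score (cf. Line548).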

Train

Environment: based on Python 3.9.12

pip install -r requirements.txt

bash tools/dist_train.sh [your config .py file] [your GPU count; it should be consistent with dist_train.sh]

About the config files

The config files are in the configs dir, and all config files follow this naming rule: ModelType_AblationType+TrainData.py. For the AblationType, o denotes the original setting without Texture Jitter pre-training, and x denotes the setting with Texture Jitter pre-training. For example, fasterrcnn_xsrnet is the Faster R-CNN model pre-trained with Texture Jitter and then fine-tuned on the SR-Net training data with the Texture Jitter method.
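Purely as an illustration of this naming rule (the helper below is not part of the repo):

```python
# Illustrative helper (not in the repo): decode a config filename stem
# under the ModelType_AblationType+TrainData rule described above.
def decode_config_name(stem: str):
    model, rest = stem.split("_", 1)          # e.g. "fasterrcnn", "xsrnet"
    ablation, train_data = rest[0], rest[1:]  # "x" = with Texture Jitter pre-training
    return model, ablation, train_data

print(decode_config_name("fasterrcnn_xsrnet"))  # ('fasterrcnn', 'x', 'srnet')
```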

Use the config files in the pre-training stage

Given a config file that contains the model definition you want to pre-train (e.g. Cascade R-CNN), make the following three edits (summarized in the sketch after this list):

  1. Modify the datasets line of the train dataloader to delete the fine-tuning data. For example, in cascade_xsrnet.py, change Line435 from "datasets = [ptdatas, ftdatas]," to "datasets = [ptdatas,],".
  2. Modify the datasets line of "pt_data". For example, in cascade_xsrnet.py, change Line412 from "datasets = [ic13,ic15,ic17]," to "datasets = [ic13,ic15,ic17,art,rects,lsvt,textocrpt],".
  3. Modify the pre-trained weights used for initialization. We use the official COCO-pretrained backbone and detection modules (RPN, RoI heads) for initialization in the pre-training stage. For example, in cascade_xsrnet.py, change Line602 from 'cascade.pth' to your initial weights path (e.g. "rcnn_swin.pth"). The initial weights for Cascade R-CNN can be downloaded from another repo of mine ("rcnn_swin.pth" in the baseline zip file); the initial weights for Faster R-CNN can be downloaded from here.
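The three edits at a glance, as a comment-style summary (not a standalone script: the dataset variables are defined elsewhere in cascade_xsrnet.py, and the exact line numbers may drift between versions):

```python
# Pre-training edits to cascade_xsrnet.py (summary only; line numbers approximate).

# 1. Around Line435 -- train only on the pre-training data:
#      before: datasets = [ptdatas, ftdatas],
#      after:  datasets = [ptdatas,],

# 2. Around Line412 -- include all seven Texture Jitter sources in pt_data:
#      before: datasets = [ic13,ic15,ic17],
#      after:  datasets = [ic13,ic15,ic17,art,rects,lsvt,textocrpt],

# 3. Around Line602 -- replace 'cascade.pth' with your initial weights path,
#    e.g. 'rcnn_swin.pth' (COCO-pretrained backbone and detection modules).
```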

Use the config files in the fine-tuning stage

You only need to change the pre-trained weight path to your own pre-trained weights; the official ones are here. For example, in cascade_xsrnet.py, change Line602 from 'cascade.pth' to your pre-trained weights.


Tiny training implementation

Here is a tiny implementation of the training code with prepared data for a quick try. The playground training and test data are all prepared. It trains Cascade R-CNN only with the Texture Jitter ICDAR2013 data and tests on Tampered IC13. Running it takes only 3 steps:

  1. Download and unzip the file. In the new dir, rename rcnn_swin.pth ("rcnn_swin.pth" in the baseline zip file) to 'cascade.pth'.
  2. Modify "CUDA_VISIBLE_DEVICES=6,7" in tools/dist_train.sh to your own GPU ids.
  3. Run "bash tools/dist_train.sh cascade_debug.py 2".

Pre-trained models

I have kept almost all the trained models, but my Google Drive space is not enough to host all of them. So, in this file I provide the Texture Jitter pre-trained models and the SR-Net fine-tuned models of Cascade R-CNN and Faster R-CNN trained with our methods. If you need more model weights, you can contact me via an educational email address to get them.


Inference

"mv mmdet mmdet_train; unzip mmdet_test.zip

Then modify the config file you used in training (the edits are summarized in a short sketch after this list):

  1. Find the line with "max_per_img=100)))" and change it to "max_per_img=1000)))".
  2. Find the line with "FTIC15" and change it to "FTIC15PK".
  3. (Optional) Find the line "datasets = [test_srnet, test_stefann, test_mostel, test_derend, test_diffste, test_anytext, test_udifftext]," and change it to your test dataset(s) (possibly a single one, such as 'datasets = test_srnet,').
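The same edits at a glance (a comment-style summary only; exact line numbers differ between config files):

```python
# Inference-time edits to the training config (summary only).

# 1. Raise the per-image detection cap:
#      before: ... max_per_img=100)))
#      after:  ... max_per_img=1000)))

# 2. Switch the test dataset type:
#      before: "FTIC15"
#      after:  "FTIC15PK"

# 3. (Optional) keep only the test set you need, e.g. a single dataset
#    instead of the full list [test_srnet, ..., test_udifftext].
```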

bash tools/dist_test.sh [your config file] [model weights to evaluate] 1

Then you will get a new .pk file in a newly created dir named "results".

The official mmdetection inference code also applies to this repo.

Evaluation

After inference, the model predictions are converted into .txt files, zipped, and evaluated with the same official Tampered-IC13 evaluation tools and protocol.


For any bug or question, please open an issue or contact me by email.
