The implementation of *Towards Defending against Adversarial Examples via Attack-Invariant Features* (ICML 2021).
Deep neural networks (DNNs) are vulnerable to adversarial noise. Their adversarial robustness can be improved by exploiting adversarial examples. However, given continuously evolving attacks, models trained on seen types of adversarial examples generally do not generalize well to unseen types of adversarial examples. To solve this problem, we propose to remove adversarial noise by learning generalizable attack-invariant features that preserve semantic classification information. Specifically, we introduce an adversarial feature learning mechanism to disentangle invariant features from adversarial noise. A normalization term is proposed in the encoded space of the attack-invariant features to address the bias between seen and unseen types of attacks. Empirical evaluations demonstrate that our method provides better protection than previous state-of-the-art approaches, especially against unseen types of attacks and adaptive attacks.
(Figure: a visual illustration of the natural example and the adversarial examples; see the paper.)
- This codebase is written for `python3` and `pytorch`.
- To install the necessary Python packages, run `pip install -r requirements.txt`.
- To generate adversarial data for training or testing the model, run `python craft_adversarial_examples.py`. We use the `advertorch` toolbox to help generate adversarial samples. This code provides PGD, CW, DDN, STA, etc., to generate different types of adversarial samples. The generated samples can be saved in ".png" and ".npy" formats; the storage directory defaults to "adv_example". A hedged sketch of this step is shown below.
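For orientation, crafting PGD examples with `advertorch` looks roughly like the sketch below. The model checkpoint, the dummy batch, and the attack hyper-parameters (`eps`, `nb_iter`, `eps_iter`) are illustrative assumptions, not the exact settings used by `craft_adversarial_examples.py`:

```python
import os
import numpy as np
import torch
import torch.nn as nn
from advertorch.attacks import LinfPGDAttack

# Hypothetical pretrained classifier to attack; substitute your own.
model = torch.load("target_model.pth")
model.eval()

# Dummy CIFAR-sized batch; in practice this comes from a dataloader.
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))

# L-inf PGD attack from advertorch; the hyper-parameters below are common
# defaults, not necessarily the ones used in this repo.
adversary = LinfPGDAttack(
    model, loss_fn=nn.CrossEntropyLoss(reduction="sum"),
    eps=8 / 255, nb_iter=40, eps_iter=2 / 255,
    rand_init=True, clip_min=0.0, clip_max=1.0, targeted=False,
)
adv_x = adversary.perturb(x, y)  # adversarial batch, same shape as x

# Save in ".npy" format under the default "adv_example" directory.
os.makedirs("adv_example", exist_ok=True)
np.save(os.path.join("adv_example", "pgd_adv.npy"), adv_x.detach().cpu().numpy())
```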
- To train the Adversarial Noise Removing Network (ARN), run `python train_ARN.py`. See `./config/adver.yaml` for network configurations and data selection. The training data includes natural data and two types of adversarial data. A simplified sketch of the training loop follows.
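To give a feel for what `train_ARN.py` does, a heavily simplified training loop is sketched below. The placeholder denoiser, the `train_loader` yielding (natural, adversarial-type-1, adversarial-type-2) triplets, and the plain reconstruction loss are assumptions for illustration; the actual objective additionally includes the adversarial feature learning mechanism and the normalization term in the encoded space described in the paper:

```python
import torch
import torch.nn as nn

# Placeholder encoder-decoder standing in for the ARN; the real network
# is configured in ./config/adver.yaml.
arn = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(arn.parameters(), lr=1e-4)
recon_loss = nn.MSELoss()

# Assumed loader: each batch pairs a natural image with two seen attack types.
for nat_x, adv_x1, adv_x2 in train_loader:
    optimizer.zero_grad()
    # Train the network to recover the natural image from both attack types.
    loss = recon_loss(arn(adv_x1), nat_x) + recon_loss(arn(adv_x2), nat_x)
    loss.backward()
    optimizer.step()
```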
- To test the ARN, run `python test_ARN.py`. See `./config/adver.yaml` for network configurations and data selection. Feed natural or adversarial data into the ARN to obtain the processed data, then feed the processed data into the target model. A sketch of this two-stage inference follows.
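At test time the pipeline is simply denoise-then-classify. A minimal sketch, where the checkpoint paths and `test_loader` are hypothetical placeholders:

```python
import torch

# Trained ARN and the protected classifier; paths are hypothetical.
arn = torch.load("arn_checkpoint.pth")
arn.eval()
target_model = torch.load("target_model.pth")
target_model.eval()

correct, total = 0, 0
with torch.no_grad():
    for x, y in test_loader:                # natural or adversarial inputs
        x_processed = arn(x)                # step 1: remove adversarial noise
        logits = target_model(x_processed)  # step 2: classify processed data
        correct += (logits.argmax(dim=1) == y).sum().item()
        total += y.size(0)
print(f"accuracy: {correct / total:.4f}")
```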
- This README is formatted based on paperswithcode.
- Feel free to post issues via GitHub.
If you find the code useful in your research, please consider citing our paper:
    @inproceedings{zhou2021towards,
      title={Towards defending against adversarial examples via attack-invariant features},
      author={Zhou, Dawei and Liu, Tongliang and Han, Bo and Wang, Nannan and Peng, Chunlei and Gao, Xinbo},
      booktitle={International Conference on Machine Learning},
      pages={12835--12845},
      year={2021},
      organization={PMLR}
    }