The implementation of *Towards Defending against Adversarial Examples via Attack-Invariant Features* (ICML 2021).
Deep neural networks (DNNs) are vulnerable to adversarial noise. Their adversarial robustness can be improved by exploiting adversarial examples. However, given continuously evolving attacks, models trained on seen types of adversarial examples generally do not generalize well to unseen types of adversarial examples. To solve this problem, we propose to remove adversarial noise by learning generalizable attack-invariant features that preserve semantic classification information. Specifically, we introduce an adversarial feature learning mechanism to disentangle invariant features from adversarial noise. A normalization term is proposed in the encoded space of the attack-invariant features to address the bias between seen and unseen types of attacks. Empirical evaluations demonstrate that our method provides better protection than previous state-of-the-art approaches, especially against unseen types of attacks and adaptive attacks.
(Figure: a visual illustration of the natural example and the adversarial examples; see the paper.)
- This codebase is written for `python3` and `pytorch`.
- To install the necessary Python packages, run `pip install -r requirements.txt`.
- To generate adversarial data for training or testing the model, run `python craft_adversarial_examples.py`. We use the `advertorch` toolbox to help generate adversarial samples. This code provides PGD, CW, DDN, STA, etc., to generate different types of adversarial samples. The generated samples can be saved in ".png" and ".npy" formats; the storage directory defaults to "adv_example". A hedged sketch of this step is shown below.
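For orientation, crafting PGD examples with `advertorch` looks roughly like the sketch below. The model checkpoint, the dummy batch, and the attack hyper-parameters (`eps`, `nb_iter`, `eps_iter`) are illustrative assumptions, not the exact settings used by `craft_adversarial_examples.py`:

```python
import os
import numpy as np
import torch
import torch.nn as nn
from advertorch.attacks import LinfPGDAttack

# Hypothetical pretrained classifier to attack; substitute your own.
model = torch.load("target_model.pth")
model.eval()

# Dummy CIFAR-sized batch; in practice this comes from a dataloader.
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))

# L-inf PGD attack from advertorch; the hyper-parameters below are common
# defaults, not necessarily the ones used in this repo.
adversary = LinfPGDAttack(
    model, loss_fn=nn.CrossEntropyLoss(reduction="sum"),
    eps=8 / 255, nb_iter=40, eps_iter=2 / 255,
    rand_init=True, clip_min=0.0, clip_max=1.0, targeted=False,
)
adv_x = adversary.perturb(x, y)  # adversarial batch, same shape as x

# Save in ".npy" format under the default "adv_example" directory.
os.makedirs("adv_example", exist_ok=True)
np.save(os.path.join("adv_example", "pgd_adv.npy"), adv_x.detach().cpu().numpy())
```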
- To train the Adversarial Noise Removing Network (ARN), run `python train_ARN.py`. See `./config/adver.yaml` for network configurations and data selection. The training data includes natural data and two types of adversarial data. A simplified sketch of the training loop follows.
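To give a feel for what `train_ARN.py` does, a heavily simplified training loop is sketched below. The placeholder denoiser, the `train_loader` yielding (natural, adversarial-type-1, adversarial-type-2) triplets, and the plain reconstruction loss are assumptions for illustration; the actual objective additionally includes the adversarial feature learning mechanism and the normalization term in the encoded space described in the paper:

```python
import torch
import torch.nn as nn

# Placeholder encoder-decoder standing in for the ARN; the real network
# is configured in ./config/adver.yaml.
arn = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(arn.parameters(), lr=1e-4)
recon_loss = nn.MSELoss()

# Assumed loader: each batch pairs a natural image with two seen attack types.
for nat_x, adv_x1, adv_x2 in train_loader:
    optimizer.zero_grad()
    # Train the network to recover the natural image from both attack types.
    loss = recon_loss(arn(adv_x1), nat_x) + recon_loss(arn(adv_x2), nat_x)
    loss.backward()
    optimizer.step()
```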
- To test the ARN, run `python test_ARN.py`. See `./config/adver.yaml` for network configurations and data selection. Feed natural or adversarial data into the ARN to obtain the processed data, then feed the processed data into the target model. A sketch of this two-stage inference follows.
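At test time the pipeline is simply denoise-then-classify. A minimal sketch, where the checkpoint paths and `test_loader` are hypothetical placeholders:

```python
import torch

# Trained ARN and the protected classifier; paths are hypothetical.
arn = torch.load("arn_checkpoint.pth")
arn.eval()
target_model = torch.load("target_model.pth")
target_model.eval()

correct, total = 0, 0
with torch.no_grad():
    for x, y in test_loader:                # natural or adversarial inputs
        x_processed = arn(x)                # step 1: remove adversarial noise
        logits = target_model(x_processed)  # step 2: classify processed data
        correct += (logits.argmax(dim=1) == y).sum().item()
        total += y.size(0)
print(f"accuracy: {correct / total:.4f}")
```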
- This README is formatted based on paperswithcode.
- Feel free to post issues via GitHub.
If you find the code useful in your research, please consider citing our paper:
    @inproceedings{zhou2021towards,
      title={Towards defending against adversarial examples via attack-invariant features},
      author={Zhou, Dawei and Liu, Tongliang and Han, Bo and Wang, Nannan and Peng, Chunlei and Gao, Xinbo},
      booktitle={International Conference on Machine Learning},
      pages={12835--12845},
      year={2021},
      organization={PMLR}
    }