Skip to content

Must-read Papers on Textual Adversarial Attack and Defense

Notifications You must be signed in to change notification settings

xingbowen/TAADpapers

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 

Repository files navigation

Must-read Papers on Textual Adversarial Attack and Defense (TAAD)

Contributed by Fanchao Qi, Chenghao Yang and Yuan Zang.

Contents

1. Survey Papers

  1. Towards a Robust Deep Neural Network in Texts: A Survey. Wenqi Wang, Lina Wang, Benxiao Tang, Run Wang, Aoshuang Ye. arXiv 2020. [pdf]
  2. Adversarial Attacks and Defenses in Images, Graphs and Text: A Review. Han Xu, Yao Ma, Haochen Liu, Debayan Deb, Hui Liu, Jiliang Tang, Anil K. Jain. arXiv 2019. [pdf]
  3. Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey. Wei Emma Zhang, Quan Z. Sheng, Ahoud Alhazmi, Chenliang Li. arXiv 2019. [pdf]
  4. Analysis Methods in Neural Language Processing: A Survey. Yonatan Belinkov, James Glass. TACL 2019. [pdf]

2. Attack Papers

Each paper is attached to one or more following labels indicating how much information the attack model knows about the victim model: gradient (=white, all information), score (output decision and scores), decision (only output decision) and blind (nothing)

2.1 Sentence-level Attack

  1. Probing Neural Network Understanding of Natural Language Arguments. Timothy Niven, Hung-Yu Kao. ACL 2019. score [pdf] [code&data]
  2. Robust Neural Machine Translation with Doubly Adversarial Inputs. Yong Cheng, Lu Jiang, Wolfgang Macherey. ACL 2019. gradient [pdf]
  3. Trick Me If You Can: Human-in-the-Loop Generation of Adversarial Examples for Question Answering. Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, Jordan Boyd-Graber. TACL 2019. score [pdf]
  4. PAWS: Paraphrase Adversaries from Word Scrambling. Yuan Zhang, Jason Baldridge, Luheng He. NAACL-HLT 2019. blind [pdf] [dataset]
  5. Evaluating and Enhancing the Robustness of Dialogue Systems: A Case Study on a Negotiation Agent. Minhao Cheng, Wei Wei, Cho-Jui Hsieh. NAACL-HLT 2019. gradient score [pdf] [code]
  6. Semantically Equivalent Adversarial Rules for Debugging NLP Models. Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin. ACL 2018. decision [pdf] [code]
  7. Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models. Tong Niu, Mohit Bansal. CoNLL 2018. score [pdf] [code&data]
  8. Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge. Pasquale Minervini, Sebastian Riedel. CoNLL 2018. score [pdf] [code&data]
  9. Robust Machine Comprehension Models via Adversarial Training. Yicheng Wang, Mohit Bansal. NAACL-HLT 2018. decision [pdf] [dataset]
  10. Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer. NAACL-HLT 2018. blind [pdf] [code&data]
  11. Generating Natural Adversarial Examples. Zhengli Zhao, Dheeru Dua, Sameer Singh. ICLR 2018. decision [pdf] [code]
  12. Adversarial Examples for Evaluating Reading Comprehension Systems. Robin Jia, and Percy Liang. EMNLP 2017. score decision blind [pdf] [code]
  13. Adversarial Sets for Regularising Neural Link Predictors. Pasquale Minervini, Thomas Demeester, Tim Rocktäschel, Sebastian Riedel. UAI 2017. score [pdf] [code]

2.2 Word-level Attack

  1. Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits. AAAI-20. score [pdf] [code]
  2. Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency. Shuhuai Ren, Yihe Deng, Kun He, Wanxiang Che. ACL 2019. score [pdf] [code]
  3. Generating Fluent Adversarial Examples for Natural Languages. Huangzhao Zhang, Hao Zhou, Ning Miao, Lei Li. ACL 2019. gradient score [pdf] [code]
  4. Universal Adversarial Attacks on Text Classifiers. Melika Behjati, Seyed-Mohsen Moosavi-Dezfooli, Mahdieh Soleymani Baghshah, Pascal Frossard. ICASSP 2019. gradient [pdf]
  5. Generating Natural Language Adversarial Examples. Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, Kai-Wei Chang. EMNLP 2018. score [pdf] [code]
  6. Breaking NLI Systems with Sentences that Require Simple Lexical Inferences. Max Glockner, Vered Shwartz, Yoav Goldberg. ACL 2018. blind [pdf] [dataset]
  7. Deep Text Classification Can be Fooled. Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi. IJCAI 2018. gradient score [pdf]
  8. Interpretable Adversarial Perturbation in Input Embedding Space for Text. Sato, Motoki, Jun Suzuki, Hiroyuki Shindo, and Yuji Matsumoto. IJCAI 2018. gradient [pdf] [code]
  9. Towards Crafting Text Adversarial Samples. Suranjana Samanta, Sameep Mehta. ECIR 2018. gradient [pdf]
  10. Crafting Adversarial Input Sequences For Recurrent Neural Networks. Nicolas Papernot, Patrick McDaniel, Ananthram Swami, Richard Harang. MILCOM 2016. gradient [pdf]

2.3 Char-level Attack

  1. Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems. Steffen Eger, Gözde Gül ¸Sahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych. NAACL-HLT 2019. score [pdf] [code&data]
  2. TEXTBUGGER: Generating Adversarial Text Against Real-world Applications. Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, Ting Wang. NDSS 2019. gradient score [pdf]
  3. Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers. Ji Gao, Jack Lanchantin, Mary Lou Soffa, Yanjun Qi. IEEE SPW 2018. score blind [pdf] [code]
  4. On Adversarial Examples for Character-Level Neural Machine Translation. Javid Ebrahimi, Daniel Lowd, Dejing Dou. COLING 2018. gradient [pdf] [code]
  5. Synthetic and Natural Noise Both Break Neural Machine Translation. Yonatan Belinkov, Yonatan Bisk. ICLR 2018. blind [pdf] [code&data]

2.4 Multi-level Attack

  1. Universal Adversarial Triggers for Attacking and Analyzing NLP. Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh. EMNLP-IJCNLP 2019. gradient [pdf] [code] [website]
  2. Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model. Prashanth Vijayaraghavan, Deb Roy. ECMLPKDD 2019. score [pdf]
  3. HotFlip: White-Box Adversarial Examples for Text Classification. Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou. ACL 2018. gradient [pdf] [code]
  4. Comparing Attention-based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension. Matthias Blohm, Glorianna Jagfeld, Ekta Sood, Xiang Yu, Ngoc Thang Vu. CoNLL 2018. gradient [pdf] [code]

3. Defense Papers

  1. Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification. Yichao Zhou, Jyun-Yu Jiang, Kai-Wei Chang, Wei Wang. EMNLP-IJCNLP 2019. [pdf] [code]

  2. Combating Adversarial Misspellings with Robust Word Recognition. Danish Pruthi, Bhuwan Dhingra, Zachary C. Lipton. ACL 2019. [pdf] [code]

4. Certified Robustness Papers

  1. Robustness Verification for Transformers. Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, Cho-Jui Hsieh. ICLR 2020. [pdf] [code]
  2. Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation. Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli. EMNLP-IJCNLP 2019. [pdf]
  3. Certified Robustness to Adversarial Word Substitutions. Robin Jia, Aditi Raghunathan, Kerem Göksel, Percy Liang. EMNLP-IJCNLP 2019. [pdf] [code]
  4. POPQORN: Quantifying Robustness of Recurrent Neural Networks. Ching-Yun Ko, Zhaoyang Lyu, Lily Weng, Luca Daniel, Ngai Wong, Dahua Lin. ICML 2019. [pdf] [code]

5. Other Papers

  1. LexicalAT: Lexical-Based Adversarial Reinforcement Training for Robust Sentiment Classification. Jingjing Xu, Liang Zhao, Hanqi Yan, Qi Zeng, Yun Liang, Xu Sun. EMNLP-IJCNLP 2019. [pdf] [code]
  2. On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models. Paul Michel, Xian Li, Graham Neubig, Juan Miguel Pino. NAACL-HLT 2019. [pdf] [code]
  3. Unified Visual-Semantic Embeddings: Bridging Vision and Language with Structured Meaning Representations. Hao Wu, Jiayuan Mao, Yufeng Zhang, Yuning Jiang, Lei Li, Weiwei Sun, Wei-Ying Ma. CVPR 2019. [pdf]
  4. AdvEntuRe: Adversarial Training for Textual Entailment with Knowledge-Guided Examples. Dongyeop Kang, Tushar Khot, Ashish Sabharwal, Eduard Hovy. ACL 2018. [pdf] [code]
  5. Learning Visually-Grounded Semantics from Contrastive Adversarial Samples. Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, Jian Sun. COLING 2018. [pdf] [code]

About

Must-read Papers on Textual Adversarial Attack and Defense

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published