Artificial intelligence systems have been widely applied in fields such as image classification, speech recognition, and game playing. However, because their decision-making logic is learned primarily from data, their outputs are highly sensitive to data anomalies and particularly vulnerable to adversarial perturbations. This paper presents a comprehensive survey of the robustness of artificial intelligence systems, reviewing classical adversarial attack and defense methods and summarizing future development trends. We hope this work provides valuable insights for research on the robustness of artificial intelligence systems and supports the development of trustworthy artificial intelligence.
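To make the vulnerability concrete, the following is a minimal sketch of the Fast Gradient Sign Method of Goodfellow et al. (2014), one of the classical attacks this survey reviews. It uses a toy logistic-regression model rather than a deep network; the function name `fgsm_perturb` and all numeric values are illustrative, not taken from any cited work.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """FGSM on a logistic-regression model p = sigmoid(w.x + b).

    For the cross-entropy loss, the gradient with respect to the
    input is (p - y) * w; FGSM takes one eps-sized step along the
    sign of that gradient, giving an L-infinity bounded perturbation.
    """
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))  # predicted probability of class 1
    grad_x = (p - y) * w                           # dLoss/dx for cross-entropy
    return x + eps * np.sign(grad_x)               # adversarially perturbed input

# Toy example: a point the model classifies correctly as class 1.
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.5])                 # decision score w.x + b = 1.5 > 0
x_adv = fgsm_perturb(x, w, b, y=1.0, eps=0.6)
# The small perturbation flips the decision score negative,
# so the perturbed input is now misclassified as class 0.
```

Even for this two-dimensional linear model, a bounded perturbation of 0.6 per coordinate is enough to flip the prediction; deep networks, with far higher-dimensional inputs, are susceptible to much smaller perturbations, which is the phenomenon first reported by Szegedy et al. (2013).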
Zhao J, Zhao W, Deng B, et al., 2024, Autonomous Driving System: A Comprehensive Survey. Expert Systems with Applications, 242: 122836.
Al Kuwaiti A, Nazer K, Al-Reedy A, et al., 2023, A Review of the Role of Artificial Intelligence in Healthcare. Journal of Personalized Medicine, 13(6): 951.
Pei K, Cao Y, Yang J, et al., 2017, DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Proceedings of the 26th Symposium on Operating Systems Principles: 1–18.
Szegedy C, Zaremba W, Sutskever I, et al., 2013, Intriguing Properties of Neural Networks. arXiv Preprint arXiv:1312.6199.
Hamon R, Junklewitz H, Sanchez I, 2020, Robustness and Explainability of Artificial Intelligence. Publications Office of the European Union, 207(40): 1–40.
Javed H, El-Sappagh S, Abuhmed T, 2024, Robustness in Deep Learning Models for Medical Diagnostics: Security and Adversarial Challenges Towards Robust AI Applications. Artificial Intelligence Review, 58(1): 12.
Tocchetti A, Corti L, Balayn A, et al., 2025, AI Robustness: A Human-Centered Perspective on Technological Challenges and Opportunities. ACM Computing Surveys, 57(6): 1–38.
Wang Y, Sun T, Li S, et al., 2023, Adversarial Attacks and Defenses in Machine Learning-Empowered Communication Systems and Networks: A Contemporary Survey. IEEE Communications Surveys & Tutorials, 25(4): 2245–2298.
Xu H, Mannor S, 2012, Robustness and Generalization. Machine Learning, 86(3): 391–423.
Xu H, Ma Y, Liu H, et al., 2020, Adversarial Attacks and Defenses in Images, Graphs and Text: A Review. International Journal of Automation and Computing, 17(2): 151–178.
Goyal S, Doddapaneni S, Khapra M, et al., 2023, A Survey of Adversarial Defenses and Robustness in NLP. ACM Computing Surveys, 55(14s): 1–39.
Zhang W, Sheng Q, Alhazmi A, et al., 2020, Adversarial Attacks on Deep-Learning Models in Natural Language Processing: A Survey. ACM Transactions on Intelligent Systems and Technology, 11(3): 1–41.
Carlini N, Wagner D, 2018, Audio Adversarial Examples: Targeted Attacks on Speech-to-Text. IEEE Security and Privacy Workshops: 1–7.
Deldjoo Y, Noia T, Merra F, 2021, A Survey on Adversarial Recommender Systems: From Attack and Defense Strategies to Generative Adversarial Networks. ACM Computing Surveys, 54(2): 1–38.
Shayegani E, Mamun M, Fu Y, et al., 2023, Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks. arXiv Preprint arXiv:2310.10844.
Goodfellow I, Shlens J, Szegedy C, 2014, Explaining and Harnessing Adversarial Examples. arXiv Preprint arXiv:1412.6572.
Tramèr F, Kurakin A, Papernot N, et al., 2017, Ensemble Adversarial Training: Attacks and Defenses. arXiv Preprint arXiv:1705.07204.
Kurakin A, Goodfellow I, Bengio S, 2018, Adversarial Examples in the Physical World. Artificial Intelligence Safety and Security. Chapman and Hall/CRC: 99–112.
Madry A, Makelov A, Schmidt L, et al., 2017, Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv Preprint arXiv:1706.06083.
Dong Y, Liao F, Pang T, et al., 2018, Boosting Adversarial Attacks with Momentum. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 9185–9193.
Carlini N, Wagner D, 2017, Towards Evaluating the Robustness of Neural Networks. IEEE Symposium on Security and Privacy: 39–57.
Feng S, Feng F, Xu X, et al., 2021, Digital Watermark Perturbation for Adversarial Examples to Fool Deep Neural Networks. International Joint Conference on Neural Networks: 1–8.
Baluja S, Fischer I, 2017, Adversarial Transformation Networks: Learning to Generate Adversarial Examples. arXiv Preprint arXiv:1703.09387.
Poursaeed O, Katsman I, Gao B, et al., 2018, Generative Adversarial Perturbations. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 4422–4431.
Chen X, Gao X, Zhao J, et al., 2023, AdvDiffuser: Natural Adversarial Example Synthesis with Diffusion Models. IEEE/CVF International Conference on Computer Vision: 4562–4572.
Na T, Ko J, Mukhopadhyay S, 2017, Cascade Adversarial Machine Learning Regularized with a Unified Embedding. arXiv Preprint arXiv:1708.02582.
Hendrycks D, Lee K, Mazeika M, 2019, Using Pre-Training Can Improve Model Robustness and Uncertainty. International Conference on Machine Learning: 2712–2721.
Jiang Z, Chen T, Chen T, et al., 2020, Robust Pre-Training by Adversarial Contrastive Learning. Advances in Neural Information Processing Systems, 33: 16199–16210.
Wang H, Deng Y, Yoo S, et al., 2021, AGKD-BML: Defense Against Adversarial Attack by Attention-Guided Knowledge Distillation and Bi-Directional Metric Learning. IEEE/CVF International Conference on Computer Vision: 7658–7667.
Bai T, Zhao J, Wen B, 2023, Guided Adversarial Contrastive Distillation for Robust Students. IEEE Transactions on Information Forensics and Security, 19: 9643–9655.
Guo C, Rana M, Cisse M, et al., 2017, Countering Adversarial Images Using Input Transformations. arXiv Preprint arXiv:1711.00117.
Liao F, Liang M, Dong Y, et al., 2018, Defense Against Adversarial Attacks Using High-Level Representation Guided Denoiser. IEEE Conference on Computer Vision and Pattern Recognition: 1778–1787.
Bian H, Chen D, Zhang K, et al., 2021, Adversarial Defense via Self-Orthogonal Randomization Super-Network. Neurocomputing, 452: 147–158.
Alotaibi A, Rassam M, 2023, Adversarial Machine Learning Attacks Against Intrusion Detection Systems: A Survey on Strategies and Defense. Future Internet, 15(2): 62.
Aldahdooh A, Hamidouche W, Fezza S, et al., 2022, Adversarial Example Detection for DNN Models: A Review and Experimental Comparison. Artificial Intelligence Review, 55(6): 4403–4462.