Backdoor attacks pose a critical threat by embedding hidden triggers into inputs, causing models to misclassify them into adversary-chosen target labels. While extensive research has focused on mitigating these attacks in object recognition models through weight fine-tuning and other reactive strategies, much less attention has been paid to detecting backdoored samples directly. Given the vast datasets used to train modern models, manual inspection for backdoor triggers is impractical, and even state-of-the-art defense mechanisms fail to fully neutralize their impact. To address this gap, we introduce a method to detect unseen backdoored images during both training and inference. Leveraging the success of prompt tuning in Vision Language Models (VLMs), our approach trains learnable text prompts to differentiate clean images from those containing hidden backdoor triggers. Comprehensive experiments on CIFAR-10 and GTSRB covering six diverse attack families demonstrate the robustness of our detector. When exposed to unseen backdoor threats, the learned prompts achieve an average accuracy of 86% in distinguishing previously unseen backdoored images from clean ones, outperforming baselines by up to 30 percentage points. These results establish prompt-tuned VLMs as an effective first line of defense against backdoor threats. Code and datasets will be made available.
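For readers who want a concrete picture of how such a detector can be set up, the sketch below follows the standard CoOp recipe for prompt tuning on a frozen CLIP backbone: a short sequence of learnable context vectors is prepended to two fixed class phrases, and only those vectors are optimized with cross-entropy over the image-text similarity. The backbone choice (OpenAI ViT-B/32 CLIP), the class phrases, the context length, and the optimizer settings are illustrative assumptions; the abstract does not give the paper's exact prompt design or training details.

```python
# Minimal CoOp-style prompt-tuning sketch for clean-vs-backdoored detection,
# assuming a frozen OpenAI CLIP backbone (pip install git+https://github.com/openai/CLIP.git).
# Class phrases, context length, and optimizer settings are illustrative assumptions,
# not the authors' configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()            # keep everything in fp32 for training stability
for p in clip_model.parameters():
    p.requires_grad_(False)                # backbone stays frozen; only the prompts are learned


class PromptLearner(nn.Module):
    """Learnable context vectors prepended to fixed class-name tokens (CoOp-style)."""

    def __init__(self, classnames, clip_model, n_ctx=8):
        super().__init__()
        dim = clip_model.token_embedding.embedding_dim
        dev = clip_model.token_embedding.weight.device
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)   # shared learnable context
        # "X" placeholders reserve the token slots that the learned context will occupy
        prompts = [" ".join(["X"] * n_ctx) + " " + name + "." for name in classnames]
        tokenized = clip.tokenize(prompts).to(dev)                # (n_cls, 77)
        with torch.no_grad():
            emb = clip_model.token_embedding(tokenized)           # (n_cls, 77, dim)
        self.register_buffer("prefix", emb[:, :1, :])             # start-of-text token
        self.register_buffer("suffix", emb[:, 1 + n_ctx:, :])     # class name, EOT, padding
        self.register_buffer("tokenized", tokenized)

    def forward(self):
        n_cls = self.prefix.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        return torch.cat([self.prefix, ctx, self.suffix], dim=1)  # (n_cls, 77, dim)


def encode_prompts(clip_model, prompt_emb, tokenized):
    """Replicates CLIP.encode_text, but starts from pre-built token embeddings."""
    x = prompt_emb + clip_model.positional_embedding
    x = clip_model.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
    x = clip_model.ln_final(x)
    eot = tokenized.argmax(dim=-1)                                # EOT has the largest token id
    return x[torch.arange(x.shape[0], device=x.device), eot] @ clip_model.text_projection


# Hypothetical label phrases for the two-way decision: clean vs. trigger-bearing.
classnames = ["clean image", "image containing a backdoor trigger"]
prompt_learner = PromptLearner(classnames, clip_model).to(device)
optimizer = torch.optim.AdamW(prompt_learner.parameters(), lr=2e-3)


def detection_logits(images):
    with torch.no_grad():                                         # image tower is frozen
        img_feat = F.normalize(clip_model.encode_image(images), dim=-1)
    txt_feat = encode_prompts(clip_model, prompt_learner(), prompt_learner.tokenized)
    txt_feat = F.normalize(txt_feat, dim=-1)
    return clip_model.logit_scale.exp() * img_feat @ txt_feat.t()


# One training step on a batch of preprocessed images, labels 0 = clean, 1 = backdoored:
#   loss = F.cross_entropy(detection_logits(images), labels)
#   optimizer.zero_grad(); loss.backward(); optimizer.step()
```

At inference, an unseen image is flagged as backdoored when the similarity to the learned "trigger" prompt exceeds that of the "clean" prompt; because only the context vectors are trained, the detector adds negligible parameters on top of the frozen VLM.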
Details
Title
Proactive Adversarial Defense
Publication Details
MILCOM 2025 - 2025 IEEE Military Communications Conference (MILCOM)
Resource Type
Conference proceeding
Conference
IEEE Military Communications Conference (MILCOM) (Los Angeles, California, USA, October 6–10, 2025)
Institute for Human and Machine Cognition; Center for Cybersecurity; Intelligent Systems and Robotics; Division of Academic Affairs; Hal Marcus College of Science and Engineering