Proactive Adversarial Defense: Harnessing Prompt Tuning in Vision-Language Models to Detect Unseen Backdoored Images
Conference proceeding   Peer reviewed

Kyle Stein, Andrew Arash Mahyari, Guillermo Francia and Eman El-Sheikh
MILCOM 2025 - 2025 IEEE Military Communications Conference (MILCOM)
IEEE Military Communications Conference
IEEE Military Communications Conference (MILCOM) (Los Angeles, California, USA, 10/06/2025–10/10/2025)
10/06/2025
Web of Science ID: WOS:001708627800027

Abstract

Backdoor attacks pose a critical threat by embedding hidden triggers into inputs, causing models to misclassify them into adversary-chosen target labels. While extensive research has focused on mitigating these attacks in object recognition models through weight fine-tuning and other reactive strategies, much less attention has been given to detecting backdoored samples directly. Given the vast datasets used to train models, manual inspection for backdoor triggers is impractical, and even state-of-the-art defense mechanisms fail to fully neutralize their impact. To address this gap, we introduce a method to detect unseen backdoored images during both training and inference. Building on the success of prompt tuning in Vision-Language Models (VLMs), our approach trains learnable text prompts to differentiate clean images from those with hidden backdoor triggers. Comprehensive experiments on CIFAR-10 and GTSRB covering six diverse attack families demonstrate the robustness of our detector. When exposed to unseen backdoor threats, the learned prompts achieve an average accuracy of 86% at distinguishing backdoored images from clean ones, outperforming baselines by up to 30 percentage points. These results establish prompt-tuned VLMs as an effective first line of defense against backdoor threats. Code and datasets will be available.
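
The core idea in the abstract, learning text-side prompts against a frozen image encoder so that images are scored as clean or backdoored, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes synthetic unit-norm vectors in place of real VLM image embeddings, collapses the text encoder so that each prompt is a single learnable vector, and uses an invented "trigger" direction; the names `make_features`, `train_prompts`, `predict`, `accuracy`, and `run_demo` are hypothetical. The held-out samples carry the same synthetic trigger as the training samples, so the sketch does not reproduce the unseen-attack evaluation described above.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_features(rng, n, dim, with_trigger):
    """Synthetic stand-in for frozen image-encoder outputs (unit norm)."""
    feats = []
    for _ in range(n):
        v = [rng.gauss(0.0, 0.5) for _ in range(dim)]
        v[0] += 2.0          # shared "image content" direction (toy assumption)
        if with_trigger:
            v[1] += 4.0      # synthetic "trigger" direction (toy assumption)
        feats.append(normalize(v))
    return feats


def train_prompts(feats, labels, dim, epochs=100, lr=0.1, scale=5.0, seed=0):
    """Learn two prompt embeddings; the image features are never updated."""
    rng = random.Random(seed)
    # Index 0 plays the role of the "clean" prompt, index 1 the "backdoored" one.
    prompts = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(2)]
    n = len(feats)
    for _ in range(epochs):
        grads = [[0.0] * dim for _ in range(2)]
        for f, y in zip(feats, labels):
            probs = softmax([scale * dot(p, f) for p in prompts])
            for k in range(2):
                # Gradient of the mean cross-entropy loss w.r.t. prompt k.
                g = scale * (probs[k] - (1.0 if k == y else 0.0)) / n
                for j in range(dim):
                    grads[k][j] += g * f[j]
        for k in range(2):
            for j in range(dim):
                prompts[k][j] -= lr * grads[k][j]
    return prompts


def predict(prompts, f):
    """Return 1 (backdoored) if that prompt scores higher, else 0 (clean)."""
    return 1 if dot(prompts[1], f) > dot(prompts[0], f) else 0


def accuracy(prompts, feats, labels):
    hits = sum(predict(prompts, f) == y for f, y in zip(feats, labels))
    return hits / len(labels)


def run_demo(dim=8, n_train=100, n_test=100, seed=0):
    rng = random.Random(seed)
    train = make_features(rng, n_train, dim, False) + make_features(rng, n_train, dim, True)
    train_y = [0] * n_train + [1] * n_train
    test = make_features(rng, n_test, dim, False) + make_features(rng, n_test, dim, True)
    test_y = [0] * n_test + [1] * n_test
    prompts = train_prompts(train, train_y, dim)
    return accuracy(prompts, test, test_y)


if __name__ == "__main__":
    print(f"held-out detection accuracy: {run_demo():.2f}")
```

In an actual prompt-tuning setup, the learnable parameters would typically be context token embeddings passed through a frozen text encoder, and the similarity would be computed against real image embeddings; the sketch keeps only the structure of "frozen image features, trainable prompts, similarity-based classification".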
