In recent years, convolutional neural networks have demonstrated outstanding performance in image recognition tasks. However, they remain vulnerable to "adversarial samples": images that intentionally induce misclassification by introducing minute perturbations imperceptible to humans. Various defense methods have been proposed, including input preprocessing, adversarial training, and anomaly detection-based approaches.

Anomaly detection-based defenses primarily use autoencoders (AEs) or diffusion models to detect contaminated inputs or to remove the perturbations that cause misclassification. Diffusion models offer high performance but incur extremely high computational costs. Traditional AE-based defenses, such as the Defense-VAE proposed by Li et al., rely on the "reconstruction error" (the difference between input and output) for anomaly detection, or remove noise through reconstruction. Methods based on reconstruction error therefore struggle against sophisticated attacks such as the PGD attack of Madry et al., which causes misclassification while intentionally keeping the reconstruction error small. Furthermore, noise removal alone cannot completely neutralize strong perturbations.

This paper therefore focuses on a computationally efficient AE-based approach. We propose a novel two-stage defense method, "DSVDD-AE," which performs anomaly detection based on feature distances in the latent space, independently of the reconstruction error. In the first stage, the method employs Deep Support Vector Data Description (DSVDD) to detect anomalies using feature distances in the latent space; in the second stage, it removes noise via the AE from inputs that pass detection. This study defines defense success as either detecting an adversarial image as an anomaly or correctly classifying it after noise removal.
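The two-stage pipeline can be sketched as follows. This is a minimal, hypothetical illustration (the function names, the toy linear encoder/decoder, and the fixed radius are assumptions, not the paper's implementation): stage 1 rejects an input whose latent feature lies far from the DSVDD hypersphere center; stage 2 returns a denoised reconstruction of accepted inputs.

```python
import numpy as np

def dsvdd_score(z, c):
    """Squared Euclidean distance from latent feature z to the DSVDD center c."""
    return float(np.sum((z - c) ** 2))

def defend(x, encode, decode, c, radius):
    """Two-stage defense sketch: (is_anomaly, output).
    Stage 1: reject inputs whose latent distance exceeds the radius.
    Stage 2: denoise accepted inputs by reconstruction through the decoder."""
    z = encode(x)
    if dsvdd_score(z, c) > radius ** 2:
        return True, None      # detected as adversarial, no reconstruction
    return False, decode(z)    # passed detection, forward the denoised image

# Toy linear encoder/decoder for illustration only; in practice both
# come from a trained autoencoder and c is learned by DSVDD training.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 32)) * 0.1
encode = lambda x: W @ x
decode = lambda z: W.T @ z

c = np.zeros(8)                                    # assumed DSVDD center
x_clean = rng.standard_normal(32) * 0.1            # in-distribution input
x_adv = x_clean + rng.standard_normal(32) * 5.0    # heavily perturbed input

is_adv_clean, denoised = defend(x_clean, encode, decode, c, radius=1.0)
is_adv_attack, _ = defend(x_adv, encode, decode, c, radius=1.0)
```

Here the clean input passes detection and is returned denoised, while the strongly perturbed input is flagged in stage 1; a real deployment would calibrate the radius on held-out clean data.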
The effectiveness of the proposed method was verified on the CIFAR-10 dataset against white-box PGD attacks, achieving a 28% improvement in defense performance over Defense-VAE.
