Abstract

A Membership Inference Attack (MIA) against a machine learning model aims to infer whether a specific data point was included in the model's training dataset. The concept of MIA was first introduced in 2016 by Shokri et al., who proposed a black-box attack model. In 2019, Nasr et al. extended this concept by proposing a white-box attack model. For black-box attacks, Salem et al. relaxed the assumptions set by Shokri et al., introducing a more flexible MIA. However, their attack is limited in its effectiveness against models that output only predicted labels instead of confidence score vectors, as well as against defensive mechanisms such as MemGuard, proposed by Jia et al., which adds noise to confidence score vectors. To address these limitations, Choquette et al. proposed an MIA that remains effective against models that output only predicted labels and can bypass MemGuard's defense. Because it relies solely on output labels, Choquette et al.'s attack poses a greater threat. This paper investigates defense methods against the attack proposed by Choquette et al. and experimentally evaluates the robustness of specific defenses against their approach.
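To make the distinction in the abstract concrete, the following is a minimal illustrative sketch (not the authors' method or any cited attack) contrasting a confidence-based membership guess, which MemGuard-style noise can disrupt, with a label-only guess of the kind Choquette et al.'s attack builds on. All function names and the random stand-in data are hypothetical.

```python
import numpy as np

def confidence_threshold_attack(confidences, threshold=0.9):
    """Guess 'member' when the model's top softmax score exceeds a threshold.

    `confidences` is an (n_samples, n_classes) array of confidence vectors.
    This is the signal that MemGuard-style defenses perturb with noise.
    """
    return confidences.max(axis=1) > threshold

def label_only_attack(predicted_labels, true_labels):
    """Guess 'member' when the model predicts the true label correctly.

    Uses only the output label, so noise added to confidence scores
    does not change the attacker's input.
    """
    return predicted_labels == true_labels

# Hypothetical usage with random arrays standing in for a target model's outputs.
rng = np.random.default_rng(0)
scores = rng.dirichlet(np.ones(10), size=5)   # fake confidence vectors
labels = scores.argmax(axis=1)                # labels derived from the scores
truth = rng.integers(0, 10, size=5)           # fake ground-truth labels
print(confidence_threshold_attack(scores))
print(label_only_attack(labels, truth))
```

Because the second attack never reads the confidence vector, defenses that only perturb confidences leave it untouched, which is why the paper focuses on defenses against label-only attacks.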
