Mathematical Sciences Department Discrete Math Seminar - Jonathan Shafer, MIT
Tuesday, October 8
3:00pm – 3:50pm
Stratton Hall 205
Title: Mitigating Undetectable Backdoors in Machine Learning
Abstract:
As society grows more reliant on machine learning, ensuring the security of machine learning systems against sophisticated attacks becomes a pressing concern. A striking result of Goldwasser et al. (FOCS 2022) shows that an adversary can plant undetectable backdoors in machine learning models, allowing the adversary to covertly control the model’s behavior. The backdoors can be planted so that the backdoored model is computationally indistinguishable from an honest model without backdoors; Goldwasser et al. show undetectability in both black-box and white-box settings.
In this talk, I’ll discuss strategies for defending against undetectable backdoors. The main observation is that, while backdoors may be undetectable, in some cases it is possible to remove them without detecting them first. This idea goes back to early work on program self-correction and random self-reducibility. We show two types of results. First, for binary classification, we show a “global mitigation” technique, which removes all backdoors from a machine learning model under the assumption that the ground-truth labels are close to a decision tree or a Fourier-sparse function. Next, we consider regression where the ground-truth labels are close to a linear or polynomial function in R^n. Here, we show “local mitigation” techniques, which remove backdoors for specific inputs of interest and are computationally cheaper than global mitigation. All our constructions are black-box. Along the way, we prove a simple result on robust mean estimation.
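For intuition, the following is a minimal sketch of the self-correction idea in the linear-regression setting; it is not the construction from the talk, but the classic random self-reduction for linear functions that the abstract alludes to. It assumes the (possibly backdoored) model agrees with some linear function on all but a small fraction of inputs, and all names here (`self_correct_linear`, `model`, `num_trials`) are illustrative.

```python
import numpy as np

def self_correct_linear(model, x, num_trials=101, rng=None):
    """Evaluate a possibly backdoored model, assumed close to a linear
    function f(z) = <w, z> on most inputs, without detecting backdoors.

    Uses the random self-reduction for linear functions:
    f(x) = f(x + r) - f(r) for any shift r. Each trial queries the model
    only at fresh random points, so an input-triggered backdoor rarely
    fires; the median over trials discards the few corrupted answers.
    """
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(x, dtype=float)
    estimates = []
    for _ in range(num_trials):
        r = rng.standard_normal(x.shape)  # random shift hides x from the backdoor
        estimates.append(model(x + r) - model(r))
    return float(np.median(estimates))

# Demo with a contrived backdoor: the model is linear except on a trigger region.
w = np.array([2.0, -1.0, 0.5])

def backdoored(z):
    if abs(z[0] - 3.0) < 1e-6:   # hypothetical trigger condition
        return 1e9               # adversarially controlled output
    return float(w @ z)

x = np.array([3.0, 1.0, 1.0])                # this input hits the trigger
print(backdoored(x))                          # corrupted: 1e9
print(self_correct_linear(backdoored, x))     # recovers w @ x = 5.5
```

Because the queried points are random, they avoid the trigger region with high probability, so the backdoor is removed for the specific input of interest without ever being located; this is also why local mitigation of this kind can be cheaper than removing all backdoors globally.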
Joint work with Shafi Goldwasser, Vinod Vaikuntanathan, and Neekon Vafa.