Sniper GMMs: Structured Gaussian mixtures poison ML on large n small p data with high efficacy

Anonymous

19 Oct 2020 (modified: 05 May 2023). Submitted to NeurIPSW 2020: DL-IG.
Keywords: Structured mixture distribution learning, data poisoning, modified EM, distance correlation, Gaussian mixtures
Abstract: We propose a method for structured learning of Gaussian mixtures that have low KL divergence from target mixture models, which in turn model the raw data. We show that samples drawn from these structured distributions are highly effective and evasive when used to poison the training datasets of popular machine learning pipelines such as neural networks, XGBoost, and random forests. Such attacks are especially destructive given the current trend toward distributed machine learning, in which several untrusted client devices provide their data to servers and cloud service providers for privacy-preserving distributed machine learning. In current machine learning practice, Gaussian mixtures are perceived as an older, classical technique, although they are still actively studied from a theoretical perspective. It is therefore notable that, when learned with the right structural constraints, they can be highly effective at performing data poisoning attacks on complex ML pipelines.
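The abstract judges a learned mixture by its KL divergence from a target mixture, but KL divergence between two Gaussian mixtures has no closed form. A common workaround is a Monte Carlo estimate, KL(p || q) = E_p[log p(x) - log q(x)], averaged over samples drawn from p. The sketch below assumes diagonal covariances and uses a hypothetical `DiagGMM` class and `mc_kl` helper, neither of which comes from the paper. It does not reproduce the authors' structured, modified-EM learning procedure. It only illustrates how the divergence in question can be estimated and how synthetic points can be sampled from a mixture.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGMM:
    """Hypothetical diagonal-covariance Gaussian mixture (illustrative only)."""
    weights: List[float]          # mixing proportions, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # per-dimension variances, one list per component

    def log_pdf(self, x: Sequence[float]) -> float:
        # log sum_k w_k N(x; mu_k, diag(var_k)), combined with log-sum-exp for stability
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            lp = math.log(w)
            for xi, m, v in zip(x, mu, var):
                lp += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            terms.append(lp)
        top = max(terms)
        return top + math.log(sum(math.exp(t - top) for t in terms))

    def sample(self, rng: random.Random) -> List[float]:
        # Choose a component according to its weight, then draw each coordinate independently
        k = rng.choices(range(len(self.weights)), weights=self.weights)[0]
        return [rng.gauss(m, math.sqrt(v)) for m, v in zip(self.means[k], self.variances[k])]


def mc_kl(p: DiagGMM, q: DiagGMM, n: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of KL(p || q) using n samples drawn from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = p.sample(rng)
        total += p.log_pdf(x) - q.log_pdf(x)
    return total / n
```

As a sanity check, for two unit-variance one-dimensional Gaussians with means 0 and 1, the estimate should land close to the closed-form value of 0.5. The same `sample` method is how one would draw synthetic points from a fitted mixture. The estimator is unbiased but noisy, so its standard error shrinks roughly as 1/sqrt(n).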