Abstract: This paper analyzes $\ell_1$ regularized linear regression under the challenging scenario of having only adversarially corrupted data for training. Firstly, we prove that existing deterministic adversarial attacks (e.g., FGSM and its variants), which focus on maximizing the loss function, can be easily handled with a few samples for support recovery. Hence, we consider a more general, challenging stochastic adversary that can be conditionally dependent on the uncorrupted data, and we show that existing models attacking the support (Goodfellow et al., 2014; Madry et al., 2017) or the Huber model (Prasad et al., 2018) are particular cases of our adversarial model. This enables us to show the counter-intuitive result that an adversary can influence sample complexity by corrupting the ``irrelevant features'', i.e., the non-support. Secondly, as any adversarially robust algorithm has limitations, our theoretical analysis identifies that the dependence (covariance) between the adversarial perturbations and the uncorrupted data plays a critical role in defining the regimes under which this challenging adversary or Lasso dominates the other. Thirdly, we derive a necessary condition for support recovery for any algorithm (not restricted to Lasso), which corroborates our theoretical findings for Lasso. Fourthly, we identify the fundamental limits and address the critical scientific question of which parameters (i.e., mutual incoherence, the maximum and minimum eigenvalues of the covariance matrix, and the budget of adversarial perturbations) govern the high or low probability of success of the Lasso algorithm. Moreover, the derived sample complexity is logarithmic with respect to the size of the regression parameter vector. Our theoretical claims are validated by empirical analysis.
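The abstract's central phenomenon can be illustrated with a minimal sketch (not the paper's algorithm or attack model): Lasso fitted by iterative soft-thresholding (ISTA) recovers the true support on clean data, while an adversary that perturbs only the *non-support* columns in a way conditionally dependent on the uncorrupted response can break support recovery. All dimensions, the regularization weight `lam`, the perturbation scale, and the thresholds below are hypothetical choices for demonstration.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, iters=2000):
    """Minimize (1/2n)||Xw - y||^2 + lam*||w||_1 via ISTA (illustrative)."""
    n, p = X.shape
    lr = n / (np.linalg.norm(X, 2) ** 2)  # 1/L, L = lambda_max(X^T X / n)
    w = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - lr * grad, lr * lam)
    return w

rng = np.random.default_rng(0)
n, p, k = 200, 50, 3
w_star = np.zeros(p)
w_star[:k] = [2.0, -1.5, 1.0]            # true support = {0, 1, 2}
X = rng.standard_normal((n, p))          # well-conditioned Gaussian design
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Clean data: Lasso recovers the support.
w_clean = lasso_ista(X, y, lam=0.1)
support_clean = set(np.flatnonzero(np.abs(w_clean) > 0.05))

# Stochastic adversary: corrupt only the non-support columns with a
# perturbation correlated with y (conditionally dependent on clean data).
X_adv = X.copy()
X_adv[:, k:] += 0.5 * y[:, None]
w_adv = lasso_ista(X_adv, y, lam=0.1)
support_adv = set(np.flatnonzero(np.abs(w_adv) > 0.05))
```

On this toy instance `support_clean` equals the true support, whereas `support_adv` picks up corrupted non-support coordinates, mirroring the abstract's claim that corrupting ``irrelevant features'' can hurt recovery.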
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Removed text is shown struck through in red. Added text is in blue.
Assigned Action Editor: ~Robert_Legenstein1
Submission Number: 3075