Adversarial Attacks in Weight-Space Classifiers

TMLR Paper 6533 Authors

17 Nov 2025 (modified: 19 Nov 2025) · Under review for TMLR · CC BY 4.0
Abstract: Implicit Neural Representations (INRs) have recently garnered increasing interest across various research fields, mainly due to their ability to represent large, complex data in a compact and continuous manner. Past work has further shown that numerous popular downstream tasks can be performed directly in the INR parameter space, which can substantially reduce the computational resources required to process the represented data in their native domain. A major difficulty with modern machine-learning approaches is their high susceptibility to adversarial attacks, which has been shown to greatly limit the reliability and applicability of such methods in a wide range of settings. In this work, we show that parameter-space models trained for classification are inherently robust to adversarial attacks, without the need for any robust training. To support our claims, we develop a novel suite of adversarial attacks targeting parameter-space classifiers, and further analyze practical considerations of such attacks.
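To make the threat model concrete, the sketch below shows what one attack in such a suite could look like: a standard PGD-style perturbation applied directly to flattened INR weight vectors rather than to pixels. This is a minimal, hypothetical illustration; the classifier architecture (`WeightSpaceClassifier`), the hyperparameters (`eps`, `alpha`, `steps`), and the flattened-weight input format are all assumptions for illustration, not the paper's actual attack suite.

```python
# Hypothetical sketch: a PGD-style L-infinity attack on a weight-space classifier.
# All names and hyperparameters here are illustrative assumptions, not the
# paper's actual method.
import torch
import torch.nn as nn

class WeightSpaceClassifier(nn.Module):
    """Toy stand-in: an MLP that classifies flattened INR weight vectors."""
    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        return self.net(w)

def pgd_attack(model, w, label, eps=1e-2, alpha=2e-3, steps=10):
    """PGD applied to INR weights: ascend the loss, project into the eps-ball."""
    w_adv = w.clone().detach()
    for _ in range(steps):
        w_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(w_adv), label)
        grad, = torch.autograd.grad(loss, w_adv)
        with torch.no_grad():
            w_adv = w_adv + alpha * grad.sign()                # gradient ascent step
            w_adv = w + (w_adv - w).clamp(-eps, eps)           # project onto L-inf ball
    return w_adv.detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = WeightSpaceClassifier(in_dim=512, num_classes=10)
    w = torch.randn(4, 512)           # a batch of flattened INR weight vectors
    y = torch.randint(0, 10, (4,))
    w_adv = pgd_attack(model, w, y)
    print((w_adv - w).abs().max())    # perturbation stays within eps
```

One practical consideration such a sketch surfaces is that, unlike pixel-space attacks, perturbing weights within a fixed L-infinity budget can change the represented signal very unevenly, which is consistent with the paper's claim that parameter-space classifiers behave differently under attack than native-domain ones.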
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Guillermo_Ortiz-Jimenez1
Submission Number: 6533