Learning reliably under adversarial attacks, distribution shifts and strategic agents

Published: 29 Sept 2025, Last Modified: 22 Oct 2025 · NeurIPS 2025 - Reliable ML Workshop · CC BY 4.0
Keywords: Robustly-reliable learning, adversarial robustness, distribution shift, instance-targeted attacks, per-point reliability guarantees
TL;DR: We present a visionary perspective on designing learners with strong theoretical reliability guarantees, surveying some recent foundational work and listing several exciting directions for future work.
Abstract: The impressive strengths of generative AI bring with them a host of new challenges, since the same tools are equally available to good and bad actors. For example, malicious agents can use them to create large numbers of unreliable online reviews with very little effort, enabling more aggressive targeted attacks on recommendation systems. Learning models that depend on such data must therefore provide strong reliability guarantees about their predictions to be at all useful. A recent line of work initiates this important direction by formalizing these reliability guarantees, together with tight upper and lower bounds on when they can be achieved. Initially developed in the context of poisoning attacks on the training data, these guarantees have since been extended by follow-up works to test-time adversarial attacks, distribution shifts, and strategic manipulations. We discuss several future directions for this line of research.
Submission Number: 145