Making Hard Problems Easier with Custom Data Distributions and Loss Regularization: A Case Study in Modular Arithmetic

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We introduce two techniques, varying the diversity of training data and regularizing the loss function, to improve transformer learning on modular arithmetic, cryptanalysis, and other synthetic tasks.
Abstract: Recent work showed that ML-based attacks on Learning with Errors (LWE), a hard problem used in post-quantum cryptography, outperform classical algebraic attacks in certain settings. Although promising, ML attacks struggle to scale to more complex LWE settings. Prior work connected this issue to the difficulty of training ML models to do modular arithmetic, a core feature of the LWE problem. To address this, we develop techniques that significantly boost the performance of ML models on modular arithmetic tasks—enabling the models to sum up to $N=128$ elements modulo $q \le 974269$. Our core innovation is the use of custom training data distributions and a carefully designed loss function that better represents the problem structure. As an initial proof of concept, we apply our techniques to LWE itself and find that they enable recovery of secrets twice as hard as those recovered in prior work. Our techniques also help ML models learn other well-studied problems better, including copy, associative recall, and parity, motivating further study.
Lay Summary: Recently, researchers have found that machine learning (ML) models can be trained to solve hard math problems that are used in cryptography to keep information secure. However, these models still struggle to do modular arithmetic, which is a core part of these math problems. In this work, we develop methods that improve model performance on modular arithmetic. Namely, we train the model on a curated mixture of easy and hard problems while also penalizing the model for predicting the same output for every input. We show that these methods extend beyond arithmetic, both to assessing the security of existing cryptographic systems and to improving performance on other well-studied problems in ML.
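The two ideas in the summary above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names (`sample_example`, `uniformity_penalty`), the zeroing probability `p_zero` as the knob for mixing easy and hard sums, and the entropy-based form of the penalty are all assumptions made for exposition.

```python
import math
import random
from collections import Counter

def sample_example(N=128, q=974269, p_zero=0.9):
    # Hypothetical "curated mixture": zero out most coefficients with
    # probability p_zero, so most training sums involve only a few
    # nonzero terms (easy) while some involve many (hard).
    xs = [random.randrange(q) if random.random() > p_zero else 0
          for _ in range(N)]
    return xs, sum(xs) % q  # (input list, target sum mod q)

def uniformity_penalty(predictions):
    # Hypothetical regularizer: penalize batches where the model
    # predicts the same output for every input. Returns 1.0 when all
    # predictions agree and 0.0 when they are maximally diverse.
    n = len(predictions)
    if n <= 1:
        return 1.0
    counts = Counter(predictions)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(n)
```

A collapsed model that outputs a single value gives `uniformity_penalty([5, 5, 5, 5]) == 1.0`, while fully distinct predictions give `0.0`; added to the main loss with some weight, this term discourages the constant-output failure mode the summary describes.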
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/facebookresearch/arithmetic
Primary Area: Applications
Keywords: transformers, modular arithmetic, math, cryptography
Submission Number: 14087