Bridging Crypto with ML-based Solvers: the SAT Formulation and Benchmarks

Xinhao Zheng; Xinhao Song; Bolin Qiu; Yang Li; Zhongteng Gui; Junchi Yan

Published: 18 Sept 2025, Last Modified: 30 Oct 2025
NeurIPS 2025 Datasets and Benchmarks Track poster
License: CC BY 4.0

Keywords: SAT, ANF, CNF, Cryptanalysis

TL;DR: This study proposes a comprehensive benchmark for evaluating ML-based SAT solvers on cryptographic problems.

Abstract: The Boolean Satisfiability Problem (SAT) plays a crucial role in cryptanalysis, enabling tasks such as key recovery and distinguisher construction. Conflict-Driven Clause Learning (CDCL) has emerged as the dominant paradigm in modern SAT solving, and machine learning is increasingly integrated with CDCL-based solvers to tackle complex cryptographic problems. However, the lack of a unified evaluation framework, inconsistent input formats, and varying modeling approaches hinder fair comparison. Moreover, cryptographic SAT instances differ structurally from standard SAT problems, and the absence of standardized datasets further complicates evaluation. To address these issues, we introduce SAT4CryptoBench, the first comprehensive benchmark for assessing machine learning–based solvers in cryptanalysis. SAT4CryptoBench provides diverse SAT datasets in both Arithmetic Normal Form (ANF) and Conjunctive Normal Form (CNF), spanning various algorithms, numbers of rounds, and key sizes. Our framework evaluates three levels of machine learning integration: standalone distinguishers for instance classification, heuristic enhancement for guiding solving strategies, and hyperparameter optimization for adapting to specific problem distributions. Experiments show that ANF-based networks consistently outperform CNF-based networks in learning cryptographic features. Nonetheless, current ML techniques struggle to generalize across algorithms and instance sizes, and their computational overhead can offset the benefits on simpler cases. Despite this, ML-driven optimization strategies notably improve solver efficiency on cryptographic SAT instances. Finally, we propose BASIN, a bitwise solver that takes plaintext-ciphertext bitstrings as inputs. Its superior performance on high-round problems highlights the importance of input modeling and the advantage of direct input representations for complex cryptographic structures.
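The abstract contrasts ANF and CNF encodings of the same cryptographic instance. The following is a minimal sketch of one textbook ANF-to-CNF translation, offered as a rough illustration of why the two forms differ structurally. It introduces Tseitin-style auxiliary variables for nonlinear monomials and cuts long XORs into short pieces. This is an assumption for illustration only and is not necessarily the conversion used to build SAT4CryptoBench. The polynomial representation, the function names (`anf_to_cnf`, `xor_to_cnf`, `xor_direct`, `and_to_cnf`), and the cutting length `cut=4` are all hypothetical.

```python
from itertools import product

# Hypothetical representation (not the SAT4CryptoBench file format):
# a polynomial over GF(2) is a list of monomials, each monomial an iterable of
# 1-based variable indices; the empty monomial is the constant 1. Each
# polynomial is asserted to equal 0.


def xor_direct(vars_, rhs):
    """Direct CNF for XOR(vars_) == rhs: one clause forbids each wrong-parity assignment."""
    clauses = []
    for bits in product((0, 1), repeat=len(vars_)):
        if sum(bits) % 2 != rhs:
            # This clause is falsified exactly by the assignment `bits`.
            clauses.append([-v if b else v for v, b in zip(vars_, bits)])
    return clauses


def xor_to_cnf(vars_, rhs, new_var, cut=4):
    """Split long XORs with auxiliary variables so each piece has at most `cut` variables."""
    assert cut >= 3
    clauses = []
    vars_ = list(vars_)
    while len(vars_) > cut:
        head, vars_ = vars_[: cut - 1], vars_[cut - 1 :]
        y = new_var()
        clauses += xor_direct(head + [y], 0)  # encodes y == XOR(head)
        vars_ = [y] + vars_
    return clauses + xor_direct(vars_, rhs)


def and_to_cnf(t, inputs):
    """Tseitin clauses for t <-> AND(inputs)."""
    return [[-t, x] for x in inputs] + [[t] + [-x for x in inputs]]


def anf_to_cnf(equations, num_vars, cut=4):
    """Return (clauses, total_vars) using DIMACS-style signed-integer literals."""
    next_var = num_vars

    def new_var():
        nonlocal next_var
        next_var += 1
        return next_var

    monomial_var = {}  # shared across equations so each monomial gets one variable
    clauses = []
    for poly in equations:
        # Reduce mod 2: x*x = x via frozenset, and repeated monomials cancel.
        terms = set()
        for mono in poly:
            terms ^= {frozenset(mono)}
        one = frozenset()
        rhs = 1 if one in terms else 0  # c + sum(m) = 0  <=>  XOR(m) = c
        lits = []
        for mono in sorted(terms - {one}, key=sorted):
            if len(mono) == 1:
                lits.extend(mono)  # linear term: use the variable itself
            else:
                if mono not in monomial_var:
                    monomial_var[mono] = new_var()
                    clauses += and_to_cnf(monomial_var[mono], sorted(mono))
                lits.append(monomial_var[mono])
        clauses += xor_to_cnf(lits, rhs, new_var, cut)
    return clauses, next_var
```

For example, `anf_to_cnf([[[1, 2], [3], []]], 3)` encodes x1·x2 + x3 + 1 = 0. It introduces one auxiliary variable for x1·x2 and yields five clauses over four variables. A direct CNF encoding of an n-variable XOR needs 2^(n-1) clauses, so long XORs are cut. One compact ANF equation can therefore turn into many clauses plus auxiliary variables, which is one concrete way the CNF view of a cipher diverges from its ANF view.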

Croissant File: json

Dataset URL: https://www.kaggle.com/datasets/sclear7/satcryptobench

Code URL: https://github.com/void-zxh/SAT4CryptoBench

Primary Area: Dataset and Benchmark for Optimization (e.g., convex and non-convex, stochastic, robust, metrics for optimization, scaling of datasets, benchmarks)

Submission Number: 1364