Can Test-time Computation Mitigate Reproduction Bias in Neural Symbolic Regression?

TMLR Paper8196 Authors

31 Mar 2026 (modified: 28 May 2026) | Rejected by TMLR | Everyone | CC BY 4.0

Abstract: Mathematical expressions play a central role in scientific discovery. Symbolic regression aims to automatically discover such expressions from given numerical data. Recently, neural symbolic regression (NSR) methods that use Transformers pre-trained on synthetic datasets have gained attention for their fast inference, but they often perform poorly, especially with many input variables. In this study, we analyze NSR from both theoretical and empirical perspectives and show that (1) ordinary token-by-token generation may be ill-suited for NSR, as Transformers with insufficient model complexity cannot compositionally generate lengthy expressions while validating numerical consistency, and (2) the search space of NSR methods is greatly restricted by reproduction bias, where the majority of generated expressions are merely copied from the training data. We further examine whether tailored test-time strategies can reduce reproduction bias and show that providing additional information at test time effectively mitigates it. These findings contribute to a deeper understanding of the limitations of NSR approaches and provide guidance for designing more robust and generalizable methods.

Submission Type: Long submission (more than 12 pages of main content)

Assigned Action Editor: ~Zhangyang_Wang1

Submission Number: 8196
