Decoding-based Regression

Published: 09 Aug 2025, Last Modified: 09 Aug 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: Language models have recently been shown to be capable of performing regression, wherein numeric predictions are represented as decoded strings. In this work, we provide theoretical grounds for this capability and furthermore investigate the utility of causal sequence decoding models as numeric regression heads given any feature representation. We find that, despite being trained in the usual way (for next-token prediction via cross-entropy loss), decoder-based heads are as performant as standard pointwise heads when benchmarked over standard regression tasks, while being flexible enough to capture smooth numeric distributions, such as in the task of density estimation.
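To make the setup concrete, below is a minimal sketch of a decoding-based regression head, not the paper's or the linked repository's implementation. It assumes a fixed-width, digit-level serialization of the scalar target and a small GRU decoder conditioned on a feature vector, trained purely with next-token cross-entropy; names such as `NumericDecoderHead` and `serialize` are hypothetical.

```python
# Hypothetical sketch of a decoding-based regression head (not the paper's API).
# A scalar target y is serialized into a fixed-width digit string, and a small
# autoregressive decoder conditioned on a feature vector is trained with the
# usual next-token cross-entropy loss.
import torch
import torch.nn as nn

VOCAB_CHARS = "0123456789.-"                 # digit-level vocabulary (assumption)
STOI = {c: i for i, c in enumerate(VOCAB_CHARS)}
BOS = len(VOCAB_CHARS)                       # start-of-sequence token id
VOCAB_SIZE = len(VOCAB_CHARS) + 1
SEQ_LEN = 8                                  # fixed-width serialization, e.g. "-003.140"

def serialize(y: float) -> torch.Tensor:
    """Render y as a fixed-width string and map characters to token ids."""
    s = f"{y:08.3f}"[:SEQ_LEN]               # assumes |y| < 1000 for this toy example
    return torch.tensor([BOS] + [STOI[c] for c in s])

class NumericDecoderHead(nn.Module):
    """Tiny GRU decoder emitting digit tokens, conditioned on a feature vector."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, hidden)
        self.init_h = nn.Linear(feat_dim, hidden)    # features -> initial decoder state
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, VOCAB_SIZE)

    def forward(self, feats: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # Teacher forcing: predict token t+1 from tokens <= t and the features.
        h0 = torch.tanh(self.init_h(feats)).unsqueeze(0)
        emb = self.embed(tokens[:, :-1])
        hidden, _ = self.gru(emb, h0)
        return self.out(hidden)                      # logits for target positions 1..T

# Toy usage: regress y = sum(x) from random features using cross-entropy only.
torch.manual_seed(0)
head = NumericDecoderHead(feat_dim=4)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    x = torch.randn(32, 4)
    y = x.sum(dim=1)
    tokens = torch.stack([serialize(v.item()) for v in y])
    logits = head(x, tokens)
    loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), tokens[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At inference time, such a head would decode a token string autoregressively and parse it back into a float; sampling multiple decodes yields an empirical distribution over numeric outputs, which is what lets a decoder head represent smooth predictive densities rather than a single point estimate.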
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Updated with the camera-ready version, following the Action Editor's requested changes. Specifically:
* Added Section 5, "Discussion: Limitations and Extensions".
* Table 1 now reports values to two decimal places.
* Figure 4 now includes all four regression heads.
* Figure 5 points to the full results in Appendix A3, Figure 13; Figure 6 is added to also compare decoders vs. Riemann.
* Figure 7 points to Appendix A1, Figure 11 for full results; the unnormalized decoder is explained as not being included in these experiments.
Code: https://github.com/google-research/optformer/tree/main/optformer/decoding_regression
Assigned Action Editor: ~Andreas_Lehrmann1
Submission Number: 4806