Decoding-based Regression

TMLR Paper4806 Authors

09 May 2025 (modified: 19 Jun 2025) · Under review for TMLR · CC BY 4.0
Abstract: Language models have recently been shown to be capable of performing regression, wherein numeric predictions are represented as decoded strings. In this work, we provide theoretical grounds for this capability and further investigate the utility of causal sequence decoding models as numeric regression heads given any feature representation. We find that, despite being trained in the usual way (for next-token prediction via cross-entropy loss), decoding-based heads are as performant as standard pointwise heads on standard regression benchmarks, while being flexible enough to capture smooth numeric distributions, such as in the task of density estimation.
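To make the setup concrete, below is a minimal, hypothetical sketch of what a decoding-based regression head could look like: the target y is tokenized into a fixed-length digit string, and a small autoregressive decoder conditioned on an arbitrary feature vector is trained with standard next-token cross-entropy. The tokenization scheme and all names here (`tokenize`, `DecodingRegressionHead`) are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

def tokenize(y: torch.Tensor, num_digits: int = 4, base: int = 10) -> torch.Tensor:
    """Assumed tokenization: y in [0, 1) -> fixed-length base-10 digit string,
    e.g. 0.3721 -> [3, 7, 2, 1]."""
    digits = []
    frac = y.clone()
    for _ in range(num_digits):
        frac = frac * base
        d = frac.floor().clamp(0, base - 1)
        digits.append(d.long())
        frac = frac - d
    return torch.stack(digits, dim=-1)  # (batch, num_digits)

class DecodingRegressionHead(nn.Module):
    """Autoregressive digit decoder conditioned on a feature vector."""
    def __init__(self, feat_dim: int, base: int = 10, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(base + 1, hidden)  # +1 for a BOS token
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.init = nn.Linear(feat_dim, hidden)      # conditions on features
        self.out = nn.Linear(hidden, base)
        self.base = base

    def forward(self, feats: torch.Tensor, digit_tokens: torch.Tensor) -> torch.Tensor:
        # Teacher forcing: prepend BOS, predict each next digit.
        bos = torch.full_like(digit_tokens[:, :1], self.base)
        inputs = torch.cat([bos, digit_tokens[:, :-1]], dim=1)
        h0 = self.init(feats).unsqueeze(0)            # (1, batch, hidden)
        hidden, _ = self.rnn(self.embed(inputs), h0)
        return self.out(hidden)                       # (batch, seq, base) logits

# Training step: the usual next-token cross-entropy over digit tokens.
feats = torch.randn(32, 16)   # any feature representation
y = torch.rand(32)            # targets in [0, 1)
tokens = tokenize(y)
head = DecodingRegressionHead(feat_dim=16)
logits = head(feats, tokens)
loss = nn.functional.cross_entropy(logits.reshape(-1, 10), tokens.reshape(-1))
loss.backward()
```

At inference, one would decode digits autoregressively (greedily or by sampling) and convert the string back to a number; repeated sampling yields an empirical predictive distribution over y, which is what would let such a head perform density estimation rather than only point prediction.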
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:

Following discussion with Reviewer tEwp:
* Change abstract to use "numeric regression heads", which is a standard term
* Intro explicitly mentions "output head" as representing $p(y|x)$
* More precision on contributions
* Change wording from "decoder" to "decoding head" to emphasize we're only ablating regression heads

Following discussion with Reviewer zAuL:
* Made abstract more precise on our contributions
* More explicit naming/defining of regression heads
* Small notation fixes to theory
* Small changes to theoretical implications
* Figure 6 x-labels included
Assigned Action Editor: ~Andreas_Lehrmann1
Submission Number: 4806