Keywords: Behavioral choice prediction, Progressive Dual-Head Transformer, Urban Mobility, Tabular data
Abstract: Many applications require joint prediction of interdependent behavioral choices, yet existing models often treat each choice independently (e.g., through parallel prediction heads), overlooking the influence of one on the other. In this work, we propose Progressive Dual-Head Transformer (PDFormer), a novel framework that performs two-step prediction: the model first estimates one choice and then conditions the second on this upstream estimate through an explicit head-to-head pathway. A shared encoder captures the common structure of the two prediction tasks, while the dual-head module explicitly reflects cross-choice dependence. A gated residual mechanism integrated into the embedding layer and the dual-head modules further improves training stability and prediction performance.
Extensive experiments on an urban mobility behavioral choice dataset and a real-world manufacturing dataset demonstrate that PDFormer consistently outperforms state-of-the-art machine learning models, deep tabular models, and parallel-head Transformer variants across multiple metrics. Moreover, our ablation study confirms that both the proposed progressive dual-head design and the gated residual mechanism are key contributors to the observed gains across the prediction tasks.
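The two-step prediction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single tanh layer standing in for the shared Transformer encoder, the layer sizes, the form of the gate, and all names (`progressive_dual_head`, `gated_residual`, `params`) are assumptions made for illustration only. It shows the first head producing a distribution over choice 1, that estimate being projected back into the hidden space, and the second head predicting choice 2 from a gated blend of the shared representation and the projected estimate.

```python
import math
import random

def linear(x, W, b):
    """Affine map; W is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_residual(x, fx, Wg, bg):
    """Blend x with its transformation fx using a gate g in (0, 1).

    Assumed form: g = sigmoid(Wg [x; fx] + bg), out = g * fx + (1 - g) * x.
    """
    g = [sigmoid(v) for v in linear(x + fx, Wg, bg)]  # x + fx concatenates lists
    return [gi * f + (1.0 - gi) * xi for gi, f, xi in zip(g, fx, x)]

def init(rows, cols, rng):
    """Random weights and zero biases (hypothetical initialisation)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return W, [0.0] * rows

def progressive_dual_head(x, params):
    # Stand-in for the shared encoder (a Transformer in the paper).
    h = [math.tanh(v) for v in linear(x, *params["enc"])]
    # Step 1: the first head estimates choice 1 from the shared representation.
    p1 = softmax(linear(h, *params["head1"]))
    # Head-to-head pathway: project the upstream estimate into the hidden space.
    msg = [math.tanh(v) for v in linear(p1, *params["path"])]
    # Step 2: the second head sees a gated blend of h and the upstream signal.
    h2 = gated_residual(h, msg, *params["gate"])
    p2 = softmax(linear(h2, *params["head2"]))
    return p1, p2

# Hypothetical sizes: 4 input features, hidden width 6, 3 and 2 classes.
d_in, d, k1, k2 = 4, 6, 3, 2
rng = random.Random(0)
params = {
    "enc": init(d, d_in, rng),
    "head1": init(k1, d, rng),
    "path": init(d, k1, rng),
    "gate": init(d, 2 * d, rng),
    "head2": init(k2, d, rng),
}
p1, p2 = progressive_dual_head([0.2, -1.0, 0.5, 1.5], params)
```

A parallel-head baseline would instead compute `p2` directly from `h`, so the second prediction would never see the first. In this sketch the gate lets the model interpolate between that baseline behaviour (gate near 0) and full reliance on the upstream estimate (gate near 1).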
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 18435