Responsible Imputation of User Behavior Surveys via Mask-Aware Transformers

Aman Shukla; Rishabh Kumar; Daniel Patrick Scantlebury

Published: 29 Sept 2025, Last Modified: 22 Oct 2025
Venue: NeurIPS 2025 - Reliable ML Workshop
License: CC BY 4.0

Keywords: Missing Data, Fairness Study, Imputation

TL;DR: A masked Transformer model imputes sparse survey responses with high accuracy and fairness under real-world constraints.

Abstract: User behavior data collected through surveys is foundational to applications in AdTech, personalization, and consumer intelligence. However, the structured nature of survey fielding, governed by routing logic, platform constraints, and user fatigue, results in pervasive missingness that is non-random and logic-driven. These gaps hinder the effectiveness of downstream systems that rely on user representations. We present a Transformer-based framework for imputing missing responses in multi-choice behavioral survey data. Our model encodes survey responses as flattened multi-hot vectors with associated binary masks indicating which questions were fielded. Through column-wise attention and mask-aware supervision, the model learns high-fidelity imputations while honoring routing logic. To enforce plausibility, we apply strict logical enforcement that filters predictions based on domain-aligned consistency rules. Empirically, we evaluate imputation performance under synthetic masking across increasing sparsity levels, demonstrating robust F1 and recall even in highly incomplete settings. Our ablation studies confirm the importance of structured attention and supervision masking. We further conduct a responsible imputation audit, assessing fairness across age, gender, and ethnicity in terms of both model fit and outcome parity. The results reveal stable performance across subgroups, indicating suitability for equitable industrial deployment. Our approach closes a critical gap between modeling sophistication and real-world deployment constraints in survey data pipelines, setting a precedent for responsible and scalable imputation.
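The encoding, mask-aware supervision, and logical enforcement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy survey in `QUESTIONS`, the function names `encode`, `masked_bce`, and `enforce_routing`, the 0.5 threshold, and the single routing rule are all hypothetical, and the Transformer itself is omitted (the predicted probabilities are simply passed in).

```python
import math

# Hypothetical toy survey (not from the paper): question -> answer options.
QUESTIONS = {
    "owns_car": ["yes", "no"],
    "car_brand": ["brand_a", "brand_b", "brand_c"],
}

def encode(responses):
    """Flatten answers into a multi-hot vector plus a per-slot fielded mask.

    `responses` maps each fielded question to the set of chosen options;
    questions absent from the dict are treated as not fielded (mask = 0).
    """
    values, mask = [], []
    for question, options in QUESTIONS.items():
        fielded = question in responses
        chosen = responses.get(question, set())
        for option in options:
            values.append(1.0 if option in chosen else 0.0)
            mask.append(1.0 if fielded else 0.0)
    return values, mask

def masked_bce(probs, targets, mask, eps=1e-7):
    """Binary cross-entropy averaged over fielded slots only."""
    total, count = 0.0, 0.0
    for p, t, m in zip(probs, targets, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += m * -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
        count += m
    return total / count if count else 0.0

def enforce_routing(probs, threshold=0.5):
    """Assumed rule: car_brand is implausible unless owns_car=yes is predicted."""
    out = list(probs)
    if out[0] < threshold:          # slot 0 is owns_car=yes
        for i in range(2, 5):       # slots 2..4 are the car_brand options
            out[i] = 0.0
    return out

# Respondent was only asked the first question.
values, mask = encode({"owns_car": {"yes"}})
loss = masked_bce([0.5, 0.5, 0.99, 0.99, 0.99], values, mask)
filtered = enforce_routing([0.2, 0.8, 0.9, 0.1, 0.3])
```

In this sketch the unfielded `car_brand` slots contribute nothing to the loss, so the model is never penalized for predictions on questions the respondent was not shown, and the rule filter zeroes out predictions that contradict the assumed routing logic.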

Submission Number: 121
