TRUST – Transformer-Driven U-Net for Sparse Target Recovery

20 Sept 2025 (modified: 15 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Inverse Problem, Sparse recovery, Compressed sensing, Cross-domain reconstruction
TL;DR: We introduce TRUST, a hybrid Transformer–U-Net architecture that jointly learns unknown sensing operators and reconstructs signals, using attention to recover sparse support directly from measurements.
Abstract: Many inverse problems---from coded aperture optics to undersampled MRI---operate with unknown or poorly characterized sensing operators $\mathbf{A}$. Yet most sparse recovery methods assume $\mathbf{A}$ is precisely known, forcing costly calibration or restrictive acquisition protocols. We address the more realistic setting in which only a limited number of observation--target pairs $(\mathbf{y},\mathbf{x})$ are available, necessitating joint operator learning and signal reconstruction. The core challenge is cross-domain dispersion: local structures in the signal $\mathbf{x}$ are spread globally into the measurements $\mathbf{y} = \mathbf{A}\mathbf{x}$, whereas CNN architectures rely on local receptive fields. We propose TRUST, a hybrid model that uses multi-resolution attention to recover sparse support directly from measurements. Theoretically, under standard restricted isometry property (RIP) conditions on $\mathbf{A}$, we show that attention maps computed on $\mathbf{y}$ approximate those computed on the true signal $\mathbf{x}$, with error bounded by the RIP constant. Architecturally, a Vision Transformer encoder estimates the global sparse support from $\mathbf{y}$, and attention-guided skip connections steer a U-Net decoder to concentrate reconstruction capacity on support-consistent regions, coupling global context with local detail. TRUST thus resolves the mismatch between measurement dispersion and the locality bias of CNN-only approaches. Across optical imaging, FastMRI, and ImageNet benchmarks, it consistently surpasses strong baselines, both objectively and subjectively, with marked reductions in hallucination artifacts. These results establish attention-guided support estimation as a principled and effective approach to high-quality reconstruction while jointly learning unknown sensing operators, enabling robust performance on inverse problems where conventional methods require precise knowledge of the forward model.
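The RIP-based claim in the abstract rests on standard compressed-sensing facts; the sketch below records the two inequalities of this kind, as a hedged illustration rather than the paper's actual theorem. The exact constants and the precise form of the attention-map bound are those of the paper, not reproduced here.

```latex
% Sketch of standard RIP facts underlying the attention-approximation claim.
% RIP of order $s$ with constant $\delta_s \in (0,1)$: for every $s$-sparse $\mathbf{u}$,
\[
  (1-\delta_s)\,\|\mathbf{u}\|_2^2 \;\le\; \|\mathbf{A}\mathbf{u}\|_2^2 \;\le\; (1+\delta_s)\,\|\mathbf{u}\|_2^2 .
\]
% A standard consequence is near-preservation of inner products: for $\mathbf{u},\mathbf{v}$
% with $|\operatorname{supp}(\mathbf{u}) \cup \operatorname{supp}(\mathbf{v})| \le s$,
\[
  \bigl|\langle \mathbf{A}\mathbf{u}, \mathbf{A}\mathbf{v}\rangle - \langle \mathbf{u}, \mathbf{v}\rangle\bigr|
  \;\le\; \delta_s\,\|\mathbf{u}\|_2\,\|\mathbf{v}\|_2 ,
\]
% so query--key similarity scores computed from $\mathbf{y}=\mathbf{A}\mathbf{x}$ deviate from
% those computed on $\mathbf{x}$ by a term controlled by $\delta_s$, which is the flavor of
% bound the abstract refers to.
```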
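To make the architectural description concrete, here is a minimal PyTorch sketch of the measurement-to-signal pipeline the abstract outlines: tokenize the raw measurements, apply a transformer encoder for global context, predict a soft support map in the signal domain, and gate a decoder's output with it. All names (`TrustNet`, `support_head`) and dimensions are hypothetical; the actual TRUST model uses a Vision Transformer encoder and a full U-Net decoder with attention-guided skip connections, neither of which is reproduced here.

```python
# Hypothetical sketch, not the authors' code: a TRUST-style forward pass on 1-D measurements.
import torch
import torch.nn as nn


class TrustNet(nn.Module):
    """Measurements y -> transformer encoder -> support estimate -> support-gated decoder."""

    def __init__(self, m=256, n=1024, patch=16, d=128, heads=4, layers=2):
        super().__init__()
        assert m % patch == 0
        self.patch = patch
        self.embed = nn.Linear(patch, d)                     # tokenize raw measurements
        enc = nn.TransformerEncoderLayer(d_model=d, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.support_head = nn.Linear(d * (m // patch), n)   # soft support over signal coordinates
        self.decoder = nn.Sequential(                        # stand-in for the U-Net decoder
            nn.Linear(m + n, 4 * n), nn.ReLU(),
            nn.Linear(4 * n, n),
        )

    def forward(self, y):                                    # y: (B, m), sensing operator A unknown
        tok = self.embed(y.view(y.size(0), -1, self.patch))  # (B, m/patch, d) measurement tokens
        ctx = self.encoder(tok)                              # global attention across tokens
        support = torch.sigmoid(self.support_head(ctx.flatten(1)))   # estimated sparse support
        x_hat = self.decoder(torch.cat([y, support], dim=-1))        # support-conditioned decoding
        return support * x_hat                               # gate reconstruction onto the support


if __name__ == "__main__":
    y = torch.randn(8, 256)                                  # batch of raw measurements
    print(TrustNet()(y).shape)                               # torch.Size([8, 1024])
```

The gating step is the sketch's analogue of the attention-guided skip connections: the decoder's capacity is concentrated on coordinates the encoder deems support-consistent.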
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 23131