RETRIEVALFORMER: TRANSFORMER-QUALITY RECOMMENDATIONS WITH EFFICIENT ANN RETRIEVAL AND COLD-START RESILIENCE

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: recommendation systems, two-tower models, transformers, user modeling, neural networks, deep learning, information retrieval
TL;DR: A flexible two-tower architecture that combines transformer-based user modeling with efficient retrieval, approaching state-of-the-art accuracy while maintaining sub-millisecond inference latency.
Abstract: We propose RetrievalFormer, a transformer-based dual-encoder recommender architecture that combines accuracy competitive with strong transformer-based sequential baselines, efficient Approximate Nearest Neighbor (ANN) retrieval, and the ability to score feature-described items unseen during training. Our architecture uses an attention-based heterogeneous feature encoder that aggregates item and user attributes via shared embedding tables and an AttentionFusion module, so that the resulting user and item representations lie in a shared embedding space suitable for ANN search. On Amazon and MovieLens benchmarks, RetrievalFormer achieves competitive recommendation accuracy, reaching 86–91% of the Recall@20 of strong transformer-based sequential baselines while enabling up to 288× lower latency at 10M-item scale via ANN retrieval. On MovieLens-1M, RetrievalFormer attains a Recall@20 of 0.337. In cold-start experiments where entire items and all of their interactions are held out during training, RetrievalFormer successfully recommends completely unseen items from their features under a leave-one-out cold (LOOC) protocol with zero item leakage between training and evaluation, a setting in which ID-softmax transformer baselines cannot produce scores for such items at all; it also outperforms a strong content-based baseline on a 100% cold-start production dataset. Our approach enables practical deployment of efficient recommendation at scale, offering a compelling trade-off between model accuracy and serving efficiency.
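To make the retrieval path concrete, below is a minimal sketch of the two-tower pattern the abstract describes: items are embedded from their attributes (never from an item ID, which is what lets cold-start items be scored), and the resulting vectors are indexed for ANN search. The module internals here are assumptions, not the paper's actual code: AttentionFusion is rendered as learned-query attention pooling, the ANN backend is faiss (the paper does not name one), and the user vector is a stand-in for the output of the transformer user tower.

```python
# Hedged sketch of a two-tower retriever in the spirit of RetrievalFormer.
# All module definitions are hypothetical reconstructions from the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F
import faiss  # assumption: paper does not specify its ANN backend


class AttentionFusion(nn.Module):
    """Fuses a set of attribute embeddings into one vector by attending
    from a learned query token (one plausible reading of the paper's
    AttentionFusion module)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_attributes, dim)
        q = self.query.expand(feats.size(0), -1, -1)
        fused, _ = self.attn(q, feats, feats)
        return fused.squeeze(1)  # (batch, dim)


class ItemTower(nn.Module):
    """Embeds items purely from categorical attributes via shared
    embedding tables, so items unseen in training can still be scored."""
    def __init__(self, vocab_sizes, dim: int = 64):
        super().__init__()
        self.tables = nn.ModuleList(nn.Embedding(v, dim) for v in vocab_sizes)
        self.fusion = AttentionFusion(dim)

    def forward(self, attrs: torch.Tensor) -> torch.Tensor:
        # attrs: (batch, num_fields) integer attribute IDs
        feats = torch.stack(
            [tbl(attrs[:, i]) for i, tbl in enumerate(self.tables)], dim=1)
        return F.normalize(self.fusion(feats), dim=-1)


# --- offline indexing, then online retrieval ---
dim, vocab_sizes = 64, [1000, 50, 20]        # toy attribute vocabularies
item_tower = ItemTower(vocab_sizes, dim).eval()

catalog = torch.randint(0, 20, (10_000, 3))  # 10k items, 3 attribute fields
with torch.no_grad():
    item_vecs = item_tower(catalog).numpy().astype("float32")

index = faiss.IndexFlatIP(dim)               # exact inner product; swap for
index.add(item_vecs)                         # IVF/HNSW at 10M-item scale

# Stand-in for the transformer user tower's output embedding.
user_vec = F.normalize(torch.randn(1, dim), dim=-1).numpy().astype("float32")
scores, ids = index.search(user_vec, k=20)   # top-20 candidates
print(ids[0])
```

Two design points the sketch illustrates: because the item tower consumes only features, a LOOC-held-out item can be embedded and indexed at serving time without retraining; and because both towers emit normalized vectors in one shared space, the ranking step reduces to a single maximum-inner-product ANN query rather than a per-item forward pass, which is where the reported latency advantage comes from.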
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 21558