RETRIEVALFORMER: TRANSFORMER-QUALITY RECOMMENDATIONS WITH EFFICIENT ANN RETRIEVAL AND COLD-START RESILIENCE

ICLR 2026 Conference Submission 21558 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: recommendation systems, two-tower models, transformers, user modeling, neural networks, deep learning, information retrieval
TL;DR: A flexible two-tower architecture that combines transformer-based user modeling with efficient retrieval, matching state-of-the-art performance while maintaining sub-millisecond inference latency.
Abstract: We propose RetrievalFormer, a novel two-tower neural recommender architecture that bridges the gap between transformer accuracy and retrieval efficiency for sequential recommendation. RetrievalFormer employs a transformer encoder to model user interaction sequences while using a lightweight item tower to encode items from their content features, enabling efficient approximate nearest neighbor (ANN) retrieval at serving time. The key innovation is an attention-based heterogeneous feature encoder that enriches both user and item representations by learning to weight and combine different feature modalities. By sharing embedding tables across towers and leveraging feature-rich representations, our model achieves three critical capabilities: (1) transformer-level recommendation accuracy while avoiding expensive full-catalog softmax computation, (2) immediate recommendation of new items without retraining, and (3) dramatic inference speedup through ANN search. On standard benchmarks (Amazon Beauty, Amazon Toys & Games, MovieLens-1M), RetrievalFormer achieves competitive performance, reaching 86-91% of established transformer baselines' Recall@20 while delivering up to 288× speedup at inference for large catalogs. In cold-start experiments with held-out items, RetrievalFormer successfully recommends completely unseen items while baseline models fail entirely. Our approach enables practical deployment of efficient recommendations at scale, offering a compelling trade-off between model accuracy and serving efficiency.
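The architecture described in the abstract maps onto a familiar two-tower pattern. The sketch below is illustrative only, not the authors' code: it assumes a PyTorch-style implementation, omits the paper's attention-based heterogeneous feature encoder, reduces the item tower to a small MLP over content features, and uses exact top-k scoring as a self-contained stand-in for the ANN index a real deployment would use. All class and parameter names are hypothetical.

```python
# Hypothetical two-tower sketch (not the authors' implementation): a transformer
# user tower over interaction sequences and a lightweight item tower over content
# features, scored by inner product so serving can precompute item vectors and
# retrieve with an ANN index.
import torch
import torch.nn as nn

class UserTower(nn.Module):
    def __init__(self, num_items, dim=64, layers=2, heads=2, max_len=50):
        super().__init__()
        self.item_emb = nn.Embedding(num_items + 1, dim, padding_idx=0)
        self.pos_emb = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)

    def forward(self, seq):
        # seq: (B, L) item ids, 0 = padding; assumes left-padding so the last
        # position holds the most recent interaction.
        pos = torch.arange(seq.size(1), device=seq.device)
        h = self.item_emb(seq) + self.pos_emb(pos)
        h = self.encoder(h, src_key_padding_mask=(seq == 0))
        return h[:, -1]                          # user vector

class ItemTower(nn.Module):
    def __init__(self, feat_dim, dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, feats):
        # feats: (N, feat_dim) content features; a brand-new item needs only
        # its features, which is what enables cold-start recommendation.
        return self.mlp(feats)

@torch.no_grad()
def retrieve(user_tower, item_vecs, seq, k=20):
    # Serving path: item_vecs is precomputed once from the item tower. A real
    # system would index it with an ANN library (e.g. FAISS or ScaNN); exact
    # top-k over the full matrix is used here only to keep the example runnable.
    u = user_tower(seq)                          # (B, dim)
    scores = u @ item_vecs.T                     # (B, N) inner-product scores
    return scores.topk(k, dim=-1).indices        # top-k item indices per user
```

Because the two towers interact only through an inner product, the item side can be refreshed independently of the user model, which is what allows new items to be recommended without retraining and scoring to be delegated to an ANN index at serving time.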
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 21558