RETRIEVALFORMER: TRANSFORMER-QUALITY RECOMMENDATIONS WITH EFFICIENT ANN RETRIEVAL AND COLD-START RESILIENCE
Keywords: recommendation systems, two-tower models, transformers, user modeling, neural networks, deep learning, information retrieval
TL;DR: A flexible two-tower architecture that combines transformer-based user modeling with efficient ANN retrieval, approaching state-of-the-art accuracy while maintaining sub-millisecond inference latency.
Abstract: We propose RetrievalFormer, a transformer-based dual-encoder recommender architecture that combines accuracy competitive with strong transformer-based sequential baselines, efficient Approximate Nearest Neighbor (ANN) retrieval, and the ability to score feature-described items that are unseen during training. Our architecture uses an attention-based heterogeneous feature encoder that aggregates item and user attributes via shared embedding tables and an AttentionFusion module, so that the resulting user and item representations lie in a shared embedding space suitable for ANN search. On Amazon and MovieLens benchmarks, RetrievalFormer achieves competitive recommendation accuracy, reaching 86–91% of the Recall@20 of strong transformer-based sequential baselines while enabling up to 288× lower latency at 10M-item scale via ANN retrieval. On MovieLens-1M, RetrievalFormer attains a Recall@20 of 0.337. In cold-start experiments where entire items and all of their interactions are held out during training, RetrievalFormer successfully recommends completely unseen items from their features under a leave-one-out cold (LOOC) protocol with zero item leakage between training and evaluation, a setting in which ID-softmax transformer baselines cannot produce scores for such items at all; it also outperforms a strong content-based baseline on a 100% cold-start production dataset. Our approach enables practical deployment of efficient recommendations at scale, offering a compelling trade-off between model accuracy and serving efficiency.
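The dual-encoder retrieval pattern described in the abstract can be illustrated with a minimal sketch: two towers map user and item features into one shared vector space, every catalog item (including items never seen in training, since they are scored from features) is indexed, and recommendation reduces to a maximum-inner-product search. The encoders below are hypothetical stand-ins (random linear projections), not the paper's actual transformer and AttentionFusion modules, and the embedding dimension is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # shared embedding dimension (hypothetical; not specified in the abstract)

# Stand-ins for the two towers. In RetrievalFormer these would be the
# transformer-based user encoder and the attention-based heterogeneous
# item feature encoder; here each is a random linear projection.
W_user = rng.normal(size=(16, DIM))
W_item = rng.normal(size=(12, DIM))

def encode_user(user_features: np.ndarray) -> np.ndarray:
    v = user_features @ W_user
    return v / np.linalg.norm(v)  # unit-normalize: dot product == cosine similarity

def encode_item(item_features: np.ndarray) -> np.ndarray:
    v = item_features @ W_item
    return v / np.linalg.norm(v)

# Index a feature-described catalog. Cold-start items get embeddings too,
# because scoring depends only on features, not on a learned item-ID table.
catalog_features = rng.normal(size=(1000, 12))
index = np.stack([encode_item(f) for f in catalog_features])

def retrieve_top_k(user_features: np.ndarray, k: int = 20) -> np.ndarray:
    """Exact max-inner-product search over the item index. A production
    deployment would swap in an ANN library over the same vectors to get
    the sub-linear latency the paper reports."""
    scores = index @ encode_user(user_features)
    return np.argsort(-scores)[:k]

top = retrieve_top_k(rng.normal(size=16), k=20)
```

Because both towers emit vectors in the same normalized space, the exact search above can be replaced by any off-the-shelf ANN index without changing the encoders, which is the property that enables the latency gains the abstract describes.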
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 21558