Reasoning-to-Encoder Distillation for Recommendation

07 Sept 2025 (modified: 04 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: recommender systems, large language models, distillation
TL;DR: To address slow and flawed LLM reasoning in recommendations, we generate oracle-guided reasoning and distill it into a Text Encoder. Our resulting encoder-only system is faster, cheaper, and more accurate than state-of-the-art methods.
Abstract: Large Language Models (LLMs) have significantly advanced recommendation systems by leveraging their extensive knowledge and reasoning skills. However, applying them to large-scale systems faces two main problems: prohibitive inference latency, especially in autoregressive models, and the generation of misaligned reasoning that is not grounded in actual user preferences. Existing distillation methods attempt to solve these problems but often fall short, either by failing to transfer the essential reasoning capabilities of LLMs or by distilling flawed, misaligned reasoning, which compromises the performance and reliability of the student model. To address these challenges, we introduce a new framework, Reasoning-to-Encoder Distillation (R2END), designed to transfer an LLM's complex reasoning into an efficient, embedding-based architecture. To ensure the distilled reasoning is grounded in actual user behavior, we employ an "oracle-guided" process in which the ground-truth item is provided to the LLM so that it generates well-aligned reasoning. This reasoning is then distilled into a text encoder, which learns to produce a "reasoning-infused" embedding from user history, eliminating the need for the LLM during inference. Extensive experiments on three benchmark datasets demonstrate that our method substantially outperforms state-of-the-art distillation-based methods in both accuracy and diversity of recommendations. Most importantly, R2END drastically reduces inference latency and computational costs, providing a practical and efficient approach to building scalable recommendation systems that benefit from the deep reasoning capabilities of LLMs.
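The abstract's core mechanism, aligning a student text encoder's embedding of raw user history with the embedding of oracle-guided teacher reasoning, can be sketched as a simple distillation objective. This is a minimal illustration only: the function name, the choice of a cosine-based loss, and the toy embeddings are assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of an R2END-style distillation objective. The teacher
# LLM, shown the ground-truth ("oracle") item, produces reasoning text; the
# student text encoder is trained so that its embedding of the user history
# approaches the embedding of that reasoning. Loss form and names are assumed.
import numpy as np

def cosine_distill_loss(student_emb: np.ndarray, teacher_emb: np.ndarray) -> float:
    """Return 1 - cosine similarity between student and teacher embeddings."""
    s = student_emb / np.linalg.norm(student_emb)
    t = teacher_emb / np.linalg.norm(teacher_emb)
    return float(1.0 - s @ t)

# Toy vectors standing in for real encoder outputs.
rng = np.random.default_rng(0)
teacher = rng.normal(size=8)                   # embedding of oracle-guided reasoning
student = teacher + 0.1 * rng.normal(size=8)   # student encoding of user history

loss = cosine_distill_loss(student, teacher)
print(f"{loss:.4f}")
```

At inference time only the student encoder runs, which is what eliminates the LLM's autoregressive latency.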
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 2703