Keywords: LLM; Recommendation; Generative Recommendation
Abstract: Recent advances in large language models (LLMs) have sparked a new line of research on *generative recommendation*, in which the recommender **autoregressively** generates a sequence of *Semantic IDs* (SIDs)—item identifiers that live in a structured SID space—rather than ranking a pre-selected candidate list of item titles in natural language.
Although the prevailing *supervised fine-tuning followed by reinforcement learning* (SFT-then-RL) pipeline improves performance, it still fails to model the SID space adequately:
1. **Superficial SID understanding.**
SFT often ends up memorising a closed SID vocabulary instead of learning its semantics.
2. **Coarse-grained rewards.**
Rule-based RL treats all incorrect SIDs equally, ignoring the varying difficulty of different errors.
To address these limitations, we propose **SINGER** (*SID-Navigated GEnerative Recommender*), a framework that injects fine-grained SID knowledge into every training phase.
SINGER consists of two key components:
1. **Full-Process SID Alignment**
Alignment objectives are embedded in both SFT and RL stages, deepening the model’s grasp of the SID space.
2. **SID-Navigated Reinforcement Learning**
- *SID-level rewards* grade each trajectory by the deepest correctly matched SID layer.
- *SID-prefix curriculum sampling* supplies partial SID prefixes as intermediate guidance for hard cases (both mechanisms are sketched below).
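
A minimal Python sketch of these two mechanisms, with hypothetical helper names and an assumed per-example difficulty score in [0, 1]; the paper's actual reward shaping and sampler may differ:

```python
# Hypothetical sketch of a SID-level reward and SID-prefix curriculum sampling.
# Function names, SID shapes, and the difficulty knob are illustrative
# assumptions, not the paper's actual implementation.
from typing import List


def sid_level_reward(pred_sid: List[int], target_sid: List[int]) -> float:
    """Grade a generated SID by its deepest correctly matched layer.

    A fully correct SID earns reward 1.0; a partially correct one earns a
    fraction proportional to how many leading (coarse-to-fine) layers match,
    so near-misses are rewarded more than completely wrong SIDs.
    """
    matched = 0
    for p, t in zip(pred_sid, target_sid):
        if p != t:
            break
        matched += 1
    return matched / len(target_sid)


def sample_sid_prefix(target_sid: List[int], difficulty: float) -> List[int]:
    """SID-prefix curriculum sampling: for harder cases (higher difficulty),
    reveal a longer prefix of the ground-truth SID as intermediate guidance."""
    prefix_len = round(difficulty * (len(target_sid) - 1))
    return target_sid[:prefix_len]


# Example with a 4-layer SID: the first two layers are predicted correctly.
print(sid_level_reward([3, 7, 1, 9], [3, 7, 4, 2]))    # 0.5
print(sample_sid_prefix([3, 7, 4, 2], difficulty=0.7))  # [3, 7]
```

The layer-wise reward gives partial credit to near-misses, which a coarse rule-based reward that treats all incorrect SIDs equally cannot do.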
Experiments on public benchmarks show that SINGER consistently outperforms strong sequential, generative, and recent LLM-based baselines across standard metrics, confirming the value of combining hierarchical SID signals with the world knowledge of pretrained LLMs.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 870