Domain Adaptation for Cold-Start Users in Sequential Recommendation

23 Feb 2026 (modified: 05 May 2026) · Under review for TMLR · CC BY 4.0
Abstract: Sequential recommendation tracks users' preferences over time based on their historical activities and predicts their next most likely action. However, this approach struggles with cold-start users, whose minimal interaction data makes their preferences difficult to learn. To address this challenge, this paper treats regular users with longer interaction histories and cold-start users as two domains, and introduces domain adaptation techniques to narrow the performance gap caused by knowledge shifts between them. We propose a dual-transformer framework with separate models for long (source) and short (target) sequences, collaboratively trained with shared item embeddings. To enable effective knowledge transfer, we introduce an emulated target domain by sampling short sequences from the source, and apply contrastive learning to align their contextual representations. To further improve adaptation under complex knowledge shifts, we reduce item popularity bias and incorporate user similarity into the contrastive loss. Experiments on five public datasets show consistent improvements over strong baselines, demonstrating the robustness of our approach under both length shifts and compounded shifts involving item distribution changes. Our code can be found at https://anonymous.4open.science/r/DACSR-1.
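The following is a minimal sketch of the core mechanism the abstract describes, assuming a PyTorch implementation: a long source sequence is uniformly subsampled to emulate a cold-start user, and an InfoNCE-style contrastive loss aligns the source encoder's representation of the full sequence with the target encoder's representation of the short one. The function names (sample_short, contrastive_align), the temperature, and the exact loss form are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sample_short(seq, k):
    """Emulate a cold-start user: uniformly subsample k items from a
    long source sequence, keeping temporal order (assumed scheme)."""
    idx = torch.randperm(len(seq))[:k].sort().values
    return [seq[i] for i in idx.tolist()]

def contrastive_align(src_repr, tgt_repr, temperature=0.1):
    """InfoNCE-style loss aligning the source encoder's view of each
    full sequence with the target encoder's view of its emulated short
    version; the other users in the batch serve as negatives."""
    src = F.normalize(src_repr, dim=-1)     # (B, d) full-sequence reps
    tgt = F.normalize(tgt_repr, dim=-1)     # (B, d) short-sequence reps
    logits = src @ tgt.t() / temperature    # (B, B) cosine similarities
    labels = torch.arange(src.size(0), device=src.device)
    return F.cross_entropy(logits, labels)  # match user i with user i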
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We have revised the paper in the following key aspects; for ease of reference, the main revisions are highlighted in red.

1. We expanded the related work in Section 2 to better position our problem and method. We added a discussion of cross-domain sequential recommendation (CDSR), including EMCDR, CDRIB, AMID, HeroGraph, MIFN, and BiTGCF, and clarify that most CDSR methods rely on shared users, item-side knowledge, or auxiliary content, whereas our setting assumes disjoint source and target user populations. We also added a dedicated positioning paragraph explaining how DACSR differs from implicit combined training, transfer learning, meta-learning, contrastive sequential recommendation, and open-world CDSR. References are updated accordingly.
2. We clarified the role of the emulated target domain in Section 3.1. The revised text explains that it serves two purposes: length-invariant user alignment between each user's full source sequence and its simulated short sequence, and cross-user similarity transfer from reliable long-history source users to the short-sequence regime.
3. We added a discussion of alternative target-aware sampling in Section 3.3. While the main method uses uniform random subsampling, we mention a QUILT-inspired temporally stratified sampling extension.
4. We refined the popularity-weighted user representation in Section 3.4. We now explicitly define the bounded inverse-popularity weight, explain its range and maximum amplification ratio, and clarify that it rebalances frequent and rare items without allowing unstable weights or collapsing representations toward rare items (see the first sketch after this list).
5. We strengthened the experimental setup and baselines in Sections 4.1 and 4.2. We clarify that the random split should be interpreted as a controlled length-reduction experiment rather than a realistic temporal cold-start scenario. We also add CL4SRec as a contrastive-learning baseline and AMID as an open-world CDSR/domain-adaptation baseline that does not require fully shared users.
6. We updated the main performance tables and captions in Tables 2 and 3. We report CL4SRec and AMID results, clarify how the improvement is computed, and explain the tuning protocol for CL4SRec and the implementation choices for AMID. In Section 4.3 we also analyze why CL4SRec helps under the random split but is less consistent under the time-based split, and why AMID performs poorly when target users have only a few interactions.
7. We added full-item ranking evaluation under the time-based split in Tables 13 and 14 in Appendix E. These supplementary results show that the conclusions are consistent with those of the 100-negative evaluation protocol.
8. We expanded the ablation and model-variant discussion in Section 4.4. We clarify that DACSR++ is generally the strongest variant, with Kindle as an exception likely due to weaker source-domain user similarity, and support this with a user-similarity analysis in Appendix G. Table 6 is updated to include more baseline models.
9. We added a computational complexity analysis in Section 5, covering the cost of in-batch similarity computation, the dual-encoder parameter overhead, training-time complexity, empirical runtime overhead, inference cost, and additional hyperparameter tuning cost.
10. We added complementary experiments on ML-1M in Appendix H. We explain why ML-1M does not support a meaningful strict timestamp-based cold-start split, justify the use of a length-based split, and report full-item ranking results as a complementary short-history benchmark.
11. Finally, we added a QUILT-inspired target-aware sampling extension in Appendix I. This section describes how temporal segments are assigned different sampling probabilities (see the second sketch below), reports results under both random and time-based splits, and discusses the additional Bayesian-optimization tuning cost.
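To make revision item 4 concrete, here is a small sketch of one plausible bounded inverse-popularity weighting. The specific functional form, the clipping range, and the aggregation into a user representation are our illustrative assumptions, not the paper's definition; the point is only that clipping bounds the weights, so the maximum amplification of a rare item over a frequent one is capped at w_max / w_min (here 2x) and the weights cannot become unstable.

```python
import numpy as np

def inverse_popularity_weights(pop_counts, w_min=0.5, w_max=1.0):
    """Assumed bounded inverse-popularity weight: decays with item
    popularity, normalized to (0, 1], then clipped to [w_min, w_max]
    so rare items are up-weighted by at most w_max / w_min."""
    pop = np.asarray(pop_counts, dtype=float)
    raw = 1.0 / np.log(2.0 + pop)   # smooth decay with popularity
    raw = raw / raw.max()           # normalize to (0, 1]
    return np.clip(raw, w_min, w_max)

def user_representation(item_embs, pop_counts):
    """Illustrative aggregation: a popularity-reweighted mean of the
    item embeddings (shape (L, d)) in a user's sequence."""
    w = inverse_popularity_weights(pop_counts)  # (L,)
    w = w / w.sum()
    return (w[:, None] * item_embs).sum(axis=0)  # (d,)
```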
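And, for revision item 11, a sketch of what QUILT-inspired temporally stratified sampling could look like, under the assumption that segment-level probabilities decay geometrically toward older interactions; the segment count and decay factor are placeholders, and Appendix I tunes such probabilities via Bayesian optimization rather than fixing them.

```python
import numpy as np

def stratified_short_sample(seq, k, n_segments=4, decay=0.5):
    """Assumed temporally stratified sampling: split the sequence into
    n_segments equal temporal segments, weight recent segments higher,
    and draw k positions without replacement in temporal order."""
    n = len(seq)
    seg = (np.arange(n) * n_segments) // n          # segment id per position
    seg_w = decay ** (n_segments - 1 - seg)         # recent segments favored
    p = seg_w / seg_w.sum()
    idx = np.sort(np.random.choice(n, size=min(k, n), replace=False, p=p))
    return [seq[i] for i in idx]
```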
Assigned Action Editor: ~Yi_Zhou2
Submission Number: 7637