Short-form Video Needs Long-term Interests: An Industrial Solution for Serving Large User Sequence Models
Abstract: Sequential models are invaluable for powering personalized recommendation systems. In the context of short-form video (SFV) feeds, where user behavior histories are typically longer, systems must be able to understand users’ long-term interests. However, deploying large sequence models in web-scale applications is challenging because of their high serving cost. To address this, we propose an industrial framework designed for efficiently serving large user sequence models. Specifically, the proposed infrastructure decouples the serving of the user sequence model from that of the main recommendation model: the user sequence model is served offline (asynchronously) with periodic refresh. The proposed infrastructure is also model-agnostic; thus, it can support any type of user sequence model (even LLMs) at a controllable cost. Empirical results show that large user models deployed with our framework significantly and consistently enhance the quality of the main recommendation model with a minimal increase in serving cost.
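The decoupling described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: the names `EmbeddingStore`, `offline_refresh`, and `online_score` are hypothetical, the user model is a toy function, and a dot product stands in for the main recommendation model. The point is only that the expensive sequence model runs in a periodic offline job, while the online path reads a cached embedding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Embedding = List[float]


@dataclass
class CachedEmbedding:
    vector: Embedding
    refreshed_at: float  # time of the last offline refresh


class EmbeddingStore:
    """Hypothetical key-value store shared by the offline and online paths."""

    def __init__(self) -> None:
        self._data: Dict[str, CachedEmbedding] = {}

    def put(self, user_id: str, vector: Embedding, now: float) -> None:
        self._data[user_id] = CachedEmbedding(vector, now)

    def get(self, user_id: str) -> Optional[CachedEmbedding]:
        return self._data.get(user_id)


def offline_refresh(
    store: EmbeddingStore,
    histories: Dict[str, Sequence[int]],
    user_model: Callable[[Sequence[int]], Embedding],
    now: float,
) -> None:
    """Periodic job: run the (expensive) user sequence model off the request path."""
    for user_id, history in histories.items():
        store.put(user_id, user_model(history), now)


def online_score(
    store: EmbeddingStore,
    user_id: str,
    candidate: Embedding,
    default: Embedding,
) -> float:
    """Request path: only a cache lookup, never a call to the user sequence model."""
    cached = store.get(user_id)
    user_vec = cached.vector if cached is not None else default
    # Stand-in for the main recommendation model (assumption): a dot product.
    return sum(u * c for u, c in zip(user_vec, candidate))


def toy_user_model(history: Sequence[int]) -> Embedding:
    """Toy placeholder for any sequence model; the store does not care which one."""
    return [float(len(history)), float(sum(history))]


store = EmbeddingStore()
offline_refresh(store, {"u1": [1, 2, 3]}, toy_user_model, now=0.0)
score_known = online_score(store, "u1", [1.0, 0.5], default=[0.0, 0.0])
score_unknown = online_score(store, "u2", [1.0, 0.5], default=[0.0, 0.0])
```

Because the online path depends only on the stored vector, swapping `toy_user_model` for a larger model changes the cost of the offline job but not the per-request cost, which is the model-agnostic property the abstract claims. The fallback to a default vector for users without a cached embedding is a design choice of this sketch.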