User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction
Keywords: conversational AI, LLM, personalization, user modeling
TL;DR: We propose a pipeline-agnostic method that models users' preferences as vectors learned from weak feedback during interactions; results show that the learned vectors capture user preferences and improve retrieval.
Abstract: We present a frozen-backbone user modeling framework that
represents each user as a low-dimensional dual vector (long-term
and short-term) in a shared preference space, updated online from
weak scalar rewards via REINFORCE, without modifying any backbone
model. The framework is pipeline-agnostic: any feedback reducible
to a scalar reward can drive user-vector learning. Preferences are
extracted as structured condition--action rules, stored in a
retrieval-augmented memory, and the user vector modulates retrieval
scores to surface the most relevant preferences for each query.
We evaluate on \textsc{MultiSessionCollab}, an online multi-session
benchmark with LLM-simulated users who enforce rich style
preferences, across three task domains (math-hard, math-500,
bigcodebench) with $60$ user profiles over $60$ sessions each. Our
RAG+Vector agent achieves the highest task success ($55.2\%$) among
six system modes and significantly reduces interaction friction
versus a Reflection baseline: timeout rate drops by $2.4$\,pp
($p = 0.046$) and user effort by $6.7\%$ ($p = 0.021$), yielding
the highest interaction efficiency ($2.83$ successes per $1{,}000$
user tokens). Analysis of the learned vectors confirms that the
dual-vector design induces meaningful preference geometry: long-term
vectors significantly associate with cross-user preference overlap
($p = 0.006$), while short-term vectors do not ($p = 0.586$),
validating the separation of stable user identity from
session-specific context.
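
As a rough illustration of the mechanism described in the abstract (a dual user vector that modulates retrieval scores and is updated from a scalar reward via REINFORCE), the following minimal sketch may help. It is not the authors' implementation. Everything beyond those three ideas is an assumption made for the example: the names `DualUserVector`, `retrieval_probs`, and `reinforce_update`, the additive combination of long-term and short-term vectors, the softmax retrieval policy, the learning rates, and the per-session decay of the short-term vector.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class DualUserVector:
    """Hypothetical container for a long-term and a short-term user vector."""

    def __init__(self, dim, lr_long=0.01, lr_short=0.1, short_decay=0.5):
        # Assumed hyperparameters: the short-term vector adapts faster
        # and is decayed at the start of each session.
        self.long = [0.0] * dim
        self.short = [0.0] * dim
        self.lr_long = lr_long
        self.lr_short = lr_short
        self.short_decay = short_decay

    def combined(self):
        # Assumption: the two vectors are simply summed.
        return [l + s for l, s in zip(self.long, self.short)]

    def start_session(self):
        # Assumption: session-specific context fades between sessions.
        self.short = [s * self.short_decay for s in self.short]


def retrieval_probs(user, base_scores, item_embs):
    """Add a user-dependent term to each base retrieval score, then softmax."""
    u = user.combined()
    scores = [b + dot(u, e) for b, e in zip(base_scores, item_embs)]
    return softmax(scores)


def reinforce_update(user, probs, item_embs, chosen, reward, baseline=0.0):
    """One REINFORCE step on the user vectors; no backbone weights are touched.

    For a softmax policy with logits b_i + u . e_i, the gradient of
    log p(chosen) with respect to u is e_chosen - sum_i p_i * e_i.
    """
    dim = len(user.long)
    expected = [sum(p * e[d] for p, e in zip(probs, item_embs)) for d in range(dim)]
    advantage = reward - baseline
    for d in range(dim):
        grad = advantage * (item_embs[chosen][d] - expected[d])
        user.long[d] += user.lr_long * grad
        user.short[d] += user.lr_short * grad


# Usage: two stored preferences with toy embeddings and equal base scores.
user = DualUserVector(dim=2)
embs = [[1.0, 0.0], [0.0, 1.0]]
base = [0.0, 0.0]
probs = retrieval_probs(user, base, embs)
reinforce_update(user, probs, embs, chosen=0, reward=1.0)
probs_after = retrieval_probs(user, base, embs)
```

In this toy setup, a positive reward for the retrieved preference raises its retrieval probability on the next query, and the reward itself can come from any signal reducible to a scalar, which is the sense in which the abstract calls the framework pipeline-agnostic.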
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 86