Keywords: Treatment rating, Unpaired preference learning, A/B testing, E-commerce content rating
TL;DR: We show how A/A data can be used to improve LLM ratings of treatment/content
Abstract: A/B testing to evaluate user preferences and engagement is a cornerstone of the modern digital landscape. However, in the current era, the feedback cycle has shortened considerably while the experimentation space has expanded significantly, necessitating novel and efficient ways to assess user engagement. A/A testing, which compares identical content variants, offers a complementary approach by establishing baselines for engagement metrics and characterizing the natural variability in user behavior. However, A/A tests inherently lack paired samples, limiting their direct applicability to standard preference alignment methods, which require positive and negative samples for the same context. To address this gap, we propose a novel utility theory framework that enables the integration of unpaired A/A data into content evaluation systems. By translating Large Language Model (LLM) rewards into a utility framework, our approach allows A/A test results to be incorporated into predictive models.
Submission Number: 662