Abstract: We propose a domain-adapted reward model that works alongside
an offline A/B testing system for evaluating ranking models. The
approach measures the reward of ranking model changes in large-scale
Ads recommender systems, where model-free methods such as inverse
propensity scoring (IPS) are not feasible. Our experiments show that
the proposed technique outperforms both the vanilla IPS method and
approaches that use non-generalized reward models.
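
To make the contrast between the two families of estimators concrete, the following is a minimal sketch of a vanilla IPS estimate next to a reward-model-based estimate. It is an illustration under assumptions, not the system described in this work: the function names `ips_estimate` and `direct_estimate`, the `reward_model` callable, and the toy inputs are all hypothetical. IPS reweights each logged reward by the ratio of target to logging propensities, so it requires a known, nonzero logging propensity for every logged action; the model-based estimate instead averages the rewards a learned model predicts for the actions the candidate policy would take.

```python
from typing import Callable, Hashable, Sequence


def ips_estimate(
    rewards: Sequence[float],
    logging_probs: Sequence[float],
    target_probs: Sequence[float],
) -> float:
    """Vanilla IPS: mean of reward * (target propensity / logging propensity)."""
    if not (len(rewards) == len(logging_probs) == len(target_probs)):
        raise ValueError("all inputs must have the same length")
    if len(rewards) == 0:
        raise ValueError("at least one logged sample is required")
    total = 0.0
    for reward, p_log, p_tgt in zip(rewards, logging_probs, target_probs):
        if p_log <= 0.0:
            # IPS is undefined when the logging policy never takes the action.
            raise ValueError("logging propensity must be positive")
        total += reward * (p_tgt / p_log)
    return total / len(rewards)


def direct_estimate(
    contexts: Sequence[Hashable],
    target_actions: Sequence[Hashable],
    reward_model: Callable[[Hashable, Hashable], float],
) -> float:
    """Model-based estimate: mean predicted reward of the target policy's actions.

    `reward_model` is a hypothetical stand-in for any learned reward predictor.
    """
    if len(contexts) != len(target_actions):
        raise ValueError("contexts and target_actions must have the same length")
    if len(contexts) == 0:
        raise ValueError("at least one context is required")
    predictions = [reward_model(c, a) for c, a in zip(contexts, target_actions)]
    return sum(predictions) / len(predictions)


# Toy data, invented purely for illustration.
toy_ips = ips_estimate(
    rewards=[1.0, 0.0, 1.0],
    logging_probs=[0.5, 0.5, 0.25],
    target_probs=[0.25, 0.5, 0.5],
)
toy_direct = direct_estimate(
    contexts=["query_a", "query_b"],
    target_actions=[0, 1],
    reward_model=lambda context, action: 0.2 if action == 0 else 0.6,
)
```

In this sketch the IPS estimate depends entirely on the logged propensities, while the quality of the model-based estimate depends on how well the reward model generalizes to actions the logging policy rarely took.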