Learning Parametric Distributions from Samples and Preferences

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 Spotlight Poster · License: CC BY-SA 4.0
TL;DR: Preference feedback can significantly enhance parameter estimation in continuous parametric distributions, achieving faster convergence rates and lower variance compared to sample-only methods.
Abstract: Recent advances in language modeling have underscored the role of preference feedback in enhancing model performance. This paper investigates the conditions under which preference feedback improves parameter estimation in classes of continuous parametric distributions. In our framework, the learner observes pairs of samples from an unknown distribution along with their relative preference, which depends on the same unknown parameter. We show that preference-based M-estimators achieve a lower asymptotic variance than sample-only M-estimators, and that deterministic preferences improve it further. Leveraging the hard constraints revealed by deterministic preferences, we propose an estimator whose estimation error scales as $\mathcal{O}(1/n)$, a significant improvement over the $\Theta(1/\sqrt{n})$ rate attainable with samples alone. We then establish a lower bound matching this accelerated rate, up to problem-dependent constants. While the assumptions underpinning our analysis are restrictive, they are satisfied by notable cases such as Gaussian or Laplace distributions with preferences based on the log-probability reward.
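To illustrate how deterministic preferences yield hard constraints, here is a minimal Python sketch, assuming a Gaussian location model $\mathcal{N}(\theta, 1)$ with preferences given by the log-probability reward; this is an illustrative construction under those assumptions, not the paper's exact estimator. For a Gaussian, preferring $x$ over $x'$ is equivalent to $(x - x')(\theta - (x + x')/2) > 0$, so each pair confines $\theta$ to a half-line through the pair's midpoint, and intersecting $n$ such constraints gives an interval whose width shrinks at rate $\mathcal{O}(1/n)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setting (assumed here): Gaussian location model N(theta, 1)
# with deterministic preferences from the log-probability reward, i.e.
# x is preferred to x' iff log p_theta(x) > log p_theta(x').
theta_true = 1.3
n = 10_000  # number of preference pairs

x = rng.normal(theta_true, 1.0, size=n)
x_prime = rng.normal(theta_true, 1.0, size=n)

# Deterministic preference: which sample has higher log-likelihood,
# i.e. which sample is closer to theta.
prefer_x = (x - theta_true) ** 2 < (x_prime - theta_true) ** 2

# Each pair yields a one-sided (hard) constraint on theta through its
# midpoint: x preferred to x'  <=>  (x - x') * (theta - (x + x')/2) > 0.
mid = (x + x_prime) / 2.0
lower = mid[(prefer_x & (x > x_prime)) | (~prefer_x & (x_prime > x))]
upper = mid[(prefer_x & (x < x_prime)) | (~prefer_x & (x_prime < x))]

# Intersecting all constraints gives an interval containing theta whose
# width shrinks at rate O(1/n); its center is a valid estimator.
lo = lower.max() if lower.size else -np.inf
hi = upper.min() if upper.size else np.inf
theta_hat_pref = (lo + hi) / 2.0

# Sample-only baseline: the MLE (sample mean) converges at rate 1/sqrt(n).
theta_hat_mle = np.concatenate([x, x_prime]).mean()

print(f"interval width    : {hi - lo:.2e}")  # ~ O(1/n)
print(f"pref-based error  : {abs(theta_hat_pref - theta_true):.2e}")
print(f"sample-mean error : {abs(theta_hat_mle - theta_true):.2e}")
```

With $n = 10{,}000$ pairs, the preference-based error typically lands one to two orders of magnitude below the sample-mean error, consistent with the $\mathcal{O}(1/n)$ versus $\Theta(1/\sqrt{n})$ separation described in the abstract.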
Lay Summary: Many AI systems today learn from examples, but newer methods increasingly use preferences, that is, comparisons between options, to improve learning. We asked: when and how can preference feedback help models learn faster? In our study, we built a framework where a learner sees pairs of examples and is told which one is better. We found that this kind of feedback can significantly sharpen the model's estimates, especially when the preferences are consistent and deterministic. In the best case, learning becomes much faster: the error drops far more quickly than when using examples alone. We also proved that this speed-up is the best possible under certain conditions. While our results rely on specific assumptions, these hold in important practical cases, such as when AI models use scores based on how likely something is to happen.
Link To Code: https://github.com/tml-epfl/learning-parametric-distributions-from-samples-and-preferences
Primary Area: Theory->Learning Theory
Keywords: Statistical learning, Continuous parametric distributions, Preference feedback, Estimation error rate
Submission Number: 6981