Choice is what matters after Attention
Abstract: The decoding strategies most widely used in large language models (LLMs) today are Top-$p$ Sampling and Top-$k$ Sampling, both of which sit between greedy decoding and pure random sampling. Inspired by the concept of loss aversion from prospect theory in behavioral economics, and by the endowment effect highlighted by Richard H. Thaler, laureate of the 2017 Nobel Memorial Prize in Economic Sciences, in particular the principle that "the negative utility of an equivalent loss is approximately twice the positive utility of a comparable gain", we develop a new decoding strategy called Loss Sampling. We demonstrate the effectiveness and validity of our method on several LLMs, including Llama-2, Llama-3, and Mistral. Our approach improves text quality by 4-30\% across four pure-text tasks while maintaining diversity in text generation. Furthermore, we extend our method to multimodal large models and to Beam Search, where it yields improvements of 1-10\%, demonstrating the effectiveness and versatility of Loss Sampling.
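As context for the baselines named in the abstract, the following is a minimal sketch of standard Top-$k$ and Top-$p$ (nucleus) sampling, the two strategies that Loss Sampling is positioned against; it does not implement Loss Sampling itself, since the abstract does not specify its algorithm. The function names (`top_k_top_p_filter`, `sample_next_token`) and the example cutoff values are illustrative assumptions, not taken from the paper.

```python
import torch

def top_k_top_p_filter(logits: torch.Tensor, top_k: int = 0, top_p: float = 1.0) -> torch.Tensor:
    """Mask logits outside the Top-k set and/or the Top-p (nucleus) set.

    logits: 1-D tensor of unnormalized next-token scores.
    Returns a copy with filtered positions set to -inf.
    """
    logits = logits.clone()
    if top_k > 0:
        # Keep only the k highest-scoring tokens.
        kth_value = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_value] = float("-inf")
    if top_p < 1.0:
        # Keep the smallest prefix of tokens whose cumulative probability
        # reaches top_p, always retaining the single most likely token.
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        remove = cum_probs > top_p
        remove[1:] = remove[:-1].clone()
        remove[0] = False
        logits[sorted_idx[remove]] = float("-inf")
    return logits

def sample_next_token(logits: torch.Tensor, top_k: int = 50, top_p: float = 0.9) -> int:
    """Draw one token id from the filtered next-token distribution."""
    filtered = top_k_top_p_filter(logits, top_k=top_k, top_p=top_p)
    probs = torch.softmax(filtered, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

Both filters interpolate between greedy decoding (keep only the argmax token) and full random sampling (keep every token), which is the design space the abstract says Loss Sampling also occupies.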
Submission Number: 109