Improving LLM Generation with Inverse and Forward Alignment: Reward Modeling, Prompting, Fine-Tuning, and Inference-Time Optimization

Published: 10 Oct 2024, Last Modified: 28 Oct 2024 · Sys2-Reasoning Poster · CC BY 4.0
Keywords: Inference-Time Optimization, Large Language Model Alignment
Abstract: Large Language Models (LLMs) are often characterized as samplers or generators in the literature, yet maximizing their capabilities in these roles is a complex challenge. Previous work has extensively explored diverse applications of LLMs across domains, including enhancing chat abilities, solving mathematical problems, serving as evaluators, generating synthetic data, improving Bayesian optimization, and designing reward functions for reinforcement learning. Despite these advancements, key methods for improving LLM performance, such as prompt optimization, in-context learning, supervised fine-tuning, and reinforcement learning from human feedback, are typically studied in isolation. In this work, we propose a unified optimization framework that encapsulates these diverse applications, providing a systematic approach to analyzing existing methods and uncovering potential improvements. We highlight that (1) while LLMs \textbf{can} perform a wide range of tasks, truly \textbf{mastering} these tasks requires alignment, suggesting that any use of LLMs can benefit from alignment beyond mere sampling; (2) reward modeling is crucial for enhancing the effectiveness of LLMs, offering the \textbf{only viable path for inference-time optimization}; and (3) the choice of reward model depends on the specific \textbf{task properties and dataset availability}, necessitating careful consideration in its design.
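To make claim (2) concrete, one common instance of reward-guided inference-time optimization is best-of-N sampling: draw several candidate generations and keep the one a reward model scores highest. The sketch below is illustrative only and not the paper's implementation; `generate` and `reward_model` are assumed stand-ins for an LLM sampler and a learned reward model.

```python
# Minimal best-of-N sketch of reward-guided inference-time optimization.
# `generate` and `reward_model` are illustrative stand-ins, not the paper's code:
# in practice they would wrap an LLM sampler and a learned reward model.
import random
from typing import Callable, List


def best_of_n(prompt: str,
              generate: Callable[[str], str],
              reward_model: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n candidate responses and return the one the reward model scores highest."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    scores = [reward_model(prompt, c) for c in candidates]
    best_idx = max(range(n), key=lambda i: scores[i])
    return candidates[best_idx]


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    def toy_generate(prompt: str) -> str:
        return prompt + " -> candidate #" + str(random.randint(0, 100))

    def toy_reward(prompt: str, response: str) -> float:
        # Pretend shorter responses are preferred; a real reward model would
        # score alignment with task-specific preferences.
        return -len(response)

    print(best_of_n("Solve 2 + 2", toy_generate, toy_reward, n=4))
```

The same selection loop extends to other inference-time schemes (e.g., reranking or search), with the reward model serving as the optimization objective.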
Submission Number: 39