Keywords: Fine-Grained RLHF, Learning from Label Proportions, Multiple Instance Learning, Fine-Grained Labels, Sentence Labels
Abstract: Fine-tuning of LLMs using RLHF / RLAIF has been shown to be a critical step in improving the performance of LLMs on complex generation tasks. In such methods, responses are typically sampled from LLMs and human or model feedback is provided at the response level. The feedback is then used to align the LLMs to prefer decoding paths that agree with the human feedback. Recent works (Amplayo et al. [2022], Wu et al. [2023]) indicate that sentence-level labels provide more accurate and interpretable feedback for LLM optimization. In this work, we propose FRACTAL, a suite of models that disaggregate response-level labels into sentence-level (pseudo-)labels through a Multiple Instance Learning (MIL) formulation, a novel use of prior information, and maximum likelihood calibration. We perform close to 2000 experiments across 6 datasets and 4 tasks, showing that FRACTAL can reach up to 93% of the performance of the fully supervised baseline while requiring only around 10% of the gold labels.
Furthermore, in a downstream evaluation, employing these sentence-level pseudo-scores in RLHF on the Question Answering task leads to 6% improved performance. Our work is the first to develop techniques that convert response-level feedback into sentence-level scores while leveraging sentence-level prior information, along with comprehensive evaluations on multiple tasks as well as end-to-end fine-tuning evaluation.
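To make the core idea concrete, below is a minimal sketch of the generic Multiple Instance Learning setup the abstract alludes to: a sentence-level scorer trained only from response-level (bag) labels, whose per-sentence probabilities can then serve as pseudo-labels. All names, dimensions, and the max-pooling aggregation here are illustrative assumptions, not the paper's FRACTAL method (which additionally uses prior information and maximum likelihood calibration).

```python
import torch
import torch.nn as nn

EMB_DIM = 16  # hypothetical sentence-embedding dimension, for illustration only


class SentenceScorer(nn.Module):
    """Scores each sentence independently; supervision comes only from the response (bag) label."""

    def __init__(self, emb_dim: int = EMB_DIM):
        super().__init__()
        self.head = nn.Linear(emb_dim, 1)

    def forward(self, sentence_embs: torch.Tensor) -> torch.Tensor:
        # sentence_embs: (num_sentences, emb_dim) -> per-sentence logits of shape (num_sentences,)
        return self.head(sentence_embs).squeeze(-1)


def bag_loss(sentence_logits: torch.Tensor, response_label: torch.Tensor) -> torch.Tensor:
    """One common MIL aggregation: the response is positive iff at least one sentence is,
    approximated with a max over per-sentence probabilities."""
    bag_prob = torch.sigmoid(sentence_logits).max()
    return nn.functional.binary_cross_entropy(bag_prob, response_label)


# Toy usage: random embeddings for a 3-sentence response with a positive response-level label.
model = SentenceScorer()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
sent_embs = torch.randn(3, EMB_DIM)
label = torch.tensor(1.0)

logits = model(sent_embs)
loss = bag_loss(logits, label)
loss.backward()
optimizer.step()

# After training, torch.sigmoid(logits) gives per-sentence scores usable as pseudo-labels.
```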
Submission Number: 71