Multiple Choice Learning of Low-Rank Adapters for Language Modeling

Victor Letzelter; Hugo Malard; Mathieu Fontaine; Gaël Richard; Slim Essid; Andrei Bursuc; Patrick Perez

Published: 30 Apr 2026, Last Modified: 24 Jun 2026. Venue: ICML 2026 regular. Readers: Everyone. License: CC BY-NC-ND 4.0

TL;DR: We propose LoRA-MCL, a training scheme that extends next-token prediction in language models with a method designed to decode diverse, plausible sentence continuations at inference time.

Abstract: We propose LoRA-MCL, a training scheme that extends next-token prediction in language models with a method designed to decode diverse, plausible sentence continuations at inference time. Traditional language modeling is an intrinsically ill-posed problem: given a context, multiple "futures" may be equally plausible. Our approach leverages Multiple Choice Learning (MCL) and the winner-takes-all loss to efficiently handle ambiguity through Low-Rank Adaptation. We provide a theoretical interpretation of applying MCL to language modeling, assuming the data is generated from a mixture of distributions. We illustrate the proposed approach using mixtures of Markov chains. We then demonstrate with experiments on audio and visual captioning, as well as machine translation, that our method achieves high diversity and relevance in generated outputs. We release the code for applying LoRA-MCL to a wide range of language models.

Lay Summary: Predicting what a person will say next or describing the content of an audio or visual scene with text is difficult, if not impossible, to do with perfect accuracy. When the context is not informative enough, external factors may lead to different scenarios of plausible text continuations. To address this, we trained multiple versions of a language model, each specializing in a different type of answer. This is done by creating a competition between the models during training, where only the version that performs best on a given example is updated. To keep this computationally affordable, rather than training entirely separate models, we only adjust a small, targeted portion of each model's parameters. We tested this on tasks such as describing sounds, captioning images, and translating text, showing that each specialized model captures a different aspect of the answer, together producing outputs that are both high quality and diverse.
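
The competition described above can be illustrated with a minimal sketch of a winner-takes-all selection step. This is not the released LoRA-MCL code: the function names, the toy probabilities, and the use of a plain sequence-level negative log-likelihood as the score are all assumptions made for illustration. Each "hypothesis" stands in for one low-rank adapter, and in actual gradient-based training only the winner's loss would be backpropagated.

```python
import math


def sequence_nll(token_probs):
    """Negative log-likelihood of one sequence from its per-token probabilities."""
    return -sum(math.log(p) for p in token_probs)


def winner_takes_all(per_hypothesis_token_probs):
    """Pick the hypothesis with the lowest loss on one training example.

    per_hypothesis_token_probs[k][t] is the probability that hypothesis k
    (hypothetically, one low-rank adapter) assigns to the t-th reference token.
    Returns (winner_index, winner_loss); only the winner would be updated.
    """
    losses = [sequence_nll(probs) for probs in per_hypothesis_token_probs]
    winner = min(range(len(losses)), key=losses.__getitem__)
    return winner, losses[winner]


# Toy example (made-up numbers): three hypotheses scoring the same
# three-token reference continuation.
toy_probs = [
    [0.9, 0.8, 0.7],  # hypothesis 0 fits this example well
    [0.2, 0.3, 0.1],  # hypothesis 1 fits it poorly
    [0.5, 0.5, 0.5],  # hypothesis 2 is in between
]
winner, loss = winner_takes_all(toy_probs)
print(f"winner: hypothesis {winner}, loss: {loss:.4f}")
```

Because each example updates only the hypothesis that already explains it best, different hypotheses can drift toward different modes of the data, which is the specialization effect the summary describes.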

Originally Submitted Supplementary Material: zip

Link To Code: https://github.com/Victorletzelter/LoRA-MCL

Primary Area: Probabilistic Methods

Keywords: Multiple Choice Learning, Winner-takes-all, Diversity, Ambiguity, Language Modeling, Low-rank adapters

Originally Submitted PDF: pdf

Submission Number: 18304
