Keywords: llm, finetuning, task mixtures, pointwise mutual information
Abstract: The performance of fine-tuned large language models (LLMs) hinges critically
on the composition of the training mixture. However, selecting an optimal blend
of task datasets remains a largely manual, heuristic-driven process, with practitioners
often relying on uniform or size-based sampling strategies. We introduce
TaskMixPGM, a principled and scalable framework for mixture optimization
that selects continuous task proportions by minimizing an energy function
over a Markov Random Field (MRF). Task relationships are modeled using behavioral
divergences, such as Jensen-Shannon Divergence and Pointwise Mutual Information, computed
from the predictive distributions of single-task fine-tuned models. Our
method yields a closed-form solution under simplex constraints and provably
balances representativeness and diversity among tasks. We provide theoretical
guarantees, including weak submodularity for budgeted variants, and demonstrate
consistent empirical improvements on Llama-2 and Mistral across evaluation suites
such as MMLU and BIG-Bench-Hard. Beyond performance, TaskMixPGM offers
interpretable insights into task influence and mixture composition, making it
a powerful tool for efficient and robust LLM fine-tuning.
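
As a rough illustration of the behavioral divergences mentioned in the abstract, the sketch below computes pairwise Jensen-Shannon divergence between the predictive distributions of several single-task models. It is not the authors' implementation: the function names `js_divergence` and `pairwise_jsd` are hypothetical, and it assumes each model has already been summarized as a single discrete probability vector over a shared label or vocabulary set. The abstract does not say how the distributions are aggregated, nor how the resulting matrix enters the MRF energy.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions.

    Assumes p and q are non-negative vectors of equal length with positive sum.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize so that each input sums to 1.
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # KL(a || b), skipping zero-probability entries of a (0 * log 0 = 0).
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pairwise_jsd(dists):
    """Symmetric matrix of JSD values for a list of task-level distributions."""
    n = len(dists)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = js_divergence(dists[i], dists[j])
    return out


# Hypothetical predictive distributions from three single-task models.
task_dists = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
]
D = pairwise_jsd(task_dists)
```

With natural logarithms the divergence is bounded above by ln 2, so entries of `D` near zero indicate behaviorally similar tasks and entries near ln 2 indicate nearly disjoint predictive behavior.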
Supplementary Material: pdf
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 23941