Keywords: AutoML, meta-learning, multi-objective optimization, model selection, reinforcement learning, large language models
TL;DR: LAMPS is a novel, open-source multi-objective AutoML framework that quickly identifies and produces fine-tuned, near-Pareto-optimal pretrained language models for a task-specific dataset.
Abstract: Selecting a pretrained large language model (LLM) to fine-tune for a task-specific dataset can be time-consuming and costly. With several candidate models to choose from, varying in size, architecture, and pretraining data, finding the best one often involves extensive trial and error. Moreover, the "best" model is not necessarily the one with the lowest test loss, as practical considerations such as deployment cost, inference throughput, and limited search budgets can also play crucial roles. To address this, we introduce LAMPS (LAnguage Model Pareto Selection), a novel, open-source multi-objective AutoML framework that quickly identifies near-Pareto-optimal pretrained LLMs for a task-specific dataset. It rests on two key ideas: (1) landmark fine-tuning, which generates early performance indicators for the candidate models, and (2) meta-learning via reinforcement learning, which learns an effective selection policy from historical performance data (a meta-dataset). Our results show that, on held-out datasets, LAMPS reduces search time by an average of 71% compared to exhaustive search while still covering more than 98% of the optimal target-space hypervolume.
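To make the "near-Pareto-optimal" and "hypervolume coverage" notions from the abstract concrete, the following is a minimal Python sketch, not the authors' implementation: it extracts the Pareto front over hypothetical candidate-model objectives (e.g., validation loss after landmark fine-tuning and inference latency) and reports the fraction of the full front's 2D hypervolume covered by a selected subset. All model scores, the reference-point choice, and the two objectives shown are illustrative assumptions.

```python
import numpy as np

def pareto_front(points: np.ndarray) -> np.ndarray:
    """Return the non-dominated subset of points (all objectives minimized)."""
    keep = np.ones(len(points), dtype=bool)
    for i, p in enumerate(points):
        # p is dominated if some other point is <= on every objective and < on at least one.
        dominated_by = np.all(points <= p, axis=1) & np.any(points < p, axis=1)
        dominated_by[i] = False
        if dominated_by.any():
            keep[i] = False
    return points[keep]

def hypervolume_2d(front: np.ndarray, ref: np.ndarray) -> float:
    """Area dominated by a 2D minimization front, bounded above by the reference point."""
    front = front[np.argsort(front[:, 0])]  # sort by first objective; second then descends
    hv, prev_y = 0.0, ref[1]
    for x, y in front:
        hv += (ref[0] - x) * (prev_y - y)   # add the new rectangular slab
        prev_y = y
    return hv

# Hypothetical candidates scored on (validation loss, inference latency in ms).
candidates = np.array([
    [0.42, 120.0],
    [0.38, 310.0],
    [0.55,  45.0],
    [0.40, 400.0],   # dominated by [0.38, 310.0]
])
full_front = pareto_front(candidates)
ref = candidates.max(axis=0) * 1.1           # nadir-style reference point (assumption)
full_hv = hypervolume_2d(full_front, ref)

# Coverage achieved by a subset a selection policy actually fine-tuned (hypothetical).
selected = np.array([[0.42, 120.0], [0.55, 45.0]])
coverage = hypervolume_2d(pareto_front(selected), ref) / full_hv
print(f"hypervolume coverage: {coverage:.1%}")
```

In this toy setup the selected pair already covers most of the full front's hypervolume, which is the kind of coverage statistic the abstract's ">98% of the optimal target-space hypervolume" claim refers to.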
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 16806