Reasoning with Preference Constraints: A Benchmark for Language Models in Many-to-One Matching Markets

Published: 22 Sept 2025 · Last Modified: 22 Sept 2025 · WiML @ NeurIPS 2025 · CC BY 4.0
Keywords: LLMs, Benchmark, Combinatorial Optimization, Preference Constraints, Reasoning, College Admission Problem
Abstract: Recent advances in reasoning with large language models (LLMs) have shown strong performance on complex tasks in mathematics, including combinatorial optimization. Techniques such as Chain-of-Thought prompting and in-context learning have further enhanced this capability, making LLMs accessible tools for non-experts. However, applying LLMs to matching problems, which require reasoning under preferential and structural constraints, remains underexplored. In this work, we introduce a novel benchmark of 369 instances of the College Admission Problem to evaluate LLMs along three key dimensions: feasibility, stability, and optimality. We use this benchmark to assess the performance of several open-weight LLMs. Results reveal that while LLMs can approximate certain constraints, they struggle to satisfy all evaluation criteria simultaneously, although reasoning LLMs significantly outperform base models. Improvements on one metric often come at the cost of another, and an iterative prompting strategy with auto-generated feedback does not improve performance monotonically: while feedback can slightly improve a model's best attempt, its final attempt can be significantly worse than the initial answer produced without any feedback.
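For concreteness, the first two evaluation criteria can be checked mechanically. The Python sketch below is illustrative only, not the benchmark's actual evaluation code; the function names and data layout (preference lists as ordered rankings, a dict mapping each student to a college or None) are assumptions. Optimality, the third criterion, would additionally compare a candidate against a reference solution such as the student-optimal stable matching produced by deferred acceptance.

```python
# A minimal sketch (not the authors' code) of feasibility and stability
# checks for a many-to-one matching in the College Admission Problem.
# All names and the data layout here are assumptions for illustration.

def rank(pref, x):
    """Position of x in preference list pref, or None if x is unacceptable."""
    try:
        return pref.index(x)
    except ValueError:
        return None

def is_feasible(matching, student_prefs, college_prefs, capacity):
    """Capacities are respected and every assignment is mutually acceptable."""
    load = {c: 0 for c in college_prefs}
    for s, c in matching.items():
        if c is None:  # unassigned students are allowed
            continue
        if rank(student_prefs[s], c) is None or rank(college_prefs[c], s) is None:
            return False  # one side finds the assignment unacceptable
        load[c] += 1
    return all(load[c] <= capacity[c] for c in college_prefs)

def is_stable(matching, student_prefs, college_prefs, capacity):
    """No blocking pair; assumes the matching is already feasible."""
    admitted = {c: [s for s, m in matching.items() if m == c] for c in college_prefs}
    for s, prefs in student_prefs.items():
        cur = rank(prefs, matching[s]) if matching[s] is not None else None
        cutoff = cur if cur is not None else len(prefs)
        for c in prefs[:cutoff]:  # colleges s strictly prefers to its match
            r = rank(college_prefs[c], s)
            if r is None:
                continue  # c would reject s outright
            if len(admitted[c]) < capacity[c]:
                return False  # free seat at c, so (s, c) blocks
            if any(rank(college_prefs[c], t) > r for t in admitted[c]):
                return False  # c prefers s to one of its admitted students

    return True

# Toy instance: 3 students, 2 colleges.
students = {"s1": ["c1", "c2"], "s2": ["c1"], "s3": ["c2", "c1"]}
colleges = {"c1": ["s2", "s1", "s3"], "c2": ["s3", "s1"]}
capacity = {"c1": 1, "c2": 2}
matching = {"s1": "c2", "s2": "c1", "s3": "c2"}
assert is_feasible(matching, students, colleges, capacity)
assert is_stable(matching, students, colleges, capacity)
```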
Submission Number: 43