Reasoning with Preference Constraints: A Benchmark for Language Models in Many-to-One Matching Markets
Abstract: Recent advances in reasoning with large language models (LLMs) have demonstrated strong performance on complex mathematical tasks. Techniques such as Chain-of-Thought prompting and In-Context Learning have further enhanced this capability, making LLM agents powerful and accessible tools for a wide range of users, including non-experts. However, the application of such agents to problems in operations research, particularly those at the intersection of combinatorial optimization and game theory that require domain expertise, remains underexplored. To address this gap, we introduce a benchmark of 369 instances of the College Admission Problem, a canonical many-to-one matching problem that requires reasoning about agents' preferences, stability, feasibility, and optimality. We evaluate several open-weight LLMs, both reasoning-specialized and traditional, the latter defined here as models used without any dedicated reasoning mechanism. Although no prompting strategy, including Chain-of-Thought, In-Context Learning, and role-based prompting, consistently performed best, reasoning models responded to these strategies differently from traditional ones. While reasoning-enhanced models significantly outperform traditional ones, all models struggle to satisfy every evaluation criterion consistently. Finally, we report the performance of iterative prompting with auto-generated feedback and show that it is not monotonic: it can peak early and then decline significantly in later attempts. Overall, this work offers a new perspective on model reasoning performance and on the effectiveness of prompting strategies for combinatorial optimization problems with preference constraints.
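For readers unfamiliar with the College Admission Problem, the sketch below illustrates the classical student-proposing deferred acceptance (Gale-Shapley) procedure for many-to-one matching, the standard method for producing a stable matching in this setting. This is a minimal illustrative implementation, not code from the paper or its benchmark; all instance names are hypothetical.

```python
def deferred_acceptance(student_prefs, college_prefs, capacities):
    """Student-proposing deferred acceptance for many-to-one matching.

    student_prefs: {student: [colleges in decreasing preference]}
    college_prefs: {college: [students in decreasing preference]}
    capacities:    {college: number of seats}
    Returns {college: sorted list of admitted students}.
    """
    # Precompute each college's ranking of students for O(1) comparisons.
    rank = {c: {s: i for i, s in enumerate(prefs)}
            for c, prefs in college_prefs.items()}
    next_choice = {s: 0 for s in student_prefs}  # next college index to try
    held = {c: [] for c in college_prefs}        # tentatively admitted
    free = list(student_prefs)                   # students still proposing

    while free:
        s = free.pop()
        prefs = student_prefs[s]
        if next_choice[s] >= len(prefs):
            continue  # s has exhausted its list and stays unmatched
        c = prefs[next_choice[s]]
        next_choice[s] += 1
        held[c].append(s)
        # College keeps its most preferred applicants up to capacity.
        held[c].sort(key=lambda x: rank[c][x])
        if len(held[c]) > capacities[c]:
            rejected = held[c].pop()  # least preferred applicant is bumped
            free.append(rejected)

    return {c: sorted(admits) for c, admits in held.items()}


# Hypothetical toy instance: two colleges, three students.
matching = deferred_acceptance(
    student_prefs={'s1': ['c1', 'c2'], 's2': ['c1', 'c2'], 's3': ['c2', 'c1']},
    college_prefs={'c1': ['s2', 's1', 's3'], 'c2': ['s1', 's2', 's3']},
    capacities={'c1': 1, 'c2': 2},
)
print(matching)  # → {'c1': ['s2'], 'c2': ['s1', 's3']}
```

The resulting matching is stable: no student-college pair would both prefer each other over their assigned partners. Checking stability, feasibility (capacities respected), and optimality of a proposed matching is exactly the kind of preference reasoning the benchmark evaluates.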
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ian_A._Kash1
Submission Number: 7648