COINBench: Moving Beyond Individual Perspectives to Collective Intent Understanding

ACL ARR 2026 January Submission 10094 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: LLM, Collective Intent Understanding, LLM Evaluation
Abstract: Understanding human intent is a high-level cognitive challenge for Large Language Models (LLMs), requiring sophisticated reasoning over noisy, conflicting, and non-linear discourse. While LLMs excel at following individual instructions, their ability to distill \textbf{Collective Intent}—the process of extracting consensus, resolving contradictions, and inferring latent trends from multi-source public discussions—remains largely unexplored. To bridge this gap, we introduce \textsc{COINBench}, a dynamic, live-updating benchmark built from real-world data and designed to evaluate LLMs on collective intent understanding in the consumer domain. Unlike traditional benchmarks that focus on transactional outcomes, \textsc{COINBench} operationalizes intent as a \textbf{hierarchical cognitive structure}, ranging from explicit scenarios to deep causal reasoning. We implement a robust evaluation pipeline that combines rule-based methods with an LLM-as-a-Judge approach. This framework incorporates \textsc{\tree} for hierarchical cognitive structuring and retrieval-augmented verification (\textsc{\rag}) to ensure expert-level precision in analyzing raw, collective human discussions. An extensive evaluation of 20 state-of-the-art LLMs across four dimensions—\textit{depth, breadth, informativeness, and correctness}—reveals that while current models handle surface-level aggregation, they struggle with the analytical depth required for complex intent synthesis. \textsc{COINBench} establishes a new standard for advancing LLMs from passive instruction-followers to expert-level analytical agents capable of deciphering the collective voice of the real world.
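To make the four-dimension evaluation concrete, the sketch below shows one way an LLM-as-a-Judge scorer over depth, breadth, informativeness, and correctness could be implemented. It is a minimal illustration under stated assumptions, not the paper's actual pipeline: the judge prompt wording and the call_llm helper are hypothetical placeholders, and the rule-based checks, \textsc{\tree} structuring, and retrieval-augmented verification described in the abstract are omitted.

import json

DIMENSIONS = ["depth", "breadth", "informativeness", "correctness"]

# Hypothetical judge prompt; the paper's actual rubric wording is not public.
JUDGE_PROMPT = """You are an expert analyst grading a model's synthesis of
collective intent from multi-source public discussions.

Discussions:
{discussions}

Model synthesis:
{answer}

For each dimension below, give an integer score from 1 (poor) to 5 (expert):
- depth: causal reasoning beyond surface-level aggregation
- breadth: coverage of the distinct viewpoints present
- informativeness: useful, non-redundant content
- correctness: factual consistency with the source discussions

Respond with a JSON object mapping each dimension to its score."""


def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around a judge LLM API; swap in a real client."""
    raise NotImplementedError


def judge(discussions: str, answer: str) -> dict[str, int]:
    """Score one model synthesis on all four dimensions with an LLM judge."""
    raw = call_llm(JUDGE_PROMPT.format(discussions=discussions, answer=answer))
    scores = json.loads(raw)
    # Keep only the expected dimensions and clamp each score to the 1-5 scale.
    return {d: max(1, min(5, int(scores[d]))) for d in DIMENSIONS}

In a full pipeline, per-dimension scores like these would typically be aggregated with the rule-based checks rather than used alone, which matches the abstract's point that judge-based and rule-based signals are combined.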
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 10094