Keywords: Automated Evaluation, Large Language Model, Explainable Reasoning, Aggregation Methods
Abstract: Evaluating complex texts across domains requires converting user-defined criteria into quantitative, explainable indicators, a persistent challenge in search and recommendation systems. Single-prompt LLM evaluations suffer from prompt complexity and inference latency, while criterion-specific decomposition approaches rely on naive averaging or opaque black-box aggregation. We present an interpretable aggregation framework that combines LLM scoring with the Analytic Hierarchy Process (AHP). Our method generates criterion-specific scores via LLM-as-judge, measures each criterion's discriminative power using the Hellinger distance, and derives statistically grounded weights through AHP pairwise comparison matrices. Experiments on Amazon review helpfulness prediction, summarization quality assessment, and depression-related text scoring demonstrate that our approach achieves high explainability and operational efficiency while maintaining predictive power comparable to black-box alternatives, making it suitable for latency-sensitive web services.
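The abstract describes a three-step pipeline (per-criterion LLM scoring, Hellinger-distance discriminability, AHP weighting). Below is a minimal sketch of the last two steps. The score distributions, criterion names, and the construction of the pairwise comparison matrix as ratios of Hellinger distances are all assumptions for illustration; the paper's exact matrix construction is not specified in the abstract.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def ahp_weights(matrix):
    """AHP weights: the normalized principal eigenvector of the comparison matrix."""
    vals, vecs = np.linalg.eig(matrix)
    principal = np.real(vecs[:, np.argmax(np.real(vals))])
    return principal / principal.sum()

# Hypothetical per-criterion LLM-as-judge score distributions over a 1-5 scale,
# conditioned on the ground-truth label (e.g., helpful vs. not helpful).
pos = {"clarity": np.array([.05, .10, .20, .35, .30]),
       "detail":  np.array([.10, .15, .25, .30, .20])}
neg = {"clarity": np.array([.30, .30, .20, .15, .05]),
       "detail":  np.array([.25, .30, .25, .15, .05])}

criteria = list(pos)
# Discriminative power of each criterion: distance between its
# class-conditional score distributions.
d = np.array([hellinger(pos[c], neg[c]) for c in criteria])

# Assumed construction: pairwise comparison ratios A[i, j] = d_i / d_j,
# a consistent AHP matrix whose principal eigenvector recovers the weights.
A = d[:, None] / d[None, :]
w = ahp_weights(A)
print(dict(zip(criteria, np.round(w, 3))))
```

With this construction the final document score would be the weighted sum of the per-criterion LLM scores under `w`, keeping every weight traceable back to a measurable distance between score distributions.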
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: educational applications, essay scoring
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 8103