ConceptPsy: A comprehensive benchmark suite for hierarchical psychological concept understanding in LLMs

Published: 01 Jan 2025, Last Modified: 08 Apr 2025, Neurocomputing 2025, CC BY-SA 4.0
Abstract: Highlights
• Propose a metric to quantify low concept coverage in Chinese MMLU benchmarks.
• Propose a conceptually comprehensive college-level psychological benchmark.
• Propose using concept-wise labels for fine-grained model evaluation.
• Fine-grained results reveal weaknesses of models on specific concepts.
• Fine-grained results offer insights beyond subject-level, aiding model refinement.
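To illustrate what concept-wise evaluation could look like in practice, the following is a minimal sketch, not the paper's implementation. It assumes each benchmark question carries a single concept label and a correct/incorrect outcome. The function name `per_concept_accuracy`, the record layout, and the sample records are all hypothetical and invented for illustration; they are not data or results from ConceptPsy.

```python
from collections import defaultdict


def per_concept_accuracy(records):
    """Aggregate accuracy per concept label.

    `records` is an iterable of (concept, is_correct) pairs.
    This layout is an assumption, not the benchmark's actual format.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for concept, is_correct in records:
        totals[concept] += 1
        correct[concept] += int(bool(is_correct))
    return {concept: correct[concept] / totals[concept] for concept in totals}


# Made-up records for illustration only (not results from the paper).
records = [
    ("classical conditioning", True),
    ("classical conditioning", True),
    ("working memory", True),
    ("working memory", False),
]

scores = per_concept_accuracy(records)
overall = sum(ok for _, ok in records) / len(records)

print(scores)
print(overall)
```

In this toy input the overall accuracy hides the fact that every error falls on one concept, which is the kind of weakness a subject-level score alone would not expose.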