PesTest: A Comprehensive Benchmark for Psychological Emotional Support Capability of Large Language Models

Anonymous

16 Feb 2024 | ACL ARR 2024 February Blind Submission | Readers: Everyone
Abstract: Large language models with strong psychological emotional support capabilities can offer users effective psychological comfort and help them maintain a healthy psychological state. However, existing evaluation datasets for these capabilities lack a comprehensive psychological system. In this paper, we propose PesTest, a benchmark for assessing the psychological emotional support capabilities of large language models, with comprehensive topic coverage and rich task types. PesTest is organized around a comprehensive psychological system comprising 7 major categories and 40 sub-categories of topics. We use PesTest to evaluate existing large language models on psychological emotional support tasks and identify their deficiencies on certain topics, addressing the limited comprehensiveness of previous evaluations. Furthermore, we fine-tune a model on PesTest's training set and obtain better results than the original model on the test set, which demonstrates that PesTest can improve the psychological emotional support capabilities of large language models and provides a reference for future research. We will make our benchmark publicly available at Anonymous_Link.
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: Data resources
Languages Studied: Chinese, English
Preprint Status: We plan to release a non-anonymous preprint in the next two months (i.e., during the reviewing process).
A1: yes
A1 Elaboration For Yes Or No: limitations
A2: yes
A2 Elaboration For Yes Or No: 3 & Ethics Statement
A3: yes
A3 Elaboration For Yes Or No: abstract and introduction
B: yes
B1: yes
B2: yes
B3: yes
B4: yes
B5: yes
B6: yes
C: yes
C1: yes
C2: yes
C3: yes
C4: yes
D: yes
D1: yes
D2: no
D4: yes
E: no
