TestAgent: An Adaptive and Intelligent Expert for Human Assessment

Junhao Yu; Yan Zhuang; Yuxuan Sun; Weibo Gao; Qi Liu; Mingyue Cheng; Zhenya Huang; Enhong Chen

27 Sept 2024 (modified: 09 Dec 2024) | ICLR 2025 Conference Withdrawn Submission | CC BY 4.0

Keywords: TestAgent, Large Language Model, Adaptive Testing, Personalized Testing, Intelligent Assessment

TL;DR: We propose a conversational testing approach using Large Language Models to reduce test length and improve accuracy and user experience in adaptive assessments.

Abstract: Accurately assessing people's internal states is critical for understanding their preferences, providing personalized services, and identifying challenges in various real-world applications. Originating in psychology, adaptive testing has become the mainstream method for human measurement. It customizes each assessment by selecting the fewest necessary test questions (e.g., math problems) based on the examinee's performance (e.g., answer correctness), while still aiming for a precise evaluation. However, current adaptive testing methods face several challenges. The mechanized nature of most adaptive algorithms often leads to guessing behavior and makes open-ended questions difficult to handle. In addition, subjective assessments suffer from noisy response data and coarse-grained test outputs, which further limits their effectiveness. To move closer to an ideal adaptive testing process, we propose TestAgent, a large language model (LLM)-empowered adaptive testing agent designed to enhance adaptive testing through interactive engagement. To our knowledge, this is the first application of LLMs to adaptive testing. To ensure effective assessments, TestAgent supports personalized question selection, captures examinees' response behavior and anomalies, and provides precise testing outcomes through dynamic, conversational interactions. Extensive experiments on psychological, educational, and lifestyle assessments demonstrate that our approach achieves more accurate human assessments with approximately 20% fewer test questions than state-of-the-art baselines. In real-world tests, testers also favored it in terms of speed, smoothness, and two other dimensions.
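For readers unfamiliar with adaptive testing, the select-then-update loop described in the abstract can be sketched as follows. This is a generic illustration of classical computerized adaptive testing, not TestAgent's method: it assumes a two-parameter logistic (2PL) item response model, maximum-Fisher-information question selection, and a grid-search MAP ability estimate under a standard normal prior. All function names and item parameters are hypothetical.

```python
import math

def prob_correct(theta, a, b):
    """2PL model: chance that an examinee with ability theta answers correctly.
    a = item discrimination, b = item difficulty (both assumed known)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Information an item provides about theta under the 2PL model."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, items, asked):
    """Pick the not-yet-asked item that is most informative at the current estimate."""
    candidates = [i for i in range(len(items)) if i not in asked]
    return max(candidates, key=lambda i: fisher_information(theta, *items[i]))

def estimate_ability(responses, items):
    """Grid-search MAP estimate of theta; the standard normal prior is an
    assumption that keeps the estimate finite when all answers agree."""
    grid = [g / 10.0 for g in range(-40, 41)]

    def log_posterior(theta):
        lp = -0.5 * theta * theta
        for idx, correct in responses:
            p = prob_correct(theta, *items[idx])
            lp += math.log(p if correct else 1.0 - p)
        return lp

    return max(grid, key=log_posterior)

def run_adaptive_test(answer_fn, items, max_items=5):
    """Ask up to max_items questions, re-estimating ability after each answer.
    answer_fn(idx) is a hypothetical callback returning True for a correct answer."""
    theta, responses, asked = 0.0, [], set()
    for _ in range(min(max_items, len(items))):
        idx = select_next_item(theta, items, asked)
        asked.add(idx)
        responses.append((idx, answer_fn(idx)))
        theta = estimate_ability(responses, items)
    return theta, [idx for idx, _ in responses]

# Hypothetical item bank: (discrimination, difficulty) pairs.
items = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
theta_hat, asked_order = run_adaptive_test(lambda idx: True, items, max_items=3)
print(theta_hat, asked_order)
```

Because each question is chosen to be most informative at the current ability estimate, a strong examinee is quickly routed to harder items and a weak one to easier items, which is how adaptive tests shorten the question sequence. The challenges the abstract lists (guessing, open-ended answers, noisy responses) are exactly the cases a binary correct/incorrect loop like this one does not model.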

Primary Area: other topics in machine learning (i.e., none of the above)

Submission Number: 8737
