Analyzing Pedagogical Quality and Efficiency of LLM Responses with TA Feedback to Live Student Questions

Published: 01 Jan 2025 · Last Modified: 15 May 2025 · SIGCSE (1) 2025 · CC BY-SA 4.0
Abstract: While Large Language Models (LLMs) have emerged as promising tools for automated student question-answering, guaranteeing consistently effective instruction in their responses remains a key challenge. There is therefore a need for fine-grained analysis of state-of-the-art (SOTA) LLM-powered educational assistants. This work evaluates Edison, a Retrieval-Augmented Generation (RAG) pipeline based on GPT-4. We determine the pedagogical effectiveness of Edison's responses through expert Teaching Assistant (TA) evaluation of the answers. After the TA edits and improves each response, we analyze the original LLM response, the TA-assigned ratings, and the TA's edits to identify the essential characteristics of a high-quality response. Key insights from our evaluation are as follows: (1) Edison can give relevant and factual answers in an educational style for conceptual and assignment questions; (2) most TA edits are deletions made to improve the style of the response; and (3) our analysis indicates that Edison improves TAs' efficiency by reducing the effort required to respond to student questions.
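For readers unfamiliar with the RAG pattern the abstract refers to, the sketch below illustrates the general retrieve-then-generate flow behind a system like Edison. It is a minimal illustration, not the paper's implementation: the `CourseDoc` class, the keyword-overlap retriever, and the stubbed LLM callable are all assumptions standing in for a real vector store and a GPT-4 API call.

```python
# Minimal retrieve-then-generate sketch (illustrative only; not Edison's code).
from dataclasses import dataclass


@dataclass
class CourseDoc:
    """A unit of course material that can be retrieved as context."""
    title: str
    text: str


def retrieve(question: str, docs: list[CourseDoc], k: int = 2) -> list[CourseDoc]:
    """Rank course materials by naive keyword overlap with the question."""
    q_terms = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_terms & set(d.text.lower().split())))
    return scored[:k]


def build_prompt(question: str, context: list[CourseDoc]) -> str:
    """Assemble the retrieved context and the student question into one prompt."""
    ctx = "\n\n".join(f"[{d.title}]\n{d.text}" for d in context)
    return (
        "Answer the student's question using only the course material below.\n\n"
        f"{ctx}\n\nStudent question: {question}\nAnswer:"
    )


def answer_question(question: str, docs: list[CourseDoc], llm) -> str:
    """Retrieve relevant material, then ask the LLM (e.g., GPT-4) to respond."""
    context = retrieve(question, docs)
    return llm(build_prompt(question, context))


if __name__ == "__main__":
    docs = [
        CourseDoc("Lecture 3", "A hash table maps keys to buckets via a hash function."),
        CourseDoc("HW 2 spec", "Implement insert and lookup for a chained hash table."),
    ]
    # Stub LLM so the sketch runs without an API key; a real system would call GPT-4 here.
    echo_llm = lambda prompt: f"(model response to a prompt of {len(prompt)} chars)"
    print(answer_question("How does a hash table handle collisions?", docs, echo_llm))
```

In the workflow the paper studies, the generated draft would then go to a TA, who rates and edits it before it reaches the student.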