Analyzing Pedagogical Quality and Efficiency of LLM Responses with TA Feedback to Live Student Questions
Abstract: While Large Language Models (LLMs) have emerged as promising tools for automated student question-answering, guaranteeing the consistent instructional effectiveness of their responses remains a key challenge. There is therefore a need for fine-grained analysis of state-of-the-art (SOTA) LLM-powered educational assistants. This work evaluates Edison, a Retrieval-Augmented Generation (RAG) pipeline based on GPT-4. We determine the pedagogical effectiveness of Edison's responses through expert Teaching Assistant (TA) evaluation of the answers. After the TA edits and improves the response, we analyze the original LLM response, the TA-assigned ratings, and the TA's edits to ascertain the essential characteristics of a high-quality response. Some key insights of our evaluation are as follows: (1) Edison can give relevant and factual answers in an educational style for conceptual and assignment questions, (2) most TA edits are deletions made to improve the style of the response, and (3) our analysis indicates that Edison improves TAs' efficiency by reducing the effort required to respond to student questions.