LLM Ability to Answer University Student Questions in CS Trained Exclusively on Auto-Generated QA

Ann Stone, Samantha Jaehnig, Yuxuan Jiang, Boyuan Zheng

Published: 04 Nov 2024, Last Modified: 05 Nov 2024 | OpenReview Archive Direct Upload | Everyone | CC BY 4.0

Abstract: The integration of AI into the classroom is inevitable. AI chatbots have the potential to assist students around the clock, making help more accessible. Chatbots and LLMs, though, require extensive training. Some training data may be easy to obtain for classes that already have QA forums; however, not all courses have semesters of student data to lean on when training an LLM. Moreover, gaps in data may lead to gaps in a chatbot's knowledge: just because a student hasn’t asked a specific type of question previously doesn’t mean a future student won’t. Such knowledge gaps can lead to hallucinated responses when an LLM attempts to answer questions with insufficient domain or contextual knowledge. Our project aims to improve the training of LLMs for CS courses that may not have sufficient data of their own. The main questions we are looking to answer are: (1) Can we use an LLM to generate QA based on a given project/assignment specification? (2) Is that generated QA sufficient to train an LLM that can answer student questions with high accuracy?