XES3G5M: A Knowledge Tracing Benchmark Dataset with Auxiliary Information

NeurIPS 2023 Track Datasets and Benchmarks, Submission 101

Published: 26 Sept 2023, Last Modified: 03 Feb 2024. NeurIPS 2023 Datasets and Benchmarks Poster.

Keywords: knowledge tracing, benchmark, online education

TL;DR: We construct XES3G5M, a new knowledge tracing dataset comprising 18,066 students, 7,652 questions, 865 knowledge components (KCs), and 5,549,635 interactions.

Abstract: Knowledge tracing (KT) is the task of predicting students' future performance from their historical learning interactions. With the rapid development of deep learning techniques, existing KT approaches follow a data-driven paradigm that uses massive problem-solving records to model students' learning processes. However, although educational contexts contain various factors that may influence student learning outcomes, existing public KT datasets consist mainly of anonymized ID-like features, which may hinder research advances in this field. Therefore, in this work, we present XES3G5M, a large-scale dataset with rich auxiliary information about questions and their associated knowledge components (KCs; a KC is a generalization of everyday terms like concept, principle, fact, or skill). The XES3G5M dataset is collected from a real-world online math learning platform and contains 7,652 questions and 865 KCs, with 5,549,635 interactions from 18,066 students. To the best of our knowledge, the XES3G5M dataset not only has the largest number of KCs in the math domain but also contains the richest contextual information, including tree-structured KC relations, question types, textual content and analysis, and student response timestamps. Furthermore, we build a comprehensive benchmark on 19 state-of-the-art deep learning based knowledge tracing (DLKT) models. Extensive experiments demonstrate the effectiveness of leveraging the auxiliary information in XES3G5M with DLKT models. We hope the proposed dataset can effectively facilitate KT research.

Supplementary Material: pdf

Submission Number: 101
