Abstract: Constrained action-based decision-making is one of the most challenging decision-making problems. It refers to a scenario where an agent acts in an environment not only to maximize the expected cumulative reward but also to satisfy certain action-based constraints, for example, an upper limit on the total number of times certain actions can be carried out. In this work, we construct a general data-driven framework called Constrained Action-based Partially Observable Markov Decision Process (CAPOMDP) to induce effective pedagogical policies. Specifically, we induce two types of policies: CAPOMDP_LG, which uses learning gain as the reward with the goal of improving students' learning performance, and CAPOMDP_Time, which uses time as the reward with the goal of reducing students' time on task. The effectiveness of CAPOMDP_LG is compared against a random yet reasonable policy, and the effectiveness of CAPOMDP_Time is compared against both a Deep Reinforcement Learning-induced policy and a random policy. Empirical results show an Aptitude-Treatment Interaction effect when students are split into High vs. Low groups based on their incoming competence: no significant difference was found among the High incoming-competence groups, while in the Low groups, students following CAPOMDP_Time spent significantly less time than those following the two baseline policies, and students following CAPOMDP_LG significantly outperformed their peers on both learning gain and learning efficiency.
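As a rough sketch of the constrained objective described above (the notation here is illustrative and not necessarily the paper's: $\mathcal{A}_c$ denotes a constrained subset of actions and $B$ its budget), the problem can be written as

\[
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t} r_t\right] \quad \text{subject to} \quad \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \mathbb{1}\{a_t \in \mathcal{A}_c\}\right] \le B,
\]

that is, the induced policy maximizes the expected cumulative reward while keeping the expected number of constrained actions within the stated upper limit.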