Abstract: Most current automated evaluation tools grade a program solely by functionally testing its outputs. This approach suffers from both false positives (flagging errors where none exist) and false negatives (missing actual errors). In this paper, we present a novel system that emulates the manual evaluation of programming assignments based on the structure, rather than the functional output, of the program, using structural similarity between the given program and a reference solution. We propose an evaluation rubric for scoring structural similarity with respect to a reference solution, and an ML-based approach to map the system-predicted scores to the scores computed using the rubric. The system is evaluated empirically on a corpus of Python programs extracted from the popular programming platform HackerRank, combined with assignments submitted by students in an undergraduate Python programming course. Preliminary results are encouraging: the reported error is as low as 12 percent, with a deviation of about 3 percent, showing that the automatically generated scores correlate highly with instructor-assigned scores.
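The abstract does not specify how structural similarity is computed, so the sketch below is only an illustrative assumption, not the paper's method: it flattens each program's abstract syntax tree into a sequence of node-type names and scores similarity with a standard sequence matcher. The helper names `node_types` and `structural_similarity` are hypothetical.

```python
import ast
import difflib


def node_types(source: str) -> list[str]:
    """Flatten a program's AST into a sequence of node-type names."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def structural_similarity(submission: str, reference: str) -> float:
    """Score structural similarity in [0, 1] between two programs
    by comparing their AST node-type sequences (1.0 = identical)."""
    return difflib.SequenceMatcher(
        None, node_types(submission), node_types(reference)
    ).ratio()


reference = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

# A structurally similar submission with different identifiers:
# renaming variables does not change the AST node-type sequence much.
submission = """
def total(values):
    result = 0
    for v in values:
        result = result + v
    return result
"""

print(f"similarity: {structural_similarity(submission, reference):.2f}")
```

Because the comparison operates on AST node types rather than raw text, it is insensitive to identifier renaming and formatting, which is one plausible way a structure-based grader could score a submission against a reference solution independently of its runtime outputs.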