Abstract: This paper explores ways to reduce the effort of creating annotated instances for training supervised automatic short answer scoring (ASAS) systems. Training such systems requires a large number of annotated answers, and annotating them is time-consuming, expensive, and tedious. To address this problem, we propose a method that helps annotators select a small proportion of short answers to annotate. The method uses a semantic intelligent k-means model to select these answers; once annotated, they are used to train a scoring engine based on semantic locality-sensitive hashing (Sem-LSH). Sem-LSH uses the annotated answers to score the remaining unscored answers. When a new answer arrives, Sem-LSH maps it into buckets with the same set of hash functions and retrieves the answers in those buckets that satisfy the defined similarity function and similarity threshold. Each new answer is then assigned the score of the closest answers among those retrieved. To evaluate the proposed approach, we experimented with the data sets provided by the Hewlett Foundation for the ASAS competition on Kaggle. The performance of the Sem-LSH ASAS system was compared with two baselines. The best accuracy, 90.2%, was achieved using only 40% of the short training responses, selected with semantic intelligent k-means and manually annotated by the teacher.
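The bucket lookup and score assignment described above can be illustrated with a minimal sketch. It is not the Sem-LSH implementation: the abstract does not specify the semantic hash functions or the similarity function, so the sketch assumes answers are already embedded as numeric vectors, substitutes generic random-hyperplane hashing, and uses cosine similarity with a hypothetical threshold of 0.5. The names `HyperplaneLSH`, `add`, and `predict` are illustrative only.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class HyperplaneLSH:
    """Random-hyperplane LSH index (an assumed stand-in for Sem-LSH)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # One set of random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _key(self, planes, vec):
        # Bucket key: which side of each hyperplane the vector falls on.
        return tuple(
            sum(p_i * x_i for p_i, x_i in zip(p, vec)) >= 0 for p in planes
        )

    def add(self, vec, score):
        """Index an annotated answer vector together with its score."""
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append((vec, score))

    def predict(self, vec, threshold=0.5):
        """Return the score of the most similar bucket-mate at or above
        the threshold, or None if no candidate qualifies."""
        best_sim, best_score = threshold, None
        for planes, table in zip(self.planes, self.tables):
            for cand, score in table.get(self._key(planes, vec), []):
                sim = cosine(vec, cand)
                if sim >= best_sim:
                    best_sim, best_score = sim, score
        return best_score


# Toy usage: two annotated answers, then score unannotated ones.
index = HyperplaneLSH(dim=3)
index.add([1.0, 0.0, 0.0], score=2)
index.add([0.0, 1.0, 0.0], score=0)
scored = index.predict([2.0, 0.0, 0.0])    # same direction as the first answer
unscored = index.predict([0.0, 0.0, 1.0])  # orthogonal to both annotated answers
```

In this sketch an answer with no sufficiently similar bucket-mate is left unscored (`None`); how the actual system handles that case is not stated in the abstract.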