Abstract: We have applied BLEU (Papineni et al., 2001), a method originally designed to evaluate automatic Machine Translation systems, to the assessment of short essays written by students. We study how well BLEU scores correlate with human scores and with other keyword-based evaluation metrics. We conclude that, although it is applicable only to a restricted category of questions, BLEU attains better results than other keyword-based procedures. Its simplicity and language independence make it a good candidate to be combined with other well-studied computer assessment scoring procedures.
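To make the evaluation setup concrete, the following is a minimal sketch of the standard BLEU computation (clipped modified n-gram precision combined with a brevity penalty, as defined by Papineni et al.) applied to a student answer against teacher-written reference answers. The function, the example sentences, and the choice of bigram order are illustrative assumptions, not taken from the paper; very short answers typically require a low n-gram order or smoothing to avoid zero precisions.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return a Counter of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Plain BLEU for one candidate (student answer) against several
    reference answers, each given as a list of tokens."""
    weights = [1.0 / max_n] * max_n
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if clipped == 0:
            return 0.0  # short answers often need smoothing or a smaller max_n
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: use the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1.0 - r / max(c, 1))
    return bp * math.exp(sum(w * lp for w, lp in zip(weights, log_precisions)))


# Hypothetical example: one student answer scored against two reference answers.
student = "the mitochondria produces energy for the cell".split()
refs = ["the mitochondrion produces energy for the cell".split(),
        "mitochondria generate the energy used by the cell".split()]
print(round(bleu(student, refs, max_n=2), 3))  # bigram BLEU, roughly 0.816
```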