Modeling of learning curves with applications to POS tagging

Manuel Vilares Ferro, Víctor Manuel Darriba Bilbao, Francisco José Ribadas Pena

27 Jun 2023 · OpenReview Archive Direct Upload

Abstract: We introduce an algorithm to estimate the evolution of accuracy in part-of-speech tagging over the whole of a training corpus, based on the results obtained from only a portion of it. The technique iteratively approximates the accuracy value sought at the desired position, independently of the statistical model and dataset used. The process is proven to be formally correct with respect to our working specifications and includes a stable stopping criterion, which allows the user to fix a reliable convergence threshold with respect to the accuracy finally achievable. Our aim is to evaluate the training effort, supporting decision making in order to reduce the need for both human and computational resources during tagger generation. The proposal is of interest in at least three operational procedures. The first is the anticipation of accuracy gain, with the purpose of measuring how much work is needed to achieve a certain level of performance. The second relates to the comparison between taggers at training time, with the objective of completing training only for the tool that is predicted to better suit our requirements. The third is the customization of the tagger, for example the selection of the tag-set, since predicting accuracy lets us estimate in advance the impact of such a choice on both performance and development costs. The experiments corroborate our initial expectations.
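The abstract does not state which curve model or update rule the authors use, so the following is only an illustrative sketch of the general idea (refit on growing portions of the corpus, predict accuracy at the full corpus size, stop once consecutive predictions agree within a threshold). It is not the authors' algorithm. It assumes, hypothetically, that the error rate follows a power law err(n) = b * n^(-c), fitted by least squares in log-log space; the function names `fit_power_law`, `predict_accuracy` and `estimate_final_accuracy` and the parameters `epsilon` and `min_points` are illustrative inventions.

```python
import math
from typing import List, Optional, Tuple


def fit_power_law(sizes: List[float], accuracies: List[float]) -> Tuple[float, float]:
    """Hypothetical model: fit err(n) = b * n**(-c), with err = 1 - accuracy,
    by ordinary least squares on (log n, log err). Sizes must be distinct
    and accuracies strictly below 1."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(1.0 - a) for a in accuracies]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (b, c)


def predict_accuracy(b: float, c: float, n: float) -> float:
    """Accuracy predicted by the fitted power law at training size n."""
    return 1.0 - b * n ** (-c)


def estimate_final_accuracy(
    sizes: List[float],
    accuracies: List[float],
    target_size: float,
    epsilon: float = 1e-3,
    min_points: int = 3,
) -> Tuple[Optional[float], int, bool]:
    """Refit on a growing prefix of the observed (size, accuracy) points and
    predict accuracy at target_size. Stop as soon as two consecutive
    predictions differ by less than epsilon (an assumed stopping rule).
    Returns (estimate, points_used, converged)."""
    previous: Optional[float] = None
    for k in range(min_points, len(sizes) + 1):
        b, c = fit_power_law(sizes[:k], accuracies[:k])
        current = predict_accuracy(b, c, target_size)
        if previous is not None and abs(current - previous) < epsilon:
            return current, k, True
        previous = current
    # No convergence within the available observations: return the last estimate.
    return previous, len(sizes), False


if __name__ == "__main__":
    # Synthetic observations drawn exactly from err(n) = 0.5 * n**-0.3.
    sizes = [1000, 2000, 4000, 8000, 16000]
    accs = [1.0 - 0.5 * n ** -0.3 for n in sizes]
    print(estimate_final_accuracy(sizes, accs, target_size=1_000_000))
```

With noise-free synthetic data the fit is exact, so the predictions from three and four points coincide and the loop stops after four observations. On real tagging results the choice of curve family and of `epsilon` would govern how early the estimate can be trusted.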
