Part-of-Speech Tagging for Middle English through Alignment and Projection of Parallel Diachronic Texts

EMNLP-CoNLL 2007 (modified: 10 Nov 2022)
Abstract: We demonstrate an approach for inducing a tagger for historical languages from existing resources for their modern varieties. Tags from Present Day English source text are projected onto Middle English text using alignments over parallel Biblical text. We explore the use of multiple alignment approaches and a bigram tagger to reduce the noise in the projected tags. Finally, we train a maximum entropy tagger on the bigram tagger's output for the target-side Biblical text and test it on tagged Middle English text. This yields tagging accuracy in the low 80s (percent) on Biblical test material and in the 60s on other Middle English material. Our results suggest that these bootstrapping methods have considerable potential and could be used to semi-automate an approach based on incremental manual annotation.
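The projection step can be illustrated with a minimal Python sketch. It assumes that each alignment approach yields a set of (source index, target index) links for a verse pair and that disagreements among aligners are settled by a simple majority vote. The vote, the function name `project_tags`, and the example sentence are illustrative assumptions, not necessarily the combination scheme used in this work. Target tokens with no links receive `None`, which is where a downstream model such as the bigram tagger would have to supply a tag.

```python
from collections import Counter
from typing import Iterable, Optional

# One alignment = a set of (source index, target index) links.
Alignment = set[tuple[int, int]]


def project_tags(
    src_tags: list[str],
    alignments: Iterable[Alignment],
    tgt_len: int,
) -> list[Optional[str]]:
    """Project source POS tags onto target tokens by majority vote.

    Every link in every alignment casts one vote for the tag of its
    source token. Ties are broken alphabetically so the result is
    deterministic; target tokens with no links at all get None.
    (Hypothetical helper for illustration only.)
    """
    votes = [Counter() for _ in range(tgt_len)]
    for links in alignments:
        for s, t in links:
            votes[t][src_tags[s]] += 1
    # Highest vote count wins; among equal counts, the alphabetically first tag.
    return [
        min(c.items(), key=lambda kv: (-kv[1], kv[0]))[0] if c else None
        for c in votes
    ]


# Hypothetical example: "He saw the light" vs. an illustrative Middle English line.
src = ["PRP", "VBD", "DT", "NN"]
tgt = ["he", "sawe", "þe", "liht", "þanne"]  # "þanne" is left unaligned
aligner_a = {(0, 0), (1, 1), (2, 2), (3, 3)}
aligner_b = {(0, 0), (1, 1), (3, 3)}
aligner_c = {(0, 0), (2, 1), (3, 3)}  # wrongly links "the" to "sawe"

print(project_tags(src, [aligner_a, aligner_b, aligner_c], len(tgt)))
# ['PRP', 'VBD', 'DT', 'NN', None]
```

Here the erroneous link from the third aligner is outvoted by the other two, which shows how combining several alignments can filter some projection noise. Unaligned tokens such as "þanne" stay untagged rather than receiving a guessed tag.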