Name Tagging with Word Clusters and Discriminative Training

Scott Miller, Jethran Guinness, Alex Zamanian

2004 (modified: 16 Jul 2019) | HLT-NAACL 2004 | Readers: Everyone

Abstract: We present a technique for augmenting annotated training data with hierarchical word clusters that are automatically derived from a large unannotated corpus. Cluster membership is encoded in features that are incorporated in a discriminatively trained tagging model. Active learning is used to select training examples. We evaluate the technique for named-entity tagging. Compared with a state-of-the-art HMM-based name finder, the presented technique requires only 13% as much annotated data to achieve the same level of performance. Given a large annotated training set of 1,000,000 words, the technique achieves a 25% reduction in error over the state-of-the-art HMM trained on the same material.
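
As a rough illustration of how cluster membership can be encoded in features, the sketch below assumes each word's position in the cluster hierarchy is available as a bit string (a root-to-leaf path in a binary tree) and emits one feature per path prefix, so that a tagger can generalize at several granularities. The cluster table, the prefix lengths, and the function names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical cluster table: word -> bit-string path in a binary cluster hierarchy.
# In practice this would be derived from a large unannotated corpus.
CLUSTER_PATHS: Dict[str, str] = {
    "boston": "110100",
    "chicago": "110101",
    "monday": "011010",
}

# Illustrative prefix lengths (an assumption, not the paper's setting).
PREFIX_LENGTHS: Sequence[int] = (2, 4, 6)


def cluster_features(
    word: str,
    paths: Dict[str, str] = CLUSTER_PATHS,
    prefix_lengths: Sequence[int] = PREFIX_LENGTHS,
) -> List[str]:
    """Return one feature per prefix of the word's cluster path."""
    path = paths.get(word.lower())
    if path is None:
        # Words absent from the cluster table get a single fallback feature.
        return ["cluster=UNK"]
    return [f"cluster_prefix{n}={path[:n]}" for n in prefix_lengths if n <= len(path)]


def token_features(tokens: Sequence[str], i: int) -> List[str]:
    """Combine the word identity with cluster features of the current and previous token."""
    feats = [f"word={tokens[i].lower()}"]
    feats += [f"cur:{f}" for f in cluster_features(tokens[i])]
    if i > 0:
        feats += [f"prev:{f}" for f in cluster_features(tokens[i - 1])]
    return feats
```

In this toy table, "boston" and "chicago" share their first four path bits, so a discriminatively trained model that sees one of them labeled as a location can transfer weight to the other through the shared prefix features, even if the second word never appears in the annotated data.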
