- Abstract: This paper improves upon the line of research that formulates named entity recognition (NER) as a sequence-labeling problem. We use so-called black-box long short-term memory (LSTM) encoders to achieve state-of-the-art results while providing insightful understanding of what the auto-regressive model learns with a parallel self-attention mechanism. Specifically, we decouple the sequence-labeling problem of NER into entity chunking, e.g., Barack_B Obama_E was_O elected_O, and entity typing, e.g., Barack_PERSON Obama_PERSON was_NONE elected_NONE, and analyze how the model learns to, or has difficulties in, capturing text patterns for each of the subtasks. The insights we gain then lead us to explore a more sophisticated deep cross-Bi-LSTM encoder, which proves better at capturing global interactions given both empirical results and a theoretical justification.
- Keywords: interpretability, sequence labeling, named entity recognition, LSTM, attention
- TL;DR: We provide insightful understanding of sequence-labeling NER and propose to use two types of cross structures, both of which bring theoretical and empirical improvements.