Abstract: With the explosion of available data, Definition Extraction has emerged as an important technique because it is a precursor to many other tasks, such as ontology generation, glossary creation, and question answering. Definition Extraction is most commonly treated as a binary classification problem that separates definitional from non-definitional sentences. Traditional techniques for definition extraction relied on rule-based approaches, which did not yield good results because of the overwhelming complexity of natural language. Incorporating linguistic information via syntactic dependencies turned out to be useful in identifying sentences that contain a definition. In this paper, we explore the performance of Transformer-based architectures, such as Bidirectional Encoder Representations from Transformers (BERT), which produce state-of-the-art results on many Natural Language Processing (NLP) tasks. Experiments on an annotated dataset of definitional sentences show that BERT obtains results comparable to the state-of-the-art benchmark. In further experiments, we look under the hood of BERT to understand the reason for its success. Analyzing the outputs of the attention heads reveals that BERT captures not only syntactic dependencies but also many other relevant dependencies among the words of a sentence, which proves beneficial for Definition Extraction.
External IDs: doi:10.1007/978-981-15-6318-8_13