Very-Long-Distance Dependency Capturing Evaluation via Language Modeling Based on Gender Consistency
Abstract: Capturing Long-Distance Dependencies (LDDs) is crucial for NLP applications. However, the longest dependency evaluated in early studies spans only around 50 words, which is insufficient to assess a model's ability to capture Very-Long-Distance (VLD) dependencies. Recent work on capturing LDDs is either affected by the instruction-following ability of language models or requires training on synthetic tasks unrelated to natural language. In this paper, we present an approach to automatically constructing LDD test instances (as opposed to training examples) for any distance: we mention an antecedent with singular number and a specific grammatical gender at the start of the first sentence, extend that sentence to arbitrary length by sampling plural nouns, and ask the pre-trained language model to predict a singular pronoun with the correct gender at the start of the next sentence. We evaluate the performance of LLMs and neural language models under different settings.
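The construction described in the abstract can be illustrated with a minimal sketch. The word lists, the carrier phrase, and the function name below are hypothetical illustrations, not the paper's actual vocabulary or implementation: a singular, gendered antecedent opens the first sentence, plural-noun filler stretches that sentence to the desired length (plural nouns cannot bind a singular gendered pronoun), and the model is scored on predicting the matching pronoun at the start of the next sentence.

```python
import random

# Hypothetical word lists for illustration; the paper's vocabulary is not specified.
GENDERED_ANTECEDENTS = {
    "she": ["actress", "queen", "duchess"],
    "he": ["actor", "king", "duke"],
}
PLURAL_NOUNS = ["tables", "clouds", "rivers", "books", "stones", "windows"]

def make_instance(target_distance, seed=0):
    """Build one test instance whose gendered antecedent sits roughly
    `target_distance` filler words before the pronoun to be predicted."""
    rng = random.Random(seed)
    pronoun = rng.choice(sorted(GENDERED_ANTECEDENTS))
    antecedent = rng.choice(GENDERED_ANTECEDENTS[pronoun])
    # Filler built only from plural nouns, so no competing singular
    # gendered noun intervenes between antecedent and pronoun.
    filler = []
    while len(filler) < target_distance:
        filler.extend(["the", rng.choice(PLURAL_NOUNS), "and"])
    filler = filler[:target_distance]
    prefix = f"The {antecedent} saw " + " ".join(filler) + "."
    # A pre-trained LM is then scored on predicting the correct
    # singular pronoun as the first word of the next sentence.
    return prefix, pronoun.capitalize()

prefix, gold = make_instance(target_distance=30)
```

Because the filler is sampled, instances of any distance can be generated automatically, with no instruction following required of the evaluated model.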