Refining Natural Language Inferences Using Cross-Document Structure Theory

Published: 01 Jan 2024, Last Modified: 14 May 2025 · ICCCI (1) 2024 · CC BY-SA 4.0
Abstract: In this study, we compare Natural Language Inference (NLI) and Cross-Document Structure Theory (CST) in a transfer-learning setting. While NLI datasets have received considerable attention in the past, they contain artifacts that can compromise performance on subsequent tasks. CST, on the other hand, provides a more nuanced methodology for identifying semantic dependencies between texts, but it has not garnered significant recognition in the scientific community. We evaluate language models within the CST framework and investigate their performance on a collection of downstream tasks under transfer-learning conditions. Our research indicates that, despite their relatively small size, CST datasets prove to be a more effective basis than existing NLI resources. Furthermore, we present new high-quality bilingual NLI and CST datasets for Polish and English. All models and datasets are available on the HuggingFace platform (https://huggingface.co/datasets/clarin-knext/cst_directed_datasets).
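Since the abstract points readers to the dataset repository on HuggingFace, a minimal sketch of loading it with the Hugging Face datasets library might look like the following. The repository ID is taken from the URL above; the presence of named configurations or specific splits is an assumption, so the dataset card should be consulted for the actual layout.

    # Minimal sketch: loading the CST datasets released with the paper.
    # Assumption: the repository loads with its default configuration;
    # if it defines multiple configs, load_dataset() will raise an error
    # listing them, and one must be passed as the second argument.
    from datasets import load_dataset

    ds = load_dataset("clarin-knext/cst_directed_datasets")
    print(ds)  # inspect the available splits and features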