ARElight: Context Sampling of Large Texts for Deep Learning Relation Extraction

Nicolay Rusnachenko, Huizhi Liang, Maksim Kalameyets, Lei Shi

Published: 2024, Last Modified: 10 Aug 2024, ECIR (5) 2024, CC BY-SA 4.0

Abstract: The escalating volume of textual data calls for capable and scalable Information Extraction (IE) systems in Natural Language Processing (NLP) that can analyse massive text collections in detail. While most deep learning systems are designed to handle textual information as it is, the interface between a document and the annotation of its parts remains poorly covered. In addition, a major limitation of most deep learning models is a constrained input size, caused by architectural and computational specifics. To address this, we introduce ARElight (https://github.com/nicolay-r/ARElight), a system designed to efficiently manage and extract information from sequences of large documents by dividing them into segments that contain mentioned object pairs. Through a pipeline comprising modules for text sampling, inference, optional graph operations, and visualisation, the proposed system transforms large volumes of text in a structured manner. Practical applications of ARElight are demonstrated across diverse use cases, including literature processing and social network analysis.
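The idea of dividing a document into segments with mentioned object pairs can be illustrated with a minimal sketch. This is not ARElight's actual API: the function name `sample_contexts`, the `max_tokens` parameter, the naive regex sentence splitter, and the substring-based entity matching are all assumptions made for illustration, standing in for the NLP components a real system would use.

```python
import re
from itertools import combinations


def sample_contexts(text, entities, max_tokens=50):
    """Return (segment, (obj_a, obj_b)) tuples for every pair of known
    objects mentioned together in one sentence-like segment.

    Hypothetical helper for illustration only; not part of ARElight.
    """
    # Naive split after ., ! or ? followed by whitespace (assumption:
    # a real pipeline would use a proper sentence splitter).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    contexts = []
    for sentence in sentences:
        # Truncate each segment to mimic a model's constrained input size.
        segment = " ".join(sentence.split()[:max_tokens])

        # Plain substring matching (assumption: a real pipeline would
        # rely on named entity recognition instead).
        mentioned = [e for e in entities if e in segment]

        # One sampled context per pair of objects in the segment.
        for pair in combinations(mentioned, 2):
            contexts.append((segment, pair))
    return contexts


text = "Alice met Bob in Paris. Carol stayed home. Bob later called Carol."
entities = ["Alice", "Bob", "Carol", "Paris"]
contexts = sample_contexts(text, entities)
```

Segments that mention fewer than two objects produce no contexts, so only the parts of a large document that can carry a relation would be passed on to inference.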