Generating Qualitative Descriptions of Diagrams with a Transformer-Based Language Model

Marco Schorlemmer, Mohamad Ballout, Kai-Uwe Kühnberger

Published: 01 Jan 2024, Last Modified: 20 May 2025, Venue: Diagrams 2024, License: CC BY-SA 4.0

Abstract: To address the task of diagram understanding, we propose to distinguish the perception of the geometric configuration of a diagram from the assignment of meaning to its geometric entities and their topological relationships. As a consequence, diagram parsing does not need to assume any particular a priori interpretation of diagrams and their constituents. Focussing on Euler diagrams, we tackle the first of these subtasks, that of identifying the geometric entities that constitute a diagram (i.e., circles, rectangles, lines, arrows, etc.) and their topological relations, as an image captioning task: an encoder-decoder model combines a Vision Transformer for image recognition with the GPT-2 language model to generate qualitative spatial descriptions of Euler diagrams. Because there is not enough high-quality data to train the pre-trained language model for this task, we describe how we generated a synthetic dataset of Euler diagrams annotated with qualitative spatial representations based on the Region Connection Calculus (RCC8). Results showed over 95% accuracy of the transformer-based language model in the generation of meaning-carrying RCC8 specifications for given Euler diagrams.
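To make the RCC8 annotations concrete, the following is a minimal sketch of how the relation between two regions could be derived from geometry alone. It is not the authors' generation pipeline, which the abstract does not detail: it assumes every region is a circle given by centre and radius, and the names `Circle`, `rcc8`, `describe`, and the tolerance `tol` are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Circle:
    """A circular region (illustrative assumption: circles only)."""
    name: str
    x: float
    y: float
    r: float


def rcc8(a: Circle, b: Circle, tol: float = 1e-9) -> str:
    """Return the RCC8 relation that holds from region a to region b."""
    d = math.hypot(a.x - b.x, a.y - b.y)  # distance between centres
    if d <= tol and abs(a.r - b.r) <= tol:
        return "EQ"      # same centre and radius
    if d > a.r + b.r + tol:
        return "DC"      # disconnected
    if abs(d - (a.r + b.r)) <= tol:
        return "EC"      # externally connected (touching from outside)
    if a.r < b.r:        # a may be a proper part of b
        if d + a.r < b.r - tol:
            return "NTPP"    # non-tangential proper part
        if abs(d + a.r - b.r) <= tol:
            return "TPP"     # tangential proper part
    elif b.r < a.r:      # b may be a proper part of a (inverse relations)
        if d + b.r < a.r - tol:
            return "NTPPi"
        if abs(d + b.r - a.r) <= tol:
            return "TPPi"
    return "PO"          # partially overlapping


def describe(circles: list[Circle]) -> list[str]:
    """List one RCC8 statement per unordered pair of regions."""
    return [f"{rcc8(a, b)}({a.name}, {b.name})"
            for a, b in combinations(circles, 2)]
```

For example, `describe([Circle("A", 0, 0, 1), Circle("B", 3, 0, 1), Circle("D", 0, 0, 3)])` yields one statement for each of the three pairs. A textual description of this kind is the sort of target caption an image-captioning model could be trained to produce, although the exact output format used in the paper is not specified in the abstract.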