Bessarion: Medieval Greek Inscriptions on a Challenging Dataset for Vision and NLP Tasks

Published: 01 Jan 2024, Last Modified: 11 Nov 2024, Venue: DAS 2024, License: CC BY-SA 4.0
Abstract: We present a text and imaging dataset of Byzantine-era Medieval Greek inscriptions, suitable as a challenging testbed for Computer Vision and Natural Language Processing tasks. The lack of sizable related training sets, together with difficulties arising from the historical character and content of the inscriptions (natural wear of characters, systematic misspellings, etc.), makes for a context where modern resource-hungry techniques are not straightforward to apply. We describe the dataset contents (images, geometric and text annotation, metadata) and discuss baselines for two Computer Vision tasks (Inscription Detection, Text Recognition) and one Natural Language Processing task (Word Classification). The dataset is publicly available at https://github.com/Archaeocomputers/Bessarion.