ST-WID: Self-supervised transformer for writer identification in Arabic handwritten scripts

Islem Fatnassi, Sana Khamekhem Jemni, Sourour Ammar, Yousri Kessentini

Published: 2025, Last Modified: 25 Feb 2026. Venue: Signal Image Video Process. 2025. License: CC BY-SA 4.0

Abstract: Despite significant advances in handwriting analysis, writer identification in Arabic manuscripts remains challenging because large-scale annotated datasets are scarce. Traditional methods, which rely primarily on supervised learning and convolutional architectures, often struggle under such low-resource conditions. Unlabeled Arabic handwritten data, by contrast, is often available in large quantities. To exploit this imbalance, in this paper we propose an end-to-end Vision Transformer (ViT)-based framework for Arabic writer identification. Our approach uses self-supervised pretraining to learn robust, transferable feature representations from unlabeled handwritten words, followed by a supervised fine-tuning phase for the writer identification task. We also integrate a synthetic data generation strategy to further mitigate data scarcity. The proposed framework achieves state-of-the-art performance on two benchmark Arabic datasets: IFN/ENIT and AHTID/MW.
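The abstract does not name the self-supervised objective used during pretraining. As a minimal sketch, assuming a masked-image-modeling pretext task in the style of MAE (an assumption, not something the abstract confirms), the Python below shows how a ViT could turn a grayscale handwritten word image into patch tokens and how a random subset of those tokens could be hidden from the encoder. The function names `patchify` and `random_mask`, the 16-pixel patch size, and the 0.75 mask ratio are hypothetical illustration choices, not the authors' settings.

```python
import random
from typing import List, Tuple

# Grayscale word image: a list of rows of pixel intensities.
Image = List[List[float]]


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping patch x patch tiles.

    Each tile is flattened row-major into one vector, i.e. one ViT input token
    (before the linear projection and position embedding a real ViT would add).
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def random_mask(n_tokens: int, mask_ratio: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Choose which token indices stay visible and which are hidden.

    ASSUMPTION: MAE-style pretext task, where the encoder sees only the visible
    tokens and the model is trained to reconstruct the hidden patches.
    """
    rng = random.Random(seed)
    order = list(range(n_tokens))
    rng.shuffle(order)
    n_keep = round(n_tokens * (1 - mask_ratio))
    visible = sorted(order[:n_keep])
    masked = sorted(order[n_keep:])
    return visible, masked


if __name__ == "__main__":
    # Hypothetical 32 x 128 word image filled with a checkerboard pattern.
    word = [[float((r + c) % 2) for c in range(128)] for r in range(32)]
    tokens = patchify(word, 16)
    visible, masked = random_mask(len(tokens), 0.75)
    print(len(tokens), len(tokens[0]), len(visible), len(masked))  # 16 256 4 12
```

Here a 32 x 128 word image with 16 x 16 patches yields 2 x 8 = 16 tokens of 256 values each; with a mask ratio of 0.75, 4 tokens stay visible and 12 are hidden. In a typical two-stage setup, the fine-tuning phase would instead feed all tokens to the pretrained encoder and train a classification head to predict the writer label.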

External IDs: dblp:journals/sivp/FatnassiJAK25