Abstract: Despite significant advancements in handwriting analysis, writer identification in Arabic manuscripts remains challenging due to the scarcity of large-scale annotated datasets. Traditional methods, which primarily rely on supervised learning and convolutional architectures, often struggle in such low-resource conditions. In contrast, unlabeled Arabic handwritten data is often available in large quantities. To address this gap, we propose an end-to-end Vision Transformer (ViT)-based framework for Arabic writer identification. Our approach uses self-supervised pretraining to learn robust, transferable feature representations from unlabeled handwritten words, followed by a supervised fine-tuning phase for the writer identification task. Additionally, we integrate a synthetic data generation strategy to further mitigate data scarcity. The proposed framework achieves state-of-the-art performance on two benchmark Arabic datasets: IFN/ENIT and AHTID/MW.
External IDs: dblp:journals/sivp/FatnassiJAK25