Convolutional Swin Encoder

Aditya Majithia, Arthur Paul Pedersen, Michael Grossberg

Published: 14 May 2025, Last Modified: 11 Feb 2026, The International FLAIRS Conference Proceedings, CC BY-SA 4.0

Abstract: This paper develops a deep learning architecture that identifies writers' attributes from their handwriting. It introduces the Convolutional Swin Encoder (CSE), a novel architecture that combines Visual Geometry Group Network (VGGNet) and Swin Transformer blocks. CSE is designed for multi-label classification on images of individual handwritten words. As a unified encoder, it can predict writers' attributes such as identity, gender, age, and handedness. Using a word-level segmentation approach, CSE achieves performance competitive with page-level methods, which typically rely on a separate classifier for each attribute rather than a unified one.
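The contrast between a unified encoder and separate per-attribute classifiers can be illustrated with a minimal sketch. It is not the CSE architecture: the encoder below is a toy linear projection standing in for the VGGNet and Swin Transformer blocks, and every name, dimension, and class count (`HEADS`, `FEATURE_DIM`, `encode`, `predict`) is an assumption made for illustration. The only point it shows is that one feature vector is computed per word image and shared by all attribute heads.

```python
import math
import random

# Hypothetical attribute heads and class counts; the abstract names the
# attributes but does not give the number of classes for each.
HEADS = {"identity": 10, "gender": 2, "age": 4, "handedness": 2}
FEATURE_DIM = 8  # assumed size of the shared feature vector


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear(rng, n_in, n_out):
    """Create a random n_out x n_in weight matrix (untrained, illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, x):
    """Multiply a weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def encode(word_image, projection):
    """Stand-in for the shared encoder: flatten pixels, project, apply ReLU."""
    flat = [pixel for row in word_image for pixel in row]
    return [max(0.0, h) for h in apply_linear(projection, flat)]


def predict(word_image, projection, heads):
    """Encode the word image once, then feed the same features to every head."""
    features = encode(word_image, projection)
    return {name: softmax(apply_linear(w, features)) for name, w in heads.items()}


rng = random.Random(0)
HEIGHT, WIDTH = 4, 6  # toy word-image size
projection = make_linear(rng, HEIGHT * WIDTH, FEATURE_DIM)
heads = {name: make_linear(rng, FEATURE_DIM, n) for name, n in HEADS.items()}
image = [[rng.random() for _ in range(WIDTH)] for _ in range(HEIGHT)]
outputs = predict(image, projection, heads)  # one probability vector per attribute
```

A pipeline built from separate classifiers would instead run its own feature extractor for each attribute; here the cost of encoding is paid once per word image.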

External IDs: doi:10.32473/flairs.38.1.138949