Abstract: This paper develops a deep learning architecture that identifies writers' attributes from their handwriting. It introduces the Convolutional Swin Encoder (CSE), a novel architecture that combines Visual Geometry Group Network (VGGNet) and Swin Transformer blocks. CSE performs multi-label classification on images of individual handwritten words. As a unified encoder, it predicts writers' attributes such as identity, gender, age, and handedness. Working from word-level segmentation, CSE achieves performance competitive with page-level methods, which typically rely on separate classifiers rather than a unified one.
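The idea of a unified encoder feeding several attribute predictions can be illustrated with a minimal sketch. This is not the CSE architecture: the encoder below is a simple average-pooling stand-in for the VGGNet and Swin Transformer blocks, the weights are random rather than trained, and the class counts, `FEATURE_DIM`, and all function names are hypothetical. It also assumes one softmax per attribute, which the abstract does not specify.

```python
import math
import random

# Hypothetical label spaces; the abstract names the attributes but not their classes.
ATTRIBUTE_CLASSES = {"identity": 10, "gender": 2, "age": 4, "handedness": 2}
FEATURE_DIM = 16  # assumed size of the shared embedding


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_head(num_classes, rng):
    """Random linear head, one weight row per class (placeholder for trained weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
            for _ in range(num_classes)]


def shared_encoder(word_image):
    """Stand-in encoder: average-pools a flat pixel list into FEATURE_DIM values."""
    chunk = max(1, len(word_image) // FEATURE_DIM)
    features = []
    for i in range(FEATURE_DIM):
        part = word_image[i * chunk:(i + 1) * chunk] or [0.0]
        features.append(sum(part) / len(part))
    return features


def predict_attributes(word_image, heads):
    """Encode the word image once, then score every attribute from the same features."""
    features = shared_encoder(word_image)
    result = {}
    for name, weights in heads.items():
        logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
        result[name] = softmax(logits)
    return result


rng = random.Random(0)
heads = {name: make_head(n, rng) for name, n in ATTRIBUTE_CLASSES.items()}
word_image = [rng.random() for _ in range(32 * 64)]  # fake 32x64 grayscale word crop
predictions = predict_attributes(word_image, heads)
```

The point of the sketch is the data flow: the features are computed once per word image and shared by all heads, in contrast to running a separate classifier per attribute.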
External IDs: doi:10.32473/flairs.38.1.138949