An Attention-based Predictive Agent for Handwritten Numeral/Alphabet Recognition via Generation

Published: 27 Oct 2023, Last Modified: 14 Nov 2023
Venue: Gaze Meets ML 2023 Oral
Submission Type: Full Paper
Keywords: Visual attention, glimpses, perception, proprioception, multimodal, handwritten numeral/alphabet recognition and generation
TL;DR: An attention-based predictive agent that learns to recognize handwritten numerals and alphabets by generating them.
Abstract: Several attention-based models for either classification or generation of handwritten numerals/alphabets have been reported in the literature. However, very few end-to-end models perform generation and classification jointly. We propose a predictive agent model that actively samples its visual environment via a sequence of glimpses. Attention is driven by the agent's sensory prediction (or generation) error. At each sampling instant, the model predicts the observation class and completes the partial sequence observed up to that instant. It learns where and what to sample by jointly minimizing the classification and generation errors. Three variants of this model are evaluated for handwriting generation and recognition on images of handwritten numerals and alphabets from benchmark datasets. We show that the proposed model is more efficient in handwritten numeral/alphabet recognition than both the human participants in a recently published study and a highly cited attention-based reinforcement model. To our knowledge, this is the first attention-based agent to interact with and learn end-to-end from images for recognition via generation with a high degree of accuracy and efficiency.
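The sampling loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`extract_glimpse`, `next_fixation`, `joint_loss`), the square zero-padded glimpse, the rule of moving to the largest-error pixel inside the current glimpse, and the `weight` parameter are all assumptions made for illustration. It only shows one plausible way that a prediction error could drive attention and that a classification term and a generation term could be combined into one objective.

```python
import math

def extract_glimpse(image, center, size):
    """Return a size x size patch centred at (row, col), zero-padded at borders."""
    r0 = center[0] - size // 2
    c0 = center[1] - size // 2
    h, w = len(image), len(image[0])
    return [[image[r][c] if 0 <= r < h and 0 <= c < w else 0.0
             for c in range(c0, c0 + size)]
            for r in range(r0, r0 + size)]

def next_fixation(glimpse, predicted_glimpse, center):
    """Assumed attention rule: move to the pixel in the current glimpse
    where the absolute prediction error is largest."""
    size = len(glimpse)
    best, best_err = (0, 0), -1.0
    for dr in range(size):
        for dc in range(size):
            err = abs(glimpse[dr][dc] - predicted_glimpse[dr][dc])
            if err > best_err:
                best, best_err = (dr, dc), err
    # Map the patch offset back to image coordinates.
    return (center[0] - size // 2 + best[0], center[1] - size // 2 + best[1])

def joint_loss(glimpse, predicted_glimpse, class_probs, true_class, weight=1.0):
    """Assumed joint objective: cross-entropy on the class prediction
    plus a weighted mean squared generation error on the glimpse."""
    n = len(glimpse) * len(glimpse[0])
    mse = sum((g - p) ** 2
              for grow, prow in zip(glimpse, predicted_glimpse)
              for g, p in zip(grow, prow)) / n
    cross_entropy = -math.log(class_probs[true_class])
    return cross_entropy + weight * mse

# Toy 4x4 image with a single bright pixel at row 2, column 3.
image = [[0.0] * 4 for _ in range(4)]
image[2][3] = 1.0
center = (2, 2)
glimpse = extract_glimpse(image, center, 3)
predicted = [[0.0] * 3 for _ in range(3)]  # stand-in for the model's prediction
fixation = next_fixation(glimpse, predicted, center)
loss = joint_loss(glimpse, predicted, [0.5, 0.5], 0)
```

In this toy setting the next fixation lands on the bright pixel because that is where the stand-in prediction is most wrong. A learned model would replace the all-zero `predicted` patch and the uniform `class_probs` with its own outputs.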
Supplementary Material: zip
Submission Number: 18