Head Tail Open: Open Tailed Classification of Imbalanced Document Data

Published: 01 Jan 2024, Last Modified: 20 May 2025, SAI (1) 2024, CC BY-SA 4.0
Abstract: Deep learning models for scanned document image classification and form understanding have made significant progress in the last few years. For closed-world classification, a model can achieve high accuracy when copious amounts of labelled training data are available. However, very little work has been done on fine-grained, head-tailed open-world classification of documents, where the data are class-imbalanced: some classes (the head) have many data points and others (the tail) have few. Our proposed method achieves better classification results than the baseline on the head-tail-novel/open dataset. Our techniques include separating the head and tail classes and transferring knowledge from the head data to the tail data. This knowledge transfer also improves the recognition of novel categories by 15% compared with the baseline.
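The abstract does not say how head and tail classes are separated or how novel documents are flagged. As a minimal sketch under assumptions of our own, the Python below splits classes by a hypothetical frequency cutoff (`min_head_count`) and labels a prediction as novel when its maximum softmax probability falls below a hypothetical `threshold`. Maximum-softmax thresholding is a common generic open-set baseline and is not claimed to be the authors' method; the head-to-tail knowledge transfer is not modelled at all. The function names, parameters, and document class labels are illustrative only.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def split_head_tail(labels: Iterable[Hashable], min_head_count: int) -> tuple[set, set]:
    """Hypothetical helper: classes with at least `min_head_count` samples
    form the head; all remaining classes form the tail."""
    counts = Counter(labels)
    head = {c for c, n in counts.items() if n >= min_head_count}
    tail = set(counts) - head
    return head, tail


def predict_open(logits: Sequence[float], classes: Sequence[str], threshold: float) -> str:
    """Generic open-set rule (assumed, not from the paper): return the top
    class if its softmax probability reaches `threshold`, else "novel"."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best] if probs[best] >= threshold else "novel"


if __name__ == "__main__":
    # Illustrative, made-up label distribution: two frequent and two rare classes.
    labels = ["invoice"] * 50 + ["letter"] * 40 + ["memo"] * 3 + ["resume"] * 2
    head, tail = split_head_tail(labels, min_head_count=10)
    print(sorted(head), sorted(tail))
    print(predict_open([4.0, 0.5, 0.2], ["invoice", "letter", "memo"], threshold=0.7))
    print(predict_open([1.0, 1.0, 1.0], ["invoice", "letter", "memo"], threshold=0.7))
```

With uniform logits every class gets probability 1/3, so the second call falls below the 0.7 threshold and is reported as novel. In practice the cutoff and threshold would have to be tuned on held-out data.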