Abstract: Detecting out-of-distribution (OOD) data is crucial in machine learning applications to mitigate the risk of model overconfidence, thereby enhancing the reliability and safety of deployed systems. Most existing OOD detection methods address uni-modal inputs, such as images or text. For multi-modal documents, however, the performance of these methods, which were developed primarily for computer vision tasks, remains largely unexplored. We propose a novel methodology, termed attention head masking (AHM), for multi-modal OOD tasks in document classification systems. Our empirical results demonstrate that the proposed AHM method outperforms all state-of-the-art approaches, reducing the false positive rate (FPR) by up to 7.5% compared to existing solutions. The methodology generalizes well to multi-modal data, such as documents, where visual and textual information are modeled under the same Transformer architecture. To address the scarcity of high-quality publicly available document datasets and to encourage further research on OOD detection for documents, we introduce FinanceDocs, a new document AI dataset. Our code and dataset are publicly available.
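The abstract does not spell out the masking mechanism, but the general idea of suppressing a subset of attention heads in a Transformer layer can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation; the function and parameter names (`masked_multi_head_attention`, `head_mask`) are hypothetical.

```python
# Illustrative sketch of attention head masking in one self-attention layer.
# Not the paper's AHM implementation; all names and shapes are assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads, head_mask):
    """
    x:         (seq_len, d_model) token embeddings (e.g., text + visual tokens).
    Wq/Wk/Wv:  (d_model, d_model) query/key/value projection weights.
    Wo:        (d_model, d_model) output projection.
    head_mask: (num_heads,) binary vector; a 0 zeroes out that head's output.
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (H, S, S)
    context = softmax(scores) @ v                          # (H, S, d_head)

    # Attention head masking: suppress selected heads before recombining them.
    context = context * head_mask[:, None, None]

    out = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

# Usage: mask heads 1 and 3 of a 4-head layer on random inputs.
rng = np.random.default_rng(0)
S, D, H = 6, 16, 4
x = rng.standard_normal((S, D))
Wq, Wk, Wv, Wo = (rng.standard_normal((D, D)) * 0.1 for _ in range(4))
mask = np.array([1.0, 0.0, 1.0, 0.0])
print(masked_multi_head_attention(x, Wq, Wk, Wv, Wo, H, mask).shape)  # (6, 16)
```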
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: Out of Distribution Detection, Multi-modal Document Classification, Supervised Learning, Transformer models, Attention Methods
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 419