Evaluating Out-of-Distribution Performance on Document Image Classifiers

Stefan Larson; Gordon Lim; Yutong Ai; David Kuang; Kevin Leach

Published: 17 Sept 2022, Last Modified: 06 Apr 2025, NeurIPS 2022 Datasets and Benchmarks, Readers: Everyone

Keywords: document classification, RVL-CDIP, out-of-distribution

Abstract: The ability of a document classifier to handle inputs drawn from a distribution different from the training distribution is crucial for robust deployment and generalizability. The RVL-CDIP corpus is the de facto standard benchmark for document classification, yet to our knowledge no study that uses this corpus includes evaluation on out-of-distribution documents. In this paper, we curate and release a new benchmark for evaluating the out-of-distribution performance of document classifiers. The benchmark consists of two types of documents: those that do not belong to any of the 16 in-domain RVL-CDIP categories (RVL-CDIP-O), and those that belong to one of the 16 in-domain categories yet are drawn from a distribution different from that of the original RVL-CDIP dataset (RVL-CDIP-N). While prior work on document classification for in-domain RVL-CDIP documents reports high accuracy scores, we find that these models exhibit accuracy drops of roughly 15-30% on the out-of-distribution RVL-CDIP-N benchmark, and further struggle to distinguish between in-domain RVL-CDIP-N and out-of-domain RVL-CDIP-O inputs. Our benchmark provides researchers with a valuable new resource for analyzing the out-of-distribution performance of document classifiers.
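The two measurements mentioned in the abstract (accuracy on in-category but shifted documents, and separation of in-domain from out-of-domain inputs) can be illustrated with a minimal sketch. It assumes a classifier that outputs per-class probabilities and uses the maximum softmax probability as a confidence score, which is one common baseline and not necessarily the protocol used in the paper. The arrays below are toy values with two classes, not results from the benchmark, and all function and variable names are hypothetical.

```python
import numpy as np


def accuracy(probs, labels):
    """Top-1 accuracy from an (n, k) array of class probabilities."""
    preds = np.argmax(np.asarray(probs), axis=1)
    return float(np.mean(preds == np.asarray(labels)))


def max_softmax_confidence(probs):
    """Confidence score: the largest predicted class probability per input."""
    return np.max(np.asarray(probs), axis=1)


def auroc(in_scores, out_scores):
    """Probability that a random in-domain score exceeds a random
    out-of-domain score, counting ties as half (pairwise AUROC)."""
    a = np.asarray(in_scores, dtype=float)[:, None]
    b = np.asarray(out_scores, dtype=float)[None, :]
    wins = np.sum(a > b)
    ties = np.sum(a == b)
    return float((wins + 0.5 * ties) / (a.size * b.size))


# Toy stand-ins (assumed values): predictions on in-category, shifted
# documents (the role RVL-CDIP-N plays) and on documents outside every
# category (the role RVL-CDIP-O plays).
probs_n = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]])
labels_n = np.array([0, 1, 1])
probs_o = np.array([[0.55, 0.45], [0.8, 0.2]])

conf_n = max_softmax_confidence(probs_n)
conf_o = max_softmax_confidence(probs_o)

print("accuracy on shifted in-category data:", accuracy(probs_n, labels_n))
print("AUROC, in-domain vs. out-of-domain:", auroc(conf_n, conf_o))
```

An AUROC near 0.5 would mean the confidence score cannot tell the two groups apart, while a value near 1.0 would mean in-domain inputs are consistently scored higher. The accuracy drop itself is the difference between accuracy on the original test split and accuracy on the shifted data.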

Author Statement: Yes

URL: https://github.com/gxlarson/rvl-cdip-ood

Dataset URL: https://github.com/gxlarson/rvl-cdip-ood

License: CC BY-NC 3.0

TL;DR: We introduce new out-of-distribution data for evaluating document classifiers and find that models trained on RVL-CDIP lose accuracy when tested on it.

Supplementary Material: pdf

Contribution Process Agreement: Yes

In Person Attendance: No
