Test-Time Adaptation for Visual Document Understanding

Sayna Ebrahimi; Sercan O Arik; Tomas Pfister

Published: 01 Feb 2023, Last Modified: 30 Dec 2025, Submitted to ICLR 2023, Readers: Everyone

Keywords: Test-time adaptation, source data-free domain adaptation, visual document understanding

TL;DR: Proposing a novel test-time adaptation approach and three benchmarks for visual document understanding via masked language modeling and pseudo labeling.

Abstract: Self-supervised pretraining has produced transferable representations for various visual document understanding (VDU) tasks. However, the ability of such representations to adapt to new distribution shifts at test time has not yet been studied. We propose DocTTA, a novel test-time adaptation approach for documents that leverages cross-modality self-supervised learning via masked visual language modeling, together with pseudo labeling, to adapt models learned on a source domain to an unlabeled target domain at test time. We also introduce new benchmarks built from existing public datasets for several VDU tasks, including entity recognition, key-value extraction, and document visual question answering, on which DocTTA improves the source model's performance by up to 1.79% (F1 score), 3.43% (F1 score), and 17.68% (ANLS score), respectively.
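The document visual question answering gain above is reported in ANLS (Average Normalized Levenshtein Similarity), the metric commonly used for document VQA. For each question, the prediction is compared against every acceptable answer; the normalized edit distance NL is converted to a similarity 1 - NL only when NL is below a threshold τ (conventionally 0.5), otherwise the score is 0; the best match per question is kept and the results are averaged. The sketch below is a minimal illustration under stated assumptions: the lowercase/whitespace-stripping normalization mirrors common DocVQA evaluation scripts, and the function names `levenshtein` and `anls` are hypothetical, not taken from the paper's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        prev = curr
    return prev[-1]


def anls(predictions, gold_answers, tau=0.5):
    """Average Normalized Levenshtein Similarity over a set of questions.

    predictions: one predicted answer string per question.
    gold_answers: one list of acceptable answer strings per question.
    Assumption: answers are compared after stripping and lowercasing.
    """
    if len(predictions) != len(gold_answers):
        raise ValueError("need exactly one prediction per question")
    total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        p = pred.strip().lower()
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            # Similarity counts only when the normalized distance is below tau.
            score = 1.0 - nl if nl < tau else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions) if predictions else 0.0
```

For example, `anls(["invoice", "abc"], [["Invoice"], ["xyz"]])` gives 0.5: the first answer matches exactly after lowercasing, while the second is too far from any gold answer to earn partial credit. The threshold is what keeps ANLS from rewarding answers that share only a few characters with the ground truth.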

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning

Supplementary Material: zip

Community Implementations: 2 code implementations (https://www.catalyzex.com/paper/test-time-adaptation-for-visual-document/code)
