LCA-on-the-Line: Benchmarking Out of Distribution Generalization with Class Taxonomies

Published: 28 Oct 2023, Last Modified: 02 Apr 2024
DistShift 2023 Poster
Keywords: Out-of-Distribution Generalization, representation evaluation, Hierarchy, Vision Language Model, Class Taxonomy, Zero-shot
TL;DR: We propose using the LCA distance on the WordNet hierarchy to estimate ImageNet OOD performance and, for the first time, show a strong linear correlation across 75 models (including vision-only models and VLMs) over five natural-shift datasets.
Abstract: We introduce 'Least Common Ancestor (LCA)-on-the-line' as a method for predicting models' Out-of-Distribution (OOD) performance using in-distribution measurements, without the need for OOD data. We revisit the LCA distance, a concept from the pre-deep-learning era, which calculates the hierarchical distance between labels and predictions in a predefined class hierarchy tree, such as WordNet. Our evaluation of 75 models across five significantly shifted ImageNet-OOD datasets demonstrates the robustness of LCA-on-the-line. It reveals a strong linear correlation between in-domain ImageNet LCA distance and OOD Top-1 accuracy across various datasets, including ImageNet-S/R/A/ObjectNet. Compared to previous methods such as Accuracy-on-the-line and Agreement-on-the-line, LCA-on-the-line shows superior generalization across a wide range of models. This includes models trained with different supervision types, such as class labels for vision models (VMs) and textual captions for vision-language models (VLMs). Our method offers a compelling alternative perspective on why vision-language models tend to generalize better to OOD data compared to vision models, even those with similar or lower in-domain (ID) performance. In addition to presenting an OOD performance indicator, we also demonstrate that aligning model predictions more closely with the class hierarchy and integrating a training loss objective with soft labels can enhance model OOD performance.
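To make the LCA distance concrete, below is a minimal sketch of computing a hierarchical distance between a ground-truth class and a predicted class on WordNet using NLTK. This is illustrative only, not the authors' released code: the function name `lca_distance` and the specific distance definition (depth of the label minus depth of the deepest common ancestor) are assumptions chosen for clarity; the paper's exact formulation may differ.

```python
# Minimal sketch of an LCA-based hierarchical distance on WordNet via NLTK.
# Assumption: distance = depth(label) - depth(lowest common ancestor), so a
# deeper shared ancestor (a semantically closer mistake) yields a smaller value.
# Requires: pip install nltk; then nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def lca_distance(label_synset, pred_synset):
    """Hierarchical distance between a label and a prediction in WordNet."""
    ancestors = label_synset.lowest_common_hypernyms(pred_synset)
    if not ancestors:
        return float('inf')  # no shared ancestor (should not occur for nouns)
    deepest_ancestor_depth = max(a.max_depth() for a in ancestors)
    return label_synset.max_depth() - deepest_ancestor_depth

# Example: predicting "cat" for a "dog" image is a closer mistake than
# predicting "truck", so it receives a smaller LCA distance.
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
truck = wn.synset('truck.n.01')
print(lca_distance(dog, cat))    # small: common ancestor is low in the tree
print(lca_distance(dog, truck))  # large: common ancestor is near the root
```

Under the paper's framing, averaging such a distance over a model's in-distribution ImageNet predictions yields a per-model score whose linear relationship with OOD Top-1 accuracy is what "LCA-on-the-line" refers to.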
Submission Number: 37