Keywords: Out-of-Distribution Generalization, Representation Evaluation, Hierarchy, Vision-Language Model, Class Taxonomy, Zero-Shot
TL;DR: We propose using LCA distance on the WordNet hierarchy to estimate ImageNet OOD performance and, for the first time, show a strong linear correlation across 75 models (including vision-only models and VLMs) over five natural-shift datasets.
Abstract: We introduce 'Least Common Ancestor (LCA)-on-the-line' as a method for predicting models' Out-of-Distribution (OOD) performance from in-distribution (ID) measurements, without the need for OOD data. We revisit the LCA distance, a concept from the pre-deep-learning era, which measures the hierarchical distance between labels and predictions in a predefined class hierarchy tree, such as WordNet. Our evaluation of 75 models across five significantly shifted ImageNet-OOD datasets demonstrates the robustness of LCA-on-the-line: it reveals a strong linear correlation between ID ImageNet LCA distance and OOD Top-1 accuracy across various datasets, including ImageNet-S/R/A/ObjectNet. Compared to previous methods such as Accuracy-on-the-line and Agreement-on-the-line, LCA-on-the-line generalizes across a much wider range of models, including models trained with different supervision types, such as class labels for vision models (VMs) and textual captions for vision-language models (VLMs). Our method offers a compelling alternative perspective on why VLMs tend to generalize better to OOD data than VMs, even those with similar or lower ID performance. Beyond presenting an OOD performance indicator, we also demonstrate that aligning model predictions more closely with the class hierarchy, and adopting a training objective with soft labels, can enhance model OOD performance.
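For concreteness, below is a minimal sketch of an LCA distance between two class labels on the WordNet hierarchy, using NLTK's WordNet interface. The `lca_distance` helper, the choice of the first noun synset per label, and the path-based distance definition are illustrative assumptions, not the paper's exact formulation (which also requires the mapping from ImageNet class IDs to synsets).

```python
# Illustrative sketch: LCA-based hierarchical distance via NLTK's WordNet.
# Assumes the WordNet corpus is installed: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def lca_distance(label_a: str, label_b: str) -> int:
    """Hierarchical distance between two noun labels via their LCA.

    Takes the first noun synset for each label (an assumption; real class
    IDs map directly to synsets) and counts the hops from each synset to
    their lowest common ancestor in the hierarchy.
    """
    syn_a = wn.synsets(label_a, pos=wn.NOUN)[0]
    syn_b = wn.synsets(label_b, pos=wn.NOUN)[0]
    lca = syn_a.lowest_common_hypernyms(syn_b)[0]
    # Distance = hops from each synset up to the shared ancestor.
    return (syn_a.shortest_path_distance(lca)
            + syn_b.shortest_path_distance(lca))

print(lca_distance("dog", "cat"))    # small: both fall under carnivore
print(lca_distance("dog", "truck"))  # large: distant branches of WordNet
```

Under this sketch, a misprediction of "cat" for "dog" incurs a small hierarchical penalty, while "truck" for "dog" incurs a large one, which is the intuition behind using ID LCA distance as an OOD performance indicator.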
Submission Number: 37