T-QPM: Enabling Temporal Out-Of-Distribution Detection and Domain Generalization for Vision-Language Models in Open-World

Published: 19 Mar 2026, Last Modified: 25 Mar 2026 · OpenReview Archive Direct Upload · Everyone · arXiv.org perpetual, non-exclusive license
Abstract: Out-of-distribution (OOD) detection remains a critical challenge in open-world learning, where models must adapt to evolving data distributions. While recent vision-language models (VLMs) like CLIP enable multimodal OOD detection through Dual-Pattern Matching (DPM), existing methods typically suffer from two major shortcomings: (1) they rely on fixed fusion rules and assume static environments, failing under temporal drift; and (2) they lack robustness against covariate-shifted inputs. In this paper, we propose a novel two-step framework to enhance OOD detection and covariate distribution shift robustness in dynamic settings. We extend the dual-pattern regime into Temporal Quadruple-Pattern Matching (T-QPM). First, by pairing OOD images with text descriptions, we introduce cross-modal consistency patterns between ID and OOD signals, refining the decision boundary through joint image-text reasoning. Second, we address temporal distribution shifts by learning lightweight fusion weights to optimally combine semantic matching and visual typicality. To ensure stability, we enforce explicit regularization based on Average Thresholded Confidence (ATC), preventing performance degradation as distributions evolve. Experiments on temporally partitioned benchmarks demonstrate that our approach significantly outperforms static baselines, offering a robust, temporally consistent framework for multimodal OOD detection in non-stationary environments.
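The abstract's core scoring idea, combining a semantic (image-text) matching score with a visual typicality score via fusion weights, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the prototype-based typicality measure, and the fixed weights `w=(0.5, 0.5)` are all assumptions (the paper learns the weights over time and adds ATC-based regularization, neither of which is reproduced here).

```python
import numpy as np

def semantic_score(img_feat, text_feats):
    """Max cosine similarity between an image feature and ID text-prompt features."""
    sims = text_feats @ img_feat / (
        np.linalg.norm(text_feats, axis=1) * np.linalg.norm(img_feat) + 1e-8)
    return float(sims.max())

def typicality_score(img_feat, id_prototypes):
    """Negative Euclidean distance to the nearest ID visual prototype."""
    return -float(np.linalg.norm(id_prototypes - img_feat, axis=1).min())

def fused_score(img_feat, text_feats, id_prototypes, w=(0.5, 0.5)):
    """Weighted fusion of the two pattern scores. In the paper the weights are
    learned per time window; here they are fixed for illustration.
    Higher score => more in-distribution."""
    return (w[0] * semantic_score(img_feat, text_feats)
            + w[1] * typicality_score(img_feat, id_prototypes))

# Toy 4-D features: ID text prompts and visual prototypes as unit axes.
text_feats = np.eye(4)
id_prototypes = np.eye(4)

id_img = np.array([0.9, 0.1, 0.0, 0.0])       # close to the first ID prototype
ood_img = np.array([-0.5, -0.5, -0.5, -0.5])  # far from all ID patterns

score_id = fused_score(id_img, text_feats, id_prototypes)
score_ood = fused_score(ood_img, text_feats, id_prototypes)
```

On this toy data the ID sample scores above the OOD sample, so thresholding the fused score separates them; the temporal aspect of T-QPM would amount to re-fitting `w` as the distribution drifts.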