T-QPM: Enabling Temporal Out-Of-Distribution Detection and Domain Generalization for Vision-Language Models in the Open World
Abstract: Out-of-distribution (OOD) detection remains a critical challenge in open-world learning, where models must adapt to evolving data distributions. While recent vision-language models (VLMs) like CLIP enable multimodal OOD detection through Dual-Pattern Matching (DPM), existing methods typically suffer from two major shortcomings: (1) they rely on fixed fusion rules and assume static environments, failing under temporal drift; and (2) they lack robustness against covariate-shifted inputs. In this paper, we propose a novel two-step framework to enhance
OOD detection and covariate distribution shift robustness in dynamic
settings. We extend the dual-pattern regime into Temporal Quadruple-
Pattern Matching (T-QPM). First, by pairing OOD images with text
descriptions, we introduce cross-modal consistency patterns between ID
and OOD signals, refining the decision boundary through joint image-
text reasoning. Second, we address temporal distribution shifts by learning lightweight fusion weights to optimally combine semantic matching and visual typicality. To ensure stability, we enforce explicit regularization based on Average Thresholded Confidence (ATC), preventing performance degradation as distributions evolve. Experiments on temporally partitioned benchmarks demonstrate that our approach significantly outperforms static baselines, offering a robust, temporally consistent framework for multimodal OOD detection in non-stationary environments.
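The fused scoring and ATC-based regularization described above can be illustrated with a minimal sketch. The function names, the convex-combination form of the fusion, and the quantile-based threshold are illustrative assumptions consistent with the standard ATC formulation, not the paper's exact method:

```python
import numpy as np

def fused_ood_score(semantic: float, typicality: float, w: float) -> float:
    # Hypothetical learned fusion: a convex combination of the semantic
    # (image-text) matching score and the visual typicality score.
    return w * semantic + (1.0 - w) * typicality

def atc_threshold(source_conf: np.ndarray, source_err: float) -> float:
    # Average Thresholded Confidence: choose t so that the fraction of
    # source-domain confidences falling below t equals the known source
    # error rate.
    return float(np.quantile(source_conf, source_err))

def atc_estimate(target_conf: np.ndarray, t: float) -> float:
    # Estimated target accuracy = fraction of target confidences above t;
    # tracking this quantity as distributions drift is one way to detect
    # the performance degradation the regularizer guards against.
    return float(np.mean(target_conf > t))
```

For example, with uniformly spread source confidences and a 20% source error rate, the threshold lands at 0.2, and a target batch's estimated accuracy is simply the share of its confidences above that value.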