Towards Making Effective Machine Learning Decisions Against Out-of-Distribution Data

Published: 18 Jul 2024, Last Modified: 28 Sept 2024. ACM Conference on Information and Knowledge Management (CIKM 2024). CC BY 4.0
Abstract: Conventional machine learning systems operate on the assumption that data are independent and identically distributed (i.i.d.), i.e., that the training and test data share the same sample space and no distribution shift exists between them. However, this assumption rarely holds in practical deployment scenarios, making it crucial to develop methodologies that address the non-trivial problem of data distribution shift. In our research, we aim to address this problem by developing ML algorithms that explicitly target strong performance when subjected to various types of out-of-distribution (OOD) data. Specifically, we approach the problem by categorizing data distribution shifts into two types, covariate shifts and semantic shifts, and by proposing effective methodologies to tackle each type independently and jointly, validating them on different types of datasets. We aim to propose ideas that are compatible with existing deep neural networks and perform detection and/or generalization for test instances that are shifted in semantic and covariate space, respectively.
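As one hedged illustration of what "detection compatible with existing deep neural networks" can look like in practice, the sketch below applies a post-hoc energy score (in the spirit of energy-based OOD detection) to a pretrained classifier's logits and thresholds it to flag candidate semantically shifted inputs. The score, the threshold, and the helper names are illustrative assumptions for this sketch only, not the method proposed in the paper.

```python
# Minimal sketch: post-hoc semantic-shift (OOD) detection on top of an existing
# classifier. The energy score and threshold below are illustrative choices,
# not the paper's proposed method.
import torch
import torch.nn as nn


def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Negative free energy of the logits; higher values suggest more
    in-distribution-like inputs."""
    return temperature * torch.logsumexp(logits / temperature, dim=-1)


@torch.no_grad()
def flag_semantic_shift(model: nn.Module,
                        x: torch.Tensor,
                        threshold: float) -> torch.Tensor:
    """Return a boolean mask marking inputs whose score falls below `threshold`,
    i.e. candidates for semantically shifted (OOD) data. Inputs not flagged here
    (e.g. covariate-shifted but in-class samples) would instead be passed on to
    the classifier, where a generalization method is expected to handle them."""
    model.eval()
    scores = energy_score(model(x))
    return scores < threshold
```

In this setup the detector is purely post hoc: it reuses the deployed network's logits without retraining, which is one common way to keep OOD handling compatible with existing deep neural networks.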