DEALING WITH OUT-OF-DISTRIBUTION DATA IN PREDICTION PROBLEMS

ICLR 2025 Conference Submission 9421 Authors

27 Sept 2024 (modified: 26 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: representation learning, tabular data, out of distribution
Abstract: The open-world assumption in model development means that a model may lack sufficient information to handle data that is completely different from its training distribution, i.e., out of distribution (OOD). When a model encounters OOD data, its performance can drop significantly. One way to improve a model's robustness to OOD data is generalization through added noise, which is easy to implement with deep learning. However, many advanced machine learning models are resource-intensive and designed to perform best on specialized hardware (GPUs), which is not always available to common users with hardware limitations. To give general users a deeper understanding of, and a practical solution to, OOD, this study explores detection, evaluation, and prediction tasks in the context of OOD on tabular datasets using common consumer hardware (CPUs). It demonstrates how users can identify OOD data within available datasets and provides guidance on evaluating the OOD selection through simple experiments and visualizations. Furthermore, the study introduces Tabular Contrast Learning (TCL), a technique designed specifically for tabular prediction tasks. While achieving better results than heavier models, TCL is more efficient even when trained without specialized hardware, making it useful for general machine-learning users with computational limitations. The study includes a comprehensive comparison between existing approaches running in their best hardware setting (GPU) and TCL running on common hardware (CPU), focusing on both accuracy and efficiency. The results show that TCL outperforms other models, including gradient-boosted decision trees, contrastive learning, and other deep learning models, on the classification task.
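The page carries only this abstract, so the sketches below are editorial illustrations, not the authors' code. First, a minimal example of carving an OOD split out of an available tabular dataset and evaluating the selection, assuming a simple quantile-based hold-out on one covariate (the paper's actual selection procedure is not given here); scikit-learn and NumPy only, CPU-friendly:

```python
# Hypothetical OOD-split illustration: hold out the rows whose value on one
# covariate falls in the upper tail, then compare in- vs out-of-distribution
# accuracy. A large gap suggests the held-out region really is OOD for the model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (np.sin(2 * X[:, 2]) > 0).astype(int)  # label depends on the shifted feature

# Treat the top 20% of feature 2 as "OOD": the model never trains on that region.
cut = np.quantile(X[:, 2], 0.8)
ind, ood = X[:, 2] <= cut, X[:, 2] > cut

clf = RandomForestClassifier(random_state=0).fit(X[ind], y[ind])
print("in-distribution acc:", accuracy_score(y[ind], clf.predict(X[ind])))
print("OOD acc:           ", accuracy_score(y[ood], clf.predict(X[ood])))
```

Second, a rough sketch of what a CPU-trainable contrastive learner for tabular rows could look like, in the spirit of TCL. The encoder, the Gaussian-noise augmentation (the abstract's "generalization by adding noise"), and the NT-Xent loss are standard contrastive-learning ingredients chosen for illustration; none of these names or design choices come from the paper.

```python
# Minimal CPU-friendly contrastive learning sketch for tabular data.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TabularEncoder(nn.Module):
    """A small MLP encoder; cheap enough to train without a GPU."""
    def __init__(self, n_features: int, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

def augment(x: torch.Tensor, noise_std: float = 0.1) -> torch.Tensor:
    # Each view is the same row perturbed with small Gaussian noise.
    return x + noise_std * torch.randn_like(x)

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """SimCLR-style NT-Xent loss over two augmented views of a batch."""
    z = torch.cat([z1, z2], dim=0)          # (2B, d)
    sim = z @ z.t() / tau                   # cosine similarities (unit norm)
    n = z.shape[0]
    sim.fill_diagonal_(float("-inf"))       # exclude self-similarity
    targets = torch.arange(n).roll(n // 2)  # each row's positive is its other view
    return F.cross_entropy(sim, targets)

# Tiny usage example on synthetic data, CPU only.
if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(256, 20)                # 256 rows, 20 features
    enc = TabularEncoder(n_features=20)
    opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
    for step in range(100):
        z1, z2 = enc(augment(x)), enc(augment(x))
        loss = nt_xent_loss(z1, z2)
        opt.zero_grad(); loss.backward(); opt.step()
    print(f"final contrastive loss: {loss.item():.3f}")
```

The learned embeddings would then feed a lightweight classifier head for the downstream prediction task; the abstract does not specify that stage, so it is omitted here.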
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9421