Shifts 2.0: Extending The Dataset of Real Distributional Shifts

Andrey Malinin; Andreas Athanasopoulos; Muhamed Barakovic; Meritxell Bach Cuadra; Mark Gales; Cristina Granziera; Mara Graziani; Nikolay Kartashev; Konstantinos Kyriakopoulos; Po-Jui Lu; Nataliia Molchanova; Antonis Nikitakis; Vatsal Raina; Francesco La Rosa; Eli Sivena; Vasileios Tsarsitalidis; Efi Tsompopoulou; Elena Volf

Published: 01 Feb 2023. Last Modified: 22 Jun 2025. Submitted to ICLR 2023. Readers: Everyone

Keywords: Distributional Shift, Uncertainty Estimation, Benchmark, MRI 3D segmentation, medical data, industrial tabular data

Abstract: Distributional shift, or the mismatch between training and deployment data, is a significant obstacle to the use of machine learning in high-stakes industrial applications, such as autonomous driving and medicine. This creates a need to assess how robustly ML models generalize, as well as the quality of their uncertainty estimates. Standard ML datasets do not allow these properties to be assessed, as the training, validation and test data are often identically distributed. Recently, a range of dedicated benchmarks has appeared, featuring both distributionally matched and shifted data. The Shifts dataset stands out in terms of the diversity of tasks and data modalities it features. Unlike most benchmarks, which are dominated by 2D image data, Shifts contains tabular weather forecasting, machine translation, and vehicle motion prediction tasks. This allows models to be assessed on a diverse set of industrial-scale tasks, and allows either universal or directly applicable task-specific conclusions to be drawn. In this paper, we extend the Shifts Dataset with two datasets sourced from industrial, high-risk applications of high societal importance. Specifically, we consider the tasks of segmentation of white matter Multiple Sclerosis lesions in 3D magnetic resonance brain images and the estimation of power consumption in marine cargo vessels. Both tasks feature ubiquitous distributional shifts and strict safety requirements due to the high cost of errors. These new datasets will allow researchers to explore robust generalization and uncertainty estimation in new situations. This work provides a description of the dataset and baseline results for both tasks.

Please Choose The Closest Area That Your Submission Falls Into: Infrastructure (eg, datasets, competitions, implementations, libraries)

TL;DR: We introduce two new datasets into the Shifts Benchmark for assessing robustness and uncertainty: Multiple Sclerosis Lesion Segmentation in MRI images and Cargo Vessel Power Consumption Prediction.
