Feature Shift Localization Network

Published: 01 May 2025, Last Modified: 18 Jun 2025
Venue: ICML 2025 poster
License: CC BY-NC 4.0
TL;DR: We introduce the Feature Shift Localization Network, a model that can localize feature shifts in a single forward pass, without requiring dataset-specific training.
Abstract: Feature shifts between data sources are present in many applications involving healthcare, biomedical, socioeconomic, financial, survey, and multi-sensor data, among others, where unharmonized heterogeneous data sources, noisy data measurements, or inconsistent processing and standardization pipelines can lead to erroneous features. Localizing shifted features is important for addressing the underlying cause of the shift and for correcting or filtering the data so that downstream analysis is not degraded. While many techniques can detect distribution shifts, localizing the features that cause them remains challenging, with current solutions being either inaccurate or unable to scale to large and high-dimensional datasets. In this work, we introduce the Feature Shift Localization Network (FSL-Net), a neural network that can localize feature shifts in large and high-dimensional datasets quickly and accurately. The network, trained on a large number of datasets, learns to extract the statistical properties of the datasets and can localize feature shifts in previously unseen datasets, including previously unseen types of shift, without re-training. The code and ready-to-use trained model are available at https://github.com/AI-sandbox/FSL-Net.
Lay Summary: When combining data from different sources—like medical records from different hospitals or sensor readings from different devices—some measurements may be inconsistent or corrupted due to different collection methods or errors. While we can tell something is wrong with the combined data, pinpointing exactly which measurements are problematic is difficult, especially with large datasets containing many variables. We developed FSL-Net, an AI system that quickly identifies which specific measurements in your data are causing problems. The system learns patterns from thousands of example datasets and can then spot problematic measurements in completely new data without needing additional training—like a quality control inspector who has seen so many products they can instantly spot defects in new items. This helps researchers and analysts clean their data before running important analyses, preventing errors that could lead to wrong conclusions in healthcare, finance, or scientific research. We've made the tool freely available so anyone working with data from multiple sources can ensure their results are reliable.
Link To Code: https://github.com/AI-sandbox/FSL-Net
Primary Area: Deep Learning->Everything Else
Keywords: feature shift localization, shift detection, shift, feature shift, distribution shift
Submission Number: 11426
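To make the task described in the abstract concrete, the sketch below shows what "localizing" a feature shift means on a toy example: given a reference dataset and a query dataset, return the indices of the features whose distribution differs. This is not FSL-Net and not the authors' method. It is a simple per-feature baseline that uses a two-sample Kolmogorov-Smirnov statistic on each marginal. The function names, the threshold of 0.1, and the synthetic data are all assumptions made for illustration.

```python
import bisect
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The supremum of the CDF gap is attained at one of the observed points.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def localize_shifted_features(reference, query, threshold=0.1):
    """Hypothetical baseline: flag columns whose marginal KS statistic exceeds threshold.

    reference, query: lists of rows (lists of floats) with the same number of columns.
    Returns (indices of flagged features, per-feature KS scores).
    """
    n_features = len(reference[0])
    scores = []
    for j in range(n_features):
        ref_col = [row[j] for row in reference]
        qry_col = [row[j] for row in query]
        scores.append(ks_statistic(ref_col, qry_col))
    shifted = [j for j, s in enumerate(scores) if s > threshold]
    return shifted, scores


# Synthetic data (assumption): 5 standard-normal features, 2000 samples per source.
rng = random.Random(0)
n_samples, n_features = 2000, 5
reference = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
query = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]

# Simulate a corrupted measurement: add a constant offset to feature 3 in the query.
for row in query:
    row[3] += 1.0

shifted, scores = localize_shifted_features(reference, query)
print("flagged features:", shifted)
print("KS scores:", [round(s, 3) for s in scores])
```

With these settings, the baseline should flag only the feature that was offset. A marginal test of this kind cannot see shifts that change only the dependence between features while leaving each marginal intact, and it needs a hand-picked threshold, which illustrates why localization is harder than it first appears.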