Keywords: noisy label, contrastive learning, noisy label detection
Abstract: Deep neural networks can suffer severe performance degradation when trained on datasets with
instance-dependent label noise—annotation errors that correlate with input features.
To address this issue, we propose a lightweight, model-agnostic preprocessing
framework based on an ensemble of contrastive Siamese networks.
Our method detects and corrects noisy labels by measuring embedding consistency:
clean samples yield stable representations across models, while noisy
samples exhibit high variability and increased misclassification rates.
Each Siamese model is trained on a subset of image pairs, and we demonstrate
that noisy instances are significantly more likely to be misclassified under
this subset-driven embedding process, with the ensemble’s false-positive
rate decaying exponentially with the number of models. Ultimately, samples
with high model disagreement are flagged and either relabeled by consensus or
discarded. Empirically, on real-world CIFAR-10N (9.01\% natural noise), our method
reduces label corruption to 4.45\% and achieves 88.51\% accuracy on the cleaned
dataset—0.26 percentage points ahead of the nearest baseline.
Under synthetic instance-dependent noise, label corruption on CIFAR-10 is reduced
from 40\% to 25.9\% (yielding a 12.54 percentage point accuracy gain) and on
Fashion-MNIST from 40\% to 4.6\% (a 2.23 percentage point accuracy gain).
Our preprocessing step adds minimal overhead, produces interpretable
uncertainty scores, and can be seamlessly integrated with any downstream
learner to enhance robustness against label noise.
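The flag-then-relabel-or-discard step described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: `clean_labels`, the `agree_thresh` parameter, and the majority-vote rule are assumptions chosen to match the abstract's description (relabel when the ensemble strongly agrees on a different class, discard when disagreement is high).

```python
import numpy as np

def clean_labels(ensemble_preds, labels, agree_thresh=0.8):
    """Hypothetical consensus step.

    ensemble_preds : (num_models, num_samples) int array of per-model
                     predicted labels for each training sample.
    labels         : (num_samples,) int array of given (possibly noisy) labels.
    Returns indices of retained samples and their (possibly corrected) labels.
    """
    num_models, num_samples = ensemble_preds.shape
    keep, new_labels = [], []
    for i in range(num_samples):
        votes = np.bincount(ensemble_preds[:, i])
        consensus = int(votes.argmax())
        agreement = votes.max() / num_models
        if agreement >= agree_thresh and consensus == labels[i]:
            # Models agree with the given label: treat as clean.
            keep.append(i)
            new_labels.append(int(labels[i]))
        elif agreement >= agree_thresh:
            # Strong agreement on a different class: relabel by consensus.
            keep.append(i)
            new_labels.append(consensus)
        # Otherwise: high model disagreement -> discard the sample.
    return np.array(keep), np.array(new_labels)
```

With five models and a 0.8 threshold, a sample whose given label is contradicted by four of the five models would be relabeled, while a sample splitting the ensemble 3/2 would be discarded.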
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 22927