Getting the Data Right: A Physics-Consistent, Calibrated Dataset for SEM-Based Defect Localization in PEM Fuel Cells
Keywords: PEM fuel cells, scanning electron microscopy, defect localization, object detection, curated dataset, data-centric AI, physics-consistent preprocessing, calibration, augmentation ablation, shortcut learning, reproducible benchmarking, automated material characterization, scientific imaging
TL;DR: We introduce a physics-consistent SEM defect localization benchmark for PEM fuel cells and show that even plausible augmentations, such as 90° rotations, can degrade detection accuracy.
Abstract: High-quality data is a key bottleneck for vision systems in scientific imaging, yet publicly available datasets for defect localization in proton exchange membrane fuel cells remain scarce. We present a curated grayscale scanning electron microscopy dataset for single-class defect localization consisting of 1,107 images with bounding-box annotations, fixed train/validation/test splits, and a single canonical annotation source to ensure reproducibility. A physics-consistent preprocessing pipeline removes acquisition artifacts, enforces spatial standardization, and applies global intensity normalization to mitigate shortcut learning from non-physical cues. Controlled learnability and augmentation ablations show that even physically plausible transformations, including 90° rotations, can degrade detection performance, highlighting the need for dataset-specific validation rather than heuristic augmentation. By providing a rigorously validated and transparent benchmark for SEM-based defect localization, this dataset supports reliable automated characterization workflows and reduces a key data bottleneck in data-driven materials discovery and diagnostic pipelines.
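To make the 90° rotation augmentation mentioned in the abstract concrete, the following is a minimal sketch of how such a transform might be applied to a grayscale image together with its bounding boxes. It is not the authors' pipeline: the function name `rotate90_with_boxes`, the `(x_min, y_min, x_max, y_max)` pixel box format, and the half-open box convention are assumptions made for illustration only.

```python
import numpy as np


def rotate90_with_boxes(image, boxes):
    """Rotate a 2D grayscale image 90 degrees counter-clockwise and
    transform its bounding boxes to match.

    Assumed (hypothetical) conventions, not taken from the paper:
      - image has shape (H, W)
      - boxes has shape (N, 4) as (x_min, y_min, x_max, y_max) in pixels,
        covering columns [x_min, x_max) and rows [y_min, y_max)
    """
    image = np.asarray(image)
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    width = image.shape[1]

    # np.rot90 with k=1 rotates counter-clockwise; output shape is (W, H).
    rotated = np.rot90(image, k=1)

    x_min, y_min, x_max, y_max = boxes.T
    # A point (x, y) maps to (y, W - x) under this rotation, so the old
    # x-extent becomes the new y-extent (flipped), and the old y-extent
    # becomes the new x-extent.
    new_boxes = np.stack([y_min, width - x_max, y_max, width - x_min], axis=1)
    return rotated, new_boxes


if __name__ == "__main__":
    img = np.zeros((4, 6), dtype=np.uint8)
    img[1:3, 2:5] = 1  # synthetic "defect" covering box (2, 1, 5, 3)
    rot, new = rotate90_with_boxes(img, [[2, 1, 5, 3]])
    print(rot.shape, new)
```

A transform like this preserves the box geometry exactly, which is what makes the abstract's finding notable: a geometrically valid augmentation can still hurt detection, so its effect has to be checked with an ablation on the dataset itself rather than assumed.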
Submission Track: Findings, Tools, & Open Challenges (Tiny Paper)
Submission Category: Automated Material Characterization
Submission Number: 25