Keywords: Benchmark dataset generation, Materials informatics, Measurement informatics
TL;DR: Our pipeline combines simulation, noise injection, and CycleGAN texture transfer to create labeled microscopy datasets, providing annotation masks even in challenging domains and enabling fair, reproducible benchmarks in materials science.
Abstract: Manual annotation of material microscopy images is time-consuming, costly, and requires domain expertise. This annotation bottleneck limits model training and fair benchmarking. Prior data generation based on cycle-consistent generative adversarial networks (CycleGANs), while promising, often relied on computationally expensive simulations and struggled to capture diverse noise characteristics, making it task-specific. In this study, we introduce an automated pipeline that simplifies dataset generation and improves generality by combining parametric simulations, diverse modality-specific noise injection, and CycleGAN-based texture transfer while preserving ground-truth masks. Case studies on rubber materials with stripe-like noise in optical microscopy demonstrate its versatility. We also evaluated the pipeline on a public transmission electron microscopy (TEM) nanoparticle dataset for quantitative comparison with manual annotations. Segmentation accuracy approached that of models trained on human-labeled data, while characteristic imaging artifacts were faithfully reproduced. This framework reduces dataset cost, explicitly addresses noise diversity, and enables customized, reproducible, noise-aware benchmarks aligned with real experimental settings.
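To make the mask-preserving noise-injection step concrete, here is a minimal NumPy sketch of the kind of modality-specific corruption described above (stripe-like noise, as in the optical-microscopy case study). The function name, parameters, and the toy image are illustrative assumptions, not the paper's actual implementation, and the CycleGAN texture-transfer stage is omitted; the key point is that only the image is corrupted while the ground-truth mask stays untouched.

```python
import numpy as np

def add_stripe_noise(image, amplitude=0.1, period=8, axis=0, seed=0):
    """Inject periodic stripe-like noise into a simulated image.

    The annotation mask is never modified, so the simulated label
    remains valid ground truth after corruption. (Illustrative sketch,
    not the authors' implementation.)
    """
    rng = np.random.default_rng(seed)
    coords = np.arange(image.shape[axis])
    phase = rng.uniform(0, 2 * np.pi)
    # Sinusoidal intensity modulation along one image axis.
    stripes = amplitude * np.sin(2 * np.pi * coords / period + phase)
    shape = [1, 1]
    shape[axis] = image.shape[axis]
    noisy = image + stripes.reshape(shape)  # broadcast across the other axis
    return np.clip(noisy, 0.0, 1.0)

# Toy simulated sample: a bright square on a dark background.
image = np.zeros((64, 64))
image[20:40, 20:40] = 1.0
mask = image.copy()              # ground-truth annotation, preserved as-is
noisy = add_stripe_noise(image)  # only the image is corrupted
```

In a full pipeline, the `noisy` image (after texture transfer) and the unmodified `mask` would form one labeled training pair.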
Submission Track: Benchmarking in AI for Materials Design - Short Paper
Submission Category: Automated Material Characterization
Institution Location: Japan
AI4Mat Journal Track: Yes
AI4Mat RLSF: Yes
Submission Number: 127