Stratified Fréchet Distance: A Three-Layer Diagnostic Framework for Conditional Time Series Generation under Data Scarcity

Published: 26 May 2026, Last Modified: 26 May 2026, Machine Learning and Knowledge Extraction, MDPI, Everyone, CC BY-NC-ND 4.0
Abstract: Evaluating conditional time-series generation models remains challenging in battery research, where degradation data are often limited and experiments cover only a small number of operating conditions. The widely used Fréchet Inception Distance (FID) summarizes all conditions into a single score, which can obscure failures under rare but safety-critical conditions. Several condition-aware extensions of FID, including Conditional Fréchet Inception Distance (CFID), partially address this limitation by evaluating each condition separately. However, these approaches do not assess whether physically meaningful relationships between operating conditions are preserved, and their reliability deteriorates when only a few samples are available for each condition. To address these issues, we propose a three-layer diagnostic framework for evaluating conditional generative models under limited-data conditions. The first layer, Stratified Fréchet Distance, identifies the specific operating conditions and degradation phases where generation quality degrades. The second layer, based on Conditional Response Consistency (CRC), Conditional Distance Ratio (CDR), and Mean-Order Preservation (MOP), evaluates whether the model preserves the distance structure and ordering between conditions. MOP detects condition-ordering defects that CRC cannot identify when the real data distance matrix is non-monotone. This layer also enables statistically meaningful comparisons even when only a small number of samples are available. The third layer detects strata where statistical estimates are unreliable and provides a more stable alternative for evaluation. We validate the framework on four battery degradation datasets using two generative model architectures. The proposed approach reveals condition-specific failures that are not captured by conventional FID. It localizes generation errors to the late-stage high-temperature degradation regime that is most relevant to battery safety. The framework also detects structural distortions with statistical significance. In addition, it consistently ranks physics-informed model variants across quality differences spanning seven orders of magnitude. These results demonstrate that the proposed framework provides a practical and physically interpretable evaluation methodology for conditional generative modeling in battery degradation analysis.
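The idea of scoring each stratum separately instead of pooling all conditions into one number can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each sample is already a fixed-length feature vector, uses the standard closed-form Fréchet distance between two fitted Gaussians, and treats a stratum as any label (for example a condition and phase pair encoded as one value). The function names and the `min_samples` threshold are hypothetical, and the reliability flag is only a crude stand-in for the third layer described above.

```python
import numpy as np


def frechet_distance(x, y):
    """Squared Fréchet distance between Gaussians fitted to the rows of x and y."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    # Tr((cov_x cov_y)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_x @ cov_y, which are real and non-negative in
    # exact arithmetic; clip small negative values caused by round-off.
    eig = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


def stratified_frechet(real, fake, real_labels, fake_labels, min_samples=10):
    """Return {stratum: (distance, reliable)} for every stratum in the real data.

    min_samples is a hypothetical threshold (an assumption of this sketch):
    strata with fewer samples on either side are flagged as unreliable.
    """
    results = {}
    for label in np.unique(real_labels):
        r = real[real_labels == label]
        f = fake[fake_labels == label]
        n = min(len(r), len(f))
        # A covariance estimate needs at least two samples per side.
        dist = frechet_distance(r, f) if n >= 2 else float("nan")
        results[label.item()] = (dist, n >= min_samples)
    return results
```

Calling `stratified_frechet(real, fake, real_labels, fake_labels)` on arrays of shape `(n_samples, n_features)` returns one score per stratum, so a model that is accurate on common conditions but poor on a rare one shows a large value for that stratum only, where a pooled score would average the failure away.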