Abstract: Synthetic Sensory Data Generators (SSDGs) promise to advance the state of intelligent sensing by providing labelled training data at almost no cost. Such data can be used to train real-world sensory classification models without manual data collection and annotation. In this work, we dissect a promising paradigm of SSDGs (based on human motion generation) and reveal a culprit that could hinder future progress. SSDGs append a simple "calibration" component to bridge the distributional gap between real and synthetic data. In this study, we critically review this component and analyse its contribution to the data synthesis pipeline. Our findings reveal that, without a proper understanding of the calibration process, the performance of SSDGs is often overestimated. We make a number of observations demonstrating that the performance of current SSDGs depends heavily on the calibration process. First, generating synthetic data without calibration leads to poorly performing downstream classifiers (when trained on synthetic data). Second, while calibration can be unsupervised, only the supervised implementation is usable. This raises the question of whether SSDGs are better than related few-shot learners, which do not require any data synthesis effort. We therefore advocate for fully unsupervised SSDGs. Third, in some cases, the value of calibration outweighs that of the actual data generation process. Specifically, our experiments demonstrate that a classifier trained on random data is as good as one trained on synthetic data when both are calibrated! Thus, downstream classification performance is not necessarily a good metric of generated data quality. Our findings call for rethinking the current evaluation protocols of SSDGs.
External IDs: dblp:conf/percom/SharmaKKBK25a