Balancing Quality and Quantity: The Impact of Synthetic Data on Smoke Detection Accuracy in Computer Vision
Keywords: Computer Vision, Synthetic Data, Amorphous Objects
TL;DR: Using a small real-world dataset, we explored the impact of synthetic smoke data generated with Unreal Engine and NVIDIA Omniverse, identifying optimal quality and quantity thresholds for smoke detection.
Abstract: Synthetic data plays a crucial role in augmenting limited or challenging datasets. One domain with a scarcity of publicly available datasets is environmental monitoring of smoke opacity. Smoke presents a novel challenge for computer vision because its shape is amorphous and its texture is inconsistent. The dearth of public smoke datasets necessitates generating synthetic data to augment existing datasets. However, the generation of synthetic smoke, and how the quantity and quality of synthetic data affect downstream model performance, remain largely unexplored. Here, we present SemiS, a novel, state-of-the-art deep learning model tailored to extract features from smoke, and use it to investigate the impact of synthetic smoke data. We used two synthetic smoke pipelines: 1) lower-quality but quick-to-produce smoke generated with Unreal Engine, and 2) higher-quality but slow-to-produce smoke generated with NVIDIA Omniverse. Across both pipelines, we found that SemiS's performance peaked when synthetic data constituted approximately 30\% of the initial training data. Further, higher-quality data enhanced training accuracy by approximately 5\%, compared to a 2.5\% increase achieved with lower-quality data. However, Omniverse smoke was $\sim$12\% slower to generate than Unreal smoke. Finally, we dissect the quality of the generated smoke features in comparison with non-synthetic smoke. These results demonstrate the usefulness of a methodology that determines the value of synthetic data by analyzing its ability to improve model performance in smoke detection and similar applications.
Submission Number: 25