Seasons in Drift: A Long Term Thermal Imaging Dataset for Studying Concept Drift

Published: 11 Oct 2021, Last Modified: 23 May 2023
NeurIPS 2021 Datasets and Benchmarks Track (Round 2)
Keywords: long-term dataset, large dataset, thermal imaging, concept drift, anomaly detection, object detection
TL;DR: A long-term thermal dataset for studying how the performance of surveillance models changes under concept drift caused by environmental factors such as weather, the day/night cycle, seasonality, and scene activity.
Abstract: The time dimension of datasets and the long-term performance of machine learning models have received little attention. With extended deployments in the wild, models are bound to encounter novel scenarios and concept drift that cannot be accounted for during development and training. For long-term patterns and cycles to appear in a dataset, the dataset must cover a long period of time. Since this is rarely the case, it is difficult to explore how computer vision algorithms cope with changes in data distribution that occur across long-term cycles such as seasons. Video surveillance is an application area clearly affected by concept drift. For this reason, we publish the Long-term Thermal Drift (LTD) dataset. LTD consists of thermal surveillance imaging from a single location across 8 months. Along with the thermal images, we provide relevant metadata such as weather, the day/night cycle, and scene activity. In this paper, we use the metadata for an in-depth analysis of the causal and correlational relationships between environmental variables and the performance of selected computer vision algorithms for anomaly and object detection. Long-term performance is shown to be most strongly correlated with temperature, humidity, the day/night cycle, and scene activity level. This suggests that coverage of these variables should be prioritised when building datasets for similar applications. As a baseline, we propose to mitigate the impact of concept drift by first detecting the points in time where drift occurs. At these points, we collect additional data that is used to retrain the models. This improves subsequent performance by an average of 25% across all tested algorithms.
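The baseline above has two steps: detect the points in time where drift occurs, then collect new data and retrain at those points. The abstract does not name the drift detector, so the following is only a minimal sketch assuming the classic Page-Hinkley test applied to a stream of per-window error scores (higher means worse). `PageHinkley`, `find_drift_points`, and the `delta`/`threshold` values are hypothetical illustrations, not the authors' implementation.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    delta:     slack, i.e. the size of change that is tolerated (assumed value).
    threshold: alarm level (often written lambda) on the cumulative deviation.
    """

    def __init__(self, delta: float = 0.005, threshold: float = 1.0) -> None:
        self.delta = delta
        self.threshold = threshold
        self.reset()

    def reset(self) -> None:
        """Forget all history, e.g. after the model has been retrained."""
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # m_t: cumulative deviation above the running mean
        self.min_cum = 0.0  # M_t: smallest m_t seen since the last reset

    def update(self, x: float) -> bool:
        """Add one observation; return True if drift is signalled."""
        self.n += 1
        # Incremental running mean of all observations since the last reset.
        self.mean += (x - self.mean) / self.n
        # Accumulate how far x lies above the mean, minus the slack delta.
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        # Alarm when m_t has risen more than `threshold` above its minimum.
        return self.cum - self.min_cum > self.threshold


def find_drift_points(errors, delta: float = 0.005, threshold: float = 1.0) -> list[int]:
    """Return the indices at which the detector raises a drift alarm."""
    detector = PageHinkley(delta=delta, threshold=threshold)
    points = []
    for t, e in enumerate(errors):
        if detector.update(e):
            points.append(t)
            # Hypothetical hook: here new data would be collected and the
            # model retrained; the detector then starts monitoring afresh.
            detector.reset()
    return points


# A stable error level of 0.1 that jumps to 0.5 at index 30.
print(find_drift_points([0.1] * 30 + [0.5] * 30))  # → [32]
```

In this sketch the alarm fires a few steps after the jump at index 30, because the cumulative deviation needs time to exceed `threshold`. A smaller threshold reacts faster but triggers more false alarms, and therefore more retraining. The error signal could be, for example, an anomaly detector's mean reconstruction error or one minus an object detector's mAP per time window. That choice is also an assumption, since the abstract does not specify it.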
URL: https://www.kaggle.com/ivannikolov/longterm-thermal-drift-dataset
Supplementary Material: zip
Contribution Process Agreement: Yes
License: Creative Commons Attribution 4.0 International (CC BY 4.0). Please cite the paper associated with this dataset.
Author Statement: Yes