Enhancing Training Set Through Multi-Temporal Attention Analysis in Transformers for Multi-Year Land Cover Mapping
Abstract: The continuous stream of high spatial resolution satellite data offers the opportunity to regularly produce land cover (LC) maps. To this end, Transformer deep learning (DL) models have recently proven their effectiveness in accurately classifying long time series (TS) of satellite images. The continual generation of regularly updated LC maps can be used to analyze dynamic phenomena and extract multi-temporal information. However, several challenges need to be addressed. Our paper aims to study how the performance of a Transformer model changes when it classifies TS of satellite images acquired in years after those covered by the training set. In particular, the behavior of the attention mechanism in the Transformer model is analyzed to determine when the information provided by the initial training set needs to be updated in order to keep generating accurate LC products. Preliminary results show that: (i) the positional encoding strategy used in the Transformer has a significant impact on the classification accuracy obtained with multi-year TS, and (ii) the most affected classes are the seasonal ones.
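The abstract highlights that the choice of positional encoding strategy strongly affects accuracy on multi-year time series. As a minimal sketch (not the paper's actual model; all names and acquisition dates are illustrative assumptions), the snippet below contrasts an order-based sinusoidal encoding, which is identical for every year regardless of acquisition dates, with a calendar-based one keyed on the day of year, which keeps seasonally similar acquisitions close in encoding space across years:

```python
import numpy as np

def sinusoidal_pe(positions, d_model=16):
    """Standard sinusoidal positional encoding, evaluated at
    arbitrary (possibly non-integer) positions."""
    positions = np.asarray(positions, dtype=float)[:, None]   # (T, 1)
    i = np.arange(d_model // 2)[None, :]                      # (1, d/2)
    angles = positions / (10000.0 ** (2 * i / d_model))       # (T, d/2)
    pe = np.empty((positions.shape[0], d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Hypothetical acquisition days of year for two different years.
days_year1 = [10, 45, 100, 170, 240, 300]
days_year2 = [12, 50, 98, 172, 238, 305]

# (a) Order-based: position = index in the sequence.
# Produces the same encoding every year, so it carries no
# information about when in the season each image was acquired.
pe_order = sinusoidal_pe(range(len(days_year2)))

# (b) Calendar-based: position = day of year of the acquisition.
# Encodings for the two years differ only as much as the
# acquisition calendars differ, aligning seasonal patterns.
pe_doy_y1 = sinusoidal_pe(days_year1)
pe_doy_y2 = sinusoidal_pe(days_year2)
```

The design trade-off this illustrates: an order-based encoding implicitly assumes the same acquisition calendar in every year, whereas a calendar-based encoding ties each token to its seasonal position, which matters most for the seasonal classes the abstract identifies as the most affected.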