Snapshot ensemble-based residual network (SnapEnsemResNet) for remote sensing image scene classification
Abstract: Owing to their exceptional discriminative ability, convolutional neural networks (CNNs) have been the focus of the research community for scene classification in remote sensing imagery (RSI). The scarcity of large-scale remote sensing scene classification datasets has prevented researchers from realizing the full potential of deep CNN models such as ResNets. Because deeper networks tend to overfit limited training data, effective techniques to counter overfitting, together with the ability to address the inter-class similarity and intra-class diversity challenges in RSI, are necessary. This research therefore proposes a snapshot ensemble-based residual network (SnapEnsemResNet) consisting of two sub-networks (FC-1024 and Dilated-Conv) designed to realize the full potential of ResNets. The FC-1024 architecture targets overfitting by adding an extra fully connected layer to the existing ResNet architecture, enabling effective application of regularization techniques and improving the generalization ability of the network. The Dilated-Conv architecture focuses on extracting more descriptive features by introducing an additional dilated convolutional layer in the final convolution block, which helps minimize inter-class similarity. To further enhance individual sub-network performance, SnapEnsemResNet is integrated with a two-tier snapshot-based ensembling strategy, termed ensembling the ensembled snapshots. The final class label is predicted by majority voting. Comparing the classification performance of SnapEnsemResNet with state-of-the-art methods on the challenging NWPU-RESISC45 scene classification benchmark, we obtained competitive accuracy at a 20% training ratio and achieved a new top performance at a 10% training ratio.
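The two-tier voting scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each sub-network (e.g. FC-1024 and Dilated-Conv) contributes several snapshot models whose per-sample class predictions are first combined within each sub-network, and the resulting per-sub-network labels are then combined again by a second majority vote. The function names `majority_vote` and `two_tier_vote` are hypothetical.

```python
from collections import Counter

def majority_vote(snapshot_predictions):
    """Combine per-snapshot class labels by majority vote.

    snapshot_predictions[m][i] is the class label predicted by
    snapshot (or voter) m for sample i. Ties break in favour of the
    label encountered first, per Counter.most_common ordering.
    """
    n_samples = len(snapshot_predictions[0])
    final = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in snapshot_predictions)
        final.append(votes.most_common(1)[0][0])
    return final

def two_tier_vote(per_subnetwork_snapshots):
    """'Ensembling the ensembled snapshots' (sketch, assumed structure).

    Tier 1: vote among each sub-network's own snapshots.
    Tier 2: vote across the sub-networks' tier-1 results.
    """
    tier1 = [majority_vote(snaps) for snaps in per_subnetwork_snapshots]
    return majority_vote(tier1)
```

For example, with three snapshots predicting labels for three samples, `majority_vote([[0, 1, 2], [0, 1, 1], [0, 2, 2]])` returns `[0, 1, 2]`.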