Abstract: Convolutional neural networks (CNNs) have recently shown significant performance improvements over traditional machine learning and now dominate classification tasks in computer vision. Vision transformers, which use a self-attention mechanism to model global contextual information better than CNNs, have also attracted rapid interest in the vision community. In this paper, an attention-aided CNN and transformer model is proposed to identify pest and disease infestations in 14 crop species with 26 diseases (or absence thereof), using a public dataset of 54,309 images collected under controlled conditions. The proposed model achieves a training accuracy of 94.24% and a test accuracy of 94.74% with a standard deviation of 3.66 under fivefold cross-validation, outperforming both the convolutional block attention module (CBAM) and the Cross-Vision transformer. On the same dataset, CBAM reaches a training accuracy of \((90.36 \pm 3.91)\%\) and a test accuracy of \((90.36 \pm 3.91)\%\), while the Cross-Vision transformer reaches a training accuracy of \((91.55 \pm 3.67)\%\) and a test accuracy of \((91.89 \pm 3.75)\%\).