Abstract: Our group studies tree species recognition using image processing. In previous research, we proposed an image-based bark recognition method using a convolutional neural network (CNN). In this paper, we propose a method for recognizing bark images using the Vision Transformer (ViT), which has attracted attention in image recognition tasks in recent years. The evaluation experiments used four public datasets (NewBarkTex, TRUNK12, BarkNet1.0, and Bark-101) and KyutechBark150, a new dataset of 150 tree species that we collected ourselves. Several CNN models served as comparison methods. In the recognition experiments, ViT achieved the highest recognition accuracy on all datasets. In addition, we visualized the trained model using t-SNE and attention maps. These results show that ViT is effective for bark image recognition.