Vision Transformer-Based Bark Image Recognition for Tree Identification

Towa Yamabe, Takeshi Saitoh

Published: 2022, Last Modified: 18 Nov 2024, Venue: IVCNZ 2022, License: CC BY-SA 4.0

Abstract: Our group studies tree species recognition using image processing technology. In previous research, we proposed an image-based bark recognition method using a CNN. In this paper, we propose a method for recognizing bark images using the Vision Transformer (ViT), which has attracted attention in image recognition tasks in recent years. The evaluation experiments used four public datasets (NewBarkTex, TRUNK12, BarkNet1.0, and Bark-101) and KyutechBark150, a new dataset of 150 tree species that we collected ourselves. Several CNN models were used as comparison methods. In the recognition experiments, ViT achieved the highest recognition accuracy on all datasets. In addition, we visualized the trained model using t-SNE and attention maps, and we show that ViT is effective for bark image recognition.
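For readers unfamiliar with ViT, the following is a minimal sketch of its first step: splitting an image into non-overlapping patches that are flattened into a token sequence. It illustrates the general ViT idea only, not the authors' implementation. The function name `patchify`, the 224 x 224 input size, and the 16 x 16 patch size are assumptions chosen for illustration; the abstract does not state the configuration used.

```python
def patchify(image, patch_size):
    """Split an H x W x C nested-list image into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    # Walk the patch grid row by row, left to right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    patch.extend(image[row][col])  # append all channel values
            patches.append(patch)
    return patches


# Hypothetical 224 x 224 RGB input with 16 x 16 patches (assumed sizes).
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
tokens = patchify(image, 16)
print(len(tokens), len(tokens[0]))  # 196 768
```

In a full ViT, each flattened patch would then be linearly projected, combined with a position embedding, and passed through Transformer encoder layers; those stages are omitted here.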