XMNet: XGBoost with Multitasking Network for Classification and Segmentation of Ultra-Fine-Grained Datasets
Abstract: Classification and segmentation of ultra-fine-grained datasets are challenging because of the subtle differences between adjacent classes. The problem is exacerbated by the fact that intra-class variation can be much larger than inter-class variation. Some approaches resort to attention mechanisms that focus on the source or the properties of the features responsible for these minor differences between or within classes. The attention mechanism can be derived from spatial, temporal, modal, or other types of features in the dataset, or it can be drawn from external sources such as the object's shape, skeleton, or contour. Finally, some approaches use completely independently extracted information to guide the attention mechanism in a supervised fashion (privileged information, guided attention, etc.). In this paper, we claim that, for ultra-fine-grained datasets with few samples, a simple attention mechanism can improve classification results. Moreover, the same simple attention mechanism can be employed in a backbone topology to segment the very information that other methods use to guide their attention. In other words, unlike the state-of-the-art model for ultra-fine-grained classification of, for example, plant-leaf datasets, which uses segmentation masks to guide its attention mechanism, our proposed network simultaneously provides a classification label and a segmentation mask. The XGBoost algorithm was applied to the attention-modulated feature map for classification, and the Optuna hyperparameter optimization framework was used to tune XGBoost. Three state-of-the-art methods were compared against ours on three benchmark datasets, and our model, XMNet, achieved the best results on the vein segmentation task. For classification, our network achieved performance comparable to two state-of-the-art methods as well as to several more traditional ones.