Self-Accumulative Vision Transformer for Bone Age Assessment Using the Sauvegrain Method

Published: 2024, Last Modified: 21 Jan 2026. ECCV Workshops (19) 2024. License: CC BY-SA 4.0
Abstract: This study introduces a novel approach to bone age assessment (BAA) using a multi-view, multi-task classification model based on the Sauvegrain method, which assigns a maturity score to several landmarks in the elbow and predicts bone age from them. A straightforward way to automate the Sauvegrain method with deep neural networks is to train an independent classifier to score each region of interest, but this approach restricts the analysis to isolated anatomical details and increases computational cost. To address these challenges, we propose a self-accumulative vision transformer (SAT) designed to manage the anisotropic behaviors commonly encountered in multi-view, multi-task scenarios. The SAT enhances feature integration through two key strategies: token replay, which uses residual connections to maintain semantic representations of tokens from the same landmark, and regional attention bias, a modified self-attention mechanism that focuses on intra-region details. Extensive experiments show that the SAT not only captures the interconnections between landmarks effectively but also assimilates global morphological features, reducing the mean absolute error in BAA by 0.11 compared to prior methods. Furthermore, the proposed method uses four times fewer parameters than the ensemble of individual classifiers used in previous work. These improvements highlight the model's increased efficiency and accuracy, offering a valuable advancement for clinical applications of BAA. Code is available at https://github.com/hongchunchoi/SAT.
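The regional attention bias described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): it assumes a single attention head, identity query/key/value projections, and a fixed additive bias on attention logits between tokens that share a region. The names `regional_attention`, `region_ids`, and `bias` are hypothetical and chosen only for this example.

```python
import numpy as np

def regional_attention(x, region_ids, bias=1.0):
    """Single-head self-attention with an additive same-region bias.

    Illustrative assumptions: identity Q/K/V projections and a constant
    scalar bias; the paper's actual formulation may differ.
    """
    n, d = x.shape
    # Scaled dot-product attention logits between all token pairs.
    scores = x @ x.T / np.sqrt(d)
    # Boolean mask: True where two tokens belong to the same region.
    same_region = region_ids[:, None] == region_ids[None, :]
    # Raise the logits of intra-region pairs before the softmax.
    scores = scores + bias * same_region
    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights

# Six toy tokens: three from each of two hypothetical landmark regions.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))
region_ids = np.array([0, 0, 0, 1, 1, 1])

_, w_plain = regional_attention(tokens, region_ids, bias=0.0)
_, w_biased = regional_attention(tokens, region_ids, bias=2.0)
```

With a positive bias, every token places more of its attention mass on tokens from its own region than it does without the bias, while still attending to the other region. That is the intuition behind focusing on intra-region details without discarding cross-landmark context.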