Pyramid multi-loss vision transformer for thyroid cancer classification using cytological smear

Published: 01 Jan 2023, Last Modified: 22 May 2024Knowl. Based Syst. 2023EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Multi-instance learning, a commonly used technique in artificial intelligence for analyzing slides, can be applied to diagnose thyroid cancer based on cytological smears. Since smears do not have multidimensional histological features similar to histopathology, mining potential contextual information and diversity of features is crucial for better classification performance. In this paper, we propose a pyramid multi-loss vision transformer model called PyMLViT, a novel algorithm with two core modules to address these issues. Specifically, we design a pyramid token extraction module to acquire potential contextual information on smears. The pyramid token structure extracts multi-scale local features, and the vision transformer structure further obtains global information through the self-attention mechanism. Furthermore, we construct multi-loss fusion module based on the conventional multi-instance learning framework. With carefully designed bag and patch weight allocation strategies, we incorporate slide-level annotations as pseudo-labels for patches to participate in training, thus enhancing the diversity of supervised information. Extensive experimental results on the real-world dataset show that PyMLViT has a high performance and a competitive number of parameters compared to popular methods for diagnosing thyroid cancer in cytological smears.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview