KAN-Semi: A Semi-Supervised Approach Combining Self-Supervised Pre-training, Hierarchical Priors, and Kolmogorov-Arnold Networks for Landmark-based Biometry Estimation

ICLR 2026 Conference Submission18566 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Semi-Supervised Learning, Self-Supervised Learning, Medical Image Analysis, Landmark Detection, Kolmogorov-Arnold Networks (KAN)

TL;DR: We propose KAN-Semi, a semi-supervised framework that synergizes self-supervised pre-training, hierarchical architectural priors, and KAN-enhanced heads to achieve robust landmark localization in medical ultrasound.

Abstract: Ultrasound (US)-based biometric estimation is crucial for monitoring labor progression and diagnosing fetal and maternal abnormalities. Reliable biometry estimation relies heavily on accurate landmark localization on standard planes, a process traditionally performed by sonographers. However, manual measurement is time-consuming, operator-dependent, and prone to variability. Although automated segmentation methods based on fully supervised models show promise, they often suffer from multi-stage error accumulation and a scarcity of expert-annotated data. To address these challenges, we introduce KAN-Semi, a semi-supervised network that combines self-supervised pre-training, hierarchical priors, and Kolmogorov-Arnold Networks (KANs). First, we use in-domain self-supervised pre-training with a Masked Autoencoder (MAE) to learn robust, domain-adapted representations for a novel CNN-ViT hybrid backbone. Next, we propose a Hierarchical Guidance Decoder, which encodes symbolic medical priors to regularize the model’s reasoning, progressively guiding it from stable to variable structures. Finally, we explore KAN-enhanced heads as an alternative to conventional predictors, demonstrating their efficacy in complex spatial regression tasks. We perform extensive experiments on three intrapartum ultrasound datasets collected from 24 medical centers and institutions, showing that our approach significantly outperforms fully supervised models in landmark detection. Our work offers a structured framework for designing effective learning systems that integrate self-supervision, knowledge-based architectural design, and emerging network paradigms.
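
The MAE pre-training step mentioned in the abstract rests on hiding a random subset of image patches and training the network to reconstruct them. The sketch below illustrates only that generic masking step in plain Python. It is not the authors' implementation: the function names `patchify` and `random_patch_mask`, the 0.75 mask ratio, and the toy 8x8 image are all assumptions made for illustration, and the abstract does not state the patch size or ratio actually used.

```python
import random


def patchify(image, patch):
    """Cut a 2D list (H x W) into non-overlapping patch x patch blocks, row-major."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be divisible by the patch size")
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            patches.append([row[c:c + patch] for row in image[r:r + patch]])
    return patches


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) lists, MAE-style.

    mask_ratio=0.75 is an assumed default, not a value taken from the paper.
    """
    if num_patches < 1:
        raise ValueError("num_patches must be at least 1")
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # uniform random permutation of patch indices
    num_keep = max(1, int(round(num_patches * (1.0 - mask_ratio))))
    visible = sorted(order[:num_keep])   # patches the encoder would see
    masked = sorted(order[num_keep:])    # patches the decoder would reconstruct
    return visible, masked


if __name__ == "__main__":
    # Toy 8x8 "image" whose pixel value encodes its position.
    image = [[r * 8 + c for c in range(8)] for r in range(8)]
    patches = patchify(image, 4)
    visible, masked = random_patch_mask(len(patches), mask_ratio=0.75, seed=0)
    print("visible patch indices:", visible)
    print("masked patch indices:", masked)
```

In a full pipeline the encoder would receive only the visible patches and the reconstruction loss would be computed on the masked ones; both parts are omitted here.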

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 18566
