SFA-KAN: Spatial-Frequency Aggregation Kolmogorov-Arnold Network for OCT Segmentation

Genghui Wu; Fengtao Nan; Meili Wang

05 Sept 2025 (modified: 24 Dec 2025) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0

Keywords: OCT Image, Kolmogorov-Arnold Network, Dual-Domain

TL;DR: Our work proposes SFA-KAN to address poor robustness in OCT segmentation via its SFA module (S2KA for spatial, S2FT for frequency features, cross-attention aggregation), achieving SOTA on two private OCT datasets.

Abstract: Current medical image segmentation methods exhibit significantly limited robustness on optical coherence tomography (OCT) images, primarily because of incomplete representation of organ structures and illumination heterogeneity during image acquisition. To address this, we propose the Spatial-Frequency Aggregation Kolmogorov-Arnold Network (SFA-KAN), an efficient approach for extracting both the complete structure and the fine-grained details of OCT images. Specifically, our method introduces the Spatial-Frequency Aggregation (SFA) module, which operates in the latent space of a convolutional encoder-decoder architecture and hierarchically aggregates features from both the spatial and frequency domains. For spatial-domain feature extraction, we propose the Spatial-Shift KAN (S2KA) block, which combines channel-mixing KAN linear layers along the width and height directions with spatial-shift operations. This design facilitates patch-wise communication and captures long-distance, multi-directional dependencies across the entire image within a single computational pass. For frequency-domain feature extraction, we introduce the Spatial-Shift Frequency Transform (S2FT) block, which applies the same spatial operations as the S2KA block followed by a multi-scale fast Fourier transform to isolate clinically relevant frequency components, enhancing the segmentation of anatomically diverse structures. The features from the two domains are then concatenated channel-wise and aggregated via cross attention, enabling the model to reconstruct high-frequency details while preserving global structural integrity. Experiments on two privately collected OCT image datasets, evaluated with pixel-based and clinical metrics, demonstrate that SFA-KAN achieves state-of-the-art performance for OCT image segmentation.
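The two feature paths described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the shift pattern or the frequency bands, so the four-direction one-pixel shift, the circular frequency mask, and the names `spatial_shift` and `frequency_split` are all assumptions made here for illustration. The KAN layers and the cross-attention aggregation are omitted.

```python
import numpy as np

def spatial_shift(x):
    """Assumed shift pattern: split channels into four groups and move each
    group by one pixel in a different direction. x has shape (H, W, C),
    with C divisible by 4. Border rows/columns keep their original values."""
    g = x.shape[2] // 4
    out = x.copy()
    out[1:, :, 0 * g:1 * g] = x[:-1, :, 0 * g:1 * g]   # shift down
    out[:-1, :, 1 * g:2 * g] = x[1:, :, 1 * g:2 * g]   # shift up
    out[:, 1:, 2 * g:3 * g] = x[:, :-1, 2 * g:3 * g]   # shift right
    out[:, :-1, 3 * g:4 * g] = x[:, 1:, 3 * g:4 * g]   # shift left
    return out

def frequency_split(x, radius):
    """Assumed band selection: keep frequencies inside a circle of the given
    radius (in the centered 2-D spectrum) as the low band; the residual is
    the high band, so low + high reconstructs x."""
    h, w, _ = x.shape
    spec = np.fft.fftshift(np.fft.fft2(x, axes=(0, 1)), axes=(0, 1))
    yy, xx = np.ogrid[:h, :w]
    mask = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2)[..., None]
    low = np.fft.ifft2(np.fft.ifftshift(spec * mask, axes=(0, 1)), axes=(0, 1)).real
    return low, x - low

# Toy latent feature map standing in for encoder output (hypothetical sizes).
feat = np.random.default_rng(0).normal(size=(8, 8, 4))
shifted = spatial_shift(feat)
# "Multi-scale" is approximated here by repeating the split at several radii.
bands = [frequency_split(shifted, r) for r in (1, 2, 4)]
```

In this sketch the shift lets each position see its four neighbours once the channels are mixed, and the band split separates coarse structure (low) from fine detail (high), which mirrors the spatial and frequency roles the abstract assigns to S2KA and S2FT.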

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 2244
