Secure LLM-Assisted Labeling and Spatiotemporal CMR Representation for Sequence and View Recognition

04 Dec 2025 (modified: 15 Dec 2025) | MIDL 2026 Conference Submission | CC BY 4.0
Keywords: cardiovascular magnetic resonance, sequence classification, view classification, spatiotemporal representation learning, large language models
TL;DR: We propose clinically guided LLM prompting that turns messy CMR series descriptions into reliable pseudo labels, then train a spatiotemporal ConvNeXt + xLSTM model that outperforms strong baselines on CMR sequence and view recognition.
Abstract: Cardiovascular magnetic resonance (CMR) studies combine diverse pulse sequences and imaging planes, which is clinically valuable but makes large-scale data curation and automated analysis difficult. In routine practice, series descriptions in DICOM headers are heterogeneous across technologists, scanners, vendors, and time, so manual sequence and view labeling does not scale beyond small cohorts. We develop a secure labeling pipeline in which a domain-knowledge-guided prompt with explicit, CMR-protocol-based mapping rules drives a locally deployed GPT-OSS large language model (LLM). From raw series descriptions, the prompted model generates standardized pseudo labels for sequence type and cardiac view for approximately 76,000 CMR series from 1,000 patients entirely offline, preserving data security while capturing local naming conventions. These labels are used to train a spatiotemporal CMR encoder that combines a ConvNeXt image backbone with an xLSTM temporal module and maps heterogeneous series into a compact, low-dimensional embedding for multi-class sequence and view classification. On an expert-annotated test set, the domain-knowledge-guided prompt reduces the number of unknown labels by two orders of magnitude and improves sequence and view label accuracy compared with a generic prompt. Models trained on these optimized pseudo labels achieve sequence and view classification accuracies of 0.983 and 0.989, respectively, outperforming existing 2D and Vision Transformer baselines. The proposed framework shows that clinically informed prompting combined with explicit spatiotemporal modeling enables secure CMR curation and accurate sequence and view recognition at scale.
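The labeling step described above, a prompt that embeds explicit protocol-based mapping rules plus strict handling of the model's answer, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the label vocabularies, the rule texts, the JSON answer format, and the names `build_prompt` and `parse_response` are all hypothetical, and the call to the locally deployed GPT-OSS model is deliberately left out.

```python
import json

# Hypothetical label vocabularies for illustration only; the label sets
# actually used in the paper are not listed in the abstract.
SEQUENCE_LABELS = ["cine", "lge", "t1_mapping", "t2_mapping", "perfusion", "flow"]
VIEW_LABELS = ["sax", "2ch", "3ch", "4ch", "lvot"]
UNKNOWN = "unknown"

# Hypothetical protocol-based mapping rules, kept as plain text so they can
# be embedded verbatim in the prompt and adapted to local naming conventions.
MAPPING_RULES = [
    "SSFP, TrueFISP, FIESTA or bFFE with multiple phases -> sequence: cine",
    "PSIR, LGE or 'delayed enhancement' -> sequence: lge",
    "MOLLI or ShMOLLI -> sequence: t1_mapping",
    "Phase contrast, 'flow' or a VENC value -> sequence: flow",
    "SA, SAX or 'short axis' -> view: sax",
    "VLA (vertical long axis) or '2 chamber' -> view: 2ch",
    "HLA (horizontal long axis) or '4 chamber' -> view: 4ch",
]


def build_prompt(series_description: str) -> str:
    """Assemble a rule-guided prompt for one DICOM series description."""
    rules = "\n".join(f"- {rule}" for rule in MAPPING_RULES)
    return (
        "You label cardiovascular MR series from their DICOM series description.\n"
        f"Allowed sequence labels: {', '.join(SEQUENCE_LABELS)}, {UNKNOWN}\n"
        f"Allowed view labels: {', '.join(VIEW_LABELS)}, {UNKNOWN}\n"
        f"Apply these mapping rules:\n{rules}\n"
        'Answer with JSON only, e.g. {"sequence": "cine", "view": "sax"}\n'
        f"Series description: {series_description!r}"
    )


def parse_response(raw: str) -> tuple[str, str]:
    """Validate the model's JSON reply; anything off-vocabulary becomes 'unknown'."""
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError:
        return UNKNOWN, UNKNOWN
    if not isinstance(reply, dict):
        return UNKNOWN, UNKNOWN
    # Normalize case and whitespace before checking against the vocabularies.
    sequence = str(reply.get("sequence", "")).strip().lower()
    view = str(reply.get("view", "")).strip().lower()
    return (
        sequence if sequence in SEQUENCE_LABELS else UNKNOWN,
        view if view in VIEW_LABELS else UNKNOWN,
    )
```

In this sketch, each string returned by `build_prompt` would be sent to the local model, and its raw text reply would go through `parse_response`. Constraining answers to a closed vocabulary makes the "unknown" count directly measurable, which is the quantity the abstract reports the domain-knowledge-guided prompt reducing.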
Primary Subject Area: Learning with Noisy Labels and Limited Data
Secondary Subject Area: Application: Cardiology
Registration Requirement: Yes
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 361