Secure LLM-Assisted Labeling and Spatiotemporal CMR Representation for Sequence and View Recognition

Yixuan Liu; Zhenyu Bu; Yi Yu; Parker Martin; Yuchi Han; Orlando Simonetti; Yuan Xue

04 Dec 2025 (modified: 14 Feb 2026). Submitted to MIDL 2026. License: CC BY 4.0

Keywords: cardiovascular magnetic resonance, sequence classification, view classification, spatiotemporal representation learning, large language models

TL;DR: We propose clinically guided LLM prompting that turns messy CMR series descriptions into reliable pseudo labels, then train a spatiotemporal ConvNeXt plus xLSTM model that outperforms strong baselines on CMR sequence and view recognition.

Abstract: Cardiovascular magnetic resonance (CMR) studies combine diverse pulse sequences and imaging planes, which is clinically valuable but makes large-scale data curation and automated analysis difficult. In routine practice, series descriptions in DICOM headers are heterogeneous across technologists, scanners, vendors, and time, so manual sequence and view labeling does not scale beyond small cohorts. We develop a secure labeling pipeline in which a domain-knowledge-guided prompt with explicit, CMR-protocol-based mapping rules drives a locally deployed GPT-OSS large language model (LLM). From raw series descriptions, the pipeline generates standardized pseudo labels for sequence type and cardiac view for approximately 76,000 CMR series from 1,000 patients entirely offline, preserving data security while capturing local naming conventions. These labels are used to train a spatiotemporal CMR encoder that combines a ConvNeXt image backbone with an xLSTM temporal module and maps heterogeneous series into a compact, low-dimensional embedding for multi-class sequence and view classification. On an expert-annotated test set, the domain-knowledge-guided prompt reduces the number of unknown labels by two orders of magnitude and improves sequence and view label accuracy compared with a generic prompt. Models trained on these optimized pseudo labels achieve sequence and view classification accuracies of 0.983 and 0.989, respectively, outperforming existing 2D and Vision Transformer baselines. The proposed framework shows that clinically informed prompting and explicit spatiotemporal modeling together enable secure CMR curation and accurate sequence and view recognition at scale.
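The labeling step described in the abstract can be illustrated with a minimal sketch. It shows one plausible way to embed mapping rules in a prompt and to validate the model's reply against a closed label set, so that anything unparseable falls back to "unknown". The label vocabularies, the rule texts, the JSON reply format, and the names build_prompt and parse_reply are all assumptions made for illustration; the abstract does not specify them, and the call to the locally deployed model is deliberately left out.

```python
import json

# Hypothetical label sets; the abstract does not list the actual classes.
SEQUENCES = ("cine", "lge", "t1_map", "t2_map", "perfusion", "flow")
VIEWS = ("sax", "2ch", "3ch", "4ch")
UNKNOWN = "unknown"

# Hypothetical mapping rules, standing in for the protocol-based rules
# that the abstract mentions but does not spell out.
RULES = (
    "'SAX' or 'short axis' indicates view 'sax'.",
    "'4CH' or 'HLA' indicates view '4ch'.",
    "'LGE', 'PSIR' or 'delayed enhancement' indicates sequence 'lge'.",
)


def build_prompt(series_description: str) -> str:
    """Compose a rule-guided prompt for a single DICOM series description."""
    rules = "\n".join(f"- {rule}" for rule in RULES)
    return (
        "You label cardiac MR series.\n"
        f"Allowed sequences: {', '.join(SEQUENCES)}, {UNKNOWN}\n"
        f"Allowed views: {', '.join(VIEWS)}, {UNKNOWN}\n"
        f"Mapping rules:\n{rules}\n"
        f"Series description: {series_description}\n"
        'Reply with JSON only: {"sequence": "...", "view": "..."}'
    )


def parse_reply(reply: str) -> dict:
    """Validate a model reply; out-of-vocabulary values become 'unknown'."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    seq = str(data.get("sequence", "")).strip().lower()
    view = str(data.get("view", "")).strip().lower()
    return {
        "sequence": seq if seq in SEQUENCES else UNKNOWN,
        "view": view if view in VIEWS else UNKNOWN,
    }
```

In such a setup, the string returned by build_prompt would be sent to the offline model and the raw reply passed through parse_reply. Restricting the output to a fixed vocabulary makes the share of "unknown" labels directly countable, which is the quantity the abstract compares between the generic and the domain-guided prompt.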

Primary Subject Area: Learning with Noisy Labels and Limited Data

Secondary Subject Area: Application: Cardiology

Registration Requirement: Yes

Visa & Travel: Yes

Read CFP & Author Instructions: Yes

Originality Policy: Yes

Single-blind & Not Under Review Elsewhere: Yes

LLM Policy: Yes

Submission Number: 361
