Lung cancer is a major cause of cancer-related mortality worldwide~\cite{american_cancer}. Early detection of pulmonary nodules through low-dose computed tomography (CT) screening has been shown to reduce lung cancer mortality by approximately $20\%$ compared to chest radiography~\cite{LUNA16}, yet manual image assessment remains time-consuming and resource-intensive for radiologists~\cite{national2011reduced}. Deep learning-based methods have achieved strong performance in automatic pulmonary nodule detection~\cite{dutande2022deep}, but their training typically relies on large quantities of annotated data. Weakly supervised segmentation (WSS) offers a potential strategy to address this limitation by learning from weaker forms of supervision, such as image-level labels, points, or bounding boxes. However, deriving accurate segmentations from weak supervision alone remains highly challenging, particularly for small structures such as lung nodules. In this work, we investigate a plug-and-play method for weakly supervised lung nodule segmentation that combines a pretrained 3D rectified flow generative model with a weakly supervised target predictor using the training-free guidance (TFG) framework.

\paragraph{Related work.} Class activation mapping (CAM)-based methods~\cite{selvaraju2017grad,wang2020score,CAM_paper} are often used to obtain segmentation pseudo-labels by highlighting the regions that contribute most to a classification network's prediction, but they tend to emphasize only the most discriminative parts, leading to low-quality segmentation masks. Recent work has improved attribution-based pseudo-labels through explicit constraints and affinity-based refinement~\cite{wang2025weakmedsam,wargnier2023weakly}. Reconstruction-based anomaly detection is another common strategy, where variants of autoencoders~\cite{baur2021autoencoders}, generative adversarial networks~\cite{di2019survey,schlegl2019f,wolleb2020descargan}, and diffusion models~\cite{mousakhan2024anomaly,sanchez2022healthy,wyatt2022anoddpm,xing2023diff} are trained to reconstruct normal images, with anomalies inferred from reconstruction errors. Guided diffusion has been explored for reconstruction-based anomaly detection in 2D medical imaging~\cite{wolleb2022diffusion}; however, this approach requires training both a diffusion model and a noise-dependent classifier from scratch, so it cannot be transferred to a new imaging domain without retraining both components. In addition, inference relies on a large number of sampling steps for each 2D slice.
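As a minimal sketch of the reconstruction-based strategy (the symbols $\hat{\mathbf{x}}$, $A$, $M$, and $\tau$ are illustrative notation introduced here, not taken from the cited works), let $\hat{\mathbf{x}}$ denote the reconstruction of an input $\mathbf{x}$ produced by a model trained on normal images. A voxel-wise anomaly map and a binary pseudo-label can then be written as
\[
A_i = |x_i - \hat{x}_i|, \qquad M_i = \mathbf{1}[A_i > \tau],
\]
where $i$ indexes voxels and $\tau$ is an assumed threshold; individual methods differ in how the residual is computed and post-processed.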

As an alternative, rectified flow models~\cite{rectified-flow} provide a deterministic formulation that enables substantially faster generation while preserving high-quality results. Latent rectified flow models pretrained on CT volumes, such as MAISI-V2~\cite{zhao2025maisi}, have demonstrated high image quality across diverse anatomies and resolutions compared to diffusion-based counterparts~\cite{guo2025maisi,wang20253d,xu2024medsyn}. MAISI-V2 supports both unconditional and conditional 3D CT image generation by integrating ControlNet~\cite{zhang2023adding}. However, ControlNet introduces two important limitations: it relies on dense annotations, and extending the model to new conditioning signals requires additional retraining.
% over several weeks on high-end GPUs, making it expensive to train
The TFG framework makes it possible to guide an off-the-shelf generative model using a pretrained differentiable target predictor~\cite{bansal2023universal,patel2025flowchef,ye2024tfg,yu2023freedom}. Unlike classifier guidance~\cite{classifier_guidance}, where the predictor must be trained on noisy samples, the target predictor in TFG is trained only on clean samples. This opens the possibility of combining pretrained generative models and predictors, which is particularly appealing in medical imaging settings with limited annotations.
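As a rough sketch of how such guidance can be realized for a rectified flow model (under assumed notation and one common convention, not necessarily the exact formulation of the cited works), let $\mathbf{x}_0$ be Gaussian noise, $\mathbf{x}_1$ a clean sample, and $\mathbf{x}_t = (1-t)\,\mathbf{x}_0 + t\,\mathbf{x}_1$ for $t \in [0,1]$, with a pretrained velocity field $v_\theta(\mathbf{x}_t, t) \approx \mathbf{x}_1 - \mathbf{x}_0$. A clean-sample estimate is then available at any $t$, so a predictor $f_\phi$ trained only on clean data can steer sampling through the gradient of a loss $\ell$ toward a target $c$:
\[
\hat{\mathbf{x}}_1 = \mathbf{x}_t + (1-t)\, v_\theta(\mathbf{x}_t, t), \qquad
\mathbf{x}_t \leftarrow \mathbf{x}_t - \rho\, \nabla_{\mathbf{x}_t}\, \ell\bigl(f_\phi(\hat{\mathbf{x}}_1), c\bigr),
\]
where the step size $\rho$ is a hypothetical hyperparameter. For a latent model, the estimate would additionally be decoded to image space before being passed to $f_\phi$.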

\paragraph{Contribution.} We investigate a plug-and-play framework for WSS in CT volumes by combining a pretrained 3D rectified flow generative model with a weakly supervised predictor via the TFG framework. To our knowledge, this is the first application of TFG-guided rectified flow models to weakly supervised medical image segmentation, and our main contribution lies in adapting and evaluating these components for volumetric pseudo-label generation. Unlike prior counterfactual anomaly detection approaches, which require training the generative model and/or auxiliary noise-conditioned classifiers, our method operates directly on an off-the-shelf generative model and requires only a differentiable predictor trained on clean samples with image-level labels. This enables counterfactual generation without modifying the generative model, allowing scalable reuse of large pretrained rectified flow models. In contrast to previous diffusion-based methods, our method operates in 3D, preserving volumetric anatomical consistency. We demonstrate that the generated pseudo-labels achieve improved agreement with nodule size and contours compared to attribution-based pseudo-label generation methods, and that they can support downstream fully supervised segmentation training, narrowing the performance gap to models trained on manual voxel-wise annotations.


\iffalse
In this work, we perform weakly supervised lung nodule segmentation in CT images by combining pretrained state-of-the-art rectified flow models with weakly supervised predictor models for medical imaging, requiring minimal additional training. The predictor model can be obtained with weakly supervised fine-tuning using image-level labels only, yet the proposed method produces implicit segmentations that demonstrate improved agreement with nodule size and contours compared to conventional methods for extracting implicit segmentations from trained predictors. Furthermore, the method operates fully in 3D, avoiding structural inconsistencies that commonly arise in slice-wise 2D approaches.
\fi
% One main argument: how to extract implicit segmentations from already trained models?
%- Motivate the advantages of the generative method vs. the limitations/problems of CAM-based methods etc. (gives access to generated images with/without nodules etc.) \\
% - Highlight that MAISI is a very capable 3D FM model for 3D data that is already trained on large amounts of data.
% - Describe how MAISI conditions on full segmentation masks and requires training a ControlNet on these, etc.
% Highlight that we want to compare against methods that require no additional manual intervention beyond fine-tuning. E.g., training VAEs, GANs, various exotic CAM methods that require extra training etc., or changes to the model architecture. We want to compare with other methods that extract implicit segmentations from a network without manual intervention.
% We want to show how implicit segmentations can be extracted from already pretrained models.