\documentclass{midl} 

\usepackage{mwe} 
\usepackage{booktabs} 
\jmlryear{2026}
\jmlrworkshop{Full Paper -- MIDL 2026}
\jmlrvolume{-- nnn}
\editors{Accepted for publication at MIDL 2026}

\title[Test-Time Scaling in Clinical Decision Making]{Test-Time Scaling in Clinical Decision Making}


\midlauthor{\Name{Ji Young Byun\nametag{$^{1,2}$}} \Email{jbyun13@jhu.edu}\\
\Name{Young-Jin Park\nametag{$^{3}$}} \Email{youngp@mit.edu}\\
\Name{Navid Azizan\nametag{$^{3}$}} \Email{azizan@mit.edu}\\
\Name{Rama Chellappa\nametag{$^{1,2}$}} \Email{rchella4@jhu.edu}\\
\addr $^{1}$ Johns Hopkins University, Baltimore, MD 21218 \\
\addr $^{2}$ Johns Hopkins University, School of Medicine, Baltimore, MD 21218 \\
\addr $^{3}$ Massachusetts Institute of Technology, Cambridge, MA 02139
}


\usepackage{graphicx,verbatim}
\usepackage{makecell}
\usepackage{multicol}
\usepackage{multirow}
\usepackage{siunitx}
\usepackage{enumitem}

\newcommand{\gain}[1]{\textcolor{teal!60!black}{\scriptsize\,(↑\,#1)}}
\newcommand{\loss}[1]{\textcolor{red!70!black}{\scriptsize\,(↓\,#1)}}
\newcommand{\greedy}{Single}
\newcommand{\scaling}{\textbf{+TTS}}

\newtheorem{prop}{Proposition}

\newcommand{\MV}{\hat{y}_{\mathrm{MV}}}


% --- COLOR DEFINITIONS ---
\usepackage[most]{tcolorbox}

% % --- PAGE GEOMETRY ---
% \geometry{a4paper, margin=1in}
% \hypersetup{
%     colorlinks=true,
%     linkcolor=blue,
%     filecolor=magenta,      
%     urlcolor=cyan,
% }

\definecolor{systemcolor}{HTML}{E8F5E9}     
\definecolor{usercolor}{HTML}{E3F2FD}       
\definecolor{assistantcolor}{HTML}{FFF3E0}  
\definecolor{tokencolor}{HTML}{616161}      

% --- CUSTOM TCOLORBOX ENVIRONMENTS ---
\newtcolorbox{systembox}{
    colback=systemcolor,
    colframe=systemcolor!70!black,
    fonttitle=\bfseries,
    coltitle=black,
    arc=4mm,
    boxrule=0.5pt,
    title=System Prompt,
}

\newtcolorbox{userbox}{
    colback=usercolor,
    colframe=usercolor!70!black,
    fonttitle=\bfseries,
    coltitle=black,
    arc=4mm,
    boxrule=0.5pt,
    title=User Input,
}

\newtcolorbox{assistantbox}{
    colback=assistantcolor,
    colframe=assistantcolor!70!black,
    fonttitle=\bfseries,
    coltitle=black,
    arc=4mm,
    boxrule=0.5pt,
    title=Assistant Response (Prefix),
}

% --- CUSTOM COMMANDS ---
\newcommand{\token}[1]{{\color{tokencolor}\texttt{#1}}}

\input{notation}

\begin{document}

\maketitle

\begin{abstract}
    Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning and knowledge-intensive tasks, yet their potential for clinical decision making through test-time scaling (TTS) remains largely unexplored. While TTS has shown promise in improving reasoning performance by leveraging additional inference-time computation, its effectiveness in the medical domain has not been systematically investigated. 
    This gap is further exacerbated by the impracticality of supervised fine-tuning for clinical reasoning tasks, owing to limited data availability and high annotation costs.
    %%
    In this work, we present a comprehensive study of TTS for clinical decision making. 
    %%
    We systematically investigate the interaction between TTS and inference strategies, including direct answering, chain-of-thought prompting, and two-stage reasoning. We generate multiple candidate outputs in parallel using large reasoning models and aggregate them via self-consistency decoding.
    %%
    This approach requires no supervision; instead, it leverages additional inference-time computation to improve performance.
    %
    We evaluate this approach on both text-based medical question answering benchmarks and medical imaging modalities, demonstrating consistent improvements over single-pass inference baselines, with performance gains of up to 30 percentage points.
    %
    %
    Finally, we provide an analytical characterization of TTS, deriving scaling laws that describe how performance improves with the number of samples and identifying conditions under which TTS yields reliable gains, along with empirical validation on diverse medical decision-making tasks.

\end{abstract}

\begin{keywords}
Medical Image Diagnosis, Large Reasoning Model, Vision Language Model, Test-Time Scaling
\end{keywords}

\input{sec/1_intro}
\input{sec/2_related}
\input{sec/3_problem}
\input{sec/4_proposed}
\input{sec/5_results}
\input{sec/6_conclusion}
\midlacknowledgments{Ji Young Byun was supported in part by a discretionary fund at the Johns Hopkins Whiting School of Engineering. Young-Jin Park and Navid Azizan acknowledge support from the MIT-Amazon Science Hub, the MIT-IBM Watson AI Lab, Jane Street, and MathWorks.}

\bibliography{midl26_175}

\newpage
\appendix
\input{sec/7_appendix}


\end{document}
