
\begin{figure*}[htbp]
    \centering
\includegraphics[width=0.6\textwidth]{Figure_pdf/Figure1_proposal.pdf}
    
    \caption{\textbf{Comparison between traditional ST sampling and our active sampling.} 
\textit{Left:} Traditional ST methods rely on fixed-grid sampling regardless of biological importance, leading to redundant measurements in similar regions and inefficient use of sequencing budgets. \textit{Right:} Our proposed approach actively selects informative spots by incorporating single-cell prior knowledge, reducing redundancy while preserving biologically diverse regions.}

    \label{fig:proposal}
\end{figure*}

\section{Introduction}
Spatial transcriptomics (ST) provides a new perspective for studying the relationship between pathological tissue structures and their spatial gene expression patterns~\cite{burgess2019spatial, asp2019spatiotemporal,asp2020spatially}. However, acquiring ST data remains relatively expensive~\cite{choe2023advances}, which poses challenges for large-scale data collection in practice~\cite{he2020integrating, zhu2025asign}.

Histology features exhibit strong correlations with gene expression patterns~\cite{badea2020identifying}, providing a foundation for image-based gene expression prediction~\cite{he2020integrating, zhu2025computer}. Deep learning methods have begun leveraging histology images to infer ST expression profiles for each tissue slide~\cite{xie2024spatially, yang2023exemplar, zhu2025img2st, zhu2025magnet}. Representative approaches include regression-based methods such as ST-Net~\cite{he2020integrating}, HisToGene~\cite{pang2021leveraging}, and EGN~\cite{yang2023exemplar}, which directly predict expression values from local image appearance, and retrieval-based vision–omics contrastive learning methods such as BLEEP~\cite{xie2024spatially} and mlxSTExp~\cite{min2024multimodal}.


However, traditional fixed-grid sampling inevitably acquires many spatially adjacent regions with highly similar morphology, leading to substantial molecular redundancy and reduced biological diversity. Its non-selective nature also results in the inclusion of biologically uninformative areas~\cite{schroeder2025scaling,grases2025practical}. Consequently, the effective information density of the dataset is low, and the resulting mismatch between sequencing cost and informative yield constrains the performance and scalability of image-based ST prediction methods.

The limited availability of ST data motivates the integration of external biological knowledge to compensate for inherent constraints in coverage and data quality. In particular, the single-cell sequencing field provides substantially richer priors, supported by large-scale datasets~\cite{regev2017human} and powerful foundation models~\cite{cui2024scgpt}, with sample sizes typically exceeding those of ST datasets by more than an order of magnitude~\cite{svensson2018exponential}. Single-cell profiles resolve cell types, states, and regulatory programs~\cite{cao2019single,stuart2019comprehensive}, offering mechanistic insight into gene expression variation across tissues. Incorporating such fine-grained priors into ST analysis introduces valuable structural guidance and biological constraints, helping to mitigate challenges related to limited sampling.

To achieve this, we introduce SCR$^2$-ST, a unified framework that leverages single-cell prior knowledge to guide both efficient data acquisition and expression prediction. Our framework comprises two components. First, we develop a single-cell guided reinforcement learning-based (SCRL) active sampling strategy that jointly leverages single-cell priors and spatial tissue cues to construct a biological reward function. This reward enables the policy network to adaptively prioritize informative regions while avoiding redundant measurements, maximizing the utility of each sequenced spot under constrained budgets. Second, we propose SCR$^2$Net, a hybrid regression-retrieval prediction network that integrates regression modeling with retrieval-augmented inference. The retrieval branch aggregates signals from morphologically similar spots, while a majority cell-type filtering mechanism suppresses unreliable matches, balancing global structural learning with context-aware expression transfer. Our contributions are fourfold:


\begin{itemize}
    \item We introduce SCR$^2$-ST, a pioneering and generalizable framework that leverages single-cell priors as an auxiliary knowledge source to overcome the scarcity of ST data. It jointly enables efficient data acquisition and accurate prediction of ST expression profiles.
    
    \item Within this framework, we propose a single-cell guided reinforcement learning-based (SCRL) active sampling strategy that prioritizes informative regions under constrained sequencing budgets through biologically grounded reward signals.
    
    \item Building upon this, we develop SCR$^2$Net, a hybrid prediction network that integrates direct regression with retrieved soft-label supervision, together with a majority cell-type filtering module that suppresses unreliable matches in heterogeneous tissues.

    \item We provide a systematic benchmark across three public ST datasets under varied sampling budgets, with code and tools released to support reproducible research.

\end{itemize}
