\section{Introduction}

Spatial transcriptomics (ST) offers a new way to relate pathological tissue structures to their spatial gene expression patterns~\cite{burgess2019spatial, asp2019spatiotemporal, he2020integrating, zhu2024asign}, which in turn supports the development of effective treatment strategies~\cite{asp2020spatially}. Studies have demonstrated a strong correlation between features of pathological images and gene expression patterns~\cite{badea2020identifying}. These findings have motivated image-based methods for predicting gene expression, which offer a non-destructive and cost-effective alternative to traditional sequencing techniques.

In recent years, deep learning has been widely applied to medical image analysis~\cite{ke2023clusterseg,zhu2023anti,qu2025post}. By automating image interpretation, these methods have facilitated the integration of pathology images with other data modalities~\cite{deng2025casc, zhu2025cross}. Several studies have employed convolutional neural networks (CNNs)~\cite{he2020integrating, yang2023exemplar} and graph neural networks (GNNs)~\cite{pang2021leveraging, zeng2022spatial, jia2024thitogene} to predict gene expression at the low-resolution spot level. These approaches exploit the spatial dependencies~\cite{zeng2022spatial, pang2021leveraging} and image similarities~\cite{xie2024spatially,yang2023exemplar} inherent in pathological images to integrate information across spots and improve the fusion of image features. Such advances help address the scarcity and high acquisition cost of high-quality spatial transcriptomic data.

Continuous advancements in ST sequencing technology~\cite{staahl2016visualization, wang2018three, eng2019transcriptome} have significantly improved the resolution of ST data. As shown in Figure~\ref{fig:Demo}, resolution has progressed from the initial 55~$\mu$m spots to Visium HD data with bin sizes of 8~$\mu$m or even 2~$\mu$m. This advancement enables a more comprehensive analysis of the relationship between pathological tissues and gene expression at the single-cell level~\cite{benjamin2024multiscale,oliveira2024characterization,janesick2023high}. However, current deep learning methods face an information bottleneck~\cite{tishby2015deep} when applied to high-resolution HD data. Specifically, the limited information in low-resolution input images is insufficient to support the prediction of high-dimensional gene expression, and the features extracted by these models may lack the complexity required to represent the intricate details of high-resolution, high-dimensional gene expression data.

\begin{figure*}
    \centering
    \includegraphics[width=\linewidth]{Figure/Figure1_demo.png}
    \caption{\textbf{Spatial transcriptomics data at different resolutions.} (A) Traditional low-resolution 10X Visium v2 barcoded spots, which are discretely distributed and have a diameter of 55~$\mu$m. (B) Current high-resolution 10X Visium HD barcoded squares, which are densely distributed and binned at a side length of 8~$\mu$m.}
    \label{fig:Demo}
\end{figure*}

To address this issue, this paper proposes MagNet, a Multi-Level Attention Graph Network designed for accurate prediction of high-resolution HD data. MagNet integrates information across multiple resolutions, namely the bin, spot, and region levels, through cross-attention layers. It also extracts and combines features from neighboring regions with Graph Attention Network (GAT) and Transformer layers. By efficiently extracting and integrating multi-source and multi-level features, the proposed framework overcomes the information bottleneck posed by low-resolution inputs when predicting high-resolution, high-dimensional gene expression. In addition, the model imposes cross-resolution constraints on gene expression within the same region, which further improves its performance in HD gene expression prediction.

Our contributions can be summarized in three aspects:

\textbullet\ We present MagNet, a Multi-Level Attention Graph Network designed for accurate prediction of high-resolution HD data. To our knowledge, it is the first model dedicated to HD-level gene expression prediction.

\textbullet\ Our proposed framework leverages cross-attention layers and GAT-Transformer blocks to effectively extract and integrate multi-source and multi-level features, tackling the information bottleneck of low-resolution inputs in predicting high-resolution ST expression.

\textbullet\ We release our model as an open-source tool and systematically evaluate it on a privately collected kidney HD ST dataset and a public colorectal cancer HD ST dataset.