\section{Introduction}\label{sec:intro}

Digital pathology, powered by AI, is revolutionizing cancer diagnosis by automating cell detection and classification \cite{Song_2023}. However, computational efficiency and robustness remain key challenges for real-world deployment. Pathologists analyze Whole Slide Images (WSIs) across multiple histological stains, such as Hematoxylin and Eosin (H\&E) and immunohistochemical markers like Ki-67, to assess tumor characteristics. This manual process is time-consuming and subject to interobserver variability \cite{corona1996interobserver, dano2020interobserver}, making automated solutions essential for improving efficiency and consistency in clinical workflows.

Cell detection and classification are fundamental tasks in computational pathology, as accurate quantification of different cell types informs diagnostic and prognostic decisions. While segmentation aids visualization, classification remains the primary clinical objective. Convolutional Neural Networks (CNNs) are widely used for these tasks, with U-Net~\cite{ronneberger2015u} being a popular choice due to its encoder-decoder structure and skip connections. However, U-Net struggles with overlapping and clustered cells, which motivated more advanced models such as HoVer-Net~\cite{graham2019hover}. HoVer-Net employs a three-decoder architecture: one decoder for binary segmentation, another for horizontal-vertical (HV) vector field prediction to separate clustered cells, and a third for cell classification. While this multi-task approach has been widely adopted~\cite{cellvit, tommasino2024nulite, chen2025histonext}, maintaining three decoder heads increases computational cost and inference time, limiting clinical feasibility.

Beyond segmentation and classification, stain variability poses an additional challenge. Differences in staining protocols, scanner settings, and tissue preparation introduce significant variations across datasets, which degrade model generalization. Models must therefore be resilient to these variations to perform reliably across different laboratories.

In this paper, we propose \textit{DualU-Net}, a streamlined deep learning architecture for cell classification and segmentation across multiple histological stains. Our central claim is that \textit{two decoder heads are enough}, which challenges the need for HoVer-Net’s three-decoder scheme. We dispense with the binary segmentation branch and replace the HV vector field branch with a Gaussian-based centroid estimation approach. Our key contributions are:
\textit{i)} a dual-decoder architecture showing that two heads are sufficient for cell detection, classification, and segmentation across multiple stains;
\textit{ii)} classification and detection performance comparable to that of three-decoder models, in line with pathologists’ focus on cell quantification over precise segmentation contours;
\textit{iii)} fast and efficient inference, making the model suitable for real-time clinical deployment;  
\textit{iv)} robustness to stain variations, ensuring consistent performance across different histological markers; and  
\textit{v)} real-world deployment, with DualU-Net integrated into the DigiPatICS project~\cite{digipatics} and deployed across eight hospitals of the Institut Català de la Salut (ICS) in Catalonia.
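
To give some intuition for the Gaussian-based centroid estimation mentioned above, the following is an illustrative formulation only, and not necessarily the exact target used in this work. Assuming annotated cell centroids $\{c_k\}_{k=1}^{K}$ and an isotropic Gaussian kernel with a hypothetical bandwidth $\sigma$, a centroid map can be defined at pixel $p$ as
\[
G(p) = \max_{k \in \{1,\dots,K\}} \exp\!\left(-\frac{\| p - c_k \|_2^2}{2\sigma^2}\right),
\]
so that each cell contributes a peak of value one at its centroid, and individual cells can in principle be recovered as local maxima of the predicted map. The symbols $G$, $c_k$, $K$, and $\sigma$, as well as the choice of a maximum rather than a sum over cells, are assumptions introduced here for illustration.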