\section{Implementation Details}\label{ap:implementation}

The models were trained using the Adam optimizer with a base learning rate of 0.0001 for 256\(\times\)256 images and 0.002 for 1024\(\times\)1024 images. Training ran for 100 epochs with a weight decay of 0.0001, and the learning rate was multiplied by 0.1 at epochs 70 and 90. As noted in Section~\ref{sec:methods}, a hyperparameter search determined that equal contributions from all loss components yielded the best results, so all loss weights (\(\lambda_{\text{dice}}\), \(\lambda_{\text{ce}}\), and \(\lambda_{\text{mse}}\)) were set to 1. Gaussian density maps for cell centroid estimation were generated using a fixed standard deviation of \(\sigma = 5\).
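
As an illustration of the schedule and density-map settings above, the following is a minimal sketch rather than the training code itself. The function names \texttt{step\_lr} and \texttt{gaussian\_density\_map} are hypothetical. The sketch assumes that epochs are 0-indexed, that each Gaussian is unnormalized with a peak value of 1, and that overlapping Gaussians are merged with a pixel-wise maximum; none of these choices is stated in the text.

```python
import math


def step_lr(epoch, base_lr, milestones=(70, 90), gamma=0.1):
    """Learning rate at a given epoch under step decay.

    Assumption: epochs are 0-indexed and the decay takes effect
    from the milestone epoch onwards.
    """
    n_drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** n_drops


def gaussian_density_map(centroids, height, width, sigma=5.0):
    """Build a height x width density map (list of lists) from centroids.

    centroids: iterable of (row, col) cell-centre coordinates.
    Assumptions: unnormalized Gaussians with peak value 1, merged
    with a pixel-wise maximum where they overlap.
    """
    density = [[0.0] * width for _ in range(height)]
    for cy, cx in centroids:
        for y in range(height):
            for x in range(width):
                d2 = (y - cy) ** 2 + (x - cx) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                if value > density[y][x]:
                    density[y][x] = value
    return density


if __name__ == "__main__":
    # Learning rate before, between, and after the two milestones.
    for epoch in (0, 69, 70, 89, 90, 99):
        print(epoch, step_lr(epoch, base_lr=1e-4))

    # One centroid in a small 32 x 32 map.
    dm = gaussian_density_map([(10, 10)], height=32, width=32, sigma=5.0)
    print(dm[10][10], dm[10][15])
```

Under these assumptions, the value five pixels away from a centroid (one standard deviation) is \(e^{-1/2}\) of the peak. A sum-based or normalized variant would differ only in the last two lines of the inner loop.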

For the PanNuke dataset, the best checkpoint was selected based on the detection and classification metrics on the validation fold. In contrast, for the Ki-67 and CoNSeP datasets, where validation sets are unavailable, the checkpoint from the final (100th) epoch was used. The models were trained on two NVIDIA GeForce RTX 3090 GPUs (24\,GB each), using a batch size of 4 per GPU for 1024\(\times\)1024 images and 8 per GPU for 256\(\times\)256 images. For data augmentation, horizontal flips, vertical flips, and 90-degree rotations were applied, each with a probability of \(p = 0.5\).
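
The augmentation described above can be sketched as follows. This is an illustrative sketch, not the pipeline used in the experiments: the helper names (\texttt{hflip}, \texttt{vflip}, \texttt{rot90}, \texttt{augment}) are hypothetical, images are represented as nested lists, and the rotation is assumed to be a single counterclockwise quarter turn. Each transform is drawn independently with probability \(p\) and applied identically to the image and its label maps so that they stay aligned.

```python
import random


def hflip(img):
    """Horizontal flip: reverse each row."""
    return [row[::-1] for row in img]


def vflip(img):
    """Vertical flip: reverse the order of the rows."""
    return [row[:] for row in img[::-1]]


def rot90(img):
    """Rotate by 90 degrees counterclockwise (assumed direction)."""
    return [list(col) for col in zip(*img)][::-1]


def augment(arrays, rng, p=0.5):
    """Apply each transform with probability p to all arrays jointly.

    arrays: list of 2D nested lists (e.g. image, segmentation mask,
            density map) that must receive the same transforms.
    rng:    a random.Random instance, so results are reproducible.
    """
    for op in (hflip, vflip, rot90):
        if rng.random() < p:
            arrays = [op(a) for a in arrays]
    return arrays


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    mask = [[0, 1], [1, 0]]
    rng = random.Random(0)  # fixed seed, chosen only for the example
    aug_image, aug_mask = augment([image, mask], rng, p=0.5)
    print(aug_image, aug_mask)
```

Passing the image and its targets through \texttt{augment} together is the design choice that matters here: drawing the random decisions once per sample keeps centroids, masks, and pixels in correspondence.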

