Evaluating cell AI foundation models in kidney pathology with human-in-the-loop enrichment

Junlin Guo, Siqi Lu, Can Cui, Ruining Deng, Tianyuan Yao, Zhewen Tao, Yizhe Lin, Marilyn Lionts, Quan Liu, Juming Xiong, Yu Wang, Shilin Zhao, Catie Chang, Mitchell Wilkes, Agnes Fogo, Mengmeng Yin, Haichun Yang, Yuankai Huo

Published: 24 Nov 2025, Last Modified: 26 Feb 2026, Venue: Communications Medicine, License: CC BY-SA 4.0

Abstract: Large-scale artificial intelligence foundation models have emerged as promising tools for addressing healthcare challenges, including digital pathology. While many have been developed for complex tasks such as disease diagnosis and tissue quantification using extensive and diverse datasets, their readiness for seemingly simpler tasks, such as nuclei segmentation within a single organ (for example, the kidney), remains unclear. This study answers two questions: How good are current cell foundation models? and How can we improve them? We curated a multi-center, multi-disease, and multi-species dataset sampled from 2542 kidney whole slide images. Three state-of-the-art cell foundation models—Cellpose, StarDist, and CellViT—were evaluated. To enhance performance, we developed a human-in-the-loop strategy that distilled multi-model predictions, improving data quality while reducing reliance on pixel-level annotation. Fine-tuning was performed using the enriched datasets, and segmentation performance was quantitatively assessed. Here we show that cell nuclei segmentation in kidney pathology still requires improvement with more organ-targeted foundation models. Among the evaluated models, CellViT achieves the highest baseline performance, with an F1 score of 0.78. Fine-tuning with enriched data improves all three models, with StarDist achieving the highest F1 score of 0.82. The combination of the foundation model-generated pseudo-labels and a subset of pathologist-corrected “hard” patches yields consistent performance gains across all models. This study establishes a benchmark for the development and deployment of cell AI foundation models tailored to real-world data. The proposed framework, which leverages foundation models with reduced expert annotation, supports more efficient workflows in clinical pathology. 
The rise of digital pathology has transformed traditional histology slides into vast collections of high-resolution images, enabling medical research on a much larger scale. However, analyzing these data remains challenging. Foundation models, advanced AI systems trained on diverse datasets, offer a promising solution, but their ability to perform simpler yet essential tasks, such as identifying cell nuclei in kidney tissue, is unclear. We evaluated three leading models on a large, curated kidney image dataset and found that cell nuclei segmentation in kidney pathology still requires improvement with more organ-targeted foundation models. To enhance performance, we introduced a “human-in-the-loop” approach that combines multiple foundation models with expert labeling of only the most difficult cases, improving accuracy, reducing manual labeling, and enabling more efficient pathology workflows.

Guo et al. evaluate three cutting-edge cell foundation models on a diverse kidney nuclei dataset and develop a training strategy leveraging multiple cell foundation models to reduce pathologist labeling costs. Findings reveal that current histopathology models need organ-targeted improvements, and the framework consistently boosts performance.
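The F1 scores quoted in the abstract (0.78 for baseline CellViT, 0.82 for fine-tuned StarDist) summarize how well predicted nuclei match annotated ones. The abstract does not state the matching criterion, so the following is only a minimal sketch of one common object-level definition, assuming a predicted nucleus counts as a true positive when its intersection-over-union (IoU) with an annotated nucleus exceeds 0.5. The function name `nuclei_f1`, the threshold, and the toy masks are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np


def nuclei_f1(gt, pred, iou_threshold=0.5):
    """Object-level F1 between two integer label masks (0 = background).

    Assumption: a prediction matches a ground-truth nucleus when their
    IoU is strictly greater than iou_threshold. With a threshold of at
    least 0.5, each nucleus can match at most one counterpart.
    """
    if iou_threshold < 0.5:
        raise ValueError("iou_threshold must be >= 0.5 for unique matching")

    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [i for i in np.unique(pred) if i != 0]

    tp = 0
    for g in gt_ids:
        g_mask = gt == g
        # Only predicted labels that overlap this nucleus can match it.
        for p in np.unique(pred[g_mask]):
            if p == 0:
                continue
            p_mask = pred == p
            inter = np.logical_and(g_mask, p_mask).sum()
            union = np.logical_or(g_mask, p_mask).sum()
            if inter / union > iou_threshold:
                tp += 1
                break

    fp = len(pred_ids) - tp  # predicted nuclei with no match
    fn = len(gt_ids) - tp    # annotated nuclei that were missed
    denom = 2 * tp + fp + fn
    # Convention chosen here: two empty masks count as perfect agreement.
    return 2 * tp / denom if denom else 1.0


# Toy example: two annotated nuclei; the prediction finds one exactly,
# misses the other, and adds one spurious object.
gt = np.zeros((8, 8), dtype=int)
gt[1:4, 1:4] = 1
gt[5:8, 5:8] = 2

pred = np.zeros((8, 8), dtype=int)
pred[1:4, 1:4] = 1
pred[0:2, 6:8] = 2

print(nuclei_f1(gt, pred))  # 0.5  (TP = 1, FP = 1, FN = 1)
```

Under this definition, F1 = 2TP / (2TP + FP + FN), so a missed nucleus and a spurious detection are penalized equally. Other matching rules (for example, centroid distance) would give different numbers for the same masks.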

External IDs: doi:10.1038/s43856-025-01205-x