KidneyGrader: Fine-Grained Tubulitis Scoring Using Weakly Supervised Transformers

Published: 22 Jul 2025, Last Modified: 08 Aug 2025, Venue: COMPAYL 2025, License: CC BY 4.0
Keywords: Banff Tubulitis Scoring, Inter-Rater Reliability, Reproducibility, Digital Pathology, Deep Learning
TL;DR: We provide the first automated approach for fine-grained Banff tubulitis scoring, using weakly supervised learning from slide-level labels.
Abstract: Accurate tubulitis scoring is essential for managing kidney transplant rejection, yet manual assessment is subjective and suffers from severe inter-rater variability ($\kappa_w = 0.17$), leading to inconsistent treatment decisions. While recent works have attempted binary tubulitis detection, the fine-grained scoring (T0-T3) required for clinical decision-making remains unaddressed. We present the first automated approach for granular tubulitis scoring using only slide-level supervision. Our approach aggregates spatially correlated features from tubule-centric image patches using a transformer-based attention pooling mechanism. To ensure diagnostic focus, patches are pre-filtered using a segmentation model trained to detect renal tubules, restricting the input space to the regions most relevant for scoring. Evaluated on 93 routine PAS-stained slides (75 for training/validation, 18 held-out for testing), our method achieves a weighted kappa of $\kappa_w = 0.75$ (a $4.4\times$ improvement over expert agreement), 83.3\% within-one-grade accuracy, and strong correlation with expert scores ($r = 0.81$). Top-attended regions demonstrate clinical plausibility, showing progressively greater inflammatory burden and tissue damage with increasing T-scores. Our work demonstrates that weakly supervised learning can transform subjective pathology assessments into reliable, interpretable predictions, offering a practical path towards standardising transplant rejection diagnosis. The code is available on \href{https://github.com/abrar-rashid/kidney-grader}{GitHub}.
Submission Number: 15
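
The two agreement metrics quoted in the abstract (weighted kappa and within-one-grade accuracy) can be illustrated with a minimal sketch. The abstract does not say whether linear or quadratic weights were used for $\kappa_w$, so the quadratic default below is an assumption, and the function names `weighted_kappa` and `within_one_accuracy` are hypothetical rather than taken from the authors' repository.

```python
def weighted_kappa(rater_a, rater_b, n_classes=4, power=2):
    """Weighted Cohen's kappa for ordinal grades 0..n_classes-1.

    power=2 gives quadratic weights (an assumption; the abstract does not
    specify the weighting), power=1 gives linear weights.
    """
    n = len(rater_a)
    # Confusion matrix of observed grade pairs.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for i, j in zip(rater_a, rater_b):
        observed[i][j] += 1
    row_totals = [sum(observed[i]) for i in range(n_classes)]
    col_totals = [sum(observed[i][j] for i in range(n_classes))
                  for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Disagreement weight grows with the distance between grades.
            weight = (abs(i - j) / (n_classes - 1)) ** power
            weighted_observed += weight * observed[i][j]
            # Expected count under independent raters with the same marginals.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        # Both raters used a single identical grade; treat as full agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


def within_one_accuracy(reference, predicted):
    """Fraction of cases where the predicted grade is at most one off."""
    hits = sum(abs(r - p) <= 1 for r, p in zip(reference, predicted))
    return hits / len(reference)


if __name__ == "__main__":
    # Hypothetical T-scores for illustration only, not data from the paper.
    expert = [0, 1, 2, 3, 1, 0]
    model = [0, 1, 3, 3, 2, 0]
    print(weighted_kappa(expert, model))
    print(within_one_accuracy(expert, model))
```

With ordinal grades such as T0-T3, a weighted kappa penalises a T0-versus-T3 disagreement more heavily than a T1-versus-T2 one, which is why it is a common choice over unweighted kappa for this kind of scoring task.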