\section{Introduction}
% Overview: the increase in specimen volume since the pandemic and the rise of MIL algorithms
The exponential increase in demand for pathological diagnoses since the COVID-19 pandemic has placed a significant burden on the limited number of pathology specialists~\cite{bychkov2023constant,bray2021ever}. At the same time, the deep learning (DL) community has actively sought to alleviate this workload by developing automated diagnostic models that assist pathological decision making~\cite{echle2021deep}. Recently, Multiple Instance Learning (MIL), which trains models using only weak labels at the Whole Slide Image (WSI) level rather than pixel-level expert annotations, has emerged as the gold standard in digital pathology diagnosis~\cite{zhang2025patches}.



\begin{figure}[t!]
\centering
\includegraphics[width=0.95\linewidth]{figs/overview}
\caption{(a) A diagram illustrating class relationships and hierarchy. We denote the hierarchy levels from root to leaves as $\mathcal{H}=0$ to $\mathcal{H}=2$. (b) The proposed framework follows a two-phase design and is trained in an end-to-end manner.} \label{fig1}
\end{figure}
% enabling users to choose a suitable MIL architecture based on their requirements. Since MILs operate within a unified logic, the entire framework is trained end-to-end manner.




% Problems arising in multiclass WSI diagnosis and the proposal of hierarchy training
Although MIL offers promising results for assisting experts in clinical settings, it reveals shortcomings in multiclass scenarios, as most MIL studies have been conducted in binary settings~\cite{shao2021transmil,zhang2022dtfd}. Unlike binary classification, a multiclass task commonly involves a hierarchy because lower-level classes can be organized into higher-level groups~\cite{bertinetto2020making}, potentially reflecting different priorities or clinical urgencies among those groups. The DL community has made several attempts to leverage hierarchy. Loss-centric methods penalize predictions according to class relationships~\cite{bertinetto2020making,chang2021your}, whereas structure-based methods encode these relationships in the framework itself~\cite{redmon2017yolo9000}, in graphs~\cite{brust2019integrating}, or in hyperbolic space~\cite{nickel2017poincare}. The underlying objective of these inter-hierarchy approaches is to prevent networks from making critical errors in fine-level classification, which correspond to type II errors in the medical field (\textit{e.g.}, a model might mistake the stage of a tumor, but it should not confuse a cancerous cell with a normal one).





% Limitations of prior work
Despite previous attempts to address hierarchy issues, the inherent properties of WSIs limit conventional multiclass hierarchy approaches. Although each WSI carries only one label during training, clinical inference often involves multiple symptoms, which requires pathologists to identify the most urgent problem~\cite{wong2022current,williams2017future}. Because models are trained under the assumption of a single strict label, they tend to concentrate on the most probable class rather than the most hazardous sign~\cite{guo2017calibration,goyal2017multi}. We refer to this issue of ignoring priority within the horizontal hierarchy as the intra-hierarchy problem.
% In addition, experts often check the sampling site when making finer-grained diagnoses. However, frameworks that selectively utilize this idea in MIL are underexplored. 





% Despite the preceding approaches, the multiclass hierarchy considered so far is an inter-hierarchy from coarse to fine-grained of vertical class relationships. Thus, the priority between horizontal classes within the same hierarchy has been overlooked. We refer to this priority issue as intra-hierarchy. While WSIs for training are limited to a single, weak label due to labeling costs, real-world inference necessitates that the model diagnoses by considering the intra-hierarchy, much like a pathologist. Besides, pathologists check the sampling site when making diagnoses at a finer-grained level. However, frameworks that selectively utilize this in MIL diagnosis are underexplored.
% Model predictions that ignore intra-hierarchy, which is prone to underrate more urgent diagnoses, can make pathologists distrust the DL system in real-world applications.
% In addition, WSIs frequently include multiple symptoms, where the more urgent case is prioritized by pathologists~\cite{raab2005clinical}. Consequently, the model should diagnose the more critical class within the same hierarchy level in such scenarios.





% Our proposal
We address these hierarchy issues in several ways. For the inter-hierarchy, we employ a probability alignment term between hierarchy levels. In addition, we propose a probability adjustment that allows the coarse-grained level to influence the predictions of the fine-grained level. We also present an implicit feature remix to handle the intra-hierarchy problem. Because the input to MIL is a set of instances, we implicitly train class priority by mixing instances from two samples. We confirm that this remix enables the model to focus on the more urgent class on a complex test set in which two cases are mixed. The proposed framework flexibly accommodates different MIL architectures and leverages multimodal data.
\begin{table}[t]
\centering
\caption{Data distribution over the classes. The values in parentheses represent the number of extra test samples.}
\label{tab:data}
\resizebox{0.65\columnwidth}{!}{%
\begin{tabular}{c|ccccccc|c}
\hline
           & TA     & TVA   & TSA    & HP    & SSL    & IP & LP  & Total    \\ \hline
Train      & 317    & 232   & 300    & 257   & 130    & 99 & 266 & 1,601    \\
Validation & 69     & 51    & 65     & 55    & 29     & 21 & 57  & 347      \\
Test       & 164(95) & 57(6) & 84(18) & 64(8) & 84(55) & 21 & 57  & 531(182) \\ \hline
\end{tabular}%
}
\end{table}
Experiments on real-world clinical data show that the proposed method outperforms existing methods while properly respecting multiclass hierarchies. Ablation studies confirm the contribution of each component. Additional qualitative evaluations examine the predictions of the proposed method on challenging diagnostic images.