\section{Method}
% Intro to Overview
HARP consists of three steps, which we explain in order. First, we briefly describe the \textbf{artifact detection}. Then, we explain the training of our novel inpainting diffusion model, which we leverage for \textbf{artifact localization} and \textbf{artifact restoration}.
\begin{figure}[t]
    \centering
    \includegraphics[width=0.7\textwidth]{figures/artifact_restoration.png}
    \caption{\textbf{Detailed overview of the methods deployed in HARP:} (I) Artifact detection with FastFlow. (II) Training of the inpainting denoising diffusion model $f_\theta$ with synthetic localization masks. (III) Artifact localization pipeline, in which $f_\theta$, together with SAM and DBSCAN, selects five localization masks. (IV) Artifact restoration inference with the image and localization masks. The best restoration is selected by the artifact detection.}
    \label{fig:method}
\end{figure}

\textbf{Artifact Detection:}
The first step in HARP is the efficient and reliable detection of artifacts, which is crucial for effective restoration in clinical workflows. Using Anomalib, we explored various anomaly detection methods for histopathology images. Among these, we identified FastFlow~\cite{yu2021fastflow}, which employs a Vision Transformer (ViT) encoder and applies normalizing flows to its latent representations, as the most suitable for our needs. It performed best at artifact detection in our comparison and was easily integrated into our pipeline. FastFlow therefore forms the artifact detection phase of HARP, as illustrated in Figure~\ref{fig:method}, and sets the stage for the subsequent localization and restoration steps.


\textbf{Training the Conditional Diffusion Model:}
Conditional generative models aim to estimate the data distribution $p(y|x)$, where $x \in [0, 1]^{H \times W}$ is a conditioning image and $y \in [0, 1]^{H \times W}$ is the target, with height $H$ and width $W$. Denoising Diffusion Models (DDMs) do so by reversing a process that gradually adds Gaussian noise to image samples $y_0 \sim p(y_0)$. Over $T=1000$ diffusion steps, the resulting sequence $y_1, \dots, y_T$ approaches pure Gaussian noise, increasingly so as $T$ grows.
Given a well-calibrated variance schedule $\{\beta_1, \dots, \beta_T\}$ with $\beta_t \in (0,1)$, small steps, and large $T$, we train a denoising model $f_\theta$ to reverse each step in this sequence. The reverse distribution is defined by the conditional:
$p_\theta(y_{t-1}|y_t,x) := \mathcal{N}(y_{t-1};\mu_\theta(x,y_t,t),\Sigma_\theta(x,y_t,t))$.% unconditional diffusion: p_\theta(y_{t-1}|y_t) := \mathcal{N}(y_{t-1};\mu_\theta(y_t,t),\Sigma_\theta(y_t,t))

Here, $\mu_\theta$ and $\Sigma_\theta$ are predicted by the model with parameters $\theta$. To condition our denoising model $f_\theta$ for inpainting, we randomly mask out an area, $\tilde{x}:= y_T * m + y_0 * (1-m)$, where $m \in \{0, 1\}^{H \times W}$ is a binary mask image and $*$ denotes element-wise multiplication. Since $p(y_T|\tilde{x})$ is a known distribution, the reverse process can be initiated by sampling $y_T \sim \mathcal{N}(\mathbf{0},\mathbf{I})$ and iteratively applying the denoising model, thereby transforming the sample back into the conditional data distribution. Training $f_\theta$ reduces to the following loss~\cite{ho2020denoising}:

\begin{equation}
    \label{eq:L_simple}
    \mathbb{E}_{y_0 \sim p(y_0),\, \epsilon \sim \mathcal{N}(\mathbf{0},\mathbf{I}),\, m,\, t}\Big[ \big\| f_\theta\big(\tilde{x}, \underbrace{(\sqrt{\bar{\alpha}_t}\,y_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon) * m + y_0 * (1-m)}_{\tilde{y}_t}, t\big) - \epsilon \big\|_1 \Big]
\end{equation}
where $\epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, $\alpha_t := 1 - \beta_t$, and $\bar{\alpha}_t := \prod_{s=1}^t \alpha_s$. Our novel approach of incorporating $\tilde{x}$ in each step of the denoising process is critical for our model's performance and allows for fewer RePaint~\cite{lugmayr2022repaint} cycles during artifact restoration.
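
For reference, the noised term inside $\tilde{y}_t$ is the standard closed form of the DDPM forward process~\cite{ho2020denoising}, written here for the unmasked case:
\[
    q(y_t \mid y_0) = \mathcal{N}\big(y_t;\, \sqrt{\bar{\alpha}_t}\,y_0,\, (1-\bar{\alpha}_t)\mathbf{I}\big)
    \quad\Longleftrightarrow\quad
    y_t = \sqrt{\bar{\alpha}_t}\,y_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon .
\]
As a sketch only, and assuming the usual noise-prediction parameterization of~\cite{ho2020denoising} (this section does not state which parameterization is used, nor whether $\Sigma_\theta$ is fixed or learned), the mean of the reverse step would be obtained from the predicted noise as
\[
    \mu_\theta(\tilde{x}, \tilde{y}_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(\tilde{y}_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, f_\theta(\tilde{x}, \tilde{y}_t, t)\right).
\]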

\textbf{Artifact Localization:}
To localize potential artifacts without supervision, we compute an activation map from our \textbf{novel denoising diffusion model} $f_\theta$, conditioned on the entire noised image $\bar{x}:=\bar{y}_T$. If we noise an artifact image $\bar{y}$ for up to 900 steps and denoise it again, the major features tend to stay the same, while minor details of known structures change slightly. Because the model has not seen artifacts during training, minor details in artifact regions remain largely unchanged. We exploit this by computing the reconstruction error $\|\bar{y}_0-y_0\|$, which we aggregate into an activation map over 25 denoising repetitions.
We use SAM~\cite{kirillov2023segment} and DBSCAN~\cite{ester1996density} to generate candidate object localizations on the artifact image $y_0$. We remove masks that are too large ($>60\%$) or too small ($<0.4\%$), as reconstruction would be infeasible or not worthwhile, and discard duplicate masks. Then, we score each mask by aggregating the activations over the mask area and select the top 5 binary masks $m_0, \dots, m_4$ with the lowest activation. Finally, we dilate the masks to smooth the inpainting boundary of artifacts. Most importantly, this is, to the best of our knowledge, the first method to \textbf{localize histological artifacts without any supervision}.
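
One possible formalization of the scoring step is sketched below. The per-pixel absolute error, the averaging over repetitions, and the mean over the mask area are assumptions made for illustration, since the text only states that errors and activations are aggregated. Writing $e^{(k)} \in \mathbb{R}^{H \times W}$ for the per-pixel reconstruction error of the $k$-th repetition and $K = 25$,
\[
    A := \frac{1}{K}\sum_{k=1}^{K} e^{(k)},
    \qquad
    s(m_i) := \frac{\sum_{h,w} A_{h,w}\, m_{i,h,w}}{\sum_{h,w} m_{i,h,w}} .
\]
Under this reading, the five candidate masks with the smallest $s(m_i)$ are kept, because a low activation marks a region that the model leaves largely unchanged across repetitions.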

%%%%%%%%%%%%%%%DRAFT%%%%%%%%%%%%%%%
%If we noise an artifact image upto 900 steps and denoise it, most of the major features tend to stay the same, but minor details like the cells structures, cell locations and overall texture tend to change slightly. These changes depends completely on the chosen random noise. These is true for everywhere except where the artifacts are located. Because since the model have not seen these artifacts, these artifact areas does not change much. Also the artifacts are surpressing any texture underneath, causing the details not to change much when denoised. We can use this randomness of the denoising process to roughly figure where the artifacts are located. For implementation, we are doing this noising and denoisning 25 different times, and at each time the denoised image is slightly different. We add up the difference between the artifact image and each denoised image and average them before applying threshold function to them. This averaged difference can be used to built heatmap/CAM to visualize the artifact location.

%For the segmentation of the artifact, we use a combination of 2 methods, which are SAM and DBSCAN. Both of the methods will provide list of viable masks where one of the mask might accurately only cover the artifact. Before choosing the mask with highest posibilty from the list, we do few preprocessing. Any masks that are bigger than 49 percent will, have its inverted self included on the list. This is because SAM sometimes can cover only surrounding of an artifact especially when they are large. We also removes very similar looking masks, using dice score (higher than 97 percent gets rejected). Masks that too small (covering only 0.5 percent or too large (covering 60 percent get rejected). We also run two different DBSCAN, one clusters low valued pixels and the other does clustering on high valued pixels. We only choose the biggest cluster from each of them.

%Now we try to score each of the mask using the averaged difference and choose 5 best scored mask to apply inpainting. The score is combination of awarding (lower density of heatmap at mask) and (higher area of mask) (area of activation within mask / area mask) / (0.25 * area mask) -> Lower score means better. 

%The chosen mask are slightly dilated (A bit more for DBSCAN mask) to avoid the boundary of the artifact affecting from affecting inpaint quality. Then, to choose the best mask from the top 5 masks, we solely rely on the artifact detection algorithm (FastFlow) on the inpainted image to rank the least probability of the inpainted image still having an artifact. 
%%%%%%%%%%%%%%%DRAFT%%%%%%%%%%%%%%%

\textbf{Image Restoration:}
To restore an artifact image $\hat{\hat{y}}_0$, we condition on an image $\hat{x} := \hat{y}_T$ with $\hat{y}_t:=\hat{\hat{y}}_t * \hat{m} + \hat{y}_0 * (1-\hat{m})$, where $\hat{m}$ is one of the previously determined artifact localizations. To harmonize the masked area with the rest of the image, we run our novel inpainting denoising diffusion model with a resampling procedure similar to RePaint~\cite{lugmayr2022repaint}, using \texttt{resampling}$=3$ and \texttt{jump\_sampling}$=10$, to generate the restored, artifact-free image $\tilde{y}$. Among the five resulting restorations, we select the one to which FastFlow~\cite{yu2021fastflow} assigns the lowest probability of containing an artifact.
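
The final selection can be written compactly as follows. The superscript notation and the score symbol $a(\cdot)$ are introduced here only for illustration: $\tilde{y}^{(i)}$ denotes the restoration obtained with $\hat{m} = m_i$, and $a(\cdot)$ stands for the artifact score returned by FastFlow, whose exact form is not specified in this section.
\[
    i^{\ast} = \operatorname*{arg\,min}_{i \in \{0,\dots,4\}} a\big(\tilde{y}^{(i)}\big),
    \qquad
    \tilde{y} := \tilde{y}^{(i^{\ast})} .
\]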

%%%%%%%%%%%%%%%DRAFT%%%%%%%%%%%%%%%
%Trained on 1000 steps and inference on 250 steps

%The final repaint parameter that was opted for this implementation is resampling=3 and jump_sampling=10. ArtiFusion paper didnt specify what they used, but from the code it seems like they are using what repaint authors suggested which is resampling=10, jump=10.

%Ours: 250 -> 240 -> 230 -> 240 -> 230 -> 220
%Artifusion: 250 -> 240 -> 230 -> 240 -> 230 -> 240 -> 230 -> 240 -> 230 -> 240 -> 230 -> 220 
%%%%%%%%%%%%%%%DRAFT%%%%%%%%%%%%%%%