\section{Related Work}
\label{sec:relatedwork}

Prior computational models trained on high-resolution medical images generally downsize images using interpolation methods, which are provided as part of standard machine learning and image processing libraries such as PyTorch and OpenCV. However, such approaches have been shown to degrade model performance, raising the possibility of instead using learned approaches such as autoencoders \cite{sabottke2020resolution}. Autoencoders are classic machine learning models consisting of an encoder, which maps an input image to a downsized latent representation, and a decoder, which maps the latent representation back to pixel space \cite{kingma2013vae,he2021masked,vincent2008}. In this work, we focus specifically on autoencoders that produce spatially structured latent representations, which can be interpreted as downsized images. We note that many standard autoencoders instead yield latent representations in the form of vectors \cite{he2021masked,zhou2023mae}; we consider these models out of scope, since the resulting latents cannot serve as drop-in replacements for images in CAD pipelines and would require modifications to downstream architectures (particularly fully-convolutional CAD models). Additionally, as discussed in the introduction (Section~\ref{sec:intro}), it is unclear whether existing natural-image autoencoders \cite{rombach2022high,kingma2013vae,esser2021taming,openimages} can effectively capture fine-grained, clinically relevant features. Therefore, we focus on developing and evaluating large-scale, generalizable autoencoders that operate across diverse features, anatomical regions, and imaging modalities.
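To make the distinction between fixed interpolation, spatially structured latents, and vector latents concrete, the following minimal NumPy sketch compares output shapes for a toy $64 \times 64$ image and a downsizing factor $f = 8$. All function names (\texttt{area\_downsample}, \texttt{patch\_encoder}, \texttt{vector\_encoder}) are hypothetical; both encoders are single linear layers whose random weights stand in for trained parameters, and neither corresponds to any specific cited architecture.

```python
import numpy as np


def area_downsample(img: np.ndarray, f: int) -> np.ndarray:
    """Downsize an (H, W) image by an integer factor f via block averaging.

    For integer factors this matches area interpolation; it is a fixed,
    non-learned operation.
    """
    h, w = img.shape
    if h % f or w % f:
        raise ValueError("H and W must be divisible by f")
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))


def patch_encoder(img: np.ndarray, f: int, weights: np.ndarray) -> np.ndarray:
    """Hypothetical structured encoder: one linear map applied to every
    non-overlapping f x f patch (equivalent to a stride-f convolution with
    kernel size f and no bias). weights has shape (C, f * f).

    Returns a spatially structured latent of shape (C, H / f, W / f).
    """
    h, w = img.shape
    if h % f or w % f:
        raise ValueError("H and W must be divisible by f")
    # (H/f, f, W/f, f) -> (H/f, W/f, f, f) -> (H/f, W/f, f*f)
    patches = img.reshape(h // f, f, w // f, f).transpose(0, 2, 1, 3)
    patches = patches.reshape(h // f, w // f, f * f)
    # Project each flattened patch to C channels, channels-first output.
    return np.einsum("ijk,ck->cij", patches, weights)


def vector_encoder(img: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical vector encoder: flatten the image and project it to
    d dimensions. weights has shape (d, H * W). The spatial grid is lost."""
    return weights @ img.reshape(-1)


# Random weights stand in for trained parameters (illustration only).
rng = np.random.default_rng(0)
img = rng.random((64, 64))
f = 8

small = area_downsample(img, f)                                   # shape (8, 8)
spatial = patch_encoder(img, f, rng.standard_normal((4, f * f)))  # shape (4, 8, 8)
vec = vector_encoder(img, rng.standard_normal((16, img.size)))    # shape (16,)

# With uniform weights 1/f^2, the patch encoder reduces to area downsampling.
uniform = np.full((1, f * f), 1.0 / (f * f))
assert np.allclose(patch_encoder(img, f, uniform)[0], small)
```

In this toy linear setting, setting every patch-encoder weight to $1/f^2$ recovers area interpolation exactly, so a learned structured encoder can be viewed as replacing fixed interpolation weights with learned ones while preserving the $(H/f) \times (W/f)$ grid that fully-convolutional CAD models operate on. The vector latent discards this grid, which is why it cannot serve as a drop-in replacement for an image.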

