Meta-research papers in medical imaging often focus on \textbf{methods}, for example surveys on deep learning~\cite{litjens2017survey}, different types of supervision~\cite{cheplygina2019not}, and human-in-the-loop methods~\cite{budd2021survey}. As a by-product of annotating and categorizing papers, some surveys also provide lists of commonly used datasets~\cite{calli2021chestsurvey}.

More recently, \textbf{dataset}-focused reviews have started to emerge, in particular for dermatology~\cite{daneshjou2021lack,wen2022characteristics} and ophthalmology~\cite{khan2021global}. These reviews focus on the type of data that is available, and find various biases in the patient populations and/or missing metadata about patient demographics. However, these papers do not examine how the datasets are used.

Perhaps at the intersection of datasets and methods is work focusing on challenges~\cite{eisenmann2022biomedical}, which reviews participation in medical imaging competitions at MICCAI and ISBI. Such competitions are often seen as one of the drivers of publicly available datasets, but the impact of these datasets beyond the competitions themselves is not known.

Closest to our work are studies that examine dataset usage in other published works.
Koch et al.~\cite{koch2021reduced} analyse dataset usage on PapersWithCode across various applications of machine learning, and find that the diversity of datasets used is decreasing. Within medical imaging, Heller et al.~\cite{heller2019role} examined the role of publicly available data in MICCAI papers between 2014 and 2018, and found, among other things, that over 20\% of papers using public data did not cite the dataset. Simkó et al.~\cite{simko2022reproducibility} examined reproducibility in MIDL papers between 2018 and 2022, and found that papers using public datasets are becoming more common, but without proper citations or links to the datasets.

