Keywords: Histopathology, caption prediction, adenocarcinoma, stomach
TL;DR: We compiled and publicly released a captioned dataset of gastric adenocarcinoma H&E-stained patches, and we trained a baseline attention model to predict the captions.
Abstract: Computational histopathology has made significant strides in the past few years, slowly getting closer to clinical adoption. One area of benefit would be the automatic generation of diagnostic reports from H&E-stained whole slide images which would further increase the efficiency of the pathologists' routine diagnostic workflows. In this study, we compiled a dataset of histopathological captions of stomach adenocarcinoma endoscopic biopsy specimens which we extracted from diagnostic reports and paired with patches extracted from the associated whole slide images. The dataset contains a variety of gastric adenocarcinoma subtypes. We trained a baseline attention-based model to predict the captions from features extracted from the patches and obtained promising results. We make the captioned dataset of 260K patches publicly available.
Registration: I acknowledge that publication of this at MIDL and in the proceedings requires at least one of the authors to register and present the work during the conference.
Authorship: I confirm that I am the author of this work and that it has not been submitted to another publication before.
Paper Type: validation/application paper
Primary Subject Area: Application: Histopathology
Secondary Subject Area: Detection and Diagnosis
Confidentiality And Author Instructions: I read the call for papers and author instructions. I acknowledge that exceeding the page limit and/or altering the latex template can result in desk rejection.
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/arxiv:2202.03432/code)