Abstract: Generating radiology reports from medical images has garnered significant attention in the research community. While existing methods have demonstrated promise, they often generate reports that are factually incomplete and inconsistent, fail to focus on informative regions within an image, and impose strong annotation requirements for model training, such as bounding-box or image-level annotations, which can be challenging to obtain. In this paper, we propose MediVLM, a
vision language model (VLM) for radiology
report generation from medical images. The
proposed model consists of a pre-trained object detector to extract the salient anatomical
regions from the images, an image encoder, a
text encoder, a module to align the visual and
text representations, a cross-attention layer to fuse the two representations, and finally a transformer-based decoder to generate the final report. MediVLM can generate radiology reports
even when no reports are available for training;
this is an extremely useful feature, as curating
such reports is a labor-intensive task. Further,
it computes a severity score (reflecting the seriousness of a patient's medical condition) from
the generated radiology reports, which can be
used to prioritize patients who need immediate medical attention. Our extensive empirical
analyses on three benchmark datasets corroborate the promise and potential of our method
against competing baselines. Our code is open-sourced on our project webpage at: https://sites.google.com/view/medivlm/home
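
For concreteness, the following is a minimal sketch of how the components described above might be wired together, assuming PyTorch. All names, dimensions, and layer counts (MediVLMSketch, region_proj, d_model=512, etc.) are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative wiring of the pipeline named in the abstract: detector-extracted
# region features -> image encoder; text -> text encoder; an alignment module;
# cross-attention fusion; and a transformer-based decoder for the report.
import torch
import torch.nn as nn

class MediVLMSketch(nn.Module):
    def __init__(self, region_feat_dim=2048, d_model=512, vocab_size=10000):
        super().__init__()
        # Project region features from a (frozen, pre-trained) object detector
        # into the model dimension. The detector itself is elided here.
        self.region_proj = nn.Linear(region_feat_dim, d_model)
        self.image_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.text_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        # Alignment module: map both modalities into a shared space.
        self.align_img = nn.Linear(d_model, d_model)
        self.align_txt = nn.Linear(d_model, d_model)
        # Cross-attention: text queries attend over image-region keys/values.
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8,
                                                batch_first=True)
        # Transformer-based decoder emitting report tokens.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, region_feats, text_tokens, report_tokens):
        # Encode, then align, then fuse (teacher forcing on report_tokens;
        # inference would decode autoregressively).
        img = self.align_img(self.image_encoder(self.region_proj(region_feats)))
        txt = self.align_txt(self.text_encoder(self.text_embed(text_tokens)))
        fused, _ = self.cross_attn(query=txt, key=img, value=img)
        hidden = self.decoder(tgt=self.text_embed(report_tokens), memory=fused)
        return self.lm_head(hidden)  # per-token vocabulary logits

# Shape check with random inputs; real inputs would come from the detector
# and a tokenizer (the auxiliary text input is also an assumption here).
model = MediVLMSketch()
logits = model(torch.randn(1, 16, 2048),            # 16 region features
               torch.randint(0, 10000, (1, 12)),    # auxiliary text tokens
               torch.randint(0, 10000, (1, 40)))    # report tokens
print(logits.shape)  # torch.Size([1, 40, 10000])
```

In a real system the alignment module would typically be trained with a contrastive objective and the decoder run autoregressively at inference time; both details are elided in this sketch.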