Abstract: Highlights
• CvT2DistilGPT2 generates reports with a higher similarity to radiologist reports.
• The Convolutional vision Transformer is best for warm starting the encoder.
• GPT2 is better for warm starting the decoder than BERT.
• Domain-specific checkpoints are better than general-domain checkpoints.
• The best performing checkpoint depends on multiple factors.