Gaze-assisted automatic captioning of fetal ultrasound videos using three-way multi-modal deep neural networks

Mohammad Alsharid, Yifan Cai, Harshita Sharma, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble

Published: 2022, Last Modified: 08 Oct 2025. Medical Image Anal. 2022. CC BY-SA 4.0

Abstract: Highlights
• Captioning describes spatio-temporal ultrasound content with sonographer spoken words
• Word2vec embedding outperforms BioWordVec embedding in gaze-assisted captioning
• Gaze helps achieve higher scores on evaluation metrics, specifically B1 to B4 and F1