Gaze-assisted automatic captioning of fetal ultrasound videos using three-way multi-modal deep neural networks
Abstract: Highlights
• Captioning describes spatio-temporal ultrasound content with sonographer spoken words
• Word2vec embedding outperforms BioWordVec embedding in gaze-assisted captioning
• Gaze helps achieve higher scores on evaluation metrics, specifically B1 to B4 and F1