Abstract: Early diagnosis of rectal cancer is essential to improving patient survival. Existing diagnostic methods rely mainly on complex MRI combined with pathology-level co-diagnosis. In contrast, this paper turns to ultrasound endoscopy: we collect and annotate, for the first time, a rectal cancer ultrasound endoscopy video dataset of 207 patients for rectal cancer video risk assessment. We also introduce the Rectal Cancer Video Risk Assessment Network (RCVA-Net), a temporal logic-based framework for classifying rectal cancer ultrasound endoscopy videos. In RCVA-Net, we propose a novel adjacent frames fusion module that integrates the temporal local features of the original video with the global features of the sampled video frames. The intra-video fusion module captures and learns the temporal dynamics between neighbouring video frames, improving the network's ability to discern subtle differences within video sequences. Furthermore, we improve rectal cancer classification by randomly incorporating video-level features extracted from the original videos, which substantially boosts classification performance on ultrasound endoscopic videos. Experimental results on our labelled dataset show that RCVA-Net achieves leading performance and can serve as a scalable baseline model. The code for this paper is available at: https://github.com/JsongZhang/RCVA-Net.
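The abstract does not specify how the adjacent frames fusion module combines local and global features. Purely as a rough illustration of the idea, the sketch below rests on several assumptions: per-frame feature vectors are already extracted, frames are sampled uniformly, the "local" feature of a sampled frame is the average of its neighbours in the original video (a window of `radius` frames on each side), the "global" feature is the mean over all sampled frames, and fusion is simple concatenation. All function names and parameters are hypothetical. This is not the RCVA-Net implementation; the linked repository is the authoritative source.

```python
from typing import List

Vector = List[float]


def uniform_sample_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread evenly over [0, num_frames)."""
    if num_samples <= 0 or num_frames <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    step = num_frames / num_samples
    # Take the centre of each of num_samples equal-length segments.
    return [min(num_frames - 1, int(step * i + step / 2)) for i in range(num_samples)]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized feature vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def fuse_adjacent_frames(
    frame_feats: List[Vector], num_samples: int, radius: int = 1
) -> List[Vector]:
    """Hypothetical adjacent-frame fusion (illustrative assumption only).

    For each sampled frame, average the features of its neighbours in the
    ORIGINAL (unsampled) video to get a local temporal feature, then
    concatenate it with a global feature averaged over all sampled frames.
    """
    n = len(frame_feats)
    idx = uniform_sample_indices(n, num_samples)
    global_feat = mean_vectors([frame_feats[i] for i in idx])
    fused = []
    for i in idx:
        # Clip the neighbourhood window to the video boundaries.
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        local_feat = mean_vectors(frame_feats[lo:hi])
        fused.append(local_feat + global_feat)  # list concatenation: 2*D values
    return fused


if __name__ == "__main__":
    # Toy video: 8 frames with 2-D features [frame index, 1.0].
    feats = [[float(t), 1.0] for t in range(8)]
    for row in fuse_adjacent_frames(feats, num_samples=4):
        print(row)
```

With 8 frames and 4 samples, the sampled indices are 1, 3, 5 and 7. Each fused vector holds the local neighbourhood average followed by the shared global mean. A learned module would typically replace the plain averages and concatenation with trainable layers such as temporal convolutions or attention.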