Multimodal and Multitask Approaches for Cataract Grading

Prakanshul Saxena, Gagan Raj Gupta, Madhur Bhattad

Published: 01 Jan 2024, Last Modified: 07 Oct 2024, Venue: COMAD/CODS 2024, License: CC BY-SA 4.0

Abstract: Cataracts are the principal cause of blindness and of moderate-to-severe visual impairment, both in developing countries and globally. This work presents an approach to multi-task cataract grading, covering Nuclear (NCG), Posterior (PCG), and Cortical Cataract Grading (CCG), that leverages multi-modal images of the eye captured under varying illumination and an attention-based neural network architecture. To the best of our knowledge, we are the first to incorporate multi-modal inputs to grade and label all types of cataracts with a single, smaller model. The training set comprises 226 patients and the test set 50 patients, labeled by expert doctors according to the LOCS-III grading conventions. We augment the dataset with rotations for both training and testing, because rotation does not alter the cataract grade. With the help of multi-modal inputs, we surpass the SOTA by achieving an exact accuracy of 97.83% for NCG. We also achieve new SOTA exact accuracies of 99.69% and 99.39% for CCG and PCG, respectively. The same approach is extended to accurate multi-label classification of other types of cataracts: Hyper-Mature Senile Cataract, Mature Senile Cataract, and Posterior Polar Cataract. The models are embedded in a mobile application and have been used successfully in clinical settings.
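The rotation augmentation and the per-task exact accuracy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which rotation angles were used, so 90-degree steps are an assumption here, and "exact accuracy" is assumed to mean the fraction of eyes whose predicted grade equals the expert grade exactly. The function names `rotate_augment`, `exact_accuracy`, and `per_task_exact_accuracy` are hypothetical.

```python
import numpy as np


def rotate_augment(image, ks=(0, 1, 2, 3)):
    """Return rotated copies of an H x W x C image.

    Each k is a number of 90-degree turns (an assumed choice of angles).
    The grade label is reused unchanged for every rotated copy.
    """
    return [np.rot90(image, k=k, axes=(0, 1)) for k in ks]


def exact_accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference grade exactly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def per_task_exact_accuracy(labels, preds):
    """Compute exact accuracy separately for each grading task.

    `labels` and `preds` map a task name (e.g. "NCG") to a sequence of grades.
    """
    return {task: exact_accuracy(labels[task], preds[task]) for task in labels}
```

A call such as `per_task_exact_accuracy({"NCG": [1, 2]}, {"NCG": [1, 3]})` would report one score per task, which mirrors how the abstract reports separate figures for NCG, CCG, and PCG.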