Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks

Neil Band; Tim G. J. Rudner; Qixuan Feng; Angelos Filos; Zachary Nado; Michael W Dusenberry; Ghassen Jerfel; Dustin Tran; Yarin Gal

Published: 11 Oct 2021, Last Modified: 04 May 2025

NeurIPS 2021 Datasets and Benchmarks Track (Round 2)

Readers: Everyone

Keywords: Bayesian Deep Learning, Bayesian Neural Networks, Variational Inference, Uncertainty Quantification

Abstract: Bayesian deep learning seeks to equip deep neural networks with the ability to precisely quantify their predictive uncertainty, and has promised to make deep learning more reliable for safety-critical real-world applications. Yet, existing Bayesian deep learning methods fall short of this promise; new methods continue to be evaluated on unrealistic test beds that do not reflect the complexities of downstream real-world tasks that would benefit most from reliable uncertainty quantification. We propose the RETINA Benchmark, a set of real-world tasks that accurately reflect such complexities and are designed to assess the reliability of predictive models in safety-critical scenarios. Specifically, we curate two publicly available datasets of high-resolution human retina images exhibiting varying degrees of diabetic retinopathy, a medical condition that can lead to blindness, and use them to design a suite of automated diagnosis tasks that require reliable predictive uncertainty quantification. We use these tasks to benchmark well-established and state-of-the-art Bayesian deep learning methods on task-specific evaluation metrics. We provide an easy-to-use codebase for fast and easy benchmarking following reproducibility and software design principles. We provide implementations of all methods included in the benchmark as well as results computed over 100 TPU days, 20 GPU days, 400 hyperparameter configurations, and evaluation on at least 6 random seeds each.

TL;DR: This paper presents an easy-to-use, expert-guided, open-source suite of diabetic retinopathy detection benchmarking tasks for Bayesian deep learning.

Supplementary Material: pdf

URL: https://github.com/google/uncertainty-baselines/tree/main/baselines/diabetic_retinopathy_detection

Contribution Process Agreement: Yes

Dataset Url: https://github.com/google/uncertainty-baselines/tree/main/baselines/diabetic_retinopathy_detection

License: Code: Apache License 2.0; EyePACS and APTOS Datasets: Public Access on Kaggle

Author Statement: Yes

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/benchmarking-bayesian-deep-learning-on/code)
