TL;DR: We created CLIMB, a large-scale multimodal medical dataset with 4.51 million patient samples, and showed that training AI models jointly across diverse clinical data types improves performance by up to 29% over single-task approaches.
Abstract: Recent advances in clinical AI have enabled remarkable progress across many clinical domains. However, existing benchmarks and models are primarily limited to a small set of modalities and tasks, which hinders the development of large-scale multimodal methods that can make holistic assessments of patient health and well-being. To bridge this gap, we introduce the Clinical Large-scale Integrative Multimodal Benchmark (CLIMB), a comprehensive clinical benchmark unifying diverse clinical data across imaging, language, temporal, and graph modalities. CLIMB comprises 4.51 million patient samples totaling 19.01 terabytes distributed across 2D imaging, 3D video, time series, graphs, and multimodal data. Through extensive empirical evaluation, we demonstrate that multitask pretraining significantly improves performance on understudied domains, achieving up to 29% improvement in ultrasound and 23% in ECG analysis over single-task learning. Pretraining on CLIMB also effectively improves models' generalization to new tasks, and strong unimodal encoder performance translates well to multimodal performance when paired with task-appropriate fusion strategies. Our findings provide a foundation for new architecture designs and pretraining strategies to advance clinical AI research. Code is released at https://github.com/DDVD233/climb.
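To make the multitask pretraining idea concrete, below is a minimal sketch of training one shared encoder with per-task heads over batches drawn from multiple clinical tasks. All names here (MultitaskModel, the toy encoder, the task labels, and training_step) are hypothetical illustrations assuming a PyTorch-style setup, not the actual CLIMB codebase; see the linked repository for the real implementation.

```python
# Minimal sketch of multitask pretraining: one shared backbone, one
# lightweight head per clinical task. All names and shapes here are
# hypothetical illustrations, not CLIMB's actual API.
import torch
import torch.nn as nn


class MultitaskModel(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int, num_classes: dict):
        super().__init__()
        self.encoder = encoder  # shared backbone updated by every task
        # one linear classification head per task
        self.heads = nn.ModuleDict({
            task: nn.Linear(feat_dim, n) for task, n in num_classes.items()
        })

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        return self.heads[task](self.encoder(x))


# Hypothetical setup: two imaging tasks sharing one toy encoder.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 256), nn.ReLU())
model = MultitaskModel(encoder, feat_dim=256,
                       num_classes={"ultrasound_view": 5, "xray_finding": 14})
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()


def training_step(batches: dict) -> float:
    """One step over a batch from every task (naive round-robin mixing).

    `batches` maps task name -> (inputs, integer labels).
    """
    optimizer.zero_grad()
    total = torch.tensor(0.0)
    for task, (x, y) in batches.items():
        total = total + loss_fn(model(x, task), y)
    total.backward()  # gradients from all tasks flow into the shared encoder
    optimizer.step()
    return total.item()
```

In practice one would use modality-specific encoders for images, signals, and graphs, and sample task batches in proportion to dataset size; the sketch only illustrates why gradients from every task jointly shape one shared representation.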
Lay Summary: Current AI systems designed to help doctors analyze medical data typically focus on just one type of information at a time, such as reading X-rays or processing patient notes, whereas doctors naturally combine many different types of medical data to make comprehensive diagnoses. This narrow focus limits AI's ability to provide the holistic patient assessments that clinicians need for effective healthcare decisions.
We created CLIMB, a massive medical dataset that brings together 4.51 million patient samples across diverse data types: medical images (X-rays, CT scans, ultrasounds), patient records, EEG/ECG signals, genetic information, and molecular data from 33 medical institutions. We then trained AI models on this comprehensive dataset to learn patterns across all these different medical data types simultaneously, rather than learning each type in isolation.
Our approach dramatically improved AI performance on challenging medical tasks, achieving up to 29% better accuracy in ultrasound analysis and a 23% improvement in ECG diagnosis compared to traditional single-task methods. This research demonstrates that AI systems trained on diverse medical data generalize better to new clinical challenges. It also provides a foundation for developing more comprehensive AI tools that could help doctors make informed, holistic patient care decisions across multiple medical specialties.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/DDVD233/CLIMB
Primary Area: Applications->Health / Medicine
Keywords: Dataset, Evaluation, Benchmark, Multimodal Learning, Healthcare
Submission Number: 2974