FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings

02 Jun 2022, 14:07 (modified: 14 Oct 2022, 15:58) NeurIPS 2022 Datasets and Benchmarks Readers: Everyone
Keywords: Federated Learning, Healthcare
Abstract: Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models without centralizing data. The cross-silo FL setting corresponds to the case of a few (2–50) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works have proposed representative datasets for cross-device FL, few realistic healthcare cross-silo FL datasets exist, thereby slowing algorithmic research in this critical application. In this work, we propose a novel cross-silo dataset suite focused on healthcare, FLamby (Federated Learning AMple Benchmark of Your cross-silo strategies), to bridge the gap between theory and practice of cross-silo FL. FLamby encompasses 7 healthcare datasets with natural splits, covering multiple tasks, modalities, and data volumes, each accompanied by baseline training code. As an illustration, we additionally benchmark standard FL algorithms on all datasets. Our flexible and modular suite allows researchers to easily download datasets, reproduce results, and re-use the different components for their research. FLamby is available at www.github.com/owkin/flamby.
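To make the cross-silo setting described in the abstract concrete, here is a minimal sketch of federated averaging (FedAvg), one commonly used FL algorithm, on synthetic data. It does not use FLamby's code or API: the function names (`local_sgd`, `fed_avg`, `make_clients`), the three client sizes, the linear-regression task, and all hyperparameters are assumptions chosen purely for illustration.

```python
import numpy as np


def local_sgd(w, X, y, lr=0.1, steps=5):
    """Run a few full-batch gradient steps on one client's least-squares loss."""
    w = w.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


def fed_avg(clients, dim, rounds=50, lr=0.1, steps=5):
    """Federated averaging: only model weights are shared, never raw data."""
    w = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        local_models = [local_sgd(w, X, y, lr, steps) for X, y in clients]
        # The server averages local models, weighted by client sample counts.
        w = np.average(local_models, axis=0, weights=sizes)
    return w


def make_clients(w_true, sizes, seed=0):
    """Build synthetic clients (hypothetical stand-ins for hospitals)."""
    rng = np.random.default_rng(seed)
    clients = []
    for i, n in enumerate(sizes):
        # Shift each client's feature mean to mimic heterogeneity across silos.
        X = rng.normal(loc=0.2 * i, scale=1.0, size=(n, len(w_true)))
        y = X @ w_true + 0.01 * rng.normal(size=n)
        clients.append((X, y))
    return clients


w_true = np.array([1.0, -2.0, 0.5])
clients = make_clients(w_true, sizes=[200, 500, 1000])
w_hat = fed_avg(clients, dim=3)
print("estimated weights:", w_hat)
```

In this toy setup every client shares the same underlying model, so the averaged weights should end up close to `w_true`. Real cross-silo data can differ much more between clients, and the sketch makes no attempt to model that.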
Supplementary Material: pdf
URL: https://github.com/owkin/FLamby
Dataset Url: https://github.com/owkin/FLamby
Dataset Embargo: None
License: The code of FLamby is provided under the MIT license. FLamby does not provide new datasets, but rather code to easily access existing datasets. Users are still required to follow the licenses of these datasets, which are listed below:
- Camelyon16: CC 0
- LIDC-IDRI: CC-BY 3 and TCIA data usage policy
- IXITiny: CC-BY SA 3
- TCGA-BRCA: GDC open access (https://gdc.cancer.gov/access-data/data-access-processes-and-tools)
- KITS2019: CC-BY NC SA 3
- ISIC2019: CC-BY 4
- Heart Disease: CC-BY 4
Author Statement: Yes
Contribution Process Agreement: Yes
In Person Attendance: Yes