DABS 2.0: Improved Datasets and Algorithms for Universal Self-SupervisionDownload PDF

Published: 17 Sept 2022, Last Modified: 23 May 2023NeurIPS 2022 Datasets and Benchmarks Readers: Everyone
Keywords: self-supervised learning, domain agnostic
TL;DR: We extend the DABS benchmark, presenting improved datasets and algorithms for universal self-supervision
Abstract: Universal self-supervised (SSL) algorithms hold enormous promise for making machine learning accessible to high-impact domains such as protein biology, manufacturing, and genomics. We present DABS 2.0: a set of improved datasets and algorithms for advancing research on universal SSL. We extend the recently-introduced DABS benchmark with the addition of five real-world science and engineering domains: protein biology, bacterial genomics, multispectral satellite imagery, semiconductor wafers, and particle physics, bringing the total number of domains in the benchmark to twelve. We also propose a new universal SSL algorithm, Capri, and a generalized version of masked autoencoding, and apply both on all twelve domains---the most wide-ranging exploration of SSL yet. We find that multiple algorithms show gains across domains, outperforming previous baselines. In addition, we demonstrate the usefulness of DABS for scientific study of SSL by investigating the optimal corruption rate for each algorithm, showing that the best setting varies based on the domain. Code will be released at http://github.com/alextamkin/dabs}{http://github.com/alextamkin/dabs
Author Statement: Yes
URL: dabs.stanford.edu
Supplementary Material: pdf
Contribution Process Agreement: Yes
In Person Attendance: Yes
License: MIT License
13 Replies