Bias in the Benchmark: Systematic experimental errors in bioactivity databases confound multi-task and meta-learning algorithms
TL;DR: Systematic experimental errors in bioactivity databases confound multi-task and meta-learning algorithms
Abstract: There is considerable interest in employing deep learning algorithms to predict pharmaceutically relevant properties of small molecules. To overcome the issues inherent in the low-data regime typical of this domain, researchers are increasingly exploring multi-task and meta-learning algorithms that leverage sets of related biochemical and toxicological assays to learn robust and generalisable representations. However, we show that the data from which commonly used multi-task benchmarks are derived often exhibit systematic experimental errors that lead to confounding statistical dependencies across tasks. Representation learning models that aim to acquire an inductive bias in this domain risk compounding these errors and may overfit to patterns that are counterproductive to many downstream applications of interest. We investigate the extent to which these issues are reflected in the molecular embeddings learned by multi-task graph neural networks and discuss methods to address this pathology.
Track: Attention Track