A reproducibility study of "Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical Space"

31 Jan 2021 (modified: 05 May 2023) · ML Reproducibility Challenge 2020 Blind Submission
Abstract:

Scope of Reproducibility: Nigam et al. report a genetic algorithm (GA) that operates on the SELFIES representation and propose an adaptive, neural-network-based penalty intended to improve the diversity of the generated molecules. The main claims of the paper are that this GA outperforms other generative techniques (as measured by the penalized logP score) and that the neural-network-based adaptive penalty increases the diversity of the generated molecules.

Methodology: We reused the code published by the authors after minor refactoring and re-ran the key experiments on a typical workstation (two 16-core Intel Xeon Gold 5218 CPUs, one Quadro RTX 6000 GPU) within two weeks, using more recent versions of the dependencies. In particular, we used a new major version of the SELFIES library. We also quantified the diversity of the generated molecules and the effect of different hyperparameters. All of our experiments were tracked on the Weights and Biases platform.

Results: Overall, we were able to reproduce comparable results using the SELFIES-based GA, but mostly by exploiting deficiencies of the (easily optimizable) penalized logP fitness function, i.e., by generating long, sulfur-containing chains. We also reproduce the result that the discriminator can be used to bias the generation toward molecules that are similar to a reference set. Moreover, we propose a new similarity-based adaptive penalty that outperforms the original algorithm on the penalized logP score. Our ablation studies show that the complexity of the multilayer-network discriminator used in the original work is not crucial for reproducing the dependence of the penalized logP on the penalty weight. Importantly, we emphasize the need for a more representative comparison between algorithms: analyzing the performance of the original algorithm on the GuacaMol benchmark suite, we find that it tends to show low intra-generation diversity.

What was easy: Reproducing all the key results (including the plots) was easy, since the authors provided code with pre-defined settings and useful comments for every relevant experiment; no implementation from scratch was required.

What was difficult: Without the provided code, reproducing some parts of the paper would have been significantly more time-consuming, as the paper alone does not give the complete settings required to reproduce the data. The original article also gives no indication of how the hyperparameters (e.g., the architecture of the discriminator model, the weighting of the different parts of the fitness function, or the choice of discriminator loss) were optimized.

Communication with original authors: We contacted the authors to clarify some questions about discrepancies with the baseline experiment and provided them with a draft of this report, to which they reacted appreciatively.
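To make the two central ingredients of the study concrete: the GA operates on SELFIES strings because every syntactically valid SELFIES decodes to a valid molecule, which makes random mutations safe. A minimal round-trip sketch using the selfies library (illustrative only, not the authors' code; the exact token output varies between the major versions mentioned above):

```python
import selfies as sf

smiles = "c1ccccc1"        # benzene, as a SMILES string
s = sf.encoder(smiles)     # SMILES -> SELFIES token string, e.g. "[C][=C]..."
roundtrip = sf.decoder(s)  # any valid SELFIES decodes to a valid molecule,
                           # which is what makes random GA mutations safe
print(s, "->", roundtrip)
```

The fitness function being optimized is penalized logP. In its standard form (which we assume here; some implementations additionally z-score each term against ZINC statistics), it is the octanol-water partition coefficient minus a synthetic-accessibility score and a penalty for rings larger than six atoms. The sketch below, using RDKit and its contrib sascorer module, shows why the score is easy to exploit: logP is a sum of atomic contributions and so grows roughly linearly with the length of a hydrophobic chain, which is why long sulfur chains score arbitrarily high.

```python
import os, sys
from rdkit import Chem, RDConfig
from rdkit.Chem import Descriptors

# sascorer ships in RDKit's contrib directory, not the core package
sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
import sascorer

def penalized_logp(smiles: str) -> float:
    """Unstandardized penalized logP: logP - SA score - large-ring penalty."""
    mol = Chem.MolFromSmiles(smiles)
    log_p = Descriptors.MolLogP(mol)   # hydrophobicity; rewards long carbon/sulfur chains
    sa = sascorer.calculateScore(mol)  # synthetic accessibility, 1 (easy) to 10 (hard)
    ring_sizes = [len(ring) for ring in mol.GetRingInfo().AtomRings()]
    ring_penalty = max(max(ring_sizes, default=0) - 6, 0)  # only rings with >6 atoms count
    return log_p - sa - ring_penalty

print(penalized_logp("S" * 20))  # a bare 20-atom sulfur chain already scores very high
```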
Paper URL: https://openreview.net/forum?id=H1lmyRNFvr