HAPNEST: An efficient tool for generating large-scale genetics datasets from limited training data

03 Oct 2022 (modified: 05 May 2023), NeurIPS 2022 SyntheticData4ML, Readers: Everyone
Keywords: synthetic data, ML for healthcare, computational genetics, approximate Bayesian computation
TL;DR: HAPNEST is a new, efficient software tool for generating and evaluating large synthetic datasets for human genetics applications
Abstract: In this extended abstract, we present HAPNEST, a new, highly efficient software tool that enables machine learning practitioners to easily generate and evaluate large synthetic datasets for human genetics applications. HAPNEST generates diverse synthetic datasets from small, publicly accessible reference datasets. We demonstrate the suitability of HAPNEST-generated data for supervised tasks such as genetic risk scoring.
