A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data
Abstract (author summary): An important problem in genomics is how to detect the genetic signatures associated with disease. When a disease is caused by a single well-defined biological mechanism, the genetic signature often involves a handful of genes present across the majority of the diseased patients. On the other hand, when a disease is complicated or poorly understood, there may be many possible biological mechanisms at play. Any genetic signatures associated with such a disease may involve multiple genes, and each signature might only manifest across a subset of the diseased patients. Finding the signatures responsible for this heterogeneity requires searching through subsets of genes and subsets of patients, a problem referred to as ‘biclustering’. In this paper we present a new biclustering method that scales efficiently to large genomic data sets, such as GWAS data. Our method is accurate, outperforming current ‘spectral’ biclustering methods for many problems of interest. Perhaps most importantly, our method can be corrected for many features of experimental design, such as controls, covariates and sparsity, all of which are especially important when analyzing real data sets.