Abstract: When training predictive models on high-dimensional genomic data, data from multiple studies are often combined to increase sample size and improve generalizability. A drawback of this approach is that the set of features measured may differ across studies because of variations in expression measurement platform or technology. Common practice is to work only with the intersection of features measured in all studies, which blindly discards potentially useful feature information measured in individual studies or in subsets of studies.