Association between expression of random gene sets and survival is evident in multiple cancer types and may be explained by sub-classification

Yishai Shimoni

Published: 2018, Last Modified: 12 May 2023, PLoS Comput. Biol. 2018, Readers: Everyone

Abstract (Author summary): Multiple gene sets have been published as predictive of cancer progression and metastasis in several cancer types. Although many of these sets proved to be highly predictive of survival, even gene sets for the same cancer (but from different datasets or different analyses) exhibit very little overlap, and to date they have not yielded functional therapeutic targets. Recent studies found that in breast cancer, even random gene sets can predict survival much better than would be expected by chance, and on average are better than many published gene sets. Together, these results undermine the causal role of the published gene sets and their potential clinical implications. We show that random gene sets predict survival in many cancer types, and that this property no longer exists after splitting the data into subclasses based on data-driven clusters. This suggests that such sub-classification could increase the likelihood of identifying causal genes that are potential therapeutic targets, and that the predictive power of random gene sets can itself serve as an indication that a dataset may contain subclasses.
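The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's pipeline; it is a minimal toy under stated assumptions: synthetic expression data with two hidden subclasses, uncensored exponential survival times, the first principal component as a gene-set score, a permutation test on the correlation between score and survival, and the sign of the first principal component of the full matrix as a stand-in for data-driven clustering. All function names, sample sizes, and effect sizes are hypothetical.

```python
import numpy as np

def pc1_score(expr):
    """Score of each sample (row) on the first principal component."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def fraction_associated(expr, surv, rng, n_sets=200, set_size=20,
                        n_perm=200, alpha=0.05):
    """Fraction of random gene sets whose PC1 score correlates with survival.

    Assumption: survival times are uncensored, so a plain correlation with a
    permutation null is used instead of a Cox model or log-rank test.
    """
    n = len(surv)
    z_surv = (surv - surv.mean()) / surv.std()
    # Null distribution: shuffle survival times across samples.
    perms = np.array([rng.permutation(z_surv) for _ in range(n_perm)])
    hits = 0
    for _ in range(n_sets):
        genes = rng.choice(expr.shape[1], size=set_size, replace=False)
        score = pc1_score(expr[:, genes])
        z = (score - score.mean()) / score.std()
        observed = abs(z @ z_surv) / n          # |Pearson correlation|
        null = np.abs(perms @ z) / n
        p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
        hits += p_value < alpha
    return hits / n_sets

rng = np.random.default_rng(0)
n_samples, n_genes = 200, 500

# Hypothetical data: two subclasses that differ in 150 genes and in survival.
subclass = np.repeat([0, 1], n_samples // 2)
expr = rng.normal(size=(n_samples, n_genes))
expr[:, :150] += 1.5 * subclass[:, None]
surv = rng.exponential(np.where(subclass == 1, 20.0, 60.0))

# Pooled data: random gene sets pick up the subclass signal.
pooled = fraction_associated(expr, surv, rng)

# Stand-in for data-driven clustering: split on the sign of the global PC1.
clusters = pc1_score(expr) > 0
within = np.mean([
    fraction_associated(expr[clusters == c], surv[clusters == c], rng)
    for c in (False, True)
])

print(f"pooled: {pooled:.2f}, within clusters: {within:.2f}")
```

In this toy setting most random gene sets appear associated with survival in the pooled data, because nearly any set contains a few subclass-related genes, while within each cluster the fraction falls to roughly the nominal false-positive rate. A real analysis would need to handle censoring and use an actual clustering method.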
