Abstract: Dimensionality-reduction-based visualization is essential for interpreting complex biological data. Yet unsupervised methods such as $t$-SNE, UMAP, and Isomap reflect only the dominant structure in the data, which may not align with the goals of downstream analysis or with expert-provided annotations. Existing supervised variants only partially address this mismatch and introduce new limitations. Here we present RF-PHATE, a supervised visualization approach that incorporates expert knowledge to reveal label-relevant structure while suppressing extraneous variation. RF-PHATE uses random forests to learn relationships between features and labels and translates this information into low-dimensional embeddings. It handles large datasets and is suited to both classification and regression. We demonstrate its use in four case studies: longitudinal multiple-sclerosis data, Raman spectral measurements of antioxidant effects, COVID-19 patient outcomes, and RNA-sequencing data with simulated dropout. These applications highlight RF-PHATE’s ability to enhance interpretability, manage noise, and expose meaningful biological structure, suggesting broad potential for improving data exploration and discovery.