Abstract: Advances in sequencing technology have expanded the availability of data capturing diverse phenotypic traits and biological perturbations. However, this increased resolution also raises complexity, as studies now examine multiple dimensions, including donor phenotypes, anatomical regions, cell types, and time points. Integrating datasets across studies promises insights into health and disease beyond the scope of individual experiments, but doing so requires methods that separate technical artifacts from meaningful biological signals while providing interpretable insights into condition-related genetic factors. Existing approaches tend to focus on either integration or interpretability, rarely addressing both at once. To overcome these challenges, we introduce ALPINE, a joint supervised-unsupervised non-negative matrix factorization framework that disentangles technical and biological variation while directly identifying condition-associated genes.