Abstract: Embedding techniques have become valuable strategies for extracting crucial information from high-dimensional data and transforming it into more interpretable lower-dimensional spaces. In biology, embeddings are frequently used to capture a variety of functional relationships between genes, encoding each gene in a compact latent space. Genes, however, do not function in isolation but in coordinated gene sets, where groups of proteins form complexes, function in pathways, or, more simply, have a localized set of possible interactions. Gene embeddings have mostly been used for downstream machine learning tasks or, at best, for comparisons between pairs of genes, and there has been limited methodological development towards comparing gene sets in embedding spaces. Here, we propose a new method, ANDES, that compares how two gene sets are related in gene embedding spaces. ANDES uses a novel best-match approach that considers gene similarity while reconciling gene set diversity. ANDES is a flexible framework with wide-ranging potential, especially when combined with different types of embeddings.
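To make the best-match idea concrete, the following is a minimal sketch of one way two gene sets could be compared in an embedding space. It is an illustration under stated assumptions, not the ANDES implementation: the use of cosine similarity, the symmetric averaging of the two directions, the function names `cosine` and `best_match_score`, and the toy two-dimensional embeddings are all hypothetical choices made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def best_match_score(set_a, set_b):
    """Average best-match similarity between two sets of gene embeddings.

    Each gene in one set is paired with its most similar gene in the
    other set; the best-match similarities are averaged per direction,
    and the two directions are then averaged (an assumed symmetrization).
    """
    # Pairwise similarity matrix: rows are genes in A, columns genes in B.
    sim = [[cosine(u, v) for v in set_b] for u in set_a]
    # Best match in B for every gene in A.
    a_to_b = sum(max(row) for row in sim) / len(set_a)
    # Best match in A for every gene in B (column-wise maxima).
    b_to_a = sum(max(col) for col in zip(*sim)) / len(set_b)
    return (a_to_b + b_to_a) / 2

# Hypothetical toy embeddings; real gene embeddings are much higher-dimensional.
toy = {
    "g1": [1.0, 0.0],
    "g2": [0.0, 1.0],
    "g3": [1.0, 1.0],
    "g4": [-1.0, 0.0],
}
set_a = [toy["g1"], toy["g2"]]
set_b = [toy["g3"], toy["g4"]]

score = best_match_score(set_a, set_b)
self_score = best_match_score(set_a, set_a)
print(f"A vs B: {score:.4f}, A vs A: {self_score:.4f}")
```

Because each gene only needs one good partner in the other set, a score of this kind does not penalize a set for being internally diverse, which is the intuition behind reconciling gene similarity with gene set diversity. A raw average like this one makes no correction for set size or for the background similarity structure of the embedding space.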