Ancestry Inference with GNNs on IBD Graphs for Genetically Similar Populations

Published: 02 Mar 2026, Last Modified: 17 Apr 2026 | MLGenX 2026 Poster | CC BY 4.0
Abstract: Graph Neural Networks (GNNs) have recently proven effective at analyzing structured graph data across diverse domains. At the same time, accurate inference of ancestry from genetic data, especially among genetically similar populations, remains challenging because of the inherent complexity of genetic relationships and the high dimensionality of SNP data. To address these challenges, we propose a novel GNN-based method for inferring an individual's ancestry from a graph that represents the genetic relatedness between individuals. Genetic relatedness between two individuals is measured by their shared identity-by-descent (IBD) segments, which are segments of the genome inherited from a close common ancestor. In this setting, ancestry inference is formalized as node classification on a graph. We present three key contributions. First, we advance population genetics methodology with a GNN-based framework for ancestry inference among closely related populations. Second, we present a novel GNN architecture that improves training stability and predictive performance for ancestry inference on IBD graphs. Third, we demonstrate that augmenting the dataset with unlabeled vertices (individuals with unknown ancestry) significantly improves prediction scores, because message passing in GNNs propagates ancestry-related information throughout the network.
Track: Main track
AI Policy Confirmation: I confirm that this submission clearly discloses the role of AI systems and human contributors and complies with the ICLR 2026 Policies on Large Language Model Usage and the ICLR Code of Ethics.
Submission Number: 103
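
The abstract frames ancestry inference as node classification on an IBD graph and attributes the benefit of unlabeled vertices to message passing. The sketch below illustrates that idea only. It is not the paper's GNN architecture: it swaps in plain weighted label propagation, a non-learned form of message passing, and every name and number in it (`propagate_labels`, the node names, the edge weights standing in for shared IBD length) is a hypothetical chosen for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, known, classes, rounds=10):
    """Weighted label propagation on an undirected graph.

    edges:   iterable of (u, v, weight); weight stands in for shared IBD length
    known:   dict mapping labeled nodes to their ancestry class
    classes: list of all ancestry classes
    Returns a dict mapping every node to its predicted class.
    """
    nbrs = defaultdict(list)
    for u, v, w in edges:
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))
    nodes = sorted(nbrs)

    def initial_belief(n):
        # Labeled nodes start one-hot; unlabeled nodes start uniform.
        if n in known:
            return {c: 1.0 if c == known[n] else 0.0 for c in classes}
        return {c: 1.0 / len(classes) for c in classes}

    belief = {n: initial_belief(n) for n in nodes}
    for _ in range(rounds):
        updated = {}
        for n in nodes:
            if n in known:
                updated[n] = belief[n]  # keep known labels fixed
                continue
            total = sum(w for _, w in nbrs[n])
            # Message passing step: weighted mean of the neighbors' beliefs.
            updated[n] = {
                c: sum(w * belief[m][c] for m, w in nbrs[n]) / total
                for c in classes
            }
        belief = updated
    return {n: max(classes, key=lambda c: belief[n][c]) for n in nodes}


# Hypothetical toy graph: a1, a2 belong to population A; b1, b2 to B.
# x and y have unknown ancestry. y touches the labeled set only via b2,
# but shares a long IBD total with the unlabeled individual x.
edges = [
    ("a1", "a2", 40), ("b1", "b2", 35),
    ("x", "a1", 25), ("x", "a2", 20), ("x", "b1", 5),
    ("y", "x", 30), ("y", "b2", 8),
]
known = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
classes = ["A", "B"]

with_x = propagate_labels(edges, known, classes)
# Same graph with the unlabeled vertex x removed.
without_x = propagate_labels(
    [e for e in edges if "x" not in e[:2]], known, classes
)
```

In this toy graph, `with_x["y"]` is `"A"` while `without_x["y"]` is `"B"`: the unlabeled vertex `x` relays information from the A-labeled individuals to `y`, which is the mechanism the abstract's third contribution refers to. A learned GNN would replace the fixed weighted mean with trainable aggregation layers.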