A Topologically Guided Machine Learning Framework for Enhanced Fine-Mapping in Whole-Genome Bacterial Studies

Published: 05 Mar 2025, Last Modified: 24 Apr 2025
Venue: MLGenX 2025
License: CC BY 4.0
Track: Main track (up to 8 pages)
Abstract:

This paper proposes a feature selection framework for machine learning–based bacterial genome-wide association studies aimed at uncovering resistance-causing traits. Using a well-characterized Staphylococcus aureus pangenome as ground truth for causal-variant labels, we demonstrate improved control for population structure and enhanced interpretability. Both are achieved by explicitly incorporating genomic context derived from graph-structured data, namely the compacted de Bruijn graph of an assembled pangenome. Our framework successfully uncovers resistance-causing traits for 9 of 14 antibiotics using a significantly reduced feature set, while preserving genomic marker identifiability via unique mappings between the encoded feature space and sequential representations that tag specific genomic loci.

Submission Number: 72