Efficient data reduction for large-scale genetic mapping

Veronika Strnadová-Neeley, Aydin Buluç, Jarrod Chapman, John R. Gilbert, Joseph Gonzalez, Leonid Oliker

Published: 2015, Last Modified: 12 May 2023, BCB 2015

Abstract: We present a fast and accurate algorithm for reducing large-scale genetic marker data to a smaller, less noisy, and more complete set of bins, each representing a uniquely identifiable location on a chromosome. Our experimental results on real and synthetic data show that the algorithm runs in near-linear time, allowing for the analysis of millions of markers. The algorithm reduces the problem scale while preserving accuracy, making it feasible to use existing genetic mapping tools without resorting to complex, time-intensive preprocessing methods to filter or sample the original data set. Our approach also decreases the uncertainty in genotype calls, improving the quality of the data. Preliminary results demonstrate that existing marker-ordering methods designed for small-scale settings perform with equivalent accuracy when given our reduced bin set as input.
