Track: Main track (up to 8 pages)
Abstract: The ability to quickly and accurately identify the microbial species in a sample, known as metagenomic profiling, is critical across many fields, from healthcare to environmental science. This paper introduces a novel method that profiles the signals coming from sequencing devices in parallel with determining their nucleotide sequences (a process known as basecalling), using a multi-task deep neural network that performs basecalling and multi-class genome classification simultaneously. We introduce a new multi-objective loss strategy in which the basecalling and classification losses are back-propagated separately, with model weights combined for the shared layers. We also introduce a pre-configured ranking strategy that reports top-$\textit{K}$ species accuracy, giving users the flexibility to choose between higher accuracy and lower latency when identifying the species. We achieve state-of-the-art basecalling accuracy, while our multi-class classification accuracy meets or exceeds that of state-of-the-art binary classifiers, averaging 92.5\%/98.9\% accuracy at identifying the top-1/top-3 species among a total of 17 genomes in the Wick bacterial dataset. This work has implications for future studies in metagenomic profiling because it accelerates the bottleneck step of matching a DNA sequence to the correct genome.
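To make the top-$\textit{K}$ metric concrete, here is a minimal sketch of how top-K species accuracy can be computed. It assumes the classifier emits one score per candidate genome for each read; the function names `top_k_species` and `top_k_accuracy` and the toy scores are hypothetical and are not taken from the paper's implementation.

```python
from typing import List, Sequence


def top_k_species(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring genome classes, best first."""
    # Sort class indices by descending score; Python's sort is stable,
    # so ties keep the lower class index first.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order[:k]


def top_k_accuracy(all_scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of reads whose true genome appears among the top-k predictions."""
    if len(all_scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one read")
    hits = sum(1 for s, y in zip(all_scores, labels) if y in top_k_species(s, k))
    return hits / len(labels)


# Toy example (hypothetical data): 3 reads scored against 4 candidate genomes.
scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.2, 0.2, 0.1, 0.5],
]
labels = [1, 2, 0]
print(top_k_accuracy(scores, labels, k=1))  # 0.333...
print(top_k_accuracy(scores, labels, k=3))  # 1.0
```

Raising K admits more candidate genomes per read, so accuracy can only stay the same or increase. That is the accuracy/latency trade-off the ranking strategy exposes: a small K gives a decisive answer sooner, while a larger K tolerates uncertainty among closely related species.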
Submission Number: 3