Semi-Supervised and Incremental Sequence Analysis for Taxonomic Classification

Published: 01 Jan 2023, Last Modified: 19 Feb 2025, SSCI 2023, CC BY-SA 4.0
Abstract: Metagenomic analysis is vital in determining what organisms are present in a microbial sample and why they are present. In this study, we explore the utility of MMseqs2, a bioinformatics pipeline, for taxonomic classification in metagenomics, focusing on 16S rRNA gene sequences. We evaluate the algorithm's performance on the full dataset as well as in batch-by-batch incremental processing and, more importantly, we add semi-supervised classification to this otherwise clustering-only algorithm. Incremental updating is important because it allows seamless integration and processing of new data, whereas semi-supervised classification allows taxonomic identification of previously unknown organisms. We also evaluate the different clustering modes offered by MMseqs2 and compare MMseqs2 to our previously developed semi-supervised incremental algorithm, SSI-VSEARCH. We show that MMseqs2's built-in clusterupdate function works well and that our semi-supervised classification capability adds new functionality to this bioinformatics processing pipeline.