A biopsy/non-biopsy approach to voice disorder classification using deep learning

Frank Conway, Ross Perry, Gaetano Di Caterina, Wendy Cohen, David M. Wynne

Published: 13 Jan 2026, Last Modified: 27 Feb 2026
Venue: 2025 IEEE Symposium on Computers and Communications (ISCC)
License: CC BY-SA 4.0
Abstract: Systems to detect vocal pathologies have gained increasing attention due to advances in machine learning and the positive impact such systems could have on the healthcare industry. However, many existing methods share the challenge of small sample sizes within voice pathology datasets. Many of them group all pathological samples together and compare them to healthy ones in a binary “has voice pathology/healthy” approach, which is of limited use in real-world applications, i.e. clinical settings. This research proposes a novel, practical method of grouping voice pathologies for feature learning, which showed promising results on the Saarbrucken Voice Database (SVD) and a local Recurrent Respiratory Papillomatosis (RRP) dataset. Mel-frequency coefficients were used with various RNN architectures for feature learning. These models were compared using a multi-stage approach. The first stage, classifying all classes available in the SVD, predictably produced the worst results, likely because features are hard to distinguish when samples are few and classes are many. The second stage investigated the impact of grouping the SVD into Functional, Structural or Neurological classes, which increased the F1-Score to 41.04%. In the last stage, each voice pathology was grouped according to whether or not the clinician would require a biopsy, which increased the F1-Score to 69.81% on the SVD and 64.25% on a local RRP dataset. Although this novel approach shows promising results, further research using more sophisticated deep learning models is needed to confirm its reliability.
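The multi-stage grouping described in the abstract can be pictured as relabeling the same samples with progressively coarser class sets and scoring each stage with an F1 measure. The following is a minimal sketch under stated assumptions, not the authors' implementation: the pathology names (`pathology_a` and so on) and both lookup tables are hypothetical placeholders with no clinical meaning, and macro-averaged F1 is assumed because the abstract does not state which averaging was used.

```python
# Hypothetical lookup tables: placeholder names only, NOT a clinical mapping
# and NOT the grouping used in the paper.
STAGE2_GROUP = {
    "pathology_a": "functional",
    "pathology_b": "structural",
    "pathology_c": "structural",
    "pathology_d": "neurological",
}
STAGE3_GROUP = {
    "pathology_a": "non_biopsy",
    "pathology_b": "biopsy",
    "pathology_c": "non_biopsy",
    "pathology_d": "non_biopsy",
}


def regroup(labels, mapping):
    """Map fine-grained pathology labels onto a coarser class set."""
    return [mapping[label] for label in labels]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    true_fine = ["pathology_a", "pathology_b", "pathology_c", "pathology_d"]
    pred_fine = ["pathology_c", "pathology_b", "pathology_a", "pathology_d"]
    for name, mapping in [("stage 2", STAGE2_GROUP), ("stage 3", STAGE3_GROUP)]:
        score = macro_f1(regroup(true_fine, mapping), regroup(pred_fine, mapping))
        print(name, round(score, 4))
```

In this toy example, confusions between fine-grained classes that fall into the same coarse group stop counting as errors after regrouping, which is one plausible reason scores rise as the class set shrinks.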