DNA sequence classification based on MLP with PILAE algorithm

Mohammed A. B. Mahmoud, Ping Guo

2021 (modified: 02 Nov 2022) | Soft Comput. 2021 | Readers: Everyone

Abstract: In bioinformatics, classifying unknown biological sequences is a key task, fundamental to simplifying the consistency, aggregation, and survey of organisms and their evolution. Biological sequences can be viewed as high-dimensional data whose dimension is not fixed but depends on the length of the sequence. Numerical encoding, for example one-hot encoding (OHE), plays an important role in evaluating DNA sequences computationally. However, OHE has two drawbacks: 1) it adds no information that could serve as an additional predictive variable, and 2) if the variable has many classes, OHE increases the feature space significantly. To overcome these drawbacks, this paper presents a computationally efficient framework for classifying the DNA sequences of living organisms in the image domain. The proposed strategy relies on a multilayer perceptron (MLP) trained with a pseudoinverse learning autoencoder (PILAE) algorithm. PILAE training does not require setting learning control parameters or specifying the number of hidden layers. Therefore, the PILAE classifier can achieve better performance than other deep neural network (DNN) approaches such as the VGG-16 and Xception models. Experimental results demonstrate that the proposed strategy achieves high prediction accuracy and, to a significant degree, high computational efficiency across different datasets.
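To make the second OHE drawback concrete, the following is a minimal sketch of one-hot encoding a short DNA string. It is an illustration only, not the encoding pipeline used in the paper: the function name `one_hot_encode`, the column order A, C, G, T, and the choice to map unknown bases to an all-zero row are assumptions made here.

```python
# Illustrative sketch only; names and conventions are assumptions,
# not taken from the paper.
ALPHABET = "ACGT"  # assumed column order


def one_hot_encode(sequence, alphabet=ALPHABET):
    """Return one row per base, with a single 1 in that base's column."""
    index = {base: i for i, base in enumerate(alphabet)}
    rows = []
    for base in sequence.upper():
        row = [0] * len(alphabet)
        # Bases outside the alphabet (e.g. the ambiguity code N)
        # are left as all-zero rows in this sketch.
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


encoded = one_hot_encode("ACGTN")
# The flattened feature count is sequence length times alphabet size.
feature_count = len(encoded) * len(ALPHABET)
```

With this scheme the number of features grows linearly with sequence length and with the number of classes, so sequences of different lengths yield inputs of different sizes. If the encoded unit were a k-mer rather than a single base, the number of classes would be 4^k, which is where the feature space grows most quickly.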
