Abstract: Identifying DNA N4-methylcytosine (4mC) sites is of great significance in biological research on chromatin structure, DNA stability, DNA–protein interaction, and the control of gene expression. However, identifying 4mC sites with traditional sequencing technology is very time-consuming. To detect 4mC sites more effectively, we develop a multiview learning method that merges multiple feature spaces. Furthermore, we investigate whether multiview learning can improve cross-species classification by fusing data from multiple species. In this study, we propose a multiview Laplacian kernel sparse representation-based classifier, called MvLapKSRC-HSIC. First, we use three feature extraction methods (position-specific trinucleotide propensity, nucleotide chemical property, and DNA physicochemical properties) to extract features from DNA sequences. MvLapKSRC-HSIC then applies a kernel sparse representation-based classifier with graph regularization. To maintain independence between the views, we add a multiview regularization term constructed from the Hilbert–Schmidt independence criterion (HSIC). In the experiments, MvLapKSRC-HSIC is applied to six datasets and compared with other popular methods in single-species and cross-species settings. All experimental results show that MvLapKSRC-HSIC outperforms the other compared methods in both single-species and cross-species settings. Importantly, MvLapKSRC-HSIC identifies a series of potential DNA 4mC sites in multiple species that have not yet been experimentally evaluated and merit further research.
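The abstract does not spell out the HSIC regularization term, so the following is only a minimal sketch of the standard biased empirical HSIC estimator, HSIC(K, L) = tr(KHLH) / (n - 1)^2 with H = I - (1/n)11^T, computed between the kernel matrices of two feature views of the same samples. The Gaussian kernel, the `gamma` value, the helper names (`gaussian_kernel`, `center`, `hsic`), and the toy data are illustrative assumptions, not the paper's settings. One plausible use in a multiview objective is to sum such terms over pairs of views and penalize them, discouraging redundancy between views.

```python
import math
import random

def gaussian_kernel(rows, gamma=1.0):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for one view.
    The Gaussian kernel and gamma=1.0 are assumptions for illustration."""
    n = len(rows)
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
         for j in range(n)]
        for i in range(n)
    ]

def center(K):
    """Return H K H with H = I - (1/n) 1 1^T (double centering)."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [K[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]

def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(K)
    Kc, Lc = center(K), center(L)
    # Since H is idempotent, tr(K H L H) == tr(Kc @ Lc), written here as a double sum.
    trace = sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2

# Hypothetical toy views: view_b is a function of view_a, view_c is drawn independently.
random.seed(0)
view_a = [[random.random() for _ in range(3)] for _ in range(20)]
view_b = [[x[0] * 2.0, x[1]] for x in view_a]
view_c = [[random.random() for _ in range(3)] for _ in range(20)]
K_a, K_b, K_c = (gaussian_kernel(v) for v in (view_a, view_b, view_c))
print(hsic(K_a, K_b), hsic(K_a, K_c))
```

Because view_b is derived from view_a, the first printed value should generally be larger than the second; a regularizer that keeps pairwise HSIC small pushes the views toward carrying complementary rather than duplicate information.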