Spike2Signal: Classifying Coronavirus Spike Sequences with Deep Learning

Sarwan Ali, Taslim Murad, Prakash Chourasia, Murray Patterson

2022 (modified: 16 Apr 2023), BigDataService 2022

Abstract: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes COVID-19 in humans, a disease that has reached the scale of a global pandemic. Changes in the composition of the viral genome, in the form of mutations, can alter the virus's ability to infect host cells. These modified forms of the virus are known as variants. The spike region of the SARS-CoV-2 genome encodes the spike protein, whose crown-like structure gives the coronavirus its name. In SARS-CoV-2, mutations have been observed to occur disproportionately often in the spike region, which makes this region important for distinguishing between variants. Since the amino acids of the spike protein sequence are not in numerical form, machine learning algorithms cannot use them directly. We therefore use various embedding techniques to make such spike sequence data amenable to machine learning approaches, and research on better ways to study these variants through classification is ongoing. This paper presents a transformation for spike sequences, called Spike2Signal, that enables deep learning algorithms to classify different variants of SARS-CoV-2. Spike2Signal converts spike sequences into a signal-like representation so that they can be classified by state-of-the-art time-series classifiers. We further transform this Spike2Signal representation into an image (Spike2Image) so that state-of-the-art image classifiers can be applied, and compare these results with those obtained purely with Spike2Signal. In a wider comparison with existing feature-engineering-based methods, we show that classifiers using the Spike2Signal representation outperform all baselines in predictive power.
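The abstract does not state how residues are turned into numbers or how the signal becomes an image. Purely as an illustration of the general idea, the sketch below assumes a hypothetical alphabetical integer code per amino acid, accumulates those codes into a 1-D signal with a cumulative sum, and converts the signal into a 2-D image with the Gramian Angular Summation Field (GASF), a common time-series-to-image transform. The encoding table, the cumulative sum, the use of GASF, and the function names `sequence_to_signal` and `signal_to_gasf` are all assumptions for illustration; none of them is claimed to be the authors' Spike2Signal or Spike2Image procedure.

```python
import numpy as np

# Hypothetical encoding (an assumption, not the paper's): the 20 standard
# amino acids plus "X" (unknown residue), numbered 1..21 in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWXY"
AA_TO_VALUE = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def sequence_to_signal(seq: str) -> np.ndarray:
    """Hypothetical sequence-to-signal step: code each residue, then accumulate."""
    # Characters outside the table (e.g. gaps "-" or stops "*") are coded as 0
    values = np.array([AA_TO_VALUE.get(aa, 0) for aa in seq.upper()], dtype=float)
    # The cumulative sum turns per-residue codes into a walk-like 1-D signal
    return np.cumsum(values)


def signal_to_gasf(signal: np.ndarray) -> np.ndarray:
    """Gramian Angular Summation Field: maps a length-n signal to an n x n image."""
    lo, hi = signal.min(), signal.max()
    # Rescale to [-1, 1]; a constant signal is mapped to all zeros
    scaled = np.zeros_like(signal) if hi == lo else 2 * (signal - lo) / (hi - lo) - 1
    # Polar encoding: each value becomes an angle phi = arccos(x)
    phi = np.arccos(np.clip(scaled, -1.0, 1.0))
    # Pixel (i, j) holds cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])


if __name__ == "__main__":
    # First six residues of the SARS-CoV-2 spike protein
    signal = sequence_to_signal("MFVFLV")
    image = signal_to_gasf(signal)
    print(signal)
    print(image.shape)
```

Under these assumptions, the 1-D signal could be fed to a time-series classifier and the n x n image to an image classifier. A real pipeline would also need to handle sequences of different lengths (for example by padding or resampling) before batching.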
