Predicting emerging SARS-CoV-2 variants of concern through a One Class dynamic anomaly detection algorithm

Nicora G. (Writing – Original Draft Preparation); Bellazzi R. (Methodology)
2022-01-01

Abstract

Objectives: This study implements an automatic procedure for the weekly detection of new SARS-CoV-2 variants and of non-neutral variants, that is, variants of concern (VOC) and variants of interest (VOI). Methods: We downloaded spike protein primary sequences from the public resource GISAID and represented each sequence as a vector of k-mer counts. For each week from 1 July 2020 onward, we evaluated whether each sequence is an anomaly using a One Class support vector machine (SVM) classifier trained on neutral protein sequences collected from February to June 2020. Results: We assessed the ability of the One Class classifier to detect known VOC and VOI, such as Alpha, Delta or Omicron, ahead of their official classification by health authorities. The classifier flags a non-neutral variant as an outlier a median of 10 weeks before its official designation as VOC/VOI. Discussion: The identification of non-neutral variants during a pandemic usually relies on indicators that become available only over time, such as the changing population size of a variant. Automatic variant surveillance systems based on protein sequences can speed up the identification of variants of potential concern. Conclusion: Machine learning, and in particular One Class SVM classification, can support the detection of potential VOC/VOI during an evolving pandemic.
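The k-mer representation mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not state the value of k, the alphabet handling, or any normalization, so `k=3`, the 20-letter amino acid alphabet, the helper names and the toy sequence below are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids; the source does not say how
# ambiguous residues (e.g. "X") are handled.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a protein sequence (k is an assumed value)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def build_vocabulary(k: int) -> list:
    """Enumerate every possible k-mer over the assumed alphabet, in fixed order."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]


def to_vector(counts: Counter, vocabulary: list) -> list:
    """Map k-mer counts onto a fixed-length vector; absent k-mers count as 0."""
    return [counts.get(kmer, 0) for kmer in vocabulary]


# Hypothetical toy sequence, not a real spike protein.
toy_sequence = "MKVMKV"
counts = kmer_counts(toy_sequence, k=3)
vector = to_vector(counts, build_vocabulary(3))
print(counts.most_common())
print(len(vector), sum(vector))
```

Fixed-length vectors of this kind are what a one-class classifier would take as input: it is fitted only on sequences from the reference period, and later sequences are then scored as inliers or outliers.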

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1482440