In this work we show that Incremental Machine Learning can be used to predict the classification of emerging SARS-CoV-2 lineages, dynamically distinguishing between neutral variants and non-neutral ones, i.e. variants of interest or variants of concerns. Starting from the Spike protein primary sequences collected in the GISAID db, we have derived a set of k-mers features, i.e., aminoacid subsequences with fixed length k. We have then implemented a Logistic Regression Incremental Learner that was monthly tested on the variants collected since February 2020 until October 2021. The average value of balanced accuracy of the classifier is 0.72 ± 0.2, which increased to 0.78 ± 0.16 in the last 12 months. The alpha, beta, gamma, eta, kappa and delta variants were recognized as non-neutral variants with mean recall 90%. In summary, incremental learning proved to be a useful instrument for pandemic surveillance, given its capability to update the model on new data over time

Dynamic Prediction of Non-Neutral SARS-Cov-2 Variants Using Incremental Machine Learning

Nicora G.
Writing – Original Draft Preparation
;
Bellazzi R.
Conceptualization
2022-01-01

Abstract

In this work we show that Incremental Machine Learning can be used to predict the classification of emerging SARS-CoV-2 lineages, dynamically distinguishing between neutral variants and non-neutral ones, i.e. variants of interest or variants of concerns. Starting from the Spike protein primary sequences collected in the GISAID db, we have derived a set of k-mers features, i.e., aminoacid subsequences with fixed length k. We have then implemented a Logistic Regression Incremental Learner that was monthly tested on the variants collected since February 2020 until October 2021. The average value of balanced accuracy of the classifier is 0.72 ± 0.2, which increased to 0.78 ± 0.16 in the last 12 months. The alpha, beta, gamma, eta, kappa and delta variants were recognized as non-neutral variants with mean recall 90%. In summary, incremental learning proved to be a useful instrument for pandemic surveillance, given its capability to update the model on new data over time
2022
Studies in Health Technology and Informatics
Public Health & Health Care Science
Research/Laboratory Medicine & Medical Technology
Inglese
294
654
658
5
Incremental Machine Learning; SARS-Cov-2 variant prediction
2 Contributo in Volume::2.1 Contributo in volume (Capitolo o Saggio)
4
268
none
Nicora, G.; Marini, S.; Salemi, M.; Bellazzi, R.
info:eu-repo/semantics/bookPart
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11571/1482446
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? ND
social impact