This thesis focuses on the use and development of bioinformatics methods to tackle bacterial nosocomial infections from different standpoints, including software development, genomic epidemiology and evolution, and machine learning. The hospital is an environmental niche where antibiotic pressure promotes the selection of resistant strains able to spread and affect patients. Over the last decade, the increasing adoption of Next Generation Sequencing (NGS)-based approaches in hospitals and research institutes has helped clinicians shed light on the evolution and epidemiology of nosocomial infections. The growth of NGS applications is driven by lower costs and shorter timescales, and it has produced an increasing number of sequenced strains that help researchers improve their knowledge of the epidemiology and evolution of these pathogens. The thesis is structured into three sections.

The first section covers the development of two novel software tools for surveillance and outbreak investigation. The rapid identification of pathogen clones is pivotal for effective epidemiological control strategies in hospital settings. To this end, I contributed to the development of two software approaches for bacterial typing. The first, P-DOR, is a bioinformatic pipeline for rapid whole-genome sequencing (WGS)-based bacterial outbreak detection and characterization. P-DOR integrates genomic and clinical metadata and uses a curated global genomic database to place the strains of interest within the appropriate evolutionary context. The second tool, MeltingPlot, is based instead on the analysis of High Resolution Melting (HRM) data, a molecular biology technique suitable for fast and inexpensive pathogen typing. MeltingPlot is designed to help the user track epidemiological events by combining HRM-based clustering methods with isolate and patient metadata. This approach facilitates the application of HRM typing to large real-time surveillance programs and to rapid outbreak reconstruction.

The second section investigates the progressive shift in prevalence from CC258 to ST307 as the main nosocomial multidrug-resistant (MDR) Klebsiella pneumoniae lineage worldwide. Using a collection of >3000 genomes sampled from 2012 until 2018, and incorporating clinical metadata where available, we identified three hypothetical driving forces that could have led to the shift. Specifically, we investigated i) changes in antibiotic stewardship; ii) pan-genome evolution; iii) genome erosion. The genetic causes involved in this progressive shift could be used in the future to detect and predict novel emerging high-risk clones.

The third section concerns the use of machine learning methods to predict Minimum Inhibitory Concentrations (MIC) in K. pneumoniae from genomic data. MIC testing is the gold standard for measuring the antibiotic resistance of bacteria in clinical settings. However, MIC measurements are time-consuming to perform and have non-trivial reproducibility issues across different settings and interpretation guidelines. For this reason, in-silico MIC prediction is a valid alternative to explore and possibly use as a future diagnostic tool. In this work, we benchmarked different machine learning methods using both real and simulated data from >4000 genomes. In detail, four quantitative trait simulations were carried out to assess the reliability of the machine learning methods across different genetic scenarios. Because bacterial populations are highly clonal, we also adjusted for population structure. We then assessed how model accuracy is affected by treating the MIC data as ordinal or numerical variables. The results provide some insights into how MIC should be measured when building predictive models.

Tackling nosocomial bacterial infections through the integration of bioinformatics, genomic epidemiology and machine learning

BATISTI BIFFIGNANDI, GHERARD
2023-03-31

Files in this record:
File: PhD_thesis_GBB.pdf
Open access from 10/10/2024
Description: Tackling nosocomial bacterial infections through the integration of bioinformatics, genomic epidemiology and machine learning
Type: Doctoral thesis
Size: 12.8 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1474235