Molecular markers and bioinformatics in species detection: two case studies in Pelophylax spp. (Amphibia, Ranidae) and in a novel bacterial endosymbiont, in ciliate protista

CHIODI, ALICE
2020-01-20

Abstract

Species detection is one of the oldest problems in science. It requires identifying a set of features peculiar to a group, or taxon, whose combined presence or absence defines the category of the target. Taxonomies were initially based on morphological characters, an approach with limitations because some species, such as prokaryotes and cryptic species, lack the informative features it needs. Since the discovery of the genotype, sequences have proven to be powerful features for inferring taxonomy and phylogenetic trees. A sample can be assigned to a taxon by finding homology to an already classified sequence in a public database. If the target sequence has no match, a phylogenetic tree built together with the most similar sequences found can be used to infer their relationships. This kind of taxonomic annotation can be challenging because there is no consensus on how distant two taxa must be to be considered different species. In this PhD thesis I consider two taxonomically challenging case studies: the first involves a complex of morphologically cryptic, interbreeding anuran species, the second a bacterial endosymbiont of a ciliate protozoan.
The first case focuses on Pelophylax, a genus of morphologically cryptic anuran species that show a form of sexual parasitism called hybridogenesis, in which interspecific mating produces viable offspring. Some species can be identified bioacoustically, but the presence of hybrids makes this kind of detection impractical. Identification is more precise with mitochondrial DNA (mtDNA) and Short Tandem Repeat (STR) markers. I combined these two techniques to classify animals sampled in the Po Valley, near Pavia, where two autochthonous taxa occur together with allochthonous ones. Correct detection of these species is important for assessing the conservation status of the local taxa and the impact of the allochthonous ones.
The second case focuses on a prokaryotic endosymbiont of a ciliate protist, where morphological traits are virtually useless for determining taxonomy. Because host and symbiont differ, molecular traits can be used to tell the species apart, for example 16S rRNA for Bacteria and 18S rRNA for Eukaryotes. However, this approach provides only species identification, not the genomic information needed for a functional understanding of the symbiotic system and of the endosymbiotic species, which may be unknown. For this purpose, both organisms are sequenced together by Whole Genome Sequencing (WGS) with Next Generation Sequencing (NGS) technologies. This procedure yields portions of the genomes fragmented into different contigs, which then have to be deconvolved to obtain separate genomes. We therefore developed a fully automated tool, SeqDex, that deconvolves host-endosymbiont datasets by coupling partial, homology-derived taxonomic affiliations with composition analysis to predict the affiliation of all sequences using state-of-the-art machine learning algorithms. The second case study comprises three Spirostomum samples that showed evidence of a Neisseriales bacterium inside the ciliate cells. I used SeqDex to deconvolve this dataset, partially reconstruct the endosymbiont genomes, and perform functional analysis to infer their role and the nature of the relationship that binds the hosts and the bacteria.
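
The abstract does not describe how SeqDex works internally, so the following is only a minimal sketch of the general idea it names: contigs that already have a homology-derived label are used to learn what each organism's sequence composition looks like, and unlabeled contigs are then assigned by composition alone. The k-mer length, the nearest-centroid rule (standing in for the unspecified machine learning algorithms), the function names and the toy sequences are all assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter
from itertools import product
from math import sqrt

K = 3  # k-mer length; an assumption chosen to keep the example small
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def composition(seq):
    """Return the normalized k-mer frequency vector of a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    # Only k-mers made of A, C, G, T are counted; others (e.g. with N) are ignored.
    total = sum(counts[k] for k in KMERS) or 1
    return [counts[k] / total for k in KMERS]


def centroid(vectors):
    """Average a list of equal-length frequency vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two frequency vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def deconvolve(contigs, known_labels):
    """Assign every contig to a group (hypothetical helper, not SeqDex's API).

    contigs: dict mapping contig name -> sequence
    known_labels: dict mapping contig name -> label, for the subset of
        contigs that already has a homology-derived affiliation
    """
    by_label = {}
    for name, label in known_labels.items():
        by_label.setdefault(label, []).append(composition(contigs[name]))
    centroids = {label: centroid(vecs) for label, vecs in by_label.items()}

    assigned = dict(known_labels)
    for name, seq in contigs.items():
        if name not in assigned:
            vec = composition(seq)
            # Nearest-centroid rule: pick the label whose mean composition is closest.
            assigned[name] = min(
                centroids, key=lambda label: distance(vec, centroids[label])
            )
    return assigned


# Toy data (invented): AT-rich "host" contigs and GC-rich "symbiont" contigs.
contigs = {
    "c1": "ATATTAAATTATATAAATTTATAT",
    "c2": "GCGGCCGCGCCGGGCGCGCCGCGG",
    "c3": "TTATAAATATTTAATATATTAAAT",  # treated as having no database hit
    "c4": "CGCGGGCCGCGCGGCCGCCGCGCG",  # treated as having no database hit
}
known = {"c1": "host", "c2": "symbiont"}
result = deconvolve(contigs, known)
print(result)
```

In this toy run the two unlabeled contigs are assigned to the group whose composition they resemble. Real host and endosymbiont genomes differ far less cleanly, which is presumably why the thesis relies on dedicated machine learning methods rather than a simple distance rule.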
Files in this item:
File: PhD_Thesis_AliceChiodi.pdf
Access: Open Access from 2021-08-01
Description: PhD thesis
Size: 3.15 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1318448