Running genome wide data analysis using a parallel approach on a cloud platform

Demartini, Andrea; Capozzi, Davide; Malovini, Alberto Luigi; Bellazzi, Riccardo
2015-01-01

Abstract

Hierarchical Naive Bayes (HNB) is a multivariate classification algorithm that can be used to estimate the probability of a specific disease by analysing a set of Single Nucleotide Polymorphisms (SNPs). In this paper we present an implementation of HNB that uses a parallel approach based on the MapReduce paradigm, built natively on the Hadoop framework and running on the Amazon Cloud Infrastructure. We tested our approach on two genome-wide association study (GWAS) datasets aimed at identifying the genetic bases of Type 1 Diabetes (T1D) and Type 2 Diabetes (T2D). Both datasets include individual-level data for 1,900 cases and 1,500 controls, with approximately 420,000 SNPs. For T2D the best results were obtained using the complete set of SNPs, whereas for T1D the best performance was reached using a few SNPs selected through standard univariate association tests. Our cloud-based implementation makes it possible to run genome-wide simulations while cutting down computational time and overall infrastructure costs.
2015
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Holmes, JH; Bellazzi, R; Sacchi, L; Peek, N
Computer Science & Engineering; Medical Research, General Topics; Molecular Biology & Genetics
Anonymous experts
English
Contribution
15th Conference on Artificial Intelligence in Medicine, AIME 2015
2015
ita
International
Electronic
9105
188
192
5
9783319195506
Springer Verlag
Cloud computing; Data mining algorithm; Genome-wide association studies; Map reduce; Computer Science (all); Theoretical Computer Science
http://springerlink.com/content/0302-9743/copyright/2005/
no
none
273
info:eu-repo/semantics/conferenceObject
4
4 Contribution in Conference Proceedings::4.1 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1127091