A Hybrid OpenMP and OpenMPI approach to geometrical motif search in proteins

FERRETTI, MARCO; SANTANGELO, LUIGI
2014

Abstract

The retrieval and identification of geometrical motifs is an important open problem in bioinformatics. In previous work we presented Cross Motif Search (CMS), a novel algorithm that searches for recurring geometrical patterns in the secondary structure of proteins. A single run of CMS looks for similarities between a pair of proteins, and can easily be extended to compare every pair of proteins in an arbitrarily large data set. We have implemented a shared-memory parallel version of CMS and analyzed its scalability, which is limited to 8 cores. Consequently, as the number of proteins in the set increases, the execution time of the algorithm quickly becomes unmanageable, and the OpenMP implementation cannot keep up simply by increasing the number of cores. In this paper we present a new hybrid parallel implementation of CMS, which combines the previous OpenMP approach with OpenMPI. Experimental runs on the same small-sized server (32 cores) show that the best hybrid OpenMP-OpenMPI configuration outperforms the best OpenMP one by a factor of 13.52. This result is confirmed on a medium-sized cluster with 256 cores, which allows a larger data set to be processed in reasonable time. We also show that the new design achieves high efficiency and scalability, which allows us to process huge protein data sets up to, in theory, the entire Protein Data Bank.
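The abstract notes that a pairwise comparison can be extended to every pair of proteins in a data set, which means the number of runs grows as N(N-1)/2 for N proteins. The sketch below only illustrates that growth and one simple way such pairs could be divided among workers; the round-robin split and the function name `pairs_for_worker` are assumptions made for illustration and are not taken from the paper, which does not describe its work distribution here.

```python
from itertools import combinations


def pairs_for_worker(n_proteins, worker, n_workers):
    """Return the index pairs (i, j), i < j, assigned to one worker.

    Hypothetical round-robin split: pair number k goes to worker k % n_workers.
    This is an illustrative assumption, not the scheme used by CMS.
    """
    all_pairs = combinations(range(n_proteins), 2)
    return [pair for k, pair in enumerate(all_pairs) if k % n_workers == worker]


if __name__ == "__main__":
    n_proteins, n_workers = 6, 4
    for w in range(n_workers):
        # Each worker would run one pairwise comparison per assigned pair.
        print(w, pairs_for_worker(n_proteins, w, n_workers))
```

With this kind of split, every pair is handled exactly once and the workers' loads differ by at most one pair, although real comparisons may take very different amounts of time depending on protein size.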
ISBN: 9781479955473

Use this identifier to cite or link to this document: http://hdl.handle.net/11571/938434