MPI-CMS: A hybrid parallel approach to geometrical motif search in proteins

FERRETTI, MARCO (Conceptualization); MUSCI, MIRTO (Methodology); SANTANGELO, LUIGI (Methodology)
2015-01-01

Abstract

This paper describes the message passing parallel implementation of the Cross Motif Search algorithm (MPI-CMS). It extends and improves on the results of a conference paper presented at PBIO 2014. CMS is a bioinformatics algorithm that searches for geometrical motifs in proteins. To characterize protein similarities completely, CMS should be run on the largest possible dataset. Unfortunately, due to its precision, CMS is inherently slow; for this reason it was originally implemented using a shared memory parallel paradigm. In the original conference paper, we showed that the OpenMP implementation of Cross Motif Search (MP-CMS) is extremely inefficient and cannot scale adequately. To solve the problem, we designed a new parallel implementation of CMS (MPI-CMS) based on a hybrid shared memory and message passing paradigm. This paper reconsiders MPI-CMS with the aim of porting it to a supercomputing machine. The focus is on how the performance of the hybrid approach depends on workload imbalance. Using a simple statistical analysis of the workload, we discuss several strategies for improving the design of MPI-CMS. We conclude by describing a revised implementation of MPI-CMS, which takes into account the size of the protein pairs to fine-tune the parallelization strategy.
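
The abstract does not say how protein-pair sizes are used to tune the parallelization, so the following is only an illustrative sketch, not the MPI-CMS implementation. It shows one common way to reduce workload imbalance when per-task cost can be estimated in advance: a greedy longest-processing-time-first assignment of protein pairs to workers (which could stand for MPI ranks). The cost model (product of the two protein sizes), the function names `pair_cost` and `assign_pairs`, and the sample sizes are all assumptions made for the example.

```python
import heapq
from itertools import combinations


def pair_cost(size_a, size_b):
    # ASSUMPTION: the work for one protein pair grows with the product of
    # the two protein sizes. The real CMS cost model may differ.
    return size_a * size_b


def assign_pairs(sizes, n_workers):
    """Assign every protein pair to a worker, heaviest pairs first.

    sizes: dict mapping a (hypothetical) protein name to its size.
    Returns (buckets, loads): the pairs given to each worker and the
    estimated total cost per worker.
    """
    # Enumerate all unordered pairs and sort them by decreasing estimated cost.
    pairs = sorted(
        combinations(sorted(sizes), 2),
        key=lambda p: pair_cost(sizes[p[0]], sizes[p[1]]),
        reverse=True,
    )
    # Min-heap of (current load, worker id): the least loaded worker pops first.
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(n_workers)]
    for a, b in pairs:
        load, w = heapq.heappop(heap)
        buckets[w].append((a, b))
        heapq.heappush(heap, (load + pair_cost(sizes[a], sizes[b]), w))
    # Read the final estimated load of each worker back from the heap.
    loads = [0] * n_workers
    for load, w in heap:
        loads[w] = load
    return buckets, loads


if __name__ == "__main__":
    # Hypothetical protein sizes, for illustration only.
    sizes = {"P1": 40, "P2": 10, "P3": 25, "P4": 5}
    buckets, loads = assign_pairs(sizes, n_workers=2)
    for w, (bucket, load) in enumerate(zip(buckets, loads)):
        print(f"worker {w}: estimated load {load}, pairs {bucket}")
```

With the sample sizes above, a single heavy pair (P1, P3) occupies one worker while the five lighter pairs are packed onto the other, which illustrates why a size-aware assignment can balance load better than splitting the list of pairs into equal-length chunks.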
Files in this item:
First submission paper.pdf (open access)
Type: Pre-print document
License: Creative Commons
Size: 341.9 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1121622