MPI-CMS: A hybrid parallel approach to geometrical motif search in proteins
FERRETTI, MARCO (Conceptualization); MUSCI, MIRTO (Methodology); SANTANGELO, LUIGI (Methodology)
2015-01-01
Abstract
This paper describes the message-passing parallel implementation of the Cross Motif Search algorithm (MPI-CMS). It extends, and specifically improves on, the results obtained in a conference paper presented at PBIO 2014. CMS is a bioinformatics algorithm that searches for geometrical motifs in proteins. For a complete characterization of protein similarities, it would be important to run CMS on the largest possible dataset. Unfortunately, due to its precision, CMS is inherently slow; it was therefore originally implemented using a shared-memory parallel paradigm. In the original conference paper, we showed that the OpenMP implementation of Cross Motif Search (MP-CMS) is extremely inefficient and cannot scale adequately. To solve the problem, we designed a new parallel implementation of CMS (MPI-CMS) based on a hybrid shared-memory and message-passing paradigm. This paper reconsiders MPI-CMS with the aim of porting it to a supercomputer. The focus is on how the performance of the hybrid approach depends on workload imbalance. Using a simple statistical analysis of the workload, we discuss several strategies for improving the design of MPI-CMS. We conclude by describing a revised implementation of MPI-CMS, which takes the size of the protein pairs into account to fine-tune the parallelization strategy.

File | Size | Format
---|---|---
First submission paper.pdf (open access; type: pre-print document; license: Creative Commons) | 341.9 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.