MPI-CMS: A hybrid parallel approach to geometrical motif search in proteins

Ferretti, Marco; Musci, Mirto; Santangelo, Luigi

doi:10.1002/cpe.3588

This paper describes the message passing parallel implementation of the Cross Motif Search algorithm (MPI-CMS). It extends and improves on the results of a conference paper presented at PBIO 2014. CMS is a bioinformatics algorithm that searches for geometrical motifs in proteins. To characterize protein similarities completely, CMS should be run on the largest possible dataset. Unfortunately, because of its precision, CMS is inherently slow; it was therefore originally implemented using a shared memory parallel paradigm. In the original conference paper, we showed that the OpenMP implementation of Cross Motif Search (MP-CMS) is extremely inefficient and cannot scale adequately. To solve this problem, we designed a new parallel implementation of CMS (MPI-CMS) based on a hybrid shared memory and message passing paradigm. This paper reconsiders MPI-CMS with the aim of porting it to a supercomputer. The focus is on how performance in the hybrid approach depends on workload imbalance. Using a simple statistical analysis of the workload, we discuss several strategies for improving the design of MPI-CMS. We conclude by describing a revised implementation of MPI-CMS that takes into account the size of the protein pairs to fine-tune the parallelization strategy.
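
The idea of using protein-pair size to balance the workload can be illustrated with a minimal sketch. This is not the scheduling scheme of MPI-CMS, which the abstract does not detail. It assumes a hypothetical cost model in which the work for a pair grows with the product of the two protein sizes, and it uses a generic greedy heuristic (largest job first, assigned to the least-loaded worker). The names `estimated_cost` and `assign_pairs` are illustrative only.

```python
import heapq
from itertools import combinations


def estimated_cost(size_a, size_b):
    # Assumed cost model (not taken from the paper): the work for one
    # protein pair is proportional to the product of the two sizes.
    return size_a * size_b


def assign_pairs(sizes, n_workers):
    """Greedily distribute all protein pairs over n_workers.

    Pairs are processed from most to least expensive, and each pair
    goes to the worker with the smallest accumulated load so far.
    Returns (assignment, loads), where assignment[w] lists the
    (i, j) index pairs given to worker w.
    """
    pairs = sorted(
        combinations(range(len(sizes)), 2),
        key=lambda p: estimated_cost(sizes[p[0]], sizes[p[1]]),
        reverse=True,
    )
    loads = [0] * n_workers
    assignment = [[] for _ in range(n_workers)]
    # Min-heap of (current load, worker id).
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    for i, j in pairs:
        load, w = heapq.heappop(heap)
        cost = estimated_cost(sizes[i], sizes[j])
        assignment[w].append((i, j))
        loads[w] = load + cost
        heapq.heappush(heap, (loads[w], w))
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical protein sizes; real inputs would come from the dataset.
    assignment, loads = assign_pairs([10, 20, 30, 40], n_workers=2)
    print(assignment)
    print(loads)
```

In a hybrid setting, each worker in this sketch would correspond to one message passing process, with the pairs assigned to it processed by its shared memory threads; that mapping is also an assumption made here for illustration.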