In the domain of proteomics, an in-depth analysis of the 3D structure of a protein is of paramount importance for many biological studies and applications. At the secondary level, protein structure can be described in terms of motifs, recurrent patterns of smaller biological structures called Secondary Structure Elements. In this paper, the focus is on the identification of geometrical motifs in different proteins using the Cross Motif Search (CMS) algorithm. Because CMS has a high computational cost compared with traditional alignment algorithms, this task is very demanding, and parallel processing is therefore mandatory. In previous papers, the parallelization of CMS has already been studied from the HPC standpoint. Since cloud computing is emerging as an alternative to on-premise HPC systems, it is worthwhile to examine the feasibility of migrating to a cloud implementation and its possible advantages in terms of both performance and cost. This paper is an extension of a preliminary work carried out on the cloud parallelization of CMS. The paper makes two main contributions. First, an analytic model of the communication pattern of CMS is described, in order to gain insight into the performance of the application when executed on a cloud infrastructure. Second, an optimized "location-aware" scheduling policy for assigning workload to the application workers is introduced, in order to minimize internode communication in a cloud setting.