Structural Motifs Identification and Retrieval: A Geometrical Approach

Cantoni, Virginio; Ferretti, Marco; Musci, Mirto; Nugrahaningsih, Nahumi
2016-01-01

Abstract

The 3D structure of a protein is strictly related to its function. For example, antibodies (the immunoglobulin protein family) capture foreign objects such as bacteria and viruses. Although they belong to a number of different classes, all antibodies share the same Y-like structure: the arms of the Y (Fab region) contain the sites that capture the antigens, i.e. the foreign objects to be neutralized, while the leg of the Y (Fc region) is where immune system cells bind to the antibody in order to destroy the captured pathogen. In 1972, the Nobel laureate Christian B. Anfinsen postulated (see [2] and [3]) that the 3D structure of a protein is determined solely by its amino acid sequence. This postulate is traditionally referred to as “Anfinsen’s Dogma.” In other words, given a polypeptide chain, there is a unique 3D structure it can fold into. Therefore, the secondary and tertiary structures of a protein are uniquely determined by its primary structure. However, it is important to note that, even if a given amino acid sequence results in a unique tertiary structure, the reverse does not necessarily hold. Moreover, proteins showing significant similarities in their tertiary structure, and therefore possibly sharing a common evolutionary ancestor, may show little homology in their primary structure [9]. This motivates the need for methods and tools that find similarities at each level of protein structure, even though the primary structure determines the secondary and the tertiary. Many algorithms that search for similarities in proteins at each level have been described. Here we present a novel approach to detecting similarities at the secondary level, based on the Generalized Hough Transform [4]. However, we do not simply want to assign a similarity score to a pair of proteins, nor do we want to develop yet another (even if more efficient) alignment tool.
As we will show, our main goal is to look for previously unknown common geometrical structures in so-called “unfamiliar” proteins. To do that, we need to abandon the traditional topological description of structural motifs. In other words, our main concern is the precise, flexible and efficient identification of geometrical similarities among proteins, that is, the retrieval of geometrically defined structural motifs. This chapter is organized as follows. In Section 1.1 we introduce some basic concepts of biochemistry related to the problem at hand. In Section 1.2 we briefly analyze the state of the art in the retrieval of structural motifs. The core of this contribution is Section 1.3, where we fully describe our proposals from both a computer science and a mathematical point of view. In Section 1.4 we briefly discuss our implementation strategy and the available parallelism of our proposals, and review the results of some of the benchmarks we performed. Finally, in Section 1.5 we draw some conclusions and present our future research work.
ISBN: 978-1-118-89368-5
Use this identifier to cite or link to this document: https://hdl.handle.net/11571/945634