¿ Complejidad o simplicidad computacional ?. Predicción (protein folding) y diseño (protein design) de proteínas.

En una entrada anterior comentábamos sobre un interesante avance en el problema de predicción de la estructura terciaria, tridimensional, nativa o de mínima energía (cuatro sinónimos para un mismo fenómeno) de la proteínas naturales.

El avance ha sido realizado por investigadores del laboratorio de David Baker de la Universidad de Washington, y en su página web explican muy claramente la problemática y soluciones de las dos cuestiones que estudian: la Predicción y Diseño de estructura de proteínas, que básicamente son  dos problemas, inversos el uno del otro.

David Baker lleva estudiando este problema desde hace décadas y ya en el año 2000 publicaba un artículo lleno de optimismo, titulado precisamente A surprising simplicity for protein folding.


The experimental results and predictions discussed here indicate that the fundamental physics underlying folding may be simpler than previously thought and that the folding process is surprisingly robust. The topology of a protein’s native state appears to determine the major features of its folding free-energy landscape. Both protein structures and protein-folding mechanisms can be predicted, to some extent, using models based on simpli®ed representations of the polypeptide chain. The challenge ahead is to improve these models to the point where they can contribute to the interpretation of genome sequence information.

1. Predicción y diseño de proteinas: definición del problema y motivación para su estudio.  

En el problema de predicción se parte de la secuencia de aminoácidos y se busca descubrir , con el método más rápido y barato posible, la estructura tridimensional correspondientes a esta estructura. En el problema de diseño, se parte de unas funciones deseables de la proteína, a las cuales se corresponden una determinada estructura tridimensional y lo que se busca encontrar es la secuencia de aminoácidos.

1.1 Predicción (protein folding).

Ayer citábamos un paper, titulado On the complexity of protein folding , revisado en 1998 y cuya nómina de autores es espectacular, en el cual trataban la complejidad computacional del problema de predicción. Antes describían muy claramente el problema, su expresión cómo modelo discreto y la motivación para estudiarlo:

Understanding how the genotype (the genetic information
of an organism) determines the phenotype (the macroacopic
charocteristics of the organism) is a problem at the forefront of today´s science (often refered to dramatically as “breaking the genetic code” or “the last phase of the mendelian revolution”). This mapping can be roughly divided in three parts:  

— Of these the first (the mapping from DNA sequences to monomer sequences) is simple and very well-understood.

— In contrast, the mapping from the  sequence of a protein to the geometric conformation of its native state is much more intricate and complex, and less understood; it has been the subject of intense investigation for decades.

— (The third part, going from proteins to macroscopic characteristics, seems even more intractable.) 

It seems clear that the forces underlying protein folding are the interactions between their monomers; recently, the view that non-local interactions dominate this process has been  gaining ground [5]. To test this and other hypothesess concerning protein folding, researchers resorted to simplified models of proteins, mathematical abstractions of proteins that hide many aspects and exaggerate the effect of others; analysis and computer simulation of such models can then be compared to experimental results with actual proteins, to determine whether the emphasized aspects are indeed the dominant ones.

1.2. Diseño (protein design).

En este otro artículo titulado Theoretical and Computational Protein Design nos explican muy claramente la motivación del problema de diseño:

Why design proteins when nature already provides a wealth of structures and functions? 

–Designing proteins critically tests and advances our understanding of protein stability and folding (6).

–In addition, designing proteins provides efficient access to large (nanoscale), well-defined molecular structures, as arbitrary protein sequences can be straightforward to create and folding yields the three-dimensional structure.

–Protein design can provide possible routes to novel catalytic, pharmaceutical, structural, and sensing properties.

Realizing such designed proteins, however, requires the means to identify successfully appropriate sequences that fold and are functional. De novo protein design faces a variety of challenges.

–Early design efforts impressively yielded new proteins, but many were structurally and thermodynamically less well defined than natural proteins (7–12).The difficulty results in part from the subtle physical and chemical interactions that stabilize folded structures.

–In addition, the exponentially large number of possible sequences (e.g., more than 20^100 ∼ 10^130 sequences for a 100-residue protein) impedes direct rational design. In addition to being energetically consistent with its folded structure, a protein sequence alsomust be incompatible with alternative, competing structures.

Computational methods for protein design are designed to surmount some of these difficulties and address the large number of concerted interactions within a targeted structure that confer folding and desired functional properties. 

Y en las conclusiones nos dicen:

Successful protein design poses many challenges: the many degrees of freedom involving both sequence and local structure that lead to the combinatorial complexity of the search for sequences, the subtlety of the underlying physical forces that stabilize folded structures, and our incomplete understanding of the determinants of folding and function. Computational protein design seeks to make quantitative many fundamental rules governing protein folding and to develop efficient approaches to identify and characterize sequences consistent with a given structure. In recent years, this quantitative approach to protein design has yielded a number of high-quality sequence prediction methods. These efforts have led to milestones in the design of new proteins with novel properties. Continued development and application of thesemethods will inform our understanding of protein structure, protein folding, and protein activity. Such methods open the door to the creation of tailored, nonnatural functional proteins and the re-engineering of natural proteins to enhance or redirect their activities.

El artículo constituye un buen survey sobre el estado del arte sobre esta cuestión en 2011.

2. Predicción y diseño de proteinas: complejidad computacional

Los dos problemas, diseño y predicción, expresados cómo problemas de decisión según modelos discretos (y recordemos que no debemos de confundir los fenómenos naturales con sus modelos), son NP-completos.

2.1 Predicción (protein folding).

Y el resultado principal del paper On the complexity of protein folding, es cómo sigue:

We show that the protein folding problem in the two-dimensional H-P model is NP-complete.

Obtienen este resultado reduciendo el conocido (al menos para los lectores de este blog) problema de ciclos hamiltonianos restringido a un determinado tipo de grafos planares al problema de predicción de proteinas.

2.2 Diseño.

El problema de diseño expresado cómo problema de optimización decisión es NP-Hard. En este caso el  resultado de NP-completitud se obtiene reduciendo el problema SAT al problema  de diseño.

Pero, cómo intentábamos explicar en la entrada anterior sobre este asunto, no todo está perdido. En este paper de Pierce y Winfree, profesores de matemáticas computacionales y aplicadas de Caltech titulado: Protein Design is NP-Hard explican lo mismo que intentábamos explicar de manera mucho más clara:

How should the research community attempt to bridge this sizeable gap? The knowledge that the problem is NP-hard implies that we may always have to rely on exponential-time algorithms if GMEC solutions are required. Hence,

a) current state-of-the-art methods that have exponential worst-case bounds continue to merit further research. These methods are able to identify global minima for problems that would require 10^70 times the age of the universe using an exhaustive search. Similar further improvements would dramatically increase the scope of rational protein design.

b) If such advancements prove elusive using exact algorithms, then improved stochastic,  heuristic or approximate methods will likely play an increasingly important role in protein design.

The finding that protein design optimization is NP-hard thus reinforces the need for continued efforts on exact exponential-time and approximate stochastic methods, encourages interaction with other scientific  communities working on NP-hard optimization problems and drastically increases the pessimism for finding an efficient polynomial-time algorithm.

c) It remains possible that the protein design problem can be simplified (e.g. by placing restrictions on the potential function) to yield a less general problem for which polynomial-time algorithms can be devised. [It is instructive to consider the situation with variants of the traveling salesman problem (TSP)—perhaps the most-studied NP-hard problem (Lawler et al., 1985). TSP cannot even be efficiently approximated to within any constant factor (Sahni and Gonzalez, 1976). However, when restricted to the important subclass of Euclidean constant-dimension problems, finding exact optimal solutions remains NP-complete, but approximate TSP is dramatically easier (nearly linear-time for randomized algorithms) (Arora, 1998).

Thus important open questions are to determine whether PRODES is efficiently approximable and to find natural restrictions on PRODES that allow for more efficient algorithms.] 

d) It may also be possible that there exist alternative formulations of the protein design problem that are of equal intellectual merit and practical utility (e.g. design for surface shape or chemistry with no restriction on the backbone fold), but greater computational tractability. These alternative formulations are also likely to be NP-hard, so it remains an important open question as to whether the protein design problem can be described in a form that is both computationally tractable and biologically sound.

En definitiva, aunque el avance del laboratorio Baker es sobre el problema de predicción y no sobre el problema de diseño, cómo los dos pertenecen a la misma clase de complejidad, se les puede aplicar las mismas conclusiones:

–o bien este nuevo avance supone el descubrimiento de que el problema Predicción de Proteínas es un subconjunto sencillo (polinómico) de un problema más general NP-Hard (opción c), en cuyo caso es incompleto para el modelo pero suficiente para el fenómeno natural. Estaríamos hablando de un futuro Premio Nobel sin duda ninguna.

–o bien es un algoritmo exacto optimizado (opción a), en cuyo caso tendrá problemas para escalar y en un tiempo indeterminado aparecerá un paper que se titule: Baker laboratoy rules for protein folding exponential in the worst case, o algo así

–bien es una buena heurística que funciona para algunos casos pero no para todos, opción b, en cuyo caso es incompleto), en cuyo caso aparecerá un paper en un tiempo indeterminado que se titule Baker laboratory rules for protein folding problem fails for protein X, o algo así.

–bien supone el descubrimiento implícito de un nuevo modelo de Predicción de Proteínas que hace el problema más tratable en todos los casos, (opción d). De nuevo, muy posible Premio Nobel a la vista.

Y hay una última posibilidad que se estima cómo muy inverosímil: que el conjunto de reglas sea escalable y completo para el modelo, posibilidad que ni siquiera contemplan en el paper…

El tiempo dirá.

Otros enlaces:

una presentación sobre el diseño de proteinas, desde el punto de vista computacional.


Una respuesta to “¿ Complejidad o simplicidad computacional ?. Predicción (protein folding) y diseño (protein design) de proteínas.”

  1. Amedar Consulting Group Says:

    I haven’t checked in here for a while because I thought it was getting boring, but the last few posts are great quality so I guess I’ll add you back to my everyday bloglist. You deserve it my friend 🙂

Terms and conditions: 1. Any commenter of this blog agrees to transfer the copy right of his comments to the blogger. 2. RSS readers and / or aggregators that captures the content of this blog (posts or comments) are forbidden. These actions will be subject to the DMCA notice-and-takedown rules and will be legally pursued by the proprietor of the blog.

Introduce tus datos o haz clic en un icono para iniciar sesión:

Logo de WordPress.com

Estás comentando usando tu cuenta de WordPress.com. Cerrar sesión /  Cambiar )

Google+ photo

Estás comentando usando tu cuenta de Google+. Cerrar sesión /  Cambiar )

Imagen de Twitter

Estás comentando usando tu cuenta de Twitter. Cerrar sesión /  Cambiar )

Foto de Facebook

Estás comentando usando tu cuenta de Facebook. Cerrar sesión /  Cambiar )


Conectando a %s

A %d blogueros les gusta esto: