Dissecting the relationship between protein structure and sequence evolution
What can protein structure tell us about protein evolutionary dynamics? Despite extensive variety in their native structures, from hyper-thermostable to intrinsically disordered, all proteins share a common feature: flexibility and dynamics at different levels of structure. In addition to their spatial dynamics, proteins are also highly dynamic on evolutionary timescales, exhibiting considerable variability in their amino acid sequences. Significant variations can be observed in the amino acid sequences of the divergent members of a single protein family, while their native conformations and biological functions remain largely conserved among all members of the family. These evolutionary variations can be due to a combination of point mutations, insertions, deletions, or sometimes the rearrangement of domains in the protein sequence. In recent years, it has become increasingly evident that the dynamics of proteins in the space and time domains, corresponding to structural and evolutionary variations, mutually influence each other at the amino acid level. In particular, it is generally observed that amino acids in the core of a protein are more conserved than amino acids on its surface. Some site-specific structural quantities have already been identified that are capable of explaining the general patterns of sequence variability in globular proteins. A prominent example is the exposure of an amino acid to the solvent molecules, typically water, that surround proteins in vivo. Furthermore, partial associations between local flexibility, packing density, and sequence variability can also be observed among globular proteins. There is, however, no consensus as to which set of structural characteristics plays the dominant role in sequence evolution. The strength of sequence-structure correlations also appears to vary widely from one protein to another, with Spearman's correlation coefficient ρ ∈ [0.1, 0.8].
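To make the quoted correlation concrete, the following is a minimal sketch of how a Spearman's ρ between a site-specific structural quantity and a site-specific evolutionary rate could be computed for one protein. It is an illustration only, not the procedure used in this work: the function names and the per-site values (`relative_solvent_accessibility`, `evolutionary_rate`) are hypothetical and chosen purely for demonstration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the full run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-site values for a six-residue toy protein (not real data).
relative_solvent_accessibility = [0.05, 0.10, 0.45, 0.80, 0.30, 0.60]
evolutionary_rate = [0.2, 0.1, 0.9, 1.5, 1.1, 1.0]

rho = spearman_rho(relative_solvent_accessibility, evolutionary_rate)
print(f"Spearman's rho = {rho:.3f}")
```

For this toy input the more exposed sites evolve faster and ρ is about 0.77, near the upper end of the range quoted above. Because the calculation uses ranks, it captures any monotonic association rather than only a linear one.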
In a series of studies summarized in the following chapters, I first explore the wide spectrum of structural determinants of sequence evolution, their interrelationships, and their role in the evolutionary dynamics of proteins. I find that amino acid sites that are important for the overall stability of the protein structure generally tend to be highly conserved. In other words, any amino acid substitution that significantly changes the potential energy landscape, and thus the native conformation of the protein, is disruptive and hence occurs less frequently on evolutionary timescales. I also find that long-range interactions among individual amino acids play a weak but non-negligible role in the site-specific evolution of proteins, and that their inclusion generally results in better predictions of sequence evolution from protein structure. I then present the results of a comprehensive search for the potential biophysical and structural determinants of protein evolution, based on >200 structural and evolutionary characteristics of proteins in a dataset of viral and enzymatic proteins. I discuss the main protein properties responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~10-30% of the observed variation in sequence-structure relations. In addition to sequence divergence, I identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous backbone hydrogen bond energies, corresponding to proteins that contain a large fraction of helical secondary structure and a small fraction of beta sheets, tend to have the strongest sequence-structure relations.