Analyses of the general rule on residue pair frequencies in local amino acid sequences of soluble, ordered proteins

Research output: Contribution to journal › Article › peer-review

Abstract

The amino acid sequences of soluble, ordered proteins with stable structures have evolved under biological and physical requirements, which distinguishes them from random sequences. Previous analyses have focused on extracting the features that frequently appear in protein substructures, such as the α-helix and β-sheet, but the universal features of protein sequences have not been addressed. To clarify the differences between native protein sequences and random sequences, we analyzed 7368 soluble, ordered protein sequences by comparing the observed occurrences of the 400 amino acid pairs in local proximity, up to 10 residues apart along the sequence, with their expected occurrences in random sequences. We found that hydrophobic residue pairs and polar residue pairs are significantly decreased, whereas pairs between a hydrophobic residue and a polar residue are increased. This trend was universally observed regardless of the secondary structure content but was not observed in protein sequences that include intrinsically disordered regions, indicating that it can be a general rule of protein foldability. The possible benefits of this rule are discussed from the viewpoints of protein aggregation and disorder, which are both caused by low-complexity regions of hydrophobic or polar residues.
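The observed-versus-expected comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the expected occurrences were computed, so the code below assumes a simple composition-based null model: within each sequence, the expected count of an ordered pair (a, b) at separation d is the number of positions at that separation multiplied by the product of the two residue frequencies. The function name `pair_ratios` and its parameters are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (20 x 20 = 400 ordered pairs)


def pair_ratios(sequences, max_sep=10):
    """Return {(a, b, d): observed / expected} for ordered residue pairs.

    a is the residue at position i, b the residue at position i + d,
    and d runs from 1 to max_sep.

    ASSUMPTION: the expected count comes from each sequence's own
    composition (independent residues), which may differ from the
    random-sequence model used in the paper.
    """
    observed = Counter()
    expected = Counter()
    for raw in sequences:
        # Keep only the 20 standard residues.
        seq = [r for r in raw.upper() if r in AMINO_ACIDS]
        n = len(seq)
        if n == 0:
            continue
        comp = Counter(seq)
        for d in range(1, max_sep + 1):
            n_pairs = n - d  # number of (i, i + d) positions in this sequence
            if n_pairs <= 0:
                break
            for i in range(n_pairs):
                observed[(seq[i], seq[i + d], d)] += 1
            for a, b in product(comp, repeat=2):
                expected[(a, b, d)] += n_pairs * comp[a] * comp[b] / (n * n)
    # A ratio below 1 means the pair is depleted relative to the null model.
    return {key: observed[key] / exp for key, exp in expected.items() if exp > 0}
```

Under this assumed model, a ratio below 1 for a hydrophobic-hydrophobic or polar-polar pair and a ratio above 1 for a mixed pair would correspond to the trend the abstract reports; assessing significance would need an additional statistical test that is not sketched here.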

Original language: English
Pages (from-to): 725-733
Number of pages: 9
Journal: Protein Science
Volume: 22
Issue number: 6
Publication status: Published - 2013 Jun

Keywords

  • Protein disorder
  • Protein structure
  • Secondary structure
  • Sequence analysis

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
