We propose a new method for classifying and identifying transmembrane (TM) protein functions in proteome-scale by applying a single-linkage clustering method based on TM topology similarity, which is calculated simply from comparing the lengths of loop regions. In this study, we focused on 87 prokaryotic TM proteomes consisting of 31 proteobacteria, 22 gram-positive bacteria, 19 other bacteria, and 15 archaea. Prior to performing the clustering, we first categorized individual TM protein sequences as "known," "putative" (similar to "known" sequences), or "unknown" by using the homology search and the sequence similarity comparison against SWISS-PROT to assess the current status of the functional annotation of the TM proteomes based on sequence similarity only. More than three-quarters, that is, 75.7% of the TM protein sequences are functionally "unknown," with only 3.8% and 20.5% of them being classified as "known" and "putative," respectively. Using our clustering approach based on TM topology similarity, we succeeded in increasing the rate of TM protein sequences functionally classified and identified from 24.3% to 60.9%. Obtained clusters correspond well to functional superfamilies or families, and the functional classification and identification are successfully achieved by this approach. For example, in an obtained cluster of TM proteins with six TM segments, 109 sequences out of 119 sequences annotated as "ATP-binding cassette transporter" are properly included and 122 "unknown" sequences are also contained.
- Functional classification and identification
- Prokaryotic genome
- Proteome-wide analysis
- Transmembrane protein
- Transmembrane topology similarity
ASJC Scopus subject areas
- Molecular Biology