TY - JOUR
T1 - A proteome-wide analysis of domain architectures of prokaryotic single-spanning transmembrane proteins
AU - Arai, Masafumi
AU - Fukushi, Takafumi
AU - Satake, Masanobu
AU - Shimizu, Toshio
N1 - Funding Information:
We thank Drs. Gunnar von Heijne, Anders Krogh and Gabor E. Tusnády for providing us with the TopPred II, TMHMM 2.0 and HMMTOP 2.0 programs used in this study, respectively. This research was supported in part by a Grant-in-Aid for Scientific Research on Priority Areas (C) ‘Genome Information Science’ (No. 16014202) and a Grant-in-Aid for Scientific Research (S) (No. 16109006) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
PY - 2005/10
Y1 - 2005/10
AB - We performed a proteome-wide survey of the domain architectures of single-spanning transmembrane (TM) proteins (single-spannings) from 87 sequenced prokaryotic (bacterial and archaeal) genomes by assigning Pfam domains to their N-tail and C-tail loops. Of 14,625 single-spannings, 3516 sequences have at least one domain assigned, 7850 have no domains assigned, and the remaining 3259 have less reliable assignments. Of the domain-assigned sequences, 3116 have one or two domains and the other 400 have more than two. The assigned domains fall into 651 Pfam families, which account for 11.4% of all Pfam-A families. Most of these 651 families originate from soluble proteins; only 21 are unique to TM proteins. The occurrence frequency of individual domain families follows a power law: 264 families occur only once, 106 occur twice, and only 39 occur more than 30 times. The great majority of sequences with one or two domains have type II topology, with the domains located on the C-tail loop; in contrast, the N-tail loop of type II proteins seldom carries domains. Notably, assigned domains are found only on tail loops longer than 60 residues, even when the domain itself is shorter than 30 residues. As many as 5800 sequences have at least one long tail but no assigned domain; these tails are expected to harbor at least 1000 as-yet-unknown domain families. We also investigated domain arrangement preferences and domain family combination patterns in 'singlets' (single-spannings with one assigned domain) and 'doublets' (single-spannings with two domains).
KW - Domain architecture
KW - Prokaryotic genome
KW - Proteome-scale analysis
KW - Single-spanning transmembrane protein
KW - Transmembrane topology
UR - http://www.scopus.com/inward/record.url?scp=27644456390&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=27644456390&partnerID=8YFLogxK
U2 - 10.1016/j.compbiolchem.2005.08.004
DO - 10.1016/j.compbiolchem.2005.08.004
M3 - Article
C2 - 16213795
AN - SCOPUS:27644456390
VL - 29
SP - 379
EP - 387
JO - Computational Biology and Chemistry
JF - Computational Biology and Chemistry
SN - 1476-9271
IS - 5
ER -