TY - JOUR
T1 - Context specific protein function prediction.
AU - Nariai, Naoki
AU - Kasif, Simon
PY - 2007
Y1 - 2007
N2 - Although whole-genome sequencing of many organisms has been completed, numerous newly discovered genes are still functionally unknown. Using high-throughput data such as protein-protein interaction (PPI) information to assign putative protein function to the unknown genes has been proposed, since in many cases it is not feasible to annotate the newly discovered genes by sequence-based approaches alone. In addition to PPI data, information such as protein localization within a cell may be employed to improve protein function prediction in two ways: 1) By using such localization information as a direct indicator of protein function (e.g. nucleolus localized proteins might be involved in ribosome biogenesis), and 2) by refining noisy PPI data by localization information. In the latter case, localization information may be used to distinguish different types of PPIs: Namely, interactions between co-localized proteins (more reliable), and interactions between differently localized proteins (potentially less reliable). In this paper, we propose a probabilistic method to predict protein function from PPI data and localization information. A Bayesian network is used to model dependencies between protein function, PPI data and localization information. We showed in our cross-validation experiment that in some cases, our method (conditioning PPI data by localization information) significantly improves prediction precision, as compared to a simple Naive Bayes method that assumes PPI data and localization information are conditionally independent given protein function. Finally, we predicted 57 unknown genes as "ribosome biogenesis" proteins.
AB - Although whole-genome sequencing of many organisms has been completed, numerous newly discovered genes are still functionally unknown. Using high-throughput data such as protein-protein interaction (PPI) information to assign putative protein function to the unknown genes has been proposed, since in many cases it is not feasible to annotate the newly discovered genes by sequence-based approaches alone. In addition to PPI data, information such as protein localization within a cell may be employed to improve protein function prediction in two ways: 1) By using such localization information as a direct indicator of protein function (e.g. nucleolus localized proteins might be involved in ribosome biogenesis), and 2) by refining noisy PPI data by localization information. In the latter case, localization information may be used to distinguish different types of PPIs: Namely, interactions between co-localized proteins (more reliable), and interactions between differently localized proteins (potentially less reliable). In this paper, we propose a probabilistic method to predict protein function from PPI data and localization information. A Bayesian network is used to model dependencies between protein function, PPI data and localization information. We showed in our cross-validation experiment that in some cases, our method (conditioning PPI data by localization information) significantly improves prediction precision, as compared to a simple Naive Bayes method that assumes PPI data and localization information are conditionally independent given protein function. Finally, we predicted 57 unknown genes as "ribosome biogenesis" proteins.
UR - http://www.scopus.com/inward/record.url?scp=48549092477&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=48549092477&partnerID=8YFLogxK
U2 - 10.1142/9781860949920_0017
DO - 10.1142/9781860949920_0017
M3 - Article
C2 - 18546485
AN - SCOPUS:48549092477
VL - 18
SP - 173
EP - 182
JO - Genome informatics. International Conference on Genome Informatics
JF - Genome informatics. International Conference on Genome Informatics
SN - 0919-9454
ER -