Towards a comprehensive human protein-protein interaction network
Obtaining a reliable interaction data set describing the human interactome is a milestone yet to be reached. The past few years have seen tremendous progress in elucidating the yeast interactome. Experimental approaches for obtaining large-scale protein interaction data, coupled with powerful computational methods for combining these data sets and for predicting functional relations between genes, have been successful in tackling the yeast interactome. The concerted development of visualization techniques and progress in the field of network biology have provided us with tools to evaluate, analyze, and interpret the interactome. Although techniques are being scaled to tackle mammalian genomes, as witnessed by the first protein interaction networks for fly and worm, we are far from a complete map of the human interactome. Human genes create additional challenges due to molecular complexity, tissue specificity, and alternative splicing. It therefore becomes important to build well-annotated benchmarks and accuracy measures to evaluate new data. Here, we describe three methods that provide a framework to build a comprehensive human interactome. We have developed a novel algorithm for predicting protein interaction partners based on comparing the positions of proteins in their respective phylogenetic trees. We establish two tests of the accuracy of human protein interaction data sets and integrate the small-scale human interaction data sets using a log-likelihood framework. The benchmarks and the consolidated interaction set will provide a basis for determining the quality of future large-scale human protein interaction assays. Lastly, based on patterns of conserved co-expression of human gene pairs and their orthologs from five different organisms (A. thaliana, M. musculus, D. melanogaster, C. elegans, and yeast), we predict protein interactions and test them against the benchmarks we have established.
By combining the existing interaction data sets, we build a network of 61,974 interactions between 9,642 human proteins and cluster the network to show examples representative of the quality of its interactions. We hope that the methods, benchmarks, and log-likelihood framework will enable the construction of a comprehensive human interactome.
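As an illustration of how a data set might be scored against a benchmark in a log-likelihood framework, the sketch below computes the log of the ratio between the odds of a true interaction within a data set and the prior odds in the benchmark. This is a commonly used formulation and is only an assumption here, since the text above does not state the exact formula. The function name, the protein labels (P1 to P4), and the toy benchmark sets are all hypothetical.

```python
import math


def pair(a, b):
    """Represent an unordered protein pair (hypothetical helper)."""
    return frozenset((a, b))


def log_likelihood_score(pairs, positives, negatives):
    """Assumed scoring form: ln( (pos_hits / neg_hits) / (|positives| / |negatives|) ).

    pairs     -- interactions reported by the data set being evaluated
    positives -- benchmark pairs assumed to interact
    negatives -- benchmark pairs assumed not to interact
    """
    pos_hits = sum(1 for p in pairs if p in positives)
    neg_hits = sum(1 for p in pairs if p in negatives)
    if pos_hits == 0 or neg_hits == 0:
        # The ratio is undefined without hits in both benchmark sets.
        raise ValueError("data set needs hits in both benchmark sets")
    data_odds = pos_hits / neg_hits
    prior_odds = len(positives) / len(negatives)
    return math.log(data_odds / prior_odds)


# Toy benchmark: made-up labels, not real proteins or real data.
positives = {pair("P1", "P2"), pair("P3", "P4")}
negatives = {pair("P1", "P3"), pair("P2", "P4"),
             pair("P1", "P4"), pair("P2", "P3")}

# A toy data set recovering two positive pairs and one negative pair.
dataset = [pair("P1", "P2"), pair("P3", "P4"), pair("P1", "P3")]

score = log_likelihood_score(dataset, positives, negatives)
print(f"log-likelihood score: {score:.3f}")
```

Under this assumed form, a score above zero means the data set is enriched for benchmark positives relative to chance, and scores from independent data sets could in principle be combined by summation.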