Hybrid Relative Specificity Similarity based on Gene Ontology - HRSS

Xiaomei Wu, Erli Pang, Kui Lin and Zhen-Ming Pei


Laboratory of Computational Molecular Biology (CMB at BNU)




Background

Explicit comparisons based on the semantic similarity of Gene Ontology (GO) terms provide a quantitative way to measure the functional similarity between gene products, and they are widely applied in large-scale genomic research through integration with other models. Previously, we presented an edge-based method, Relative Specificity Similarity (RSS), which takes the global position of relevant terms into account. However, edge-based semantic similarity metrics are sensitive to the intrinsic structure of GO and simply treat terms at the same level of the ontology as equally specific. Information content (IC) can help compensate for this weakness.
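
The IC side of this argument rests on the standard (Resnik-style) definition IC(t) = -log p(t), where p(t) is the fraction of annotated gene products annotated to term t or to any of its descendants. The Python sketch below illustrates that definition, together with the most informative common ancestor (MICA), on a tiny hypothetical ontology. The terms, annotations and function names are invented for illustration, and the code is not taken from the HRSS package. It also assumes one particular counting choice: each gene product counts at most once per term. Other IC estimates count raw annotations instead.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents (is_a only).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
}

# Hypothetical direct annotations: gene product -> set of annotated terms.
ANNOTATIONS = {
    "g1": {"C"},
    "g2": {"D"},
    "g3": {"B"},
    "g4": {"C", "D"},
}


def ancestors(term, parents=PARENTS):
    """Return the term plus all of its ancestors (true-path rule)."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen


def information_content(parents=PARENTS, annotations=ANNOTATIONS, root="root"):
    """IC(t) = -log p(t), where p(t) is the fraction of annotated gene
    products annotated to t or to any of its descendants."""
    counts = {t: 0 for t in parents}
    for terms in annotations.values():
        covered = set()
        for t in terms:
            covered |= ancestors(t, parents)
        for t in covered:  # each product counts at most once per term
            counts[t] += 1
    total = counts[root]
    # Terms with no (propagated) annotations get infinite IC.
    return {t: -math.log(c / total) if c else math.inf for t, c in counts.items()}


def mica(t1, t2, ic, parents=PARENTS):
    """Most informative common ancestor: the shared ancestor with the highest IC."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max(common, key=lambda t: ic[t])


ic = information_content()
print(round(ic["C"], 3))   # 0.693
print(mica("C", "D", ic))  # A
print(mica("C", "B", ic))  # root
```

IC is estimated from annotation frequency rather than from depth. Two terms at the same level can therefore receive different IC values, which is the distinction a purely edge-based measure cannot make.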

Results and conclusions

Here, we used IC-based nodes to improve RSS and proposed a new method, Hybrid Relative Specificity Similarity (HRSS). HRSS outperformed other methods in distinguishing true protein-protein interactions from false ones, and HRSS values were divided into four levels of confidence for protein interactions. In addition, HRSS was statistically the best at obtaining the highest average functional similarity among human-mouse orthologs. Both HRSS and the groupwise measure simGIC showed superior correlation with sequence and Pfam similarities. Because different measures are best suited to different circumstances, we compared two pairwise strategies in the evaluation: the maximum (MAX) and the best-match average (BMA). The former was more effective at inferring physical protein-protein interactions, and the latter was more effective at estimating the functional conservation of orthologs and at analyzing the CESSM datasets. In conclusion, HRSS can be applied to a range of biological problems by quantifying the functional similarity between gene products.
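
The two pairwise strategies compared above are the maximum (MAX) and the best-match average (BMA). Both combine a matrix of term-term similarities into a single score for a pair of gene products. The groupwise simGIC measure works differently: it compares the two products' ancestor-closed term sets directly. The sketch below illustrates all three. The term similarities and IC values are made up, and in practice HRSS or any other term-level measure would supply them. The BMA shown is one common formulation: it averages the mean of the row-wise best matches and the mean of the column-wise best matches. Some papers instead divide the summed best matches by the total number of terms, so treat this formulation as an assumption, not as the exact HRSS implementation. Function names are hypothetical.

```python
from statistics import mean


def max_strategy(sim):
    """MAX: the highest term-term similarity between two gene products.
    sim[i][j] compares term i of product 1 with term j of product 2."""
    return max(max(row) for row in sim)


def bma_strategy(sim):
    """Best-match average (one common formulation): average the mean of the
    row-wise best matches and the mean of the column-wise best matches."""
    row_best = [max(row) for row in sim]
    col_best = [max(col) for col in zip(*sim)]
    return (mean(row_best) + mean(col_best)) / 2


def sim_gic(anc1, anc2, ic):
    """simGIC: summed IC of the shared terms divided by summed IC of all
    terms; anc1/anc2 are the ancestor-closed term sets of the two products."""
    denom = sum(ic[t] for t in anc1 | anc2)
    return sum(ic[t] for t in anc1 & anc2) / denom if denom else 0.0


# Made-up term-term similarities: 3 terms for product 1, 2 terms for product 2.
sim = [
    [0.9, 0.1],
    [0.2, 0.4],
    [0.3, 0.3],
]
print(max_strategy(sim))            # 0.9
print(round(bma_strategy(sim), 4))  # 0.5917

# Made-up IC values and ancestor-closed annotation sets.
ic = {"root": 0.0, "A": 0.3, "B": 0.3, "C": 0.7, "D": 0.7}
print(round(sim_gic({"root", "A", "C"}, {"root", "A", "D"}, ic), 4))  # 0.1765
```

MAX rewards a single strong match, which suits asking whether two proteins share any function relevant to a physical interaction. BMA rewards overall agreement across all annotations, which is closer to what functional conservation between orthologs requires.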

Availability

Latest release: 0.2 (May 2013)
   It is free software and can be used, modified and redistributed without any restrictions.

Documentation: Read online for a quick look, or download the PDF manual for details.

Requirements: a Linux system and a MySQL database.

Installation: The software was compiled on the Linux platform. It includes three folders: 'bin', 'data' and 'results'.

  • Folder 'bin' contains both source files and Perl scripts. Run "make" to compile the software yourself.
  • Folder 'data' contains scripts for running the program and example input files.
  • Folder 'results' contains the result files produced by running the scripts in folder 'data'.

Datasets for the evaluation

The datasets used in the evaluation analyses include the following (see the 'Materials and Methods' section of the paper for details):

      (1) Protein-protein interactions for yeast and human

      (2) 13,430 protein pairs from the CESSM platform

      (3) Human-mouse orthologs

HRSS results for the protein pairs in these datasets are provided.

Download the datasets here: HRSS_datasets.tar.gz.

News

  • Mar. 20, 2014 Updated the documentation (online and PDF versions) and the file 'main.cpp' in the software package to explain the parameter 'evidence_codes_ignored' in more detail.
  • Nov. 19, 2013 Updated the documentation (online and PDF versions) by appending a 'Note' to Module 5, 'hrsspps'.
  • May 31, 2013 Published online at doi:10.1371/journal.pone.0066745.
  • May 2, 2013 (version 0.2) Improved the program to reduce computational time via MySQL when running module 'hrssmatrix', and fixed bugs caused by incompatibilities across different versions of Linux/MySQL/GCC.
  • Jan. 11, 2013 (version 0.1) Initial release.

Citation

If you use HRSS for your research, please cite the following paper:

Wu, X., Pang, E., Lin, K. and Pei, Z.M. (2013) Improving the Measurement of Semantic Similarity between Gene Ontology Terms and Gene Products: Insights from an Edge- and IC-Based Hybrid Method. PLoS One 8(5), e66745. doi:10.1371/journal.pone.0066745

References

  • Wu, X.*, Pang, E., Lin, K. and Pei, Z.M. (2013) Improving the Measurement of Semantic Similarity between Gene Ontology Terms and Gene Products: Insights from an Edge- and IC-Based Hybrid Method. PLoS One 8(5), e66745.
  • Wu, X., Zhu, L., Guo, J., Zhang, D.Y. and Lin, K.* (2006) Prediction of yeast protein-protein interaction network: insights from the Gene Ontology and annotations. Nucleic Acids Research 34(7), 2137-2150.
  • Wu, X., Zhu, L., Guo, J., Fu, C., Zhou, H.J., Dong, D., Li, Z.B., Zhang, D.Y. and Lin, K.* (2006) SPIDer: Saccharomyces protein-protein interaction database. BMC Bioinformatics 7, S16.