Saccharomyces Protein-protein Interaction Database
   
Data contents
Materials
 
Table 1. Materials for the integrated data
| Data | Version | File Name |
|---|---|---|
| Gene Ontology | Mar-06 | go_200603-termdb-data |
| SGD gene annotation | 31-Mar-06 | gene_association.sgd |
| SGD feature | 27-May-06 | SGD_features.tab |
| SGD dbrefs | 27-May-06 | dbxref.tab |
| SGD orf translation sequence | 13-May-06 | orf_trans_all.fasta |
| NCBI yeast genome | 6-Feb-06 | *.gbk |
| DIP | 2-Apr-06 | Scere20060402.mif |
| BIND | 21-May-05 | 20050521.ints.txt, 20050521.refs.txt |
| MIPS evidence scheme | 25-Feb-05 | evidencecat.scheme |
| MIPS complex scheme | 23-Mar-05 | complexcat.scheme |
| MIPS binary interactions | 18-Jan-05 | PPI_180105.tab |
| MIPS complex | 20-Jun-05 | complexcat_data_20062005 |
 
GSP dataset
  METHOD
  • Data filtering criteria
    The following GO terms were filtered out, together with all of their descendant terms:
    (i) in BP ontology, 'biological process unknown' (GO:0000004; including 1482 annotations);
    (ii) in CC ontology, 'cellular component unknown' (GO:0008372; including 879 annotations), 'extracellular matrix' (GO:0031012), 'extracellular region' (GO:0005576; including 22 annotations), 'synapse' (GO:0045202) and 'virion' (GO:0019012).
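
Removing a term together with all of its descendants amounts to a graph traversal over the ontology. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the ontology is available as a hypothetical `children` mapping from a term ID to its direct child term IDs; the `X:...` term IDs and the `P1`..`P3` protein names are made up for illustration.

```python
from collections import deque

def terms_to_filter(roots, children):
    """Return the root terms plus every term reachable below them."""
    excluded = set()
    queue = deque(roots)
    while queue:
        term = queue.popleft()
        if term in excluded:
            continue  # GO is a DAG, so a term can be reached by several paths
        excluded.add(term)
        queue.extend(children.get(term, ()))
    return excluded

def filter_annotations(annotations, excluded):
    """Keep only (protein, term) pairs whose term is not excluded."""
    return [(protein, term) for protein, term in annotations if term not in excluded]

# Toy ontology fragment (hypothetical child terms under 'extracellular region').
children = {
    "GO:0005576": ["X:1", "X:2"],
    "X:1": ["X:3"],
    "X:2": ["X:3"],
}
# Toy annotations (hypothetical proteins and terms).
annotations = [("P1", "X:3"), ("P2", "X:9"), ("P3", "GO:0005576")]

excluded = terms_to_filter(["GO:0005576"], children)
kept = filter_annotations(annotations, excluded)
```

In this toy case only the annotation of `P2` survives, because `X:3` is a descendant of the filtered root and `P3` is annotated to the root itself.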
 
Table 2. Distributions of GO terms and the respective yeast protein annotations
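| Number of | Filtering | CC | BP | CC+BP | Total |
|---|---|---|---|---|---|
| GO terms | Before | 1687 | 10 458 | 0 | 12 145 |
| GO terms | After | 1609 | 10 457 | 0 | 12 066 |
| Terms with assignments | Before | 367 | 1146 | 0 | 1513 |
| Terms with assignments | After | 365 | 1145 | 0 | 1510 |
| Annotated proteins | Before | 5892 | 5893 | 5892 | 5893 |
| Annotated proteins | After | 5007 | 4411 | 4183 | 5235 |
| Annotated protein pairs | Before | 17 354 886 | 17 360 778 | 17 354 886 | 17 360 778 |
| Annotated protein pairs | After | 12 532 521 | 9 726 255 | 8 746 653 | 13 512 123 |
| Associations | Before | 8736 | 10 038 | 0 | 18 774 |
| Associations | After | 7838 | 8548 | 0 | 16 386 |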
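
The protein-pair counts after filtering are consistent with counting unordered pairs, n(n-1)/2, of the annotated proteins in each column, and the Total matches the union of the CC and BP pair sets (inclusion-exclusion). This is an observation from the numbers in Table 2 rather than a description of how the authors computed them. A short sketch that reproduces the "After" row:

```python
def n_pairs(n):
    """Number of unordered pairs among n proteins."""
    return n * (n - 1) // 2

# Annotated proteins after filtering, taken from Table 2.
proteins_after = {"CC": 5007, "BP": 4411, "CC+BP": 4183}

pairs = {column: n_pairs(count) for column, count in proteins_after.items()}

# Pairs annotated in CC or BP: add both sets, subtract the pairs counted twice.
total_pairs = pairs["CC"] + pairs["BP"] - pairs["CC+BP"]
```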
   
 
  • Seven known protein-protein interaction datasets
    Please refer to the paper (Wu et al. 2006) for details.
 
Table 3. Numbers of proteins and interactions of the seven known interaction datasets
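| Number of | Level | Gavin | Ho | Ito | Uetz | MIPS Complexes | MIPS Interactions | de Lichtenberg | Total |
|---|---|---|---|---|---|---|---|---|---|
| Proteins | original | 1363 | 1570 | 3244 | 988 | 1223 | 1246 | 299 | 4768 |
| Proteins | CC | 1328 | 1448 | 2377 | 806 | 1175 | 1194 | 279 | 3904 |
| Proteins | BP | 1256 | 1338 | 2052 | 701 | 1169 | 1193 | 267 | 3485 |
| Proteins | CC+BP | 1236 | 1289 | 1848 | 662 | 1165 | 1171 | 258 | 3310 |
| Interactions | original | 3225 | 3603 | 4426 | 932 | 10 955 | 1828 | 711 | 22 885 |
| Interactions | CC | 3163 | 3395 | 3057 | 727 | 10 898 | 1715 | 676 | 20 914 |
| Interactions | BP | 3066 | 3143 | 2736 | 637 | 10 808 | 1715 | 663 | 20 069 |
| Interactions | CC+BP | 3027 | 3054 | 2376 | 592 | 10 765 | 1690 | 643 | 19 493 |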
   
 
  • Relative Specificity Similarity (RSS) of two proteins annotated in a GO
    Please refer to the paper (Wu et al. 2006) for details.
   
 
  • Statistical significance of protein pairs falling in various levels of RSS values
    Please refer to the paper (Wu et al. 2006) for details.

   
  RESULTS
 
  • Distribution of all pairs of annotated proteins according to their RSS values
 
Figure 1. Distributions of the annotated protein pairs with various RSS values in the CC (A) and BP (B) ontologies.

 

Figure 2. Statistical significance of the quality scoring system in the CC (A) and BP (B) ontologies.

Based on the distribution of Z-scores for the CC and BP ontologies, the 11 categories of RSS_CC can be roughly divided into three groups: high confidence (H; 0.8 < RSS_CC <= 1), medium confidence (M; 0.5 < RSS_CC <= 0.8) and low confidence (L; 0 <= RSS_CC <= 0.5). RSS_BP can likewise be split into three groups: high confidence (H; 0.8 < RSS_BP <= 1), medium confidence (M; 0.4 < RSS_BP <= 0.8) and low confidence (L; 0 <= RSS_BP <= 0.4). Crossing the three CC groups with the three BP groups gives nine data segments (DSs), which together contain 4183 proteins and 8 746 653 protein pairs.

Figure 3. Nine data segments (DSs) with different confidences related to CC and BP ontologies (A) and the selection of positives and negatives, as well as GSPs and GSNs (B).
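
The thresholds above can be expressed as a small classifier that assigns a protein pair to one of the nine data segments. This is an illustrative sketch only: the function names and the representation of a data segment as a (CC level, BP level) tuple are assumptions, not the labelling used in the paper.

```python
def confidence(rss, medium_cutoff, high_cutoff=0.8):
    """Map an RSS value in [0, 1] to 'H', 'M' or 'L'."""
    if not 0.0 <= rss <= 1.0:
        raise ValueError("RSS must lie between 0 and 1")
    if rss > high_cutoff:
        return "H"
    if rss > medium_cutoff:
        return "M"
    return "L"

def data_segment(rss_cc, rss_bp):
    """Return the (CC level, BP level) segment of a protein pair."""
    # Medium/low boundary is 0.5 for CC and 0.4 for BP, as stated in the text.
    return (confidence(rss_cc, 0.5), confidence(rss_bp, 0.4))

example = data_segment(0.9, 0.45)  # high in CC, medium in BP
```

Note that the boundaries are inclusive on the upper side, so a value of exactly 0.8 falls in the medium group and a value of exactly 0.5 (CC) or 0.4 (BP) falls in the low group.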
 
  • Gold standard positive and negative protein interaction datasets
 

Our high-quality dataset, the 'valid experimental interactions' (VEIs), is composed of MIPS complexes, MIPS small-scale physical interactions, and the integrated interactions of de Lichtenberg et al. (2005). The VEIs comprise 12 062 interactions among 1807 proteins (Figure 4).

Figure 4. Distribution of interactions in three existing high-quality datasets: MIPS Complexes (in drab olive), MIPS Interactions (in blue), and de Lichtenberg (in saffron yellow).

 

Figure 5. Distribution of the numbers of valid experimental interactions (VEIs) covered by each of the nine DSs (A), and statistical significance of VEIs in nine DSs using Z-score analysis (B).
 
  • Assessment of four known genome-scale experimental datasets
 

The assessment results agree with previous studies (Edwards et al. 2002; von Mering et al. 2002).

Figure 6. The sizes of four different genome-scale datasets and the rates of interactions covered by positive and negative datasets (A), as well as gold standard datasets (B).
 
  • Yeast protein-protein interaction network
 

The yeast protein-protein interaction network reconstructed from GSPs comprises 92 257 interactions among 3600 proteins. The whole network consists of 23 connected components (Table 4).

Table 4. Numbers of proteins and interactions of the 23 connected components

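| Connected component ID | Number of proteins (dynamic) | Number of interactions |
|---|---|---|
| 1 | 3527 (302) | 92 084 |
| 2 | 20 | 130 |
| 3 | 4 | 6 |
| 4 | 4 (1) | 6 |
| 5 | 3 | 3 |
| 6 | 3 (1) | 3 |
| 7 | 3 | 3 |
| 8 | 3 | 3 |
| 9 | 4 | 3 |
| 10 | 3 | 3 |
| 11 | 2 | 1 |
| 12 | 2 | 1 |
| 13 | 2 | 1 |
| 14 | 2 (1) | 1 |
| 15 | 2 | 1 |
| 16 | 2 | 1 |
| 17 | 2 | 1 |
| 18 | 2 | 1 |
| 19 | 2 | 1 |
| 20 | 2 (1) | 1 |
| 21 | 2 | 1 |
| 22 | 2 | 1 |
| 23 | 2 | 1 |
| Total | 3600 (306) | 92 257 |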
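
Per-component protein and interaction counts like those in the table can be obtained from an edge list with a standard graph traversal. The sketch below is a generic depth-first search on a toy network with made-up protein names. It is not the procedure used to build SPIDer, and it assumes there are no self-interactions in the input.

```python
from collections import defaultdict

def connected_components(edges):
    """Return (n_proteins, n_interactions) per component, largest first."""
    adjacency = defaultdict(set)  # sets make duplicate edges harmless
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        nodes = set()
        while stack:
            node = stack.pop()
            nodes.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # Each undirected edge is counted once from each endpoint.
        n_edges = sum(len(adjacency[node]) for node in nodes) // 2
        components.append((len(nodes), n_edges))
    return sorted(components, reverse=True)

# Toy network: one triangle and one isolated pair (hypothetical proteins).
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]
summary = connected_components(toy_edges)
```

For the toy network, `summary` lists a component of 3 proteins with 3 interactions followed by a component of 2 proteins with 1 interaction.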

 
Cross references to other interaction datasets
 
SPIDer links its predicted protein-protein interactions to four related datasets derived from three databases: DIP, BIND and MIPS. DIP and BIND are both important databases of protein-protein interaction information. MIPS provides all annotated and genome-scale protein interactions and also compiles data on protein complexes, which are converted to binary interactions using the matrix model. The releases of these datasets are listed in Materials.
     In total, 15 296 interactions in GSPs have at least one cross reference to the four datasets; 2951, 1431, 1135 and 14 452 of them have cross references to DIP, BIND, MIPS binary interactions and MIPS complexes, respectively.
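
Under the matrix model, every pair of members of a complex is treated as a binary interaction. A minimal sketch of that conversion follows; the complex IDs and protein names are hypothetical, and the actual MIPS file parsing is not shown.

```python
from itertools import combinations

def matrix_model(complexes):
    """Expand complexes into the set of all within-complex protein pairs."""
    pairs = set()
    for members in complexes.values():
        # Sorting gives each unordered pair a single canonical form,
        # so a pair shared by two complexes is stored only once.
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs

# Toy input: two overlapping complexes (hypothetical IDs and members).
toy_complexes = {
    "c1": ["A", "B", "C"],
    "c2": ["B", "C", "D"],
}
binary = matrix_model(toy_complexes)
```

A complex of n members yields n(n-1)/2 pairs, which explains why complex-derived datasets are much larger than the number of complexes suggests. Here the two three-member complexes give five distinct pairs because the pair (B, C) is shared.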

 
References
 
  • de Lichtenberg, U., Jensen, L.J., Brunak, S. and Bork, P. (2005) Dynamic complex formation during the yeast cell cycle. Science, 307, 724-727.
  • Edwards, A.M., Kus, B., Jansen, R., Greenbaum, D., Greenblatt, J. and Gerstein, M. (2002) Bridging structural biology and genomics: assessing protein interaction data with known complexes. Trends Genet, 18, 529-536.
  • von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S. and Bork, P. (2002) Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417, 399-403.
  • Wu, X., Zhu, L., Guo, J., Zhang, D.Y. and Lin, K. (2006) Prediction of yeast protein-protein interaction network: insights from the Gene Ontology and annotations. Nucleic Acids Res, 34, 2137-2150.

Laboratory of Computational Molecular Biology. Beijing Normal University. Thursday, September 21 2017. Beijing, China