Data contents |
Materials |
Table 1. Materials for the integrated data |
Data | Version | File Name |
Gene Ontology | Mar-06 | go_200603-termdb-data |
SGD gene annotation | 31-Mar-06 | gene_association.sgd |
SGD feature | 27-May-06 | SGD_features.tab |
SGD dbrefs | 27-May-06 | dbxref.tab |
SGD orf translation sequence | 13-May-06 | orf_trans_all.fasta |
NCBI yeast genome | 6-Feb-06 | *.gbk |
DIP | 2-Apr-06 | Scere20060402.mif |
BIND | 21-May-05 | 20050521.ints.txt, 20050521.refs.txt |
MIPS evidence scheme | 25-Feb-05 | evidencecat.scheme |
MIPS complex scheme | 23-Mar-05 | complexcat.scheme |
MIPS binary interactions | 18-Jan-05 | PPI_180105.tab |
MIPS complex | 20-Jun-05 | complexcat_data_20062005 |
GSP dataset |
METHOD |
- Data filtering criteria
The following GO terms, together with all of their descendant terms, were filtered out:
(i) in the BP ontology, 'biological process unknown' (GO:0000004; 1482 annotations);
(ii) in the CC ontology, 'cellular component unknown' (GO:0008372; 879 annotations), 'extracellular matrix' (GO:0031012), 'extracellular region' (GO:0005576; 22 annotations), 'synapse' (GO:0045202) and 'virion' (GO:0019012).
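The filtering step above can be sketched as a traversal of the ontology graph followed by a filter over the annotations. This is only an illustrative sketch, not the procedure used for SPIDer: the `children` mapping, the child term IDs (GO:9000001 to GO:9000003) and the protein names P1 to P3 are hypothetical placeholders, and only the two root IDs come from the list above.

```python
from collections import deque


def collect_with_descendants(roots, children):
    """Return the root terms plus every term reachable through child links."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def filter_annotations(annotations, excluded_terms):
    """Keep only (protein, term) pairs whose term is not excluded."""
    return [(protein, term) for protein, term in annotations
            if term not in excluded_terms]


# Hypothetical toy ontology: parent term -> list of child terms.
children = {
    "GO:0005576": ["GO:9000001"],   # placeholder child of 'extracellular region'
    "GO:9000001": ["GO:9000002"],   # placeholder grandchild
}

# Hypothetical toy annotations (protein, GO term).
annotations = [
    ("P1", "GO:9000002"),   # descendant of an excluded term -> dropped
    ("P2", "GO:9000003"),   # unrelated placeholder term -> kept
    ("P3", "GO:0008372"),   # 'cellular component unknown' -> dropped
]

excluded = collect_with_descendants(["GO:0008372", "GO:0005576"], children)
kept = filter_annotations(annotations, excluded)
print(kept)
```

A breadth-first walk is used here so that a term reachable through several parents is visited only once, which matters because GO is a directed acyclic graph rather than a tree.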
Table 2. Distributions of GO terms and the respective yeast protein annotations |
Number of | Filtering | CC | BP | CC+BP | Total |
GO Terms | Before | 1687 | 10 458 | 0 | 12 145 |
GO Terms | After | 1609 | 10 457 | 0 | 12 066 |
Terms with assignments | Before | 367 | 1146 | 0 | 1513 |
Terms with assignments | After | 365 | 1145 | 0 | 1510 |
Annotated proteins | Before | 5892 | 5893 | 5892 | 5893 |
Annotated proteins | After | 5007 | 4411 | 4183 | 5235 |
Annotated protein pairs | Before | 17 354 886 | 17 360 778 | 17 354 886 | 17 360 778 |
Annotated protein pairs | After | 12 532 521 | 9 726 255 | 8 746 653 | 13 512 123 |
Associations | Before | 8736 | 10 038 | 0 | 18 774 |
Associations | After | 7838 | 8548 | 0 | 16 386 |
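The protein and pair counts in Table 2 appear to be related by simple arithmetic: each pair count matches the number of unordered pairs among the annotated proteins, and each Total matches CC plus BP minus CC+BP. The source does not state these relations explicitly, so the short check below is an observation about the "After" rows, with all numbers copied from the table.

```python
from math import comb

# "After" filtering values copied from Table 2.
proteins_after = {"CC": 5007, "BP": 4411, "CC+BP": 4183, "Total": 5235}
pairs_after = {"CC": 12_532_521, "BP": 9_726_255,
               "CC+BP": 8_746_653, "Total": 13_512_123}


def unordered_pairs(n):
    """Number of distinct unordered pairs among n proteins: n * (n - 1) / 2."""
    return comb(n, 2)


def union_size(in_cc, in_bp, in_both):
    """Inclusion-exclusion: items in CC or BP, counting the overlap once."""
    return in_cc + in_bp - in_both


for column in ("CC", "BP", "CC+BP"):
    print(column, unordered_pairs(proteins_after[column]) == pairs_after[column])

print(union_size(5007, 4411, 4183) == proteins_after["Total"])
print(union_size(12_532_521, 9_726_255, 8_746_653) == pairs_after["Total"])
```

Note that the Total pair count is the union of the CC and BP pair sets, not the number of pairs among the 5235 proteins in the Total column.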
- Seven known
protein-protein interaction datasets
Please refer to the paper (Wu et al. 2006) for details.
Table 3. Numbers of proteins and interactions of the seven known interaction datasets |
Number of | Level | Gavin | Ho | Ito | Uetz | MIPS Complexes | MIPS Interactions | de Lichtenberg | Total |
Proteins | original | 1363 | 1570 | 3244 | 988 | 1223 | 1246 | 299 | 4768 |
Proteins | CC | 1328 | 1448 | 2377 | 806 | 1175 | 1194 | 279 | 3904 |
Proteins | BP | 1256 | 1338 | 2052 | 701 | 1169 | 1193 | 267 | 3485 |
Proteins | CC+BP | 1236 | 1289 | 1848 | 662 | 1165 | 1171 | 258 | 3310 |
Interactions | original | 3225 | 3603 | 4426 | 932 | 10 955 | 1828 | 711 | 22 885 |
Interactions | CC | 3163 | 3395 | 3057 | 727 | 10 898 | 1715 | 676 | 20 914 |
Interactions | BP | 3066 | 3143 | 2736 | 637 | 10 808 | 1715 | 663 | 20 069 |
Interactions | CC+BP | 3027 | 3054 | 2376 | 592 | 10 765 | 1690 | 643 | 19 493 |
- Relative
Specificity Similarity (RSS) of two proteins annotated in a
GO
Please refer to the paper (Wu et al. 2006) for details.
|
|
|
|
- Statistical
significance of protein pairs falling in various levels of RSS
values
Please refer to the paper (Wu et al. 2006) for details.
|
|
RESULTS |
|
- Distribution
of all pairs of annotated proteins according to their RSS values
|
|
|
Figure 1. Distributions of the annotated protein pairs with various RSS values in the CC (A) and BP (B) ontologies. |
Figure 2. Statistical significance of the quality scoring system in the CC (A) and BP (B) ontologies. |
Based on the distribution of Z-scores for the CC and BP ontologies, the 11 categories of RSSCC can be divided roughly into three groups: high confidence (H; 0.8 < RSSCC <= 1), medium confidence (M; 0.5 < RSSCC <= 0.8) and low confidence (L; 0 <= RSSCC <= 0.5). RSSBP can likewise be split into three groups: high confidence (H; 0.8 < RSSBP <= 1), medium confidence (M; 0.4 < RSSBP <= 0.8) and low confidence (L; 0 <= RSSBP <= 0.4). Combining the three RSSCC groups with the three RSSBP groups yields nine data segments (DSs), which together contain 4183 proteins and 8 746 653 protein pairs.
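The thresholds above can be read as a small classification rule. The following is a minimal sketch under the assumption that a data segment is labelled by its CC group followed by its BP group (for example "HM"); the function names and the two-letter labels are illustrative and are not taken from the paper.

```python
HIGH_FLOOR = 0.8          # H: 0.8 < RSS <= 1 in both ontologies
CC_MEDIUM_FLOOR = 0.5     # M in CC: 0.5 < RSSCC <= 0.8
BP_MEDIUM_FLOOR = 0.4     # M in BP: 0.4 < RSSBP <= 0.8


def confidence_group(rss, medium_floor):
    """Map an RSS value in [0, 1] to 'H', 'M' or 'L'."""
    if not 0.0 <= rss <= 1.0:
        raise ValueError(f"RSS must lie in [0, 1], got {rss}")
    if rss > HIGH_FLOOR:
        return "H"
    if rss > medium_floor:
        return "M"
    return "L"


def data_segment(rss_cc, rss_bp):
    """Label a protein pair by its CC group followed by its BP group."""
    return (confidence_group(rss_cc, CC_MEDIUM_FLOOR)
            + confidence_group(rss_bp, BP_MEDIUM_FLOOR))


# The nine possible segment labels (3 CC groups x 3 BP groups).
ALL_SEGMENTS = [cc + bp for cc in "HML" for bp in "HML"]

print(data_segment(0.9, 0.9))   # both values above 0.8
print(data_segment(0.8, 0.4))   # both values sit exactly on an upper bound
print(ALL_SEGMENTS)
```

Because the intervals are open on the left and closed on the right, a value that sits exactly on a boundary such as 0.8 falls into the lower group.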
|
Figure 3. Nine data segments (DSs) with different confidences related to CC and BP ontologies (A) and the selection of positives and negatives, as well as GSPs and GSNs (B).
|
|
|
- Gold standard
positive and negative protein interaction datasets
|
|
Our high-quality dataset, the 'valid experimental interactions (VEIs)', is composed of MIPS complexes, MIPS small-scale physical interactions, and the integrated interactions of de Lichtenberg et al. (2005). The VEIs comprise 12 062 interactions among 1807 proteins (Figure 4).
|
Figure 4. Distribution of interactions in three existing high-quality datasets: MIPS Complexes (in drab olive), MIPS Interactions (in blue), and de Lichtenberg (in saffron yellow). |
Figure 5. Distribution of the numbers of valid experimental interactions (VEIs) covered by each of the nine DSs (A), and statistical significance of VEIs in nine DSs using Z-score analysis (B). |
|
|
- Assessment
of four known genome-scale experimental datasets
|
|
The assessment results agree with previous studies (Edwards et al. 2002; von Mering et al. 2002).
|
Figure 6. The sizes of four different genome-scale datasets and the rates of interactions covered by positive and negative datasets (A), as well as gold standard datasets (B). |
|
|
|
|
The yeast protein-protein interaction network reconstructed from GSPs comprises 92 257 interactions among 3600 proteins. The whole network consists of 23 connected components (Table 3).
Table 3. Numbers of proteins and interactions of the 23 connected components |
Connected component ID | Number of proteins (dynamic) | Number of interactions |
1 | 3527 (302) | 92 084 |
2 | 20 | 130 |
3 | 4 | 6 |
4 | 4 (1) | 6 |
5 | 3 | 3 |
6 | 3 (1) | 3 |
7 | 3 | 3 |
8 | 3 | 3 |
9 | 4 | 3 |
10 | 3 | 3 |
11 | 2 | 1 |
12 | 2 | 1 |
13 | 2 | 1 |
14 | 2 (1) | 1 |
15 | 2 | 1 |
16 | 2 | 1 |
17 | 2 | 1 |
18 | 2 | 1 |
19 | 2 | 1 |
20 | 2 (1) | 1 |
21 | 2 | 1 |
22 | 2 | 1 |
23 | 2 | 1 |
Total | 3600 (306) | 92 257 |
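Per-component protein and interaction counts like those tabulated above can be derived from a plain list of interacting pairs. The sketch below uses a union-find pass on a toy edge list; the source does not say which algorithm or tool was used for SPIDer, and the protein names A to E are placeholders.

```python
def connected_components(edges):
    """Group nodes into connected components using union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]   # path halving
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    # Largest component first, as in the table.
    return sorted(groups.values(), key=len, reverse=True)


def component_summary(edges):
    """Return (number of proteins, number of interactions) per component."""
    summary = []
    for members in connected_components(edges):
        n_edges = sum(1 for a, b in edges if a in members and b in members)
        summary.append((len(members), n_edges))
    return summary


# Toy interaction list with placeholder protein names.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]
print(component_summary(toy_edges))
```

Counting edges per component with a nested scan is quadratic in the worst case; for a network of this size it would be more practical to tally edges by the root of one endpoint instead.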
Cross references to other interaction datasets |
|
SPIDer links its predicted protein-protein interactions to four related datasets derived from three databases: DIP, BIND and MIPS. DIP and BIND are both major databases of protein-protein interaction information. MIPS provides annotated and genome-scale protein interactions and also compiles data on protein complexes, which are converted to binary interactions using the matrix model. The releases of these datasets are listed in Materials.
In total, 15 296 interactions in GSPs have a cross reference to at least one of the four datasets. Of these, 2951, 1431, 1135 and 14 452 interactions have cross references to DIP, BIND, MIPS binary interactions and MIPS complexes, respectively.
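Under the matrix model, every pair of proteins within a complex is treated as interacting, so a complex of n members expands to n(n-1)/2 binary interactions. The sketch below shows that expansion and a simple cross-reference lookup; the dataset contents and the protein names P1 to P6 are hypothetical, and the code is not SPIDer's actual implementation.

```python
from itertools import combinations


def pair(a, b):
    """Order-independent key for an interaction between two proteins."""
    return frozenset((a, b))


def matrix_model(members):
    """Expand one complex into all pairwise (binary) interactions."""
    return {pair(a, b) for a, b in combinations(sorted(set(members)), 2)}


def cross_reference(predicted, datasets):
    """Map each predicted pair to the names of the datasets that contain it."""
    hits = {}
    for interaction in predicted:
        sources = [name for name, pairs in datasets.items()
                   if interaction in pairs]
        if sources:
            hits[interaction] = sources
    return hits


# Hypothetical toy datasets.
datasets = {
    "MIPS complexes": matrix_model(["P1", "P2", "P3"]),
    "DIP": {pair("P1", "P4")},
}
predicted = [pair("P2", "P1"), pair("P1", "P4"), pair("P5", "P6")]

hits = cross_reference(predicted, datasets)
print(len(hits), "of", len(predicted), "predicted pairs are cross-referenced")
```

Using a frozenset as the key makes the lookup independent of the order in which the two proteins are listed, which is why the pair written as (P2, P1) still matches the complex-derived pair (P1, P2).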
References |
|
- de Lichtenberg, U., Jensen, L.J., Brunak, S. and Bork, P. (2005) Dynamic complex formation during the yeast cell cycle. Science, 307, 724-727.
- Edwards, A.M., Kus, B., Jansen, R., Greenbaum, D., Greenblatt, J. and Gerstein, M. (2002) Bridging structural biology and genomics: assessing protein interaction data with known complexes. Trends Genet, 18, 529-536.
- von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S. and Bork, P. (2002) Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417, 399-403.
- Wu, X., Zhu, L., Guo, J., Zhang, D.Y. and Lin, K. (2006) Prediction of yeast protein-protein interaction network: insights from the Gene Ontology and annotations. Nucleic Acids Res, 34, 2137-2150.
|
|