Eurographics Workshop on 3D Object Retrieval (2010)
I. Pratikakis, M. Spagnuolo, T. Theoharis, and R. Veltkamp (Editors)
SHREC’10 Track: Protein Models
L. Mavridis,1 V. Venkatraman,1 D. W. Ritchie,1 N. Morikawa,2 R. Andonov,3 A. Cornu,3 N. Malod-Dognin,3
J. Nicolas,3 M. Temerinac-Ott,4 M. Reisert,4 H. Burkhardt,4 A. Axenopoulos,5 P. Daras5
1 ORPAILLEUR / INRIA Nancy - Grand Est, France
2 GENOCRIPT, Japan
3 SYMBIOSE, IRISA / INRIA Rennes, France
4 Albert-Ludwig University Freiburg, Germany
5 Informatics & Telematics Institute Thessaloniki, Greece
Abstract
This paper presents the results of the SHREC’10 Protein Models Classification Track. The aim of this track is
to evaluate how well 3D shape recognition algorithms can classify protein structures according to the CATH
[CSL∗ 08] superfamily classification. Five groups participated in this track, using a total of six methods, and for
each method a set of ranked predictions was submitted for each classification task. The evaluation of each method
is based on the nearest neighbour and area under the curve (AUC) metrics.
Categories and Subject Descriptors (according to ACM CCS): I.3.5 [Computer Graphics]: Curve, surface, solid, and
object representations—Geometric algorithms, languages, and systems
1. Introduction
The specific shapes of protein molecules are central to their
biological function. Conventional approaches to compare
and classify proteins usually work with their amino acid
sequences (e.g. BLAST [AGM∗ 90] and FASTA [LP85]).
However, in Nature, the 3D structures of proteins are often more conserved than their sequences. Hence, structural
alignments can provide significant insights about protein
function and can help classify protein families into functional super-families [HS95].
Currently, the most widely used protein structure classification systems are CATH [CSL∗ 08] and SCOP [MBHC95],
both of which are curated by human experts. In CATH, the
classification is initially performed using the SSAP [OT96]
structural alignment tool, whereas SCOP relies more on visual inspection by the curators. However, with the rapid
growth in the number of three-dimensional (3D) protein structures in
the Protein Data Bank (PDB [BWF∗ 00]), it would be desirable to be able to assemble and update structural classifications in a more automated way.
c The Eurographics Association 2010.
2. Task
The task of this track is to classify protein structures according to their CATH superfamilies. Five groups (listed in the
order in which they registered) participated in this track, and
each group was initially provided with a data set of 1000 proteins, selected by the track organisers, with which they could
train or prepare their algorithms. All information about the
nature of the proteins and their primary amino acid sequence
information was masked to prevent the participants from using such knowledge in conventional protein sequence analysis software. Five days before the deadline for the track,
50 further protein structures (Figure 1) were made available
to be used as queries against the initial set. The participants
were asked to rank the initial dataset in order of similarity
to each of the query proteins. Thus each group was asked to
submit 50 ranked lists for each similarity method used.
3. Data
Using CATH version 3.3, the track organisers assembled a
dataset of 1000 protein structures from 100 CATH superfamilies, where each superfamily consisted of at least 10 structures, and where each structure contained at least 50 amino
acids. From 50 of the superfamilies, one additional member
was selected at random to serve as a query structure. The
protein file names and protein sequence information were
masked to try to prevent the participants from using conventional protein sequence matching techniques. Hence, the
supplied data files included only the x, y and z coordinates
and radii for the atoms within each protein. A simple table
was also provided which associates each given protein structure file with a synthetic superfamily name (e.g. “F001”), as
shown in Table 1.
[Figure 1 image; CATH labels of the 50 query structures, in query order:]
1tteA02 2vrnA00 1su1A00 2o04A00 1brtA00 3c2bA02 2a1uA01 1dbhA02 1t9bA03 1w9cA00
2zmfA00 1wwjA00 2gnnB00 1c9bA01 1n27A00 2pw9A02 1cwvA05 1y93A00 2jhfA01 1vk4A00
1jnsA00 1r6jA00 1o6lA01 1jkvA01 3c0wA01 2ffyA00 3bjdA02 1kl9A01 1kvkA01 1iicA02
1jb0C00 1jnmA00 2a8xA03 2v95A02 1r5tB00 1uwwA00 1xtzA02 2jq5A00 2ov0A00 3c4aA01
1or4B00 1peaA02 1nyaA00 1j0pA00 1oheA02 1jftA01 1uarA02 3bioA02 3bfpA02 1xubA01
Figure 1: “Ribbon cartoon” representations of the 50 protein structures used as queries in the evaluation (in numerical query
order from top left to bottom right). Each protein is labelled according to the CATH naming scheme.
Protein File    Protein Family
P0001.pdb       F001
P0002.pdb       F001
...             ...
P0500.pdb       F050
...             ...
P0999.pdb       F100
P1000.pdb       F100
Table 1: An extract of the classification file, which was provided with the initial data set.
Method Name                 Participants
3DBlast                     L. Mavridis and D.W. Ritchie
3DZernike                   V. Venkatraman
GENOCRIPT                   N. Morikawa
Contact Maps                R. Andonov, A. Cornu, N. Malod-Dognin, and J. Nicolas
Group Integration           M. Temerinac-Ott, M. Reisert, and H. Burkhardt
Spherical Trace Transform   A. Axenopoulos and P. Daras
Table 2: Participating groups and methods.
4. Evaluation
For the evaluation, the participants were asked to provide a
ranked list for each of the query proteins against the 1000
protein dataset. Using these ranked lists, the performance of
each method was measured in two different ways.
• Nearest neighbour: If the first protein of each ranked list
was found to be a member of the same CATH superfamily
as the query, this was counted as a correct prediction. The
overall percentage of correct predictions was calculated
over the 50 queries submitted by each group.
• ROC plot (Receiver Operating Characteristic [Ega75]):
By construction, each ranked list contained 10 true positives (TPs) and 990 true negatives (TNs). In order to measure the overall ability of each method to distinguish the
TPs from the TNs, the list was traversed sequentially and
the rate of TPs (TPR) against the rate of FPs (FPR) was
plotted. The area under the curve (AUC) of each ROC plot
was calculated to give a single numerical performance
measure. A perfect prediction would consist of a list of
10 TPs followed by 990 TNs, giving an AUC of 1.0.
5. Methods
Brief descriptions of the methods are provided in this section.
5.1. Spherical Polar Fourier Shape Density Functions
(SPF) by L. Mavridis and D.W. Ritchie
In the SPF approach, protein shapes are represented as 3D
density functions expressed as expansions of orthonormal
basis functions:
$$ \rho(\mathbf{r}) = \sum_{n=1}^{N} \sum_{l=0}^{n-1} \sum_{m=-l}^{l} a_{nlm} R_{nl}(r) y_{lm}(\vartheta, \varphi) \qquad (1) $$
where N is the order of the expansion, R_{nl}(r) are Laguerre-Gaussian radial functions, y_{lm}(ϑ, ϕ) are spherical harmonics, and a_{nlm} are the expansion coefficients which are calculated numerically as described previously [RK00]. Figure 2
shows the SPF representations of a pair of similar nitrogenase domains at several expansion orders. For this track, we
used expansions to order N = 25 for all calculations.
Figure 2: The superposition of a pair of nitrogenase proteins, shown as ribbon cartoons (left), backbone traces (middle), and as 3D SPF density expansions to order N=25
(right). Top row: PDB code 2MIN, from Azotobacter vinelandii; middle
row: PDB code 1MIO; bottom row: their superposed orientation. The two proteins have a sequence identity of 43%.
In order to superpose a pair of protein structures we calculate a rotation-dependent Carbo-like similarity score S_{ROT}
using:
$$ S_{ROT} = \frac{\sum_{nlm}^{N} a_{nlm} b_{nlm}}{\left[ \sum_{nlm}^{N} a_{nlm}^{2} \right]^{1/2} \left[ \sum_{nlm}^{N} b_{nlm}^{2} \right]^{1/2}} \qquad (2) $$
Conceptually, one protein is held fixed and a six-dimensional (6D) rotational/translational search over positions
of the second protein is performed. However, in practice it
is more efficient to implement the search using one translational and five Euler angle rotational coordinates [RK00].
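As a rough illustration of Equation (2), the following sketch evaluates the normalised overlap of two coefficient vectors. It assumes real-valued coefficients that have already been flattened into plain Python lists in the same (n, l, m) order; the function names are hypothetical, and the rotational/translational search itself is not modelled. The helper n_coefficients only counts the terms of the triple sum in Equation (1).

```python
import math


def carbo_score(a, b):
    """Carbo-like similarity: dot product divided by the product of the norms."""
    if len(a) != len(b):
        raise ValueError("coefficient vectors must have the same length")
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if denominator == 0.0:
        raise ValueError("coefficient vectors must be non-zero")
    return numerator / denominator


def n_coefficients(order):
    """Count (n, l, m) triples with 1 <= n <= order, 0 <= l < n, -l <= m <= l."""
    return sum(2 * l + 1 for n in range(1, order + 1) for l in range(n))
```

Under these assumptions, an expansion to order N = 25 has n_coefficients(25) = 5525 terms per protein, and identical (or proportional) coefficient vectors score 1.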
5.2. 3DZernike by V. Venkatraman
3D Zernike descriptors [NK04, LLL∗08], an extension of
spherical harmonics, have been used for molecular shape
retrieval and more recently for protein-protein docking
[VLYK09]. A key point in favour of this representation
is its rotational invariance, combined with a compact shape representation to an arbitrary expansion order.
Mathematical and implementation details can be found in
the papers by Novotni and Klein [NK04] and Mak et al.
[MGM08]. For the current protein classification task, the following procedure was used:
Surface Generation: Molecular surfaces for the proteins
were generated using the MSMS software [SOS96].
Binary Voxelization: The program binvox [Min] was used
to produce a binary voxel grid (voxel dimension set to
128).
Zernike moments: Software provided by Novotni and
Klein [Nov] was used to calculate Zernike moments up to
an expansion order N = 20. Each protein is thus represented by a vector of 121 coefficients.
Similarity Measurement: Two protein shapes A and B represented by their respective Zernike moments
were compared using the Euclidean metric
$$ d = \sqrt{\sum_{i=1}^{121} (A_i - B_i)^2} $$
5.3. GENOCRIPT / D2 encoding by N. Morikawa
We searched the dataset of 1000 protein structures for proteins structurally similar to each of the 50 query
structures in the following three steps. First, the “CA”
(or α-Carbon atom) traces of the proteins were extracted
from the supplied data files by considering the pattern of
atom radii. Next, the D2 codes of the 1000 protein structures were computed by the program "ProteinEncoder" and
saved in a ".code" file (392 KB): target_SHREC2010.code.
The D2 codes of the 50 query structures were computed in the same way: query_structure.code. Then, the dataset search
was carried out with the program "ComSubstruct", which computes the length of the longest common subsequence of
two D2 codes. For example, the top 100 D2 code-similar
fragments are obtained by typing the following command:
“ComSubstruct -l -o1 -s -w1.1 -b100 query_structure.code
target_SHREC2010.code”. Because more than one fragment
may correspond to a protein, the top-most fragment was
chosen for each protein to obtain the ranked list of protein
names. See below for more detail. The programs ProteinEncode and ComSubstruct are available from http://www.
genocript.com.
5.3.1. Extraction of the CA trace of a protein
To identify the main-chain fragments of N-CA-C atoms, the
supplied data files were examined for the atom radius pattern of 1.70-2.00-1.74. Only the CA atoms of the N-CA-C
fragments are considered in our method.
5.3.2. D2 encoding of local protein structures
We used a discrete differential geometrical technique called
“D2 encoding” to analyse local protein structures, where the
conformations of all five-CA fragments (i.e. fragments of five
CA atoms) of a protein are encoded using a five-tetrahedron
sequence [Mor07]. First, the conformation of each five-CA
fragment is represented by a folded sequence of five tetrahedrons. Next, the corresponding (0,1)-valued sequence of
length five, which is denoted as a base-32 number, is assigned to the center CA atom of the fragment. Then, we obtain a description of the conformation of a protein by arranging the base-32 numbers in the order that the corresponding CA
atoms appear in the CA trace. The base-32 number sequence
is called the D2 code of a protein.
5.3.3. Dataset search by ComSubStruct
One of the simplest measures of sequence similarity is the
length of the longest common subsequence (LCS). We used
the length of the LCSs of two D2 codes to quantify the differences between two protein backbone conformations. The
width of the comparison window was set to the product of 1.1
and the length of the shorter sequence using the "-w" option.
The width of the slide step was then the product of 0.1 and the
length of the shorter sequence.
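The LCS length itself can be computed with the textbook dynamic programme, sketched below. This is not the ComSubstruct implementation: the function name is hypothetical, the inputs are assumed to be D2 codes given as strings or lists of symbols, and the sliding comparison window described above is not modelled.

```python
def lcs_length(code_a, code_b):
    """Length of the longest common subsequence of two symbol sequences."""
    # Only the previous row of the DP table is kept, so memory is O(len(code_b)).
    previous = [0] * (len(code_b) + 1)
    for symbol_a in code_a:
        current = [0]
        for j, symbol_b in enumerate(code_b, start=1):
            if symbol_a == symbol_b:
                # Extend the best subsequence ending before both symbols.
                current.append(previous[j - 1] + 1)
            else:
                # Otherwise drop one symbol from either sequence.
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]
```

The running time is proportional to the product of the two code lengths, which is one reason a windowed comparison can be attractive for long chains.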
5.3.4. Sorting structures
Protein names are ranked based on the length of the LCS. The
length of a protein sequence is used for tie-break purposes
(the shorter, the better). The similarity scores are obtained
by dividing the LCS-length by the protein-length. Because the D2 code of a protein is four symbols shorter than its CA trace, the maximum value is (protein-length - 4) / protein-length.
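The ranking rule can be sketched as a single sort with a compound key. The function names and the tuple layout are hypothetical, and the sketch assumes that "protein-length" refers to the length of the target protein, which the text does not state explicitly.

```python
def rank_targets(hits):
    """Rank (name, lcs_len, protein_len) hits: longest LCS first, shorter protein on ties.

    Returns a list of (name, score) with score = lcs_len / protein_len
    (assumption: protein_len is the target protein length).
    """
    ordered = sorted(hits, key=lambda hit: (-hit[1], hit[2]))
    return [(name, lcs_len / protein_len) for name, lcs_len, protein_len in ordered]


def max_score(protein_len):
    """Upper bound on the score, since a D2 code has protein_len - 4 symbols."""
    return (protein_len - 4) / protein_len
```

For example, two hits with the same LCS length of 40 and protein lengths 100 and 80 are ordered with the 80-residue protein first.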
5.4. Contact Map Overlap maximization by R.
Andonov, A. Cornu, N. Malod-Dognin and
J. Nicolas
5.4.1. Principle
This approach compares protein structures based on common inter-atomic contacts. Formally, the contact map of a
protein is a graph, CM = (V, E), with vertices V associated
with the amino acids of the protein and contact edges E joining pairs of amino acids whose CA
atoms (which form part of the protein backbone) lie at a Euclidean distance smaller than a given threshold. The similarity between
two proteins is then determined by the maximum overlap
of their contact maps (equivalent to their maximum Number
of Common Contacts (NCC)). Finding this number and the
associated alignment between the amino acids of both proteins, known as Contact Map Overlap maximization (CMO),
is an NP-hard problem [GIP99] and has been extensively
studied in the bioinformatics and computer science communities [CCI∗04, XS07].
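A minimal sketch of these definitions is given below. It builds the contact edges from a list of CA coordinates and counts the common contacts for a given alignment; it does not search for the best alignment, which is the NP-hard part. The function names are hypothetical. The default threshold of 7.5 Å and the exclusion of consecutive residues follow the settings reported in Section 5.4.3, and the alignment is assumed to be supplied as a dictionary from query indices to target indices.

```python
import math


def contact_map(ca_coords, threshold=7.5, min_separation=2):
    """Return the contact edges (i, j), i < j, of a chain of CA coordinates.

    Pairs closer than min_separation along the chain are skipped, so the
    default excludes contacts between consecutive amino acids.
    """
    edges = set()
    for i in range(len(ca_coords)):
        for j in range(i + min_separation, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) < threshold:
                edges.add((i, j))
    return edges


def common_contacts(edges_q, edges_p, alignment):
    """Count contacts of Q that are mapped onto contacts of P by the alignment."""
    count = 0
    for i, j in edges_q:
        if i in alignment and j in alignment:
            a, b = alignment[i], alignment[j]
            if (min(a, b), max(a, b)) in edges_p:
                count += 1
    return count
```

CMO then amounts to maximising common_contacts over all order-preserving alignments, which is what a dedicated solver is needed for.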
5.4.2. The A_purva solver
To classify the queries in the context of SHREC’10 we used
the solver A_purva, which was recently proposed in
[AYMD08]. A detailed description can be found in [MD10].
A_purva is able to solve CMO in an exact manner in the
framework of a classical branch and bound approach (B&B)
where upper (UB) and lower (LB) bounds are generated
by Lagrangian relaxation. When an instance is optimally
solved, we have the relation LB = NCC = UB. Otherwise
UB > LB and the so-called relative gap (UB − LB) / UB gives an idea
of the precision of the results. This property was very useful in the context of SHREC’10 where, because of the time
limitation, we were forced to limit the search process to the
root of the B&B only.
A_purva was launched without branch and bound, with
a limit of 10 000 subgradient descent iterations (i.e. about
20 sec per instance). For most query instances (with fewer than
700 CA atoms) a limit of 2 000 iterations and 4 sec gave the
same results.
5.4.3. Extraction of the backbone and generation of the
contact map
In order to adapt A_purva to SHREC’10 conditions where
only the coordinates of the atoms have been provided, without identifying their names, we proceeded as follows. Interesting atoms have been filtered on the basis of stable distances that could correspond to the protein backbone (in a
PDB file, consecutive atoms N, CA, C and O exhibit N-CA, CA-C and C-O bonds with relatively fixed distances of
1.45Å, 1.53Å and 1.24Å, respectively). Note that we did
not use atom radii for this purpose. Globally, the procedure
tends to filter all CA and a few other carbon atoms in each
protein that we consider as CA in the rest of the treatment.
The contact maps were generated with a distance threshold of 7.5Å between two CA atoms, excluding natural contacts between consecutive amino acids.
5.4.4. Scoring scheme
Based on the obtained values, two scoring functions were
tested in order to detect the similarity between a query Q and
each protein P of the CATH superfamilies. The first one was
proposed in [XS07]:
$$ SIM(Q, P) = \frac{2 \times LB}{|E_Q| + |E_P|} \qquad (3) $$
Once the results were known, this default score turned out to be the
best one for the classification task: with it, the nearest neighbour
score reaches 88%.
The second index used the confidence in the results of
A_purva, C = LB / UB, and was finally retained for the contest.
The score is given by:
$$ Cscore(Q, P) = \frac{C \times UB + (1 - C) \times LB}{|E_Q| \times \left( 1 + \frac{abs(|E_P| - |E_Q|)}{\max(|E_P|, |E_Q|)} \right)} \qquad (4) $$
A final step used the knowledge of superfamily labels: the
mean rank of the three best scores for each superfamily was
computed. This allowed all proteins of the same superfamily to be classified together if they got a good rank.
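The two scores and the relative gap are simple to evaluate once the bounds are known. The sketch below uses hypothetical function names, takes the numbers of contact edges |E_Q| and |E_P| as plain integers, and reads the confidence C as the ratio LB / UB, which is an assumption about the garbled definition rather than a confirmed detail of the solver.

```python
def relative_gap(ub, lb):
    """Relative gap (UB - LB) / UB; zero when the instance is solved optimally."""
    return (ub - lb) / ub


def sim(lb, n_edges_q, n_edges_p):
    """Equation (3): lower bound normalised by the mean number of contacts."""
    return 2.0 * lb / (n_edges_q + n_edges_p)


def cscore(ub, lb, n_edges_q, n_edges_p):
    """Equation (4), assuming the confidence is C = LB / UB."""
    c = lb / ub  # assumption: confidence defined as LB / UB
    size_penalty = 1.0 + abs(n_edges_p - n_edges_q) / max(n_edges_p, n_edges_q)
    return (c * ub + (1.0 - c) * lb) / (n_edges_q * size_penalty)
```

As a worked case under these assumptions, UB = 120, LB = 90, |E_Q| = 100 and |E_P| = 150 give a relative gap of 0.25, SIM = 0.72 and Cscore = 0.84375.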
All Contact Maps computations were done on the Ouestgenopole bioinformatics platform http://genouest.org.
5.5. Group Integration for Protein Structure
Description by M. Temerinac-Ott, M. Reisert and
H. Burkhardt
Group Integration (GI) is a powerful tool for describing three
dimensional structures [BS01]. The main idea is to average the representatives of a transformation group (e.g. Euclidean) in order to obtain group invariant descriptors, which
can be compared in order to determine similarities. Group
integration can be extended by Spherical Harmonics [RB06]
in order to obtain more robust descriptors. The details of our
method are explained in [TRB07].
5.5.1. Modelling Protein Shape
Proteins can be described by the position of the atoms of the
protein and their order in the amino acid sequence. In order
to apply group integration to proteins, the proteins are modelled as superpositions of Gaussian distributions centered at
the positions of the atoms.
In the SHREC’07 protein track [MTB07], only CA atoms
were used, whereas here we now use all atoms to compute GI
features. However, we did not use the provided atom radius
data.
5.5.2. Classifying Proteins based on GI features
The result of group integration is a multidimensional histogram Hα,β,γ,∆,µ,ł with 2048 bins. By concatenating
the histogram dimensions, we obtain one feature vector
for each protein. The similarity measure s(x, y) between two
feature vectors is obtained using the χ² distance.
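For reference, one common form of the χ² distance between two non-negative histograms is sketched below. The text does not say which variant was used or how the distance is turned into the similarity s(x, y), so the unscaled symmetric form and the function name are assumptions.

```python
def chi2_distance(x, y):
    """Symmetric chi-squared distance: sum of (x_i - y_i)^2 / (x_i + y_i).

    Bins that are empty in both histograms contribute nothing.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    total = 0.0
    for xi, yi in zip(x, y):
        bin_sum = xi + yi
        if bin_sum > 0:
            total += (xi - yi) ** 2 / bin_sum
    return total
```

Identical histograms give a distance of zero, and smaller distances would correspond to more similar proteins.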
5.6. 3D protein classification using the Spherical trace
transform by A. Axenopoulos and P. Daras
Our 3D shape-based approach is presented for the efficient
search, retrieval, and classification of protein molecules. The
method relies on the geometric 3D structure of the proteins,
which is produced from the corresponding PDB files. After
proper positioning of the 3D structures, in terms of translation and scaling, the Spherical Trace Transform is applied
to them so as to produce geometry-based descriptor vectors, which are completely rotation invariant and perfectly
describe their 3D shape.
5.6.1. Preprocessing
Since the exact 3D position and radius of each of the protein’s atoms
are known from the available PDB file, the protein can be represented as a set of spheres. Then, the Solvent Excluded Surface is computed using the MSMS algorithm [SOS95].
Figure 3: 3D representation of a protein with a) spheres and
b) Solvent Excluded Surface.
The protein is now represented as a triangulated mesh
which provides a sufficient approximation of the protein’s
3D shape. As a next step, a voxelization process, similar to
the one presented in [DZA∗06], takes place. More specifically, the 3D mesh is placed into a bounding cube, which is
partitioned into equal cube-shaped voxels. Voxels that lie inside the 3D model or on the surface are assigned non-zero
values.
5.6.2. Descriptor Extraction
Every 3D object is expressed in terms of a binary volumetric function. In order to achieve translation invariance, the
center of mass of the 3D object is calculated and the model
is translated so that its center of mass coincides with the
coordinate system origin. Scaling invariance is also accomplished, by scaling the object in order to fit inside the unit
sphere. Then, a set of concentric spheres is defined. For every sphere, a set of planes which are tangential to the sphere
is also defined. Further, the intersection of each plane with
the object’s volume provides a spline of the object, which
can be treated as a 2D image.
Next, 2D rotation invariant functionals, F, are applied to
this 2D image, producing a single value. Thus, the result of
these functionals, when applied to all splines, is a set of functions defined on every sphere, whose range is the result of
the functional. Finally, a rotation invariant transform, T, is
applied to these functions, in order to produce rotation invariant descriptors. For the needs of SHREC, the implemented functionals F are the 2D Krawtchouk moments
and the Polar Fourier Transform, while the T function is the
Spherical Fourier Transform.
A more detailed description of the extraction of these descriptors is available in [DZA∗06]. The dimension of the descriptor vectors is NFourier = 1080 for the descriptors based
on the Polar-Fourier 2D functional and NKrawtchouk = 1080
for the descriptors based on the Krawtchouk 2D functional.
5.6.3. Matching
Firstly, the descriptors are normalized so that their absolute
sum is equal to 1. Then, the Minkowski L1 distance is computed for a pair of descriptor vectors. The L1 distance is a
measure of dissimilarity between two descriptor vectors. In
order to transform this dissimilarity into a similarity metric, a decreasing sigmoid function was applied so that low-dissimilarity values are closer to 1 and high-dissimilarity
values are closer to 0.
6. Results
In this section, we present the performance evaluation results
of the track. Each participating group submitted one set of
results based on their selected set of parameters. This was
a blind experiment and each group could only submit one
set of results. Therefore, it was not possible for participants to
tune the parameters of their algorithms.
Nearest neighbour : Table 3 summarizes the retrieval
rates for all the methods. There were five cases in which
none of the methods found the nearest neighbour. These
were: Q12 (1wwjA00), Q30 (1iicA02), Q40 (3c4aA01), Q43
(1nyaA00), and Q48 (3bioA02). In a further seven cases,
only one method found the nearest neighbour as the top match.
However, there were 11 additional cases in which several
methods found the nearest neighbour as the second hit (i.e.
4 for GENOCRIPT, 3 for Group Integration, 3 for 3DBlast
and 1 for 3DZernike).
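The nearest neighbour percentage can be computed directly from the submitted rankings, as in the sketch below. The function name and the dictionary layout (query identifier to ranked list of target identifiers, plus two lookup tables of superfamily labels) are hypothetical and chosen only for illustration.

```python
def nearest_neighbour_rate(ranked_lists, query_family, target_family):
    """Percentage of queries whose top-ranked target shares their superfamily."""
    correct = sum(
        1
        for query, ranking in ranked_lists.items()
        if ranking and target_family[ranking[0]] == query_family[query]
    )
    return 100.0 * correct / len(ranked_lists)
```

With 50 queries, each correct top hit contributes two percentage points to the rate.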
Method                      Correct Predictions
3DBlast                     68%
3DZernike                   8%
GENOCRIPT                   56%
Contact Maps                80%
Group Integration           52%
Spherical Trace Transform   0%
Table 3: Nearest neighbour results.
ROC plots : For each of the submitted result lists, a ROC
plot and its corresponding AUC was calculated. Figure 4
shows the resulting AUC of all methods for each target. Because early recognition of TPs is at least as important as obtaining a good overall AUC score, we also calculated another set of AUC values which correspond to the first part,
up to 10% of the database, of the ROC curves. An aggregate
ROC plot was also calculated to summarize the overall performance of each method as a single ROC curve, as shown
in Figure 5.
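The AUC of a single ranked list can be obtained by walking down the list and accumulating area each time a negative is passed, as sketched below. The function name is hypothetical, the ranked list is assumed to be given as booleans (True for a member of the query's superfamily), and "up to 10% of the database" is interpreted here as a cut-off at FPR = 0.1 without rescaling, which is an assumption.

```python
def roc_auc(is_positive, max_fpr=1.0):
    """Area under the ROC curve of one ranked list, truncated at max_fpr.

    is_positive lists the ranking from best to worst; True marks a true positive.
    """
    n_pos = sum(1 for flag in is_positive if flag)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("the list must contain positives and negatives")
    tp = fp = 0
    area = 0.0
    for flag in is_positive:
        if flag:
            tp += 1  # the curve moves up by 1 / n_pos
            continue
        left = fp / n_neg
        if left >= max_fpr:
            break
        fp += 1  # the curve moves right by 1 / n_neg
        right = min(fp / n_neg, max_fpr)
        area += (tp / n_pos) * (right - left)
    return area
```

For a perfect list of 10 TPs followed by 990 TNs this gives an AUC of 1.0, and 0.1 when truncated at max_fpr=0.1.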
[Figure 4 image; panels: 3DBlast, 3DZernike, GENOCRIPT, Contact Maps, Group Integration, Spherical Trace Transform; axes: query number (1 to 50) against AUC.]
Figure 4: Bar chart analyses for each method showing the
calculated AUC for each of the 50 query proteins. The upper
bar charts show the total AUC, whereas the lower bar charts
show the AUCs calculated for the top 10% of the database.
[Figure 5 image; curves: 3DBlast, 3DZernike, Genocript/D2, Contact Maps, Group Integration, Spherical Trace Transform, RANDOM; axes: FPR against TPR.]
Figure 5: The upper figure shows aggregate ROC plots
for each method obtained when querying the 1000 protein
dataset using the 50 query proteins. The lower figure shows
an expanded view of the first 10% of the upper figure to highlight the early recognition behaviour of each method.
7. Conclusions
In this paper, we have presented and compared the performance of six algorithms submitted by the five research
groups who participated in this track.
Contact Maps and 3D-Blast were conceived specifically to compare protein structures, and these approaches
give the best results, although the Group Integration and
GENOCRIPT/D2 approaches also perform very well. The
Contact Maps and GENOCRIPT approaches both used a
preselection step to try to infer the CA backbone structure
of the corresponding proteins from simple geometrical invariants.
Both 3D-Blast and 3D-Zernike compare shapes globally, but 3D-Blast uses FFT-based rotational comparisons,
whereas 3D-Zernike uses a fast scale- and rotation-invariant
scoring technique derived from a spherical harmonic plus
Zernike polynomial expansion of each protein. The Spherical Trace Transform approach calculates scale- and rotation-invariant descriptors from 2D slices of the protein volumes
using polar Fourier transforms. The Group Integration approach constructs and compares group invariant descriptors
from the given atomic coordinates of each protein. With
the exception of the 3D-Zernike approach, which gave unexpectedly disappointing results, the general shape classification approaches also gave very encouraging predictions
when one considers the generic nature of those approaches
and the very tight timetable under which this experiment was
conducted.
Contact Maps compares proteins on the basis of conserved proximities between atoms, whereas GENOCRIPT encodes
the CA backbone structure of length N into a 16-valued sequence of length (N-4).
Although in this experiment some superfamilies may
have been easier to identify than others, it is worth noting
that no approach can reproduce the classification of the human experts in all cases. This suggests that protein modeling and classification is a difficult task for current 3D shape
recognition methods. Therefore adopting a benchmark based
on protein shape classification, such as the one presented
here, will provide a challenging dataset with which to evaluate new 3D object recognition algorithms.
References
[AGM∗90] Altschul S., Gish W., Miller W., Myers E., Lipman D.: Basic local alignment search tool. J. Mol. Biol. 215 (1990), 403–410.
[AYMD08] Andonov R., Yanev N., Malod-Dognin N.: An efficient lagrangian relaxation for the contact map overlap problem. In WABI ’08: Proc. of the 8th int. workshop on Algorithms in Bioinformatics (2008), Springer-Verlag, pp. 162–173.
[BS01] Burkhardt H., Siggelkow S.: Invariant features in pattern recognition – fundamentals and applications. Nonlinear Model-Based Image/Video Processing and Analysis, edts. C. Kotropoulos and I. Pitas, John Wiley & Sons (2001), 269–307.
[BWF∗00] Berman H., Westbrook J., Feng Z., Gilliland G., Bhat T., Weissig H., Shindyalov I., Bourne P.: The protein data bank. Nucleic Acids Research 28 (2000), 235–242.
[CCI∗04] Caprara A., Carr R., Istrail S., Lancia G., Walenz B.: 1001 optimal PDB structure alignments: Integer programming methods for finding the maximum contact map overlap. J. Comput. Biol. 11, 1 (2004), 27–52.
[CSL∗08] Cuff A., Sillitoe I., Lewis T., Garratt O., Thornton J., Orengo C.: The cath classification revisited – architectures reviewed and new ways to characterize structural divergence in superfamilies. Nucleic Acids Research 37 (2008), 310–314.
[DZA∗06] Daras P., Zarpalas D., Axenopoulos A., Tzovaras D., Strintzis M.: Three-dimensional shape-structure comparison method for protein classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics 3(3) (2006), 193–207.
[Ega75] Egan P.: Signal detection theory and roc analysis. Academic Press: New York (1975).
[GIP99] Goldman D., Istrail S., Papadimitriou C.: Algorithmic aspects of protein structure similarity. In FOCS ’99: Proc. of the 40th Annual Symposium on Foundations of Computer Science (1999), IEEE Computer Society, pp. 512–521.
[HS95] Holm L., Sander C.: Dali: a network tool for protein structure comparison. Trends in Biochemical Sciences 20 (1995), 478–480.
[LLL∗08] Lee S., Li B., La D., Fang Y., Ramani K., Rustamov R., Kihara D.: Fast protein tertiary structure retrieval based on global surface shape similarity. Proteins: Structure, Function, and Bioinformatics 72(4) (2008), 1259–1273.
[LP85] Lipman D., Pearson W.: Rapid and sensitive protein similarity searches. Science 227 (1985), 1435–1441.
[MBHC95] Murzin A., Brenner S., Hubbard T., Chothia C.: Scop: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247 (1995), 536–540.
[MD10] Malod-Dognin N.: Protein Structure Comparison: From Contact Map Overlap Maximisation to Distance-based Alignment Search Tool. PhD thesis, University of Rennes 1, 2010.
[MGM08] Mak L., Grandison S., Morris R.: An extension of spherical harmonics to region-based rotationally invariant descriptors for molecular shape description and comparison. Journal of Molecular Graphics and Modelling 26(7) (2008), 1035–1045.
[Min] Min P.: Binary voxelation.
[Mor07] Morikawa N.: Discrete differential geometry of tetrahedrons and encoding of local protein structure, 2007.
[MTB07] Temerinac M., Reisert M., Burkhardt H.: Shrec 2007: 3d shape retrieval contest, protein retrieval track. Technical Report UU-CS-2007-015, R. C. Veltkamp and F. B. ter Haar (eds.) (2007), 17–21.
[NK04] Novotni M., Klein R.: Shape retrieval using 3d zernike descriptors. Computer Aided Design 36(11) (2004), 1047–1062.
[Nov] Novotni M.: 3d zernike descriptors.
[OT96] Orengo C., Taylor W.: Ssap: sequential alignment program for protein structure comparison. Methods Enzymol 266 (1996), 617–635.
[RB06] Reisert M., Burkhardt H.: Invariant features for 3d-data based on group integration using directional information and spherical harmonic expansion. Proceedings of the ICPR’06, Hong Kong (2006).
[RK00] Ritchie D., Kemp G.: Protein docking using spherical polar fourier correlations. Proteins: Struc. Funct. Genet. 39 (2000), 178–194.
[SOS95] Sanner M., Olson A., Spehner J.: Fast and robust computation of molecular surfaces. In the 11th ACM Symposium on Computational Geometry (1995).
[SOS96] Sanner M., Olson A., Spehner J.: Reduced surface: an efficient way to compute molecular surfaces. Biopolymers 38(3) (1996), 305–320.
[TRB07] Temerinac M., Reisert M., Burkhardt H.: Invariant features for searching in protein fold databases. International Journal on Computer Mathematics, ’Special Issue on Bioinformatics’ 84(5) (2007), 635–651.
[VLYK09] Venkatraman V., Lee S., Yang Y., Kihara D.: Protein-protein docking using region-based 3d zernike descriptors. BMC Bioinformatics 10 (2009), 407–428.
[XS07] Xie W., Sahinidis N.: A reduction-based exact algorithm for the contact map overlap problem. Journal of Computational Biology 14, 5 (2007), 637–654.