Omega: Majid Mohammadi, Jafar Rezaei


Omega 96 (2020) 102254

Contents lists available at ScienceDirect

Omega
journal homepage: www.elsevier.com/locate/omega

Ensemble ranking: Aggregation of rankings produced by different multi-criteria decision-making methods ✩
Majid Mohammadi a,b,∗, Jafar Rezaei a
a Faculty of Technology, Policy, and Management, Delft University of Technology, The Netherlands
b The Jheronimus Academy of Data Science, 's-Hertogenbosch, The Netherlands

Article info

Article history:
Received 23 July 2019
Accepted 20 March 2020
Available online 25 March 2020

Keywords:
MCDM
Half-quadratic
Ensemble ranking
Ontology alignment

Abstract

One of the essential problems in multi-criteria decision-making (MCDM) is ranking a set of alternatives based on a set of criteria. In this regard, there exist several MCDM methods which rank the alternatives in different ways. As such, it would be worthwhile to try and arrive at a consensus on this important subject. In this paper, a new approach is proposed based on the half-quadratic (HQ) theory. The proposed approach determines an optimal weight for each of the MCDM ranking methods, which are used to compute the aggregated final ranking. The weight of each ranking method is obtained via a minimizer function that is inspired by the HQ theory, which automatically fulfills the basic constraints of weights in MCDM. The proposed framework also provides a consensus index and a trust level for the aggregated ranking. To illustrate the proposed approach, the evaluation and comparison of ontology alignment systems are modeled as an MCDM problem and the proposed framework is applied to the ontology alignment evaluation initiative (OAEI) 2018, for which the ranking of participating systems is of the utmost importance.

© 2020 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY license. (http://creativecommons.org/licenses/by/4.0/)

✩ This manuscript was processed by Associate Editor Triantaphyllou.
∗ Corresponding author.
E-mail address: [email protected] (M. Mohammadi).
https://doi.org/10.1016/j.omega.2020.102254
0305-0483/© 2020 The Authors. Published by Elsevier Ltd.

1. Introduction

Multi-criteria decision-making (MCDM) is a branch of Operations Research that has numerous applications in a variety of areas involving real decision-making problems. In a typical MCDM problem, K alternatives are evaluated on the basis of n criteria, and the outcome of the evaluation is summarized in a so-called performance matrix, within which MCDM methods are used to select the best, sort, or rank the alternative(s). The focus of this study is on ranking, where a set of K alternatives needs to be ranked. There exist several MCDM methods which can be used for the ranking problem, including value and utility-based methods such as AHP (analytic hierarchy process) [48], ANP (analytic network process) [49], BWM (best-worst method) [47], SMART (simple multi-attribute rating technique) [14], and Swing [36], and also the outranking methods like ELECTRE (ELimination and Choice Expressing REality) and its extensions [17], and PROMETHEE (Preference Ranking Organization METHod for Enrichment of Evaluations) and its extensions [7]. For more information about popular MCDM methods, see [55].

One of the main controversial issues in this area is that different MCDM methods, even when they use the same input, produce different and potentially conflicting rankings, which means that finding an overall aggregated ranking of alternatives is of the essence. Some studies ignore the existence of such a conflict [29], or use a simple ranking statistic, like averages [43], while yet other methods attempt to reconcile the difference and work out a compromise [28,42]. Ku et al. [28] estimate the weight for each MCDM method based on Spearman's correlation coefficient. The underlying idea is that if the ranking of an MCDM method deviates from those of other methods, it would then be assigned a lower weight. As such, the weight of each MCDM ranking is computed using the correlation coefficient. By the same token, Ping et al. [42] have proposed an optimization problem to determine the weight of each individual MCDM method and then aggregate them accordingly. The optimization problem assumes that the final aggregated ranking is a weighted linear combination of the rankings provided by different MCDM methods, and it tries to determine the weights accordingly. Although these methods do come up with a final aggregated ranking, they do not provide any further information about the consensus or reliability of the aggregated ranking.

In this paper, a new ensemble method is proposed based on the half-quadratic (HQ) theory [18,19,37]. In this regard, a new model is proposed based on a general non-convex HQ function,
and the procedure involved in determining the optimal solution to the given minimization is provided with guaranteed convergence. Although no weights for the MCDM methods are considered explicitly, the proposed model estimates a weight for each of the MCDM methods by using the so-called minimizer function inspired by the HQ theory, whose estimation improves adaptively throughout the optimization procedure. An MCDM method whose ranking is different from those of most of the other MCDM methods being used is treated as an outlier in the proposed framework and, as such, is assigned a lower weight. The aggregated final ranking is also obtained by the weighted combination of rankings of the MCDM methods being used, which means that the methods whose rankings deviate from others will have a lower impact on the final ranking. Although the proposed model is unconstrained, interestingly, the weights computed by the minimizer function preserve the non-negativity and unit-sum properties, which are required for the MCDM methods. The proposed compromise method is also objective, since it does not need to elicit preferences from decision-makers. However, the MCDM methods being used in the framework could belong to either class of MCDM methods (subjective or objective).

For some of the HQ functions, there are parameters that have to be tuned. To that end, we take advantage of several recent studies to tune the parameters efficiently [22,24]. Having such parameters helps compute a consensus index and trust level based on the computed weights. The outcome of the proposed method is to determine the weights of MCDM methods and compute the final aggregated ranking of alternatives, as well as two indicators showing the level of agreement and reliability of the final aggregated ranking.

As a real-world implementation, we study the evaluation and comparison of ontology alignment systems by using different MCDM methods. Such a comparison is of the essence for two major reasons. First, there are numerous ontology alignment systems in the existing literature [13,16,25,35,46,59], each claiming to be superior to the other available systems. To support that claim, the developers of the systems involved typically look at solely one performance score, on which the claim of superiority is based. If there are multiple benchmarks, the average of these scores is computed and regarded as the overall performance representation. However, the main drawback of using averages is that it only allows a comparison on the basis of one performance score. As a result, it is not possible to take into account different facets of a system measured by several metrics. For instance, an important criterion for alignment is execution time, which also has to be included in an evaluation and comparison. Here, we formulate the comparison of ontology alignment systems as an MCDM problem, where the performance metrics are the criteria, and the ontology alignment systems are the alternatives. Consequently, the decision as to which system is superior is transformed into an MCDM problem, making it possible to compare the systems based on multiple metrics. The second reason for using MCDM methods to assess alignment systems is the competition that exists in the ontology alignment evaluation initiative (OAEI), with several standard benchmarks in divided tracks with an available reference (or gold standard). Within that competition, the participating systems conduct the alignment on the given ontologies, and their outcome is then juxtaposed with the reference for evaluation. In addition, there are various performance metrics for different benchmarks, making the final ranking of the systems, which is potentially one of the principal goals of the competition in the first place, much more difficult. In this paper, we review the performance metrics for five OAEI tracks, and apply the MCDM methods along with the proposed ensemble method to determine the final ranking of the systems. The methodology proposed in this paper can also be used by the OAEI organizers to evaluate the participating systems with respect to multiple performance metrics.

In summary, this paper makes the following contributions:

• A new approach for ensemble ranking is proposed based on the HQ theory.
• The proposed method can assign weights objectively to the MCDM methods being used, since no decision-maker is involved in determining the weights of the final aggregated ranking.
• The proposed method can also be used to compute a consensus index and a trust level for the final aggregated ranking.
• As a real-world implementation, we study the ranking of ontology alignment systems with respect to multiple performance metrics. Such a ranking is of the utmost importance, particularly for the OAEI where there is a competition involving several standard benchmarks. The proposed ensemble method can be used in other ontology alignment benchmarks as well as any other MCDM problem that uses multiple MCDM methods.

The remainder of this article is structured as follows. In Section 2, we present the proposed ensemble method, followed by an overview of MCDM methods being used in Section 3. Sections 4 and 5 are devoted to our real-world implementation of the proposed method in ontology alignment, while the lessons learned are discussed in Section 6, and conclusions and future research directions are presented in Section 7. The MATLAB code and the MS Excel solver of the proposed method are freely available at https://github.com/Majeed7/EnsembleRanking.

2. Ensemble ranking: A half-quadratic programming approach

The MCDM methods may provide different rankings for the same problem because they use different mechanisms, making it hard to provide sufficient support for the ranking of one MCDM method compared to the others. As such, in this section, a compromise method is developed to estimate the final ranking of all alternatives based on the rankings of different MCDM methods. The proposed method utilizes the HQ theory which results in estimating a weight for each of the MCDM methods. The weights obtained by the method satisfy the non-negativity and unit-sum properties, which are necessary for the MCDM methods. In addition, the proposed method is objective, since the weights are computed without any expert input. Another important property of the proposed method is that, in contrast to averaging, it is insensitive to outliers, owing to the use of the robust HQ functions. For aggregating MCDM rankings, outliers are indeed the rankings that are different from the majority of rankings, which means that it is to be expected that they contribute less to the final aggregated ranking. In addition to the aggregated ranking, a consensus index and a trust level are calculated for the aggregated ranking. In the following, we first explain the notation used in the study, followed by a review of the fundamentals of the HQ theory.

We begin by explaining the notation used in this article. The alternatives are referred to as $A_i$, $i = 1, 2, \ldots, K$, while the performance metrics or criteria are denoted by $P_j$, $j = 1, 2, \ldots, n$. Thus, there are $K$ alternatives which are evaluated with respect to $n$ criteria (or performance metrics). Furthermore, the matrix containing all performance scores is denoted by $X$, with $X_{i.}$, $X_{.j}$, and $X_{ij}$ referring to the $i$th row, the $j$th column, and the element at the $i$th row and the $j$th column, respectively. By the same token, the $i$th element in a vector like $s$ is shown by $s_i$. Also, we show the Euclidean norm with $\|e\|_2 = \sqrt{\sum_{i=1}^{s} e_i^2}$, $\forall e \in \mathbb{R}^s$. The ranking of the alternatives computed by the $m$th MCDM method is shown as $R^m$, $m = 1, \ldots, M$, and the final aggregated ranking is shown by $R^*$. In addition, the ranking of alternative $k$ obtained by method $m$ and by the aggregated ranking are shown by $R^m_k$ and $R^*_k$, respectively.
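The compromise idea outlined in this section can be previewed with a short Python sketch. It follows the update rules given later as Algorithm 1 and Eqs. (10) and (13)-(15), assuming the Welsch minimizer exp(-d^2/sigma^2) from Table 1. The function names, the initialization with the plain average, the stopping tolerance, and the guards against a zero sigma or vanishing weights are assumptions made for illustration; this is not the authors' MATLAB implementation.

```python
import math


def ensemble_ranking(rankings, tol=1e-9, max_iter=100):
    """Aggregate M rankings of K alternatives.

    Returns (r_star, weights, sigma). Hypothetical helper, not the authors' code.
    """
    M, K = len(rankings), len(rankings[0])
    # Assumed initialization: the plain average of the rankings, as in Eq. (6).
    r_star = [sum(r[k] for r in rankings) / M for k in range(K)]
    weights, sigma = [1.0 / M] * M, 0.0
    for _ in range(max_iter):
        # Squared Euclidean distance of each ranking to the current aggregate.
        d2 = [sum((r[k] - r_star[k]) ** 2 for k in range(K)) for r in rankings]
        sigma = sum(d2) / (2.0 * K ** 2)  # Eq. (13)
        if sigma == 0.0:  # every ranking already equals the aggregate
            break
        alpha = [math.exp(-d / sigma ** 2) for d in d2]  # Welsch minimizer
        total = sum(alpha)
        if total == 0.0:  # underflow guard (assumption): keep the last weights
            break
        weights = [a / total for a in alpha]  # unit-sum weights, Eq. (10)
        new_r = [sum(w * r[k] for w, r in zip(weights, rankings)) for k in range(K)]
        shift = max(abs(a - b) for a, b in zip(new_r, r_star))
        r_star = new_r
        if shift < tol:
            break
    return r_star, weights, sigma


def consensus_and_trust(rankings, r_star, weights, sigma):
    """Consensus index C of Eq. (14) and trust level T of Eq. (15)."""
    M, K = len(rankings), len(r_star)

    def q(diff):
        # Ratio of zero-mean Gaussian densities N_sigma(diff) / N_sigma(0).
        if sigma == 0.0:
            return 1.0 if diff == 0.0 else 0.0
        return math.exp(-diff ** 2 / (2.0 * sigma ** 2))

    q_km = [[q(r_star[k] - r[k]) for r in rankings] for k in range(K)]
    consensus = sum(sum(row) for row in q_km) / (K * M)
    trust = sum(weights[m] * q_km[k][m] for k in range(K) for m in range(M)) / K
    return consensus, trust


# Toy input: two methods agree, the third reverses their order.
demo = [[1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1]]
agg, w, s = ensemble_ranking(demo)
print(agg, w, consensus_and_trust(demo, agg, w, s))
```

In this toy input the reversed ranking receives a near-zero weight, so the aggregate coincides with the majority ranking, while the consensus index stays below the trust level because the dissenting method still counts fully in the unweighted average of Eq. (14).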

Table 1
Different M-estimators and their corresponding minimizer function $\delta(.)$ based on the HQ multiplicative form. $\beta$ is a positive constant, and $\sigma$ or $\gamma$ are the parameters of the HQ functions.

Estimator   HQ function $g(s_j)$                                                                        Minimizer function $\delta(s_j)$
l1-l2       $\sqrt{\beta + s_j^2/\sigma} - 1$                                                           $1/\sqrt{\beta + s_j^2}$
fair        $|s_j|/\beta - \log(1 + |s_j|/\beta)$                                                       $1/(\beta(\beta + |s_j|))$
log-cosh    $\log(\cosh(\beta s_j))$                                                                    $\beta \tanh(\beta s_j)/s_j$
Welsch      $1 - \exp(-s_j^2/\sigma^2)$                                                                 $\exp(-s_j^2/\sigma^2)$
Huber       $s_j^2/2$ if $|s_j| \le \gamma$; $\gamma|s_j| - \gamma^2/2$ if $|s_j| > \gamma$             $1$ if $|s_j| \le \gamma$; $\gamma/|s_j|$ if $|s_j| > \gamma$
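The minimizer functions $\delta(.)$ of Table 1 translate directly into code. The sketch below is an illustration in Python; the function names and the default parameter values are assumptions, the l1-l2 entry assumes the square-root form that this estimator usually takes, and the value returned by the log-cosh entry at s = 0 is its limit beta^2.

```python
import math


def delta_l1_l2(s, beta=1.0):
    # Assumed form 1 / sqrt(beta + s^2).
    return 1.0 / math.sqrt(beta + s * s)


def delta_fair(s, beta=1.0):
    return 1.0 / (beta * (beta + abs(s)))


def delta_log_cosh(s, beta=1.0):
    # beta * tanh(beta * s) / s, using the limit beta^2 at s = 0.
    if s == 0:
        return beta * beta
    return beta * math.tanh(beta * s) / s


def delta_welsch(s, sigma=1.0):
    return math.exp(-(s * s) / (sigma * sigma))


def delta_huber(s, gamma=1.0):
    # Constant weight inside the threshold, decaying weight outside it.
    return 1.0 if abs(s) <= gamma else gamma / abs(s)


# Compare the weight given to a small and to a large residual.
for delta in (delta_l1_l2, delta_fair, delta_log_cosh, delta_welsch, delta_huber):
    print(delta.__name__, delta(0.1), delta(5.0))
```

Each function returns a positive weight that does not increase as |s| grows, which is why terms with large residuals contribute less to the weighted least-squares step of the HQ iteration.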

2.1. Half-Quadratic minimization

In this section, we review the fundamental theory of the HQ minimization, introduce the appropriate HQ functions and look at the minimization procedure of the HQ programming.

The Euclidean norm is arguably the most popular loss function used in various circumstances, while least-squares fitting is the most popular regression technique that utilizes the Euclidean norm as the loss function. Although it is simple and also yields a closed-form solution, it is highly sensitive to outliers and shows diminished performance in noisy environments. A viable way to solve that sensitivity is to use various robust estimators. In robust statistics, M-estimators are a family of robust estimators, by which the HQ functions are inspired. Although these functions are not convex, their optimum can be obtained using HQ minimization with guaranteed convergence. Table 1 tabulates the HQ functions $g(.)$ along with their minimizer functions $\delta(.)$ that are used in the optimization procedure.

Consider the following minimization,

\[ \min_{s} \sum_{j} g(s_j), \tag{1} \]

where $g(.)$ is one of the HQ functions tabulated in Table 1. To solve problem (1), there are two forms of the HQ programming (multiplicative [18] and additive [19]) that can efficiently find a local optimal solution. Both forms have been applied to different areas, including robust estimation [34,57], signal processing [33,38,58], image processing [21,23], and machine learning [22,24]. In this paper, we use the multiplicative form since its optimization procedure can be interpreted meaningfully within MCDM.

Based on the multiplicative form of the HQ programming [18,37], problem (1) can be rewritten as

\[ \min_{s, w} \sum_{j} w_j s_j^2 + \psi(w_j), \tag{2} \]

where $w_j > 0$ is the HQ auxiliary variable, and $\psi(.)$ is the convex conjugate of $g(.)$ defined as [5],

\[ \psi(w_j) = \max_{e} \; e\,w_j - g(e). \tag{3} \]

To solve minimization (2), variables $w$ and $s$ must be updated iteratively until convergence is reached. Based on the HQ multiplicative theory [18], the update of variables is as follows:

\[ w_j^{l+1} = \delta(s_j^{l}), \qquad s^{l+1} = \arg\min_{s} \sum_{j} w_j^{l+1} s_j^2, \tag{4} \]

where $\delta(.)$ is the minimizer function with respect to $g(.)$ (see Table 1), and $l$ and $l + 1$ represent the iteration counter.

In the next section, a new compromise method is developed based on the multiplicative HQ minimization, and it is shown that the auxiliary variable $w$ would play the role of weights in the MCDM problems. Since the value of $w$ is reliant on the type of HQ function $g(.)$, different HQ functions would result in different weights and different final aggregated ranking. We particularly consider the Welsch M-estimator, for two reasons. First, it has shown a promising performance in a variety of problems and it is known to be the most promising and outlier-robust estimator among the HQ functions [23]. Second, we can calculate a consensus index and a trust level if the Welsch estimator is used.

2.2. An HQ-based compromise method

The proposed ensemble method can be used for any number of MCDM methods. In this regard, assume that there are $M$ MCDM methods which rank $K$ alternatives on the basis of $n$ criteria.

A simple yet practical solution to estimate the overall ranking $R^*$ is to minimize its Euclidean distance to each computed ranking. The corresponding minimization is,

\[ \min_{R^*} \frac{1}{2} \sum_{m=1}^{M} \|R^m - R^*\|_2^2, \tag{5} \]

where $M$ is the number of MCDM methods and $R^m$ is the ranking of the $m$th MCDM method. Minimization (5) has the following closed-form solution,

\[ R^* = \frac{1}{M} \sum_{m=1}^{M} R^m, \tag{6} \]

which is indeed the average of the rankings produced by different methods. However, averages are not reliable estimators, since they are sensitive to outliers [11], like other methods using the Euclidean norm as their basic loss function. In aggregating rankings, it means that, if one MCDM method has a distinct ranking from the other methods, it can significantly influence the aggregated ranking. Instead, we utilize the HQ functions, which are potentially insensitive to outliers [26], as well as allowing us to compute a consensus index and trust level for the final aggregated ranking.

The proposed optimization problem to estimate $R^*$ is,

\[ \min_{R^*} \frac{1}{2} \sum_{m=1}^{M} g(\|R^m - R^*\|_2), \tag{7} \]

where $g(.)$ is an HQ function. Although minimization (7) is not convex, it can be solved efficiently using half-quadratic programming [18,37]. Using the HQ multiplicative form as in equation (2), minimization (7) can be restated as,

\[ \min_{R^*, \alpha} J(R^*, \alpha) = \sum_{m=1}^{M} \alpha_m \|R^m - R^*\|_2^2 + \psi(\alpha_m), \tag{8} \]

where $\alpha \in \mathbb{R}^M$ is the half-quadratic auxiliary variable. According to the HQ programming, the following steps must be iterated until convergence for the two variables is reached,

\[ \alpha_m = \delta(\|R^m - R^*\|_2), \quad m = 1, \ldots, M, \qquad R^* = \arg\min_{R^*} \sum_{m=1}^{M} \alpha_m \|R^m - R^*\|_2^2. \tag{9} \]

The solution to the first step is obtained by the minimizer function tabulated in Table 1, and the optimum for the second step is
obtained by setting the derivative of the objective function equal to zero, i.e.,

\[ \frac{dJ}{dR^*} = 0 \;\Rightarrow\; \sum_{m=1}^{M} \alpha_m (R^m - R^*) = 0 \;\Rightarrow\; R^* \sum_{m=1}^{M} \alpha_m = \sum_{m=1}^{M} \alpha_m R^m \;\Rightarrow\; R^* = \sum_{m=1}^{M} w_m R^m, \quad \text{where } w_m = \frac{\alpha_m}{\sum_{j=1}^{M} \alpha_j}. \tag{10} \]

Thus, the final aggregated ranking is computed as the weighted sum of all the MCDM rankings, with the weights being computed by the minimizer function. Interestingly, the weights of MCDM rankings in (10) are non-zero and fulfill the unit-sum property, which are the requirements for the MCDM methods. Note that the optimization problem is unconstrained and these properties are fulfilled, thanks to the use of the HQ functions.

Algorithm 1 summarizes the overall procedure of the proposed ensemble ranking of MCDM methods.

Algorithm 1 Ensemble Ranking.
Input: Rankings $R^m$, $m = 1, 2, \ldots, M$.
while NotConverged do
    $\alpha_m = \delta(\|R^m - R^*\|_2)$, $m = 1, 2, \ldots, M$
    $w_m = \alpha_m / \sum_j \alpha_j$, $m = 1, 2, \ldots, M$
    $R^* = \sum_m w_m R^m$
end while
Output: Final ranking $R^*$, $\alpha$

The following lemma guarantees the convergence of this algorithm.

Lemma 2.1. The sequence $\{(\alpha^l, R^{*l}), l = 1, 2, \ldots\}$ generated by Algorithm 1, where $l$ indicates the iteration number, converges.

Proof. The function $\delta(.)$ has the following property [37],

\[ J(\alpha^{l+1}, R^{*l+1}) \le J(\alpha^{l}, R^{*l+1}), \tag{11} \]

where $R^*$ is assumed to be fixed. Similarly, the sequence of $R^*$ is decreasing since $J$ is convex, i.e.,

\[ J(\alpha^{l+1}, R^{*l+1}) \le J(\alpha^{l+1}, R^{*l}). \tag{12} \]

Thus, the sequence

\[ \{\ldots, J(\alpha^{l}, R^{*l}), J(\alpha^{l+1}, R^{*l}), J(\alpha^{l+1}, R^{*l+1}), \ldots\} \]

converges as $l \to \infty$ since $J$ is bounded. □

Remark 2.2. The proposed ensemble method is predicated on the fact that proper ranking methods are used, since the final aggregated ranking is naturally dependent on the ranking methods in question. If we add or remove a ranking method, the aggregated ranking is likely to change. However, in cases which include a significant number of methods, the proposed method is much less sensitive to adding or removing a ranking method. As such, the proposed method can be particularly useful in voting systems which usually contain a considerable number of votes.

Remark 2.3. The methods for ensemble ranking are useful for the case where there is no prior information about the suitability of one specific ranking method. In this situation, the rankings of different methods are treated equally a priori, and finding an aggregated ranking is desired, typically by working out a compromise between different rankings.

2.3. Consensus index and trust level

The weight of each MCDM method differs with respect to the HQ function in question, since $\delta(.)$ relies on the $g(.)$ function. Consequently, various HQ functions would result in different weights and a different final aggregated ranking. Among the HQ functions, the Welsch estimator has shown a promising performance in a number of domains [22,24]. Interestingly, it is possible to obtain a consensus index and trust level using this estimator, owing to its use of the Gaussian distribution in the formulation. Prior to obtaining the consensus index and trust level, we first need to discuss tuning the parameter $\sigma$ in the Welsch estimator. As a recent study has indicated [24], the parameter of this estimator can be tuned recursively in each iteration as,

\[ \sigma = \frac{\sum_{m=1}^{M} \|R^m - R^*\|_2^2}{2K^2}. \tag{13} \]

After computing $\sigma$ in the optimization procedure, we now discuss the consensus index and the trust level of the final ranking obtained by Algorithm 1.

Definition 2.4 (Consensus Index). A consensus index $C$ shows the extent to which all MCDM methods agree upon the final ranking.

The key element in this definition is that the consensus index shows the agreement among all the ranking methods being used, allowing us to compute the similarity of each ranking with the final aggregated ranking, thanks to the Welsch estimator. As a result, the consensus index $C$ of a given final ranking $R^*$ with respect to rankings $R^m$, $m = 1, 2, \ldots, M$ can be computed as,

\[ C(R^*) = \frac{1}{KM} \sum_{k=1}^{K} \sum_{m=1}^{M} q_{km}, \qquad q_{km} = \frac{N_\sigma(R^*_k - R^m_k)}{N_\sigma(0)}, \tag{14} \]

where $N_\sigma(.)$ is the probability density function of the Gaussian distribution with a mean of zero and a standard deviation of $\sigma$, and $N_\sigma(0)$ is used to normalize the similarity computation, thus $q_{km}, C(R^*) \in [0, 1]$. If there is a complete agreement between different rankings, then

\[ q_{km} = \frac{N_\sigma(0)}{N_\sigma(0)} = 1, \quad \forall k, m, \sigma, \]

that results in a consensus index of one. As rankings deviate from each other, the consensus index decreases. As a result, the consensus index is an indicator of the agreement among different rankings. It means that, if there is one ranking method that is different from the rest, it can adversely affect the consensus index. At the same time, this distinct ranking method is treated as an outlier in the HQ functions being used. As a result, it will have less impact on the final ranking, while it can profoundly influence the consensus index.

Definition 2.5 (Trust Level). A trust level $T$ for ensemble ranking is the degree to which one can accredit the final aggregated ranking.

The trust level is an indicator of reliability of the final ranking. For instance, if there is an MCDM ranking that deviates significantly from the majority of rankings, it takes a lower weight in Algorithm 1, and consequently, has less of an impact on the final ranking. Since the weight of such a method is lower than that of the other methods, it should also have less impact on the trust level. Taking this into account, the trust level can be computed as,

\[ T(R^*) = \frac{1}{K} \sum_{k=1}^{K} \sum_{m=1}^{M} w_m q_{km}, \tag{15} \]

where $w_m$, $m = 1, \ldots, M$, is computed in Algorithm 1. Thus, the trust level is distorted to a lesser extent by the rankings that
are different from the majority of rankings, and it is a measurement of the reliability of the aggregated ranking $R^*$ computed by Algorithm 1. It is evident from equation (15) that the trust level is equivalent to the consensus index if the weights of MCDM methods, i.e., $w_m$, $m = 1, 2, \ldots, M$, are identical.

Fig. 1 summarizes the implementation process of the proposed ensemble ranking to a decision-making problem.

Fig. 1. The implementation process of the proposed ensemble ranking to a decision-making problem.

3. Three MCDM methods for illustrating the proposed approach

There exist several MCDM methods which can be used for the ranking problem (see [55] for an overview). In this study, three different MCDM methods (TOPSIS, VIKOR, and PROMETHEE) are selected to illustrate the proposed ensemble ranking method. These methods are used (in the next section) to rank alignment systems with respect to several performance metrics (criteria). We selected these three methods as they are among popular methods in the MCDM field (see, for instance, [12,32,44] for the applications of TOPSIS, [2,4,50] for the applications of VIKOR, and [3,20,31] for the applications of PROMETHEE). Secondly, compared to many other MCDM methods, they can be used in an objective way, without having to include the opinions of experts or users. In addition, they were selected because of their ability to rank alternatives, which implies that other MCDM methods, which are devised for other purposes (such as sorting or selecting), are not appropriate for this study, although that does not mean that the three MCDM methods being used in this study are the only usable methods, nor does the proposed method rely on the number of MCDM methods.

3.1. Technique for order preference by similarity to ideal solution (TOPSIS)

TOPSIS is one of the popular MCDM methods for ranking alternatives with respect to a set of criteria [56]. It first identifies the positive-ideal and negative-ideal solutions and then ranks the alternatives based on their distances to the two computed solutions. The alternatives are ranked based on their closeness to the positive-ideal solution and their distance from the negative-ideal solution.

While TOPSIS has many variations and extensions [1,8,10], in this study, we adopt the original version proposed in [41]. The ranking process in TOPSIS includes the following steps:

Step 1: First, the performance matrix should be normalized. The elements of the normalized matrix $\hat{X}$ are calculated as,

\[ \hat{X}_{kj} = \frac{X_{kj}}{\|X_{.j}\|}, \quad k = 1, 2, \ldots, K, \; j = 1, 2, \ldots, n. \tag{16} \]

Step 2: Find the positive-ideal solution $S^+ = (S_1^+, S_2^+, \ldots, S_n^+)$, where $S_j^+ = \max_k \hat{X}_{kj}$ for benefit criteria, e.g., profit, and $S_j^+ = \min_k \hat{X}_{kj}$ for cost criteria, e.g., time.

Step 3: Find the negative-ideal solution $S^- = (S_1^-, S_2^-, \ldots, S_n^-)$, where $S_j^- = \min_k \hat{X}_{kj}$ for benefit criteria, and $S_j^- = \max_k \hat{X}_{kj}$ for cost criteria.

Step 4: Calculate the Euclidean distance to the positive-ideal and negative-ideal solutions for each alternative. For the $k$th alternative, the distance to the ideal solution, $D_k^+$, and to the negative-ideal solution, $D_k^-$, is computed as

\[ D_k^+ = \|\hat{X}_{k.} - S^+\|, \qquad D_k^- = \|\hat{X}_{k.} - S^-\|. \tag{17} \]

Step 5: Calculate the ratio $L_k$ for each alternative as

\[ L_k = \frac{D_k^-}{D_k^+ + D_k^-}, \quad k = 1, \ldots, K. \tag{18} \]

Step 6: Rank the alternatives according to their ratios $L_k$ in a descending order.

3.2. Visekriterijumska optimizacija i kompromisno resenje (VIKOR)

VIKOR is another MCDM method that ranks the alternatives based on a set of possibly conflicting criteria. The procedure used in VIKOR can be summarized as follows [39,40].

Step 1: Find the best $f^+$ and the worst $f^-$ values among the alternatives for all criteria. For the benefit criteria, we have

\[ f_j^+ = \max_i X_{ij}, \qquad f_j^- = \min_i X_{ij}, \quad j = 1, 2, \ldots, n, \tag{19} \]

where the minimum and maximum are swapped for cost criteria.

Step 2: For each alternative, compute $S_i$ and $R_i$ as

\[ S_i = \sum_{j=1}^{n} \frac{f_j^+ - X_{ij}}{f_j^+ - f_j^-}, \qquad R_i = \max_j \frac{f_j^+ - X_{ij}}{f_j^+ - f_j^-}. \tag{20} \]

Step 3: For each alternative, calculate $Q_i$ as

\[ Q_i = \nu \frac{S_i - S^+}{S^- - S^+} + (1 - \nu) \frac{R_i - R^+}{R^- - R^+}, \qquad S^+ = \min_i S_i, \; S^- = \max_i S_i, \; R^+ = \min_i R_i, \; R^- = \max_i R_i, \tag{21} \]

where $\nu \in [0, 1]$ is a trade-off parameter. It is the common practice to set $\nu = 0.5$.

Step 4: Rank the alternatives based on their corresponding $Q_i$ in descending order.

Step 5: For two alternatives $A_i$ and $A_k$, $A_i$ is given a better ranking than $A_k$ if: (a) $Q_i - Q_k > 1/(j - 1)$; and (b) $A_i$ has a better ranking according to $S_i$ and/or $R_i$.
3.3. Preference ranking organization METHod for enrichment of evaluations (PROMETHEE)

PROMETHEE uses pairwise comparison between different alternatives to establish a ranking. And while PROMETHEE I [6] conducts partial pairwise comparison and computes the ranking accordingly, PROMETHEE II [54], on the other hand, uses complete pairwise comparison, which is required for the proposed ensemble method and makes it also more suitable to rank the alignment systems. The ranking procedure used by PROMETHEE II is as follows.

Step 1: For $i, k = 1, 2, \ldots, K$, compute the function $\pi_{ik}$ as the number of criteria in which $A_i$ has better performance than $A_k$, i.e.,

\[ \pi_{ik} = \sum_{j=1}^{n} I(X_{ij} > X_{kj}), \quad i, k = 1, 2, \ldots, K, \tag{22} \]

where $I$ is the indicator function, which is 1 when the condition in the parenthesis is satisfied, and 0 when it is not.

Step 2: Calculate the positive $\phi^+$ and negative $\phi^-$ outranking flow and the net flow $\phi$ for each alternative as,

\[ \phi^+(A_i) = \frac{1}{K - 1} \sum_{k=1}^{K} \pi_{ik}, \qquad \phi^-(A_i) = \frac{1}{K - 1} \sum_{k=1}^{K} \pi_{ki}, \tag{23} \]

\[ \phi(A_i) = \phi^+(A_i) - \phi^-(A_i). \tag{24} \]

Step 3: Rank in decreasing order the alternatives based on their net flow.

4. Fundamentals of ontology alignment evaluation

In this section, we first review the basic concepts of ontology and ontology alignment, and then discuss the metrics to evaluate the alignment systems.

4.1. Ontology and ontology alignment

An ontology contains the concepts of a domain, along with their properties and relationships. The following definition explains the ontology in a formal manner.

Definition 4.1 (Ontology [15]). An ontology $O$ is a set of the following 4-tuples

\[ O = (C, Prop, ObjProp, Ins) \]

where

• $C$ contains all classes in the ontology representing the concepts;
• $Prop$ is the collection of data properties describing the classes within the ontology;
• $ObjProp$ is the group of object properties representing the relations of classes within the ontology;
• $Ins$ is the set of individuals instantiated from classes, properties, or object properties.

All the classes, properties, and object properties are called the entities of an ontology. The design of an ontology is subjective, so two ontologies describing the same domain can have a distinct

• $e$ and $e'$ are two entities from $O$ and $O'$, respectively;
• $rel$ denotes the relation of two entities $e$ and $e'$, e.g., equivalence, subsumption;
• $d \in [0, 1]$ is the degree of the correspondence confidence.

Definition 4.3 (Alignment [15]). Given two ontologies $O$ and $O'$, an alignment is a set of correspondences mapping the concepts of two ontologies in question.

4.2. Performance metrics

Alignment is the typical outcome of the ontology alignment systems, based on which different systems are evaluated and compared. In addition, several standard benchmarks with a known reference alignment have to be included, so that the evaluation can be made by the juxtaposition of the reference and the alignment generated by a system. The three widely-used performance metrics for ontology alignment are precision, recall, and F-measure. Given an alignment $A$ and the reference $A^*$, precision is the ratio of true positives to the total correspondences in the generated alignment by a system; thus, it can be written as

\[ Pr(A, A^*) = \frac{|A \cap A^*|}{|A|} \tag{25} \]

where $Pr$ is the precision and $|.|$ is the cardinality operator.

Recall is another popular metric, which is computed as the ratio of the true positives to the total number of correspondences in the reference. Thus, it can be computed as

\[ Re(A, A^*) = \frac{|A \cap A^*|}{|A^*|} \tag{26} \]

where $Re$ is recall.

Both precision and recall represent only one aspect of the alignment systems; the former only considers the correctness of the alignment, while the latter accentuates the completeness of an alignment with respect to the reference. As a combination of both, F-measure is often used. It is the harmonic mean of the precision and recall and is computed as

\[ \text{F-measure}(A, A^*) = 2\, \frac{Pr(A, A^*) \times Re(A, A^*)}{Pr(A, A^*) + Re(A, A^*)}. \]

We do not include F-measure in this study since it is the harmonic mean of precision and recall, which violates the independence of criteria required for the MCDM methods. Aside from these popular performance metrics, there are two important principles for a given alignment. The first is conservativity [52,53], which states that, with regard to the alignment being generated, the system must not impose any new semantic relationship between the concepts of the ontologies involved. The second is consistency, which states that the discovered correspondences should not lead to unsatisfiable classes in the merged ontology [53].

There is also a metric called Recall+, which indicates the portion of correspondences that a system cannot readily detect. When this performance metric has a higher value, that indicates that the associated system is able to identify the most non-trivial, i.e., non-syntactically identical, correspondences between two given ontologies. In addition, the execution time is another important indicator of the performance of the alignment systems, that also has to be
structure/terminology, which means that ontology alignment is re- taken into account.
quired to deal with this discrepancy. We now consider the rudi-
mentary concepts of ontology alignment.
4.3. Participating systems and standard benchmarks: Five OAEI tracks
Definition 4.2 (Correspondence [15]). To match the ontologies O
and O
, a correspondence is as a set of 4-tuples To determine some of the performance metrics, we need to
have the underlying true alignment of the ontologies in ques-
< e, e
, rel, d >
tion, for which we use the benchmarks of five different tracks of
where the OAEI whose reference alignment are also available. The tracks
M. Mohammadi and J. Rezaei / Omega 96 (2020) 102254 7

are anatomy, conference, largeBioMed (large biomedical track), disease and phenotype, and SPIMBENCH. By reviewing the history of the tracks in the OAEI competition¹, as well as asking the organizers of the tracks, the appropriate performance metrics for each of the tracks listed above are obtained. Table 2 tabulates the performance metrics for all five tracks.

¹ http://oaei.ontologymatching.org/

Table 2
The selected performance metrics of five tracks of the OAEI.

OAEI track              Performance metrics/indicators
Anatomy                 time, precision, recall, recall+, consistency
Conference              precision, recall, conservativity, consistency
LargeBioMed             time, precision, recall
Disease and Phenotype   time, precision, recall
SPIMBENCH               time, precision, recall

According to Table 2, the execution time is essential to all tracks, with the exception of conference, since the size of ontologies in this track is small (i.e., < 100 entities) and the systems are therefore able to perform the alignment swiftly. Furthermore, precision and recall are important in all tracks. However, we did not include F-measure, since it is the harmonic mean of precision and recall. In other words, since the evaluation based on MCDM includes both precision and recall, using F-measure would be redundant. In addition, the criteria must be independent of each other in MCDM, which means that using F-measure would invalidate the overall ranking computed by various MCDM methods.

The evaluation is conducted on the alignment systems that took part in the OAEI 2018. The systems participating in one or more of the five tracks are AML [16], LogMap, LogMapBio, and LogMapLite [13], SANOM [35], DOME [25], POMAP++ [30], Holontology [45], ALIN [51], XMap [59], ALOD2Vec [46], FCAMapX [9], and KEPLER [27]. Table 3 displays the systems that participated in the different OAEI tracks. According to this table, 14 systems participated in the anatomy track, 12 in conference, seven in LargeBioMed, eight in disease and phenotype, and three in SPIMBENCH. Another point is that AML and LogMap participated in all five tracks.

5. Experiments

In this section, the MCDM methods and the proposed aggregated methodology are applied to five tracks of the OAEI, and the systems participating in 2018 are compared and ranked accordingly. The alignments produced by various systems are available on the OAEI website.²

² http://oaei.ontologymatching.org/2018/results/index.html

5.1. Large BioMed Track

The aim of this track is to find alignments between the Foundational Model of Anatomy (FMA), SNOMED CT, and the National Cancer Institute Thesaurus (NCI) ontologies. The ontologies are large and contain tens of thousands of classes. The performance metrics used to rank the systems that participated in this track are execution time, precision, and recall.

Table 4 tabulates the ranking of seven systems that applied for matching FMA to NCI. This is an interesting case, since the MCDM rankings are conflicting. In particular, the rankings of VIKOR and PROMETHEE are in line for LogMapBio and FCAMapX and both differ from the ranking of TOPSIS, while the rankings of TOPSIS and VIKOR agree with regard to LogMapLite and XMap and are distinct from the ranking of PROMETHEE. When considering the weights of MCDM methods, it is interesting to see that the weight of VIKOR is relatively high and is close to one, while the weights of the other two methods are lower and close to zero, which means that the proposed ensemble method favors the middle ground ranking among these three MCDM methods. Since two methods have different rankings compared to the aggregated final ranking, the consensus index is not high at around 0.80. At the same time, the trust level is 1.00 because the weights of two MCDM methods are nearly zero so that they cannot affect this indicator. This table shows that AML, LogMap, and XMap are listed as the top three systems in this task.

In addition, Table 5 shows the ranking of participants in matching FMA and SNOMED. This table is similar to Table 4, since VIKOR has a higher weight compared to the other methods, with its ranking situated between the other rankings. The consensus index for the final ranking is 0.80, while the trust level is 0.98. Similarly, Table 6 shows the ranking of seven systems that participated in matching NCI to SNOMED. According to this table, VIKOR once more has a higher weight, and as a result, the final consensus index is 0.80, with a trust level of 0.98. According to Tables 5 and 6, AML and LogMap are the top two systems in aligning FMA to SNOMED as well as NCI to SNOMED.

5.2. Disease and Phenotype Track

The OAEI disease and phenotype track comprises matching various disease and phenotype ontologies. In the OAEI 2018, this track consisted of two tasks: the first is to align the human phenotype (HP) ontology to the mammalian phenotype (MP) ontology, and the second is to align the human disease ontology (DOID) and the orphanet and rare diseases ontology (ORDO). The performance metrics used for this track are execution time, precision, and recall.

In the OAEI 2018, eight systems were able to align HP and MP, while nine systems could match DOID and ORDO. Table 7 illustrates the ranking of the systems that participated in the OAEI 2018 disease and phenotype track for mapping HP and MP ontologies. According to this table, the weights of TOPSIS and VIKOR are significantly higher than that of PROMETHEE, because the rankings obtained by PROMETHEE deviate more from the other two methods. For instance, PROMETHEE puts AML in the fourth place, while the other two consider it to be the best alignment system. As a result, the weight of PROMETHEE became insignificant. The consensus index for this ranking is 0.85 and its trust level is 0.95. Also, this table indicates that AML, LogMap, and LogMapLite are the top systems in this mapping task.

Another matching task in this track involves the alignment of DOID and ORDO ontologies. Table 8 shows the ranking of the participating systems for this task. According to this table, TOPSIS takes the highest weight, since it is a compromise of the other two MCDM methods. In particular, the TOPSIS ranking of DOME lies between those of VIKOR and PROMETHEE. Also, TOPSIS rankings occasionally agree with one of the other ranking methods: it agrees with VIKOR on ranking LogMap, LogMapLite, and XMap, while it is in line with PROMETHEE with regard to POMAP++. Given these rankings, TOPSIS has a higher weight compared to other MCDM methods. The consensus index and trust level of this ranking are 0.87 and 0.95, respectively. Accordingly, LogMap, LogMapLite, and XMap are the top systems on this task with regard to all the performance metrics.

5.3. Anatomy track

This track consists of matching the adult mouse anatomy to a part of NCI thesaurus describing the human anatomy. In the OAEI 2018, 14 systems participated in the anatomy track. The systems are compared based on execution time, precision, recall, consistency, and recall+. Table 9 shows the ranking of the systems in the

Table 3
The OAEI tracks and the participating systems in each individual track for the year 2018.

OAEI track Alignment systems

Anatomy LogMapBio, DOME, POMAP++, Holontology, ALIN, AML, XMap, LogMap, ALOD2Vec, FCAMapX, KEPLER, LogMapLite, SANOM, Lily
Conference Holontology, DOME, ALIN, AML, XMap, LogMap, ALOD2Vec, FCAMapX, KEPLER, LogMapLite, SANOM, Lily
LargeBioMed AML, LogMap, LogMapBio, XMap, FCAMapX, LogMapLt, DOME
Disease and Phenotype LogMap, LogMapBio, AML, LogMapLt, POMAP++, Lily, XMap, DOME
SPIMBENCH AML, Lily, LogMap

Table 4
Ranking of systems taking part in the Large BioMed track for mapping FMA to NCI.

Time(s) Precision Recall TOPSIS VIKOR PROM R∗ Aggregated ranking

AML 55 0.84 0.87 1 1 1 1 1


LogMap 51 0.86 0.81 2 2 2 2 2
LogMapBio 1072 0.83 0.83 7 6 6 6 6
XMap 65 0.88 0.74 3 3 4 3 3
FCAMapX 881 0.67 0.84 6 7 7 7 7
LogMapLt 6 0.68 0.82 4 4 3 4 4
DOME 12 0.8 0.67 5 5 5 5 5
weights - - - 0.00 1.00 0.00 - -

Consensus Index = 0.80

Trust Level = 1.00
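The PROMETHEE steps of Eqs. (22)-(24) can be sketched on the performance matrix of Table 4. This is a minimal illustration, not the authors' implementation: the function name `promethee_net_flow` and the `benefit` flags are hypothetical, and treating execution time as a cost criterion (lower is better) is an assumption made here so that the ">" comparison of Eq. (22) can be applied to every column.

```python
# Performance matrix from Table 4 (FMA to NCI): time in seconds, precision, recall.
systems = ["AML", "LogMap", "LogMapBio", "XMap", "FCAMapX", "LogMapLt", "DOME"]
X = [
    [55, 0.84, 0.87],
    [51, 0.86, 0.81],
    [1072, 0.83, 0.83],
    [65, 0.88, 0.74],
    [881, 0.67, 0.84],
    [6, 0.68, 0.82],
    [12, 0.80, 0.67],
]
# Assumption: time is a cost criterion, precision and recall are benefit criteria.
benefit = [False, True, True]


def promethee_net_flow(matrix, benefit_flags):
    """Return the net flow phi(A_i) of every alternative, following Eqs. (22)-(24)."""
    K = len(matrix)
    # Flip the sign of cost criteria so that "larger is better" holds for all columns.
    oriented = [[v if b else -v for v, b in zip(row, benefit_flags)] for row in matrix]
    # Eq. (22): pi[i][k] counts the criteria on which alternative i beats alternative k.
    pi = [
        [sum(a > b for a, b in zip(oriented[i], oriented[k])) for k in range(K)]
        for i in range(K)
    ]
    # Eq. (23): positive and negative outranking flows.
    phi_plus = [sum(pi[i]) / (K - 1) for i in range(K)]
    phi_minus = [sum(pi[k][i] for k in range(K)) / (K - 1) for i in range(K)]
    # Eq. (24): net flow.
    return [p - m for p, m in zip(phi_plus, phi_minus)]


net_flow = promethee_net_flow(X, benefit)
# Step 3: sort the alternatives by decreasing net flow.
order = sorted(zip(systems, net_flow), key=lambda pair: -pair[1])
for name, phi in order:
    print(f"{name:10s} {phi:+.3f}")
```

Under these assumptions the sketch places AML, LogMap, LogMapLt, and XMap first to fourth and FCAMapX last, which agrees with the PROM column of Table 4. LogMapBio and DOME obtain the same net flow here, so their relative order depends on a tie-breaking rule that this section does not specify.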

Table 5
Ranking of systems taking part in the Large BioMed track for mapping FMA to SNOMED.

Time Precision Recall TOPSIS VIKOR PROM R∗ Aggregated ranking

FCAMapX 1736 0.82 0.76 6 5 5 5.00 5


AML 94 0.88 0.69 1 1 1 1.00 1
LogMapBio 1840 0.83 0.65 7 7 6 6.95 7
LogMap 287 0.84 0.64 2 2 4 2.08 2
XMap 299 0.72 0.61 3 6 7 6.02 6
LogMapLt 9 0.85 0.21 5 4 3 3.96 4
DOME 20 0.94 0.20 4 3 2 2.96 3
weights - - - 0.0056 0.9502 0.0442 - -

Consensus Index = 0.80

Trust Level = 0.98
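The R∗ column can be read as the weighted sum of the three MCDM rankings, with the aggregated ranking obtained by sorting R∗. The sketch below checks this reading on the values of Table 5; the helper names `aggregate` and `to_ranks` are hypothetical, and the weights are taken as given from the table rather than computed with the half-quadratic procedure.

```python
# Rankings and weights copied from Table 5 (FMA to SNOMED).
systems = ["FCAMapX", "AML", "LogMapBio", "LogMap", "XMap", "LogMapLt", "DOME"]
rankings = {
    "TOPSIS": [6, 1, 7, 2, 3, 5, 4],
    "VIKOR": [5, 1, 7, 2, 6, 4, 3],
    "PROMETHEE": [5, 1, 6, 4, 7, 3, 2],
}
weights = {"TOPSIS": 0.0056, "VIKOR": 0.9502, "PROMETHEE": 0.0442}


def aggregate(rankings, weights):
    """Weighted sum of the individual rankings, one score per alternative."""
    n = len(next(iter(rankings.values())))
    return [sum(weights[m] * rankings[m][i] for m in rankings) for i in range(n)]


def to_ranks(scores):
    """Convert scores to ranks 1..n, where the smallest score gets rank 1."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


r_star = aggregate(rankings, weights)
final_ranking = to_ranks(r_star)
for name, score, rank in zip(systems, r_star, final_ranking):
    print(f"{name:10s} R* = {score:.2f}  rank = {rank}")
```

Because the published weights are rounded to four decimals, the recomputed R∗ values can differ from the table in the second decimal place, while the resulting order is the same as the aggregated ranking column.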

Table 6
Ranking of systems taking part in the Large BioMed track for mapping NCI to SNOMED.

Time Precision Recall TOPSIS VIKOR PROM R∗ Aggregated ranking

AML 168 0.90 0.67 1 1 1 1 1


FCAMapX 2377 0.80 0.68 6 4 5 4.07 4
LogMapBio 2942 0.85 0.63 7 6 6 6.02 6
LogMap 475 0.87 0.60 3 2 3 2.05 2
LogMapLt 11 0.80 0.57 2 3 4 3.00 3
DOME 24 0.91 0.48 4 5 2 4.90 5
XMap 427 0.64 0.58 5 7 7 6.95 7
weights - - - 0.0255 0.9490 0.0255 - -

Consensus Index = 0.80

Trust Level = 0.98
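The precision and recall columns of these tables follow Eqs. (25) and (26), and F-measure is fully determined by the two. The sketch below illustrates this with set operations; the two small alignments are hypothetical toy data, and the function names are chosen here for illustration only.

```python
def precision(alignment, reference):
    """Eq. (25): share of generated correspondences that are in the reference."""
    return len(alignment & reference) / len(alignment) if alignment else 0.0


def recall(alignment, reference):
    """Eq. (26): share of reference correspondences that were generated."""
    return len(alignment & reference) / len(reference) if reference else 0.0


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# Hypothetical correspondences, written as (entity in O, entity in O') pairs.
generated = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
reference = {("a", "x"), ("b", "y"), ("e", "v")}

p = precision(generated, reference)
r = recall(generated, reference)
f = f_measure(p, r)
print(p, r, f)

# F-measure recomputed from the AML row of Table 6 (precision 0.90, recall 0.67).
f_aml = f_measure(0.90, 0.67)
print(round(f_aml, 3))
```

Since `f_measure` takes only precision and recall as inputs, adding it as a third criterion would carry no information beyond the two columns already in the table, which is the redundancy argument made for leaving it out.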

Table 7
Ranking of eight systems that participated in the 2018 OAEI disease and phenotype track. The task involves mapping
HP and MP.

Time Precision Recall TOPSIS VIKOR PROM R∗ Aggregated ranking

LogMap 31 0.88 0.84 2 2 2 2 2


LogMapBio 821 0.86 0.84 3 4 5 3.50 4
AML 70 0.89 0.8 1 1 4 1.01 1
LogMapLt 7 0.99 0.61 4 3 1 3.48 3
POMAP++ 1668 0.86 0.58 7 5 7 6.01 6
Lily 4749 0.68 0.65 8 8 8 8 8
XMap 20 0.99 0.31 5 6 3 5.48 5
DOME 46 1 0.31 6 7 6 6.50 7
weights - - - 0.4997 0.4946 0.0057 - -

Consensus Index = 0.85

Trust Level = 0.95

Table 8
Ranking of systems that participated in the 2018 OAEI disease and phenotype track. The task involves the alignment
of DOID and ORDO.

Time Precision Recall TOPSIS VIKOR PROM R∗ Aggregated ranking

LogMap 25 0.94 0.78 1 1 4 1.0843 1


LogMapBio 1891 0.9 0.8 6 4 3 5.3494 5
POMAP++ 2264 0.87 0.8 7 5 7 6.4337 7
LogMapLt 7 0.99 0.62 2 2 1 1.9718 2
XMap 15 0.97 0.55 3 3 5 3.0562 3
KEPLER 2746 0.88 0.57 8 8 8 8 8
Lily 2847 0.59 0.78 9 9 9 9 9
AML 135 0.51 0.87 5 7 6 5.5943 6
DOME 10 1 0.44 4 6 2 4.5100 4
weights - - - 0.6888 0.2831 0.0281 - -

Consensus Index = 0.87

Trust Level = 0.95

Table 9
Ranking of 14 systems that participated in the OAEI 2018 anatomy track.

Time (s) Precision Recall Recall+ Consist. TOPSIS VIKOR PROM R∗ Aggregated ranking

LogMapBio 808 0.89 0.91 0.76 1 4 5 4 4.44 4


DOME 22 1 0.62 0.01 0 13 11 7 11.19 11
POMAP++ 210 0.92 0.88 0.7 0 6 6 5 5.85 5
Holontology 265 0.98 0.29 0.01 0 14 14 14 14.00 14
ALIN 271 1 0.61 0 1 7 4 11 6.29 6
AML 42 0.95 0.94 0.83 1 1 1 1 1.00 1
XMap 37 0.93 0.87 0.65 1 2 2 2 2.00 2
LogMap 23 0.92 0.85 0.59 1 3 3 3 3.00 3
ALOD2Vec 75 1 0.65 0.09 0 12 10 9 10.66 10
FCAMapX 118 0.94 0.79 0.46 0 8 7 10 7.87 8
KEPLER 244 0.96 0.74 0.32 0 11 12 12 11.60 12
LogMapLite 18 0.96 0.73 0.29 0 9 8 6 8.10 9
SANOM 487 0.89 0.84 0.63 0 5 9 8 7.23 7
Lily 278 0.87 0.8 0.52 0 10 13 13 11.79 13
weights - - - - - 0.4048 0.4413 0.1539 - -

Consensus Index = 0.95

Trust Level = 0.97

Table 10
Ranking of systems that participated in the 2018 OAEI conference track. The evaluation is based on the certain reference alignment.

Precision Recall AvgConserViol AvgConsisViol TOPSIS VIKOR PROM R∗ Aggregated ranking

SANOM 0.78 0.76 5.15 4.6 9 4 7 7.67 8


AML 0.83 0.7 1.86 0 3 1 2 2.35 2
LogMap 0.84 0.64 1.19 0 1 2 1 1.04 1
XMap 0.81 0.61 2.65 0.7 4 3 6 5.07 5
KEPLER 0.76 0.61 5.86 7.57 10 9 10 9.96 10
ALIN 0.88 0.54 0.1 0 2 5 3 2.69 3
DOME 0.88 0.54 5.05 0.48 7 7 5 5.88 6
Holontology 0.86 0.55 3.14 0.48 5 6 4 4.49 4
FCAMapX 0.71 0.61 5.9 13 12 12 12 12.00 12
LogMapLite 0.84 0.54 4.57 1.19 6 8 8 7.20 7
ALOD2Vec 0.85 0.54 5.9 1.29 8 10 9 8.65 9
Lily 0.59 0.63 7 6.2 11 11 11 11.00 11
weights - - - - 0.3986 0.0436 0.5578 - -

Consensus Index = 0.91

Trust Level = 0.95

anatomy track computed by three MCDM methods, the final ranking being obtained by using the proposed ensemble method. The consensus index and trust level for this track are 0.95 and 0.97, respectively. Based on this table, AML, XMap, and LogMap are the top three systems in the anatomy track.

5.4. Conference Track

The conference track involves matching and aligning seven ontologies from different conferences. For this track, there are two different reference alignments, i.e., certain and uncertain. Table 10 tabulates the result of the analysis of the 12 systems that participated in this track at the OAEI 2018 with the certain alignment, with a consensus index of 0.91 and a trust level of 0.95. Based on this table, LogMap, AML, and ALIN are the top systems. For the uncertain version of the reference alignment, as Table 11 shows, AML, LogMap, and Holontology are the top three systems. The consensus index and trust level for this track are 0.93 and 0.95, respectively.

5.5. SPIMBENCH Track

The SPIMBENCH task is another matching task, the aim of which is to determine when two OWL instances describe the same Creative Work. There are two datasets, called Sandbox and Mainbox, each of which has a Tbox as the source ontology and Abox as the target. Tbox contains the ontology and instances, and it has to be aligned to Abox, which only contains instances. The difference between Sandbox and Mainbox is that the reference of the

Table 11
Ranking of systems that participated in the 2018 OAEI conference track. The evaluation is based on the uncertain reference alignment.

Precision Recall AvgConserViol AvgConsisViol TOPSIS VIKOR PROM R∗ Aggregated ranking

SANOM 0.8 0.67 5.15 4.6 9 4 4 4.82 5


AML 0.79 0.65 1.86 0 3 1 2 1.67 1
LogMap 0.79 0.58 1.19 0 1 2 3 2.18 2
XMap 0.79 0.55 2.65 0.7 4 3 5 3.85 4
KEPLER 0.68 0.57 5.86 7.57 11 10 9 9.82 10
Holontology 0.81 0.5 0.1 0 2 6 1 3.63 3
ALIN 0.82 0.48 5.05 0.48 7 8 6 7.15 7
FCAMapX 0.67 0.56 3.14 0.48 5 5 7 5.69 6
DOME 0.82 0.48 5.9 13 12 11 10 10.82 11
ALOD2Vec 0.8 0.49 4.57 1.19 6 7 8 7.18 8
LogMapLite 0.79 0.49 5.9 1.29 8 9 11 9.52 9
Lily 0.58 0.56 7 6.2 10 12 12 11.67 12
weights - - - - 0.1639 0.4935 0.3427 - -

Consensus Index = 0.93

Trust Level = 0.95

Table 12
Ranking of systems that participated in the 2018 OAEI SPIMBENCH track. The task is Sandbox.

Precision Recall Time TOPSIS VIKOR PROM R∗ Aggregated ranking

AML 0.83 0.9 6220 2 3 3 3 3


Lily 0.85 1 1960 1 1 1 1 1
LogMap 0.94 0.76 5887 3 2 2 2 2
weights - - - 0 0.50 0.50 - -

Consensus Index = 0.77

Trust Level = 1.00

Table 13
Ranking of systems that participated in the 2018 OAEI SPIMBENCH track. The task is Mainbox.

Precision Recall Time TOPSIS VIKOR PROM R∗ Aggregated ranking

AML 0.84 0.88 37,190 3 3 3 3 3


Lily 0.85 1 3103 1 1 1 1 1
LogMap 0.89 0.71 23,494 2 2 2 2 2
weights - - - 0.33 0.33 0.33 - -

Consensus Index = 1.00

Trust Level = 1.00

former is available to the participants, while the latter is a blind matching task so that participants do not know the real alignment in advance.

There are only three systems included in this track at the OAEI 2018. Tables 12 and 13 list the ranking of the systems for the Sandbox and Mainbox tasks, respectively. The Sandbox task is interesting, since two MCDM methods have identical rankings, while the other, i.e., TOPSIS, differs in ranking two systems, as a result of which its weight becomes insignificant, while the weight of the other two rankings is about 0.50. The consensus index for this ranking is 0.77, while its trust level is 1.00, since the final ranking is identical to the ranking (or average) of the other two MCDM methods.

For the Mainbox task, Table 13 shows the ranking of the three systems on this task. Interestingly, the rankings of the MCDM methods are identical and they all take on a similar weight in the proposed method. As expected, the consensus index and trust level are also one. According to these tables, Lily performs best in both tasks, followed by LogMap and AML.

Remark 5.1. We discussed the ranking of TOPSIS, VIKOR, and PROMETHEE for different OAEI tracks. They all had higher weights in some tracks and lower weights in some of the others. However, the aim of this study is not to compare MCDM methods or discuss their suitability. These methods can take on higher or lower weights in different decision-making problems, and their weights are entirely dependent on the computed rankings based on the performance matrix of the decision-making problem in question.

Remark 5.2. In this study we used three MCDM methods for which we do not need to use the expert/decision-maker opinion to make the final ranking. This, however, does not mean that we cannot use the MCDM methods in which expert/decision-maker opinion is used to make the ranking (such as AHP/ANP, BWM). In fact the rankings (which are the input for our ensemble method) could come from any set of MCDM methods (with or without expert/decision-maker opinion). It is, however, important to know that regardless of the MCDM methods we use in our proposed ensemble method, there is no need to have the opinion of an expert/decision-maker on comparing the rankings which are produced by the different MCDM methods.

6. Discussion

As we discussed earlier, the consensus index and the trust level indicate two different aspects of the final aggregated ranking. Generally speaking, higher values are desirable for both indicators. The consensus index is an indicator of the agreement among all the MCDM methods being used, while the trust level shows the reliability with regard to the final aggregated ranking. Below, based on the main properties of the proposed approach and the findings of the experiments, we elaborate on some general possible outcomes of the proposed methods.

• Consensus index high, trust level high: If all the MCDM methods being used have identical rankings, their weights are analogous and equivalent to 1/M, where M is the number of ranking methods. In this case, the final aggregated ranking is precisely

the average of the individual rankings. As a result, the proposed ensemble method represents the average, or equivalently, the HQ functions operate as the Euclidean norm. This is indeed acceptable, since there are no outliers when all the rankings are identical. In this case, because there is full agreement among all the MCDM methods being used, both consensus index and trust level are one.
• Consensus index low, trust level high: Where there is a low consensus index and a high trust level, that can mean either of two things. First, if a small fraction of the MCDM methods being used deliver rankings that deviate from the other rankings, the proposed ensemble method treats them as outliers, assigning them lower weights, which reduces their impact on the final aggregated ranking. The presence of such methods can be detected by inspecting the weights obtained by the proposed ensemble method. Methods that have a lower weight are seen as a deviation from the majority of MCDM rankings, as well as from the final ranking, which means they are treated as outliers. The second option is when the number of methods with lower weights is significant compared to the overall number of the MCDM methods being used. The MCDM rankings with higher weights are the intermediates of all the methods. As a result, the intermediate rankings take on higher weights and have a more profound impact on the final aggregated ranking. In both of these cases, the agreement among the MCDM methods being used is low, while the final ranking is fully captured by a fraction of the MCDM methods involved, which is why the consensus index is low and the trust level is high.
• Consensus index low, trust level low: If all the MCDM rankings in question deviate significantly from each other, the consensus index will be low. In that case, no share of the MCDM methods involved has significantly higher weights, which means that the trust level is also low.
• Consensus index high, trust level low: This scenario does not occur, because the trust level is high when there is a consensus among the MCDM methods being used.

This is a general discussion framework, and we think that the levels could be defined by the decision-makers for a particular problem.

7. Conclusion

In this paper, a new compromise ensemble method was proposed, based on the half-quadratic (HQ) theory. The proposed method can be used to compute a final aggregated ranking, in the form of the weighted sum of the MCDM rankings. The weights in the proposed method were computed using the minimizer functions inspired by the HQ theory, and they satisfy the basic properties of weights in MCDM. In addition, using multiple performance metrics, the ranking of ontology alignment systems was modeled as an MCDM problem, where the systems and the performance metrics served as alternatives and criteria, respectively. In this regard, appropriate MCDM methods were reviewed, each of which could assign a ranking to each system on a benchmark with respect to its performance metrics.

We also introduced two indicators, consensus index and trust level; the former indicates the level of agreement among MCDM ranking methods, while the latter reflects the reliability of the ranking schemes. It became clear in the cases we examined that, when a ranking method deviates from the others, the aggregated ranking has a low consensus index but a high trust level. As a result, these two indicators are able to delineate different properties of the final aggregated ranking.

Since evaluating and ranking ontology alignment systems are important activities, in particular in light of the ontology alignment evaluation initiative (OAEI) competition, the approach discussed in this article can be used to produce a final ranking of ontology alignment systems in each of the OAEI tracks. The outcome can provide greater insight into the overall performance of systems and promote the report provided annually by the OAEI organizer.

This study can be extended in various ways. To begin with, the performance metrics used to rank the alignment systems are treated as though they are equally important, but it is worthwhile to keep in mind that different performance metrics may in fact not be equally important, which means that one area of future research involves examining the preferences of different performance metrics for different OAEI tracks by the experts in the domain, and then ranking the systems involved accordingly. To that end, a broad range of MCDM methods could be used.

The proposed approach in this paper has the potential to be used for many real-world applications where a number of MCDM methods are used to rank a number of alternatives, and where a consensus among the methods being used is needed to come up with a final aggregated ranking. Finally, we think that it would be interesting to use the proposed method to integrate the votes in voting systems.

CRediT authorship contribution statement

Majid Mohammadi: Conceptualization, Methodology, Software, Writing - original draft. Jafar Rezaei: Validation, Writing - review & editing, Supervision.

References

[1] Abo-Sinna MA, Amer AH. Extensions of TOPSIS for multi-objective large-scale nonlinear programming problems. Appl Math Comput 2005;162(1):243–56.
[2] Acuña-Soto CM, Liern V, Pérez-Gladish B. A VIKOR-based approach for the ranking of mathematical instructional videos. Management Decision 2019;57(2):501–22.
[3] Amaral TM, Costa AP. Improving decision-making and management of hospital resources: an application of the PROMETHEE II method in an emergency department. Oper Res Health Care 2014;3(1):1–6.
[4] Bai C, Rezaei J, Sarkis J. Multicriteria green supplier segmentation. IEEE Trans Eng Manage 2017;64(4):515–28.
[5] Boyd S, Vandenberghe L. Convex optimization. Cambridge University Press; 2004.
[6] Brans J. L'ingénierie de la décision, l'élaboration d'instruments d'aide à la décision. Colloque sur l'aide à la décision. Faculté des Sciences de l'Administration, Université Laval 1982.
[7] Brans J-P, Mareschal B. PROMETHEE methods. In: Multiple criteria decision analysis: state of the art surveys. Springer; 2005. p. 163–86.
[8] Cha Y, Jung M. Satisfaction assessment of multi-objective schedules using neural fuzzy methodology. Int J Prod Res 2003;41(8):1831–49.
[9] G. Chen, S. Zhang, FCAMapX results for OAEI 2018 (2018).
[10] Chu T-C. Facility location selection using fuzzy TOPSIS under group decisions. Int J Uncertainty Fuzziness Knowledge Based Syst 2002;10(6):687–701.
[11] Demšar J. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 2006;7(Jan):1–30.
[12] Du Y, Gao C, Hu Y, Mahadevan S, Deng Y. A new method of identifying influential nodes in complex networks based on TOPSIS. Physica A 2014;399:57–69.
[13] B.C.G. E. Jimenez-Ruiz, V. Cross, LogMap family participation in the OAEI 2018 (2018).
[14] Edwards W, Barron FH. SMARTS and SMARTER: improved simple methods for multiattribute utility measurement. Organ Behav Hum Decis Process 1994;60(3):306–25.
[15] Euzenat J, Shvaiko P, et al. Ontology matching, 18. Springer; 2007.
[16] D. Faria, C. Pesquita, B.S. Balasubramani, T. Tervo, D. Carriço, R. Garrilha, F.M. Couto, I.F. Cruz, Results of AML participation in OAEI 2018 (2018).
[17] Figueira J, Mousseau V, Roy B. ELECTRE methods. In: Multiple criteria decision analysis: State of the art surveys. Springer; 2005. p. 133–53.
[18] Geman D, Reynolds G. Constrained restoration and the recovery of discontinuities. IEEE Transactions on Pattern Analysis & Machine Intelligence 1992(3):367–83.
[19] Geman D, Yang C. Nonlinear image recovery with half-quadratic regularization. IEEE Trans Image Process 1995;4(7):932–46.
[20] Govindan K, Kadziński M, Sivakumar R. Application of a novel PROMETHEE-based method for construction of a group compromise ranking to prioritization of green suppliers in food supply chain. Omega (Westport) 2017;71:129–45.
[21] He R, Tan T, Wang L. Robust recovery of corrupted low-rank matrix by implicit regularizers. IEEE Trans Pattern Anal Mach Intell 2014a;36(4):770–83.

[22] He R, Zhang Y, Sun Z, Yin Q. Robust subspace clustering with complex noise. IEEE Trans Image Process 2015;24(11):4001–13.
[23] He R, Zheng W-S, Hu B-G, Kong X-W. Two-stage nonnegative sparse representation for large-scale face recognition. IEEE Trans Neural Netw Learn Syst 2013;24(1):35–46.
[24] He R, Zheng W-S, Tan T, Sun Z. Half-quadratic-based iterative minimization for robust sparse representation. IEEE Trans Pattern Anal Mach Intell 2014b;36(2):261–75.
[25] S. Hertling, H. Paulheim, DOME results for OAEI 2018 (2018).
[26] Huber PJ. Robust statistics. Springer; 2011.
[27] M. Kachroudi, G. Diallo, S.B. Yahia, KEPLER at OAEI 2018 (2018).
[28] Kou G, Lu Y, Peng Y, Shi Y. Evaluation of classification algorithms using MCDM and rank correlation. International Journal of Information Technology & Decision Making 2012;11(01):197–225.
[29] Kou G, Peng Y, Wang G. Evaluation of clustering algorithms for financial risk analysis using MCDM methods. Inf Sci (Ny) 2014;275:1–12.
[30] A. Laadhar, F. Ghozzi, I. Megdiche, F. Ravat, O. Teste, F. Gargouri, OAEI 2018 results of POMAP+ (2018).
[31] Liu H-C, Li Z, Song W, Su Q. Failure mode and effect analysis using cloud model theory and PROMETHEE method. IEEE Trans Reliab 2017;66(4):1058–72.
[32] Liu H-C, Wang L-E, Li Z, Hu Y-P. Improving risk evaluation in FMEA with cloud model and hierarchical TOPSIS method. IEEE Trans Fuzzy Syst 2018;27(1):84–95.
[33] Liu W, Pokharel PP, Príncipe JC. Correntropy: properties and applications in non-Gaussian signal processing. IEEE Trans Signal Process 2007;55(11):5286–98.
[34] Mann ME, Lees JM. Robust estimation of background noise and signal detection in climatic time series. Clim Change 1996;33(3):409–45.
[35] Mohammadi M, Hofman W, Tan Y-H. Simulated annealing-based ontology matching. ACM Transactions on Management Information Systems (TMIS) 2019;10(1):3.
[36] Mustajoki J, Hämäläinen RP, Salo A. Decision support by interval SMART/SWING incorporating imprecision in the SMART and SWING methods. Decision Sciences 2005;36(2):317–39.
[37] Nikolova M, Ng MK. Analysis of half-quadratic minimization methods for signal and image recovery. SIAM Journal on Scientific Computing 2005;27(3):937–66.
[38] Noghabi HS, Mohammadi M, Tan Y-H. Robust group fused lasso for multisample copy number variation detection under uncertainty. IET Syst Biol 2016;10(6):229–36.
[39] Opricovic S. Multicriteria optimization of civil engineering systems. Faculty of Civil Engineering, Belgrade 1998;2(1):5–21.
[40] Opricovic S, Tzeng G-H. Multicriteria planning of post-earthquake sustainable reconstruction. Comput-Aided Civ Infrastruct Eng 2002;17(3):211–20.
[41] Opricovic S, Tzeng G-H. Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS. Eur J Oper Res 2004;156(2):445–55.
[42] Peng Y, Kou G, Wang G, Shi Y. FAMCDM: a fusion approach of MCDM methods to rank multiclass classification algorithms. Omega (Westport) 2011;39(6):677–89.
[43] Peng Y, Wang G, Wang H. User preferences based software defect detection algorithms selection using MCDM. Inf Sci (Ny) 2012;191:3–13.
[44] Percin S. Evaluation of third-party logistics (3PL) providers by using a two-phase AHP and TOPSIS methodology. Benchmarking: An International Journal 2009;16(5):588–604.
[45] O.T.C.T. Philippe Roussille, Imen Megdiche, Holontology: results of the 2018 OAEI evaluation campaign (2018).
[46] J. Portisch, H. Paulheim, ALOD2Vec matcher (2018).
[47] Rezaei J. Best-worst multi-criteria decision-making method. Omega (Westport) 2015;53:49–57.
[48] Saaty TL. A scaling method for priorities in hierarchical structures. J Math Psychol 1977;15(3):234–81.
[49] Saaty TL. Decision making for leaders: the analytic hierarchy process for decisions in a complex world. RWS Publications; 1990.
[50] Shojaei P, Haeri SAS, Mohammadi S. Airports evaluation and ranking model using Taguchi loss function, best-worst method and VIKOR technique. Journal of Air Transport Management 2018;68:4–13.
[51] K.R. Jomar da Silva, F.A. Baiao, ALIN results for OAEI 2018 (2018).
[52] Solimando A, Jiménez-Ruiz E, Guerrini G. Detecting and correcting conservativity principle violations in ontology-to-ontology mappings. In: International Semantic Web Conference. Springer; 2014a. p. 1–16.
[53] Solimando A, Jiménez-Ruiz E, Guerrini G. A multi-strategy approach for detecting and correcting conservativity principle violations in ontology alignments. In: OWLED; 2014b. p. 13–24.
[54] Soylu B. Integrating PROMETHEE II with the Tchebycheff function for multi criteria decision making. International Journal of Information Technology & Decision Making 2010;9(04):525–45.
[55] Triantaphyllou E. Multi-criteria decision making methods. In: Multi-criteria decision making methods: A comparative study. Springer; 2000. p. 5–21.
[56] Tzeng G-H, Huang J-J. Multiple attribute decision making: methods and applications. Chapman and Hall/CRC; 2011.
[57] Wang H, Li H, Zhang W, Zuo J, Wang H. Maximum correntropy derivative-free robust Kalman filter and smoother. IEEE Access 2018;6:70794–807.
[58] Wang H, Li H, Zhang W, Zuo J, Wang H. A unified framework for M-estimation based robust Kalman smoothing. Signal Processing 2019;158:61–5.
[59] S.B.Y. Warith Eddine Djeddi, M.T. Khadir, XMap: Results for OAEI 2018 (2018).