The IPSI BgD Transactions on
Internet Research
Multi-, Inter-, and Trans-disciplinary Issues in Computer Science and Engineering
A publication of
IPSI Bgd Internet Research Society
New York, Frankfurt, Tokyo, Belgrade
January 2010 Volume 6 Number 1 (ISSN 1820-4503)
Table of Contents:
Pearls of Wisdom by Nobel Laureate:
Interview - Inventivity from the Look of Science
Kroto, H. .......................................................................................................................... 2
Invited Papers:
Fuzzy Sets and Inference as an Effective Methodology in the Construction of
Intelligent Controllers
Saade, J. ................................................................................................................................ 3
Merging Data Sources Based on Semantics, Contexts, and Trust
Šubelj, L.; Jelenc, D.; Zupančič, E.; Lavbič, D.; Tršek, D.; Krisper, M.; Bajec, M. ........ 18
Evaluation Models for E-Learning Platforms and the AHP Approach: a Case Study
Colace, F.; De Santo, M. .............................................................................................. 31
Academic Ranking of World Universities 2009/2010
Mester, G. .............................................................................................................................. 44
Visual and Aural: Visualization of Harmony in Music with Colour
Klemenc, B.; Ciuha, P.; Šubelj, L.; and Bajec, M. ......................................................... 48
The IPSI BgD Internet Research Society
The Internet Research Society is an association of people with professional interest in the field of the Internet. All
members will receive this TRANSACTIONS upon payment of the annual Society membership fee of €100 plus an
annual subscription fee of €1000 (airmail delivery of printed matter).
Member copies of Transactions are for personal use only.
IPSI BGD TRANSACTIONS ON INTERNET RESEARCH
www.internetjournals.net
STAFF
Veljko Milutinovic, Editor-in-Chief
Department of Computer Engineering
IPSI BgD Internet Research Society
University of Belgrade
POB 35-54, Belgrade, Serbia
Tel: (381) 64-2956756
[email protected]

Marko Novakovic, Journal Manager
Department of Computer Engineering
IPSI BgD Internet Research Society
University of Belgrade
POB 35-54, Belgrade, Serbia
Tel: (381) 64-1389281
[email protected]

EDITORIAL BOARD

Lipkovski, Aleksandar
The Faculty of Mathematics,
Belgrade, Serbia

Blaisten-Barojas, Estela
George Mason University,
Fairfax, Virginia, USA

Crisp, Bob
University of Arkansas,
Fayetteville, Arkansas, USA

Domenici, Andrea
University of Pisa,
Pisa, Italy

Flynn, Michael
Stanford University,
Palo Alto, California, USA

Fujii, Hironori
Fujii Labs, M.I.T.,
Tokyo, Japan

Ganascia, Jean-Luc
Paris University,
Paris, France

Gonzalez, Victor
University of Oviedo,
Gijon, Spain

Milligan, Charles
Sun Microsystems,
Colorado, USA

Janicic, Predrag
The Faculty of Mathematics,
Belgrade, Serbia

Kovacevic, Milos
School of Electrical Engineering,
Belgrade, Serbia

Jutla, Dawn
Saint Mary's University,
Halifax, Canada

Neuhold, Erich
Research Studios Austria,
Vienna, Austria

Karabeg, Dino
Oslo University,
Oslo, Norway

Piccardi, Massimo
Sydney University of Technology,
Sydney, Australia

Kiong, Tan Kok
National University of Singapore,
Singapore

Radenkovic, Bozidar
Faculty of Organizational Sciences,
Belgrade, Serbia

Kovacevic, Branko
School of Electrical Engineering,
Belgrade, Serbia

Rutledge, Chip
Purdue Discovery Park,
Indiana, USA

Patricelli, Frederic
ICTEK Worldwide,
L'Aquila, Italy

Mester, Gyula
University of Szeged,
Szeged, Hungary
Interview –
Inventivity from the Look of Science
Kroto, H.

1. How do you define inventivity and creativity?

I define creativity/inventivity as a new look at existing phenomena. Many scientists through the centuries have looked beyond the standard frame of mind to see various aspects of existing and new inventions. Discoveries are the final product of creativity. We can look at discovery in many different ways, but the most important thing about a discovery is the impact it has on mankind and the environment.

2. What was the major catalyst which enabled the inventivity to happen in the case of the invention that brought the Nobel Prize to you?

I think it was good collaboration with my colleagues and striving to improve education, science and the World as much as possible. I also think that a discovery is better when the ideas and the goals behind it have more impact on the improvement of mankind.

3. For small nations (like Montenegro or Serbia), what are the things to do to induce inventivity and creativity among young people?

Young people are very creative, but they can go astray. Education should be placed on a healthy basis. There is not a big difference between small and big countries in the way of inducing creativity; however, small countries can organize themselves in a better way.

About the Author

Sir Harold (Harry) Walter Kroto, KCB, FRS (born 7 October 1939) is an English chemist and one of the three recipients who shared the 1996 Nobel Prize in Chemistry. He is currently on the faculty of Florida State University, which he joined in 2004; prior to that he spent a large part of his working career at the University of Sussex, where he holds an emeritus professorship.
Fuzzy Sets and Inference as an Effective Methodology in
the Construction of Intelligent Controllers
J. Saade
ECE Department, FEA, American University of Beirut
P.O.Box: 11-0236, Riad El Solh 1107 2020, Beirut, Lebanon
Fax: 961.1.744 462, e-mail:
[email protected]
Abstract- Intelligent controllers are human-like thinking machines. Their objective is to control ill-defined, vague and complex processes in a manner similar to that of human experts. This paper emphasizes fuzzy sets and inference as an effective methodology for the construction of intelligent controllers. This is based on the fact that fuzzy inference is capable of delivering machines that apply approximate reasoning principles and other important aspects of intelligent thinking. Case studies related to non-linear function representation and robot navigation are presented to show the success of fuzzy inference in the production of intelligent machines. Data-driven and other methodologies used in the construction of intelligent controllers are examined, and the superiority of the data-driven fuzzy learning methodology is presented.

Keywords: Intelligent controllers; Fuzzy sets and inference; Approximate reasoning; Vehicle navigation; Learning.

I. INTRODUCTION

The construction or design of intelligent controllers using fuzzy sets and inference has been an active area of research for quite a number of years. A part of this research has been concerned with the development of automatic, data-driven learning algorithms. The objective of these algorithms is to provide a fuzzy system that approximates available input-output data considered to model the human expert's control actions. Different data-driven approaches have been published; we state here, for instance, the neuro-fuzzy approaches [1-7] and approaches based on the use of clustering, genetic algorithms and combined gradient-descent-least-squares [7-19]. When the above-noted algorithms were tested and compared using non-linear functions and/or control applications, the testing and comparison have almost always relied on the sole use of the data approximation error as a measure of performance.

In this study, it is shown that the data approximation error cannot be considered the only measure of performance. Rather, practical performance criteria, based on important aspects of intelligent human thinking, need to be defined and used to test, compare and determine preferences between data-driven fuzzy controller construction algorithms. This is done in Section 2.

Section 3 reviews the literature related to the issues of noisy and incomplete training data as they relate to the performance criteria in the context of the design of fuzzy inference systems. In this section, emphasis is also placed on the major drawbacks of the existing neuro-fuzzy approaches for fuzzy system modeling and on the fact that these drawbacks emerge from the basic structure of Takagi-Sugeno type controllers and the minimization of data approximation error.

Furthermore, Section 4 outlines the learning procedure implemented in a data-driven and purely fuzzy learning methodology for Mamdani-type fully-linguistic controllers [20] and states the advantages of this methodology compared to the neuro-fuzzy methods and the reasons for these advantages. In this section a summary of the fuzzy learning algorithm and its design aspects is also provided.

Then, a typical non-linear function, which was considered in the literature to test existing neuro-fuzzy, clustering and other design approaches, is considered in Section 5. The objective is to use the performance criteria to test the design algorithms and give comparisons and preferences. Particular emphasis is placed on the comparison of the results obtained using the fuzzy learning methodology with those given by other learning approaches.

In Section 6, use is made of the defined performance criteria to compare the fuzzy learning methodology with a powerful neuro-fuzzy approach in the area of robot navigation. Conclusive comments related to the superiority of the fuzzy algorithm over neuro-fuzzy and other approaches are offered in Section 7.

II. PERFORMANCE CRITERIA
Practical performance criteria are defined in this section based on important aspects of intelligent human thinking, such as approximate reasoning (reflected as tolerance for imprecision) and generalization. These criteria are then made available to test, compare and determine preferences between data-driven fuzzy controller construction algorithms.
It is true that a fuzzy controller is a non-linear controller that
should approximate a non-linear function according to which a
human expert performs the control of some ill-defined, vague and
complex process. This function, however, is practically unknown
[8, 21-24]. Thus, expert input-output data are, in fact,
measurements of the control actions taken by a human expert in
response to process states while a control task is being performed.
Hence, the data are practically noisy versions of the expert’s
actual control actions, which obey the non-linear function
representing the process control.
Also, the data can be incomplete or not available in some
region(s) of the input space. This could be due to missing
measurements, resulting from the fact that the expert has not gone into situations into which the fuzzy system designer would like his system to be able to venture and still perform satisfactorily. After
all, an intelligent system should be one that, when trained in some
situation, behaves satisfactorily by generalization in a related
situation whose facts were not used in training. Such an aspect
does, in fact, characterize the nature of human intelligence.
Henceforth, when non-linear functions are considered to test
the performance of fuzzy inference systems resulting from the use
of data-driven design algorithms and establish preferences
between these algorithms, the following practical performance
criteria need to be adopted:
(a) the value of some error function in the approximation of the underlying noise-free data when the training data are noisy,
(b) the noise insensitivity,
(c) the generalization capability,
(d) the noise insensitivity and generalization capability combined,
(e) the representation of the shape and smoothness of the non-linear function.

Assessing the performance criteria in the presence of noise (points (a) and (b)) can be done as follows. The data points extracted from a non-linear function, or some of them, are to be modified to violate the function's analytical equation. The modified data are to be used in training, and the extracted data are considered the noise-free ones. The smaller the value of the error at the noise-free data and the larger the error at the noisy points, the better the performance of the fuzzy system modeling approach. We note here that both points (a) and (b) need to be considered, since, as a result of introducing noise, having the obtained fuzzy system not responding to or staying away from the noisy points does not necessarily imply that it will get closer to the noise-free data and thus to the real control curve.

As to generalization, data points in some boundary region of the input space, selected from among those extracted from a non-linear function, are to be eliminated. Then, the fuzzy system obtained by training using the remaining data is to be tested based on its ability to extrapolate to the region of missing data. This can be assessed by observing whether the system obtained by training on the remaining data is the same as or close to the one obtained using the whole data set. Also, if necessary, the error value at the excluded points and at the whole set of data can be considered in the assessment of generalization.

Point (e) consists of testing the capability of the fuzzy system design approach to achieve smooth control. Point (e) can also be considered to assess generalization taken in the sense of interpolation to points within the training data.

III. REVIEW OF LITERATURE RELATED TO NOISY AND INCOMPLETE TRAINING DATA

After raising the issues related to the performance testing and comparison of data-driven fuzzy controller design algorithms, and arguing that performance criteria accounting for noisy and incomplete training data need to be considered for practical reasons (Saade, [25]; Saade and Al-Khatib, [26]), research reports in which these important matters have been approached started to appear. Oh and Pedrycz [27] introduced an auto-tuning algorithm to identify T-S type fuzzy systems using a weighted performance index. The objective was to provide a balance between the approximation and generalization aspects of fuzzy models. Branco and Dente [28] pointed out ignored issues in fuzzy model design. They addressed the appearance of noise as a source of ambiguity to the fuzzy model, the fuzzy model generalization ability and the influence of the training set size on the learning performance. According to the authors, the improvement of the data approximation error by a few percentage points in a new algorithm could make the new model predictions (generalization) irrelevant.

Furthermore, Shi and Mizumoto [15] used fuzzy c-means clustering to preprocess the data, remove existing redundancies (noise) and extract typical data to be used for training in a neuro-fuzzy learning algorithm. Leski [29] recognized the intrinsic inconsistency of neuro-fuzzy modeling due to its zero tolerance to imprecision, while fuzzy modeling is based on the premise that human thinking is tolerant to imprecision.

Consequently, the studies reported in [15, 27-29] can be used to provide an additional validation of the issues raised in Section 2 and in [25, 26], and of the fact that the performance assessment of data-driven fuzzy system modeling algorithms needs to be done by accounting for noisy and incomplete training data. It can also be concluded from these studies that a fuzzy controller construction approach that is structured based on the minimization of data approximation error hinders the noise insensitivity and generalization capability of the resulting fuzzy model, and that it contradicts Zadeh's principle of "tolerance for imprecision" [30].

What could be added here as well is a remark about the fact that it is the T-S fuzzy system model that has triggered the appearance of the numerous neuro-fuzzy research reports in the area of fuzzy system modeling. In addition to having crisp values or linear combinations of the system input variables as rule consequents, an aspect that diminishes the fuzzy system's linguistic representation of human knowledge, T-S models supplied with neuro-fuzzy learning techniques also have many other undesirable aspects that were brought out in [5].

Henceforth, in the next section a summary of a data-driven, purely fuzzy learning methodology for Mamdani-type and fully-linguistic fuzzy controllers is provided. The learning algorithm enables the full linguistic representation of human knowledge and expertise and permits thinking tolerant to imprecision by not seeking error minimization. It rather seeks error reduction and requires that the learning stop once the error becomes less than or equal to some threshold value that can be set by the system designer. The algorithm will also be shown, as a result, to possess good noise insensitivity and generalization capability. It will in addition be compared to fuzzy clustering and partition approaches and, most importantly, to ANFIS [13] using a typical non-linear function and robot navigation.

IV. SUMMARY OF A DATA-DRIVEN FUZZY LEARNING ALGORITHM

The data-driven fuzzy learning algorithm for the design of intelligent controllers considers fully-linguistic fuzzy inference systems of Mamdani type. In these systems, the fuzzy output, for a given crisp input value or vector, is obtained using the compositional rule of inference (CRI) [30], and defuzzification is applied to this fuzzy output to convert it into a crisp one.
Defuzzification in this algorithm is accomplished by applying a parameterized strategy that was established in [31].

Consider a collection of N if-then fuzzy inference rules for a two-input, one-output fuzzy inference system. Let the j-th rule, 1 ≤ j ≤ N, be:

if x1 is Aj and x2 is Bj, then z is Cj.

x1, x2 and z are the input and output variables of the system. Also, Aj, Bj and Cj are fuzzy sets defined over x1, x2 and z respectively. The fuzzy output, corresponding to some crisp input pair (x1i, x2i), can be obtained using the CRI as follows:

C0i(z) = max_{1 ≤ j ≤ N} [Aj(x1i) ∧ Bj(x2i) ∧ Cj(z)].     (1)

There is a need to stress here Zadeh's rule of composition. Consider first the two variables u and v, which assume values respectively in spaces U and V. Let A and B be two fuzzy sets defined respectively over the spaces U and V. The fuzzy conditional statement expressed as "If u is A, then v is B" can be interpreted as a fuzzy relation R defined by the Cartesian product of A and B. That is, [If u is A, then v is B] ≡ R = A × B. The membership function of R, denoted R(u,v), can be obtained using some operation, called the "fuzzy implication operation", between A(u) and B(v), which are the membership functions of A and B respectively. The minimum operation was initially suggested by Zadeh [30]. Now, the fuzzy relation R, as above, induces from a fuzzy set A', defined over the space U, a fuzzy set B', over the space V, such that B' = A' ∘ R, where ∘ denotes relation composition. If the max-min composition is used [32], then the following is obtained:

B'(v) = max_{u ∈ U} [min(A'(u), R(u,v))].

When the fuzzy set A' is a singleton, that is, A' = u0 ∈ U, the above equation becomes B'(v) = R(u0, v). This is the case of interest in this study since the fuzzy controller is assumed, as is mostly the case, to be one that admits crisp inputs.

Returning now to our collection of N if-then fuzzy inference rules for a two-input, one-output fuzzy inference system, the system rules can be represented by the following fuzzy relation:

R = [(A1 ∩ B1) × C1] ∪ [(A2 ∩ B2) × C2] ∪ ... ∪ [(AN ∩ BN) × CN] = ∪_{j=1}^{N} [(Aj ∩ Bj) × Cj].

In this relation, the symbol ∪ is taken as a representation of the OR operator introduced between the rules. The symbol ∩ represents the operator AND used in the antecedent part of the rules. The fuzzy controller output that corresponds to a crisp input pair (x1i, x2i) can, therefore, be obtained by C0i(z) = R(x1i, x2i, z). If the minimum operation (∧) is adopted for the AND and THEN operators and the maximum (max) operation is used for OR, then Equation (1) above is obtained. Other operations, such as sum and product, can also be used. Also, Equation (1) can be generalized easily to systems with more than two input variables.

Now, the parameterized defuzzification method, developed in [31], applies to the normalized version of C0i(z), denoted C0in(z) and obtained by dividing the membership function of C0i(z) by the highest membership grade, as follows:

Fδ[C0in(z)] = ∫_0^1 [δ c1(α) + (1 − δ) c2(α)] dα     (2)

[c1(α), c2(α)] is the α-level set of C0in(z) and δ is a parameter that takes values in the interval [0,1]. The defuzzification method in Equation (2) was derived by first reformulating the classical criteria (minimax, maximax and the Hurwicz criterion), which apply for ranking intervals, using the intervals' characteristic functions. The reformulation of the noted criteria for intervals then permitted their natural extension to fuzzy sets by replacing the intervals' characteristic functions by the fuzzy sets' membership functions. The criteria, which were originally expressed using a defined distance measure and integration of the characteristic or membership functions along the real axis, were then expressed by an equivalent integration along the membership axis using the α-level set of a fuzzy set [33].

Furthermore, the study in [31] was one that formally related the ranking of fuzzy sets to the defuzzification of the outputs of fuzzy controllers. It unified the two problems to make the solution for the first applicable to the second. Also, the benefits that could be obtained if the defuzzification of the outputs of fuzzy controllers is approached from the point of view of ranking, and through the use of a parameterized defuzzification formula to satisfy design objectives (shaping the controller input-output characteristic), have been emphasized. It is in this spirit that the study in [31] unified the criteria introduced in [33] to be represented by a single parameterized ranking and defuzzification formula, which is the one expressed in Equation (2).

In the data-driven learning algorithm that is described below, and whose flow chart is shown in Figure 1, Equation (2) is used for tuning initial fuzzy controllers based on input-output data. This tuning considers the consistent modification of the parameter δ and of the rule consequents to reduce the value of some error function and obtain a final fuzzy system. It is assumed that the system designer is able to specify the input and output variables of the fuzzy controller and the ranges of these variables. Then overlapping membership functions are assigned to cover the entire ranges of the variables of concern. In terms of overlap, it was observed that the smoothness of the controller input-output surface is best served when the input membership functions assigned over a single variable are such that the sum of the membership grades at any crisp input is one.

Once the input membership functions are assigned, all combinations of input fuzzy sets are considered to form the antecedent parts of the rules. All the initial rule consequents are required to be equal to the leftmost of the fuzzy sets assigned over the output variable. This set needs to be formed by a flat and a decreasing part, or a decreasing part only. This makes the defuzzified value of any fuzzy output, obtained by Equation (1) and through the application of Equation (2) with δ=1, equal to the smallest value of the output range.
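To make Equations (1) and (2) concrete, the short Python sketch below implements the max-min CRI and the parameterized α-level defuzzification for a two-input Mamdani system. The membership-function shapes, universes, discretization and the example rule base are illustrative assumptions only, not the ones used in the case studies later in the paper.

import numpy as np

def tri(x, a, b, c):
    """Triangular/shoulder membership function with support [a, c] and peak at b."""
    x = np.asarray(x, dtype=float)
    left = np.where(b > a, (x - a) / (b - a + 1e-12), 1.0)
    right = np.where(c > b, (c - x) / (c - b + 1e-12), 1.0)
    return np.clip(np.minimum(left, right), 0.0, 1.0)

# Illustrative fuzzy sets: A1..A3 and B1..B3 over the inputs, C1..C4 over the output.
A = [lambda x: tri(x, 1, 1, 3), lambda x: tri(x, 1, 3, 5), lambda x: tri(x, 3, 5, 5)]
B = [lambda x: tri(x, 1, 1, 3), lambda x: tri(x, 1, 3, 5), lambda x: tri(x, 3, 5, 5)]
Z = np.linspace(0.5, 7.0, 261)                       # discretized output universe
C = [tri(Z, 0.5, 0.5, 2.7), tri(Z, 0.5, 2.7, 4.8),
     tri(Z, 2.7, 4.8, 7.0), tri(Z, 4.8, 7.0, 7.0)]

def cri_output(rules, x1, x2):
    """Fuzzy output C0i(z) of Equation (1): max over rules of min(Aj(x1), Bj(x2), Cj(z))."""
    out = np.zeros_like(Z)
    for (i, j), c in rules.items():                  # rule: if x1 is A[i] and x2 is B[j], then z is C[c]
        w = min(float(A[i](x1)), float(B[j](x2)))    # rule firing strength
        out = np.maximum(out, np.minimum(w, C[c]))
    return out

def defuzz_delta(c0, delta):
    """Parameterized defuzzification F_delta of Equation (2), integrated over alpha-levels."""
    if float(c0.max()) <= 0.0:
        return float(Z[0])                           # no rule fired; fall back to smallest output value
    mu = c0 / float(c0.max())                        # normalized fuzzy output C0in(z)
    alphas = np.linspace(1e-3, 1.0, 200)
    vals = []
    for a in alphas:
        cut = Z[mu >= a]                             # alpha-level set [c1(alpha), c2(alpha)]
        vals.append(delta * cut.min() + (1.0 - delta) * cut.max())
    return float(np.mean(vals))                      # approximates the integral over alpha in [0, 1]

# Example: a hypothetical three-rule base and one crisp input pair.
rules = {(0, 0): 3, (1, 1): 1, (2, 2): 0}
c0 = cri_output(rules, x1=2.0, x2=3.5)
print(defuzz_delta(c0, delta=1.0), defuzz_delta(c0, delta=0.0))

With all consequents set to the leftmost output set and δ=1, this defuzzification returns the smallest value of the output range, which is the starting condition exploited by the learning algorithm below.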
Figure 1. Flow-chart of the data-driven fuzzy controllers design algorithm described in Section 4.
Given the input-output data pairs in the form ( x i , z id ), with
i=1, 2, 3,…, n and x i = (x 1i , x 2i , x 3i ,..., x pi ) , where p is the
number of input variables, the learning process starts with an
initial fuzzy system as specified above. The algorithm (see Figure
1) computes the fuzzy outputs C 0i for all x i , i=1, 2, 3, …, n using
the CRI (Equation (1)) and then defuzzifies their normalized
versions, C 0in , using Equation (2) when δ =1. Here, all the
defuzzified values will be equal to the smallest value of the output
range. Hence, given that z id are all greater than or equal to this
value (this should always be the case), then F1[C0in(z)]≤zid. For
these defuzzified values, the error E is computed using some error
function and compared with a desired error value Ed. If E≤Ed,
then the learning stops. Otherwise, δ is decreased from 1 to 0 by
passing through discrete intermediate values. For each δ, the error
is computed and compared with Ed. Note here that the decrease in
δ results in an increase in the defuzzified values of the fuzzy
outputs. These values are then made closer to the desired outputs.
If the change in δ has led to the satisfaction of the error goal, that is, if E≤Ed has been achieved for some δ∈[0, 1], then the learning stops. Otherwise, the algorithm starts again from δ=1 but with new rules.
The new rules are obtained by raising each rule consequent by
one fuzzy set. This, however, might lead to a violation of the
inequality F1[C0in(z)]≤zid for some values of i. If so, the inequality
can be reestablished by repeatedly lowering the consequents of
the rules, which trigger one fuzzy output whose defuzzified value
for δ=1 is greater than its desired counterpart. Once all defuzzified
values become again smaller than or equal to the desired ones, δ
will be decreased from 1 to 0 and for each δ value the error is
computed and compared with Ed. This process is repeated until
either the error goal is satisfied or no more raise in the rules
consequents is possible or when the raise and lowering of the rules
consequents result in a system that has already been obtained.
When the learning stops, the algorithm delivers the final fuzzy
system with the least error value that can be obtained under the
described procedure, the error and the final δ value.
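The loop of Figure 1 can be sketched compactly as follows, reusing cri_output, defuzz_delta and the membership definitions from the earlier sketch. The error function, step size and the rule used for lowering offending consequents are simplifications for illustration, not the authors' exact procedure.

import itertools
import numpy as np

def train(data, n_out_sets, Ed, step=0.05):
    """Simplified delta-sweep / consequent-adjustment loop of the Figure 1 algorithm."""
    # initial system: every antecedent combination -> leftmost output set (index 0)
    rules = {(i, j): 0 for i, j in itertools.product(range(len(A)), range(len(B)))}
    best, seen = (np.inf, None, None), set()
    while True:
        key = tuple(sorted(rules.items()))
        if key in seen:                               # updated rules identical to a previous system
            break
        seen.add(key)
        outs = [cri_output(rules, x1, x2) for x1, x2, _ in data]
        for delta in np.arange(1.0, -1e-9, -step):    # decrease delta from 1 towards 0
            E = float(np.mean([(defuzz_delta(c0, delta) - zd) ** 2
                               for c0, (_, _, zd) in zip(outs, data)]))
            if E < best[0]:
                best = (E, dict(rules), delta)        # store the smallest error so far
            if E <= Ed:                               # error goal reached
                return best
        raised = {k: min(c + 1, n_out_sets - 1) for k, c in rules.items()}
        if raised == rules:                           # no further raise possible
            break
        rules = raised                                # raise each consequent by one output set
        changed = True                                # re-establish F1(C0i) <= z_i by lowering the
        while changed:                                # consequents of rules firing for an offending point
            changed = False
            for x1, x2, zd in data:
                if defuzz_delta(cri_output(rules, x1, x2), 1.0) > zd + 1e-9:
                    for (i, j) in rules:
                        if min(float(A[i](x1)), float(B[j](x2))) > 0 and rules[(i, j)] > 0:
                            rules[(i, j)] -= 1
                            changed = True
    return best

# e.g. data = [(1.4, 1.8, 3.7), (4.28, 4.96, 1.31), ...]  (points such as those in Table 1)
# error, final_rules, final_delta = train(data, n_out_sets=4, Ed=0.25)

Because the sweep only reduces δ, the defuzzified outputs can only move upwards from the smallest output value, which is why raising the consequents is the mechanism used when no δ satisfies the error goal.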
Figure 2. Input-output surface of the non-linear function given in Equation (3).

V. PERFORMANCE TESTING USING A TYPICAL NON-LINEAR FUNCTION

In this section, we consider the typical non-linear function given in Equation (3) below to test the performance (Section 2) of the algorithm described in Section 4 and to compare it with other methods. This function has been considered in research studies to test existing data-driven techniques.

z = f(x1, x2) = (1 + x1^(-2) + x2^(-1.5))^2,   1 ≤ x1, x2 ≤ 5.     (3)

This function, whose plot is shown in Figure 2, was first considered by Sugeno and Yasukawa [16] and then considered again by Delgado et al. [10] and by Lin et al. [7]. Fifty input-output data points, listed in [16], were extracted from Equation (3) and considered in the noted references. These points are also given in Table 1.

In [16], a fuzzy system was determined with 0.318 as a mean-square error (MSE) value. The error was then reduced to 0.01 using position gradient. In [10], which is also a study based on fuzzy clustering, the best-obtained MSE was 0.231. The use of fuzzy partition [7] gave a fuzzy system with 0.351 as an MSE value. This was then reduced to 0.005 by a fuzzy neural network.

Table 1. Fifty training data points extracted from the non-linear function in Equation (3).

#    x1    x2    z        #    x1    x2    z
1    1.4   1.8   3.7      26   2     2.06  2.52
2    4.28  4.96  1.31     27   2.71  4.13  1.58
3    1.18  4.29  3.35     28   1.78  1.11  4.71
4    1.96  1.9   2.7      29   3.61  2.27  1.87
5    1.85  1.43  3.52     30   2.24  3.74  1.79
6    3.66  1.6   2.46     31   1.81  3.18  2.2
7    3.64  2.14  1.95     32   4.85  4.66  1.3
8    4.51  1.52  2.51     33   3.41  3.88  1.48
9    3.77  1.45  2.7      34   1.38  2.55  3.14
10   4.84  4.32  1.33     35   2.46  2.12  2.22
11   1.05  2.55  4.63     36   2.66  4.42  1.56
12   4.51  1.37  2.8      37   4.44  4.71  1.32
13   1.84  4.43  1.97     38   3.11  1.06  4.08
14   1.67  2.81  2.47     39   4.47  3.66  1.42
15   2.03  1.88  2.66     40   1.35  1.76  3.91
16   3.62  1.95  2.08     41   1.24  1.41  5.05
17   1.67  2.23  2.75     42   2.81  1.35  1.97
18   3.38  3.7   1.51     43   1.92  4.25  1.92
19   2.83  1.77  2.4      44   4.61  2.68  1.63
20   1.48  4.44  2.44     45   3.04  4.97  1.44
21   3.37  2.13  1.99     46   4.82  3.8   1.39
22   2.84  1.24  3.42     47   2.58  1.97  2.29
23   1.19  1.53  4.99     48   4.14  4.76  1.33
24   4.1   1.71  2.27     49   4.35  3.9   1.4
25   1.65  1.38  3.94     50   2.22  1.35  3.39
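For reference, Equation (3) and the mean-square error used throughout this section can be written in a few lines of Python; the random sampling below only illustrates how Table 1 style data are generated and does not reproduce the exact 50 points of [16].

import numpy as np

def f(x1, x2):
    """Non-linear test function of Equation (3), defined for 1 <= x1, x2 <= 5."""
    return (1.0 + x1 ** -2.0 + x2 ** -1.5) ** 2

def mse(predictions, targets):
    """Mean-square error, the performance figure quoted for the compared methods."""
    predictions, targets = np.asarray(predictions, float), np.asarray(targets, float)
    return float(np.mean((predictions - targets) ** 2))

# Illustrative data generation in the spirit of Table 1.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 5.0, size=(50, 2))
z = np.round(f(x[:, 0], x[:, 1]), 2)
print(mse(f(x[:, 0], x[:, 1]), z))   # near zero: the targets come from f itself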
As for the algorithm described in this study, the same 50 data points were considered. Three membership functions were adopted over each of the input variables (Figures 3(a) and 3(b)). An error value of 0.216 was obtained with 4 output fuzzy sets as in Figure 3(c). The final obtained rules are listed below as Equation (4), and the final δ value is 1. The input-output surface is shown in Figure 4.

If x1 is A1 and x2 is B1, then z is C3
If x1 is A1 and x2 is B2, then z is C3
If x1 is A1 and x2 is B3, then z is C3
If x1 is A2 and x2 is B1, then z is C3
If x1 is A2 and x2 is B2, then z is C2
If x1 is A2 and x2 is B3, then z is C2
If x1 is A3 and x2 is B1, then z is C3
If x1 is A3 and x2 is B2, then z is C2
If x1 is A3 and x2 is B3, then z is C2     (4)

Figure 3. Membership functions assigned over the input and output variables of the non-linear function given in Equation (3): (a) INPUT x1 (A1-A3), (b) INPUT x2 (B1-B3), (c) OUTPUT z (C1-C4).

Figure 4. Input-output surface of the fuzzy system given in Equation (4) and representing the function given in Equation (3).

The non-linear function in Equation (3) is considered again to test the performance of the algorithm in the complete but noisy data case. Since the methods noted in [7,10,16] considered only complete and noise-free data and the resulting error values, the comparison will be done with ANFIS, which is powerful enough to bring out the relative strengths of the algorithm presented in Section 4. ANFIS, introduced by Jang [13], is available under MATLAB and is based on a combination of the least-squares and gradient-descent methods. For the comparison to be meaningful, the same number and type of input membership functions are considered in ANFIS. Also, because the ANFIS results depend on the number of epochs and, to a lesser degree, on the initial step-size, the presented results are the best ones we obtained after attempting different combinations of epoch and step-size values. The algorithm will also be examined for generalization and for noise insensitivity plus generalization; that is, in the cases of noise-free but incomplete data and of noisy and incomplete data.

Let us first consider the use of the same 50 data pairs (Table 1) in ANFIS. A fuzzy system with MSE equal to 0.0303 was obtained under 100 epochs and 0.00001 as initial step-size. The input-output surface is shown in Figure 5. Although this error value is smaller than the 0.216 obtained in our approach, the comparison of Figures 4 and 5 reveals that the described algorithm gives a better representation of the shape of the non-linear function.

Concerning complete and noisy data, 3 stages of modification of output values in the 50 points listed in Table 1 were performed. In each stage, 4 output values were modified to produce 4 input-output pairs, denoted as a set, not satisfying the function in Equation (3). First, set 1 was used in addition to the remaining 46 noise-free data pairs. In stage 2, sets 1 and 2 were used in addition to the remaining 42 noise-free data. In stage 3, sets 1, 2 and 3 were used in addition to 38 noiseless data. The fuzzy systems obtained by ANFIS had the surfaces shown in Figures 6(a), (b) and (c) respectively.

All the above-noted 3 cases were also used in the algorithm described in Section 4. The resulting fuzzy system was always as in Equation (4) and with δ=1. The input-output surface is just the one shown in Figure 4. The comparison of Figure 4 with Figures 6(a), (b) and (c) reveals that the presented approach has a better noise insensitivity than ANFIS. This result is also supported by the error values at the noisy points. In terms of the error obtained by considering the underlying 50 noise-free data (Table 1), ANFIS gave 0.2292, 0.2925 and 0.3217 respectively. Comparison of these error values with 0.216 (obtained using the presented algorithm), together with the noise insensitivity result and the comparison of Figure 4 with Figures 6(a), (b) and (c), shows that our approach has a performance preference over ANFIS when the learning is based on noisy data (see Section 2).
Figure 5. Input-output surface of the fuzzy system obtained from
ANFIS using the 50 data pairs in Table 1.
In terms of testing the generalization capability of the presented
approach, sets of data points from among those listed in Table 1
and located in specific boundary regions of the input space were
excluded in succession. The remaining data were entered into the
presented algorithm. First, data pairs such that 1 < x1 < 2.5 and 3.5 < x2 < 5 were eliminated. This resulted in the exclusion of 5 data points. Second, data points such that 1 < x1 < 3 and 3 < x2 < 5 (8
points) were excluded. In both cases, the final fuzzy system turned
out to be as in Equation (4) and δ=1. The error values were
respectively 0.2355 and 0.2468. In both cases, the error value
0.216 still holds for the original 50 points in Table 1.
The presented algorithm was also tested for its ability to
combat noise and generalize simultaneously. The data elimination
process described in the preceding paragraph was again
considered and noisy data from among the above-mentioned 12
points were introduced. The introduced noisy points were those
which survived elimination. Hence, 9 noisy points (set 1, 1 point
from set 2 and set 3) were used among the 45 data pairs, which
resulted from the first data exclusion. Also, 8 noisy points (set 1
and set 3) were used among the 42 data pairs, which resulted from
the second data elimination process. In both cases, the algorithm
returned the final fuzzy system expressed in Equation (4) with
δ=1. The input-output surface is therefore as shown in Figure 4.
The proposed fuzzy system modeling approach is therefore able
to combat noise and generalize simultaneously.
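The two tests just described (perturbing some of the targets and dropping the points falling in a boundary region of the input space) can be expressed in a few lines. This is only an illustration of the protocol of Section 2; the default region is the first exclusion case mentioned above, and the perturbation size is arbitrary.

import numpy as np

def make_test_sets(data, noisy_idx, noise_size=0.5,
                   region=lambda x1, x2: 1 < x1 < 2.5 and 3.5 < x2 < 5):
    """Build the noisy and the incomplete training sets used to assess criteria (a)-(d)."""
    noisy = [(x1, x2, z + noise_size if i in noisy_idx else z)
             for i, (x1, x2, z) in enumerate(data)]
    incomplete = [(x1, x2, z) for (x1, x2, z) in data if not region(x1, x2)]
    return noisy, incomplete

# e.g. noisy, incomplete = make_test_sets(table1, noisy_idx={0, 7, 19, 33})
# train on "noisy" or "incomplete" and compare the errors on the original noise-free points.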
Figure 6. Input-output surfaces of the fuzzy systems obtained from ANFIS using: (a) 46 noise-free data pairs and 4 noisy ones, (b) 42 noise-free data and 8 noisy points, and (c) 38 noise-free data and 12 noisy points.

VI. ROBOT NAVIGATION CASE
In dealing with the motion-planning problem of a mobile robot among existing obstacles, different classical approaches have been developed. Among these approaches are path-velocity decomposition [34,35], incremental planning [36], the relative velocity paradigm [37], and potential fields [38]. Soft-computing techniques, employing various learning methods, have also been used to improve the performance of conventional controllers [39,40]. Each of the above-noted methods is either computationally expensive or capable of solving only a particular type of problem, or both.
In order to reduce the computational burden and provide a
more natural solution for the dynamic motion-planning (DMP)
problem, fuzzy approaches, with emphasis on user-defined rules
and collision-free paths, have been suggested [41-43]. Recently, a
more advanced fuzzy-genetic-algorithm approach has been
devised [44]. The emphasis has been not only on obtaining collision-free paths, but also on the optimization of the travel time or path between the start and target points of the robot. Genetic algorithms have, therefore, been used to come up with an optimal or near-optimal fuzzy rule-base off-line by employing a number of user-defined scenarios. Although the noted fuzzy-genetic approach provided good testing results on scenarios, some of which were used in training and others not, it had its limitations: a different set of rules needed to be determined for every specific number of moving obstacles.

The approach presented in this section considers the off-line derivation of a general fuzzy rule-base; that is, a base that can be used on-line by the robot independently of the number of moving obstacles [45]. This is achieved using the data-driven learning algorithm presented in Section 4 and by devising a method for the derivation of the training data based on the general setting of the DMP problem and not on specific scenarios. Collision-free paths and reduction of travel time are still within the goals considered in the derivation of the fuzzy logic controller (FLC). Furthermore, the noise insensitivity and generalization capability of the FLC construction algorithm are emphasized again here and tested in this practical control case. Comparison of the results with those obtained by the fuzzy-genetic approach and by ANFIS is also done.

The robot needs to move from a start point S to a target point G located in some quadrant where the moving obstacles exist. The purpose is to find an obstacle-free path which takes the robot from S to G with minimum time. A fuzzy controller represented by a set of fuzzy inference rules is to be constructed to achieve this objective.

The robot moves incrementally from one point to another in accordance with time steps, each of duration ΔT, and at the end of each step it needs to decide on the movement direction. Due to the problem objective, once the robot is at some point it needs to consider moving in a straight line towards the target point unless the information collected about the moving obstacles tells otherwise due to a possible collision. Hence, the information that needs to be obtained has to relate, in principle, to the position of each obstacle and its velocity relative to the robot position; i.e., the obstacle velocity vector. But, since the robot knows the position of each obstacle at every time step, an alternative to the use of the relative velocity can be the present and predicted positions of each obstacle. The predicted position can be computed based on the obstacle's present and previous positions: Ppredicted is assumed to be the linearly extrapolated position of each obstacle from its present position Ppresent along the line formed by joining Ppresent and Pprevious. Thus,

Ppredicted = Ppresent + (Ppresent − Pprevious).

But, to process all this information by the robot controller is difficult. The procedure that can be applied here, and which leads to a simplification of the controller structure, consists of using the collected information to determine the "nearest obstacle forward" (NOF) to the robot [44]. Then, only the information related to this obstacle is used by the FLC to provide decisions. The NOF is the obstacle located in front of the robot and with velocity vector pointing towards the line joining the robot position to the target point. In this way it constitutes the most probable collision danger relative to other obstacles if the robot chooses to move straight to the target (Figure 7). The NOF can equivalently be identified using the present and predicted positions of each obstacle.

Therefore, what needs to be used are the present and predicted positions of the NOF. The position has two components: angle and distance. The angle is the one between the line joining the target G to the robot position, denoted by R in Figure 7, and the line between the robot and the NOF. The distance is the one between the robot and the NOF. The FLC output is the deviation angle between the target-robot line and the new direction of robot movement, denoted by line RD (see also Figure 7). Based on the noted information the robot will be able to know whether the NOF will get close to or cross the line segment joining the present position of the robot and the point it reaches after ΔT time if it moves straight to the target. This knowledge is in fact necessary for the determination of the angle of deviation.

But, including all these variables in the conditions of the FLC would complicate its structure. It would also make the derivation of the input-output data points needed for the construction of the inference rules a difficult task. To make things simpler while maintaining the practicality of the problem, a constraint (constraint 4 below), which is not too restrictive, is considered in addition to other ones implied by the aforementioned problem description and adopted in [44].

1. The robot is considered to be a single point.
2. Each obstacle is represented by its bounding circle.
3. The speed of each obstacle is constant, with a fixed direction between its previous, present and predicted positions.
4. The distance traveled by the NOF in ΔT time is comparable to its diameter.

Of course, constraint 3 presupposes that the obstacles do not collide while moving. Also, constraint 4, with the problem configuration as depicted in Figure 8 and its use in the determination of the input-output data (see below), will reduce the number of FLC input variables to 2: predicted angle and distance. The present position of the NOF is still accounted for but not used explicitly in the controller conditions.

Figure 8 considers a quadrant filled by side-to-side obstacles, each of which may constitute the predicted position of the NOF. Suppose that the robot is in position R (present position) and the NOF predicted position is in (A22, B21). The robot's initial intention is to move straight to G (no deviation) if collision is deemed impossible. Otherwise, an angle of deviation needs to be determined. Due to constraint 4, the present position of the NOF could be any of the neighboring obstacles such that the distance between the center of each of these obstacles and the center of (A22, B21) is approximately equal to the obstacle diameter. For the purpose of explaining how the deviation angle is to be determined for every possible pair of predicted angle and distance of the NOF, a rough representation of 8 critical neighboring obstacles is considered. These are: (A21, B20), (A22, B20), (A23, B20), (A21, B21), (A23, B21), (A21, B22), (A22, B22) and (A23, B22). If the segment between the present position of the robot and the point it reaches after ΔT time, if it moves straight to G, penetrates the square formed by the outer tangent lines to the noted 8 obstacles, a deviation from the straight line between the robot and target point is required. Otherwise, no deviation is necessary. The amount of deviation is to be specified based on having the robot move in a direction that is just sufficient to avoid hitting not only the predicted obstacle position, but also any possible present position of the NOF; i.e., any of the above-described neighboring obstacles, which are guaranteed to reside inside the previously-noted square. Among the two movement directions RD1 and RD2, which lead to the avoidance of the obstacle positions, the one with the smaller deviation angle, i.e., RD1, is chosen. This serves the travel-path reduction objective.
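A minimal sketch of the quantities just defined (predicted position, NOF selection, and the angle and distance fed to the FLC) is given below. The NOF selection rule here is a simplification of the description above, the sign convention for the angle is arbitrary, and all function names are illustrative.

import numpy as np

def predicted(p_present, p_previous):
    """Linear extrapolation of an obstacle position: P_pred = P_present + (P_present - P_previous)."""
    p_present, p_previous = np.asarray(p_present, float), np.asarray(p_previous, float)
    return 2.0 * p_present - p_previous

def angle_and_distance(robot, goal, obstacle):
    """FLC inputs: angle (degrees) between the robot-goal line and the robot-obstacle line,
    and the robot-obstacle distance."""
    robot, goal, obstacle = (np.asarray(v, float) for v in (robot, goal, obstacle))
    g, o = goal - robot, obstacle - robot
    ang = np.degrees(np.arctan2(o[1], o[0]) - np.arctan2(g[1], g[0]))
    ang = (ang + 180.0) % 360.0 - 180.0          # wrap to [-180, 180)
    return float(ang), float(np.linalg.norm(o))

def nearest_obstacle_forward(robot, goal, present, previous):
    """Simplified NOF choice: among obstacles whose predicted position lies ahead of the robot
    (within +/-90 degrees of the goal direction), take the closest one."""
    best = None
    for p_now, p_prev in zip(present, previous):
        p_pred = predicted(p_now, p_prev)
        ang, dist = angle_and_distance(robot, goal, p_pred)
        if abs(ang) <= 90 and (best is None or dist < best[1]):
            best = (ang, dist, p_pred)
    return best   # (predicted angle, predicted distance, predicted position) or None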
Figure 7. Illustration of the NOF, angle (GRO3), distance (RO3) and deviation (GRD).

Figure 8. A general configuration of the DMP problem used in the data derivation.
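The collision test described above (penetration of the square formed around the predicted NOF) can be approximated as follows; the 1.5-diameter margin and the point-to-segment test are simplifications of the exact construction of Figure 8.

import numpy as np

def needs_deviation(robot, goal, nof_predicted, step_length, diameter):
    """Approximate check: does the straight-ahead move of length step_length pass within
    1.5 obstacle diameters of the predicted NOF centre?"""
    robot, goal, c = (np.asarray(v, float) for v in (robot, goal, nof_predicted))
    d = goal - robot
    d = d / (np.linalg.norm(d) + 1e-12)
    end = robot + step_length * d
    seg = end - robot
    # distance from the predicted NOF centre to the segment robot-end
    t = np.clip(np.dot(c - robot, seg) / (np.dot(seg, seg) + 1e-12), 0.0, 1.0)
    closest = robot + t * seg
    return bool(np.linalg.norm(c - closest) <= 1.5 * diameter)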
Ang   Dist  Dev  |  Ang   Dist  Dev  |  Ang   Dist  Dev  |  Ang   Dist  Dev
-90   0.3    33  |   90   1.8     0  |   45   1.2     0  |   22   1      -33
-90   0.5    15  |   90   2       0  |   45   1.4     0  |   22   1.2    -25
-90   0.7     0  |   90   4       0  |   45   1.8     0  |   22   1.4    -13
-90   1       0  |   90   5       0  |   45   2       0  |   22   1.8      0
-90   1.2     0  |   90   15      0  |   45   2.4     0  |   22   2        0
-90   1.4     0  |  -45   0.3    90  |   45   20      0  |   22   3        0
-90   1.5     0  |  -45   0.5    90  |  -22   0.3    90  |   22   10       0
-90   1.8     0  |  -45   0.7    62  |  -22   0.5    90  |   22   20       0
-90   2       0  |  -45   1      15  |  -22   0.7    90  |    0   0.3     90
-90   4       0  |  -45   1.2     0  |  -22   1      33  |    0   0.5     90
-90   5       0  |  -45   1.4     0  |  -22   1.2    25  |    0   0.7     90
-90   15      0  |  -45   1.8     0  |  -22   1.4    13  |    0   1       57
-90   24      0  |  -45   2       0  |  -22   1.8     0  |    0   1.2     45
 90   0.3   -33  |  -45   2.4     0  |  -22   3       0  |    0   1.4     37
 90   0.5   -15  |  -45   5       0  |  -22   10      0  |    0   1.8     25
 90   0.7     0  |  -45   20      0  |  -22   20      0  |    0   2.4     15
 90   1       0  |   45   0.3   -90  |  -22   24      0  |    0   3        0
 90   1.2     0  |   45   0.5   -90  |   22   0.3   -90  |    0   4        0
 90   1.4     0  |   45   0.7   -62  |   22   0.5   -90  |    0   10       0
 90   1.5     0  |   45   1     -15  |   22   0.7   -90  |    0   20       0

Table 2. Input-output data pairs (Ang in degrees, Dist in meters, Dev in degrees) obtained using the method described in Section 6.
Distance \ Angle    A1   A2   A3   A4   A5   A6   A7
D1                  V7   V9   V9   V9   V1   V1   V4
D2                  V6   V9   V9   V9   V1   V1   V4
D3                  V5   V8   V9   V9   V1   V2   V5
D4                  V5   V6   V7   V8   V4   V4   V5
D5                  V5   V5   V6   V7   V4   V5   V5
D6                  V5   V5   V6   V7   V4   V5   V5
D7                  V5   V5   V5   V6   V5   V5   V5
D8                  V5   V5   V5   V5   V5   V5   V5

Table 3. Final fuzzy system obtained by learning. Rows are the distance sets D1-D8, columns are the angle sets A1-A7, and the entries are the deviation fuzzy sets V1-V9 of Figure 9.
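Read as a rule base, Table 3 is simply a lookup from an antecedent pair to a deviation fuzzy set. A direct transcription (for clarity only; fuzzification of the crisp inputs and defuzzification of the output, as in Section 4, are still needed on-line):

# Table 3 as a plain lookup structure: consequent deviation set per (Distance set, Angle set).
TABLE3 = {
    "D1": "V7 V9 V9 V9 V1 V1 V4".split(),
    "D2": "V6 V9 V9 V9 V1 V1 V4".split(),
    "D3": "V5 V8 V9 V9 V1 V2 V5".split(),
    "D4": "V5 V6 V7 V8 V4 V4 V5".split(),
    "D5": "V5 V5 V6 V7 V4 V5 V5".split(),
    "D6": "V5 V5 V6 V7 V4 V5 V5".split(),
    "D7": "V5 V5 V5 V6 V5 V5 V5".split(),
    "D8": "V5 V5 V5 V5 V5 V5 V5".split(),
}
ANGLE_SETS = ["A1", "A2", "A3", "A4", "A5", "A6", "A7"]

def consequent(dist_set, angle_set):
    """Deviation fuzzy set fired by the rule 'if Angle is angle_set and Distance is dist_set'."""
    return TABLE3[dist_set][ANGLE_SETS.index(angle_set)]

print(consequent("D3", "A2"))   # -> 'V8'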
Now, based on the problem configuration in Figure 8 and the described general approach for the determination of the necessary deviation for every possible pair of predicted distance and angle of the NOF, various locations of the NOF within the noted quadrant were considered and accordingly input-output data were derived. The derived data points are shown in Table 2. They were obtained based on an obstacle diameter equal to 0.5 meters and a robot traveled distance in ΔT time equal to 2 meters.

The data points in Table 2 were used in the fuzzy learning algorithm of Section 4, and a set of inference rules (Table 3) was obtained using the input and output membership functions (MFs) shown in Figure 9. The ranges of the distance, angle and deviation are taken as 0 to 25 meters, -90 to 90 degrees and -90 to 90 degrees respectively. These ranges are considered to account for all possible values. The control surface of the FLC, whose rules are shown in Table 3, is given in Figure 10; the root mean square error is 4.679 and the δ parameter value is 0.5.

Figure 9. Input and output membership functions used in learning: (a) ANGLE (A1-A7), (b) DISTANCE (D1-D8), (c) DEVIATION (V1-V9).
Figure 10. Control surface of the FLC constructed using the
algorithm in Section 4 and the data in Table 2. The angle and
deviation are in degrees and the distance is in meters.
The obtained FLC was tested on various scenarios containing different numbers of obstacles. The cases of 3 obstacles, 5 obstacles, and 2 cases of 8 obstacles are considered, and the simulation results are presented in Figure 11. In all the cases, the robot travels from point S to point G without hitting any of the obstacles. Also, the traveled paths are optimal in the sense that the deviations which took place at the end of every time step are in most cases just as necessary in order for the robot to remain as close as possible to the robot-destination direct path while not colliding with the obstacles. Moreover, two of these scenarios (Figures 11(a) and 11(d)) were presented in [44] and had obstacles with distinct diameters: some had diameters close to the one considered in this study, and others had larger diameters. Despite this, the robot path chosen by the constructed FLC does not hit any of the obstacles. This shows that the constructed FLC can work properly for obstacles whose diameter values differ from the one used in the data derivation. Of course, a significant increase in the diameters would make the chances of the robot hitting the obstacles higher.

Table 4 shows the distance ratio (traveled distance / direct distance) using the presented data-driven fuzzy approach and the fuzzy-genetic one. The ratio in the fuzzy methodology for the case of 3 obstacles is a bit higher than that obtained in [44]. Thus, given that the robot speed in moving from one point to another in accordance with the previously-noted incremental time steps is the same in both approaches, a slightly higher time duration is required in our approach for the robot to reach the destination. This result, however, is quite acceptable given the fact that the presented methodology is general in the sense that it can be applied independently of the number of moving obstacles.

The data points in Table 2 are also used to construct a Takagi-Sugeno type fuzzy system by applying ANFIS [13] and to compare the results with those obtained above using the algorithm presented in Section 4. Although the data in Table 2 were determined using measuring instruments, and are thus noisy, care has been taken to make these data points as accurate as possible. Due to this, and to the fact that the actual values of the data are unknown, we consider the data in Table 2 as approximately noise-free. To be more certain of the existence of noise in the data, arbitrary small modifications will be introduced later to some angles of deviation to make the corresponding data pairs noisy. Under these circumstances, a comparison between the results of the algorithm in Section 4 and ANFIS will again be done. Testing the algorithm for generalization and for noise insensitivity plus generalization will also be investigated.
The FLC obtained by ANFIS using the data in Table 2, with 8 triangular membership functions over each input variable, has the input-output surface shown in Figure 12 and an RMSE equal to 5.438. The surface in Figure 10 is much more consistent than that in Figure 12 with our expectations of the control surface as configured using the data in Table 2. This is especially true in the range of small distances, where most of the robot deviations are necessary.
Figure 11. Paths traveled by the robot in 4 scenarios using the FLC in Table 3: (a) 3 obstacles, (b) 5 obstacles, and (c) and (d) 8 obstacles each.

Obstacles   Direct distance (m)   Traveled distance (m)   Ratio (our approach)   Ratio (genetic)
3           14                    15                      1.07                   1.046
5           14                    15.8                    1.286                  -
8           20                    23                      1.15                   -
8           20                    20.9                    1.045                  1.05

Table 4. Traveled distances and distance ratios for the presented approach and the fuzzy-genetic one.

Figure 12. Control surface of the FLC constructed by ANFIS using the data in Table 2. The angle and deviation are in degrees and the distance is in meters.
Figure 13. Paths traveled by the robot in 4 scenarios using the FLC obtained by ANFIS: (a) 3 obstacles, (b) 5 obstacles, and (c) and (d) 8 obstacles each.

Ang   Dist  Dev  |  Ang   Dist  Dev  |  Ang   Dist  Dev  |  Ang   Dist  Dev
-90   0.3    33  |   90   1.8     0  |   45   1.2     0  |   22   1      -31
-90   0.5    15  |   90   2       0  |   45   1.4     2  |   22   1.2    -22
-90   0.7     0  |   90   4       0  |   45   1.8     0  |   22   1.4    -13
-90   1       0  |   90   5       0  |   45   2       3  |   22   1.8      0
-90   1.2     0  |   90   15      0  |   45   2.4     0  |   22   2        1
-90   1.4     0  |  -45   0.3    86  |   45   20      2  |   22   3        0
-90   1.5     0  |  -45   0.5    87  |  -22   0.3    90  |   22   10       2
-90   1.8     0  |  -45   0.7    58  |  -22   0.5    87  |   22   20       0
-90   2       0  |  -45   1      13  |  -22   0.7    86  |    0   0.3     87
-90   4       0  |  -45   1.2    -2  |  -22   1      33  |    0   0.5     90
-90   5       0  |  -45   1.4     0  |  -22   1.2    21  |    0   0.7     88
-90   15      0  |  -45   1.8    -2  |  -22   1.4    13  |    0   1       59
-90   24      0  |  -45   2       0  |  -22   1.8     0  |    0   1.2     45
 90   0.3   -33  |  -45   2.4    -3  |  -22   3      -3  |    0   1.4     35
 90   0.5   -15  |  -45   5       0  |  -22   10      0  |    0   1.8     24
 90   0.7     0  |  -45   20      0  |  -22   20     -2  |    0   2.4     14
 90   1       0  |   45   0.3   -86  |  -22   24      0  |    0   3        0
 90   1.2     0  |   45   0.5   -86  |   22   0.3   -87  |    0   4        2
 90   1.4     0  |   45   0.7   -58  |   22   0.5   -90  |    0   10      -1
 90   1.5     0  |   45   1     -13  |   22   0.7   -87  |    0   20       0

Table 5. Input-output data pairs obtained by modifying the deviation in 33 data pairs of Table 2.
Figure 14. Control surface of the FLC constructed by ANFIS using the data in Table 5.
The angle and deviation are in degrees and the distance is in meters.
Figure 15. Paths traveled by the robot in 4 scenarios using the FLC obtained by ANFIS with the data in Table 5: (a) 3 obstacles, (b) 5 obstacles, and (c) and (d) 8 obstacles each.
The testing of the ANFIS-obtained fuzzy system on the 4 scenarios in Figure 11 gave the robot trajectories shown in Figure 13. The lengths of the ANFIS-obtained trajectories were respectively 20.2, 17.3, 23.4 and 25.9 meters for cases (a), (b), (c) and (d) in Figure 13. These are therefore larger than the path lengths in Figure 11, which were obtained using the fuzzy algorithm (Table 4). Also, ANFIS gave two collision cases: one in (c) and another in (d).

Now, we consider the data points in Table 5. These were obtained from Table 2 by randomly introducing modifications of between 1 and 4 degrees to 33 of the deviation angles, in such a manner as to adequately cover the input space. The use of the data in Table 5 in the algorithm described in Section 4 gave the same FLC as the one in Table 3 with the same parameter value. Thus, the control surface is the one shown in Figure 10, and the robot paths shown in Figure 11 remain unchanged. The ANFIS-obtained system, however, turned out to be different from the one obtained with the use of the original data (Table 2). It had the surface shown in Figure 14. Figure 15 shows the testing of the ANFIS-given system on the 4 scenarios in Figure 11. The robot paths shown in Figure 15 have the following respective lengths for cases (b), (c) and (d): 23, 26.1 and 26 meters. In case (a), the robot is not able to reach the destination, and in case (d) one hit occurs. ANFIS therefore gave longer trajectories here than those in Figure 13 and in Figure 11.

Furthermore, the fuzzy controller construction algorithm was tested for generalization by eliminating 8 data points from those listed in Table 2. The eliminated points were those with input pairs located in the input region 0° < Angle ≤ 90° and 2 meters < Distance ≤ 25 meters. The learning was therefore based on the remaining 72 data points. The fuzzy system whose rules are given in Table 3 was again obtained with the same parameter value. The control surface is, thus, as given in Figure 10. In addition, the same 8 points were eliminated from the data listed in Table 5. The learning based on the remaining 72 data points also gave the fuzzy system in Table 3 with the same parameter value.

The presented results validate the fact that the algorithm is capable of combating noise, which is usually present in the data, of generalizing to regions of missing data, and of combating noise and generalizing simultaneously. The comparison of Figures 12 and 14, and also of Figures 13 and 15, shows that the fuzzy systems constructed by ANFIS are noise sensitive. They are also sensitive to data exclusion and do not present a good generalization capability. This was concluded by looking at the ANFIS-obtained surfaces using the above-noted 72 data points remaining after the exclusion of the mentioned 8 points from Tables 2 and 5. These surfaces, in fact, turned out to be different from and no better than the ones presented in Figures 12 and 14, and the testing on the 4 scenarios in Figure 11 did not lead to any improvement over the results obtained in Figures 13 and 15.
VII. SUMMARY AND CONCLUSIONS

In this study, performance criteria, which need to be considered to test and compare data-driven fuzzy system modeling algorithms using non-linear control functions, have been defined based on a practical perspective and on important aspects of intelligent human thinking represented by approximate reasoning and generalization. Hence, priority has been given to the criteria accounting for noisy and incomplete training data. Despite the extreme importance of these performance criteria, they have not been spelled out, discussed, validated or tested elsewhere as is done in this study. The following summarized results and conclusions are to be noted.

When the data pairs are noise-free and complete, the proposed algorithm provided lower error values than the methods based on clustering and fuzzy partitions. On the other hand, the fuzzy-partition neural-network method, the clustering-position-gradient method and ANFIS gave lower error values than the presented algorithm. In the example given in Section 5 and in the robot navigation case, however, ANFIS gave an inferior function-shape representation.

Furthermore, when the data are noisy, which is the practical case (Section 2), it has been shown in the non-linear function case and in robot navigation that the proposed fuzzy learning algorithm (Section 4) has a performance preference over ANFIS. The proposed algorithm is also capable of generalization: it extrapolates very reliably to regions of missing data. It is also able to combat noise and generalize simultaneously. In addition, the algorithm provides readable fuzzy controllers, that is, controllers which can be interpreted linguistically in a simple manner. The T-S type fuzzy models, as was explained in Section 3, cannot furnish this linguistic aspect.

The fuzzy learning algorithm of Section 4 has also been shown capable of providing fuzzy if-then rules that can be represented in the form of a rule table; it has only one parameter to be tuned, and the setting of the initial rules can easily be done. The algorithm also permits thinking tolerant to imprecision. This can be concluded from the learning procedure, which is based on error reduction rather than error minimization and on a customized setting of the error threshold to reflect the level of precision needed in a particular situation. It can also be concluded from the resulting noise insensitivity and generalization capability of the algorithm. It is worth noting here as well that the learning procedure in the introduced algorithm is independent of the form of the error function and also of the shape of the fuzzy system membership functions.

Further strengthening of the drawn conclusions should later come from the application of the algorithm and the other design methods to more non-linear functions and practical control cases while accounting for the criteria defined in Section 2. This can be done by programming the referenced methods, especially those that have recently accounted for the issues of noise and generalization, and testing them on different types of data.
VII. SUMMARY AND CONCLUSIONS
In this study, performance criteria, which need to be
considered to test and compare data-driven fuzzy system
modeling algorithms using non-linear control functions, have
been defined based on a practical perspective and on important
aspects of intelligent human thinking represented by approximate
reasoning and generalization. Hence, priority has been given to
the criteria accounting for noisy and incomplete training data.
Despite the extreme importance of these performance criteria,
Merging data sources based on semantics,
contexts and trust
Lovro Šubelj, David Jelenc, Eva Zupančič, Dejan Lavbič, Denis Trček, Marjan Krisper and
Marko Bajec
Abstract—Matching and merging of data from heterogeneous sources is a common need in various scenarios. Despite
numerous algorithms proposed in the recent literature, there is a lack of general and complete solutions combining different
dimensions arising during the matching and merging execution. We propose a general framework, and accompanying algorithms,
that allow joint control over various dimensions of matching and merging. To achieve superior performance, standard (relational)
data representation is enriched with semantics and thus elevated towards the real world situation. Data sources are merged
using collective entity resolution and redundancy elimination algorithms that are managed through the use of different contexts –
user, data and also trust contexts. Introduction of trust allows for an adequate trust management and efficient security assurance
which is, besides a general solution for matching and merging, the main novelty of the proposition.
Index Terms—merging data, semantic elevation, context, trust management, entity resolution, redundancy elimination.
1 INTRODUCTION
With the recent advent of the Semantic Web and
open (on-line) data sources, merging of data
from heterogeneous sources is rapidly becoming a
common need in various fields. Different scenarios of
use include analyzing heterogeneous datasets collectively, enriching data with some on-line data source
or reducing redundancy among datasets by merging
them into one. Literature provides several state-of-the-art approaches for matching and merging, although
there is a lack of general solutions combining different
dimensions arising during the matching and merging
execution. We propose a general and complete solution that allows a joint control over these dimensions.
Data sources commonly include not only relational
data, but also semantically enriched data. Thus a
state-of-the-art solution should employ semantically
elevated algorithms, to fully exploit the data at hand.
However, due to the vast diversity of data sources, an adequate data architecture also has to be employed. In
particular, the architecture should support all types
and formats of data, and provide appropriate data
for each algorithm. As algorithms favor different representations and levels of semantics behind the data,
architecture should be structured appropriately.
Due to different origin of (heterogeneous) data
sources, the trustworthiness (or accuracy) of their
data can often be questionable. Especially when many
such datasets are merged, the results are likely to
be inexact. A common approach for dealing with
data sources that provide untrustworthy or conflicting
statements, is the use of trust management systems and techniques. Thus matching and merging should be advanced to a trust-aware level, to jointly optimize trustworthiness of data and accuracy of matching or merging. Such collective optimization can significantly improve over other approaches.
• L. Šubelj, D. Jelenc, E. Zupančič, D. Lavbič, D. Trček, M. Krisper and M. Bajec are with the University of Ljubljana, Faculty of Computer and Information Science.
The article proposes a general framework for
matching and merging execution. An adequate data
architecture enables either pure relational data, in the
form of networks, or semantically enriched data, in
the form of ontologies. Different datasets are merged
using collective entity resolution and redundancy
elimination algorithms, enhanced with trust management techniques. Algorithms are managed through
the use of different contexts that characterize each
particular execution, and can be used to jointly control
various dimensions of variability of matching and
merging execution.
The rest of the article is structured as follows. The
following section gives a brief overview of the related
work, focusing mainly on trust-aware matching and
merging. Next, section 3, presents employed data
architecture and discusses semantic elevation of the
proposition. Section 4 formalizes the notion of trust
and introduces the proposed trust management techniques. General framework, and accompanying algorithms, for matching and merging are presented in
section 5, and further discussed in section 6. Section 7
concludes the article.
2 RELATED WORK
Recent literature proposes several state-of-the-art solutions for matching and merging data sources. Relevant work and approaches exist in the field of data
integration [1], [2], [3], [4], data deduplication [5],
[6], [7], information retrieval, schema and ontology
matching [8], [9], [10], [11], and (relational) entity
resolution [1], [12], [13]. However, the propositions
mainly address only selected issues of the more general matching and merging problem. In particular,
approaches only partially support the variability of
the execution; commonly only homogeneous sources,
with predefined level of semantics, are employed; or
the approaches discard the trustworthiness of data
and sources of origin.
Literature also provides various trust-based, or
trust-aware, approaches for matching and merging [14], [15]. Although they formally exploit trust in
the data, they do not represent a general or complete
solution. Mainly, they explore the idea of Web of Trust,
to model trust or belief in different entities. Related
work (on Web of Trust) exists in the fields of identity
verification [16], information retrieval [17], [18], social
network analysis [19], [20], data mining and pattern
recognition [21], [22]. Our work also relates to more
general research of trust management and techniques
that provide formal means for computing with trust
(e.g. [23]).
3 DATA ARCHITECTURE
An adequate data architecture is of vital importance for efficient matching and merging. Key issues arising are as follows: (1) the architecture should allow for data from heterogeneous sources, commonly in various formats; (2) the semantical component of data should be addressed properly; and (3) the architecture should also deal with (partially) missing and uncertain data.
To achieve superior performance, we propose a three level architecture (Fig. 3). Standard relational data representation on the bottom level (data level) is enriched with semantics (semantic level) and thus elevated towards the topmost real world level (abstract level). Datasets on data level are represented with networks, while the semantics are employed through the use of ontologies.
Every dataset is (preferably) represented on data and semantic level. Although both describe the same set of entities on abstract level, the representation on each level is independent from the other. This separation resides from the fact that different algorithms of the matching and merging execution privilege different representations of data – either pure relational or semantically elevated representation. Separation thus results in more accurate and efficient matching and merging; moreover, the representations can complement each other in order to boost the performance.
The following section gives a brief introduction to networks, used for data level representation. Section 3.2 describes ontologies and semantic elevation of data level (i.e. semantic level). Proposed data architecture is formalized and further discussed in section 3.3.
3.1 Representation with networks
The most natural representation of any relational domain is a network. Networks are based upon mathematical objects called graphs. Informally speaking, a graph consists of a collection of points, called vertices, and links between these points, called edges (Fig. 1). Let VN, EN be the sets of vertices and edges for some graph N respectively. We define N as N = (VN, EN) where
VN = {v1, v2 ... vn}, (1)
EN ⊆ {{vi, vj} | vi, vj ∈ VN ∧ i < j}. (2)
Edges are sets of vertices, hence they are not directed (undirected graph). In the case of directed graphs equation (2) rewrites to
EN ⊆ {(vi, vj) | vi, vj ∈ VN ∧ i ≠ j}, (3)
where (vi, vj) is an edge from vi to vj. The definition can be further generalized by allowing multiple edges between two vertices and loops (edges that connect vertices with themselves). Such graphs are called multigraphs (Fig. 1 (b)).
Fig. 1. (a) directed graph; (b) labeled undirected multigraph (labels are represented graphically); (c) network representing a group of related traffic accidents (round vertices correspond to participants and cornered ones to vehicles).
In practical applications we commonly strive to store some additional information along with the vertices and edges. Formally, we define labels or weights for each vertex and edge in the graph – they represent a set of properties that can also be described using two attribute functions
AVN : VN → Σ1^VN × Σ2^VN × ..., (4)
AEN : EN → Σ1^EN × Σ2^EN × ..., (5)
with AN = (AVN, AEN), where Σi^VN, Σi^EN are the sets of all possible vertex and edge attribute values respectively.
Networks are most commonly seen as labeled, or weighted, multigraphs with both directed and undirected edges (Fig. 1 (c)). Vertices of a network represent some entities, and edges represent relations between them. A (relational) dataset, represented with a network on the data level, is thus defined as (N, AN).
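To make these definitions concrete, the following minimal Python sketch (our own illustration, not part of the proposition) represents a labeled undirected multigraph (N, AN); plain attribute dictionaries stand in for the attribute functions AVN and AEN:

from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, List

@dataclass
class Network:
    """A labeled undirected multigraph (N, A_N): vertices, multi-edges and their attributes."""
    vertex_attrs: Dict[str, Dict[str, Any]] = field(default_factory=dict)   # A_VN
    edges: List[FrozenSet[str]] = field(default_factory=list)               # E_N, parallel edges allowed
    edge_attrs: List[Dict[str, Any]] = field(default_factory=list)          # A_EN, parallel to edges

    def add_vertex(self, v: str, **attrs: Any) -> None:
        self.vertex_attrs[v] = attrs

    def add_edge(self, u: str, v: str, **attrs: Any) -> None:
        # An undirected edge is a set of vertices, as in equation (2).
        self.edges.append(frozenset({u, v}))
        self.edge_attrs.append(attrs)

# A tiny made-up example in the spirit of Fig. 1 (c).
net = Network()
net.add_vertex("driver_1", type="participant", name="J. Smith")
net.add_vertex("vehicle_1", type="vehicle", plate="LJ-123-AB")
net.add_edge("driver_1", "vehicle_1", relation="drives")
print(net.edges[0], net.edge_attrs[0])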
Fig. 2. Ontology representing various classes, relations and attributes related to traffic accidents and automobile
insurance domain. Classes are colored with orange, relations with blue and attributes with green. Key concepts
of the ontology are Event, Person, Driver, Witness, Owner and Vehicle.
3.2 Semantic elevation using ontologies
Ontologies are a tool for specifying the semantics of terminology systems in a well defined and unambiguous
manner [24] (Fig. 2). They can simply be defined as a
network of entities, restricted and annotated with a set
of axioms. Let EO , AO be the sets of entities, axioms
for some ontology O respectively. Dataset, represented
with an ontology on semantic level, is defined as
O = (EO , AO ) where
EO ⊆ E^C ∪ E^I ∪ E^R ∪ E^A, (6)
AO ⊆ {a | EO^a ⊆ EO ∧ a axiom on EO^a}. (7)
Entities EO consist of classes E^C (concepts), individuals E^I (instances), relations E^R (among classes and individuals) and attributes E^A (properties of classes);
and axioms AO are assertions (over entities) in a
logical form that together comprise the overall theory
described by ontology O.
This article focuses on ontologies based on descriptive logic that, besides assigning meaning to axioms,
enable also reasoning capabilities. The latter can be
used to compute consequences of the previously made
assumptions (queries), or to discover non-intended
consequences and inconsistencies within the ontology.
With the advent of Semantic Web, ontologies are
rapidly gaining importance. One of the most prominent applications of ontologies is in the domain of
semantic interoperability (among heterogeneous software systems). While pure semantics concerns the
study of meanings, semantic elevation means to achieve
semantic interoperability and can be considered as
a subset of information integration (including data
access, aggregation, correlation and transformation).
Thus one of the key aspects of semantic elevation
is to derive a common representation of classes, individuals, relations and attributes within some ontology.
We employ a concept of knowledge chunks [9], where
each entity is represented with its name and a set of
semantic relations (or attributes), their values and (ontology) identifiers. All of the data about a certain entity
is thus transformed into attribute-value format, with
an identifier of the data source of origin appended
to each value. Knowledge chunks, denoted k ∈ K,
thus provide a (common) synthetic representation of
an ontology that is used during the matching and
merging execution. For more details on knowledge
chunks, and their construction from a RDF(S) (Resource Description Framework Schema) repository or an
OWL (Web Ontology Language) ontology, see [9], [25].
Notion of knowledge chunks is introduced also on
data level. Hence, each network is represented in the
same, easily maintainable, form, allowing for common
matching and merging algorithms. Exact description
of the transformation between networked data and
knowledge chunks is not given, although it is very
similar to the definition of inferred axioms in equation (12).
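As a rough illustration of this synthetic representation (the Python encoding below is ours; see [9] for the actual construction), a knowledge chunk can be kept as a mapping from attribute names to lists of (value, source identifier) pairs:

from collections import defaultdict
from typing import Any, DefaultDict, List, Tuple

class KnowledgeChunk:
    """One entity in attribute-value form: attribute -> [(value, data source id), ...]."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: DefaultDict[str, List[Tuple[Any, str]]] = defaultdict(list)

    def add(self, attribute: str, value: Any, source: str) -> None:
        # Each value keeps the identifier of the data source it originates from.
        self.attributes[attribute].append((value, source))

    def concatenate(self, other: "KnowledgeChunk") -> None:
        # Simple concatenation of chunks, as used when two clusters are matched.
        for attribute, values in other.attributes.items():
            self.attributes[attribute].extend(values)

chunk = KnowledgeChunk("driver_1")
chunk.add("Name", "J. Smith", source="police_record")
chunk.add("Age", 37, source="insurance_db")
print(dict(chunk.attributes))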
3.3 Three level architecture
As previously stated, every dataset is (independently) represented on three levels – data, semantic and abstract level (Fig. 3). The bottommost data level holds data in a pure relational format (i.e. networks), mainly to facilitate state-of-the-art relational algorithms for matching. The next level, semantic level, enriches data with semantics (i.e. ontologies), to further enhance matching and to promote semantic merging execution. Data on both levels represent entities of the topmost abstract level, which serves merely as an abstract (artificial) representation of all the entities, used during matching and merging execution.
The information captured by data level is a subset of that of semantic level. Similarly, the information captured by semantic level is a subset of that of abstract level. This information-based view of the architecture is seen in Fig. 3 (a). However, the representation on each level is completely independent from the others, due to the absolute separation of data. This provides an alternative data-based view, seen in Fig. 3 (b).
Fig. 3. (a) information-based view of the data architecture; (b) data-based view of the data architecture.
To manage data and semantic level independently (or jointly), a mapping between the levels is required. In practice, a data source could provide datasets on both data and semantic level. The mapping is in that case trivial (i.e. given). However, more commonly, a data source would only provide datasets on one of the levels, and the other has to be inferred.
Let (N, AN) be a dataset, represented as a network on data level. Without loss of generality, we assume that N is an undirected network. The inferred ontology (ẼÕ, ÃÕ) on semantic level is defined with
Ẽ^C = {vertex, edge}, (8)
Ẽ^I = VN ∪ EN, (9)
Ẽ^R = {isOf, isIn}, (10)
Ẽ^A = {AVN, AEN} (11)
and
ÃÕ = {v isOf vertex | v ∈ VN} ∪ {e isOf edge | e ∈ EN} ∪ {v isIn e | v ∈ VN ∧ e ∈ EN ∧ v ∈ e} ∪ {v.AVN = a | v ∈ VN ∧ AVN(v) = a} ∪ {e.AEN = a | e ∈ EN ∧ AEN(e) = a}. (12)
We denote IN : (N, AN) ↦ (ẼÕ, ÃÕ). One can easily see that IN⁻¹ ◦ IN is an identity (the transformation preserves all the information).
On the other hand, given a dataset (EO, AO), represented with an ontology on semantic level, the inferred (undirected) network (Ñ, ÃÑ) on data level is defined with
ṼÑ = EO ∩ E^I, (13)
ẼÑ = {EO^a ∩ E^I | a ∈ AO ∧ EO^a ⊆ EO} (14)
and
ÃṼÑ : ṼÑ → E^C × E^A, (15)
ÃẼÑ : ẼÑ → E^R. (16)
Instances of the ontology are represented with the vertices of the network, and axioms with its edges. Classes and relations are, together with the attributes, expressed through the vertex and edge attribute functions.
We denote IO : (EO, AO) ↦ (Ñ, ÃÑ). Transformation IO discards purely semantic information (e.g. relations between classes), as it cannot be represented on the data level. Thus IO cannot be inverted as IN can. However, all the data, and data related information, is preserved (e.g. relations among individuals, and between individuals and classes).
Due to the limitations of networks, only axioms relating at most two individuals in EO can be represented with the set of edges ẼÑ (equation (14)). When this is not sufficient, hypernetworks (or hypergraphs¹) should be employed instead. Nevertheless, networks should suffice in most cases.
1. Hypergraphs are similar to ordinary graphs, only that the edges can connect multiple vertices.
One more issue has to be stressed. Although IN and IO give a “common” representation of every dataset, the transformations are completely different. For instance, presume (N, AN) and (EO, AO) are (given) representations of the same dataset. Then IN(N, AN) ≠ (EO, AO) and IO(EO, AO) ≠ (N, AN) in general – the inferred ontology (network) does not equal the given ontology (network) respectively. The former inequality resides in the fact that the network (N, AN) contains no knowledge of the (pure) semantics within the ontology (EO, AO); the latter resides in the fact that IO has no information on the exact representation used for (N, AN). Still, transformations IN and IO can be used to manage data on a common basis.
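Equations (8)–(12) translate almost literally into code; the sketch below (our own illustration, with simplified plain-dictionary inputs and tuple-encoded axioms) mimics the transformation IN for a small undirected network:

from typing import Any, Dict, FrozenSet, List, Tuple

def infer_ontology(vertices: Dict[str, Dict[str, Any]],
                   edges: List[Tuple[FrozenSet[str], Dict[str, Any]]]):
    """I_N: infer classes, individuals, relations and axioms from a network."""
    classes = {"vertex", "edge"}                                        # equation (8)
    individuals = set(vertices) | {f"e{i}" for i in range(len(edges))}  # equation (9)
    relations = {"isOf", "isIn"}                                        # equation (10)
    axioms = []                                                         # equation (12)
    for v, attrs in vertices.items():
        axioms.append((v, "isOf", "vertex"))
        axioms += [(v, name, value) for name, value in attrs.items()]
    for i, (edge, attrs) in enumerate(edges):
        e = f"e{i}"
        axioms.append((e, "isOf", "edge"))
        axioms += [(v, "isIn", e) for v in sorted(edge)]
        axioms += [(e, name, value) for name, value in attrs.items()]
    return classes, individuals, relations, axioms

vertices = {"driver_1": {"Name": "J. Smith"}, "vehicle_1": {"Plate": "LJ-123-AB"}}
edges = [(frozenset({"driver_1", "vehicle_1"}), {"relation": "drives"})]
print(infer_ontology(vertices, edges)[3])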
Last, we discuss three key issues regarding an adequate data architecture, presented in section 3. Firstly,
due to variety of different data formats, a mutual
representation must be employed. As the data on both
data and semantic level is represented in the form of
knowledge chunks (section 3.2), every piece of data
is stored in exactly the same way. This allows for
common algorithms of matching and merging and
makes the data easily manageable.
Furthermore, introduction of knowledge chunks
naturally deals also with missing data. As each chunk
is actually a set of attribute-value pairs, missing data
only results in smaller chunks. Alternatively, missing
data could be randomly imputed from the rest and
treated as extremely uncertain or mistrustful (section 4).
Secondly, semantical component of data should
be addressed properly. Proposed architecture allows
for simple (relational) data and also semantically
enriched data. Hence no information is discarded.
Moreover, appropriate transformations make all data
accessible on both data and semantic level, providing
for specific needs of each algorithm.
Thirdly, architecture should deal with (partially)
missing and uncertain or mistrustful data, which is
thoroughly discussed in the following section.
4 TRUST AND TRUST MANAGEMENT
When merging data from different sources, these are
often of different origin and thus their trustworthiness (or accuracy) can be questionable. For instance,
personal data of participants in a traffic accident is
usually more accurate in the police record of the
accident than inside participants' social network profiles. Nevertheless, an attribute from a less trusted data source can still be more accurate than an attribute from a more trusted one – a relationship status (e.g.
single or married) in the record may be outdated,
while such type of information is inside the social
network profiles quite often up-to-date.
A complete solution for matching and merging
execution should address such problems as well. A
common approach for dealing with data sources that
provide untrustworthy or conflicting statements, is
the use of trust management (systems). These are, alongside the concept of trust, both further discussed in
sections 4.1 and 4.2.
4.1 Definition of trust
Trust is a complex psychological-sociological phenomenon. Despite this, people use the term trust widely in everyday life, and with very different meanings.
The most common definition states that trust is an assured
reliance on the character, ability, strength, or truth of
someone or something.
In the context of computer networks, trust is modeled as a relationship between entities. Formally, we
define a trust relationship as
ωE : E × E → ΣE, (17)
where E is a set of entities and ΣE a set of all
possible, numerical or descriptive, trust values. ωE
thus represents one entity’s attitude towards another
and is used to model trust(worthiness) TE of all
entities in E. To this end, different trust modeling
methodologies and systems can be employed, from
qualitative to quantitative (e.g. [14], [15], [23]).
We introduce trust on three different levels. First,
we define trust on the level of data source, in order to
represent trustworthiness of the source in general. Let
S be the set of all data sources. Their trust is defined
as TS : S → [0, 1], where higher values of TS represent
more trustworthy source.
Second, we define trust on the level of attributes
(or semantic relations) within the knowledge chunks.
The trust in attributes is naturally dependent on the
data source of origin, and is defined as TAs : As →
[0, 1], where As is the set of attributes for data source
s ∈ S. As before, higher values of TAs represent more
trustworthy attribute.
Last, we define trust on the level of knowledge
chunks. Despite the trustworthiness of data source
and attributes within some knowledge chunk, its data
can be (semantically) corrupted, missing or otherwise unreliable. This information is captured using
trustworthiness of knowledge chunks, and again defined as TK : K → [0, 1], where K is a set of
all knowledge chunks. Although the trust relationships (equation (17)), needed for the evaluation of
trustworthiness of data sources and attributes, are
(mainly) defined by the user, computation of trust
in knowledge chunks can be fully automated using
proper evaluation function (section 4.2).
Three levels of trust provide high flexibility during
matching and merging. For instance, attributes from
more trusted data sources are generally favored over
those from less trusted ones. However, by properly
assigning trust in attributes, certain attributes from
else less trusted data sources can prevail. Moreover,
trust in knowledge chunks can also assist in revealing
corrupted, and thus questionable, chunks that should
be excluded from further execution.
Finally, we define trust in some particular value
within a knowledge chunk, denoted trust value T . This
is the value in fact used during merging and matching
execution and is computed from corresponding trusts
on all three levels. In general, T can be an arbitrary
function of TS , TAs and TK . Assuming independence,
we calculate trust value by concatenating corresponding trusts,
T = TS ◦ TAs ◦ TK. (18)
Concatenation function ◦ could be a simple multiplication or some fuzzy logic operation (trusts should in
this case be defined as fuzzy sets).
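Taking simple multiplication as the concatenation operator ◦, equation (18) reduces to the following sketch (the numeric trusts are only illustrative):

def trust_value(t_source: float, t_attribute: float, t_chunk: float) -> float:
    """T = TS ◦ TAs ◦ TK with ◦ taken as multiplication (equation (18))."""
    for t in (t_source, t_attribute, t_chunk):
        if not 0.0 <= t <= 1.0:
            raise ValueError("trust values must lie in [0, 1]")
    return t_source * t_attribute * t_chunk

# A fairly trusted source, a very reliable attribute, a slightly suspicious chunk.
print(trust_value(0.8, 0.95, 0.7))  # 0.532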
4.2 Trust management
During merging and matching execution, trust values are computed using a trust management algorithm
based on [15]. We begin by assigning trust values TS ,
TAs for each data source, attribute respectively (we
actually assign trust relationships). Commonly, only
a subset of values must necessarily be assigned, as
others can be inferred or estimated from the first.
Next, trust values for each knowledge chunk are not
defined by the user, but are calculated using the chunk
evaluation function feval (i.e. TK = feval ).
An example of such function is a density of inconsistencies within some knowledge chunk. For instance,
when attributes Birth and Age of some particular
knowledge chunk mismatch, this can be seen as
an inconsistency. However, one must also consider
the trust of the corresponding attributes (and data
sources), as only inconsistencies among trustworthy
attributes should be considered. Formally, density of
inconsistencies is defined as
feval(k) = (N̂inc(k) − Ninc(k)) / N̂inc(k), (19)
where k is a knowledge chunk, k ∈ K, Ninc (k) the
number of inconsistencies within k and N̂inc (k) the
number of all possible inconsistencies.
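A minimal sketch of such an evaluation function, under the assumption that inconsistencies are detected by user-supplied predicates over pairs of attributes (the Birth/Age rule below is just the example from the text, encoded naively):

from datetime import date
from typing import Any, Callable, Dict, List, Tuple

# A rule names two attributes and a predicate that flags an inconsistency between their values.
Rule = Tuple[str, str, Callable[[Any, Any], bool]]

def f_eval(chunk: Dict[str, Any], rules: List[Rule]) -> float:
    """Equation (19): (N̂inc(k) − Ninc(k)) / N̂inc(k) over the applicable rules."""
    applicable = [(a, b, bad) for a, b, bad in rules if a in chunk and b in chunk]
    if not applicable:
        return 1.0                       # nothing to check; the chunk is taken as consistent
    n_inc = sum(1 for a, b, bad in applicable if bad(chunk[a], chunk[b]))
    return (len(applicable) - n_inc) / len(applicable)

rules: List[Rule] = [
    ("Birth", "Age", lambda birth, age: abs(date.today().year - birth.year - age) > 1),
]
print(f_eval({"Birth": date(1972, 5, 1), "Age": 20}, rules))  # 0.0, i.e. an inconsistent chunk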
Finally, after all individual trusts TS , TAs and TK
have been assigned, trust values T are computed
using equation (18). When merging takes place and
two or more data sources (or knowledge chunks)
provide conflicting attribute values, corresponding to
the same (resolved) entity, trust values T are used to
determine actual attribute value in the resulting data
source (or knowledge chunk). For further discussion
on trust management during matching and merging
see section 5.
5 MATCHING AND MERGING DATA SOURCES
Matching and merging is employed in various scenarios. As the specific needs of each scenario vary,
different dimensions of variability characterize every
matching and merging execution. These dimensions
are managed through the use of contexts [9], [26]. Contexts allow a formal definition of specific needs arising
in diverse scenarios and a joint control over various
dimensions of matching and merging execution.
The following section discusses the notion of contexts more thoroughly and introduces the different types
of contexts used. Next, sections 5.2, 5.3 describe
employed entity resolution and redundancy elimination
algorithms respectively. The general framework for
matching and merging is presented and formalized
in section 5.4, and discussed in section 6.
Merging data from heterogeneous sources can be seen as a two-step process. The first step resolves the real world entities of abstract level, described by the data on lower levels, and constructs a mapping between the levels. This mapping is used in the second step that actually merges the datasets at hand. We denote these subsequent steps as entity resolution (i.e. matching) and redundancy elimination (i.e. merging).
5.1 Contexts
Every matching and merging execution is characterized by different dimensions of variability of the
data, and mappings between. Contexts are a formal
representation of all possible operations in these dimensions, providing for specific needs of each scenario. Every execution is thus characterized with the
contexts it defines (Fig. 4), and can be managed and
controlled through their use.
The idea of contexts originates in the field of requirements engineering, where it has been applied to
model domain variability [26]. It has just recently been
proposed to model also variability of the matching
execution [9]. Our work goes one step further as
it introduces contexts, not bounded only to user or
scenario specific dimensions, but also data related and
trust contexts.
Fig. 4. Characterization of merging and matching
execution defining one context in user dimension, two
contexts in data dimension and all contexts in trust
dimension.
Formally, we define a context C as
C : D → {true, false}, (20)
where D can be any simple or composite domain. A context simply limits all possible values, attributes, relations, knowledge chunks, datasets, sources or other, that are considered in different parts of matching and
merging execution. Despite its simple definition, a
context can be a complex function. It is defined on
any of the architecture levels, preferably on all. Let
CA , CS and CD represent the same context on abstract,
semantic and data level respectively. The joint context
is defined as
CJ = CA ∧ CS ∧ CD. (21)
In the case of missing data (or contexts), only appropriate contexts are considered. Alternatively, contexts
could be defined as fuzzy sets, to address also the
noisiness of data. In that case, a fuzzy AND operation
should be used to derive joint context CJ .
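Since a context is just a Boolean predicate (equation (20)), one possible encoding – purely illustrative, using the crisp conjunction of equation (21) – is:

from typing import Any, Callable, Optional

Context = Callable[[Any], bool]

def joint_context(abstract: Optional[Context] = None,
                  semantic: Optional[Context] = None,
                  data: Optional[Context] = None) -> Context:
    """CJ = CA ∧ CS ∧ CD (equation (21)); missing contexts are simply skipped."""
    defined = [c for c in (abstract, semantic, data) if c is not None]
    return lambda item: all(c(item) for c in defined)

# A user context (simple selection) on data level and a trust context on semantic level.
only_vehicles: Context = lambda chunk: chunk.get("type") == "vehicle"
trusted_enough: Context = lambda chunk: chunk.get("trust", 0.0) >= 0.6

cj = joint_context(data=only_vehicles, semantic=trusted_enough)
print(cj({"type": "vehicle", "trust": 0.8}), cj({"type": "participant", "trust": 0.9}))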
We distinguish between three types of contexts due
to different dimensions characterized (Fig. 4).
user User or scenario specific contexts are used
mainly to limit the data and control the execution. This type coincides with dimensions
identified in [9]. An example of user context
is a simple selection or projection of the data.
data Data related contexts arise from dealing with
relational or semantic data, and various formats of data. Missing or corrupted data can
also be managed through the use of these
contexts.
trust Trust and data uncertainty contexts provide
for an adequate trust management and efficient security assurance between and during
different phases of execution. An example of
trust context is a definition of required level
of trustworthiness of data or sources.
Detailed description of each context is out of scope
of this article. For more details on (user) contexts
see [9].
5.2 Entity resolution
The first step of the matching and merging execution is to resolve the real world entities on abstract level, described by the data on lower levels. Thus a mapping between the levels (entities) is constructed and used in the consequent merging execution. Recent literature proposes several state-of-the-art approaches for entity resolution (e.g. [5], [1], [12], [13], [6]). A naive approach is a simple pairwise comparison of attribute values among different entities. Although such an approach could already be sufficient for flat data, this is not the case for relational data, as the approach completely discards relations between the entities. For instance, when two entities are related to similar entities, they are more likely to represent the same entity. However, only the attributes of the related entities are compared; thus the approach still discards the information whether related entities resolve to the same entities – entities are even more likely to represent the same entities when their related entities resolve to, not only similar, but the same entities. An approach that uses this information, and thus resolves entities altogether (in a collective fashion), is denoted a collective (relational) entity resolution algorithm.
We employ a state-of-the-art (collective) relational clustering algorithm proposed in [12]. To further enhance the performance, the algorithm is semantically elevated and adapted to allow for proper and efficient trust management.
The algorithm is actually a greedy agglomerative clustering approach. Entities (on lower levels) are represented as a group of clusters C, where each cluster represents a set of entities that resolve to the same entity on abstract level. At the beginning, each (lower level) entity resides in a separate cluster. Then, at each step, the algorithm merges the two clusters in C that are most likely to represent the same entity (the most similar clusters). When the algorithm unfolds, C holds a mapping between the entities on each level (i.e. it maps entities on lower levels through the entities on abstract level).
During the algorithm, the similarity of clusters is computed using a joint similarity measure (equation (28)), combining attribute, relational and semantic similarity. The first is a basic pairwise comparison of attribute values; the second introduces relational information into the computation of similarity (in a collective fashion); and the third represents the semantic elevation of the algorithm.
Let ci, cj ∈ C be two clusters of entities. Using the knowledge chunk representation, attribute cluster similarity is defined as
simA(ci, cj) = Σ_{ki,j ∈ ci,j ∧ a ∈ ki,j} trust(ki.a, kj.a) simA(ki.a, kj.a), (22)
where ki,j ∈ K are knowledge chunks, a ∈ As is an attribute and simA(ki.a, kj.a) the similarity between two attribute values. (Attribute) similarity between two clusters is thus defined as a weighted sum of similarities between each pair of values in each knowledge chunk. Weights are assigned due to the trustworthiness of values – trust in values ki.a and kj.a is computed using
trust(ki.a, kj.a) = min{T(ki.a), T(kj.a)}. (23)
Hence, when even one of the values is uncertain or mistrustful, the similarity is penalized appropriately, to prevent matching based on (likely) incorrect information.
For the computation of similarity between actual attribute values simA(ki.a, kj.a) (equation (22)), different measures have been proposed. Levenshtein distance [27] measures the edit distance between two strings – the number of insertions, deletions and replacements that traverse one string into the other. Another class of similarity measures are TF-IDF²-based measures (e.g. Cos TF-IDF and Soft TF-IDF [28], [29]). They treat attribute values as a bag of words, thus the order of words in the attribute has no impact on the similarity. Other attribute measures are also Jaro [30] and Jaro-Winkler [31], which count the number of matching characters between the attributes.
2. Term Frequency-Inverse Document Frequency.
Different similarity measures prefer different types
of attributes. TF-IDF-based measures work best with
longer strings (e.g. descriptions), while others prefer
shorter strings (e.g. names). For numerical attributes,
an alternative measure has to be employed (e.g. simple evaluation, followed by a numerical comparison).
Therefore, when computing attribute similarity for a
pair of clusters, different attribute measures are used
with different attributes (equation (22)).
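The trust-weighted attribute similarity of equations (22) and (23) can be sketched as below; difflib's ratio is used only as a convenient stand-in for a proper measure such as Levenshtein or TF-IDF, and the per-value trusts are illustrative:

from difflib import SequenceMatcher
from typing import Dict, Tuple

# Each chunk maps an attribute name to a (value, trust value T) pair.
Chunk = Dict[str, Tuple[str, float]]

def sim_values(a: str, b: str) -> float:
    # Stand-in for a Levenshtein or TF-IDF based similarity of two attribute values.
    return SequenceMatcher(None, a, b).ratio()

def sim_attribute(ci: Chunk, cj: Chunk) -> float:
    """Equation (22): trust-weighted sum of value similarities over shared attributes."""
    total = 0.0
    for attribute in ci.keys() & cj.keys():
        (vi, ti), (vj, tj) = ci[attribute], cj[attribute]
        trust = min(ti, tj)                     # equation (23)
        total += trust * sim_values(vi, vj)
    return total

police = {"Name": ("John Smith", 0.9), "Plate": ("LJ-123-AB", 0.95)}
profile = {"Name": ("J. Smith", 0.6), "Plate": ("LJ 123 AB", 0.4)}
print(round(sim_attribute(police, profile), 3))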
Using data level representation, we define a neighborhood for vertex v ∈ VN as
nbr(v) = {vn | vn ∈ VN ∧ {v, vn} ∈ EN} (24)
and for a cluster c ∈ C as
nbr(c) = {cn | cn ∈ C ∧ v ∈ c ∧ cn ∩ nbr(v) ≠ ∅}. (25)
Neighborhood of a vertex is defined as a set of connected vertices. Similarly, neighborhood of a cluster
is defined as a set of clusters, connected through the
vertices within.
For a (collective) relational similarity measure, we
adapt a Jaccard coefficient [12] measure for trust-aware
(relational) data. Jaccard coefficient is based on Jaccard
index and measures the number of common neighbors of two clusters, considering also the size of the
clusters’ neighborhoods – when the size of neighborhoods is large, the probability of common neighbors
increases. We define
simR(ci, cj) = Σ_{cn ∈ nbr(ci) ∩ nbr(cj)} trust(eTin, eTjn) / |nbr(ci) ∪ nbr(cj)|, (26)
where eTin , eTjn is the most trustworthy edge connecting vertices in cn and ci , cj respectively (for the computation of trust(eTin , eTjn ), a knowledge chunk representation of eTin , eTjn is used). (Relational) similarity between two clusters is defined as the size of a common
neighborhood (considering also the trustworthiness
of connecting relations), decreased due to the size of
clusters’ neighborhoods. Entities related to a relatively
large set of entities that resolve to the same entities on
abstract level, are thus considered to be similar.
Alternatively, one could use some other similarity
measure like Adar-Adamic similarity [32], random walk
measures, or measures considering also the ambiguity
of attributes or higher order neighborhoods [12].
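The trust-aware Jaccard measure of equation (26) only needs, for each cluster, its neighboring clusters and the trust of the most trustworthy edge towards each of them; a minimal sketch with such precomputed neighborhoods (all values made up, the two edge trusts combined with min in the spirit of equation (23)):

from typing import Dict

# nbr maps a cluster id to {neighboring cluster id: trust of the best connecting edge}.
Neighborhood = Dict[str, Dict[str, float]]

def sim_relational(ci: str, cj: str, nbr: Neighborhood) -> float:
    """Equation (26): trust-weighted common neighbors over the size of the joint neighborhood."""
    common = nbr[ci].keys() & nbr[cj].keys()
    union = nbr[ci].keys() | nbr[cj].keys()
    if not union:
        return 0.0
    weight = sum(min(nbr[ci][cn], nbr[cj][cn]) for cn in common)
    return weight / len(union)

nbr: Neighborhood = {
    "c1": {"c3": 0.9, "c4": 0.7},
    "c2": {"c3": 0.8, "c5": 0.6},
}
print(sim_relational("c1", "c2", nbr))  # one common neighbor: min(0.9, 0.8) / 3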
For the computation of the last, semantic, similarity,
we propose a random walk like approach. Using a
semantic level representation of clusters ci , cj ∈ C,
we do a number of random assumptions (queries)
over underlying ontologies. Let Nass be the number
of times the consequences (results) of the assumptions
made matched, Ñass number of times the consequences were undefined (for at least one ontology)
and N̂ass the number of all assumptions made. Furthermore, let NTass be the trustworthiness of the ontology elements used for reasoning in the assumptions that matched (computed as a sum of products of trusts on the paths of reasoning, similar as in equation (23)). Semantic similarity is then defined as
simS(ci, cj) = NTass(ci, cj) / (N̂ass(ci, cj) − Ñass(ci, cj)). (27)
Similarity represents the trust in the number of times
ontologies produced the same consequences, not considering assumptions that were undefined for some
ontology. As the expressiveness of different ontologies vary, and some of them are even inferred from
relational data, many of the assumptions could be
undefined for some ontology. Still, for N̂ass (ci , cj ) −
Ñass (ci , cj ) large enough, equation (27) gives a good
approximation of semantic similarity.
Using attribute, relational and semantic similarity
(equations (22), (26) and (27)) we define a joint similarity for two clusters as
sim(ci, cj) = (δA simA(ci, cj) + δR simR(ci, cj) + δS simS(ci, cj)) / (δA + δR + δS), (28)
where δA , δR and δS are weights, set due to the scale
of relational and semantical information within the
data. For instance, setting δR = δS = 0 reduces the
algorithm to a naive pairwise comparison of attribute
values, which should be used when no relational or
semantic information is present.
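The joint measure of equation (28) is then a simple normalized weighted sum; a trivial sketch:

def sim_joint(sim_a: float, sim_r: float, sim_s: float,
              delta_a: float = 1.0, delta_r: float = 1.0, delta_s: float = 1.0) -> float:
    """Equation (28): weighted combination of attribute, relational and semantic similarity."""
    return (delta_a * sim_a + delta_r * sim_r + delta_s * sim_s) / (delta_a + delta_r + delta_s)

# Setting δR = δS = 0 reduces the measure to the naive attribute-only comparison.
print(sim_joint(0.75, 0.0, 0.0, delta_r=0.0, delta_s=0.0))  # 0.75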
Finally, we present the collective clustering algorithm employed for entity resolution (algorithm 1).
First, the algorithm initializes clusters C and priority queue of similarities Q, considering the current set
of clusters (lines 1-5). Each cluster represents at most
one entity as it is composed out of a single knowledge
chunk. Algorithm then, at each iteration, retrieves currently the most similar clusters and merges them (i.e.
matching of resolved entities), when their similarity is
greater than threshold θS (lines 7-11). As clusters are
stored in the form of knowledge chunks, matching in
line 11 results in a simple concatenation of chunks.
Next, lines 12-17 update similarities in the priority
queue Q, and lines 18-22 insert (or update) also neighbors’ similarities (required due to relational similarity
measure). When the algorithm terminates, clusters C
represent chunks of data resolved to the same entity
on abstract level. This mapping between the entities
(i.e. their knowledge chunk representations) is used
to merge the data in the next step.
Threshold θS represents minimum similarity for
two clusters that are considered to represent the same
entities. Optimal value should be estimated from the
data.
Algorithm 1 Collective entity resolution
1: Initialize clusters as C = {{k} | k ∈ K}
2: Initialize priority queue as Q = ∅
3: for ci, cj ∈ C and sim(ci, cj) ≥ θS do
4:   Q.insert(sim(ci, cj), ci, cj)
5: end for
6: while Q ≠ ∅ do
7:   (sim(ci, cj), ci, cj) ← Q.pop() {Most similar.}
8:   if sim(ci, cj) < θS then
9:     return C
10:  end if
11:  C ← C − {ci, cj} ∪ {ci ∪ cj} {Matching.}
12:  for (sim(cx, ck), cx, ck) ∈ Q and x ∈ {i, j} do
13:    Q.remove(sim(cx, ck), cx, ck)
14:  end for
15:  for ck ∈ C and sim(ci ∪ cj, ck) ≥ θS do
16:    Q.insert(sim(ci ∪ cj, ck), ci ∪ cj, ck)
17:  end for
18:  for cn ∈ nbr(ci ∪ cj) do
19:    for ck ∈ C and sim(cn, ck) ≥ θS do
20:      Q.insert(sim(cn, ck), cn, ck) {Or update.}
21:    end for
22:  end for
23: end while
24: return C
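The same greedy loop can be sketched in a few lines of Python (our compact approximation, not the authors' implementation): clusters are frozensets of chunk identifiers, a max-heap holds candidate pairs, and entries that refer to already merged clusters are skipped; the neighbor updates of lines 18-22 are omitted for brevity, and the similarity function is left as a parameter.

import heapq
import itertools
from typing import Callable, FrozenSet, List, Set

Cluster = FrozenSet[str]

def collective_resolution(chunks: List[str],
                          sim: Callable[[Cluster, Cluster], float],
                          theta: float) -> Set[Cluster]:
    """Greedy agglomerative clustering in the spirit of Algorithm 1."""
    clusters: Set[Cluster] = {frozenset({k}) for k in chunks}        # line 1
    heap, tie = [], itertools.count()                                # max-heap via negated similarity
    items = list(clusters)
    for i, ci in enumerate(items):                                   # lines 3-5
        for cj in items[i + 1:]:
            s = sim(ci, cj)
            if s >= theta:
                heapq.heappush(heap, (-s, next(tie), ci, cj))
    while heap:                                                      # lines 6-23
        s, _, ci, cj = heapq.heappop(heap)
        if ci not in clusters or cj not in clusters:
            continue                                                 # stale pair: a side was already merged
        if -s < theta:
            break
        merged = ci | cj                                             # line 11: matching
        clusters -= {ci, cj}
        for ck in clusters:                                          # lines 15-17
            sk = sim(merged, ck)
            if sk >= theta:
                heapq.heappush(heap, (-sk, next(tie), merged, ck))
        clusters.add(merged)
    return clusters

# Toy similarity: chunks whose identifiers share the first letter resolve to the same entity.
toy_sim = lambda ci, cj: 1.0 if {k[0] for k in ci} == {k[0] for k in cj} else 0.0
print(collective_resolution(["a1", "a2", "b1"], toy_sim, theta=0.5))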
Three more aspects of the algorithm ought to be discussed. Firstly, pairwise comparison of all clusters during the execution of the algorithm is computationally expensive, especially in early stages of the algorithm. Authors in [12] propose an approach in which they initially find groups of chunks that could possibly resolve to the same entity. In this way, the number of comparisons can be significantly decreased.
Secondly, due to the nature of (collective) relational similarity measures, they are ineffective when none of the entities has already been resolved (e.g. in early stages of the algorithm). As the measure in equation (26) counts the number of common neighbors, this always evaluates to 0 in early stages (in general). Thus relational similarity measures should be used after the algorithm has already resolved some of the entities, using only attribute and semantic similarities until then.
Thirdly, in the algorithm we implicitly assumed that all attributes, (semantic) relations and other, have the same names or identifiers in every dataset (or knowledge chunk). Although we can probably assume that all attributes within datasets produced by the same source have the same and unique names, this cannot be generalized.
We propose a simple, yet effective, solution. The problem at hand could be denoted attribute resolution, as we merely wish to map attributes between the datasets. Thus we can use the approach proposed for entity resolution. Entities are in this case attributes that are compared due to their names, and also due to the different values they hold; and relations between entities (attributes) represent co-occurrence in the knowledge chunks. As certain attributes commonly occur with some other attributes, this would further improve the resolution.
Another possible improvement is to address also the attribute values in a similar manner. As different values can represent the same underlying value, value resolution, done prior to attribute resolution, can even further improve the performance.
5.3 Redundancy elimination
After the entities residing in the data have been resolved (section 5.2), the next step is to eliminate the redundancy and merge the datasets at hand. This process is somewhat straightforward as all data is represented in the form of knowledge chunks. Thus we merely need to merge the knowledge chunks resolved to the same entity on abstract level. Redundancy elimination is done entirely on semantic level, to preserve all the knowledge inside the data.
When knowledge chunks hold disjoint data (i.e. attributes), they can simply be concatenated together. However, commonly various chunks would provide values for the same attribute and, when these values are inconsistent, they need to be handled appropriately. A naive approach would count only the number of occurrences of some value, while we consider also their trustworthiness, to determine the most probable value for each attribute.
Let c ∈ C be a cluster representing some entity on abstract level (resolved in the previous step), let k1, k2 ... kn ∈ c be its knowledge chunks and let k^c be the merged knowledge chunk we wish to obtain. Furthermore, for some attribute a ∈ A·, let X^a be a random variable measuring the true value of a and let X^a_i be the random variables for a in each knowledge chunk it occurs in (i.e. ki.a). The value of attribute a for the merged knowledge chunk k^c is then defined as
arg max_v P(X^a = v | ⋀_i X^a_i = ki.a). (29)
Each attribute is thus assigned the most probable value, given the evidence observed (i.e. the values ki.a). By assuming pair-wise independence among X^a_i (conditional on X^a) and a uniform distribution of X^a, equation (29) simplifies to
arg max_v Π_i P(X^a_i = ki.a | X^a = v). (30)
Finally, the conditional probabilities in equation (30) are approximated with the trustworthiness of values,
P(X^a_i | X^a) ≈ T(ki.a) for ki.a = v, (31a)
P(X^a_i | X^a) ≈ 1 − T(ki.a) for ki.a ≠ v, (31b)
hence
k^c.a = arg max_v Π_{ki.a = v} T(ki.a) Π_{ki.a ≠ v} (1 − T(ki.a)). (32)
Only knowledge chunks containing attribute a are considered.
We present the proposed redundancy elimination algorithm (algorithm 2).
Algorithm 2 Redundancy elimination
1: Initialize knowledge chunks K^C
2: for c ∈ C and a ∈ A· do
3:   k^c.a = arg max_v Π_{k ∈ c ∧ k.a = v} T(k.a) Π_{k ∈ c ∧ k.a ≠ v} (1 − T(k.a))
4: end for
5: return K^C
The algorithm uses the knowledge chunk representation of semantic level. First, it initializes the merged knowledge chunks k^c ∈ K^C. Then, for each attribute k^c.a, it finds the most probable value among all given knowledge chunks (line 3). When the algorithm unfolds, knowledge chunks K^C represent a merged dataset, with resolved entities and eliminated redundancy. Each knowledge chunk k^c corresponds to a unique entity on abstract level, and each attribute holds the most trustworthy value.
At the end, only the data that was actually provided by some data source should be preserved. Thus all inferred data (through IN or IO; section 3.3) is discarded, as it is merely an artificial representation needed for (common) entity resolution and redundancy elimination. Still, all provided data and semantical information is preserved and properly merged with the rest. Hence, although redundancy elimination is done on semantic level, the resulting dataset is given on both data and semantic level (and the two complement each other).
Last, we discuss the assumptions of independence among X^a_i and uniform distribution of X^a. Clearly, both assumptions are violated; still, the former must be made in order for the computation of the most probable value to be feasible. However, the latter can be eliminated when the distribution of X^a can be approximated from some large-enough dataset.
Fig. 5. Entity resolution and redundancy elimination for two relational datasets, representing a group of traffic accidents (above). One dataset is also annotated with the ontology in Fig. 2.
5.4 General framework
Proposed entity resolution and redundancy elimination algorithms (sections 5.2, 5.3) are integrated into a general framework for matching and merging (Fig. 6). The framework represents a complete solution, allowing a joint control over various dimensions of matching and merging execution. Each component of the framework is briefly presented in the following, and further discussed in section 6.
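The value selection in line 3 of Algorithm 2 (equation (32)) can be sketched as follows; the observations of one attribute within a resolved cluster are given as plain (value, trust) pairs:

from typing import List, Tuple

def most_probable_value(observations: List[Tuple[str, float]]) -> str:
    """Equation (32): arg max over v of Π_{ki.a=v} T(ki.a) · Π_{ki.a≠v} (1 − T(ki.a))."""
    candidates = {value for value, _ in observations}
    def score(v: str) -> float:
        p = 1.0
        for value, trust in observations:
            p *= trust if value == v else 1.0 - trust
        return p
    return max(candidates, key=score)

# Two chunks say "married" (trusts 0.9 and 0.4), one says "single" with trust 0.95;
# the trust-weighted product picks "single" despite the 2-to-1 count.
print(most_probable_value([("married", 0.9), ("married", 0.4), ("single", 0.95)]))  # single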
Fig. 6. General framework for matching and merging data from heterogeneous sources.
Initially, data from various sources is preprocessed appropriately. Every network or ontology is transformed into a knowledge chunk representation and, when needed, also inferred on an absent architecture level (section 3.3). After preprocessing is done, all data is represented in the same, easily manageable, form, allowing for common, semantically elevated, subsequent analyses.
Prior to entity resolution, attribute resolution is done (section 5.2). The process resolves and matches attributes in the heterogeneous datasets, using the same algorithm as for entity resolution. As all data is represented in the form of knowledge chunks, this actually unifies all the underlying networks and ontologies.
Next, the proposed entity resolution and redundancy elimination algorithms are employed (sections 5.2, 5.3). The process thus first resolves entities in the data, and then uses this information to eliminate the redundancy and to merge the datasets at hand. The algorithms explore not only the relations in the data, but also the semantics behind it, to further improve the performance.
Last, postprocessing is done, in order to discard all artificially inferred data and to translate knowledge chunks back to the original network or ontology representation (section 3). Throughout the entire execution, components are jointly controlled through (defined) user, data and trust contexts (section 5.1). Furthermore, contexts also manage the results of the algorithms, to account for the specific needs of each scenario.
Every component of the framework is further enhanced, to allow for proper trust management, and thus also for efficient security assurance. In particular, all the similarity measures for entity resolution are trust-aware; moreover, trust is even used as a primary evidence in the redundancy elimination algorithm. The introduction of trust-aware and security-aware algorithms represents the main novelty of the proposition.
6 DISCUSSION
The following section discusses key aspects of the proposition.
The proposed framework for matching and merging represents a general and complete solution, applicable in all diverse areas of use. The introduction of contexts allows a joint control over various dimensions of matching and merging variability, providing for the specific needs of each scenario. Furthermore, the data architecture combines simple (relational) data with semantically enriched data, which makes the proposition applicable for any data source. The framework can thus be used as a general solution for merging data from heterogeneous sources, and also merely for matching.
The fundamental difference between matching, including only attribute and entity resolution, and merging, including also redundancy elimination, is, besides the obvious, in the fact that merged data is read-only. Since datasets obtained after merging do not necessarily resemble the original datasets, the data cannot be altered so that the changes would apply also in the original datasets. An alternative approach is to merely match the given datasets and to merge them only on demand. When altering matched data, the user can change the original datasets (which are in this phase still represented independently) or change the merged dataset (that was previously demanded), in which case he must also provide an appropriate strategy for how the changes should be applied in the original datasets.
Proposed algorithms employ relational data, semantically enriched with ontologies. With the advent of the Semantic Web, ontologies are gaining importance mainly due to the availability of formal ontology languages. These standardization efforts promote several notable uses of ontologies, like assisting in communication between people, achieving interoperability (communication) among heterogeneous software systems and improving the design and quality of software systems. One of the most prominent applications is in the domain of semantic interoperability.
While pure semantics concerns the study of meanings, semantic elevation means to achieve semantic
interoperability and can be considered as a subset of
information integration (including data access, aggregation, correlation and transformation). Semantic elevation of proposed matching and merging framework
represents one major step towards this end.
Use of trust-aware techniques and algorithms introduces several key properties. Firstly, an adequate
trust management provides means to deal with uncertain or questionable data sources, by modeling
trustworthiness of each provided value appropriately.
Secondly, algorithms jointly optimize not only entity
resolution or redundancy elimination of provided
datasets, but also the trustworthiness of the resulting datasets. The latter can substantially increase the
accuracy. Thirdly, trustworthiness of data can be used
also for security reasons, by seeing trustworthy values
as more secure. Optimizing the trustworthiness of
matching and merging thus also results in an efficient
security assurance.
Next, we discuss the main rationale behind the introduction of contexts. Although contexts are merely a way to guide the execution of some algorithm, their definition is rather different from that of any simple parameter. The execution is controlled with the mere definition of the contexts, whereas in the case of parameters, it is controlled by assigning different values. For instance, when default behavior is desired, the parameters still need to be assigned, while in the case of contexts, the algorithm is used as it is. For any
general solution, working with heterogeneous clients,
such behavior can significantly reduce the complexity.
As different contexts are used jointly throughout
matching and merging execution, they allow a collective control over various dimensions of variability.
Furthermore, each execution is controlled and also
characterized with the context it defines, which can
be used to compare and analyze different executions
or matching and merging algorithms.
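The following small sketch, based purely on our reading of the paragraph above and not on the framework's actual interface, illustrates the practical difference: the default context requires no configuration at all, while a client that needs different behaviour defines a context once and passes it in.

```python
# Illustrative sketch (an assumed design, not the framework's actual API) of
# the difference between parameters and contexts discussed above: with
# contexts the algorithm runs as-is by default, and a client only *defines*
# a context when non-default behaviour is needed.

from difflib import SequenceMatcher

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

class MatchingContext:
    """Default context: sensible behaviour with no configuration at all."""
    similarity_threshold = 0.80
    use_semantics = True

class BibliographicContext(MatchingContext):
    """A client-specific context is defined once and simply passed in."""
    similarity_threshold = 0.95   # stricter matching for citation data

def match(records, context=MatchingContext()):
    # The execution is steered by the definition of the context itself,
    # not by a list of individually assigned parameters.
    return [(a, b) for a in records for b in records
            if a < b and similarity(a, b) >= context.similarity_threshold]

pairs_default = match(["J. Smith", "J Smith", "A. Jones"])
pairs_strict = match(["J. Smith", "J Smith", "A. Jones"], BibliographicContext())
print(pairs_default, pairs_strict)
```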
Last, we briefly discuss a possible disadvantage of the proposed framework. As the framework represents a general solution, applicable across diverse domains, the performance of some domain-specific approach or algorithm can still be superior. However, such approaches commonly cannot be generalized and are thus inappropriate for practical (general) use.
7 CONCLUSION
The article proposes a general framework, and accompanying algorithms, for matching and merging data from heterogeneous sources. All the proposed algorithms are trust-aware, which enables the use of appropriate trust management and security assurance techniques. An adequate data architecture supports not only (pure) relational data, but also semantically enriched data, to promote semantically elevated analyses that thoroughly explore the data at hand. Matching and merging is done using state-of-the-art collective entity resolution and redundancy elimination algorithms that are managed and controlled through the use of different contexts. The framework thus allows joint control over various dimensions of variability of the matching and merging execution.
Further work will include an empirical evaluation of the proposition on large testbeds. Next, soft computing and fuzzy logic will be introduced for context manipulation and trust management, to provide for the inexactness of contexts and the ambiguity of trust phenomena. Moreover, trust management will be advanced to a collective approach, resulting also in a collective redundancy elimination algorithm. Last, all the proposed algorithms will be adapted to hypernetworks (or hypergraphs), to further generalize the framework.
ACKNOWLEDGMENT
This work has been supported by the Slovene Research Agency ARRS within the research program P2-0359.
REFERENCES
[1] I. Bhattacharya and L. Getoor, “Iterative record linkage for cleaning and integration,” in Proceedings of the ACM SIGKDD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2004, pp. 11–18.
[2] W. W. Cohen, “Data integration using similarity joins and
a word-based information representation language,” ACM
Transactions on Information Systems, vol. 18, no. 3, pp. 288–321,
2000.
[3] M. Hernandez and S. Stolfo, “The merge/purge problem for
large databases,” Proceedings of the ACM SIGMOD International
Conference on Management of Data, pp. 127–138, 1995.
[4] M. Lenzerini, “Data integration: A theoretical perspective,” in
Proceedings of the ACM SIGMOD Symposium on Principles of
Database Systems, 2002, pp. 233–246.
[5] R. Ananthakrishna, S. Chaudhuri, and V. Ganti, “Eliminating
fuzzy duplicates in data warehouses,” in Proceedings of the
International Conference on Very Large Data Bases, 2002, pp. 586–
597.
[6] D. Kalashnikov and S. Mehrotra, “Domain-independent data cleaning via analysis of entity-relationship graph,” ACM Transactions on Database Systems, vol. 31, no. 2, pp. 716–767, 2006.
[7] A. Monge and C. Elkan, “The field matching problem: Algorithms and applications,” Proceedings of the International
Conference on Knowledge Discovery and Data Mining, pp. 267–
270, 1996.
[8] S. Castano, A. Ferrara, and S. Montanelli, “Matching ontologies in open networked systems: Techniques and applications,” Journal on Data Semantics, pp. 25–63, 2006.
[9] ——, “Dealing with matching variability of semantic web data
using contexts,” in Proceedings of the International Conference
on Advanced Information Systems Engineering, 2010, to be presented.
[10] J. Euzenat and P. Shvaiko, Ontology matching. Springer-Verlag,
2007.
[11] E. Rahm and P. A. Bernstein, “A survey of approaches to
automatic schema matching,” Journal on Very Large Data Bases,
vol. 10, no. 4, pp. 334–350, 2001.
[12] I. Bhattacharya and L. Getoor, “Collective entity resolution in relational data,” ACM Transactions on Knowledge Discovery from Data, vol. 1, no. 1, p. 5, 2007.
[13] X. Dong, A. Halevy, and J. Madhavan, “Reference reconciliation in complex information spaces,” in Proceedings of the ACM
SIGMOD International Conference on Management of Data, 2005,
pp. 85–96.
[14] M. Nagy, M. Vargas-Vera, and E. Motta, “Managing conflicting
beliefs with fuzzy trust on the semantic web,” in Proceedings
of the Mexican International Conference on Advances in Artificial
Intelligence, 2008, pp. 827–837.
[15] M. Richardson, R. Agrawal, and P. Domingos, “Trust management for the semantic web,” in Proceedings of the International
Semantic Web Conference, 2003, pp. 351–368.
[16] M. Blaze, J. Feigenbaum, and J. Lacy, “Decentralized trust
management,” in Proceedings of the IEEE Symposium on Security
and Privacy, 1996, pp. 164–173.
[17] S. Chakrabarti, B. Dom, P. Raghavan, S. Rajagopalan, D. Gibson, and J. Kleinberg, “Automatic resource compilation by
analyzing hyperlink structure and associated text,” Proceedings
of the International World Wide Web Conference, pp. 65–74, 1998.
[18] T. Joachims, “A probabilistic analysis of the rocchio algorithm
with TFIDF for text categorization,” in Proceedings of the International Conference on Machine Learning, 1997, pp. 143–151.
[19] P. Domingos and M. Richardson, “Mining the network value
of customers,” in Proceedings of the International Conference on
Knowledge Discovery and Data Mining, 2001, pp. 57–66.
[20] J. M. Kleinberg, “Authoritative sources in a hyperlinked environment,” Journal of the ACM, vol. 46, no. 5, pp. 604–632,
1999.
[21] H. Kautz, B. Selman, and M. Shah, “Referral web: combining
social networks and collaborative filtering,” Communications of
the ACM, vol. 40, no. 3, pp. 63–65, 1997.
[22] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl,
“GroupLens: an open architecture for collaborative filtering
of netnews,” in Proceedings of ACM Conference on Computer
Supported Cooperative Work, 1994, pp. 175–186.
[23] D. Trcek, “A formal apparatus for modeling trust in computing
environments,” Mathematical and Computer Modelling, vol. 49,
no. 1-2, pp. 226–233, 2009.
[24] T. R. Gruber, “A translation approach to portable ontology
specifications,” Knowledge Acquisition, vol. 5, no. 2, pp. 199–
220, 1993.
[25] S. Castano, A. Ferrara, and S. Montanelli, “The iCoord knowledge model for P2P semantic coordination,” in Proceedings of
the Conference on Italian Chapter of AIS, 2009.
[26] A. Lapouchnian and J. Mylopoulos, “Modeling domain variability in requirements engineering with contexts,” in Proceedings of the International Conference on Conceptual Modeling.
Gramado, Brazil: Springer-Verlag, 2009, pp. 115–130.
[27] V. Levenshtein, “Binary codes capable of correcting deletions,
insertions, and reversals,” Soviet Physics Doklady, vol. 10, no. 8,
pp. 707–710, 1966.
[28] W. W. Cohen, P. Ravikumar, and S. E. Fienberg, “A comparison of string distance metrics for name-matching tasks,” in
Proceedings of the IJCAI Workshop on Information Integration on
the Web, 2003, pp. 73–78.
[29] E. Moreau, F. Yvon, and O. Cappé, “Robust similarity measures
for named entities matching,” in Proceedings of the International
Conference on Computational Linguistics, 2008, pp. 593–600.
[30] M. A. Jaro, “Advances in record linkage methodology as applied to the 1985 census of Tampa, Florida,” Journal of the American Statistical Association, vol. 84, no. 406, pp. 414–420, 1989.
[31] W. E. Winkler, “String comparator metrics and enhanced
decision rules in the Fellegi-Sunter model of record linkage.”
in Proceedings of the Section on Survey Research Methods, 1990,
pp. 354–359.
[32] L. Adamic and E. Adar, “Friends and neighbors on the web,”
Social Networks, vol. 25, pp. 211–230, 2001.
Evaluation models for e-learning platforms
and the AHP approach: a case study
Colace, F.; De Santo, M.
Abstract - Our “information-oriented” society shows an increasing exigency for life-long learning. In this framework, the E-Learning approach is becoming an important tool to provide the flexibility and quality required by such a kind of learning process. In the recent past, a great number of on-line platforms have been introduced on the market, showing different characteristics and services. With a plethora of E-Learning providers and solutions available on the market, organizations face a new kind of problem consisting in the selection of the most suitable E-Learning suite. This paper proposes a model for describing, characterizing and selecting E-Learning platforms. The E-Learning solution selection is a multiple criteria decision-making problem that needs to be addressed objectively, taking into consideration the relative weights of the criteria for any organization. We formulate the quoted multi-criteria problem as a decision hierarchy to be solved using the Analytic Hierarchy Process (AHP). In this paper we show the general evaluation strategy and some results obtained using our model to evaluate some existing commercial platforms.
Keywords – E-Learning, E-Learning Platform, Multiple Criteria Decision Making Problem

Manuscript received April 7th, 2008. The authors are with the Dipartimento di Ingegneria dell’Informazione e Ingegneria Elettrica – DIIIE, Università degli Studi di Salerno, Via Ponte don Melillo, 1, 84084 Fisciano (SA), Italy; e-mail of the contact author: [email protected]

Introduction
The whole world is undergoing a change that
maybe is the most important one in the last thirty
years, and, through the spreading of new
information technologies, is deeply modifying
relations among countries, markets, people and
culture. The technological revolution has clearly
promoted a globalization process (nowadays
Internet represents the global village) and
information exchange. Information can be
considered as an economical value whose
significance is closely associated with the
knowledge that it offers. Updated knowledge is a
fundamental and decisive aspect of professions
related to the New Economy but the new society’s
dynamism does not well adapt itself to past
training models developed in more static or slowly
changeable contexts [1]. The continuous need of
new knowledge and competences has really
shattered this boundary and professional people
have to qualify themselves and to be willing to
acquire new knowledge. So new didactic models
have arisen. In this scenario one of the most
promising approaches is the E-Learning approach.
Several enabling factors played key role in today
developments, including, among the other, the
wide acceptance of the concept of Learning Objects, the availability of several E-Learning platforms and the diffusion of standards, like SCORM, to improve interoperability. The evaluation of E-Learning platforms requires evaluating not only the implementing software package, but additional features as well, including, among others, the supported teaching and delivery schema, the provided QoS and so on. With respect to this question, both pedagogical and technological aspects must be carefully evaluated. In the first case, it is necessary to develop new training models clearly defining how to organize new training paths and the didactic contents associated with them, as well as how to provide these contents in relation to the user who benefits from them. As for the technological aspect, new tools for distributing knowledge must be created, tools able to reproduce pedagogical training models as efficiently as possible. In fact, a series of features should be taken into account when one evaluates E-Learning platforms, starting from the function and usability of the overall learning system in the context of the human, social and cultural organization within which it is to be used. Obviously, the analysis of the features of a system is not sufficient: it is also important to understand how they are integrated to facilitate learning and training and what principles are applied to guide the way the system is used. To evaluate them, both pedagogical and technological aspects must be carefully considered. So the goal of this paper is to show a model for selecting the most suitable E-Learning solution taking into account its technological and pedagogical aspects. In the literature there are many approaches to the evaluation of E-Learning platforms. A common approach is the introduction of evaluation grids able to assess the various aspects of an E-Learning platform. The weak point of this approach lies in the subjectiveness of the judgements. The starting point of the proposed model is the formulation of a multi-criteria decision problem to be solved by the Analytic Hierarchy Process (AHP). The hierarchical structure of the problem allows the decision maker to compare the various features that characterize E-Learning platforms. The Analytic Hierarchy Process (AHP) is a decision-aiding method developed by Saaty [2][3][4]. It aims at quantifying relative priorities for a given set of alternatives on a ratio scale, based on the judgment of the decision-maker, and stresses the importance of the intuitive judgments of a decision-maker as well as the consistency of
the comparison of alternatives in the decision-making process. Since a decision-maker bases judgments on knowledge and experience, and then makes decisions accordingly, the AHP approach agrees well with the behaviour of a decision-maker. The strength of this approach is that it organizes tangible and intangible factors in a systematic way, and provides a structured yet relatively simple solution to decision-making problems [5][6]. So the real aim of this paper is to introduce the application of the AHP to E-Learning platform evaluation. The paper briefly reviews the concepts and applications of E-Learning platforms and of multiple criteria decision analysis, together with the AHP's implementation steps. Finally, we present the results obtained by applying the proposed approach to some existing commercial and open source E-Learning platforms.
Learning Content Management System
(LCMS)
A Learning Content Management System includes
all the functions enabling creation, description,
importation or exportation of contents as well as
their reuse and sharing. Contents are generally
organized into independent containers, called
learning objects, able to satisfy one or more
didactic goals. An advanced LCMS must be able
to store interactions between the user and each
learning object, aiming at gathering detailed
information about their utilization and efficacy.
When one talks about on-line learning, it is natural
to think of interactive media-based contents.
Actually, this is only a part of the widespread
contents. The contents available before the
spreading of on-line learning were mainly
documents, and most of them have been
proposed as didactic material in HTML format for
on-line courses. In addition, interactive media
have been sometimes introduced, such as audio,
video or training resources created by using other
multimedia tools (for example, Flash). A good
LCMS should accurately choose the contents to
be offered to the student during the lessons as
well as the way in which they must be provided.
The importance of LCMS is related to the growing
distance learning request that is determining a
significant increase in content production. The
current effort is to avoid a useless duplication of
contents by realizing learning objects consonant
to given standards in order to reuse them in
different contexts and platforms. All the contents
must be appropriately stored in special
repositories and be easily accessible and
updatable. In fact, a LCMS must be designed so
as to enable a constant updating of its contents,
allowing this process (if possible) to take place semi-automatically. It is important to
point out that, from our point of view, contents are
not considered as objects external to the platform
but as integral parts of it. This is possible thanks
to the services that constitute the learning content
management system. The trend towards a growing number of training resources, though necessary to better characterize the training process, does not allow the teacher an easy consultation and use of them. At the same time, such a large number of resources can disorient students, who run the risk of not choosing, during the self-training phase, the contents most suitable for them. A solution to this problem is
given by a more detailed description for each
content so as to avoid ambiguity or duplication
among them. In particular, some information will
E-Learning Platforms
The Internet offers effective tools for exchanging
information that can be used in different ways for
on-line learning. Chat (textual message exchange)
and e-mail are currently the most widespread
ones, since they have first arisen in the Internet
world. However, new technologies and the use of
wider transmitting bands allow to utilize
audio/video communication tools in real time as
well as to share multimedia contents. At first, online learning platforms had to integrate such
services. NetMeeting application developed by
Microsoft is a useful example to understand how a
distance learning tool was structured. NetMeeting
offers such services as on-line textual chat,
videoconferencing, audio chat, application sharing
and whiteboards. At least until the first half of the
90s, this was the predominant way of organizing
distance education platforms. Once the technological problems related to the delivery and implementation of such services were resolved, industries began to improve platforms by
introducing modules and services able to manage
pedagogical aspects (associated with the training
process) [7] as well as content updating and
availability. Most contemporary e-learning platforms can be viewed as organized into
three fundamental macro components: a Learning
Management System (LMS), a Learning Content
Management System (LCMS) and a Set of Tools
for distributing training contents and for providing
interaction [8]. The LMS integrates all the aspects
for managing on-line teaching activities. The
LCMS offers services that allow managing
contents while paying particular attention to their
creation, importation and exportation. The Set of
Tools represents all the services that manage
teaching processes and interactions among users.
In the following, after describing in detail the characteristics of the LCMS, LMS, and Set of Tools, the technological and pedagogical requisites for a distance learning application will be defined, in order to outline an evaluation model.
learning platform that aims to be compatible with a
high number of hardware platforms, operating
systems and standard applications. Standardized
descriptions of users can be then used within the
platform to store personal data, training profiles
and the most significant events characterizing
their training path. A LMS must implement a
functionality that adds a significant value to the
distance learning process. This functionality is that
enabling the student to consult, at any time,
results he/she has reached and, consequently, to
monitor his/her preparation level. This possibility
allows the student to understand his/her own gaps
and, possibly, to identify the training contents
more suitable to his formative requirements [13].
As for course management, an LMS can generally
manage self-paced, asynchronous instructor-led
and synchronous instructor-led courses. Self-paced courses are usually asynchronous, in
hypertextual format, and give much freedom to
the student who accesses a course index. The
LMS system manages these courses starting from
their creation. Asynchronous courses are run by
an instructor, but they do not foresee interactive
moments between students and instructor. Their
design foresees the delivery of strongly multimedia-oriented contents. Synchronous courses generally
make use of collaborative learning that is of all
the tools that allow creating interactions in real
time between students and instructor. The LMS
must keep track of who is present at the courses.
These functions are useful to students, who can
know how they are using the course, and teachers,
who can control student participation in the
courses, as well as to administrators that evaluate
the use of on-line courses in order to determine
their efficiency and convenience.
support the content so as to better identify the
domain in which resources are included and to
draw LCMS and teacher’s attention to the most
peculiar characteristics of the training content. In
literature, this descriptive process is known as
metadata description [9]. At present, the scientific
community and industries engaged in this field are
trying to define standard metadata rules, so as to
encourage understanding of the real semantic
content of the various training resources. From
this point of view, such organizations as LTSC
supported by IEEE or IMS Global Learning
Consortium [10][11] are trying to create
standardization rules and processes able to
describe training resources as well as the user
and training paths. Therefore, the aim is not only
to facilitate and automate research and training
resource acquisition over the web, but also to find
the contents that better satisfy the student training
needs [12].
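As a purely illustrative example of such a metadata description, the record below shows the kind of fields (loosely inspired by the IEEE LTSC LOM categories mentioned above) that an LCMS could attach to a learning object; the element names and values are hypothetical and not taken from any specific standard document.

```python
# Hypothetical example of a metadata description attached to a learning
# object, with fields loosely inspired by the IEEE LTSC LOM vocabulary
# mentioned above. The exact element names and values are illustrative only.

learning_object_metadata = {
    "general": {
        "title": "Introduction to Relational Databases",
        "language": "en",
        "keywords": ["SQL", "normalization", "ER model"],
    },
    "educational": {
        "interactivity_type": "expositive",
        "typical_learning_time": "PT45M",   # ISO 8601 duration
        "intended_end_user_role": "learner",
    },
    "technical": {
        "format": "text/html",
        "size_bytes": 524288,
    },
    "rights": {"cost": "no", "copyright": "CC BY-SA 4.0"},
}

# Such a record lets an LCMS index, retrieve and reuse the content across
# courses and platforms without inspecting the content itself.
```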
Learning Management System (LMS)
The Learning Management System (LMS)
embraces all the services for managing on-line
teaching activities. In particular, it aims to offer
management functionality to training platform
users: system administrators, teachers and
students. From students’ point of view, a LMS
must offer services able to evaluate and report the
acquired skills storing the training path followed by
them. The System administrator should have the
possibility of drawing up statistics on the use of
platform services in order to better organize online learning service delivery. A LMS should give
the teacher the possibility of verifying the right
formulation of the various lessons and suggesting
changes (in case it is semi-automatically inferred
from student tracking) in the learning path.
Therefore, the functionalities of a LMS integrated
within a distance learning platform can be
synthesized as follows:
• Student management
• Course management
• Student skill assessment
• Student activity monitoring and tracking
• Activity reporting
A student management system integrated within a
LMS must manage a database containing
standardized descriptions of student data so as to
better identify the user and his/her characteristics.
This type of description is generally based on the XML meta-language (Extensible Markup Language), an element that guarantees data portability. When we talk about portability, we refer to the possibility of accessing a resource, in this case the students’ descriptions, independently of the computer type and operating system. This characteristic is necessary for an e-
Tools for delivering and accessing
contents
On-line training efficiency is directly related to the
tools made available by the delivery platform as
well as to their usage easiness. The services
should satisfy teacher and student needs and it is
therefore necessary that the same kinds of
services are different in accordance with the user.
In particular, teachers should be provided with
tools enabling them to manage teaching
processes for single individuals or groups, as well
as all the interactions, including asynchronous
discussions or live events. In addition, it is
important to provide the teacher with updated
reports on learner or learner groups’ progresses
so as to better manage evaluation processes and
facilitate activities. Besides, it is necessary to give
students the possibility of synchronously and
asynchronously communicating with both the
teacher and other students. We will shortly
analyze some of the most popular services that
Some platforms can include, within their own
infrastructures, functionalities for exchanging email messages, but most of them allow the
integration with tools developed just for this
purpose, such as Outlook Express, Netscape
Messenger, Eudora, etc.
characterize on-line training platforms from a
collaborative point of view, and that they tend to
integrate within themselves. The Virtual
Classroom Service is a service designed for
distributing courses in a synchronous mode, and
also for supporting on-line live teaching. This type
of service aims to reproduce the mechanisms
present in a classroom during a traditional training
session and is considered as a kind of container
in which all the services able to recreate a virtual
classroom atmosphere will be included. The use
of a virtual classroom is obviously foreseen during
“live” lessons in order to better manage
synchronous interactions. The synchronous
communication systems are based on audio and
video conferencing technologies. The possibility of transmitting videoconferencing over the network has been implemented through the introduction of movie compression techniques that reduce the bandwidth used during transmission in comparison with uncompressed movies, intelligibility being equal.
However, it is true that compressed video stream
representations do not generally guarantee high
definition movie reproductions. The latter can be
anyway obtained by using high capability
transmitting channels (a satellite channel, for
example), whose utilization can be more
expensive. Audio/video conferencing tools allow
the display and dialogue in real-time among the
various members located in remote areas. The
interface generally presents a window in which the
video captured by a video camera is displayed.
Another service enabling synchronous communication within e-learning platforms is
provided by chat. This service allows participants
to send textual messages to the other students or
the teacher in a public mode (all the participants
see all the things) or a private one (only who is
directly involved receives the communication).
Chat service surely increases collaboration within
the environment in which it is used, but the
teacher or tutor must continuously monitor its
utilization, since it could lead to a lack of attention
and confusion within the virtual classroom. In
addition to a textual chat, the most recent
platforms tend to implement a vocal one by using
VoIP mechanisms. From an historical point of
view, the whiteboard has been one of the first
services made available by an online learning
platform. This service makes it available and
shareable to teachers and learners a virtual space,
usually called whiteboard. Both teachers and
learners can work with it by virtue of control rights.
This tool allows to write and draw on a shared
space and to display PowerPoint presentations
and images. E-mail has been one of the first
asynchronous communication tools used by elearning environments. Thanks to this service,
students can send messages to a specific
addressee only by having his/her e-mail address.
Characterizing distance learning platforms
As previously discussed, an on-line learning
platform can be characterized through an analysis
that takes into account:
• the adopted teaching methodologies
• the level of the training path personalization
• operative modalities and didactic interaction quality
• learning assessment and student tracking methods
• typology and quality of both didactic material and support system
In order to meet the exigencies of distance
training processes, support technologies should
also have characteristics that make the training
process functional and available. In particular, the
student should be allowed to fully benefit from
auto-learning, auto motivation and auto-evaluation
methods [14], and at the same time tutor and
teachers should be provided with a direct and
constant contact with the learners. So distance
learning platforms must adopt a pedagogical approach based on constructivism, a theory grounded in the results of Piaget's research [15].
Constructivist learning is based on students'
active participation in problem-solving and critical
thinking regarding a learning activity which they
find relevant and engaging. They are
"constructing" their own knowledge by testing
ideas and approaches based on their prior
knowledge and experience, applying these to a
new situation, and integrating the new knowledge
gained with pre-existing intellectual constructs. So
a constructivist e-learning platform is an
environment where learners collaborate and
support each other using a variety of tools and
resources, as well as an environment where
knowledge is constructed and learners assume a
central role in the cognitive process. On-line
learning platforms can easily implement a constructivist approach [16] because they readily allow:
• encouragement and acceptance of student autonomy and initiative
• encouragement of students to engage in dialogue, both with the teacher and within the group
• continuous feedback
In other words, an on-line learning platform must
be able to efficiently and effectively manage the
single components of the process and their
interactions. A distance learning platform that has
these characteristics must carry out four principal
functions: communication, information sharing, information access and co-operation. These functionalities characterize both the pedagogical and the technological approach. As for technical requisites, the best solution to be adopted in platform design should be based on the utilization of a multilayered, web-based architecture [17][18]. In particular, an e-learning platform must be web-based; in this way the client can access the environment by simply using a web browser, without compelling the user to install other software on his/her computer. This characteristic should always be taken into account by industries producing distance training environments. Thanks to it, students only need a basic knowledge of computer science enabling them to interact with a browser, which also avoids difficult installations of non open source software. Another technical requisite to be considered is portability, that is, the possibility for a platform to work correctly independently of the computer and the operating system on which it runs. Obviously, the possibility of not installing software on the client machine increases system portability, since it guarantees that all clients can use the same services. A further requisite, as previously described, is the system compatibility with the most accredited descriptive standards for training resources and users, such as AICC [19] and IMS [10]. Compatibility with these standards is fundamental, since it allows importing and exporting contents and courses realized by different industries, and gives the platform the possibility of being equipped with a still little used tool: the Intelligent Tutoring System (ITS). An ITS is an application that can semi-automatically reach decisions after acquiring information from the LMS and LCMS. In other words, an ITS has the task of monitoring students’ behaviour and advising them on the most suitable retrieval programs [20]. Besides, on the basis of the acquired data, it can advise the teacher on a different lesson organization and a different use of technology. In fact, a course designer must have the possibility of making the several training process modules interactive, of adapting the training paths to specific learner needs, and of defining new training paths by using those already existing. Such operations are surely speeded up by adopting descriptive standards, even when an ITS is not yet used. Another aspect to be evaluated is related to the services integrated into the LMS and LCMS. As for management, services able to manage enrolments, training paths, and student tracking are really significant and add a new value. Platforms including such systems are surely ahead of others in services, as these tools will represent in the near future the core of an e-learning environment. In general, at present, the indispensable management services are the following:
• services for including and updating the user profile
• services for creating courses and cataloguing them
• services for creating tests described through a standard
• user tracking services
• services for managing reports on course frequency and use
• services for creating, organizing and managing own training contents or contents provided by other producers
The aspect related to the offered services is particularly interesting, because it characterizes the pedagogical approach. An analysis of the teaching tools made available by the various platforms is therefore necessary. These tools, as previously discussed, can be divided into two fundamental categories:
• asynchronous communication tools
• synchronous communication tools
Such tools as e-mail, discussion forums or newsgroups surely belong to the first category. Asynchronous services are really important for an e-learning platform, since they eliminate the space and time limits that can exist among the interlocutors. Tools that belong to the second category are:
• textual or vocal chat
• whiteboard
• live video stream
• virtual classroom
• application and file sharing
Real-time communication is used to carry out at a distance activities that are normally performed in face-to-face meetings. In this way, learners can interact with teachers, creating an atmosphere more similar to that of a traditional classroom. The use of these new technologies leads to a pedagogical approach based on group interactions, where the teacher has the role of facilitating and organizing discussions. This approach challenges traditional teaching methods (in which teachers are dominant and students are passive) and substitutes for them one based on active pedagogy. On the basis of the previous considerations, we have grouped the parameters of interest into four macro fields:
• system requisites
• training resources and course management
• user management
• services offered to users
For each macro field, an evaluation grid has been designed.

The Multiple Criteria Decision Analysis and the AHP Approach
The selection of an E-Learning platform is not a trivial or easy process. Project managers are faced with decision environments and problems in
projects that are complex. The elements of the problems are numerous, and the inter-relationships among the elements are extremely complicated. Relationships between the elements of a problem may be highly nonlinear; changes in the elements may not be related by simple proportionality. Multiple criteria decision-making (MCDM) approaches are major parts of decision theory and analysis. They seek to take explicit account of more than one criterion in supporting the decision process [21]. The aim of MCDM methods is to help decision-makers learn about the problems they face, to learn about their own and other parties' personal value systems, to learn about organizational values and objectives, and, through exploring these in the context of the problem, to guide them in identifying a preferred course of action. In other words, MCDM is useful in circumstances which necessitate the consideration of different courses of action that cannot be evaluated by the measurement of a simple, single dimension [21]. A good solution for the MCDM problem is the AHP approach. After a long period of debate on the effective value of the AHP approach, in fact, Harker and Vargas [22] and Perez [23] proved that the AHP approach is based upon a firm theoretical foundation. The AHP approach is composed of the following steps (a small computational sketch of steps 3–6 is given after Table 2):
1. Define the problem and determine its goal.
2. Structure the hierarchy from the top (the objectives from a decision-maker's viewpoint) through the intermediate levels (criteria on which subsequent levels depend) to the lowest level, which usually contains the list of alternatives.
3. Construct a set of pair-wise comparison matrices (size N×N) for each of the lower levels, with one matrix for each element in the level immediately above, by using the relative scale measurement shown in Table 1. The pair-wise comparisons are done in terms of which element dominates the other.
4. There are n(n−1)/2 judgments required to develop the set of matrices in step 3. Reciprocals are automatically assigned in each pair-wise comparison.
5. Hierarchical synthesis is now used to weight the eigenvectors by the weights of the criteria, and the sum is taken over all weighted eigenvector entries corresponding to those in the next lower level of the hierarchy.
6. Having made all the pair-wise comparisons, the consistency is determined by using the eigenvalue λmax to calculate the consistency index CI as follows: CI = (λmax − n)/(n − 1), where n is the matrix size. Judgment consistency can be checked by taking the consistency ratio (CR) of CI with the appropriate value in Table 2. The CR is acceptable if it does not exceed 0.10; if it is more, the judgment matrix is inconsistent and the judgments should be reviewed and improved.
7. Steps 3–6 are performed for all levels in the hierarchy.
Numerical rating   Verbal judgment of preference
9                  Extremely preferred
8                  Very strongly to extremely
7                  Very strongly preferred
6                  Strongly to very strongly
5                  Strongly preferred
4                  Moderately to strongly
3                  Moderately preferred
2                  Equally to moderately
1                  Equally preferred
Table 1: Pair-wise comparison scale for AHP preferences
Size of matrix (n)        1    2    3     4     5     6     7     8     9     10
Random Consistency (RI)   0    0    0.58  0.90  1.12  1.24  1.32  1.41  1.45  1.49
Table 2: Average random consistency (RI)
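The following minimal Python sketch illustrates steps 3–6 for a single pair-wise comparison matrix. It uses the common geometric-mean approximation of the principal eigenvector rather than a full eigenvalue computation, so it is an illustration of the procedure rather than the exact method prescribed by Saaty; the RI values are those of Table 2, and the example judgments are expressed on the Table 1 scale.

```python
# Minimal computational sketch of steps 3-6 above for one pair-wise
# comparison matrix. The priority vector is obtained with the common
# geometric-mean approximation of the principal eigenvector; lambda_max,
# CI and CR follow the formulas in the text, with RI taken from Table 2.

from math import prod

RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp_priorities(matrix):
    n = len(matrix)
    # Geometric mean of each row, normalized, approximates the weights.
    geo = [prod(row) ** (1.0 / n) for row in matrix]
    weights = [g / sum(geo) for g in geo]
    # lambda_max is estimated as the average of (A*w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)           # consistency index
    cr = ci / RI[n]                           # consistency ratio
    return weights, cr

# Hypothetical judgments for three features of one scenario: the first
# feature is moderately preferred (3) to the second and strongly
# preferred (5) to the third; reciprocals fill the lower triangle.
comparisons = [[1.0,   3.0, 5.0],
               [1/3.0, 1.0, 2.0],
               [1/5.0, 0.5, 1.0]]

weights, cr = ahp_priorities(comparisons)
print([round(w, 2) for w in weights])   # roughly [0.65, 0.23, 0.12]
print(cr < 0.10)                        # judgments are acceptably consistent
```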
THE AHP APPROACH AND THE SELECTION OF AN E-LEARNING PLATFORM

E-Learning platforms have to satisfy some rules in order to be effective and, besides, some platforms can be really effective only in some well defined scenarios. Obviously this is a Multiple Criteria Decision Problem. So the first step is to set the scenarios of interest; in this paper we consider the following cases: an ECDL course, a blended university course, and a professional training course. In the following paragraphs we describe the selected scenarios in more detail. The first step is then the definition of the AHP hierarchy. Obviously, in this case the first level is the selection of the best E-Learning platform for the selected scenario. The second level is composed of features that take into account pedagogical, technological and usability aspects. In particular we have introduced five main features:
• Management
• Collaborative Approach
• Management and enjoyment of interactive learning objects
• Usability
• Adaptation of learning path
Obviously every feature involves, in its determination, some sub-features. In order to test our approach we selected the following platforms: Docent [24], Quasar [25], Claroline [26], IWT [27], Running Platform [28], Moodle [29], ATutor [30], ADA [31], Ilias3 [32] and Docebo [33].
Now we can describe in detail the proposed approach for the various scenarios. We have to point out that the various scenarios are obtained from the analysis of real cases; in particular, we have considered scenarios that occur in our University. The first involves the selection of an E-Learning platform for the delivery of ECDL courses. In this case the platform has to support classes composed of thirty students. These students are not really familiar with the computer world, so the usability feature has to be highly and carefully evaluated. In this scenario the tracking of the progress of the students is also very important. Another characteristic of this user group is the rather limited internet connection bandwidth. The second scenario describes a typical situation: the E-Learning platform has to support the activities of some university courses, so in this scenario management tools are very important; the collaborative tools also have to be considered. The last scenario involves the use of an E-Learning platform in the case of professional training. In this case the target group is not very skilled in ICT technologies and needs to interact with very simple and clear graphical user interfaces, so the usability feature is really important. The tools for the adaptation of the learning path are also important, because the target group could be very heterogeneous. So, according to the AHP approach, we have to compare the various platforms with each other for every feature and scenario. First of all we have to declare the standing of the features ordered by importance. For the various scenarios we have the following standing (Table 3):

ECDL Course:            1. Management; 2. Management and enjoyment of interactive learning objects; 3. Usability; 4. Adaptation of learning path; 5. Collaborative Approach
Blended Course:         1. Management; 2. Management and enjoyment of interactive learning objects; 3. Collaborative Approach; 4. Usability; 5. Adaptation of learning path
Professional Training:  1. Usability; 2. Adaptation of learning path; 3. Management and enjoyment of interactive learning objects; 4. Management; 5. Collaborative Approach
Table 3: Standing of considered features ordered by importance for the considered scenarios

After this phase, in order to obtain a value for every feature, we considered the evaluation grids introduced in [8] in order to evaluate the following indexes.

Management Index
Management Index = IM = Obtained Value for the supported tools / Max Value
This index aims to evaluate how many services for the management of students and of their progress are present in the various platforms. In Table 4 we show the obtained results. In this table the column Weight indicates the relative importance of the feature.
                             Weight  Docent  Quasar  Claroline  IWT   Running  Moodle  ATutor  ADA   Ilias3  Docebo
Progress Tracking               3      3       3        3        3      3        3       3      3      3       0
Multi Course Management         2      2       2        2        2      2        2       2      2      2       2
Student's Group Management      2      2       2        2        0      2        2       2      2      2       2
Contents Insertion              1      1       1        1        1      1        1       1      1      1       1
Contents Sharing                2      2       2        2        2      2        2       2      2      2       2
Standard Contents Import        1      1       1        1        1      0        1       1      0      1       1
Contents Import                 2      2       2        2        2      2        2       2      2      2       2
New Course Creation             1      1       1        1        1      1        1       1      1      1       1
Course Indexing                 1      1       1        1        1      1        1       1      1      1       1
Report                          2      2       2        2        2      0        2       2      2      2       2
Assessment Management           2      2       2        2        2      2        2       2      2      2       2
Course List                     1      1       1        1        1      1        1       1      1      1       1
Assessment Report Analyzer      2      2       2        2        2      2        2       2      2      2       2
On-Line User Registration       1      1       1        1        1      1        1       1      1      1       1
Multi-User Management           1      1       1        1        1      1        1       1      1      1       1
Total                          24     24      24       24       22     21       24      24     23     24      21
IM Index                               1       1        1        0.92   0.87     1       1      0.96   1       0.87
Table 4: Obtained results for the Management Index
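As a worked example of the index definition above, the short sketch below recomputes the IM value of one platform directly from Table 4: the dictionary of weights reproduces the table's Weight column, and the per-platform support values illustrate how the table rows are used.

```python
# Small sketch of how the Management Index in Table 4 is computed: each
# supported service contributes its weight, and the total is normalized by
# the maximum obtainable value (the sum of all weights).

weights = {
    "Progress Tracking": 3, "Multi Course Management": 2,
    "Student's Group Management": 2, "Contents Insertion": 1,
    "Contents Sharing": 2, "Standard Contents Import": 1,
    "Contents Import": 2, "New Course Creation": 1, "Course Indexing": 1,
    "Report": 2, "Assessment Management": 2, "Course List": 1,
    "Assessment Report Analyzer": 2, "On-Line User Registration": 1,
    "Multi-User Management": 1,
}

def management_index(obtained):
    """IM = obtained value for the supported tools / max value."""
    max_value = sum(weights.values())            # 24 for Table 4
    return sum(obtained.get(k, 0) for k in weights) / max_value

# IWT supports everything except Student's Group Management (see Table 4).
iwt = dict(weights, **{"Student's Group Management": 0})
print(round(management_index(iwt), 2))           # 0.92
```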
Collaborative Index
IC = Obtained Value for the supported tools / Max Value
This index aims to evaluate how many “collaborative” services are present in the various platforms. With the term “collaborative” services we mean those platform services allowing interaction among students and/or teachers. In Table 5 we show the obtained results. In this table the column Weight indicates the relative importance of the feature.
                     Weight  Docent  Quasar  Claroline  IWT   Running  Moodle  ATutor  ADA   Ilias3  Docebo
E-Mail                  1      1       1        1        1      1        1       1      1      1       1
Forum                   2      2       2        2        2      2        2       2      2      2       2
Chat                    2      2       2        2        2      2        2       2      2      2       2
Whiteboard              2      2       0        0        2      1        0       2      0      0       0
A/V Streaming           2      2       0        0        2      0        0       0      2      0       0
Contents Download       2      2       2        2        2      2        2       2      2      2       2
Application Sharing     2      2       0        0        2      0        0       0      0      0       0
Virtual Classroom       3      3       3        0        3      0        0       0      0      0       0
Total                  16     16      10        7       16      8        7       9      9      7       7
IC Index                       1       0.62     0.44     1      0.50     0.44    0.56   0.56   0.44    0.44
Table 5: Obtained results for the Collaborative Index
Management and enjoyment of interactive learning objects index
MIO = Obtained Value for the supported tools / Max Value
This index aims to evaluate how many services for the management and enjoyment of interactive learning objects are present in the various platforms. In Table 6 we show the obtained results. In this table the column Weight indicates the relative importance of the feature.
                     Weight  Docent  Quasar  Claroline  IWT   Running  Moodle  ATutor  ADA   Ilias3  Docebo
Whiteboard              2      2       0        0        2      1        0       2      0      0       0
A/V Streaming           3      3       0        0        3      0        0       0      3      0       0
Application Sharing     3      3       0        0        3      0        0       0      0      0       0
Virtual Classroom       3      3       3        0        3      0        0       0      0      0       0
Total                  11     11       3        0        8      1        0       2      3      0       0
MIO Index                      1       0.27     0.00     0.73   0.10     0.00    0.18   0.73   0.00    0.00
Table 6: Obtained results for the Management and enjoyment of interactive learning objects index
Usability
For the usability feature we used a questionnaire introduced by Nielsen [34]. The aim is to evaluate the ease of use of the platforms and of their interfaces. The obtained results are depicted in Table 7:

Platform    Usability Index
Docent      0.65
Quasar      0.70
Claroline   0.85
IWT         0.65
Running     0.75
Moodle      0.80
ATutor      1.00
ADA         0.85
Ilias3      0.70
Docebo      0.85
Table 7: Obtained results for the Usability Index

Adaptation of user's formative learning path index
LPA = Obtained Value for the supported tools / Max Value
This index aims to evaluate how many services for the adaptation of the user's formative learning path are present in the various platforms. These services have to allow the creation of personalized learning paths and the continuous assessment of students. In Table 8 we show the obtained results. In this table the column Weight indicates the relative importance of the feature.
                             Weight  Docent  Quasar  Claroline  IWT   Running  Moodle  ATutor  ADA   Ilias3  Docebo
Progress Tracking               3      3       3        3        3      3        3       3      3      3       0
Student's Group Management      2      2       2        2        0      2        2       2      2      2       2
Report                          3      3       3        3        3      0        3       3      3      3       3
Assessment Management           2      2       2        2        2      2        2       2      2      2       2
Multi-User Management           1      1       1        1        1      1        1       1      1      1       1
Total                          11     11      11       11        9      8       11      11     11     11       8
LPA Index                              1.00    1.00     1.00     0.82   0.73     1.00    1.00   1.00   1.00    0.73
Table 8: Obtained results for the Adaptation of user's formative learning path index
At the end of this phase we can compare the “relative” results obtained by the platforms for every feature in order to obtain a standing. According to the AHP approach we defined the “absolute” weight of every feature keeping in mind the constraints of the selected scenario. According to the AHP strategy we can compose the results in the following way:

Platform Final Score = Σ_{i=1,…,5} Weight_i × PlatformValue_i

The obtained results for each scenario are depicted in the next figures:
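The sketch below shows how the composition formula above can be applied in practice. The scenario weights are made-up illustrative numbers (in the paper they result from the pair-wise comparisons for each scenario), while the per-feature values of the two example platforms are taken from Tables 4–8.

```python
# Sketch of the final composition formula given above: each platform's
# normalized feature values are combined with the scenario-specific feature
# weights. The weights below are hypothetical; the feature values for two
# platforms are taken from Tables 4-8.

scenario_weights = {          # hypothetical weights for one scenario
    "management": 0.35, "collaborative": 0.10, "interactive_objects": 0.25,
    "usability": 0.20, "adaptation": 0.10,
}

platform_values = {
    "Docent":    {"management": 1.00, "collaborative": 1.00,
                  "interactive_objects": 1.00, "usability": 0.65,
                  "adaptation": 1.00},
    "Claroline": {"management": 1.00, "collaborative": 0.44,
                  "interactive_objects": 0.00, "usability": 0.85,
                  "adaptation": 1.00},
}

def final_score(values, weights):
    """Platform Final Score = sum_i weight_i * platform_value_i."""
    return sum(weights[f] * values[f] for f in weights)

for name, values in platform_values.items():
    print(name, round(final_score(values, scenario_weights), 3))
```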
Figure 1: Obtained Results for the ECDL scenario (final AHP scores of the ten platforms)
Figure 2: Obtained Results for the blended course scenario (final AHP scores of the ten platforms)
Figure 3: Obtained Results for the professional training scenario (final AHP scores of the ten platforms)
The AHP approach allows us not only to evaluate the platforms but also to test their application in a well defined scenario. In fact, the Docent platform has very good results in the first two scenarios, while in the third this is no longer true, because in the third case the management and collaborative tools are not very important. The obtained results confirm that the difference between commercial and open source platforms is in general still very large, but our method shows that in some scenarios this is not the case; in such cases it can suggest the use of a cheaper platform.
Conclusion
In order to accurately evaluate the potentialities of an online learning platform, it is important to pay attention to its three main components: the Learning Management System, the Learning Content Management System and the virtual environment for teaching with the services associated with it. An efficient system must be able to integrate all these components so that they can efficaciously interact with each other. Besides, it is necessary that such platforms make reporting services available, so as to allow accurate analyses of the activities carried out by users. One of the most interesting problems is the introduction of a general and objective model for the evaluation of E-Learning platforms. This task is not trivial, because a good evaluation model has to take into account not only the platform and its services but also the scenario where it has to work. So in this paper we have introduced an evaluation model based on the use of the AHP approach. The AHP approach, in fact, is useful in circumstances which necessitate the consideration of different courses of action that cannot be evaluated by the measurement of a simple, single dimension. In this way we can evaluate an E-Learning platform considering both its application in the scenario of interest and its comparison with the other considered platforms. We tested our approach on the selected E-Learning platforms in three scenarios. The obtained results are encouraging. The proposed method, in fact, does not only evaluate the platform but also its effectiveness in the considered scenario. In this paper, for example, we showed that in some scenarios the performance of a commercial platform such as Docent is similar to that of “academic” frameworks. We aim to extend the proposed approach to new scenarios and platforms.
References
[1] Ubell R., "Engineers turn to E-Learning", IEEE Spectrum, Volume 37, 2000.
[2] Saaty T.L., "Decision Making for Leaders", Belmont, California: Life Time Learning Publications, 1985.
[3] Saaty T.L., "How to make a decision: the analytic hierarchy process", European Journal of Operational Research, North-Holland, 1990.
[4] McCaffrey J., "The Analytic Hierarchy Process", MSDN Magazine, June 2005 (Vol. 20, No. 6).
[5] Drake P.R., "Using the Analytic Hierarchy Process in Engineering Education", International Journal of Engineering Education, 1998.
[6] Skibniewski M.J., Chao L., "Evaluation of advanced construction technology with AHP method", Journal of Construction Engineering and Management, ASCE, 1992.
[7] Jonassen D.H., "Thinking Technology, toward a Constructivistic Design Model", Educational Technology XXXIV, 1994.
[8] Colace F., De Santo M., Vento M., "Evaluating On-line Learning Platforms: a Case Study", Proceedings of the 36th Hawaii International Conference on System Sciences (HICSS'03), 2003.
[9] Berners-Lee T., "Metadata Architecture", unpublished white paper, January 1997, http://www.w3.org/pub/WWW/DesignIssues/Metadata.html
[10] The IMS Enterprise Specification, http://www.imsproject.org/
[11] IEEE Learning Technology Standard Committee, http://www.ltsc.ieee.org/
[12] Schakelman J.L., "The changing role of on-line pedagogy: how instructional management systems, metadata, and problem-based learning combine to facilitate learner-centered instruction", SIGUCCS 2001.
[13] Carchiolo V., Longheu A., Malgeri M., "Learning through Ad-hoc Formative Paths", ICALT 2001.
[14] Barrows H.S., Kelson A.C., "Problem based learning in secondary education and the problem based learning institute", Springfield, 1993.
[15] Piaget J., "To understand is to invent", New York, Grossman, 1973.
[16] Jonassen D.H., Peck K.L., Wilson B.G., Pfeiffer W.S., "Learning with Technology: A Constructivist Perspective", Prentice Hall, 1998.
[17] Drira K., Villemur T., Baudin V., Diaz M., "A Multi-Paradigm Layered Architecture for Synchronous Distance Learning", Proceedings of the 26th Euromicro Conference, 2000.
[18] Anido L., Llamas M., Fernández M.J., Caeiro M., Santos J., Rodríguez J., "A Component Model for Standardized Web-based Education", WWW10, 2001, Hong Kong.
[19] AICC CMI Guidelines for Interoperability, http://www.aicc.org/
[20] Zhou Y., Evens M.W., "A Practical Student Model in an Intelligent Tutoring System", ICTAI, 1999.
[21] Belton V., "Multiple criteria decision analysis: practically the only way to choose", in Hendry L.C., Eglese R.W. (editors), Operational Research Tutorial Papers, 1990.
[22] Harker P.T., Vargas L.G., "The theory of ratio scale estimation: Saaty's analytic hierarchy process", Management Science, 1987; 33(1): 1383.
[23] Perez J., "Some comments on Saaty's AHP", Management Science, 1995; 41(6): 1091–1095.
[24] Docent: http://www.docent.com/
[25] Quasar: http://www.quasaronline.it/elearning/
[26] Claroline: http://www.claroline.net/
[27] IWT: http://www.didatticaadistanza.com/
[28] Running Platform: http://rp.csedu.unisa.it/portal
[29] Moodle: http://moodle.org/
[30] ATutor: http://www.atutor.ca/
[31] ADA: http://ada.lynxlab.com/
[32] Ilias: http://www.ilias.de/
[33] Docebo: http://www.docebo.org/doceboCms/
[34] Nielsen J., "Usability Engineering", Academic Press, San Diego, 1993.
Academic Ranking of World Universities
2009/2010
Mester, G.
Abstract— This paper proposes an analysis of the Academic Ranking of World Universities, published every year, and gives an overview of the present situation in Higher Education. The publication of the “Institute of Higher Education, Shanghai Jiao Tong University” Academic Ranking of World Universities 2009, the Ranking Web of World Universities 2010 (Spain) and the QS World University Rankings 2009 are analyzed. The paper also gives an analysis of the scientific journal publications of professors/researchers from the USA and Europe.
Index Terms— Academic Ranking, Higher
Education, QS World University Rankings 2009,
Ranking Web of World universities 2010, World
Universities.
Table 1. Shanghai World rank list of the top 20
universities in 2009 (05. Nov. 2009)
The Academic Ranking of World Universities – ARWU was first published in June 2003 by the Center for World-Class Universities and the Institute of Higher Education of Shanghai Jiao Tong University. ARWU uses the following indicators to rank world universities:
- the number of alumni and staff winning Nobel Prizes and Fields Medals,
- the number of highly cited researchers selected by Thomson Scientific,
- the number of articles published in the journals Nature and Science,
- the number of articles indexed in the Science Citation Index – Expanded and the Social Sciences Citation Index, and
- per capita performance with respect to the size of an institution [3].
1. OVERVIEW OF THE PRESENT SITUATION IN HIGHER
EDUCATION
Europe has 4752 higher education institutions,
with over 17 million students and 1.5 million
staff.
All across Europe, countries and
universities are in a process of modernization.
From an EU perspective, these reforms are part
of the Lisbon Strategy [1], [2].
According to the publication of the well-known
institution "Institute of Higher Education
Shanghai Jiao Tong University” [3] the rank list
of the top 20 Universities in the world space of
higher education in 2009 appears in the following
order:
According to the latest edition of the Web
Ranking of World Universities [4], [5] published
by the Spanish National Research Council's
Cybermetrics Lab, the rank list of the top 15
universities in the world space of higher
education (from top 8000 universities) in January
2010 looks as follows:
Manuscript received April 29, 2010. Part of this paper is published
in the VIPSI-2010 Conference, Amalfi, Italy, March 4-7, 2010.
Gyula Mester is with the Department of Informatics, University of
Szeged, Hungary (e-mail:
[email protected]).
Table 2. Web Ranking of World Universities: the top 15 universities in January 2010

Comparison of the main world universities' rankings is illustrated in Table 3.

Table 3. Comparison of the main World Universities' Rankings

Distribution by country is illustrated in Table 4.

Table 4. Distribution by country

Distribution by continent is illustrated in Table 5.

Table 5. Distribution by continent

The next table summarizes the actual coverage of the Ranking, in terms of the number of countries and higher education institutions around the world.

Table 6. Coverage of the Webometrics Ranking of World Universities

The ranks of the universities of Serbia, Slovenia and Macedonia in 2010 are illustrated in Tables 7, 8 and 9.
Table 10. QS World rank list of the top 20 universities
in 2009
2. PERFORMANCE RANKING OF SCIENTIFIC PAPERS
FOR WORLD UNIVERSITIES
Table 7. Rank of Universities of Serbia 2010
According to the latest edition of the 2009 Performance Ranking of Scientific Papers for World Universities, published by the Higher Education Evaluation & Accreditation Council of Taiwan, the rank list of the top 10 universities in the world space of higher education in October 2009 was the following:
Table 11. Performance Ranking of Scientific Papers for World Universities in 2009
This annual ranking project began in 2007 and
evaluates and ranks the scientific paper
performance for the top 500 universities
worldwide. Three criteria represented by eight
indicators were used to assess a university’s
overall scientific paper performance:
research productivity (20%),
research impact (30%), and
research excellence (50%); the weighting is sketched below.
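As an illustration of this weighting, the following minimal sketch in Python combines three already-normalised criterion scores into an overall score; the function and score names, the 0-100 scale and the simple linear aggregation are illustrative assumptions and do not reproduce the full HEEACT methodology [7].

# Hypothetical sketch: combining the three criteria with the 20/30/50
# weighting described above. Criterion scores are assumed to be already
# normalised to a 0-100 scale; all names are illustrative only.
WEIGHTS = {
    "research_productivity": 0.20,
    "research_impact": 0.30,
    "research_excellence": 0.50,
}

def overall_score(scores: dict) -> float:
    """Weighted sum of the three criterion scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Example: a university with productivity 70, impact 80 and excellence 90
# receives 0.2*70 + 0.3*80 + 0.5*90 = 83.0.
print(overall_score({"research_productivity": 70,
                     "research_impact": 80,
                     "research_excellence": 90}))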
The QS World University Rankings™ have
become popular since they were launched in
2004. The QS rank list of the top 20 universities
in the world space of higher education looks as
follows [6]:
Table 10. QS World rank list of the top 20 universities in 2009
3. CONCLUSION
On the basis of the performed analysis I think
that we do not have enough time to achieve the
goals of the Lisbon Declaration in the European
Higher Education Area. I propose the adoption
of a new (Lisbon) strategy in the European
Higher Education Area in the year 2010.
ACKNOWLEDGMENT
I would like to acknowledge the great helpfulness of Prof. Dr. Veljko Milutinovic and his encouragement of my overall research agenda in the field of higher education and academic ranking of world universities.
REFERENCES
[1] Gyula Mester, "Academic Ranking of World Universities 2009/2010", Proceedings of the VIPSI Conference, pp. 136, Amalfi, Italy, 2010.
[2] Gyula Mester, "The Lisbon Strategy 2000 in Higher Education of Europe", Proceedings of the International Conference on Advances in the Internet, Processing, Systems, and Interdisciplinary Research, VIPSI 2009, pp. 1-5, ISBN: 86-7466-117-3, Belgrade, Serbia, 2009.
[3] http://www.arwu.org/index.jsp
[4] http://www.webometrics.info
[5] Aguillo, I. F., Ortega, J. L. & Fernández, M. (2008). Webometric Ranking of World Universities: Introduction, Methodology, and Future Developments. Higher Education in Europe, 33(2/3): 234-244.
[6] http://www.topuniversities.com
[7] http://ranking.heeact.edu.tw/en-us/2009/Page/Methodology
[8] http://www.google.com
[9] Gyula Mester, Dusan Bobera, "On the Need for Faster Inclusion of Serbian Higher Education in the European Higher Education and Research Area" (in Serbian), Proceedings of TREND 2007, pp. 169-172, Kopaonik, Serbia, 2007.
[10] Gyula Mester, "A Proposal for Improving the Status of Serbian Higher Vocational Schools in the Bologna System of Studies" (in Serbian), Proceedings of the Conference TREND 2010, pp. 58-62, Kopaonik, Serbia, 2010.
Biography
Dr. Gyula Mester received his D. Sc.
degree in Engineering from the University
of Novi Sad in 1977. Currently, he is a
Professor at the University of Szeged,
Department of Informatics, Hungary. He is
the author of 168 research papers. His
professional activities include R&D in different fields of robotics engineering: intelligent mobile robots, humanoid robotics and sensor-based remote control. He is an invited reviewer for several scientific journals and the author of several books. He is the coordinator of the Robotics Laboratory at the University of Szeged within the European Robotics Research Network.
His CV has been published in the Marquis “Who’s Who in the
World 1997”.
Visual and Aural:
Visualization of Harmony in Music with Colour
Bojan Klemenc, Peter Ciuha, Lovro Šubelj and Marko Bajec
Faculty of Computer and Information Science, University of Ljubljana
ABSTRACT—Music is strongly intertwined with everyday life; however, its inner structure may not be comprehensible to everyone. Using other senses, such as vision, can help us understand music better and produce a synergy between the two senses. For this purpose we
designed a prototype visualization that shows
the structure of music and represents harmony
with colour by connecting similar aspects in music and visual perception. We improve current
visualization methods by calculating a common
colour for a group of concurrent tones based on
harmonic relationships between tones. Moreover we extend the colour calculation to broader
temporal segments to enable visualization of harmonic structure of a piece. The basis for mapping of tones to colour is the key-spanning circle
of thirds combined with the colour wheel. The
resulting visualization is rendered in real time
and can be interactively explored.
Index Terms— music visualization, colour, concurrent tones, MIDI
1. INTRODUCTION
Visualizing data is a challenge. Visualization helps us to grasp what would otherwise be difficult to comprehend and may enable us to see patterns that would remain unnoticed without visualization. It should not include redundant elements and it should be intuitive. We
have to search for an appropriate mapping of source data
into visual dimensions. In this paper we focus on a specific domain of visualizing music. In the case of music
we are dealing with a stream of sound data. The basic
data unit we use is a musical tone, so the input to the
visualization is a stream of tones. The stream does not
necessarily represent music – it can be a stream of arbitrary tones, as only a small subset of possible streams is usually referred to as music. However, the visualization has to account for these as well and visualise them
appropriately.
As the aim is to make the visualization meaningful and useful in practice, we have to explore the possibilities of different mappings. We try to find interconnecting aspects
of sound and visual perception. In accordance with this
idea, we developed a prototype visualization that connects similar aspects of music and visual perception.
The input to the visualization tool is in MIDI format.
The basis for the visualization is a modified piano roll
notation, which uses spatial dimensions for visualising
time, pitch and instruments. Harmony, which is one of the
most important aspects in tonal music, is represented
with colour. In comparison to existing related visualizations that use colour to denote pitch classes or a
predefined set of chords, our visualization takes into account that concurrently sounding tones are not only perceived as separate, but also as a whole. For this purpose
the musical piece is segmented into time slices and each
segment is assigned a colour based on a method using
vector addition inside a key spanning circle of thirds
assigned to the colour wheel [6, 2]. As human perception of harmony is not limited to a moment in time, we expanded the method to encompass a broader time range and used it to visualise the harmonic structure of broader
temporal segments.
The resulting visualization offers a view of the composition as a whole. Additionally, it can be observed in real time while listening to the source data, which enables the user to make a more direct connection between the source and the visualization, thus enabling faster comprehension.
The rest of the paper is organised as follows. In Section 2 we review relevant related work, in Section 3 we give a detailed explanation of our visualization, and details about the implementation are given in Section 4. The resulting visualization is reviewed and discussed in Section 5, and concluding remarks are given in Section 6.
2. RELATED WORK
There are many possibilities for mapping tonal data
or whole musical structures into visual elements. Some
of them are only aesthetically pleasing, such as the transformation of a physical property of sound, like amplitude, into visual effects. However, the real value is in visualizations of music that offer additional information that
may otherwise stay unnoticed or be difficult to understand by a musically untrained listener.
A well-known visualization is musical notation; however, it takes years of training for someone to look at a
score and know what it sounds like. An intuitive visualization is comprised of a time axis and an axis with
some other value of interest. In the case of using time on the x-axis and pitch on the y-axis, we get a piano roll notation, which is used as the basis for some visualizations. Colour
usage also varies throughout different visualizations.
Smith and Williams [13] discussed a MIDI-based visualization of music in 3-dimensional space, using colour to denote timbre (timbre is also called tone colour). The Music Animation Machine [7] encompasses a number of visualizations, including piano
roll and Tonnetz. It also uses colours to mark pitch
classes. The assignment of colour to pitch class is based
on assigning the colour wheel to the circle of fifths. A similar assignment was proposed by Scriabin (at the beginning of
the 20th century). The basic idea of this assignment is
that closely related keys or tones are mapped into related colours. Prior to Scriabin, a commonly used mapping was colour to pitch, already used by Newton. However, it is not well suited to representing harmony because adjacent tones are weakly harmonically related. An outline of the historical development of mappings of colour to
pitch classes is given by Wells [14].
The comp-i system [9] expands the piano roll notation into three dimensions to allow the user to visually
explore the source MIDI dataset and offers a view of
the structure of the music as a whole, additionally allowing the user to explore the hierarchy of the music using the ConeTree
visualization [11]. Mardirossian and Chew [8] visualise
the tonal distribution of a piece by using Lerdahl's two-dimensional pitch space – they divide the piece into uniform slices and use a key-finding algorithm to determine the most likely key for each slice. Keys are
coloured by aligning the colour wheel and the circle of
fifths. Bergstrom's Isochords [1] visualization highlights
consonant intervals between tones and chords at a given
time. It is based on Tonnetz grid and offers a view of
the changing of the harmony over time. Sapp [12] visualizes the hierarchy of key regions of a given composition, where the horizontal axis represents time and the vertical axis represents the duration of the key-finding algorithm's sliding
window. Colour hues are assigned to keys by taking
a part of the circle of fifths and mapping it into the
colour wheel. A summary of visualizations is given by
Isaacson [5].
Figure 1: The key-spanning circle of thirds assigned to the colour wheel.
3. VISUALIZING HARMONY WITH COLOUR
3.1 Assignment of colours to musical tones
The term harmony encompasses consonance (especially of concurrently sounding tones), but more broadly
it also involves the study of tonal progressions. The
perception of consonance and dissonance of concurrent
tones is related to the ratios of the tone frequencies [3].
In order of rising dissonance, the most consonant interval between two tones is unison, with a tone ratio of 1:1, followed by the octave (ratio of 1:2), perfect fifth (2:3), major third (4:5), minor third (5:6), etc. Tones with
simple (small integer) frequency ratios are perceived as
similar – unison is made up of two identical tones, the similarity of octaves is also called octave equivalence, and in consequence two tones that lie an octave apart belong to the same pitch class. Following a series of perfect fifths from a chosen tone (belonging to a certain pitch class), after 12 steps we arrive at roughly the same pitch class.
In this way we can generate all 12 pitch classes of the
chromatic scale. These pitch classes can be organised
in a circle of fifths, where two adjacent tones are a perfect fifth (or a perfect fourth in the opposite direction) apart. Similar tones are close together and dissimilar tones are on opposite sides. In addition to representing tones, the
circle of fifths can also represent tonalities.
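As an aside, this construction can be sketched in a few lines of Python; the note names and the starting tone C are illustrative assumptions, not part of the method itself.

# Minimal sketch: generating the 12 pitch classes by a series of perfect
# fifths (7 semitones each, taken modulo the octave). Starting from C is
# an arbitrary illustrative choice.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def circle_of_fifths(start=0):
    """Return the 12 pitch classes ordered by successive perfect fifths."""
    return [(start + 7 * step) % 12 for step in range(12)]

print([NOTE_NAMES[pc] for pc in circle_of_fifths()])
# ['C', 'G', 'D', 'A', 'E', 'B', 'F#', 'C#', 'G#', 'D#', 'A#', 'F']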
Because we want to map similar tones to similar colours,
the colour wheel is assigned to the circle of fifths. In
the colour wheel, colours that are perceived as similar are close together, while complementary colours are
on opposite sides. With such mapping, the difference or similarity between two colours is much more important than the psychological meaning of the colours, so in consequence the initial orientation and alignment of the colour wheel and the circle of fifths can be chosen arbitrarily. Our initial assignment is shown in Figure 1.
Figure 2: Visualization of C major and F♯ major triads played successively. (a) Without broader temporal segments. (b) Visualised with broader temporal segments; the dissonance of the sequence is visible through grey layers surrounding the chords.
Figure 3: Visualization of a C major triad played as a broken chord on the left and as a block chord on the right. (a) Without broader temporal segments. (b) With broader temporal segments.
3.2 Calculating common colour for concurrent tones
Concurrent tones are not perceived only as entirely separate, but also as a whole [10]. To model this perception
we can calculate a common colour for a group of tones.
To reflect the difference between dissonant tone combinations, which are perceived as unpleasant and unstable, and consonant ones, which are perceived as pleasant, dissonant combinations are represented by unsaturated colours and consonant ones by saturated colours.
Combinations in between are also possible. Colour hue
should represent similarity of the tone combinations.
Each tone of the 12-tone chromatic scale is represented
by a vector originating in the centre of the circle and
pointing towards the appropriate pitch class. To calculate a common colour for a combination of tones, the
vectors are added together. The direction of the resultant vector represents the hue and the length represents
the saturation. This method does not produce satisfactory results for every combination because, although the circle of fifths shows the similarity of unison, octave, perfect fifth and perfect fourth, it does not show the similarity of
major and minor thirds. To account for this we use a
revised method [6] for calculating colour of concurrent
tones that uses key-spanning circle of thirds [4] instead
of the circle of fifths. The key-spanning circle of thirds
is made up of two circles of fifths slightly rotated with respect to each other, so that the clockwise neighbour of a tone in the circle denoted with capital letters is
its major third (Figure 1).
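To make the idea concrete, the following minimal Python sketch implements the vector-addition step on the plain circle of fifths; the hue alignment is arbitrary and the key-spanning circle of thirds of the revised method [6] is not reproduced here, so this is only an approximation of the method described above.

# Illustrative sketch of the vector-addition idea: each pitch class is a
# unit vector on the circle of fifths; the resultant of a tone combination
# gives the hue (direction) and saturation (length). The plain circle of
# fifths and the arbitrary hue alignment are simplifying assumptions.
import math
import colorsys

def pitch_class_angle(pitch_class):
    """Angle of a pitch class on the circle of fifths (radians)."""
    position = (pitch_class * 7) % 12        # order along the circle of fifths
    return 2.0 * math.pi * position / 12.0

def common_colour(pitch_classes):
    """Return (hue, saturation) in [0, 1] for a group of concurrent tones."""
    x = sum(math.cos(pitch_class_angle(pc)) for pc in pitch_classes)
    y = sum(math.sin(pitch_class_angle(pc)) for pc in pitch_classes)
    saturation = math.hypot(x, y) / max(len(pitch_classes), 1)
    hue = (math.atan2(y, x) / (2.0 * math.pi)) % 1.0
    return hue, saturation                    # saturation grows with consonance

# C major triad (C, E, G) -> fairly saturated; C and F# (tritone) -> grey.
for tones in [(0, 4, 7), (0, 6)]:
    h, s = common_colour(tones)
    print(tones, colorsys.hsv_to_rgb(h, s, 1.0))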
3.3 Common colour of broader temporal segments
The method for calculating colours works on concurrent tones – the piece has to be segmented into small time slices, with each analysed separately. But the concept
of harmony more broadly encompasses more than just
the consonance of concurrently sounding tones; it also includes
tonal progressions. If we have a series of random major
chords, each chord’s colour would be fully saturated,
but the sequence itself may be dissonant (Figure 2(a)
shows C major and F♯ major triads being played in succession – each triad is consonant, but the sequence is
dissonant). Broken chords are coloured tone by tone,
although they are a spread-out variant of a block chord (Figure 3(a) shows a C major triad being played first as a broken chord and as a block chord thereafter; the yellow-coloured E tone in the broken chord visualization
is noticeable). To address these problems neighbouring
segments are joined to form broader segments and the
colour is calculated for each joined segment using the
method for calculating the colour of concurrent tones.
The size of the joining window can be adjusted.
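A possible sketch of this joining step, reusing the common_colour routine from the previous sketch, is given below; representing each time slice as the set of sounding pitch classes and the simple fixed-size window are assumptions made for illustration.

# Illustrative sketch: joining neighbouring time slices into broader
# segments and computing each joined segment's colour with common_colour
# (defined in the previous sketch). The slice representation and the
# fixed-size window are assumptions made for illustration.
def broader_segment_colours(slices, window_size=4):
    """Colour of each broader segment obtained by joining `window_size` slices."""
    colours = []
    for start in range(0, len(slices), window_size):
        window = slices[start:start + window_size]
        joined = set().union(*window)          # all pitch classes in the window
        colours.append(common_colour(joined))  # (hue, saturation) of the segment
    return colours

# A broken C major chord followed by a broken F# major chord, one tone per slice.
slices = [{0}, {4}, {7}, {0, 4, 7}, {6}, {10}, {1}, {6, 10, 1}]
print(broader_segment_colours(slices, window_size=4))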
3.4 Integrating colour with spatial dimensions
The basis for visualization is the piano roll notation.
In the piano roll notation the x-axis represents time
and the y-axis represents pitch. As a particular pitch
may be played by instruments with different timbre at
the same time, we extended the visualization with a z-axis representing instruments. Each tone is drawn as
a cylinder of fixed thickness with varying opacity depending on the loudness of the tone in a given moment –
silent tones are almost transparent, while loud tones are
opaque. Decaying tones get gradually more transparent. The colour of tones varies and depends on colours
of the segments. As very small segments are impractical
for real-time visualization, they are extended to reduce
calculations and render time. The boundary between
two extended segments is one of the following events: the start of a new tone, the end of a tone, or an explicit change of loudness. Colour is calculated at the beginning and at the end of the segment; the colour values for the inside of the segment are linearly interpolated between the beginning
and the end colours. This greatly reduces calculation
time as in most cases change between two minimal segments is just gradual decay of tones.
The harmonic structure of broader temporal segments is visualised by drawing semi-transparent layers around the tones (Figures 2(b) and 3(b)). The colour of the layer
is determined by joining the segments with appropriate
size of the joining window and calculating the colour for
the joined broad segment. The factor of transparency of
the layers is dependent on the number of joining window
sizes to be displayed at a time (transparency of layers
increases with their number). For performance reasons
the number of layers and maximum joining window size
is limited.
Figure 4: The main visualization window displaying the extended piano roll visualization of
an excerpt from Smetana’s Vltava. The harmonic relationships between concurrent tones
and broader temporal segments are shown with
colour.
4. IMPLEMENTATION
The rendering of the visualization is done in OpenGL, as the volume of data that needs to be rendered in real time can become large for some pieces. The visualization takes MIDI data as input, which is sufficient because the colour calculation method takes the 12 tones of the chromatic scale as input. Tones that lie outside of the 12-tone chromatic scale are displayed with the proper height in the 3-dimensional space; however, for the purpose of calculating colour they are rounded to the nearest tone in the scale. Another reason for using MIDI is that it eliminates the problem of extracting tones from recorded sound. Input data is processed and rendered in real time, allowing live input and observation of the results.
Figure 4 shows the main window of the visualization
tool.
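A small illustrative sketch of this input handling is given below; representing detuned tones as fractional MIDI note numbers is an assumption made for illustration, since real MIDI encodes detuning in separate pitch-bend messages.

# Illustrative sketch: mapping MIDI input to the 12 pitch classes used by
# the colour calculation. Fractional note numbers stand in for detuned
# tones here purely for illustration.
def pitch_class(midi_note):
    """Round to the nearest chromatic tone and reduce to a pitch class 0-11."""
    return int(round(midi_note)) % 12

print(pitch_class(60))     # 0  (middle C)
print(pitch_class(61.4))   # 1  (slightly sharp C#, rounded to C#)
print(pitch_class(66.6))   # 7  (rounded up to G)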
5. RESULTS AND DISCUSSION
The purpose of our visualization is to show harmonic
relationships with colour. For example, consonant tone combinations like C and G, C and F, and C and E have saturated colours, while dissonant combinations like C and F♯ and C and C♯ have low saturation. Related triads like C major, A minor and G major have similar colour hues (red to magenta), while C major and C minor, which are not harmonically related, have distant hues (magenta and
blue respectively). Complex tone combinations involving dissonant tones result in low saturated colours.
As tone loudness is also taken into account when calculating colour, the resulting visualization has smooth
transitions of colour and changes in transparency. This
is especially noticeable in the visualization of decaying
of the tones.
As perception of harmony extends also in the time dimension, colour for broader temporal segments is also calculated and displayed. This solves the problems with broken chords, arpeggios and dissonant sequences of tones or chords, as can be seen in Figures 3 and 2. In this way, tones that are played in sequence instead of concurrently are given a properly coloured “context”. Colours of broader temporal segments of appropriate length also point to the possible chord for that part of the piece.
Figure 5 depicts some examples of visualization of different musical pieces. In Figure 5(a) we can see slowly changing colours that indicate progression through related chords, but at the end the colour settles in the violet of D minor, in which the piece is written. The piece in Figure 5(b) is centred around the orange colour of D major. The arpeggio in the middle has a properly coloured context, although the constituent tones have varying colours. Goldsmith's Star Trek Theme employs a lot of key modulation, which can be seen as stable regions of one dominating colour and sudden changes of hue between the regions (Figure 5(c)). The use of dissonance depends on music styles – some avoid dissonances, some use it in very short segments that are afterwards resolved to stable consonances, and some use it very extensively. Prokofiev's Toccata in D minor has extensive regions of dissonance that can be seen as grey areas in Figure 5(d). The dissonant part consists of consonant and dissonant concurrent tone combinations, but the calculation of common colour for broader temporal segments results in a dominant grey colour.
Other types of music genres, like popular music, jazz and folk music, can be visualised without problems. Sounds from instruments without definable pitch (i.e. most percussion instruments) are omitted. The input to the visualization can be an arbitrary stream of tonal data, so performances with mistakes or even random input can be visualised. Mistakes in performance are noticeable when compared to properly performed pieces, for example in differences in colour. Random input results in numerous dissonances and the dominant colour is grey.
(a) Excerpt from Brahms’s Ballades, Op. 10
(b) Excerpt from Tchaikovsky’s Waltz of the Flowers
(c) Excerpt from Goldsmith’s Star Trek Theme
(d) Excerpt from Prokofiev’s Toccata in D minor, Op. 11
Figure 5: Examples of visualization of different compositions demonstrating the representation of
harmony with colour.
6. CONCLUSIONS
The proposed visualization of music strives not only
to be aesthetically pleasing but to reveal the structure of
music and show harmonic relationships in music using
colour. To achieve this it uses a mapping that translates similarity in perception of tones to similarity in
perception of colour. We use a method based on vector
addition inside the key-spanning circle of thirds, which
takes a group of tones as input and calculates a common
colour for the group. This functions in a similar way to how our
auditory system perceives a group of tones as a whole,
sometimes even completely merging the tones. However, given a resulting colour it is not possible to figure
out which tones were used to calculate it. Nevertheless
this is similar to the way colour perception works, where light with different spectra may produce the same colour sensation. The original method for calculating colour considered only concurrently sounding tones. As human perception of harmony is not limited to a moment in time, we expanded the method to encompass a broader time range and used it to visualise the harmonic structure of broader temporal segments of different lengths.
The aim of the proposed visualization is to enable easier understanding and learning of harmony in music, to provide an overview of the whole composition, to compare it with other compositions, and to see what we may have missed by only listening. The visualization approaches these goals by creating a synergy between two distinct senses.
The approach is still open for improvement. For instance, although major and parallel minor chords are differentiated by minor differences in hue, the psychological difference is bigger. Further work could also be done on improving the method for calculating the colour of tones to include more than only the 12 tones of the chromatic scale, or on visualising rhythm as well.
7. REFERENCES
[1] T. Bergstrom, K. Karahalios, and J. C. Hart,
“Isochords: Visualizing structure in music,” in
GI’07: Proceedings of Graphics Interface 2007,
2007, pp. 297–304.
[2] P. Ciuha, B. Klemenc, and F. Solina,
“Visualization of concurrent tones with colour,”
2010, submitted to ACM Multimedia 2010.
[3] D. Deutsch, The Psychology of Music, 2nd ed.
Academic Press, 1998.
[4] G. Gatzsche, M. Mehnert, D. Gatzsche, and
K. Brandenburg, “A symmetry based approach
for musical tonality analysis,” in 8th International
Conference on Music Information Retrieval
(ISMIR 2007), Vienna, Austria, 2007.
[5] E. J. Isaacson, “What you see is what you get: on
visualizing music,” in ISMIR, 2005, pp. 389–395.
[6] B. Klemenc, “Visualization of music on the basis
of translation of concurrent tones into color
space,” Dipl. Ing. thesis, Faculty of Computer and
Information Science, University of Ljubljana,
Slovenia, 2008.
[7] S. Malinowski. (2007) Music animation machine.
[Online]. Available: http://www.musanim.com
[8] A. Mardirossian and E. Chew, “Visualizing music:
Tonal progressions and distributions,” in 8th
International Conference on Music Information
Retrieval, Vienna, Austria, September 2007.
[9] R. Miyazaki, I. Fujishiro, and R. Hiraga,
“Exploring midi datasets,” in SIGGRAPH 2003
conference on Sketches & applications. New
York, NY, USA: ACM Press, 2003.
[10] R. Parncutt, Harmony: A Psychoacoustical
Approach. Springer-Verlag, 1989, ch. 2.
[11] G. G. Robertson, J. D. Mackinlay, and S. K.
Card, “Cone trees: animated 3d visualizations of
hierarchical information,” in Proceedings of the
ACM Conference on Human Factors in
Computing Systems (CHI ’91), 1991, pp. 189–194.
[12] C. S. Sapp, “Harmonic visualizations of tonal
music,” in ICMC’01: Proceedings of the
International Computer Music Conference 2001,
2001, pp. 419–422.
[13] S. M. Smith and G. N. Williams, “A visualization
of music,” in VIS’97: Proceedings of the 8th
conference on Visualization 1997, 1997, pp.
499–503.
[14] A. Wells, “Music and visual color: A proposed
correlation,” in Leonardo, vol. 13, 1980, pp.
101–107.
Reviewers by Countries
Argentina
Olsina, Luis; National University of La Pampa
Ovando, Gabriela P.; Universidad Nacional de
Rosario
Rossi, Gustavo; Universidad Nacional de La Plata
Australia
Abramov, Vyacheslav; Monash University
Begg, Rezaul; Victoria University
Bem, Derek; University of Western Sydney
Betts, Christopher; Pegacat Computing Pty. Ltd.
Buyya, Rajkumar; The University of Melbourne
Chapman, Judith; Australian University Limited
Chen, Yi-Ping Phoebe; Deakin University
Hammond, Mark; Flinders University
Henman, Paul; University of Queensland
Palmisano, Stephen; University of Wollongong
Ristic, Branko; Science and Technology Organisation
Sajjanhar, Atul; Deakin University
Sidhu, Amandeep; University of Technology, Sydney
Sudweeks, Fay; Murdoch University
Austria
Derntl, Michael; University of Vienna
Hug, Theo; University of Innsbruck
Loidl, Susanne; Johannes Kepler University Linz
Stockinger, Heinz; University of Vienna
Sutter, Matthias; University of Innsbruck
Walko, Zoltan
Brazil
Parracho, Annibal; Universidade Federal Fluminense
Traina, Agma; University of Sao Paulo
Traina, Caetano; University of Sao Paulo
Vicari, Rosa; Federal University of Rio Grande
Belgium
Huang, Ping; European Commission
Canada
Fung, Benjamin; Simon Fraser University
Grayson, Paul; York University
Gray, Bette; Alberta Education
Memmi, Daniel; UQAM
Neti, Sangeeta; University of Victoria
Nickull, Duane; Adobe Systems, Inc.
Ollivier-Gooch, Carl; The University of British
Columbia
Paulin, Michele; Concordia University
Plaisent, Michel; University of Quebec
Reid, Keith; Ontario Ministry of Agriculture
Shewchenko, Nicholas; Biokinetics and Associates
Steffan, Gregory; University of Toronto
Vandenberghe, Christian; HEC Montreal
Croatia
Jagnjic, Zeljko; University of Osijek
Czech Republic
Kala, Zdenek; Brno University of Technology
Korab, Vojtech; Brno University of technology
Lhotska, Lenka; Czech Technical University
Cyprus
Kyriacou, Efthyvoulos; University of Cyprus
Denmark
Bang, Joergen; Aarhus University
Edwards, Kasper; Technical University Denmark
Orngreen, Rikke; Copenhagen Business School
Estonia
Kull, Katrin; Tallinn University of Technology
Reintam, Endla; Estonian Agricultural University
Finland
Lahdelma, Risto; University of Turku
Salminen, Pekka; University of Jyvaskyla
France
Bournez, Olivier
Cardey, Sylviane; University of Franche-Comte
Klinger, Evelyne; LTCI – ENST, Paris
Roche, Christophe; University of Savoie
Valette, Robert; LAAS - CNRS
Germany
Accorsi, Rafael; University of Freiburg
Glatzer, Wolfgang; Goethe-University
Gradmann, Stefan; Universitat Hamburg
Groll, Andre; University of Siegen
Klamma, Ralf; RWTH Aachen University
Wurtz, Rolf P.; Ruhr-Universitat Bochum
Greece
Katzourakis, Nikolaos; Technical University of Athens
Bouras, Christos J.; University of Patras and RACTI
Hungary
Nagy, Zoltan; Miklos Zrinyi National Defense
University
India
Pareek, Deepak; Technology4Development
Scaria, Vinod; Institute of Integrative Biology
Shah, Mugdha; Mansukhlal Svayam
Ireland
Eisenberg, Jacob; University College Dublin
Israel
Feintuch, Uri; Hadassah-Hebrew University
Italy
Badia, Leonardo; IMT Institute for Advanced Studies
Berrittella, Maria; University of Palermo
Carpaneto, Enrico; Politecnico di Torino
Japan
Hattori, Yasunao; Shimane University
Livingston, Paisley; Linghan University
Srinivas, Hari; Global Development Research Center
Obayashi, Shigeru; Institute of Fluid Science, Tohoku
University
Mexico
Morado, Raymundo; University of Mexico
Netherlands
Mills, Melinda C.; University of Groningen
Pires, Luís Ferreira; University of Twente
New Zealand
Anderson, Tim; Van Der Veer Institute
Philippines
Castolo, Carmencita; Polytechnic University
Philippines
Poland
Kopytowski, Jerzy; Industrial Chemistry Research
Institute
Portugal
Cardoso, Jorge; University of Madeira
Natividade, Eduardo; Polytechnic Institute of Coimbra
Oliveira, Eugenio; University of Porto
Republic of Korea
Ahn, Sung-Hoon; Seoul National University
Romania
Moga, Liliana; “Dunarea de Jos” University
Serbia
Mitrovic, Slobodan; Otorhinolaryngology Clinic
Stanojevic, Mladen; The Mihailo Pupin Institute
Ugrinovic, Ivan; Fadata, d.o.o.
Singapore
Tan, Fock-Lai; Nanyang Technological University
Slovenia
Kocijan, Jus; Jozef Stefan Institute and University of
Nova Gorica
South Korea
Kwon, Wook Hyun; Seoul National University
Spain
Barrera, Juan Pablo Soto; University of Castilla
Gonzalez, Evelio J.; University of La Laguna
Perez, Juan Mendez; Universidad de La Laguna
Royuela, Vicente; Universidad de Barcelona
Vizcaino, Aurora; University of Castilla-La Mancha
Vilarrasa, Clelia Colombo; Open University of
Catalonia
Sweden
Johansson, Mats; Royal Institute of Technology
Switzerland
Niinimaki, Marko; Helsinki Institute of Physics
Pletka, Roman; AdNovum Informatik AG
Rizzotti, Sven; University of Basel
Specht, Matthias; University of Zurich
Taiwan
Lin, Hsiung Cheng; Chienkuo Technology University
Shyu, Yuh-Huei; Tamkang University
Sue, Chuan-Ching; National Cheng Kung
University
Ukraine
Vlasenko, Polina; EERC-Kyiv
United Kingdom
Ariwa, Ezendu; London Metropolitan University
Biggam, John; Glasgow Caledonian University
Coleman, Shirley; University of Newcastle
Conole, Grainne; University of Southampton
Dorfler, Viktor; Strathclyde University
Engelmann, Dirk; University of London
Eze, Emmanuel; University of Hull
Forrester, John; Stockholm Environment Institute
Jensen, Jens; STFC Rutherford Appleton Laboratory
Kolovos, Dimitrios S.; The University of York
McBurney, Peter; University of Liverpool
Vetta, Atam; Oxford Brookes University
Westland, Stephen; University of Leeds
WHYTE, William Stewart; University of Leeds
Xie, Changwen; Wicks and Wilson Limited
USA
Bach, Eric; University of Wisconsin
Bazarian, Jeffrey J.; University of Rochester School
Bolzendahl, Catherine; University of California
Bussler, Christoph; Cisco Systems, Inc.
Charpentier, Michel; University of New Hampshire
Chester, Daniel; Computer and Information Sciences
Chong, Stephen; Cornell University
Collison, George; The Concord Consortium
DeWeaver, Eric; University of Wisconsin - Madison
Ellard, Daniel; Network Appliance, Inc
Gaede, Steve; Lone Eagle Systems Inc.
Gans, Eric; University of California
Gill, Sam; San Francisco State University
Gustafson, John L.; ClearSpeed Technology
Hunter, Lynette; University of California Davis
Iceland, John; University of Maryland
Kaplan, Samantha W.; University of Wisconsin
Langou, Julien; The University of Tennessee
Liu, Yuliang; Southern Illinois University Edwardsville
Lok, Benjamin; University of Florida
Minh, Chi Cao; Stanford University
Morrissey, Robert; The University of Chicago
Mui, Lik; Google, Inc
Rizzo, Albert; University of Southern California
Rosenberg, Jonathan M.; University of Maryland
Shaffer, Cliff; Virginia Tech
Sherman, Elaine; Hofstra University
Snyder, David F.; Texas State University
Song, Zhe; University of Iowa
Wei, Chen; Intelligent Automation, Inc.
Yu, Zhiyi; University of California
Venezuela
Candal, Maria Virginia; Universidad Simon Bolívar
IPSI Team
Advisors for IPSI Developments and Research:
Zoran Babovic, Darko Jovic, Aleksandar Crnjin,
Marko Stankovic, Marko Novakovic
Authors of papers are responsible for the contents and layout of their papers.
Welcome to IPSI BgD Conferences and Journals!
http://www.internetconferences.net
http://www.internetjournals.net
CIP – Katalogizacija u publikaciji
Narodna biblioteka Srbije, Beograd
ISSN 1820 – 4503 =
The IPSI BGD Transactions on Internet
Research
COBISS.SR - ID 119128844