The IPSI BgD Transactions on
Internet Research
Multi-, Inter-, and Trans-disciplinary Issues in Computer Science and Engineering
A publication of
IPSI Bgd Internet Research Society
New York, Frankfurt, Tokyo, Belgrade
January 2010 Volume 6 Number 1 (ISSN 1820-4503)
Table of Contents:
Pearls of Wisdom by Nobel Laureate:
Interview - Inventivity from the Look of Science
Kroto, H. .......................................................................................................................... 2
Invited Papers:
Fuzzy Sets and Inference as an Effective Methodology in the Construction of
Intelligent Controllers
Saade, J. ................................................................................................................................ 3
Merging Data Sources Based on Semantics, Contexts, and Trust
Šubelj, L.; Jelenc, D.; Zupančič, E.; Lavbič, D.; Tršek, D.; Krisper, M.; Bajec, M. ........ 18
Evaluation Models for E-Learning Platforms and the AHP Approach: a Case Study
Colace, F.; De Santo, M. .............................................................................................. 31
Academic Ranking of World Universities 2009/2010
Mester, G. .............................................................................................................................. 44
Visual and Aural: Visualization of Harmony in Music with Colour
Klemenc, B.; Ciuha, P.; Šubelj, L.; and Bajec, M. ......................................................... 48
The IPSI BgD Internet Research Society
The Internet Research Society is an association of people with professional interest in the field of the Internet. All
members will receive this TRANSACTIONS upon payment of the annual Society membership fee of €100 plus an
annual subscription fee of €1000 (airmail delivery of printed matter).
Member copies of Transactions are for personal use only.
IPSI BGD TRANSACTIONS ON INTERNET RESEARCH
www.internetjournals.net
STAFF
Veljko Milutinovic, Editor-in-Chief
Department of Computer Engineering
IPSI BgD Internet Research Society
University of Belgrade
POB 35-54, Belgrade, Serbia
Tel: (381) 64-2956756
[email protected]

Marko Novakovic, Journal Manager
Department of Computer Engineering
IPSI BgD Internet Research Society
University of Belgrade
POB 35-54, Belgrade, Serbia
Tel: (381) 64-1389281
[email protected]

EDITORIAL BOARD

Lipkovski, Aleksandar
The Faculty of Mathematics,
Belgrade, Serbia

Blaisten-Barojas, Estela
George Mason University,
Fairfax, Virginia, USA

Crisp, Bob
University of Arkansas,
Fayetteville, Arkansas, USA

Domenici, Andrea
University of Pisa,
Pisa, Italy

Flynn, Michael
Stanford University,
Palo Alto, California, USA

Fujii, Hironori
Fujii Labs, M.I.T.,
Tokyo, Japan

Ganascia, Jean-Luc
Paris University,
Paris, France

Gonzalez, Victor
University of Oviedo,
Gijon, Spain

Milligan, Charles
Sun Microsystems,
Colorado, USA

Janicic, Predrag
The Faculty of Mathematics,
Belgrade, Serbia

Kovacevic, Milos
School of Electrical Engineering,
Belgrade, Serbia

Jutla, Dawn
Saint Mary's University,
Halifax, Canada

Neuhold, Erich
Research Studios Austria,
Vienna, Austria

Karabeg, Dino
Oslo University,
Oslo, Norway

Piccardi, Massimo
Sydney University of Technology,
Sydney, Australia

Kiong, Tan Kok
National University of Singapore,
Singapore

Radenkovic, Bozidar
Faculty of Organizational Sciences,
Belgrade, Serbia

Kovacevic, Branko
School of Electrical Engineering,
Belgrade, Serbia

Rutledge, Chip
Purdue Discovery Park,
Indiana, USA

Patricelli, Frederic
ICTEK Worldwide,
L'Aquila, Italy

Mester, Gyula
University of Szeged,
Szeged, Hungary
Interview –
Inventivity from the Look of Science
Kroto, H.

1. How do you define inventivity and creativity?

I define creativity/inventivity as a new look at existing phenomena. Many scientists through the centuries have looked beyond the standard frame of mind to see various aspects of existing and new inventions. Discoveries are the final product of creativity. We can look at discovery in many different ways, but the most important thing about a discovery is the impact it has on mankind and the environment.

2. What was the major catalyst which enabled the inventivity to happen in the case of the invention that brought the Nobel Prize to you?

I think it was good collaboration with my colleagues and striving to improve education, science and the World as much as possible. I also think that a discovery is better when the ideas and the goals behind it have more impact on the improvement of mankind.

3. For small nations (like Montenegro or Serbia), what are the things to do to induce inventivity and creativity among young people?

Young people are very creative, but they can go astray. Education should be placed on a healthy basis. There is not a big difference between small and big countries in the way of inducing creativity; however, small countries can organize themselves in a better way.

About the Author

Sir Harold (Harry) Walter Kroto, KCB, FRS (born 7 October 1939) is an English chemist and one of the three recipients who shared the 1996 Nobel Prize in Chemistry. He is currently on the faculty of Florida State University, which he joined in 2004; prior to that he spent a large part of his working career at the University of Sussex, where he holds an emeritus professorship.
Fuzzy Sets and Inference as an Effective Methodology in
the Construction of Intelligent Controllers
J. Saade
ECE Department, FEA, American University of Beirut
P.O.Box: 11-0236, Riad El Solh 1107 2020, Beirut, Lebanon
Fax: 961.1.744 462, e-mail:
[email protected]
Abstract- Intelligent controllers are human-like thinking machines. Their objective is to control ill-defined, vague and complex processes in a manner similar to that of human experts. This paper emphasizes fuzzy sets and inference as an effective methodology for the construction of intelligent controllers. This is based on the fact that fuzzy inference is capable of delivering machines that apply approximate reasoning principles and other important aspects of intelligent thinking. Case studies related to non-linear function representation and robot navigation are presented to show the success of fuzzy inference in the production of intelligent machines. Data-driven and other methodologies used in the construction of intelligent controllers are examined, and the superiority of the data-driven fuzzy learning methodology is presented.

Keywords: Intelligent controllers; Fuzzy sets and inference; Approximate reasoning; Vehicle navigation; Learning.

I. INTRODUCTION

The construction or design of intelligent controllers using fuzzy sets and inference has been an active area of research for quite a number of years. A part of this research has been concerned with the development of automatic, data-driven learning algorithms. The objective of these algorithms is to provide a fuzzy system that approximates available input-output data considered to model the human expert's control actions. Different data-driven approaches have been published; we state here, for instance, the neuro-fuzzy approaches [1-7] and approaches based on the use of clustering, genetic algorithms and combined gradient-descent-least-squares [7-19]. When the above-noted algorithms were tested and compared using non-linear functions and/or control applications, the testing and comparison have almost always relied on the sole use of the data approximation error as a measure of performance.

In this study, it is shown that the data approximation error cannot be considered the only measure of performance. Rather, practical performance criteria, based on important aspects of intelligent human thinking, need to be defined and used to test, compare and determine preferences between data-driven fuzzy controller construction algorithms. This is done in Section 2.

Section 3 reviews the literature related to the issues of noisy and incomplete training data as they relate to the performance criteria in the context of the design of fuzzy inference systems. In this section, emphasis is also placed on the major drawbacks of the existing neuro-fuzzy approaches for fuzzy system modeling and on the fact that these drawbacks emerge from the basic structure of Takagi-Sugeno type controllers and the minimization of data approximation error.

Furthermore, Section 4 outlines the learning procedure implemented in a data-driven and purely fuzzy learning methodology for Mamdani-type fully-linguistic controllers [20] and states the advantages of this methodology compared to the neuro-fuzzy methods and the reasons for these advantages. In this section a summary of the fuzzy learning algorithm and its design aspects is also provided.

Then, a typical non-linear function, which was considered in the literature to test existing neuro-fuzzy, clustering and other design approaches, is considered in Section 5. The objective is to use the performance criteria to test the design algorithms and give comparisons and preferences. Particular emphasis is placed on the comparison of the results obtained using the fuzzy learning methodology with those given by other learning approaches.

In Section 6, use is made of the defined performance criteria to compare the fuzzy learning methodology with a powerful neuro-fuzzy approach in the area of robot navigation. Conclusive comments related to the superiority of the fuzzy algorithm over neuro-fuzzy and other approaches are offered in Section 7.

II. PERFORMANCE CRITERIA
Practical performance criteria are defined in this section based on important aspects of intelligent human thinking, such as approximate reasoning (reflected as tolerance for imprecision) and generalization. These criteria are then made available to test, compare and determine preferences between data-driven fuzzy controller construction algorithms.
It is true that a fuzzy controller is a non-linear controller that
should approximate a non-linear function according to which a
human expert performs the control of some ill-defined, vague and
complex process. This function, however, is practically unknown
[8, 21-24]. Thus, expert input-output data are, in fact,
measurements of the control actions taken by a human expert in
response to process states while a control task is being performed.
Hence, the data are practically noisy versions of the expert’s
actual control actions, which obey the non-linear function
representing the process control.
Also, the data can be incomplete or not available in some
region(s) of the input space. This could be due to missing
measurements, resulting from the fact that the expert has not gone into situations into which the fuzzy system designer would like his system to be able to venture and still perform satisfactorily. After
all, an intelligent system should be one that, when trained in some
situation, behaves satisfactorily by generalization in a related
situation whose facts were not used in training. Such an aspect
does, in fact, characterize the nature of human intelligence.
Henceforth, when non-linear functions are considered to test
the performance of fuzzy inference systems resulting from the use
of data-driven design algorithms and establish preferences
between these algorithms, the following practical performance
criteria need to be adopted:
(a) the value of some error function in the approximation of the underlying noise-free data when the training data are noisy,
(b) the noise insensitivity,
(c) the generalization capability,
(d) the noise insensitivity and generalization capability combined,
(e) the representation of the shape and smoothness of the non-linear function.

Assessing the performance criteria in the presence of noise (points (a) and (b)) can be done as follows. The data points extracted from a non-linear function, or some of them, are to be modified to violate the function's analytical equation. The modified data are to be used in training, and the extracted data are considered the noise-free ones. The smaller the value of the error at the noise-free data and the larger the error at the noisy points, the better the performance of the fuzzy system modeling approach. We note here that both points (a) and (b) need to be considered, since, as a result of introducing noise, having the obtained fuzzy system not responding to or staying away from the noisy points does not necessarily imply that it will get closer to the noise-free data and thus to the real control curve.

As to generalization, data points in some boundary region of the input space, selected from among those extracted from a non-linear function, are to be eliminated. Then, the fuzzy system obtained by training using the remaining data is to be tested based on its ability to extrapolate to the region of missing data. This can be assessed by observing whether the system obtained by training on the remaining data is the same as or close to the one obtained using the whole data set. Also, if necessary, the error value at the excluded points and at the whole set of data can be considered in the assessment of generalization.

Point (e) consists of testing the capability of the fuzzy system design approach to achieve smooth control. Point (e) can also be considered to assess generalization taken in the sense of interpolation to points within the training data.

III. REVIEW OF LITERATURE RELATED TO NOISY AND INCOMPLETE TRAINING DATA

After raising the issues related to the performance testing and comparison of data-driven fuzzy controller design algorithms, and arguing that performance criteria accounting for noisy and incomplete training data need to be considered for practical reasons (Saade, [25]; Saade and Al-Khatib, [26]), research reports in which these important matters have been approached started to appear. Oh and Pedrycz [27] introduced an auto-tuning algorithm to identify T-S type fuzzy systems using a weighted performance index. The objective was to provide a balance between the approximation and generalization aspects of fuzzy models. Branco and Dente [28] pointed out ignored issues in fuzzy model design. They addressed the appearance of noise as a source of ambiguity to the fuzzy model, the fuzzy model generalization ability and the influence of the training set size on the learning performance. According to the authors, the improvement of the data approximation error by a few percentage points in a new algorithm could make the new model predictions (generalization) irrelevant.

Furthermore, Shi and Mizumoto [15] used fuzzy c-means clustering to preprocess the data, remove existing redundancies (noise) and extract typical data to be used for training in a neuro-fuzzy learning algorithm. Leski [29] recognized the intrinsic inconsistency of neuro-fuzzy modeling due to its zero tolerance to imprecision, while fuzzy modeling is based on the premise that human thinking is tolerant to imprecision.

Consequently, the studies reported in [15, 27-29] can be used to provide an additional validation of the issues raised in Section 2 and in [25, 26], and of the fact that the performance assessment of data-driven fuzzy system modeling algorithms needs to be done by accounting for noisy and incomplete training data. It can also be concluded from these studies that a fuzzy controller construction approach that is structured based on the minimization of data approximation error hinders the noise insensitivity and generalization capability of the resulting fuzzy model, and that it contradicts Zadeh's principle of "tolerance for imprecision" [30].

What could be added here as well is a remark about the fact that it is the T-S fuzzy system model that has triggered the appearance of the numerous neuro-fuzzy research reports in the area of fuzzy system modeling. In addition to having crisp values or linear combinations of the system input variables as rule consequents, an aspect that diminishes the fuzzy system's linguistic representation of human knowledge, T-S models supplied with neuro-fuzzy learning techniques also have many other undesirable aspects that were brought out in [5].

Henceforth, in the next section a summary of a data-driven, purely fuzzy learning methodology for Mamdani-type and fully-linguistic fuzzy controllers is provided. The learning algorithm enables the full linguistic representation of human knowledge and expertise and permits thinking tolerant to imprecision by not seeking error minimization. It rather seeks error reduction and requires that the learning stop once the error becomes less than or equal to some threshold value that can be set by the system designer. The algorithm will also be shown, as a result, to possess good noise insensitivity and generalization capability. It will in addition be compared to fuzzy clustering and partition approaches and, most importantly, to ANFIS [13] using a typical non-linear function and robot navigation.

IV. SUMMARY OF A DATA-DRIVEN FUZZY LEARNING ALGORITHM

The data-driven fuzzy learning algorithm for the design of intelligent controllers considers fully-linguistic fuzzy inference systems of Mamdani type. In these systems, the fuzzy output, for a given crisp input value or vector, is obtained using the compositional rule of inference (CRI) [30], and defuzzification is applied to this fuzzy output to convert it into a crisp one.
Defuzzification in this algorithm is accomplished by applying a parameterized strategy that was established in [31].

Consider a collection of N if-then fuzzy inference rules for a two-input, one-output fuzzy inference system. Let the j-th rule, 1 ≤ j ≤ N, be:

if x1 is Aj and x2 is Bj, then z is Cj.

x1, x2 and z are the input and output variables of the system. Also, Aj, Bj and Cj are fuzzy sets defined over x1, x2 and z respectively. The fuzzy output, corresponding to some crisp input pair (x1i, x2i), can be obtained using the CRI as follows:

C0i(z) = max_{1 ≤ j ≤ N} [Aj(x1i) ∧ Bj(x2i) ∧ Cj(z)].     (1)

There is a need to stress here Zadeh's rule of composition. Consider first the two variables u and v, which assume values respectively in spaces U and V. Let A and B be two fuzzy sets defined respectively over the spaces U and V. The fuzzy conditional statement expressed as "If u is A, then v is B" can be interpreted as a fuzzy relation R defined by the Cartesian product of A and B. That is, [If u is A, then v is B] ≡ R = A × B. The membership function of R, denoted R(u,v), can be obtained using some operation, called the "fuzzy implication operation", between A(u) and B(v), which are the membership functions of A and B respectively. The minimum operation was initially suggested by Zadeh [30]. Now, the fuzzy relation R, as above, induces from a fuzzy set A', defined over the space U, a fuzzy set B', over the space V, such that B' = A' ∘ R, where ∘ denotes relation composition. If the max-min composition is used [32], then the following is obtained:

B'(v) = max_{u ∈ U} [min(A'(u), R(u,v))].

When the fuzzy set A' is a singleton, that is, A' = u0 ∈ U, the above equation becomes B'(v) = R(u0, v). This is the case of interest in this study since the fuzzy controller is assumed, as is mostly the case, to be one that admits crisp inputs.

Returning now to our collection of N if-then fuzzy inference rules for a two-input, one-output fuzzy inference system, the system rules can be represented by the following fuzzy relation:

R = [(A1 ∩ B1) × C1] ∪ [(A2 ∩ B2) × C2] ∪ ... ∪ [(AN ∩ BN) × CN] = ∪_{j=1}^{N} [(Aj ∩ Bj) × Cj].

In this relation, the symbol ∪ is taken as a representation of the OR operator introduced between the rules. The symbol ∩ represents the operator AND used in the antecedent part of the rules. The fuzzy controller output that corresponds to a crisp input pair (x1i, x2i) can, therefore, be obtained by C0i(z) = R(x1i, x2i, z). If the minimum operation (∧) is adopted for the AND and THEN operators and the maximum (max) operation is used for OR, then Equation (1) above is obtained. Other operations, such as sum and product, can also be used. Also, Equation (1) can be generalized easily to systems with more than two input variables.

Now, the parameterized defuzzification method, developed in [31], applies to the normalized version of C0i(z), denoted C0in(z) and obtained by dividing the membership function of C0i(z) by the highest membership grade, as follows:

Fδ[C0in(z)] = ∫_0^1 [δ c1(α) + (1 − δ) c2(α)] dα     (2)

[c1(α), c2(α)] is the α-level set of C0in(z) and δ is a parameter that takes values in the interval [0,1]. The defuzzification method in Equation (2) was derived by first reformulating the classical criteria (minimax, maximax and the Hurwicz criterion), which apply for ranking intervals, using the intervals' characteristic functions. The reformulation of the noted criteria for intervals then permitted their natural extension to fuzzy sets by replacing the intervals' characteristic functions by the fuzzy sets' membership functions. The criteria, which were originally expressed using a defined distance measure and integration of the characteristic or membership functions along the real axis, were then expressed by an equivalent integration along the membership axis using the α-level set of a fuzzy set [33].

Furthermore, the study in [31] was one that formally related the ranking of fuzzy sets to the defuzzification of the outputs of fuzzy controllers. It unified the two problems to make the solution for the first applicable to the second. Also, the benefits that could be obtained if the defuzzification of the outputs of fuzzy controllers is approached from the point of view of ranking, and through the use of a parameterized defuzzification formula to satisfy design objectives (shaping the controller input-output characteristic), have been emphasized. It is in this spirit that the study in [31] unified the criteria introduced in [33] to be represented by a single parameterized ranking and defuzzification formula, which is the one expressed in Equation (2).

In the data-driven learning algorithm that is described below, and whose flow chart is shown in Figure 1, Equation (2) is used for tuning initial fuzzy controllers based on input-output data. This tuning considers the consistent modification of the parameter δ and of the rule consequents to reduce the value of some error function and obtain a final fuzzy system. It is assumed that the system designer is able to specify the input and output variables of the fuzzy controller and the ranges of these variables. Then overlapping membership functions are assigned to cover the entire ranges of the variables of concern. In terms of overlap, it was observed that the smoothness of the controller input-output surface is best served when the input membership functions assigned over a single variable are such that the sum of the membership grades at any crisp input is one.

Once the input membership functions are assigned, all combinations of input fuzzy sets are considered to form the antecedent parts of the rules. All the initial rule consequents are required to be equal to the leftmost of the fuzzy sets assigned over the output variable. This set needs to be formed by a flat and a decreasing part, or a decreasing part only. This makes the defuzzified value of any fuzzy output, obtained by Equation (1) and through the application of Equation (2) with δ=1, equal to the smallest value of the output range.
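To make Equations (1) and (2) concrete, the short Python sketch below implements the max-min CRI and the parameterized α-level defuzzification for a two-input Mamdani system. The membership-function shapes, universes, discretization and the example rule base are illustrative assumptions only, not the ones used in the case studies later in the paper.

import numpy as np

def tri(x, a, b, c):
    """Triangular/shoulder membership function with support [a, c] and peak at b."""
    x = np.asarray(x, dtype=float)
    left = np.where(b > a, (x - a) / (b - a + 1e-12), 1.0)
    right = np.where(c > b, (c - x) / (c - b + 1e-12), 1.0)
    return np.clip(np.minimum(left, right), 0.0, 1.0)

# Illustrative fuzzy sets: A1..A3 and B1..B3 over the inputs, C1..C4 over the output.
A = [lambda x: tri(x, 1, 1, 3), lambda x: tri(x, 1, 3, 5), lambda x: tri(x, 3, 5, 5)]
B = [lambda x: tri(x, 1, 1, 3), lambda x: tri(x, 1, 3, 5), lambda x: tri(x, 3, 5, 5)]
Z = np.linspace(0.5, 7.0, 261)                       # discretized output universe
C = [tri(Z, 0.5, 0.5, 2.7), tri(Z, 0.5, 2.7, 4.8),
     tri(Z, 2.7, 4.8, 7.0), tri(Z, 4.8, 7.0, 7.0)]

def cri_output(rules, x1, x2):
    """Fuzzy output C0i(z) of Equation (1): max over rules of min(Aj(x1), Bj(x2), Cj(z))."""
    out = np.zeros_like(Z)
    for (i, j), c in rules.items():                  # rule: if x1 is A[i] and x2 is B[j], then z is C[c]
        w = min(float(A[i](x1)), float(B[j](x2)))    # rule firing strength
        out = np.maximum(out, np.minimum(w, C[c]))
    return out

def defuzz_delta(c0, delta):
    """Parameterized defuzzification F_delta of Equation (2), integrated over alpha-levels."""
    if float(c0.max()) <= 0.0:
        return float(Z[0])                           # no rule fired; fall back to smallest output value
    mu = c0 / float(c0.max())                        # normalized fuzzy output C0in(z)
    alphas = np.linspace(1e-3, 1.0, 200)
    vals = []
    for a in alphas:
        cut = Z[mu >= a]                             # alpha-level set [c1(alpha), c2(alpha)]
        vals.append(delta * cut.min() + (1.0 - delta) * cut.max())
    return float(np.mean(vals))                      # approximates the integral over alpha in [0, 1]

# Example: a hypothetical three-rule base and one crisp input pair.
rules = {(0, 0): 3, (1, 1): 1, (2, 2): 0}
c0 = cri_output(rules, x1=2.0, x2=3.5)
print(defuzz_delta(c0, delta=1.0), defuzz_delta(c0, delta=0.0))

With all consequents set to the leftmost output set and δ=1, this defuzzification returns the smallest value of the output range, which is the starting condition exploited by the learning algorithm below.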
Figure 1. Flow-chart of the data-driven fuzzy controllers design algorithm described in Section 4.
Given the input-output data pairs in the form ( x i , z id ), with
i=1, 2, 3,…, n and x i = (x 1i , x 2i , x 3i ,..., x pi ) , where p is the
number of input variables, the learning process starts with an
initial fuzzy system as specified above. The algorithm (see Figure
1) computes the fuzzy outputs C 0i for all x i , i=1, 2, 3, …, n using
the CRI (Equation (1)) and then defuzzifies their normalized
versions, C 0in , using Equation (2) when δ =1. Here, all the
defuzzified values will be equal to the smallest value of the output
range. Hence, given that z id are all greater than or equal to this
value (this should always be the case), then F1[C0in(z)]≤zid. For
these defuzzified values, the error E is computed using some error
function and compared with a desired error value Ed. If E≤Ed,
then the learning stops. Otherwise, δ is decreased from 1 to 0 by
passing through discrete intermediate values. For each δ, the error
is computed and compared with Ed. Note here that the decrease in
δ results in an increase in the defuzzified values of the fuzzy
outputs. These values are then made closer to the desired outputs.
If the change in δ has led to the satisfaction of the error goal, that is, if E≤Ed has been achieved for some δ∈[0, 1], then the learning stops. Otherwise, the algorithm starts again from δ=1 but with new rules.
The new rules are obtained by raising each rule consequent by
one fuzzy set. This, however, might lead to a violation of the
inequality F1[C0in(z)]≤zid for some values of i. If so, the inequality
can be reestablished by repeatedly lowering the consequents of
the rules, which trigger one fuzzy output whose defuzzified value
for δ=1 is greater than its desired counterpart. Once all defuzzified
values become again smaller than or equal to the desired ones, δ
will be decreased from 1 to 0 and for each δ value the error is
computed and compared with Ed. This process is repeated until
either the error goal is satisfied or no more raise in the rules
consequents is possible or when the raise and lowering of the rules
consequents result in a system that has already been obtained.
When the learning stops, the algorithm delivers the final fuzzy
system with the least error value that can be obtained under the
described procedure, the error and the final δ value.
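The loop of Figure 1 can be sketched compactly as follows, reusing cri_output, defuzz_delta and the membership definitions from the earlier sketch. The error function, step size and the rule used for lowering offending consequents are simplifications for illustration, not the authors' exact procedure.

import itertools
import numpy as np

def train(data, n_out_sets, Ed, step=0.05):
    """Simplified delta-sweep / consequent-adjustment loop of the Figure 1 algorithm."""
    # initial system: every antecedent combination -> leftmost output set (index 0)
    rules = {(i, j): 0 for i, j in itertools.product(range(len(A)), range(len(B)))}
    best, seen = (np.inf, None, None), set()
    while True:
        key = tuple(sorted(rules.items()))
        if key in seen:                               # updated rules identical to a previous system
            break
        seen.add(key)
        outs = [cri_output(rules, x1, x2) for x1, x2, _ in data]
        for delta in np.arange(1.0, -1e-9, -step):    # decrease delta from 1 towards 0
            E = float(np.mean([(defuzz_delta(c0, delta) - zd) ** 2
                               for c0, (_, _, zd) in zip(outs, data)]))
            if E < best[0]:
                best = (E, dict(rules), delta)        # store the smallest error so far
            if E <= Ed:                               # error goal reached
                return best
        raised = {k: min(c + 1, n_out_sets - 1) for k, c in rules.items()}
        if raised == rules:                           # no further raise possible
            break
        rules = raised                                # raise each consequent by one output set
        changed = True                                # re-establish F1(C0i) <= z_i by lowering the
        while changed:                                # consequents of rules firing for an offending point
            changed = False
            for x1, x2, zd in data:
                if defuzz_delta(cri_output(rules, x1, x2), 1.0) > zd + 1e-9:
                    for (i, j) in rules:
                        if min(float(A[i](x1)), float(B[j](x2))) > 0 and rules[(i, j)] > 0:
                            rules[(i, j)] -= 1
                            changed = True
    return best

# e.g. data = [(1.4, 1.8, 3.7), (4.28, 4.96, 1.31), ...]  (points such as those in Table 1)
# error, final_rules, final_delta = train(data, n_out_sets=4, Ed=0.25)

Because the sweep only reduces δ, the defuzzified outputs can only move upwards from the smallest output value, which is why raising the consequents is the mechanism used when no δ satisfies the error goal.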
Figure 2. Input-output surface of the non-linear function given in Equation (3).

V. PERFORMANCE TESTING USING A TYPICAL NON-LINEAR FUNCTION

In this section, we consider the typical non-linear function given in Equation (3) below to test the performance (Section 2) of the algorithm described in Section 4 and to compare it with other methods. This function has been considered in research studies to test existing data-driven techniques.

z = f(x1, x2) = (1 + x1^(-2) + x2^(-1.5))^2,   1 ≤ x1, x2 ≤ 5.     (3)

This function, whose plot is shown in Figure 2, was first considered by Sugeno and Yasukawa [16] and then considered again by Delgado et al. [10] and by Lin et al. [7]. Fifty input-output data points, listed in [16], were extracted from Equation (3) and considered in the noted references. These points are also given in Table 1.

In [16], a fuzzy system was determined with 0.318 as a mean-square error (MSE) value. The error was then reduced to 0.01 using position gradient. In [10], which is also a study based on fuzzy clustering, the best-obtained MSE was 0.231. The use of fuzzy partition [7] gave a fuzzy system with 0.351 as an MSE value. This was then reduced to 0.005 by a fuzzy neural network.

Table 1. Fifty training data points extracted from the non-linear function in Equation (3).

#    x1    x2    z        #    x1    x2    z
1    1.4   1.8   3.7      26   2     2.06  2.52
2    4.28  4.96  1.31     27   2.71  4.13  1.58
3    1.18  4.29  3.35     28   1.78  1.11  4.71
4    1.96  1.9   2.7      29   3.61  2.27  1.87
5    1.85  1.43  3.52     30   2.24  3.74  1.79
6    3.66  1.6   2.46     31   1.81  3.18  2.2
7    3.64  2.14  1.95     32   4.85  4.66  1.3
8    4.51  1.52  2.51     33   3.41  3.88  1.48
9    3.77  1.45  2.7      34   1.38  2.55  3.14
10   4.84  4.32  1.33     35   2.46  2.12  2.22
11   1.05  2.55  4.63     36   2.66  4.42  1.56
12   4.51  1.37  2.8      37   4.44  4.71  1.32
13   1.84  4.43  1.97     38   3.11  1.06  4.08
14   1.67  2.81  2.47     39   4.47  3.66  1.42
15   2.03  1.88  2.66     40   1.35  1.76  3.91
16   3.62  1.95  2.08     41   1.24  1.41  5.05
17   1.67  2.23  2.75     42   2.81  1.35  1.97
18   3.38  3.7   1.51     43   1.92  4.25  1.92
19   2.83  1.77  2.4      44   4.61  2.68  1.63
20   1.48  4.44  2.44     45   3.04  4.97  1.44
21   3.37  2.13  1.99     46   4.82  3.8   1.39
22   2.84  1.24  3.42     47   2.58  1.97  2.29
23   1.19  1.53  4.99     48   4.14  4.76  1.33
24   4.1   1.71  2.27     49   4.35  3.9   1.4
25   1.65  1.38  3.94     50   2.22  1.35  3.39
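For reference, Equation (3) and the mean-square error used throughout this section can be written in a few lines of Python; the random sampling below only illustrates how Table 1 style data are generated and does not reproduce the exact 50 points of [16].

import numpy as np

def f(x1, x2):
    """Non-linear test function of Equation (3), defined for 1 <= x1, x2 <= 5."""
    return (1.0 + x1 ** -2.0 + x2 ** -1.5) ** 2

def mse(predictions, targets):
    """Mean-square error, the performance figure quoted for the compared methods."""
    predictions, targets = np.asarray(predictions, float), np.asarray(targets, float)
    return float(np.mean((predictions - targets) ** 2))

# Illustrative data generation in the spirit of Table 1.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 5.0, size=(50, 2))
z = np.round(f(x[:, 0], x[:, 1]), 2)
print(mse(f(x[:, 0], x[:, 1]), z))   # near zero: the targets come from f itself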
As for the algorithm described in this study, the same 50 data points were considered. Three membership functions were adopted over each of the input variables (Figures 3(a) and 3(b)). An error value of 0.216 was obtained with 4 output fuzzy sets as in Figure 3(c). The final obtained rules are listed below as Equation (4), and the final δ value is 1. The input-output surface is shown in Figure 4.

If x1 is A1 and x2 is B1, then z is C3
If x1 is A1 and x2 is B2, then z is C3
If x1 is A1 and x2 is B3, then z is C3
If x1 is A2 and x2 is B1, then z is C3
If x1 is A2 and x2 is B2, then z is C2
If x1 is A2 and x2 is B3, then z is C2
If x1 is A3 and x2 is B1, then z is C3
If x1 is A3 and x2 is B2, then z is C2
If x1 is A3 and x2 is B3, then z is C2     (4)

Figure 3. Membership functions assigned over the input and output variables of the non-linear function given in Equation (3): (a) INPUT x1 (A1-A3), (b) INPUT x2 (B1-B3), (c) OUTPUT z (C1-C4).

Figure 4. Input-output surface of the fuzzy system given in Equation (4) and representing the function given in Equation (3).

The non-linear function in Equation (3) is considered again to test the performance of the algorithm in the complete but noisy data case. Since the methods noted in [7,10,16] considered only complete and noise-free data and the resulting error values, the comparison will be done with ANFIS, which is powerful enough to bring out the relative strengths of the algorithm presented in Section 4. ANFIS, introduced by Jang [13], is available under MATLAB and is based on a combination of the least-squares and gradient-descent methods. For the comparison to be meaningful, the same number and type of input membership functions are considered in ANFIS. Also, because the ANFIS results depend on the number of epochs and, to a lesser degree, on the initial step-size, the presented results are the best ones we obtained after attempting different combinations of epoch and step-size values. The algorithm will also be examined for generalization and for noise insensitivity plus generalization; that is, in the cases of noise-free but incomplete data and of noisy and incomplete data.

Let us first consider the use of the same 50 data pairs (Table 1) in ANFIS. A fuzzy system with MSE equal to 0.0303 was obtained under 100 epochs and 0.00001 as initial step-size. The input-output surface is shown in Figure 5. Although this error value is smaller than the 0.216 obtained in our approach, the comparison of Figures 4 and 5 reveals that the described algorithm gives a better representation of the shape of the non-linear function.

Concerning complete and noisy data, 3 stages of modification of output values in the 50 points listed in Table 1 were performed. In each stage, 4 output values were modified to produce 4 input-output pairs, denoted as a set, not satisfying the function in Equation (3). First, set 1 was used in addition to the remaining 46 noise-free data pairs. In stage 2, sets 1 and 2 were used in addition to the remaining 42 noise-free data. In stage 3, sets 1, 2 and 3 were used in addition to 38 noiseless data. The fuzzy systems obtained by ANFIS had the surfaces shown in Figures 6(a), (b) and (c) respectively.

All the above-noted 3 cases were also used in the algorithm described in Section 4. The resulting fuzzy system was always as in Equation (4) and with δ=1. The input-output surface is just the one shown in Figure 4. The comparison of Figure 4 with Figures 6(a), (b) and (c) reveals that the presented approach has a better noise insensitivity than ANFIS. This result is also supported by the error values at the noisy points. In terms of the error obtained by considering the underlying 50 noise-free data (Table 1), ANFIS gave 0.2292, 0.2925 and 0.3217 respectively. Comparison of these error values with 0.216 (obtained using the presented algorithm), together with the noise insensitivity result and the comparison of Figure 4 with Figures 6(a), (b) and (c), shows that our approach has a performance preference over ANFIS when the learning is based on noisy data (see Section 2).
Figure 5. Input-output surface of the fuzzy system obtained from
ANFIS using the 50 data pairs in Table 1.
In terms of testing the generalization capability of the presented
approach, sets of data points from among those listed in Table 1
and located in specific boundary regions of the input space were
excluded in succession. The remaining data were entered into the
presented algorithm. First, data pairs such that 1 < x1 < 2.5 and 3.5 < x2 < 5 were eliminated. This resulted in the exclusion of 5 data points. Second, data points such that 1 < x1 < 3 and 3 < x2 < 5 (8
points) were excluded. In both cases, the final fuzzy system turned
out to be as in Equation (4) and δ=1. The error values were
respectively 0.2355 and 0.2468. In both cases, the error value
0.216 still holds for the original 50 points in Table 1.
The presented algorithm was also tested for its ability to
combat noise and generalize simultaneously. The data elimination
process described in the preceding paragraph was again
considered and noisy data from among the above-mentioned 12
points were introduced. The introduced noisy points were those
which survived elimination. Hence, 9 noisy points (set 1, 1 point
from set 2 and set 3) were used among the 45 data pairs, which
resulted from the first data exclusion. Also, 8 noisy points (set 1
and set 3) were used among the 42 data pairs, which resulted from
the second data elimination process. In both cases, the algorithm
returned the final fuzzy system expressed in Equation (4) with
δ=1. The input-output surface is therefore as shown in Figure 4.
The proposed fuzzy system modeling approach is therefore able
to combat noise and generalize simultaneously.
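The two tests just described (perturbing some of the targets and dropping the points falling in a boundary region of the input space) can be expressed in a few lines. This is only an illustration of the protocol of Section 2; the default region is the first exclusion case mentioned above, and the perturbation size is arbitrary.

import numpy as np

def make_test_sets(data, noisy_idx, noise_size=0.5,
                   region=lambda x1, x2: 1 < x1 < 2.5 and 3.5 < x2 < 5):
    """Build the noisy and the incomplete training sets used to assess criteria (a)-(d)."""
    noisy = [(x1, x2, z + noise_size if i in noisy_idx else z)
             for i, (x1, x2, z) in enumerate(data)]
    incomplete = [(x1, x2, z) for (x1, x2, z) in data if not region(x1, x2)]
    return noisy, incomplete

# e.g. noisy, incomplete = make_test_sets(table1, noisy_idx={0, 7, 19, 33})
# train on "noisy" or "incomplete" and compare the errors on the original noise-free points.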
Figure 6. Input-output surfaces of the fuzzy systems obtained from ANFIS using: (a) 46 noise-free data pairs and 4 noisy ones, (b) 42 noise-free data and 8 noisy points, and (c) 38 noise-free data and 12 noisy points.

VI. ROBOT NAVIGATION CASE
In dealing with the motion-planning problem of a mobile robot among existing obstacles, different classical approaches have been developed. Among these approaches are path-velocity decomposition [34,35], incremental planning [36], the relative velocity paradigm [37], and potential fields [38]. Soft-computing techniques, employing various learning methods, have also been used to improve the performance of conventional controllers [39,40]. Each of the above-noted methods is either computationally expensive or capable of solving only a particular type of problem, or both.
In order to reduce the computational burden and provide a
more natural solution for the dynamic motion-planning (DMP)
problem, fuzzy approaches, with emphasis on user-defined rules
and collision-free paths, have been suggested [41-43]. Recently, a
more advanced fuzzy-genetic-algorithm approach has been
devised [44]. The emphasis has been not only on obtaining collision-free paths, but also on the optimization of the travel time or path between the start and target points of the robot. Genetic algorithms have, therefore, been used to come up with an optimal or near-optimal fuzzy rule-base off-line by employing a number of user-defined scenarios. Although the noted fuzzy-genetic approach provided good testing results on scenarios, some of which were used in training and others not, it had its limitations: a different set of rules needed to be determined for every specific number of moving obstacles.

The approach presented in this section considers the off-line derivation of a general fuzzy rule-base; that is, a base that can be used on-line by the robot independently of the number of moving obstacles [45]. This is achieved using the data-driven learning algorithm presented in Section 4 and by devising a method for the derivation of the training data based on the general setting of the DMP problem and not on specific scenarios. Collision-free paths and reduction of travel time are still within the goals considered in the derivation of the fuzzy logic controller (FLC). Furthermore, the noise insensitivity and generalization capability of the FLC construction algorithm are emphasized again here and tested in this practical control case. Comparison of the results with those obtained by the fuzzy-genetic approach and by ANFIS is also done.

The robot needs to move from a start point S to a target point G located in some quadrant where the moving obstacles exist. The purpose is to find an obstacle-free path which takes the robot from S to G with minimum time. A fuzzy controller represented by a set of fuzzy inference rules is to be constructed to achieve this objective.

The robot moves incrementally from one point to another in accordance with time steps, each of duration ΔT, and at the end of each step it needs to decide on the movement direction. Due to the problem objective, once the robot is at some point it needs to consider moving in a straight line towards the target point unless the information collected about the moving obstacles tells otherwise due to a possible collision. Hence, the information that needs to be obtained has to relate, in principle, to the position of each obstacle and its velocity relative to the robot position; i.e., the obstacle velocity vector. But, since the robot knows the position of each obstacle at every time step, an alternative to the use of the relative velocity can be the present and predicted positions of each obstacle. The predicted position can be computed based on the obstacle's present and previous positions: Ppredicted is assumed to be the linearly extrapolated position of each obstacle from its present position Ppresent along the line formed by joining Ppresent and Pprevious. Thus,

Ppredicted = Ppresent + (Ppresent − Pprevious).

But, to process all this information by the robot controller is difficult. The procedure that can be applied here, and which leads to a simplification of the controller structure, consists of using the collected information to determine the "nearest obstacle forward" (NOF) to the robot [44]. Then, only the information related to this obstacle is used by the FLC to provide decisions. The NOF is the obstacle located in front of the robot and with velocity vector pointing towards the line joining the robot position to the target point. In this way it constitutes the most probable collision danger relative to other obstacles if the robot chooses to move straight to the target (Figure 7). The NOF can equivalently be identified using the present and predicted positions of each obstacle.

Therefore, what needs to be used are the present and predicted positions of the NOF. The position has two components: angle and distance. The angle is the one between the line joining the target G to the robot position, denoted by R in Figure 7, and the line between the robot and the NOF. The distance is the one between the robot and the NOF. The FLC output is the deviation angle between the target-robot line and the new direction of robot movement, denoted by line RD (see also Figure 7). Based on the noted information the robot will be able to know whether the NOF will get close to or cross the line segment joining the present position of the robot and the point it reaches after ΔT time if it moves straight to the target. This knowledge is in fact necessary for the determination of the angle of deviation.

But, including all these variables in the conditions of the FLC would complicate its structure. It would also make the derivation of the input-output data points needed for the construction of the inference rules a difficult task. To make things simpler while maintaining the practicality of the problem, a constraint (constraint 4 below), which is not too restrictive, is considered in addition to other ones implied by the aforementioned problem description and adopted in [44].

1. The robot is considered to be a single point.
2. Each obstacle is represented by its bounding circle.
3. The speed of each obstacle is constant, with a fixed direction between its previous, present and predicted positions.
4. The distance traveled by the NOF in ΔT time is comparable to its diameter.

Of course, constraint 3 presupposes that the obstacles do not collide while moving. Also, constraint 4, with the problem configuration as depicted in Figure 8 and its use in the determination of the input-output data (see below), will reduce the number of FLC input variables to 2: predicted angle and distance. The present position of the NOF is still accounted for but not used explicitly in the controller conditions.

Figure 8 considers a quadrant filled by side-to-side obstacles, each of which may constitute the predicted position of the NOF. Suppose that the robot is in position R (present position) and the NOF predicted position is in (A22, B21). The robot's initial intention is to move straight to G (no deviation) if collision is deemed impossible. Otherwise, an angle of deviation needs to be determined. Due to constraint 4, the present position of the NOF could be any of the neighboring obstacles such that the distance between the center of each of these obstacles and the center of (A22, B21) is approximately equal to the obstacle diameter. For the purpose of explaining how the deviation angle is to be determined for every possible pair of predicted angle and distance of the NOF, a rough representation of 8 critical neighboring obstacles is considered. These are: (A21, B20), (A22, B20), (A23, B20), (A21, B21), (A23, B21), (A21, B22), (A22, B22) and (A23, B22). If the segment between the present position of the robot and the point it reaches after ΔT time, if it moves straight to G, penetrates the square formed by the outer tangent lines to the noted 8 obstacles, a deviation from the straight line between the robot and target point is required. Otherwise, no deviation is necessary. The amount of deviation is to be specified based on having the robot move in a direction that is just sufficient to avoid hitting not only the predicted obstacle position, but also any possible present position of the NOF; i.e., any of the above-described neighboring obstacles, which are guaranteed to reside inside the previously-noted square. Among the two movement directions RD1 and RD2, which lead to the avoidance of the obstacle positions, the one with the smaller deviation angle, i.e., RD1, is chosen. This serves the travel-path reduction objective.
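A minimal sketch of the quantities just defined (predicted position, NOF selection, and the angle and distance fed to the FLC) is given below. The NOF selection rule here is a simplification of the description above, the sign convention for the angle is arbitrary, and all function names are illustrative.

import numpy as np

def predicted(p_present, p_previous):
    """Linear extrapolation of an obstacle position: P_pred = P_present + (P_present - P_previous)."""
    p_present, p_previous = np.asarray(p_present, float), np.asarray(p_previous, float)
    return 2.0 * p_present - p_previous

def angle_and_distance(robot, goal, obstacle):
    """FLC inputs: angle (degrees) between the robot-goal line and the robot-obstacle line,
    and the robot-obstacle distance."""
    robot, goal, obstacle = (np.asarray(v, float) for v in (robot, goal, obstacle))
    g, o = goal - robot, obstacle - robot
    ang = np.degrees(np.arctan2(o[1], o[0]) - np.arctan2(g[1], g[0]))
    ang = (ang + 180.0) % 360.0 - 180.0          # wrap to [-180, 180)
    return float(ang), float(np.linalg.norm(o))

def nearest_obstacle_forward(robot, goal, present, previous):
    """Simplified NOF choice: among obstacles whose predicted position lies ahead of the robot
    (within +/-90 degrees of the goal direction), take the closest one."""
    best = None
    for p_now, p_prev in zip(present, previous):
        p_pred = predicted(p_now, p_prev)
        ang, dist = angle_and_distance(robot, goal, p_pred)
        if abs(ang) <= 90 and (best is None or dist < best[1]):
            best = (ang, dist, p_pred)
    return best   # (predicted angle, predicted distance, predicted position) or None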
Figure 7. Illustration of the NOF, angle (GRO3), distance (RO3) and deviation (GRD).

Figure 8. A general configuration of the DMP problem used in the data derivation.
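The collision test described above (penetration of the square formed around the predicted NOF) can be approximated as follows; the 1.5-diameter margin and the point-to-segment test are simplifications of the exact construction of Figure 8.

import numpy as np

def needs_deviation(robot, goal, nof_predicted, step_length, diameter):
    """Approximate check: does the straight-ahead move of length step_length pass within
    1.5 obstacle diameters of the predicted NOF centre?"""
    robot, goal, c = (np.asarray(v, float) for v in (robot, goal, nof_predicted))
    d = goal - robot
    d = d / (np.linalg.norm(d) + 1e-12)
    end = robot + step_length * d
    seg = end - robot
    # distance from the predicted NOF centre to the segment robot-end
    t = np.clip(np.dot(c - robot, seg) / (np.dot(seg, seg) + 1e-12), 0.0, 1.0)
    closest = robot + t * seg
    return bool(np.linalg.norm(c - closest) <= 1.5 * diameter)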
Ang   Dist  Dev  |  Ang   Dist  Dev  |  Ang   Dist  Dev  |  Ang   Dist  Dev
-90   0.3    33  |   90   1.8     0  |   45   1.2     0  |   22   1      -33
-90   0.5    15  |   90   2       0  |   45   1.4     0  |   22   1.2    -25
-90   0.7     0  |   90   4       0  |   45   1.8     0  |   22   1.4    -13
-90   1       0  |   90   5       0  |   45   2       0  |   22   1.8      0
-90   1.2     0  |   90   15      0  |   45   2.4     0  |   22   2        0
-90   1.4     0  |  -45   0.3    90  |   45   20      0  |   22   3        0
-90   1.5     0  |  -45   0.5    90  |  -22   0.3    90  |   22   10       0
-90   1.8     0  |  -45   0.7    62  |  -22   0.5    90  |   22   20       0
-90   2       0  |  -45   1      15  |  -22   0.7    90  |    0   0.3     90
-90   4       0  |  -45   1.2     0  |  -22   1      33  |    0   0.5     90
-90   5       0  |  -45   1.4     0  |  -22   1.2    25  |    0   0.7     90
-90   15      0  |  -45   1.8     0  |  -22   1.4    13  |    0   1       57
-90   24      0  |  -45   2       0  |  -22   1.8     0  |    0   1.2     45
 90   0.3   -33  |  -45   2.4     0  |  -22   3       0  |    0   1.4     37
 90   0.5   -15  |  -45   5       0  |  -22   10      0  |    0   1.8     25
 90   0.7     0  |  -45   20      0  |  -22   20      0  |    0   2.4     15
 90   1       0  |   45   0.3   -90  |  -22   24      0  |    0   3        0
 90   1.2     0  |   45   0.5   -90  |   22   0.3   -90  |    0   4        0
 90   1.4     0  |   45   0.7   -62  |   22   0.5   -90  |    0   10       0
 90   1.5     0  |   45   1     -15  |   22   0.7   -90  |    0   20       0

Table 2. Input-output data pairs (Ang in degrees, Dist in meters, Dev in degrees) obtained using the method described in Section 6.
Distance \ Angle    A1   A2   A3   A4   A5   A6   A7
D1                  V7   V9   V9   V9   V1   V1   V4
D2                  V6   V9   V9   V9   V1   V1   V4
D3                  V5   V8   V9   V9   V1   V2   V5
D4                  V5   V6   V7   V8   V4   V4   V5
D5                  V5   V5   V6   V7   V4   V5   V5
D6                  V5   V5   V6   V7   V4   V5   V5
D7                  V5   V5   V5   V6   V5   V5   V5
D8                  V5   V5   V5   V5   V5   V5   V5

Table 3. Final fuzzy system obtained by learning. Rows are the distance sets D1-D8, columns are the angle sets A1-A7, and the entries are the deviation fuzzy sets V1-V9 of Figure 9.
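Read as a rule base, Table 3 is simply a lookup from an antecedent pair to a deviation fuzzy set. A direct transcription (for clarity only; fuzzification of the crisp inputs and defuzzification of the output, as in Section 4, are still needed on-line):

# Table 3 as a plain lookup structure: consequent deviation set per (Distance set, Angle set).
TABLE3 = {
    "D1": "V7 V9 V9 V9 V1 V1 V4".split(),
    "D2": "V6 V9 V9 V9 V1 V1 V4".split(),
    "D3": "V5 V8 V9 V9 V1 V2 V5".split(),
    "D4": "V5 V6 V7 V8 V4 V4 V5".split(),
    "D5": "V5 V5 V6 V7 V4 V5 V5".split(),
    "D6": "V5 V5 V6 V7 V4 V5 V5".split(),
    "D7": "V5 V5 V5 V6 V5 V5 V5".split(),
    "D8": "V5 V5 V5 V5 V5 V5 V5".split(),
}
ANGLE_SETS = ["A1", "A2", "A3", "A4", "A5", "A6", "A7"]

def consequent(dist_set, angle_set):
    """Deviation fuzzy set fired by the rule 'if Angle is angle_set and Distance is dist_set'."""
    return TABLE3[dist_set][ANGLE_SETS.index(angle_set)]

print(consequent("D3", "A2"))   # -> 'V8'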
Now, based on the problem configuration in Figure 8 and the described general approach for the determination of the necessary deviation for every possible pair of predicted distance and angle of the NOF, various locations of the NOF within the noted quadrant were considered and accordingly input-output data were derived. The derived data points are shown in Table 2. They were obtained based on an obstacle diameter equal to 0.5 meters and a robot traveled distance in ΔT time equal to 2 meters.

The data points in Table 2 were used in the fuzzy learning algorithm of Section 4, and a set of inference rules (Table 3) was obtained using the input and output membership functions (MFs) shown in Figure 9. The ranges of the distance, angle and deviation are taken as 0 to 25 meters, -90 to 90 degrees and -90 to 90 degrees respectively. These ranges are considered to account for all possible values. The control surface of the FLC, whose rules are shown in Table 3, is given in Figure 10; the root mean square error is 4.679 and the δ parameter value is 0.5.

Figure 9. Input and output membership functions used in learning: (a) ANGLE (A1-A7), (b) DISTANCE (D1-D8), (c) DEVIATION (V1-V9).
Figure 10. Control surface of the FLC constructed using the
algorithm in Section 4 and the data in Table 2. The angle and
deviation are in degrees and the distance is in meters.
The obtained FLC was tested on various scenarios containing different numbers of obstacles. The cases of 3 obstacles, 5 obstacles, and 2 cases of 8 obstacles are considered, and the simulation results are presented in Figure 11. In all the cases, the robot travels from point S to point G without hitting any of the obstacles. Also, the traveled paths are optimal in the sense that the deviations which took place at the end of every time step are in most cases just as necessary in order for the robot to remain as close as possible to the robot-destination direct path while not colliding with the obstacles. Moreover, two of these scenarios (Figures 11(a) and 11(d)) were presented in [44] and had obstacles with distinct diameters: some had diameters close to the one considered in this study, and others had larger diameters. Despite this, the robot path chosen by the constructed FLC does not hit any of the obstacles. This shows that the constructed FLC can work properly for obstacles whose diameter values differ from the one used in the data derivation. Of course, a significant increase in the diameters would make the chances of the robot hitting the obstacles higher.

Table 4 shows the distance ratio (traveled distance / direct distance) using the presented data-driven fuzzy approach and the fuzzy-genetic one. The ratio in the fuzzy methodology for the case of 3 obstacles is a bit higher than that obtained in [44]. Thus, given that the robot speed in moving from one point to another in accordance with the previously-noted incremental time steps is the same in both approaches, a slightly higher time duration is required in our approach for the robot to reach the destination. This result, however, is quite acceptable given the fact that the presented methodology is general in the sense that it can be applied independently of the number of moving obstacles.

The data points in Table 2 are also used to construct a Takagi-Sugeno type fuzzy system by applying ANFIS [13] and to compare the results with those obtained above using the algorithm presented in Section 4. Although the data in Table 2 were determined using measuring instruments, and are thus noisy, care has been taken to make these data points as accurate as possible. Due to this, and to the fact that the actual values of the data are unknown, we consider the data in Table 2 as approximately noise-free. To be more certain of the existence of noise in the data, arbitrary small modifications will be introduced later to some angles of deviation to make the corresponding data pairs noisy. Under these circumstances, a comparison between the results of the algorithm in Section 4 and ANFIS will again be done. Testing the algorithm for generalization and for noise insensitivity plus generalization will also be investigated.
The FLC obtained by ANFIS using the data in Table 2, with 8 triangular membership functions over each input variable, has the input-output surface shown in Figure 12 and an RMSE equal to 5.438. The surface in Figure 10 is much more consistent than that in Figure 12 with our expectations of the control surface as configured using the data in Table 2. This is especially true in the range of small distances, where most of the robot deviations are necessary.
Figure 11. Paths traveled by the robot in 4 scenarios using the FLC in Table 3: (a) 3 obstacles, (b) 5 obstacles, and (c) and (d) 8 obstacles each.

Obstacles   Direct distance (m)   Traveled distance (m)   Ratio (our approach)   Ratio (genetic)
3           14                    15                      1.07                   1.046
5           14                    15.8                    1.286                  -
8           20                    23                      1.15                   -
8           20                    20.9                    1.045                  1.05

Table 4. Traveled distances and distance ratios for the presented approach and the fuzzy-genetic one.

Figure 12. Control surface of the FLC constructed by ANFIS using the data in Table 2. The angle and deviation are in degrees and the distance is in meters.
Figure 13. Paths traveled by the robot in 4 scenarios using the FLC obtained by ANFIS: (a) 3 obstacles, (b) 5 obstacles, and (c) and (d) 8 obstacles each.

Ang   Dist  Dev  |  Ang   Dist  Dev  |  Ang   Dist  Dev  |  Ang   Dist  Dev
-90   0.3    33  |   90   1.8     0  |   45   1.2     0  |   22   1      -31
-90   0.5    15  |   90   2       0  |   45   1.4     2  |   22   1.2    -22
-90   0.7     0  |   90   4       0  |   45   1.8     0  |   22   1.4    -13
-90   1       0  |   90   5       0  |   45   2       3  |   22   1.8      0
-90   1.2     0  |   90   15      0  |   45   2.4     0  |   22   2        1
-90   1.4     0  |  -45   0.3    86  |   45   20      2  |   22   3        0
-90   1.5     0  |  -45   0.5    87  |  -22   0.3    90  |   22   10       2
-90   1.8     0  |  -45   0.7    58  |  -22   0.5    87  |   22   20       0
-90   2       0  |  -45   1      13  |  -22   0.7    86  |    0   0.3     87
-90   4       0  |  -45   1.2    -2  |  -22   1      33  |    0   0.5     90
-90   5       0  |  -45   1.4     0  |  -22   1.2    21  |    0   0.7     88
-90   15      0  |  -45   1.8    -2  |  -22   1.4    13  |    0   1       59
-90   24      0  |  -45   2       0  |  -22   1.8     0  |    0   1.2     45
 90   0.3   -33  |  -45   2.4    -3  |  -22   3      -3  |    0   1.4     35
 90   0.5   -15  |  -45   5       0  |  -22   10      0  |    0   1.8     24
 90   0.7     0  |  -45   20      0  |  -22   20     -2  |    0   2.4     14
 90   1       0  |   45   0.3   -86  |  -22   24      0  |    0   3        0
 90   1.2     0  |   45   0.5   -86  |   22   0.3   -87  |    0   4        2
 90   1.4     0  |   45   0.7   -58  |   22   0.5   -90  |    0   10      -1
 90   1.5     0  |   45   1     -13  |   22   0.7   -87  |    0   20       0

Table 5. Input-output data pairs obtained by modifying the deviation in 33 data pairs of Table 2.
Figure 14. Control surface of the FLC constructed by ANFIS using the data in Table 5.
The angle and deviation are in degrees and the distance is in meters.
Figure 15. Paths traveled by the robot in 4 scenarios using the FLC obtained by ANFIS with the data in Table 5: (a) 3 obstacles, (b) 5 obstacles, and (c) and (d) 8 obstacles each.
The testing of the ANFIS-obtained fuzzy system on the 4 scenarios in Figure 11 gave the robot trajectories shown in Figure 13. The lengths of the ANFIS-obtained trajectories were respectively 20.2, 17.3, 23.4 and 25.9 meters for cases (a), (b), (c) and (d) in Figure 13. These are therefore larger than the path lengths in Figure 11, which were obtained using the fuzzy algorithm (Table 4). Also, ANFIS gave two collision cases: one in (c) and another in (d).

Now, we consider the data points in Table 5. These were obtained from Table 2 by randomly introducing modifications of between 1 and 4 degrees to 33 of the deviation angles, in such a manner as to adequately cover the input space. The use of the data in Table 5 in the algorithm described in Section 4 gave the same FLC as the one in Table 3 with the same parameter value. Thus, the control surface is the one shown in Figure 10, and the robot paths shown in Figure 11 remain unchanged. The ANFIS-obtained system, however, turned out to be different from the one obtained with the use of the original data (Table 2). It had the surface shown in Figure 14. Figure 15 shows the testing of the ANFIS-given system on the 4 scenarios in Figure 11. The robot paths shown in Figure 15 have the following respective lengths for cases (b), (c) and (d): 23, 26.1 and 26 meters. In case (a), the robot is not able to reach the destination, and in case (d) one hit occurs. ANFIS therefore gave longer trajectories here than those in Figure 13 and in Figure 11.

Furthermore, the fuzzy controller construction algorithm was tested for generalization by eliminating 8 data points from those listed in Table 2. The eliminated points were those with input pairs located in the input region 0° < Angle ≤ 90° and 2 meters < Distance ≤ 25 meters. The learning was therefore based on the remaining 72 data points. The fuzzy system whose rules are given in Table 3 was again obtained with the same parameter value. The control surface is, thus, as given in Figure 10. In addition, the same 8 points were eliminated from the data listed in Table 5. The learning based on the remaining 72 data points also gave the fuzzy system in Table 3 with the same parameter value.

The presented results validate the fact that the algorithm is capable of combating noise, which is usually present in the data, of generalizing to regions of missing data, and of combating noise and generalizing simultaneously. The comparison of Figures 12 and 14, and also of Figures 13 and 15, shows that the fuzzy systems constructed by ANFIS are noise sensitive. They are also sensitive to data exclusion and do not present a good generalization capability. This was concluded by looking at the ANFIS-obtained surfaces using the above-noted 72 data points remaining after the exclusion of the mentioned 8 points from Tables 2 and 5. These surfaces, in fact, turned out to be different from and no better than the ones presented in Figures 12 and 14, and the testing on the 4 scenarios in Figure 11 did not lead to any improvement over the results obtained in Figures 13 and 15.
VII. SUMMARY AND CONCLUSIONS

In this study, performance criteria, which need to be considered to test and compare data-driven fuzzy system modeling algorithms using non-linear control functions, have been defined based on a practical perspective and on important aspects of intelligent human thinking represented by approximate reasoning and generalization. Hence, priority has been given to the criteria accounting for noisy and incomplete training data. Despite the extreme importance of these performance criteria, they have not been spelled out, discussed, validated or tested elsewhere as is done in this study. The following summarized results and conclusions are to be noted.

When the data pairs are noise-free and complete, the proposed algorithm provided lower error values than the methods based on clustering and fuzzy partitions. On the other hand, the fuzzy-partition neural-network method, the clustering-position-gradient method and ANFIS gave lower error values than the presented algorithm. In the example given in Section 5 and in the robot navigation case, however, ANFIS gave an inferior function-shape representation.

Furthermore, when the data are noisy, which is the practical case (Section 2), it has been shown in the non-linear function case and in robot navigation that the proposed fuzzy learning algorithm (Section 4) has a performance preference over ANFIS. The proposed algorithm is also capable of generalization: it extrapolates very reliably to regions of missing data. It is also able to combat noise and generalize simultaneously. In addition, the algorithm provides readable fuzzy controllers, that is, controllers which can be interpreted linguistically in a simple manner. The T-S type fuzzy models, as was explained in Section 3, cannot furnish this linguistic aspect.

The fuzzy learning algorithm of Section 4 has also been shown capable of providing fuzzy if-then rules that can be represented in the form of a rule table; it has only one parameter to be tuned, and the setting of the initial rules can easily be done. The algorithm also permits thinking tolerant to imprecision. This can be concluded from the learning procedure, which is based on error reduction rather than error minimization and on a customized setting of the error threshold to reflect the level of precision needed in a particular situation. It can also be concluded from the resulting noise insensitivity and generalization capability of the algorithm. It is worth noting here as well that the learning procedure in the introduced algorithm is independent of the form of the error function and also of the shape of the fuzzy system membership functions.

Further strengthening of the drawn conclusions should later come from the application of the algorithm and the other design methods to more non-linear functions and practical control cases while accounting for the criteria defined in Section 2. This can be done by programming the referenced methods, especially those that have recently accounted for the issues of noise and generalization, and testing them on different types of data.
VII. SUMMARY AND CONCLUSIONS
In this study, performance criteria, which need to be
considered to test and compare data-driven fuzzy system
modeling algorithms using non-linear control functions, have
been defined based on a practical perspective and on important
aspects of intelligent human thinking represented by approximate
reasoning and generalization. Hence, priority has been given to
the criteria accounting for noisy and incomplete training data.
Despite the extreme importance of these performance criteria,
Merging data sources based on semantics,
contexts and trust
Lovro Šubelj, David Jelenc, Eva Zupančič, Dejan Lavbič, Denis Trček, Marjan Krisper and
Marko Bajec
Abstract—Matching and merging of data from heterogeneous sources is a common need in various scenarios. Despite
numerous algorithms proposed in the recent literature, there is a lack of general and complete solutions combining different
dimensions arising during the matching and merging execution. We propose a general framework, and accompanying algorithms,
that allow joint control over various dimensions of matching and merging. To achieve superior performance, standard (relational)
data representation is enriched with semantics and thus elevated towards the real world situation. Data sources are merged
using collective entity resolution and redundancy elimination algorithms that are managed through the use of different contexts –
user, data and also trust contexts. Introduction of trust allows for an adequate trust management and efficient security assurance
which is, besides a general solution for matching and merging, the main novelty of the proposition.
Index Terms—merging data, semantic elevation, context, trust management, entity resolution, redundancy elimination.
1 INTRODUCTION
With the recent advent of the Semantic Web and
open (on-line) data sources, merging of data
from heterogeneous sources is rapidly becoming a
common need in various fields. Different scenarios of
use include analyzing heterogeneous datasets collectively, enriching data with some on-line data source
or reducing redundancy among datasets by merging
them into one. Literature provides several state-of-the-art approaches for matching and merging, although
there is a lack of general solutions combining different
dimensions arising during the matching and merging
execution. We propose a general and complete solution that allows a joint control over these dimensions.
Data sources commonly include not only relational
data, but also semantically enriched data. Thus a
state-of-the-art solution should employ semantically
elevated algorithms, to fully exploit the data at hand.
However, due to the vast diversity of data sources, an adequate data architecture also has to be employed. In
particular, the architecture should support all types
and formats of data, and provide appropriate data
for each algorithm. As algorithms favor different representations and levels of semantics behind the data,
architecture should be structured appropriately.
Due to different origin of (heterogeneous) data
sources, the trustworthiness (or accuracy) of their
data can often be questionable. Especially when many
such datasets are merged, the results are likely to
be inexact. A common approach for dealing with
data sources that provide untrustworthy or conflicting
statements, is the use of trust management systems and techniques. Thus matching and merging should be advanced to a trust-aware level, to jointly optimize trustworthiness of data and accuracy of matching or merging. Such collective optimization can significantly improve over other approaches.
• L. Šubelj, D. Jelenc, E. Zupančič, D. Lavbič, D. Trček, M. Krisper and M. Bajec are with the University of Ljubljana, Faculty of Computer and Information Science.
The article proposes a general framework for
matching and merging execution. An adequate data
architecture enables either pure relational data, in the
form of networks, or semantically enriched data, in
the form of ontologies. Different datasets are merged
using collective entity resolution and redundancy
elimination algorithms, enhanced with trust management techniques. Algorithms are managed through
the use of different contexts that characterize each
particular execution, and can be used to jointly control
various dimensions of variability of matching and
merging execution.
The rest of the article is structured as follows. The
following section gives a brief overview of the related
work, focusing mainly on trust-aware matching and
merging. Next, section 3, presents employed data
architecture and discusses semantic elevation of the
proposition. Section 4 formalizes the notion of trust
and introduces the proposed trust management techniques. General framework, and accompanying algorithms, for matching and merging are presented in
section 5, and further discussed in section 6. Section 7
concludes the article.
2 RELATED WORK
Recent literature proposes several state-of-the-art solutions for matching and merging data sources. Relevant work and approaches exist in the field of data
integration [1], [2], [3], [4], data deduplication [5],
[6], [7], information retrieval, schema and ontology
matching [8], [9], [10], [11], and (relational) entity
resolution [1], [12], [13]. However, the propositions
mainly address only selected issues of the more general matching and merging problem. In particular,
approaches only partially support the variability of
the execution; commonly only homogeneous sources,
with predefined level of semantics, are employed; or
the approaches discard the trustworthiness of data
and sources of origin.
Literature also provides various trust-based, or
trust-aware, approaches for matching and merging [14], [15]. Although they formally exploit trust in
the data, they do not represent a general or complete
solution. Mainly, they explore the idea of Web of Trust,
to model trust or belief in different entities. Related
work (on Web of Trust) exists in the fields of identity
verification [16], information retrieval [17], [18], social
network analysis [19], [20], data mining and pattern
recognition [21], [22]. Our work also relates to more
general research of trust management and techniques
that provide formal means for computing with trust
(e.g. [23]).
3 DATA ARCHITECTURE
An adequate data architecture is of vital importance for efficient matching and merging. Key issues arising are as follows: (1) the architecture should allow for data from heterogeneous sources, commonly in various formats; (2) the semantical component of data should be addressed properly; and (3) the architecture should also deal with (partially) missing and uncertain data.
To achieve superior performance, we propose a three level architecture (Fig. 3). Standard relational data representation on the bottom level (data level) is enriched with semantics (semantic level) and thus elevated towards the topmost real world level (abstract level). Datasets on data level are represented with networks, while the semantics are employed through the use of ontologies.
Every dataset is (preferably) represented on data and semantic level. Although both describe the same set of entities on abstract level, the representation on each level is independent from the other. This separation resides from the fact that different algorithms of the matching and merging execution privilege different representations of data – either pure relational or semantically elevated representation. Separation thus results in more accurate and efficient matching and merging; moreover, the representations can complement each other in order to boost the performance.
The following section gives a brief introduction to networks, used for data level representation. Section 3.2 describes ontologies and semantic elevation of data level (i.e. semantic level). Proposed data architecture is formalized and further discussed in section 3.3.
3.1 Representation with networks
The most natural representation of any relational domain is a network. Networks are based upon mathematical objects called graphs. Informally speaking, a graph consists of a collection of points, called vertices, and links between these points, called edges (Fig. 1). Let VN, EN be the sets of vertices and edges for some graph N respectively. We define N as N = (VN, EN) where
VN = {v1, v2 ... vn}, (1)
EN ⊆ {{vi, vj} | vi, vj ∈ VN ∧ i < j}. (2)
Edges are sets of vertices, hence they are not directed (undirected graph). In the case of directed graphs equation (2) rewrites to
EN ⊆ {(vi, vj) | vi, vj ∈ VN ∧ i ≠ j}, (3)
where (vi, vj) is an edge from vi to vj. The definition can be further generalized by allowing multiple edges between two vertices and loops (edges that connect vertices with themselves). Such graphs are called multigraphs (Fig. 1 (b)).
Fig. 1. (a) directed graph; (b) labeled undirected multigraph (labels are represented graphically); (c) network representing a group of related traffic accidents (round vertices correspond to participants and cornered ones to vehicles).
In practical applications we commonly strive to store some additional information along with the vertices and edges. Formally, we define labels or weights for each vertex and edge in the graph – they represent a set of properties that can also be described using two attribute functions
AVN : VN → Σ1^VN × Σ2^VN × ..., (4)
AEN : EN → Σ1^EN × Σ2^EN × ..., (5)
with AN = (AVN, AEN), where Σi^VN, Σi^EN are the sets of all possible vertex and edge attribute values respectively.
Networks are most commonly seen as labeled, or weighted, multigraphs with both directed and undirected edges (Fig. 1 (c)). Vertices of a network represent some entities, and edges represent relations between them. A (relational) dataset, represented with a network on the data level, is thus defined as (N, AN).
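To make these definitions concrete, the following minimal Python sketch (our own illustration, not part of the proposition) represents a labeled undirected multigraph (N, AN); plain attribute dictionaries stand in for the attribute functions AVN and AEN:

from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, List

@dataclass
class Network:
    """A labeled undirected multigraph (N, A_N): vertices, multi-edges and their attributes."""
    vertex_attrs: Dict[str, Dict[str, Any]] = field(default_factory=dict)   # A_VN
    edges: List[FrozenSet[str]] = field(default_factory=list)               # E_N, parallel edges allowed
    edge_attrs: List[Dict[str, Any]] = field(default_factory=list)          # A_EN, parallel to edges

    def add_vertex(self, v: str, **attrs: Any) -> None:
        self.vertex_attrs[v] = attrs

    def add_edge(self, u: str, v: str, **attrs: Any) -> None:
        # An undirected edge is a set of vertices, as in equation (2).
        self.edges.append(frozenset({u, v}))
        self.edge_attrs.append(attrs)

# A tiny made-up example in the spirit of Fig. 1 (c).
net = Network()
net.add_vertex("driver_1", type="participant", name="J. Smith")
net.add_vertex("vehicle_1", type="vehicle", plate="LJ-123-AB")
net.add_edge("driver_1", "vehicle_1", relation="drives")
print(net.edges[0], net.edge_attrs[0])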
Fig. 2. Ontology representing various classes, relations and attributes related to traffic accidents and automobile
insurance domain. Classes are colored with orange, relations with blue and attributes with green. Key concepts
of the ontology are Event, Person, Driver, Witness, Owner and Vehicle.
3.2 Semantic elevation using ontologies
Ontologies are a tool for specifying the semantics of terminology systems in a well defined and unambiguous
manner [24] (Fig. 2). They can simply be defined as a
network of entities, restricted and annotated with a set
of axioms. Let EO , AO be the sets of entities, axioms
for some ontology O respectively. Dataset, represented
with an ontology on semantic level, is defined as
O = (EO , AO ) where
EO ⊆ E^C ∪ E^I ∪ E^R ∪ E^A, (6)
AO ⊆ {a | EO^a ⊆ EO ∧ a axiom on EO^a}. (7)
Entities EO consist of classes E^C (concepts), individuals E^I (instances), relations E^R (among classes and individuals) and attributes E^A (properties of classes);
and axioms AO are assertions (over entities) in a
logical form that together comprise the overall theory
described by ontology O.
This article focuses on ontologies based on descriptive logic that, besides assigning meaning to axioms,
enable also reasoning capabilities. The latter can be
used to compute consequences of the previously made
assumptions (queries), or to discover non-intended
consequences and inconsistencies within the ontology.
With the advent of Semantic Web, ontologies are
rapidly gaining importance. One of the most prominent applications of ontologies is in the domain of
semantic interoperability (among heterogeneous software systems). While pure semantics concerns the
study of meanings, semantic elevation means to achieve
semantic interoperability and can be considered as
a subset of information integration (including data
access, aggregation, correlation and transformation).
Thus one of the key aspects of semantic elevation
is to derive a common representation of classes, individuals, relations and attributes within some ontology.
We employ a concept of knowledge chunks [9], where
each entity is represented with its name and a set of
semantic relations (or attributes), their values and (ontology) identifiers. All of the data about a certain entity
is thus transformed into attribute-value format, with
an identifier of the data source of origin appended
to each value. Knowledge chunks, denoted k ∈ K,
thus provide a (common) synthetic representation of
an ontology that is used during the matching and
merging execution. For more details on knowledge
chunks, and their construction from a RDF(S) (Resource Description Framework Schema) repository or an
OWL (Web Ontology Language) ontology, see [9], [25].
Notion of knowledge chunks is introduced also on
data level. Hence, each network is represented in the
same, easily maintainable, form, allowing for common
matching and merging algorithms. Exact description
of the transformation between networked data and
knowledge chunks is not given, although it is very
similar to the definition of inferred axioms in equation (12).
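As a rough illustration of this synthetic representation (the Python encoding below is ours; see [9] for the actual construction), a knowledge chunk can be kept as a mapping from attribute names to lists of (value, source identifier) pairs:

from collections import defaultdict
from typing import Any, DefaultDict, List, Tuple

class KnowledgeChunk:
    """One entity in attribute-value form: attribute -> [(value, data source id), ...]."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: DefaultDict[str, List[Tuple[Any, str]]] = defaultdict(list)

    def add(self, attribute: str, value: Any, source: str) -> None:
        # Each value keeps the identifier of the data source it originates from.
        self.attributes[attribute].append((value, source))

    def concatenate(self, other: "KnowledgeChunk") -> None:
        # Simple concatenation of chunks, as used when two clusters are matched.
        for attribute, values in other.attributes.items():
            self.attributes[attribute].extend(values)

chunk = KnowledgeChunk("driver_1")
chunk.add("Name", "J. Smith", source="police_record")
chunk.add("Age", 37, source="insurance_db")
print(dict(chunk.attributes))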
3.3 Three level architecture
As previously stated, every dataset is (independently) represented on three levels – data, semantic and abstract level (Fig. 3). The bottommost data level holds data in a pure relational format (i.e. networks), mainly to facilitate state-of-the-art relational algorithms for matching. The next level, semantic level, enriches data with semantics (i.e. ontologies), to further enhance matching and to promote semantic merging execution. Data on both levels represent entities of the topmost abstract level, which serves merely as an abstract (artificial) representation of all the entities, used during matching and merging execution.
The information captured by data level is a subset of that of semantic level. Similarly, the information captured by semantic level is a subset of that of abstract level. This information-based view of the architecture is seen in Fig. 3 (a). However, the representation on each level is completely independent from the others, due to the absolute separation of data. This provides an alternative data-based view, seen in Fig. 3 (b).
Fig. 3. (a) information-based view of the data architecture; (b) data-based view of the data architecture.
To manage data and semantic level independently (or jointly), a mapping between the levels is required. In practice, a data source could provide datasets on both data and semantic level. The mapping is in that case trivial (i.e. given). However, more commonly, a data source would only provide datasets on one of the levels, and the other has to be inferred.
Let (N, AN) be a dataset, represented as a network on data level. Without loss of generality, we assume that N is an undirected network. The inferred ontology (ẼÕ, ÃÕ) on semantic level is defined with
Ẽ^C = {vertex, edge}, (8)
Ẽ^I = VN ∪ EN, (9)
Ẽ^R = {isOf, isIn}, (10)
Ẽ^A = {AVN, AEN} (11)
and
ÃÕ = {v isOf vertex | v ∈ VN} ∪ {e isOf edge | e ∈ EN} ∪ {v isIn e | v ∈ VN ∧ e ∈ EN ∧ v ∈ e} ∪ {v.AVN = a | v ∈ VN ∧ AVN(v) = a} ∪ {e.AEN = a | e ∈ EN ∧ AEN(e) = a}. (12)
We denote IN : (N, AN) ↦ (ẼÕ, ÃÕ). One can easily see that IN⁻¹ ◦ IN is an identity (the transformation preserves all the information).
On the other hand, given a dataset (EO, AO), represented with an ontology on semantic level, the inferred (undirected) network (Ñ, ÃÑ) on data level is defined with
ṼÑ = EO ∩ E^I, (13)
ẼÑ = {EO^a ∩ E^I | a ∈ AO ∧ EO^a ⊆ EO} (14)
and
ÃṼÑ : ṼÑ → E^C × E^A, (15)
ÃẼÑ : ẼÑ → E^R. (16)
Instances of the ontology are represented with the vertices of the network, and axioms with its edges. Classes and relations are, together with the attributes, expressed through the vertex and edge attribute functions.
We denote IO : (EO, AO) ↦ (Ñ, ÃÑ). Transformation IO discards purely semantic information (e.g. relations between classes), as it cannot be represented on the data level. Thus IO cannot be inverted as IN can. However, all the data, and data related information, is preserved (e.g. relations among individuals, and between individuals and classes).
Due to the limitations of networks, only axioms relating at most two individuals in EO can be represented with the set of edges ẼÑ (equation (14)). When this is not sufficient, hypernetworks (or hypergraphs¹) should be employed instead. Nevertheless, networks should suffice in most cases.
1. Hypergraphs are similar to ordinary graphs, only that the edges can connect multiple vertices.
One more issue has to be stressed. Although IN and IO give a “common” representation of every dataset, the transformations are completely different. For instance, presume (N, AN) and (EO, AO) are (given) representations of the same dataset. Then IN(N, AN) ≠ (EO, AO) and IO(EO, AO) ≠ (N, AN) in general – the inferred ontology (network) does not equal the given ontology (network) respectively. The former inequality resides in the fact that the network (N, AN) contains no knowledge of the (pure) semantics within the ontology (EO, AO); the latter resides in the fact that IO has no information on the exact representation used for (N, AN). Still, transformations IN and IO can be used to manage data on a common basis.
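Equations (8)–(12) translate almost literally into code; the sketch below (our own illustration, with simplified plain-dictionary inputs and tuple-encoded axioms) mimics the transformation IN for a small undirected network:

from typing import Any, Dict, FrozenSet, List, Tuple

def infer_ontology(vertices: Dict[str, Dict[str, Any]],
                   edges: List[Tuple[FrozenSet[str], Dict[str, Any]]]):
    """I_N: infer classes, individuals, relations and axioms from a network."""
    classes = {"vertex", "edge"}                                        # equation (8)
    individuals = set(vertices) | {f"e{i}" for i in range(len(edges))}  # equation (9)
    relations = {"isOf", "isIn"}                                        # equation (10)
    axioms = []                                                         # equation (12)
    for v, attrs in vertices.items():
        axioms.append((v, "isOf", "vertex"))
        axioms += [(v, name, value) for name, value in attrs.items()]
    for i, (edge, attrs) in enumerate(edges):
        e = f"e{i}"
        axioms.append((e, "isOf", "edge"))
        axioms += [(v, "isIn", e) for v in sorted(edge)]
        axioms += [(e, name, value) for name, value in attrs.items()]
    return classes, individuals, relations, axioms

vertices = {"driver_1": {"Name": "J. Smith"}, "vehicle_1": {"Plate": "LJ-123-AB"}}
edges = [(frozenset({"driver_1", "vehicle_1"}), {"relation": "drives"})]
print(infer_ontology(vertices, edges)[3])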
Last, we discuss three key issues regarding an adequate data architecture, presented in section 3. Firstly,
due to variety of different data formats, a mutual
representation must be employed. As the data on both
data and semantic level is represented in the form of
knowledge chunks (section 3.2), every piece of data
is stored in exactly the same way. This allows for
common algorithms of matching and merging and
makes the data easily manageable.
Furthermore, introduction of knowledge chunks
naturally deals also with missing data. As each chunk
is actually a set of attribute-value pairs, missing data
only results in smaller chunks. Alternatively, missing
data could be randomly imputed from the rest and
treated as extremely uncertain or mistrustful (section 4).
Secondly, semantical component of data should
be addressed properly. Proposed architecture allows
for simple (relational) data and also semantically
enriched data. Hence no information is discarded.
Moreover, appropriate transformations make all data
accessible on both data and semantic level, providing
for specific needs of each algorithm.
Thirdly, architecture should deal with (partially)
missing and uncertain or mistrustful data, which is
thoroughly discussed in the following section.
4 TRUST AND TRUST MANAGEMENT
When merging data from different sources, these are
often of different origin and thus their trustworthiness (or accuracy) can be questionable. For instance,
personal data of participants in a traffic accident is
usually more accurate in the police record of the
accident than inside participants' social network profiles. Nevertheless, an attribute from a less trusted data source can still be more accurate than an attribute from a more trusted one – a relationship status (e.g.
single or married) in the record may be outdated,
while such type of information is inside the social
network profiles quite often up-to-date.
A complete solution for matching and merging
execution should address such problems as well. A
common approach for dealing with data sources that
provide untrustworthy or conflicting statements, is
the use of trust management (systems). These are, alongside the concept of trust, both further discussed in
sections 4.1 and 4.2.
4.1 Definition of trust
Trust is a complex psychological-sociological phenomenon. Despite this, people use the term trust widely in everyday life, and with very different meanings.
The most common definition states that trust is an assured
reliance on the character, ability, strength, or truth of
someone or something.
In the context of computer networks, trust is modeled as a relationship between entities. Formally, we
define a trust relationship as
ωE : E × E → ΣE, (17)
where E is a set of entities and ΣE a set of all
possible, numerical or descriptive, trust values. ωE
thus represents one entity’s attitude towards another
and is used to model trust(worthiness) TE of all
entities in E. To this end, different trust modeling
methodologies and systems can be employed, from
qualitative to quantitative (e.g. [14], [15], [23]).
We introduce trust on three different levels. First,
we define trust on the level of data source, in order to
represent trustworthiness of the source in general. Let
S be the set of all data sources. Their trust is defined
as TS : S → [0, 1], where higher values of TS represent
more trustworthy source.
Second, we define trust on the level of attributes
(or semantic relations) within the knowledge chunks.
The trust in attributes is naturally dependent on the
data source of origin, and is defined as TAs : As →
[0, 1], where As is the set of attributes for data source
s ∈ S. As before, higher values of TAs represent more
trustworthy attribute.
Last, we define trust on the level of knowledge
chunks. Despite the trustworthiness of data source
and attributes within some knowledge chunk, its data
can be (semantically) corrupted, missing or otherwise unreliable. This information is captured using
trustworthiness of knowledge chunks, and again defined as TK : K → [0, 1], where K is a set of
all knowledge chunks. Although the trust relationships (equation (17)), needed for the evaluation of
trustworthiness of data sources and attributes, are
(mainly) defined by the user, computation of trust
in knowledge chunks can be fully automated using
proper evaluation function (section 4.2).
Three levels of trust provide high flexibility during
matching and merging. For instance, attributes from
more trusted data sources are generally favored over
those from less trusted ones. However, by properly
assigning trust in attributes, certain attributes from
else less trusted data sources can prevail. Moreover,
trust in knowledge chunks can also assist in revealing
corrupted, and thus questionable, chunks that should
be excluded from further execution.
Finally, we define trust in some particular value
within a knowledge chunk, denoted trust value T . This
is the value in fact used during merging and matching
execution and is computed from corresponding trusts
on all three levels. In general, T can be an arbitrary
function of TS , TAs and TK . Assuming independence,
we calculate trust value by concatenating corresponding trusts,
T = TS ◦ TAs ◦ TK. (18)
Concatenation function ◦ could be a simple multiplication or some fuzzy logic operation (trusts should in
this case be defined as fuzzy sets).
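Taking simple multiplication as the concatenation operator ◦, equation (18) reduces to the following sketch (the numeric trusts are only illustrative):

def trust_value(t_source: float, t_attribute: float, t_chunk: float) -> float:
    """T = TS ◦ TAs ◦ TK with ◦ taken as multiplication (equation (18))."""
    for t in (t_source, t_attribute, t_chunk):
        if not 0.0 <= t <= 1.0:
            raise ValueError("trust values must lie in [0, 1]")
    return t_source * t_attribute * t_chunk

# A fairly trusted source, a very reliable attribute, a slightly suspicious chunk.
print(trust_value(0.8, 0.95, 0.7))  # 0.532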
4.2 Trust management
During merging and matching execution, trust values are computed using a trust management algorithm
based on [15]. We begin by assigning trust values TS ,
TAs for each data source, attribute respectively (we
actually assign trust relationships). Commonly, only
a subset of values must necessarily be assigned, as
others can be inferred or estimated from the first.
Next, trust values for each knowledge chunk are not
defined by the user, but are calculated using the chunk
evaluation function feval (i.e. TK = feval ).
An example of such function is a density of inconsistencies within some knowledge chunk. For instance,
when attributes Birth and Age of some particular
knowledge chunk mismatch, this can be seen as
an inconsistency. However, one must also consider
the trust of the corresponding attributes (and data
sources), as only inconsistencies among trustworthy
attributes should be considered. Formally, density of
inconsistencies is defined as
feval(k) = (N̂inc(k) − Ninc(k)) / N̂inc(k), (19)
where k is a knowledge chunk, k ∈ K, Ninc (k) the
number of inconsistencies within k and N̂inc (k) the
number of all possible inconsistencies.
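A minimal sketch of such an evaluation function, under the assumption that inconsistencies are detected by user-supplied predicates over pairs of attributes (the Birth/Age rule below is just the example from the text, encoded naively):

from datetime import date
from typing import Any, Callable, Dict, List, Tuple

# A rule names two attributes and a predicate that flags an inconsistency between their values.
Rule = Tuple[str, str, Callable[[Any, Any], bool]]

def f_eval(chunk: Dict[str, Any], rules: List[Rule]) -> float:
    """Equation (19): (N̂inc(k) − Ninc(k)) / N̂inc(k) over the applicable rules."""
    applicable = [(a, b, bad) for a, b, bad in rules if a in chunk and b in chunk]
    if not applicable:
        return 1.0                       # nothing to check; the chunk is taken as consistent
    n_inc = sum(1 for a, b, bad in applicable if bad(chunk[a], chunk[b]))
    return (len(applicable) - n_inc) / len(applicable)

rules: List[Rule] = [
    ("Birth", "Age", lambda birth, age: abs(date.today().year - birth.year - age) > 1),
]
print(f_eval({"Birth": date(1972, 5, 1), "Age": 20}, rules))  # 0.0, i.e. an inconsistent chunk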
Finally, after all individual trusts TS , TAs and TK
have been assigned, trust values T are computed
using equation (18). When merging takes place and
two or more data sources (or knowledge chunks)
provide conflicting attribute values, corresponding to
the same (resolved) entity, trust values T are used to
determine actual attribute value in the resulting data
source (or knowledge chunk). For further discussion
on trust management during matching and merging
see section 5.
5 MATCHING AND MERGING DATA SOURCES
Matching and merging is employed in various scenarios. As the specific needs of each scenario vary,
different dimensions of variability characterize every
matching and merging execution. These dimensions
are managed through the use of contexts [9], [26]. Contexts allow a formal definition of specific needs arising
in diverse scenarios and a joint control over various
dimensions of matching and merging execution.
The following section discusses the notion of contexts more thoroughly and introduces the different types
of contexts used. Next, sections 5.2, 5.3 describe
employed entity resolution and redundancy elimination
algorithms respectively. The general framework for
matching and merging is presented and formalized
in section 5.4, and discussed in section 6.
Merging data from heterogeneous sources can be seen as a two-step process. The first step resolves the real world entities of abstract level, described by the data on lower levels, and constructs a mapping between the levels. This mapping is used in the second step that actually merges the datasets at hand. We denote these subsequent steps as entity resolution (i.e. matching) and redundancy elimination (i.e. merging).
5.1 Contexts
Every matching and merging execution is characterized by different dimensions of variability of the
data, and mappings between. Contexts are a formal
representation of all possible operations in these dimensions, providing for specific needs of each scenario. Every execution is thus characterized with the
contexts it defines (Fig. 4), and can be managed and
controlled through their use.
The idea of contexts originates in the field of requirements engineering, where it has been applied to
model domain variability [26]. It has just recently been
proposed to model also variability of the matching
execution [9]. Our work goes one step further as
it introduces contexts, not bounded only to user or
scenario specific dimensions, but also data related and
trust contexts.
Fig. 4. Characterization of merging and matching
execution defining one context in user dimension, two
contexts in data dimension and all contexts in trust
dimension.
Formally, we define a context C as
C : D → {true, false}, (20)
where D can be any simple or composite domain. A context simply limits all possible values, attributes, relations, knowledge chunks, datasets, sources or other, that are considered in different parts of matching and
merging execution. Despite its simple definition, a
context can be a complex function. It is defined on
any of the architecture levels, preferably on all. Let
CA , CS and CD represent the same context on abstract,
semantic and data level respectively. The joint context
is defined as
CJ = CA ∧ CS ∧ CD. (21)
In the case of missing data (or contexts), only appropriate contexts are considered. Alternatively, contexts
could be defined as fuzzy sets, to address also the
noisiness of data. In that case, a fuzzy AND operation
should be used to derive joint context CJ .
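Since a context is just a Boolean predicate (equation (20)), one possible encoding – purely illustrative, using the crisp conjunction of equation (21) – is:

from typing import Any, Callable, Optional

Context = Callable[[Any], bool]

def joint_context(abstract: Optional[Context] = None,
                  semantic: Optional[Context] = None,
                  data: Optional[Context] = None) -> Context:
    """CJ = CA ∧ CS ∧ CD (equation (21)); missing contexts are simply skipped."""
    defined = [c for c in (abstract, semantic, data) if c is not None]
    return lambda item: all(c(item) for c in defined)

# A user context (simple selection) on data level and a trust context on semantic level.
only_vehicles: Context = lambda chunk: chunk.get("type") == "vehicle"
trusted_enough: Context = lambda chunk: chunk.get("trust", 0.0) >= 0.6

cj = joint_context(data=only_vehicles, semantic=trusted_enough)
print(cj({"type": "vehicle", "trust": 0.8}), cj({"type": "participant", "trust": 0.9}))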
We distinguish between three types of contexts due
to different dimensions characterized (Fig. 4).
user User or scenario specific contexts are used
mainly to limit the data and control the execution. This type coincides with dimensions
identified in [9]. An example of user context
is a simple selection or projection of the data.
data Data related contexts arise from dealing with
relational or semantic data, and various formats of data. Missing or corrupted data can
also be managed through the use of these
contexts.
trust Trust and data uncertainty contexts provide
for an adequate trust management and efficient security assurance between and during
different phases of execution. An example of
trust context is a definition of required level
of trustworthiness of data or sources.
Detailed description of each context is out of scope
of this article. For more details on (user) contexts
see [9].
5.2 Entity resolution
The first step of the matching and merging execution is to resolve the real world entities on abstract level, described by the data on lower levels. Thus a mapping between the levels (entities) is constructed and used in the consequent merging execution. Recent literature proposes several state-of-the-art approaches for entity resolution (e.g. [5], [1], [12], [13], [6]). A naive approach is a simple pairwise comparison of attribute values among different entities. Although such an approach could already be sufficient for flat data, this is not the case for relational data, as the approach completely discards relations between the entities. For instance, when two entities are related to similar entities, they are more likely to represent the same entity. However, only the attributes of the related entities are compared; thus the approach still discards the information whether related entities resolve to the same entities – entities are even more likely to represent the same entities when their related entities resolve to, not only similar, but the same entities. An approach that uses this information, and thus resolves entities altogether (in a collective fashion), is denoted a collective (relational) entity resolution algorithm.
We employ a state-of-the-art (collective) relational clustering algorithm proposed in [12]. To further enhance the performance, the algorithm is semantically elevated and adapted to allow for proper and efficient trust management.
The algorithm is actually a greedy agglomerative clustering approach. Entities (on lower levels) are represented as a group of clusters C, where each cluster represents a set of entities that resolve to the same entity on abstract level. At the beginning, each (lower level) entity resides in a separate cluster. Then, at each step, the algorithm merges the two clusters in C that are most likely to represent the same entity (the most similar clusters). When the algorithm unfolds, C holds a mapping between the entities on each level (i.e. it maps entities on lower levels through the entities on abstract level).
During the algorithm, the similarity of clusters is computed using a joint similarity measure (equation (28)), combining attribute, relational and semantic similarity. The first is a basic pairwise comparison of attribute values; the second introduces relational information into the computation of similarity (in a collective fashion); and the third represents the semantic elevation of the algorithm.
Let ci, cj ∈ C be two clusters of entities. Using the knowledge chunk representation, attribute cluster similarity is defined as
simA(ci, cj) = Σ_{ki,j ∈ ci,j ∧ a ∈ ki,j} trust(ki.a, kj.a) simA(ki.a, kj.a), (22)
where ki,j ∈ K are knowledge chunks, a ∈ As is an attribute and simA(ki.a, kj.a) the similarity between two attribute values. (Attribute) similarity between two clusters is thus defined as a weighted sum of similarities between each pair of values in each knowledge chunk. Weights are assigned due to the trustworthiness of values – trust in values ki.a and kj.a is computed using
trust(ki.a, kj.a) = min{T(ki.a), T(kj.a)}. (23)
Hence, when even one of the values is uncertain or mistrustful, the similarity is penalized appropriately, to prevent matching based on (likely) incorrect information.
For the computation of similarity between actual attribute values simA(ki.a, kj.a) (equation (22)), different measures have been proposed. Levenshtein distance [27] measures the edit distance between two strings – the number of insertions, deletions and replacements that traverse one string into the other. Another class of similarity measures are TF-IDF²-based measures (e.g. Cos TF-IDF and Soft TF-IDF [28], [29]). They treat attribute values as a bag of words, thus the order of words in the attribute has no impact on the similarity. Other attribute measures are also Jaro [30] and Jaro-Winkler [31], which count the number of matching characters between the attributes.
2. Term Frequency-Inverse Document Frequency.
Different similarity measures prefer different types
of attributes. TF-IDF-based measures work best with
longer strings (e.g. descriptions), while others prefer
shorter strings (e.g. names). For numerical attributes,
an alternative measure has to be employed (e.g. simple evaluation, followed by a numerical comparison).
Therefore, when computing attribute similarity for a
pair of clusters, different attribute measures are used
with different attributes (equation (22)).
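The trust-weighted attribute similarity of equations (22) and (23) can be sketched as below; difflib's ratio is used only as a convenient stand-in for a proper measure such as Levenshtein or TF-IDF, and the per-value trusts are illustrative:

from difflib import SequenceMatcher
from typing import Dict, Tuple

# Each chunk maps an attribute name to a (value, trust value T) pair.
Chunk = Dict[str, Tuple[str, float]]

def sim_values(a: str, b: str) -> float:
    # Stand-in for a Levenshtein or TF-IDF based similarity of two attribute values.
    return SequenceMatcher(None, a, b).ratio()

def sim_attribute(ci: Chunk, cj: Chunk) -> float:
    """Equation (22): trust-weighted sum of value similarities over shared attributes."""
    total = 0.0
    for attribute in ci.keys() & cj.keys():
        (vi, ti), (vj, tj) = ci[attribute], cj[attribute]
        trust = min(ti, tj)                     # equation (23)
        total += trust * sim_values(vi, vj)
    return total

police = {"Name": ("John Smith", 0.9), "Plate": ("LJ-123-AB", 0.95)}
profile = {"Name": ("J. Smith", 0.6), "Plate": ("LJ 123 AB", 0.4)}
print(round(sim_attribute(police, profile), 3))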
Using data level representation, we define a neighborhood for vertex v ∈ VN as
nbr(v) = {vn | vn ∈ VN ∧ {v, vn} ∈ EN} (24)
and for a cluster c ∈ C as
nbr(c) = {cn | cn ∈ C ∧ v ∈ c ∧ cn ∩ nbr(v) ≠ ∅}. (25)
Neighborhood of a vertex is defined as a set of connected vertices. Similarly, neighborhood of a cluster
is defined as a set of clusters, connected through the
vertices within.
For a (collective) relational similarity measure, we
adapt a Jaccard coefficient [12] measure for trust-aware
(relational) data. Jaccard coefficient is based on Jaccard
index and measures the number of common neighbors of two clusters, considering also the size of the
clusters’ neighborhoods – when the size of neighborhoods is large, the probability of common neighbors
increases. We define
simR(ci, cj) = Σ_{cn ∈ nbr(ci) ∩ nbr(cj)} trust(eTin, eTjn) / |nbr(ci) ∪ nbr(cj)|, (26)
where eTin , eTjn is the most trustworthy edge connecting vertices in cn and ci , cj respectively (for the computation of trust(eTin , eTjn ), a knowledge chunk representation of eTin , eTjn is used). (Relational) similarity between two clusters is defined as the size of a common
neighborhood (considering also the trustworthiness
of connecting relations), decreased due to the size of
clusters’ neighborhoods. Entities related to a relatively
large set of entities that resolve to the same entities on
abstract level, are thus considered to be similar.
Alternatively, one could use some other similarity
measure like Adar-Adamic similarity [32], random walk
measures, or measures considering also the ambiguity
of attributes or higher order neighborhoods [12].
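The trust-aware Jaccard measure of equation (26) only needs, for each cluster, its neighboring clusters and the trust of the most trustworthy edge towards each of them; a minimal sketch with such precomputed neighborhoods (all values made up, the two edge trusts combined with min in the spirit of equation (23)):

from typing import Dict

# nbr maps a cluster id to {neighboring cluster id: trust of the best connecting edge}.
Neighborhood = Dict[str, Dict[str, float]]

def sim_relational(ci: str, cj: str, nbr: Neighborhood) -> float:
    """Equation (26): trust-weighted common neighbors over the size of the joint neighborhood."""
    common = nbr[ci].keys() & nbr[cj].keys()
    union = nbr[ci].keys() | nbr[cj].keys()
    if not union:
        return 0.0
    weight = sum(min(nbr[ci][cn], nbr[cj][cn]) for cn in common)
    return weight / len(union)

nbr: Neighborhood = {
    "c1": {"c3": 0.9, "c4": 0.7},
    "c2": {"c3": 0.8, "c5": 0.6},
}
print(sim_relational("c1", "c2", nbr))  # one common neighbor: min(0.9, 0.8) / 3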
For the computation of the last, semantic, similarity,
we propose a random walk like approach. Using a
semantic level representation of clusters ci , cj ∈ C,
we do a number of random assumptions (queries)
over underlying ontologies. Let Nass be the number
of times the consequences (results) of the assumptions
made matched, Ñass number of times the consequences were undefined (for at least one ontology)
and N̂ass the number of all assumptions made. Furthermore, let NTass be the trustworthiness of the ontology elements used for reasoning in the assumptions that matched (computed as a sum of products of trusts on the paths of reasoning, similar as in equation (23)). Semantic similarity is then defined as
simS(ci, cj) = NTass(ci, cj) / (N̂ass(ci, cj) − Ñass(ci, cj)). (27)
Similarity represents the trust in the number of times
ontologies produced the same consequences, not considering assumptions that were undefined for some
ontology. As the expressiveness of different ontologies vary, and some of them are even inferred from
relational data, many of the assumptions could be
undefined for some ontology. Still, for N̂ass (ci , cj ) −
Ñass (ci , cj ) large enough, equation (27) gives a good
approximation of semantic similarity.
Using attribute, relational and semantic similarity
(equations (22), (26) and (27)) we define a joint similarity for two clusters as
sim(ci, cj) = (δA simA(ci, cj) + δR simR(ci, cj) + δS simS(ci, cj)) / (δA + δR + δS), (28)
where δA , δR and δS are weights, set due to the scale
of relational and semantical information within the
data. For instance, setting δR = δS = 0 reduces the
algorithm to a naive pairwise comparison of attribute
values, which should be used when no relational or
semantic information is present.
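The joint measure of equation (28) is then a simple normalized weighted sum; a trivial sketch:

def sim_joint(sim_a: float, sim_r: float, sim_s: float,
              delta_a: float = 1.0, delta_r: float = 1.0, delta_s: float = 1.0) -> float:
    """Equation (28): weighted combination of attribute, relational and semantic similarity."""
    return (delta_a * sim_a + delta_r * sim_r + delta_s * sim_s) / (delta_a + delta_r + delta_s)

# Setting δR = δS = 0 reduces the measure to the naive attribute-only comparison.
print(sim_joint(0.75, 0.0, 0.0, delta_r=0.0, delta_s=0.0))  # 0.75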
Finally, we present the collective clustering algorithm employed for entity resolution (algorithm 1).
First, the algorithm initializes clusters C and priority queue of similarities Q, considering the current set
of clusters (lines 1-5). Each cluster represents at most
one entity as it is composed out of a single knowledge
chunk. Algorithm then, at each iteration, retrieves currently the most similar clusters and merges them (i.e.
matching of resolved entities), when their similarity is
greater than threshold θS (lines 7-11). As clusters are
stored in the form of knowledge chunks, matching in
line 11 results in a simple concatenation of chunks.
Next, lines 12-17 update similarities in the priority
queue Q, and lines 18-22 insert (or update) also neighbors’ similarities (required due to relational similarity
measure). When the algorithm terminates, clusters C
represent chunks of data resolved to the same entity
on abstract level. This mapping between the entities
(i.e. their knowledge chunk representations) is used
to merge the data in the next step.
Threshold θS represents minimum similarity for
two clusters that are considered to represent the same
entities. Optimal value should be estimated from the
data.
Algorithm 1 Collective entity resolution
1: Initialize clusters as C = {{k} | k ∈ K}
2: Initialize priority queue as Q = ∅
3: for ci, cj ∈ C and sim(ci, cj) ≥ θS do
4:   Q.insert(sim(ci, cj), ci, cj)
5: end for
6: while Q ≠ ∅ do
7:   (sim(ci, cj), ci, cj) ← Q.pop() {Most similar.}
8:   if sim(ci, cj) < θS then
9:     return C
10:  end if
11:  C ← C − {ci, cj} ∪ {ci ∪ cj} {Matching.}
12:  for (sim(cx, ck), cx, ck) ∈ Q and x ∈ {i, j} do
13:    Q.remove(sim(cx, ck), cx, ck)
14:  end for
15:  for ck ∈ C and sim(ci ∪ cj, ck) ≥ θS do
16:    Q.insert(sim(ci ∪ cj, ck), ci ∪ cj, ck)
17:  end for
18:  for cn ∈ nbr(ci ∪ cj) do
19:    for ck ∈ C and sim(cn, ck) ≥ θS do
20:      Q.insert(sim(cn, ck), cn, ck) {Or update.}
21:    end for
22:  end for
23: end while
24: return C
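The same greedy loop can be sketched in a few lines of Python (our compact approximation, not the authors' implementation): clusters are frozensets of chunk identifiers, a max-heap holds candidate pairs, and entries that refer to already merged clusters are skipped; the neighbor updates of lines 18-22 are omitted for brevity, and the similarity function is left as a parameter.

import heapq
import itertools
from typing import Callable, FrozenSet, List, Set

Cluster = FrozenSet[str]

def collective_resolution(chunks: List[str],
                          sim: Callable[[Cluster, Cluster], float],
                          theta: float) -> Set[Cluster]:
    """Greedy agglomerative clustering in the spirit of Algorithm 1."""
    clusters: Set[Cluster] = {frozenset({k}) for k in chunks}        # line 1
    heap, tie = [], itertools.count()                                # max-heap via negated similarity
    items = list(clusters)
    for i, ci in enumerate(items):                                   # lines 3-5
        for cj in items[i + 1:]:
            s = sim(ci, cj)
            if s >= theta:
                heapq.heappush(heap, (-s, next(tie), ci, cj))
    while heap:                                                      # lines 6-23
        s, _, ci, cj = heapq.heappop(heap)
        if ci not in clusters or cj not in clusters:
            continue                                                 # stale pair: a side was already merged
        if -s < theta:
            break
        merged = ci | cj                                             # line 11: matching
        clusters -= {ci, cj}
        for ck in clusters:                                          # lines 15-17
            sk = sim(merged, ck)
            if sk >= theta:
                heapq.heappush(heap, (-sk, next(tie), merged, ck))
        clusters.add(merged)
    return clusters

# Toy similarity: chunks whose identifiers share the first letter resolve to the same entity.
toy_sim = lambda ci, cj: 1.0 if {k[0] for k in ci} == {k[0] for k in cj} else 0.0
print(collective_resolution(["a1", "a2", "b1"], toy_sim, theta=0.5))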
Three more aspects of the algorithm ought to be discussed. Firstly, pairwise comparison of all clusters during the execution of the algorithm is computationally expensive, especially in early stages of the algorithm. Authors in [12] propose an approach in which they initially find groups of chunks that could possibly resolve to the same entity. In this way, the number of comparisons can be significantly decreased.
Secondly, due to the nature of (collective) relational similarity measures, they are ineffective when none of the entities has already been resolved (e.g. in early stages of the algorithm). As the measure in equation (26) counts the number of common neighbors, this always evaluates to 0 in early stages (in general). Thus relational similarity measures should be used after the algorithm has already resolved some of the entities, using only attribute and semantic similarities until then.
Thirdly, in the algorithm we implicitly assumed that all attributes, (semantic) relations and other, have the same names or identifiers in every dataset (or knowledge chunk). Although we can probably assume that all attributes within datasets produced by the same source have the same and unique names, this cannot be generalized.
We propose a simple, yet effective, solution. The problem at hand could be denoted attribute resolution, as we merely wish to map attributes between the datasets. Thus we can use the approach proposed for entity resolution. Entities are in this case attributes that are compared due to their names, and also due to the different values they hold; and relations between entities (attributes) represent co-occurrence in the knowledge chunks. As certain attributes commonly occur with some other attributes, this would further improve the resolution.
Another possible improvement is to address also the attribute values in a similar manner. As different values can represent the same underlying value, value resolution, done prior to attribute resolution, can even further improve the performance.
5.3 Redundancy elimination
After the entities residing in the data have been resolved (section 5.2), the next step is to eliminate the redundancy and merge the datasets at hand. This process is somewhat straightforward as all data is represented in the form of knowledge chunks. Thus we merely need to merge the knowledge chunks resolved to the same entity on abstract level. Redundancy elimination is done entirely on semantic level, to preserve all the knowledge inside the data.
When knowledge chunks hold disjoint data (i.e. attributes), they can simply be concatenated together. However, commonly various chunks would provide values for the same attribute and, when these values are inconsistent, they need to be handled appropriately. A naive approach would count only the number of occurrences of some value, while we consider also their trustworthiness, to determine the most probable value for each attribute.
Let c ∈ C be a cluster representing some entity on abstract level (resolved in the previous step), let k1, k2 ... kn ∈ c be its knowledge chunks and let k^c be the merged knowledge chunk we wish to obtain. Furthermore, for some attribute a ∈ A·, let X^a be a random variable measuring the true value of a and let X^a_i be the random variables for a in each knowledge chunk it occurs in (i.e. ki.a). The value of attribute a for the merged knowledge chunk k^c is then defined as
arg max_v P(X^a = v | ⋀_i X^a_i = ki.a). (29)
Each attribute is thus assigned the most probable value, given the evidence observed (i.e. the values ki.a). By assuming pair-wise independence among X^a_i (conditional on X^a) and a uniform distribution of X^a, equation (29) simplifies to
arg max_v Π_i P(X^a_i = ki.a | X^a = v). (30)
Finally, the conditional probabilities in equation (30) are approximated with the trustworthiness of values,
P(X^a_i | X^a) ≈ T(ki.a) for ki.a = v, (31a)
P(X^a_i | X^a) ≈ 1 − T(ki.a) for ki.a ≠ v, (31b)
hence
k^c.a = arg max_v Π_{ki.a = v} T(ki.a) Π_{ki.a ≠ v} (1 − T(ki.a)). (32)
Only knowledge chunks containing attribute a are considered.
We present the proposed redundancy elimination algorithm (algorithm 2).
Algorithm 2 Redundancy elimination
1: Initialize knowledge chunks K^C
2: for c ∈ C and a ∈ A· do
3:   k^c.a = arg max_v Π_{k ∈ c ∧ k.a = v} T(k.a) Π_{k ∈ c ∧ k.a ≠ v} (1 − T(k.a))
4: end for
5: return K^C
The algorithm uses the knowledge chunk representation of semantic level. First, it initializes the merged knowledge chunks k^c ∈ K^C. Then, for each attribute k^c.a, it finds the most probable value among all given knowledge chunks (line 3). When the algorithm unfolds, knowledge chunks K^C represent a merged dataset, with resolved entities and eliminated redundancy. Each knowledge chunk k^c corresponds to a unique entity on abstract level, and each attribute holds the most trustworthy value.
At the end, only the data that was actually provided by some data source should be preserved. Thus all inferred data (through IN or IO; section 3.3) is discarded, as it is merely an artificial representation needed for (common) entity resolution and redundancy elimination. Still, all provided data and semantical information is preserved and properly merged with the rest. Hence, although redundancy elimination is done on semantic level, the resulting dataset is given on both data and semantic level (and the two complement each other).
Last, we discuss the assumptions of independence among X^a_i and uniform distribution of X^a. Clearly, both assumptions are violated; still, the former must be made in order for the computation of the most probable value to be feasible. However, the latter can be eliminated when the distribution of X^a can be approximated from some large-enough dataset.
Fig. 5. Entity resolution and redundancy elimination for two relational datasets, representing a group of traffic accidents (above). One dataset is also annotated with the ontology in Fig. 2.
5.4 General framework
Proposed entity resolution and redundancy elimination algorithms (sections 5.2, 5.3) are integrated into a general framework for matching and merging (Fig. 6). The framework represents a complete solution, allowing a joint control over various dimensions of matching and merging execution. Each component of the framework is briefly presented in the following, and further discussed in section 6.
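The value selection in line 3 of Algorithm 2 (equation (32)) can be sketched as follows; the observations of one attribute within a resolved cluster are given as plain (value, trust) pairs:

from typing import List, Tuple

def most_probable_value(observations: List[Tuple[str, float]]) -> str:
    """Equation (32): arg max over v of Π_{ki.a=v} T(ki.a) · Π_{ki.a≠v} (1 − T(ki.a))."""
    candidates = {value for value, _ in observations}
    def score(v: str) -> float:
        p = 1.0
        for value, trust in observations:
            p *= trust if value == v else 1.0 - trust
        return p
    return max(candidates, key=score)

# Two chunks say "married" (trusts 0.9 and 0.4), one says "single" with trust 0.95;
# the trust-weighted product picks "single" despite the 2-to-1 count.
print(most_probable_value([("married", 0.9), ("married", 0.4), ("single", 0.95)]))  # single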
Fig. 6. General framework for matching and merging data from heterogeneous sources.
Initially, data from various sources is preprocessed appropriately. Every network or ontology is transformed into a knowledge chunk representation and, when needed, also inferred on an absent architecture level (section 3.3). After preprocessing is done, all data is represented in the same, easily manageable, form, allowing for common, semantically elevated, subsequent analyses.
Prior to entity resolution, attribute resolution is done (section 5.2). The process resolves and matches attributes in the heterogeneous datasets, using the same algorithm as for entity resolution. As all data is represented in the form of knowledge chunks, this actually unifies all the underlying networks and ontologies.
Next, the proposed entity resolution and redundancy elimination algorithms are employed (sections 5.2, 5.3). The process thus first resolves entities in the data, and then uses this information to eliminate the redundancy and to merge the datasets at hand. The algorithms explore not only the relations in the data, but also the semantics behind it, to further improve the performance.
Last, postprocessing is done, in order to discard all artificially inferred data and to translate knowledge chunks back to the original network or ontology representation (section 3). Throughout the entire execution, components are jointly controlled through (defined) user, data and trust contexts (section 5.1). Furthermore, contexts also manage the results of the algorithms, to account for the specific needs of each scenario.
Every component of the framework is further enhanced, to allow for proper trust management, and thus also for efficient security assurance. In particular, all the similarity measures for entity resolution are trust-aware; moreover, trust is even used as a primary evidence in the redundancy elimination algorithm. The introduction of trust-aware and security-aware algorithms represents the main novelty of the proposition.
6 DISCUSSION
The following section discusses key aspects of the proposition.
The proposed framework for matching and merging represents a general and complete solution, applicable in all diverse areas of use. The introduction of contexts allows a joint control over various dimensions of matching and merging variability, providing for the specific needs of each scenario. Furthermore, the data architecture combines simple (relational) data with semantically enriched data, which makes the proposition applicable for any data source. The framework can thus be used as a general solution for merging data from heterogeneous sources, and also merely for matching.
The fundamental difference between matching, including only attribute and entity resolution, and merging, including also redundancy elimination, is, besides the obvious, in the fact that merged data is read-only. Since datasets obtained after merging do not necessarily resemble the original datasets, the data cannot be altered so that the changes would apply also in the original datasets. An alternative approach is to merely match the given datasets and to merge them only on demand. When altering matched data, the user can change the original datasets (which are in this phase still represented independently) or change the merged dataset (that was previously demanded), in which case he must also provide an appropriate strategy for how the changes should be applied in the original datasets.
Proposed algorithms employ relational data, semantically enriched with ontologies. With the advent of the Semantic Web, ontologies are gaining importance mainly due to the availability of formal ontology languages. These standardization efforts promote several notable uses of ontologies, like assisting in communication between people, achieving interoperability (communication) among heterogeneous software systems and improving the design and quality of software systems. One of the most prominent applications is in the domain of semantic interoperability.
While pure semantics concerns the study of meanings, semantic elevation means to achieve semantic
interoperability and can be considered as a subset of
information integration (including data access, aggregation, correlation and transformation). Semantic elevation of proposed matching and merging framework
represents one major step towards this end.
Use of trust-aware techniques and algorithms introduces several key properties. Firstly, an adequate
trust management provides means to deal with uncertain or questionable data sources, by modeling
trustworthiness of each provided value appropriately.
Secondly, algorithms jointly optimize not only entity
resolution or redundancy elimination of provided
datasets, but also the trustworthiness of the resulting datasets. The latter can substantially increase the
accuracy. Thirdly, trustworthiness of data can be used
also for security reasons, by seeing trustworthy values
as more secure. Optimizing the trustworthiness of
matching and merging thus also results in an efficient
security assurance.
Next, we discuss the main rationale behind the introduction of contexts. Although contexts are merely a way to guide the execution of some algorithm, their definition is rather different from that of any simple parameter. The execution is controlled with the mere definition of the contexts, whereas in the case of parameters, it is controlled by assigning different values. For instance, when default behavior is desired, the parameters still need to be assigned, while in the case of contexts, the algorithm is used as it is. For any
general solution, working with heterogeneous clients,
such behavior can significantly reduce the complexity.
As different contexts are used jointly throughout
matching and merging execution, they allow a collective control over various dimensions of variability.
Furthermore, each execution is controlled and also
characterized with the context it defines, which can
be used to compare and analyze different executions
or matching and merging algorithms.
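The following small sketch, based purely on our reading of the paragraph above and not on the framework's actual interface, illustrates the practical difference: the default context requires no configuration at all, while a client that needs different behaviour defines a context once and passes it in.

```python
# Illustrative sketch (an assumed design, not the framework's actual API) of
# the difference between parameters and contexts discussed above: with
# contexts the algorithm runs as-is by default, and a client only *defines*
# a context when non-default behaviour is needed.

from difflib import SequenceMatcher

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

class MatchingContext:
    """Default context: sensible behaviour with no configuration at all."""
    similarity_threshold = 0.80
    use_semantics = True

class BibliographicContext(MatchingContext):
    """A client-specific context is defined once and simply passed in."""
    similarity_threshold = 0.95   # stricter matching for citation data

def match(records, context=MatchingContext()):
    # The execution is steered by the definition of the context itself,
    # not by a list of individually assigned parameters.
    return [(a, b) for a in records for b in records
            if a < b and similarity(a, b) >= context.similarity_threshold]

pairs_default = match(["J. Smith", "J Smith", "A. Jones"])
pairs_strict = match(["J. Smith", "J Smith", "A. Jones"], BibliographicContext())
print(pairs_default, pairs_strict)
```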
Last, we briefly discuss a possible disadvantage of the proposed framework. As the framework represents a general solution, applicable across diverse domains, the performance of some domain-specific approach or algorithm can still be superior. However, such approaches commonly cannot be generalized and are thus inappropriate for practical (general) use.
7 CONCLUSION
The article proposes a general framework, and accompanying algorithms, for matching and merging data from heterogeneous sources. All the proposed algorithms are trust-aware, which enables the use of appropriate trust management and security assurance techniques. An adequate data architecture supports not only (pure) relational data, but also semantically enriched data, to promote semantically elevated analyses that thoroughly explore the data at hand. Matching and merging is done using state-of-the-art collective entity resolution and redundancy elimination algorithms that are managed and controlled through the use of different contexts. The framework thus allows joint control over various dimensions of variability of the matching and merging execution.
Further work will include an empirical evaluation of the proposition on large testbeds. Next, soft computing and fuzzy logic will be introduced for context manipulation and trust management, to provide for the inexactness of contexts and the ambiguity of trust phenomena. Moreover, trust management will be advanced to a collective approach, resulting also in a collective redundancy elimination algorithm. Last, all the proposed algorithms will be adapted to hypernetworks (or hypergraphs), to further generalize the framework.
ACKNOWLEDGMENT
This work has been supported by the Slovene Research Agency ARRS within the research program P2-0359.
REFERENCES
[1] I. Bhattacharya and L. Getoor, “Iterative record linkage for cleaning and integration,” in Proceedings of the ACM SIGKDD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2004, pp. 11–18.
[2] W. W. Cohen, “Data integration using similarity joins and
a word-based information representation language,” ACM
Transactions on Information Systems, vol. 18, no. 3, pp. 288–321,
2000.
[3] M. Hernandez and S. Stolfo, “The merge/purge problem for
large databases,” Proceedings of the ACM SIGMOD International
Conference on Management of Data, pp. 127–138, 1995.
[4] M. Lenzerini, “Data integration: A theoretical perspective,” in
Proceedings of the ACM SIGMOD Symposium on Principles of
Database Systems, 2002, pp. 233–246.
[5] R. Ananthakrishna, S. Chaudhuri, and V. Ganti, “Eliminating
fuzzy duplicates in data warehouses,” in Proceedings of the
International Conference on Very Large Data Bases, 2002, pp. 586–
597.
[6] D. Kalashnikov and S. Mehrotra, “Domain-independent data cleaning via analysis of entity-relationship graph,” ACM Transactions on Database Systems, vol. 31, no. 2, pp. 716–767, 2006.
[7] A. Monge and C. Elkan, “The field matching problem: Algorithms and applications,” Proceedings of the International
Conference on Knowledge Discovery and Data Mining, pp. 267–
270, 1996.
[8] S. Castano, A. Ferrara, and S. Montanelli, “Matching ontologies in open networked systems: Techniques and applications,” Journal on Data Semantics, pp. 25–63, 2006.
[9] ——, “Dealing with matching variability of semantic web data
using contexts,” in Proceedings of the International Conference
on Advanced Information Systems Engineering, 2010, to be presented.
[10] J. Euzenat and P. Shvaiko, Ontology matching. Springer-Verlag,
2007.
[11] E. Rahm and P. A. Bernstein, “A survey of approaches to
automatic schema matching,” Journal on Very Large Data Bases,
vol. 10, no. 4, pp. 334–350, 2001.
[12] I. Bhattacharya and L. Getoor, “Collective entity resolution in relational data,” ACM Transactions on Knowledge Discovery from Data, vol. 1, no. 1, p. 5, 2007.
[13] X. Dong, A. Halevy, and J. Madhavan, “Reference reconciliation in complex information spaces,” in Proceedings of the ACM
SIGMOD International Conference on Management of Data, 2005,
pp. 85–96.
[14] M. Nagy, M. Vargas-Vera, and E. Motta, “Managing conflicting
beliefs with fuzzy trust on the semantic web,” in Proceedings
of the Mexican International Conference on Advances in Artificial
Intelligence, 2008, pp. 827–837.
[15] M. Richardson, R. Agrawal, and P. Domingos, “Trust management for the semantic web,” in Proceedings of the International
Semantic Web Conference, 2003, pp. 351–368.
[16] M. Blaze, J. Feigenbaum, and J. Lacy, “Decentralized trust
management,” in Proceedings of the IEEE Symposium on Security
and Privacy, 1996, pp. 164–173.
[17] S. Chakrabarti, B. Dom, P. Raghavan, S. Rajagopalan, D. Gibson, and J. Kleinberg, “Automatic resource compilation by
analyzing hyperlink structure and associated text,” Proceedings
of the International World Wide Web Conference, pp. 65–74, 1998.
[18] T. Joachims, “A probabilistic analysis of the rocchio algorithm
with TFIDF for text categorization,” in Proceedings of the International Conference on Machine Learning, 1997, pp. 143–151.
[19] P. Domingos and M. Richardson, “Mining the network value
of customers,” in Proceedings of the International Conference on
Knowledge Discovery and Data Mining, 2001, pp. 57–66.
[20] J. M. Kleinberg, “Authoritative sources in a hyperlinked environment,” Journal of the ACM, vol. 46, no. 5, pp. 604–632,
1999.
[21] H. Kautz, B. Selman, and M. Shah, “Referral web: combining
social networks and collaborative filtering,” Communications of
the ACM, vol. 40, no. 3, pp. 63–65, 1997.
[22] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl,
“GroupLens: an open architecture for collaborative filtering
of netnews,” in Proceedings of ACM Conference on Computer
Supported Cooperative Work, 1994, pp. 175–186.
[23] D. Trcek, “A formal apparatus for modeling trust in computing
environments,” Mathematical and Computer Modelling, vol. 49,
no. 1-2, pp. 226–233, 2009.
[24] T. R. Gruber, “A translation approach to portable ontology
specifications,” Knowledge Acquisition, vol. 5, no. 2, pp. 199–
220, 1993.
[25] S. Castano, A. Ferrara, and S. Montanelli, “The iCoord knowledge model for P2P semantic coordination,” in Proceedings of
the Conference on Italian Chapter of AIS, 2009.
[26] A. Lapouchnian and J. Mylopoulos, “Modeling domain variability in requirements engineering with contexts,” in Proceedings of the International Conference on Conceptual Modeling.
Gramado, Brazil: Springer-Verlag, 2009, pp. 115–130.
[27] V. Levenshtein, “Binary codes capable of correcting deletions,
insertions, and reversals,” Soviet Physics Doklady, vol. 10, no. 8,
pp. 707–710, 1966.
[28] W. W. Cohen, P. Ravikumar, and S. E. Fienberg, “A comparison of string distance metrics for name-matching tasks,” in
Proceedings of the IJCAI Workshop on Information Integration on
the Web, 2003, pp. 73–78.
[29] E. Moreau, F. Yvon, and O. Cappé, “Robust similarity measures
for named entities matching,” in Proceedings of the International
Conference on Computational Linguistics, 2008, pp. 593–600.
[30] M. A. Jaro, “Advances in record linkage methodology as applied to the 1985 census of Tampa, Florida,” Journal of the American Statistical Association, vol. 84, no. 406, pp. 414–420, 1989.
[31] W. E. Winkler, “String comparator metrics and enhanced
decision rules in the Fellegi-Sunter model of record linkage.”
in Proceedings of the Section on Survey Research Methods, 1990,
pp. 354–359.
[32] L. Adamic and E. Adar, “Friends and neighbors on the web,”
Social Networks, vol. 25, pp. 211–230, 2001.
Evaluation models for e-learning platforms
and the AHP approach: a case study
Colace, F.; De Santo, M.
Abstract - Our “information-oriented” society shows an increasing exigency for life-long learning. In this framework, the E-Learning approach is becoming an important tool to provide the flexibility and quality required by such a kind of learning process. In the recent past, a great number of on-line platforms have been introduced on the market, showing different characteristics and services. With a plethora of E-Learning providers and solutions available on the market, organizations face a new kind of problem consisting in the selection of the most suitable E-Learning suite. This paper proposes a model for describing, characterizing and selecting E-Learning platforms. The E-Learning solution selection is a multiple criteria decision-making problem that needs to be addressed objectively, taking into consideration the relative weights of the criteria for any organization. We formulate the quoted multi-criteria problem as a decision hierarchy to be solved using the Analytic Hierarchy Process (AHP). In this paper we show the general evaluation strategy and some results obtained using our model to evaluate some existing commercial platforms.
Keywords – E-Learning, E-Learning Platform, Multiple Criteria Decision Making Problem

Manuscript received April 7th, 2008. The authors are with the Dipartimento di Ingegneria dell’Informazione e Ingegneria Elettrica – DIIIE, Università degli Studi di Salerno, Via Ponte don Melillo, 1, 84084 Fisciano (SA), Italy; e-mail of the contact author: [email protected]

Introduction
The whole world is undergoing a change that
maybe is the most important one in the last thirty
years, and, through the spreading of new
information technologies, is deeply modifying
relations among countries, markets, people and
culture. The technological revolution has clearly
promoted a globalization process (nowadays
Internet represents the global village) and
information exchange. Information can be
considered as an economical value whose
significance is closely associated with the
knowledge that it offers. Updated knowledge is a
fundamental and decisive aspect of professions
related to the New Economy but the new society’s
dynamism does not well adapt itself to past
training models developed in more static or slowly
changeable contexts [1]. The continuous need of
new knowledge and competences has really
shattered this boundary and professional people
have to qualify themselves and to be willing to
acquire new knowledge. So new didactic models
have arisen. In this scenario one of the most
promising approaches is the E-Learning approach.
Several enabling factors played key role in today
developments, including, among the other, the
wide acceptance of the concept of Learning Objects, the availability of several E-Learning platforms and the diffusion of standards, like SCORM, to improve interoperability. The evaluation of E-Learning platforms requires evaluating not only the implementing software package, but additional features as well, including, among others, the supported teaching and delivery schema, the provided QoS and so on. With respect to this question, both pedagogical and technological aspects must be carefully evaluated. In the first case, it is necessary to develop new training models clearly defining how to organize new training paths and the didactic contents associated with them, as well as how to provide these contents in relation to the user who benefits from them. As for the technological aspect, new tools for distributing knowledge must be created, tools able to reproduce pedagogical training models as efficiently as possible. In fact, a series of features should be taken into account when one evaluates E-Learning platforms, starting from the function and usability of the overall learning system in the context of the human, social and cultural organization within which it is to be used. Obviously, the analysis of the features of a system is not sufficient: it is also important to understand how they are integrated to facilitate learning and training and what principles are applied to guide the way the system is used. To evaluate them, both pedagogical and technological aspects must be carefully considered. So the goal of this paper is to show a model for selecting the most suitable E-Learning solution taking into account its technological and pedagogical aspects. In the literature there are many approaches to the evaluation of E-Learning platforms. A common approach is the introduction of evaluation grids able to assess the various aspects of an E-Learning platform. The weak point of this approach lies in the subjectiveness of the judgements. The starting point of the proposed model is the formulation of a multi-criteria decision problem to be solved by the Analytic Hierarchy Process (AHP). The hierarchical structure of the problem allows the decision maker to compare the various features that characterize E-Learning platforms. The Analytic Hierarchy Process (AHP) is a decision-aiding method developed by Saaty [2][3][4]. It aims at quantifying relative priorities for a given set of alternatives on a ratio scale, based on the judgment of the decision-maker, and stresses the importance of the intuitive judgments of a decision-maker as well as the consistency of
the comparison of alternatives in the decision-making process. Since a decision-maker bases judgments on knowledge and experience, and then makes decisions accordingly, the AHP approach agrees well with the behaviour of a decision-maker. The strength of this approach is that it organizes tangible and intangible factors in a systematic way, and provides a structured yet relatively simple solution to decision-making problems [5][6]. So the real aim of this paper is to introduce the application of the AHP to E-Learning platform evaluation. The paper briefly reviews the concepts and applications of E-Learning platforms and of multiple criteria decision analysis, together with the AHP's implementation steps. Finally, we present the results obtained by applying the proposed approach to some existing commercial and open source E-Learning platforms.
Learning Content Management System
(LCMS)
A Learning Content Management System includes
all the functions enabling creation, description,
importation or exportation of contents as well as
their reuse and sharing. Contents are generally
organized into independent containers, called
learning objects, able to satisfy one or more
didactic goals. An advanced LCMS must be able
to store interactions between the user and each
learning object, aiming at gathering detailed
information about their utilization and efficacy.
When one talks about on-line learning, it is natural
to think of interactive media-based contents.
Actually, this is only a part of the widespread
contents. The contents available before the
spreading of on-line learning were mainly
documents, and most of them have been
proposed as didactic material in HTML format for
on-line courses. In addition, interactive media
have been sometimes introduced, such as audio,
video or training resources created by using other
multimedia tools (for example, Flash). A good
LCMS should accurately choose the contents to
be offered to the student during the lessons as
well as the way in which they must be provided.
The importance of LCMS is related to the growing
distance learning request that is determining a
significant increase in content production. The
current effort is to avoid a useless duplication of
contents by realizing learning objects consonant
to given standards in order to reuse them in
different contexts and platforms. All the contents
must be appropriately stored in special
repositories and be easily accessible and
updatable. In fact, a LCMS must be designed so
as to enable a constant updating of its contents,
allowing this process (if possible) to take place semi-automatically. It is important to
point out that, from our point of view, contents are
not considered as objects external to the platform
but as integral parts of it. This is possible thanks
to the services that constitute the learning content
management system. The trend towards a growing number of training resources, though necessary to better characterize the training process, does not allow the teacher an easy consultation and use of them. At the same time, such a large number of resources can disorient students, who run the risk of not choosing, during the self-training phase, the contents most suitable for them. A solution to this problem is
given by a more detailed description for each
content so as to avoid ambiguity or duplication
among them. In particular, some information will
E-Learning Platforms
The Internet offers effective tools for exchanging
information that can be used in different ways for
on-line learning. Chat (textual message exchange)
and e-mail are currently the most widespread
ones, since they have first arisen in the Internet
world. However, new technologies and the use of
wider transmitting bands allow to utilize
audio/video communication tools in real time as
well as to share multimedia contents. At first, online learning platforms had to integrate such
services. NetMeeting application developed by
Microsoft is a useful example to understand how a
distance learning tool was structured. NetMeeting
offers such services as on-line textual chat,
videoconferencing, audio chat, application sharing
and whiteboards. At least until the first half of the
90s, this was the predominant way of organizing
distance education platforms. Once the technological problems related to the delivery and implementation of such services were resolved, industries began to improve platforms by
introducing modules and services able to manage
pedagogical aspects (associated with the training
process) [7] as well as content updating and
availability. Most contemporary e-learning platforms can be viewed as organized into
three fundamental macro components: a Learning
Management System (LMS), a Learning Content
Management System (LCMS) and a Set of Tools
for distributing training contents and for providing
interaction [8]. The LMS integrates all the aspects
for managing on-line teaching activities. The
LCMS offers services that allow managing
contents while paying particular attention to their
creation, importation and exportation. The Set of
Tools represents all the services that manage
teaching processes and interactions among users.
In the following, after describing in detail the characteristics of the LCMS, LMS, and Set of Tools, the technological and pedagogical requisites for a distance learning application will be defined, in order to outline an evaluation model.
learning platform that aims to be compatible with a
high number of hardware platforms, operating
systems and standard applications. Standardized
descriptions of users can be then used within the
platform to store personal data, training profiles
and the most significant events characterizing
their training path. A LMS must implement a
functionality that adds a significant value to the
distance learning process. This functionality is that
enabling the student to consult, at any time,
results he/she has reached and, consequently, to
monitor his/her preparation level. This possibility
allows the student to understand his/her own gaps
and, possibly, to identify the training contents
more suitable to his formative requirements [13].
As for course management, an LMS can generally
manage self-paced, asynchronous instructor-led
and synchronous instructor-led courses. Self-paced courses are usually asynchronous, in
hypertextual format, and give much freedom to
the student who accesses a course index. The
LMS system manages these courses starting from
their creation. Asynchronous courses are run by
an instructor, but they do not foresee interactive
moments between students and instructor. Their
design foresees the delivery of strongly multimedia-oriented contents. Synchronous courses generally
make use of collaborative learning that is of all
the tools that allow creating interactions in real
time between students and instructor. The LMS
must keep track of who is present at the courses.
These functions are useful to students, who can
know how they are using the course, and teachers,
who can control student participation in the
courses, as well as to administrators that evaluate
the use of on-line courses in order to determine
their efficiency and convenience.
support the content so as to better identify the
domain in which resources are included and to
draw LCMS and teacher’s attention to the most
peculiar characteristics of the training content. In
literature, this descriptive process is known as
metadata description [9]. At present, the scientific
community and industries engaged in this field are
trying to define standard metadata rules, so as to
encourage understanding of the real semantic
content of the various training resources. From
this point of view, such organizations as LTSC
supported by IEEE or IMS Global Learning
Consortium [10][11] are trying to create
standardization rules and processes able to
describe training resources as well as the user
and training paths. Therefore, the aim is not only
to facilitate and automate research and training
resource acquisition over the web, but also to find
the contents that better satisfy the student training
needs [12].
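As a purely illustrative example of such a metadata description, the record below shows the kind of fields (loosely inspired by the IEEE LTSC LOM categories mentioned above) that an LCMS could attach to a learning object; the element names and values are hypothetical and not taken from any specific standard document.

```python
# Hypothetical example of a metadata description attached to a learning
# object, with fields loosely inspired by the IEEE LTSC LOM vocabulary
# mentioned above. The exact element names and values are illustrative only.

learning_object_metadata = {
    "general": {
        "title": "Introduction to Relational Databases",
        "language": "en",
        "keywords": ["SQL", "normalization", "ER model"],
    },
    "educational": {
        "interactivity_type": "expositive",
        "typical_learning_time": "PT45M",   # ISO 8601 duration
        "intended_end_user_role": "learner",
    },
    "technical": {
        "format": "text/html",
        "size_bytes": 524288,
    },
    "rights": {"cost": "no", "copyright": "CC BY-SA 4.0"},
}

# Such a record lets an LCMS index, retrieve and reuse the content across
# courses and platforms without inspecting the content itself.
```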
Learning Management System (LMS)
The Learning Management System (LMS)
embraces all the services for managing on-line
teaching activities. In particular, it aims to offer
management functionality to training platform
users: system administrators, teachers and
students. From students’ point of view, a LMS
must offer services able to evaluate and report the
acquired skills storing the training path followed by
them. The System administrator should have the
possibility of drawing up statistics on the use of
platform services in order to better organize online learning service delivery. A LMS should give
the teacher the possibility of verifying the right
formulation of the various lessons and suggesting
changes (in case it is semi-automatically inferred
from student tracking) in the learning path.
Therefore, the functionalities of a LMS integrated
within a distance learning platform can be
synthesized as follows:
• Student management
• Course management
• Student skill assessment
• Student activity monitoring and tracking
• Activity reporting
A student management system integrated within a
LMS must manage a database containing
standardized descriptions of student data so as to
better identify the user and his/her characteristics.
This type of description is generally based on the XML meta-language (Extensible Markup Language), an element that guarantees data portability. When we talk about portability, we refer to the possibility of accessing a resource, in this case the students’ descriptions, independently of the computer type and operating system. This characteristic is necessary for an e-
Tools for delivering and accessing
contents
On-line training efficiency is directly related to the
tools made available by the delivery platform as
well as to their usage easiness. The services
should satisfy teacher and student needs and it is
therefore necessary that the same kinds of
services are different in accordance with the user.
In particular, teachers should be provided with
tools enabling them to manage teaching
processes for single individuals or groups, as well
as all the interactions, including asynchronous
discussions or live events. In addition, it is
important to provide the teacher with updated
reports on learner or learner groups’ progresses
so as to better manage evaluation processes and
facilitate activities. Besides, it is necessary to give
students the possibility of synchronously and
asynchronously communicating with both the
teacher and other students. We will shortly
analyze some of the most popular services that
Some platforms can include, within their own
infrastructures, functionalities for exchanging email messages, but most of them allow the
integration with tools developed just for this
purpose, such as Outlook Express, Netscape
Messenger, Eudora, etc.
characterize on-line training platforms from a
collaborative point of view, and that they tend to
integrate within themselves. The Virtual
Classroom Service is a service designed for
distributing courses in a synchronous mode, and
also for supporting on-line live teaching. This type
of service aims to reproduce the mechanisms
present in a classroom during a traditional training
session and is considered as a kind of container
in which all the services able to recreate a virtual
classroom atmosphere will be included. The use
of a virtual classroom is obviously foreseen during
“live” lessons in order to better manage
synchronous interactions. The synchronous
communication systems are based on audio and
video conferencing technologies. The possibility of transmitting videoconferencing over the network has been implemented through the introduction of movie compression techniques that reduce the bandwidth used during transmission in comparison with uncompressed movies, intelligibility being equal.
However, it is true that compressed video stream
representations do not generally guarantee high
definition movie reproductions. The latter can be
anyway obtained by using high capability
transmitting channels (a satellite channel, for
example), whose utilization can be more
expensive. Audio/video conferencing tools allow
the display and dialogue in real-time among the
various members located in remote areas. The
interface generally presents a window in which the
video captured by a video camera is displayed.
Another service enabling synchronous communication within e-learning platforms is
provided by chat. This service allows participants
to send textual messages to the other students or
the teacher in a public mode (all the participants
see all the things) or a private one (only who is
directly involved receives the communication).
Chat service surely increases collaboration within
the environment in which it is used, but the
teacher or tutor must continuously monitor its
utilization, since it could lead to a lack of attention
and confusion within the virtual classroom. In
addition to a textual chat, the most recent
platforms tend to implement a vocal one by using
VoIP mechanisms. From an historical point of
view, the whiteboard has been one of the first
services made available by an online learning
platform. This service makes it available and
shareable to teachers and learners a virtual space,
usually called whiteboard. Both teachers and
learners can work with it by virtue of control rights.
This tool allows to write and draw on a shared
space and to display PowerPoint presentations
and images. E-mail has been one of the first
asynchronous communication tools used by elearning environments. Thanks to this service,
students can send messages to a specific
addressee only by having his/her e-mail address.
Characterizing distance learning platforms
As previously discussed, an on-line learning
platform can be characterized through an analysis
that takes into account:
• the adopted teaching methodologies
• the level of the training path personalization
• operative modalities and didactic interaction quality
• learning assessment and student tracking methods
• typology and quality of both didactic material and support system
In order to meet the exigencies of distance
training processes, support technologies should
also have characteristics that make the training
process functional and available. In particular, the
student should be allowed to fully benefit from
auto-learning, auto motivation and auto-evaluation
methods [14], and at the same time tutor and
teachers should be provided with a direct and
constant contact with the learners. So distance
learning platforms must adopt a pedagogical approach based on constructivism, a theory grounded in the results of Piaget's research [15].
Constructivist learning is based on students'
active participation in problem-solving and critical
thinking regarding a learning activity which they
find relevant and engaging. They are
"constructing" their own knowledge by testing
ideas and approaches based on their prior
knowledge and experience, applying these to a
new situation, and integrating the new knowledge
gained with pre-existing intellectual constructs. So
a constructivist e-learning platform is an
environment where learners collaborate and
support each other using a variety of tools and
resources, as well as an environment where
knowledge is constructed and learners assume a
central role in the cognitive process. On-line
learning platforms can easily implement a constructivist approach [16] because they readily allow:
• encouragement and acceptance of student autonomy and initiative
• encouragement of students to engage in dialogue, both with the teacher and within the group
• continuous feedback
In other words, an on-line learning platform must
be able to efficiently and effectively manage the
single components of the process and their
interactions. A distance learning platform that has
these characteristics must carry out four principal
functions: communication, information sharing, information access and co-operation. These functionalities characterize both the pedagogical and the technological approach. As for technical requisites, the best solution to be adopted in platform design should be based on the utilization of a multilayered, web-based architecture [17][18]. In particular, an e-learning platform must be web-based; in this way the client can access the environment by simply using a web browser, without compelling the user to install other software on his/her computer. This characteristic should always be taken into account by industries producing distance training environments. Thanks to it, students only need a basic knowledge of computer science enabling them to interact with a browser, which also avoids difficult installations of non open source software. Another technical requisite to be considered is portability, that is, the possibility for a platform to work correctly independently of the computer and the operating system on which it runs. Obviously, the possibility of not installing software on the client machine increases system portability, since it guarantees that all clients can use the same services. A further requisite, as previously described, is the system compatibility with the most accredited descriptive standards for training resources and users, such as AICC [19] and IMS [10]. Compatibility with these standards is fundamental, since it allows importing and exporting contents and courses realized by different industries, and gives the platform the possibility of being equipped with a still little used tool: the Intelligent Tutoring System (ITS). An ITS is an application that can semi-automatically reach decisions after acquiring information from the LMS and LCMS. In other words, an ITS has the task of monitoring students’ behaviour and advising them on the most suitable retrieval programs [20]. Besides, on the basis of the acquired data, it can advise the teacher on a different lesson organization and a different use of technology. In fact, a course designer must have the possibility of making the several training process modules interactive, of adapting the training paths to specific learner needs, and of defining new training paths by using those already existing. Such operations are surely speeded up by adopting descriptive standards, even when an ITS is not yet used. Another aspect to be evaluated is related to the services integrated into the LMS and LCMS. As for management, services able to manage enrolments, training paths, and student tracking are really significant and add a new value. Platforms including such systems are surely ahead of others in services, as these tools will represent in the near future the core of an e-learning environment. In general, at present, the indispensable management services are the following:
• services for including and updating the user profile
• services for creating courses and cataloguing them
• services for creating tests described through a standard
• user tracking services
• services for managing reports on course frequency and use
• services for creating, organizing and managing own training contents or contents provided by other producers
The aspect related to the offered services is particularly interesting, because it characterizes the pedagogical approach. An analysis of the teaching tools made available by the various platforms is therefore necessary. These tools, as previously discussed, can be divided into two fundamental categories:
• asynchronous communication tools
• synchronous communication tools
Such tools as e-mail, discussion forums or newsgroups surely belong to the first category. Asynchronous services are really important for an e-learning platform, since they eliminate the space and time limits that can exist among the interlocutors. Tools that belong to the second category are:
• textual or vocal chat
• whiteboard
• live video stream
• virtual classroom
• application and file sharing
Real-time communication is used to carry out at a distance activities that are normally performed in face-to-face meetings. In this way, learners can interact with teachers, creating an atmosphere more similar to that of a traditional classroom. The use of these new technologies leads to a pedagogical approach based on group interactions, where the teacher has the role of facilitating and organizing discussions. This approach challenges traditional teaching methods (in which teachers are dominant and students are passive) and substitutes for them one based on active pedagogy. On the basis of the previous considerations, we have grouped the parameters of interest into four macro fields:
• system requisites
• training resources and course management
• user management
• services offered to users
For each macro field, an evaluation grid has been designed.

The Multiple Criteria Decision Analysis and the AHP Approach
The selection of an E-Learning platform is not a trivial or easy process. Project managers are faced with decision environments and problems in
projects that are complex. The elements of the problems are numerous, and the inter-relationships among the elements are extremely complicated. Relationships between the elements of a problem may be highly nonlinear; changes in the elements may not be related by simple proportionality. Multiple criteria decision-making (MCDM) approaches are major parts of decision theory and analysis. They seek to take explicit account of more than one criterion in supporting the decision process [21]. The aim of MCDM methods is to help decision-makers learn about the problems they face, to learn about their own and other parties' personal value systems, to learn about organizational values and objectives, and, through exploring these in the context of the problem, to guide them in identifying a preferred course of action. In other words, MCDM is useful in circumstances which necessitate the consideration of different courses of action that cannot be evaluated by the measurement of a simple, single dimension [21]. A good solution for the MCDM problem is the AHP approach. After a long period of debate on the effective value of the AHP approach, in fact, Harker and Vargas [22] and Perez [23] proved that the AHP approach is based upon a firm theoretical foundation. The AHP approach is composed of the following steps (a small computational sketch of steps 3–6 is given after Table 2):
1. Define the problem and determine its goal.
2. Structure the hierarchy from the top (the objectives from a decision-maker's viewpoint) through the intermediate levels (criteria on which subsequent levels depend) to the lowest level, which usually contains the list of alternatives.
3. Construct a set of pair-wise comparison matrices (size N×N) for each of the lower levels, with one matrix for each element in the level immediately above, by using the relative scale measurement shown in Table 1. The pair-wise comparisons are done in terms of which element dominates the other.
4. There are n(n−1)/2 judgments required to develop the set of matrices in step 3. Reciprocals are automatically assigned in each pair-wise comparison.
5. Hierarchical synthesis is now used to weight the eigenvectors by the weights of the criteria, and the sum is taken over all weighted eigenvector entries corresponding to those in the next lower level of the hierarchy.
6. Having made all the pair-wise comparisons, the consistency is determined by using the eigenvalue λmax to calculate the consistency index CI as follows: CI = (λmax − n)/(n − 1), where n is the matrix size. Judgment consistency can be checked by taking the consistency ratio (CR) of CI with the appropriate value in Table 2. The CR is acceptable if it does not exceed 0.10; if it is more, the judgment matrix is inconsistent and the judgments should be reviewed and improved.
7. Steps 3–6 are performed for all levels in the hierarchy.
Numerical rating   Verbal judgment of preference
9                  Extremely preferred
8                  Very strongly to extremely
7                  Very strongly preferred
6                  Strongly to very strongly
5                  Strongly preferred
4                  Moderately to strongly
3                  Moderately preferred
2                  Equally to moderately
1                  Equally preferred
Table 1: Pair-wise comparison scale for AHP preferences
Size of matrix (n)        1    2    3     4     5     6     7     8     9     10
Random Consistency (RI)   0    0    0.58  0.90  1.12  1.24  1.32  1.41  1.45  1.49
Table 2: Average random consistency (RI)
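The following minimal Python sketch illustrates steps 3–6 for a single pair-wise comparison matrix. It uses the common geometric-mean approximation of the principal eigenvector rather than a full eigenvalue computation, so it is an illustration of the procedure rather than the exact method prescribed by Saaty; the RI values are those of Table 2, and the example judgments are expressed on the Table 1 scale.

```python
# Minimal computational sketch of steps 3-6 above for one pair-wise
# comparison matrix. The priority vector is obtained with the common
# geometric-mean approximation of the principal eigenvector; lambda_max,
# CI and CR follow the formulas in the text, with RI taken from Table 2.

from math import prod

RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp_priorities(matrix):
    n = len(matrix)
    # Geometric mean of each row, normalized, approximates the weights.
    geo = [prod(row) ** (1.0 / n) for row in matrix]
    weights = [g / sum(geo) for g in geo]
    # lambda_max is estimated as the average of (A*w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)           # consistency index
    cr = ci / RI[n]                           # consistency ratio
    return weights, cr

# Hypothetical judgments for three features of one scenario: the first
# feature is moderately preferred (3) to the second and strongly
# preferred (5) to the third; reciprocals fill the lower triangle.
comparisons = [[1.0,   3.0, 5.0],
               [1/3.0, 1.0, 2.0],
               [1/5.0, 0.5, 1.0]]

weights, cr = ahp_priorities(comparisons)
print([round(w, 2) for w in weights])   # roughly [0.65, 0.23, 0.12]
print(cr < 0.10)                        # judgments are acceptably consistent
```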
THE AHP APPROACH AND THE SELECTION OF AN E-LEARNING PLATFORM

E-Learning platforms have to satisfy some rules in order to be effective and, besides, some platforms can be really effective only in some well defined scenarios. Obviously this is a Multiple Criteria Decision Problem. So the first step is to set the scenarios of interest; in this paper we consider the following cases: an ECDL course, a blended university course, and a professional training course. In the following paragraphs we describe the selected scenarios in more detail. The first step is then the definition of the AHP hierarchy. Obviously, in this case the first level is the selection of the best E-Learning platform for the selected scenario. The second level is composed of features that take into account pedagogical, technological and usability aspects. In particular we have introduced five main features:
• Management
• Collaborative Approach
• Management and enjoyment of interactive learning objects
• Usability
• Adaptation of learning path
Obviously every feature involves, in its determination, some sub-features. In order to test our approach we selected the following platforms: Docent [24], Quasar [25], Claroline [26], IWT [27], Running Platform [28], Moodle [29], ATutor [30], ADA [31], Ilias3 [32] and Docebo [33].
Now we can describe in detail the proposed approach for the various scenarios. We have to point out that the various scenarios are obtained from the analysis of real cases; in particular, we have considered scenarios that occur in our University. The first involves the selection of an E-Learning platform for the delivery of ECDL courses. In this case the platform has to support classes composed of thirty students. These students are not really familiar with the computer world, so the usability feature has to be highly and carefully evaluated. In this scenario the tracking of the progress of the students is also very important. Another characteristic of this user group is the rather limited internet connection bandwidth. The second scenario describes a typical situation: the E-Learning platform has to support the activities of some university courses, so in this scenario management tools are very important; the collaborative tools also have to be considered. The last scenario involves the use of an E-Learning platform in the case of professional training. In this case the target group is not very skilled in ICT technologies and needs to interact with very simple and clear graphical user interfaces, so the usability feature is really important. The tools for the adaptation of the learning path are also important, because the target group could be very heterogeneous. So, according to the AHP approach, we have to compare the various platforms with each other for every feature and scenario. First of all we have to declare the standing of the features ordered by importance. For the various scenarios we have the following standing (Table 3):

ECDL Course:            1. Management; 2. Management and enjoyment of interactive learning objects; 3. Usability; 4. Adaptation of learning path; 5. Collaborative Approach
Blended Course:         1. Management; 2. Management and enjoyment of interactive learning objects; 3. Collaborative Approach; 4. Usability; 5. Adaptation of learning path
Professional Training:  1. Usability; 2. Adaptation of learning path; 3. Management and enjoyment of interactive learning objects; 4. Management; 5. Collaborative Approach
Table 3: Standing of considered features ordered by importance for the considered scenarios

After this phase, in order to obtain a value for every feature, we considered the evaluation grids introduced in [8] in order to evaluate the following indexes.

Management Index
Management Index = IM = Obtained Value for the supported tools / Max Value
This index aims to evaluate how many services for the management of students and of their progress are present in the various platforms. In Table 4 we show the obtained results. In this table the column Weight indicates the relative importance of the feature.
                             Weight  Docent  Quasar  Claroline  IWT   Running  Moodle  ATutor  ADA   Ilias3  Docebo
Progress Tracking               3      3       3        3        3      3        3       3      3      3       0
Multi Course Management         2      2       2        2        2      2        2       2      2      2       2
Student's Group Management      2      2       2        2        0      2        2       2      2      2       2
Contents Insertion              1      1       1        1        1      1        1       1      1      1       1
Contents Sharing                2      2       2        2        2      2        2       2      2      2       2
Standard Contents Import        1      1       1        1        1      0        1       1      0      1       1
Contents Import                 2      2       2        2        2      2        2       2      2      2       2
New Course Creation             1      1       1        1        1      1        1       1      1      1       1
Course Indexing                 1      1       1        1        1      1        1       1      1      1       1
Report                          2      2       2        2        2      0        2       2      2      2       2
Assessment Management           2      2       2        2        2      2        2       2      2      2       2
Course List                     1      1       1        1        1      1        1       1      1      1       1
Assessment Report Analyzer      2      2       2        2        2      2        2       2      2      2       2
On-Line User Registration       1      1       1        1        1      1        1       1      1      1       1
Multi-User Management           1      1       1        1        1      1        1       1      1      1       1
Total                          24     24      24       24       22     21       24      24     23     24      21
IM Index                               1       1        1        0.92   0.87     1       1      0.96   1       0.87
Table 4: Obtained results for the Management Index
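As a worked example of the index definition above, the short sketch below recomputes the IM value of one platform directly from Table 4: the dictionary of weights reproduces the table's Weight column, and the per-platform support values illustrate how the table rows are used.

```python
# Small sketch of how the Management Index in Table 4 is computed: each
# supported service contributes its weight, and the total is normalized by
# the maximum obtainable value (the sum of all weights).

weights = {
    "Progress Tracking": 3, "Multi Course Management": 2,
    "Student's Group Management": 2, "Contents Insertion": 1,
    "Contents Sharing": 2, "Standard Contents Import": 1,
    "Contents Import": 2, "New Course Creation": 1, "Course Indexing": 1,
    "Report": 2, "Assessment Management": 2, "Course List": 1,
    "Assessment Report Analyzer": 2, "On-Line User Registration": 1,
    "Multi-User Management": 1,
}

def management_index(obtained):
    """IM = obtained value for the supported tools / max value."""
    max_value = sum(weights.values())            # 24 for Table 4
    return sum(obtained.get(k, 0) for k in weights) / max_value

# IWT supports everything except Student's Group Management (see Table 4).
iwt = dict(weights, **{"Student's Group Management": 0})
print(round(management_index(iwt), 2))           # 0.92
```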
Collaborative Index
IC = Obtained Value for the supported tools / Max Value
This index aims to evaluate how many “collaborative” services are present in the various platforms. With the term “collaborative” services we mean those platform services allowing interaction among students and/or teachers. In Table 5 we show the obtained results. In this table the column Weight indicates the relative importance of the feature.
                     Weight  Docent  Quasar  Claroline  IWT   Running  Moodle  ATutor  ADA   Ilias3  Docebo
E-Mail                  1      1       1        1        1      1        1       1      1      1       1
Forum                   2      2       2        2        2      2        2       2      2      2       2
Chat                    2      2       2        2        2      2        2       2      2      2       2
Whiteboard              2      2       0        0        2      1        0       2      0      0       0
A/V Streaming           2      2       0        0        2      0        0       0      2      0       0
Contents Download       2      2       2        2        2      2        2       2      2      2       2
Application Sharing     2      2       0        0        2      0        0       0      0      0       0
Virtual Classroom       3      3       3        0        3      0        0       0      0      0       0
Total                  16     16      10        7       16      8        7       9      9      7       7
IC Index                       1       0.62     0.44     1      0.50     0.44    0.56   0.56   0.44    0.44
Table 5: Obtained results for the Collaborative Index
Management and enjoyment of interactive learning objects index
MIO = Obtained Value for the supported tools / Max Value
This index aims to evaluate how many services for the management and enjoyment of interactive learning objects are present in the various platforms. In Table 6 we show the obtained results. In this table the column Weight indicates the relative importance of the feature.
                     Weight  Docent  Quasar  Claroline  IWT   Running  Moodle  ATutor  ADA   Ilias3  Docebo
Whiteboard              2      2       0        0        2      1        0       2      0      0       0
A/V Streaming           3      3       0        0        3      0        0       0      3      0       0
Application Sharing     3      3       0        0        3      0        0       0      0      0       0
Virtual Classroom       3      3       3        0        3      0        0       0      0      0       0
Total                  11     11       3        0        8      1        0       2      3      0       0
MIO Index                      1       0.27     0.00     0.73   0.10     0.00    0.18   0.73   0.00    0.00
Table 6: Obtained results for the Management and enjoyment of interactive learning objects index
Usability
For the usability feature we used a questionnaire introduced by Nielsen [34]. The aim is to evaluate the ease of use of the platforms and of their interfaces. The obtained results are depicted in Table 7:

Platform    Usability Index
Docent      0.65
Quasar      0.70
Claroline   0.85
IWT         0.65
Running     0.75
Moodle      0.80
ATutor      1.00
ADA         0.85
Ilias3      0.70
Docebo      0.85
Table 7: Obtained results for the Usability Index

Adaptation of user's formative learning path index
LPA = Obtained Value for the supported tools / Max Value
This index aims to evaluate how many services for the adaptation of the user's formative learning path are present in the various platforms. These services have to allow the creation of personalized learning paths and the continuous assessment of students. In Table 8 we show the obtained results. In this table the column Weight indicates the relative importance of the feature.
                             Weight  Docent  Quasar  Claroline  IWT   Running  Moodle  ATutor  ADA   Ilias3  Docebo
Progress Tracking               3      3       3        3        3      3        3       3      3      3       0
Student's Group Management      2      2       2        2        0      2        2       2      2      2       2
Report                          3      3       3        3        3      0        3       3      3      3       3
Assessment Management           2      2       2        2        2      2        2       2      2      2       2
Multi-User Management           1      1       1        1        1      1        1       1      1      1       1
Total                          11     11      11       11        9      8       11      11     11     11       8
LPA Index                              1.00    1.00     1.00     0.82   0.73     1.00    1.00   1.00   1.00    0.73
Table 8: Obtained results for the Adaptation of user's formative learning path index
At the end of this phase we can compare the “relative” results obtained by the platforms for every feature in order to obtain a standing. According to the AHP approach we defined the “absolute” weight of every feature keeping in mind the constraints of the selected scenario. According to the AHP strategy we can compose the results in the following way:

Platform Final Score = Σ_{i=1,…,5} Weight_i × PlatformValue_i

The obtained results for each scenario are depicted in the next figures:
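The sketch below shows how the composition formula above can be applied in practice. The scenario weights are made-up illustrative numbers (in the paper they result from the pair-wise comparisons for each scenario), while the per-feature values of the two example platforms are taken from Tables 4–8.

```python
# Sketch of the final composition formula given above: each platform's
# normalized feature values are combined with the scenario-specific feature
# weights. The weights below are hypothetical; the feature values for two
# platforms are taken from Tables 4-8.

scenario_weights = {          # hypothetical weights for one scenario
    "management": 0.35, "collaborative": 0.10, "interactive_objects": 0.25,
    "usability": 0.20, "adaptation": 0.10,
}

platform_values = {
    "Docent":    {"management": 1.00, "collaborative": 1.00,
                  "interactive_objects": 1.00, "usability": 0.65,
                  "adaptation": 1.00},
    "Claroline": {"management": 1.00, "collaborative": 0.44,
                  "interactive_objects": 0.00, "usability": 0.85,
                  "adaptation": 1.00},
}

def final_score(values, weights):
    """Platform Final Score = sum_i weight_i * platform_value_i."""
    return sum(weights[f] * values[f] for f in weights)

for name, values in platform_values.items():
    print(name, round(final_score(values, scenario_weights), 3))
```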
Figure 1: Obtained Results for the ECDL scenario (final AHP scores of the ten platforms)
Figure 2: Obtained Results for the blended course scenario (final AHP scores of the ten platforms)
Figure 3: Obtained Results for the professional training scenario (final AHP scores of the ten platforms)
The AHP approach allows us not only to evaluate the platforms but also to test their application in a well defined scenario. In fact, the Docent platform has very good results in the first two scenarios, while in the third this is no longer true, because in the third case the management and collaborative tools are not very important. The obtained results confirm that the difference between commercial and open source platforms is in general still very large, but our method shows that in some scenarios this is not the case; in such cases it can suggest the use of a cheaper platform.
Conclusion
In order to accurately evaluate the potentialities of an online learning platform, it is important to pay attention to its three main components: the Learning Management System, the Learning Content Management System and the virtual environment for teaching with the services associated with it. An efficient system must be able to integrate all these components so that they can efficaciously interact with each other. Besides, it is necessary that such platforms make reporting services available, so as to allow accurate analyses of the activities carried out by users. One of the most interesting problems is the introduction of a general and objective model for the evaluation of E-Learning platforms. This task is not trivial, because a good evaluation model has to take into account not only the platform and its services but also the scenario where it has to work. So in this paper we have introduced an evaluation model based on the use of the AHP approach. The AHP approach, in fact, is useful in circumstances which necessitate the consideration of different courses of action that cannot be evaluated by the measurement of a simple, single dimension. In this way we can evaluate an E-Learning platform considering both its application in the scenario of interest and its comparison with the other considered platforms. We tested our approach on the selected E-Learning platforms in three scenarios. The obtained results are encouraging. The proposed method, in fact, does not only evaluate the platform but also its effectiveness in the considered scenario. In this paper, for example, we showed that in some scenarios the performance of a commercial platform such as Docent is similar to that of “academic” frameworks. We aim to extend the proposed approach to new scenarios and platforms.
References
[1] Ubell R., "Engineers turn to E-Learning", IEEE Spectrum, Volume 37, 2000.
[2] Saaty T.L., "Decision Making for Leaders", Belmont, California: Life Time Learning Publications, 1985.
[3] Saaty T.L., "How to make a decision: the analytic hierarchy process", European Journal of Operational Research, North-Holland, 1990.
[4] McCaffrey J., "The Analytic Hierarchy Process", MSDN Magazine, June 2005 (Vol. 20, No. 6).
[5] Drake P.R., "Using the Analytic Hierarchy Process in Engineering Education", International Journal of Engineering Education, 1998.
[6] Skibniewski M.J., Chao L., "Evaluation of advanced construction technology with AHP method", Journal of Construction Engineering and Management, ASCE, 1992.
[7] Jonassen D.H., "Thinking Technology, toward a Constructivistic Design Model", Educational Technology XXXIV, 1994.
[8] Colace F., De Santo M., Vento M., "Evaluating On-line Learning Platforms: a Case Study", Proceedings of the 36th Hawaii International Conference on System Sciences (HICSS'03), 2003.
[9] Berners-Lee T., "Metadata Architecture", unpublished white paper, January 1997, http://www.w3.org/pub/WWW/DesignIssues/Metadata.html
[10] The IMS Enterprise Specification, http://www.imsproject.org/
[11] IEEE Learning Technology Standard Committee, http://www.ltsc.ieee.org/
[12] Schakelman J.L., "The changing role of on-line pedagogy: how instructional management systems, metadata, and problem-based learning combine to facilitate learner-centered instruction", SIGUCCS 2001.
[13] Carchiolo V., Longheu A., Malgeri M., "Learning through Ad-hoc Formative Paths", ICALT 2001.
[14] Barrows H.S., Kelson A.C., "Problem based learning in secondary education and the problem based learning institute", Springfield, 1993.
[15] Piaget J., "To understand is to invent", New York, Grossman, 1973.
[16] Jonassen D.H., Peck K.L., Wilson B.G., Pfeiffer W.S., "Learning with Technology: A Constructivist Perspective", Prentice Hall, 1998.
[17] Drira K., Villemur T., Baudin V., Diaz M., "A Multi-Paradigm Layered Architecture for Synchronous Distance Learning", Proceedings of the 26th Euromicro Conference, 2000.
[18] Anido L., Llamas M., Fernández M.J., Caeiro M., Santos J., Rodríguez J., "A Component Model for Standardized Web-based Education", WWW10, 2001, Hong Kong.
[19] AICC CMI Guidelines for Interoperability, http://www.aicc.org/
[20] Zhou Y., Evens M.W., "A Practical Student Model in an Intelligent Tutoring System", ICTAI, 1999.
[21] Belton V., "Multiple criteria decision analysis: practically the only way to choose", in Hendry L.C., Eglese R.W. (editors), Operational Research Tutorial Papers, 1990.
[22] Harker P.T., Vargas L.G., "The theory of ratio scale estimation: Saaty's analytic hierarchy process", Management Science, 1987; 33(1): 1383.
[23] Perez J., "Some comments on Saaty's AHP", Management Science, 1995; 41(6): 1091–1095.
[24] Docent: http://www.docent.com/
[25] Quasar: http://www.quasaronline.it/elearning/
[26] Claroline: http://www.claroline.net/
[27] IWT: http://www.didatticaadistanza.com/
[28] Running Platform: http://rp.csedu.unisa.it/portal
[29] Moodle: http://moodle.org/
[30] ATutor: http://www.atutor.ca/
[31] ADA: http://ada.lynxlab.com/
[32] Ilias: http://www.ilias.de/
[33] Docebo: http://www.docebo.org/doceboCms/
[34] Nielsen J., "Usability Engineering", Academic Press, San Diego, 1993.
Academic Ranking of World Universities
2009/2010
Mester, G.
Abstract— This paper proposes an analysis of the Academic Ranking of World Universities, published every year, and gives an overview of the present situation in Higher Education. The publication of the “Institute of Higher Education, Shanghai Jiao Tong University” Academic Ranking of World Universities 2009, the Ranking Web of World Universities 2010 (Spain) and the QS World University Rankings 2009 are analyzed. The paper also gives an analysis of the scientific journal publications of professors/researchers from the USA and Europe.
Index Terms— Academic Ranking, Higher
Education, QS World University Rankings 2009,
Ranking Web of World universities 2010, World
Universities.
Table 1. Shanghai World rank list of the top 20
universities in 2009 (05. Nov. 2009)
The Academic Ranking of World Universities – ARWU was first published in June 2003 by the Center for World-Class Universities and the Institute of Higher Education of Shanghai Jiao Tong University. ARWU uses the following indicators to rank world universities:
- the number of alumni and staff winning Nobel Prizes and Fields Medals,
- the number of highly cited researchers selected by Thomson Scientific,
- the number of articles published in the journals Nature and Science,
- the number of articles indexed in the Science Citation Index – Expanded and the Social Sciences Citation Index, and
- per capita performance with respect to the size of an institution [3].
1. OVERVIEW OF THE PRESENT SITUATION IN HIGHER
EDUCATION
Europe has 4752 higher education institutions,
with over 17 million students and 1.5 million
staff.
All across Europe, countries and
universities are in a process of modernization.
From an EU perspective, these reforms are part
of the Lisbon Strategy [1], [2].
According to the publication of the well-known
institution "Institute of Higher Education
Shanghai Jiao Tong University” [3] the rank list
of the top 20 Universities in the world space of
higher education in 2009 appears in the following
order:
According to the latest edition of the Web
Ranking of World Universities [4], [5] published
by the Spanish National Research Council's
Cybermetrics Lab, the rank list of the top 15
universities in the world space of higher
education (from top 8000 universities) in January
2010 looks as follows:
Manuscript received April 29, 2010. Part of this paper is published
in the VIPSI-2010 Conference, Amalfi, Italy, March 4-7, 2010.
Gyula Mester is with the Department of Informatics, University of
Szeged, Hungary (e-mail:
[email protected]).
Table 2. Web Ranking of World Universities: the top 15 universities in January 2010

Comparison of the main world universities' rankings is illustrated in Table 3.

Table 3. Comparison of the main World Universities' Rankings

Distribution by country is illustrated in Table 4.

Table 4. Distribution by country

Distribution by continent is illustrated in Table 5.

Table 5. Distribution by continent

The next table summarizes the actual coverage of the Ranking, in terms of the number of countries and higher education institutions around the world.

Table 6. Coverage of the Webometrics Ranking of World Universities

The ranks of the universities of Serbia, Slovenia and Macedonia in 2010 are illustrated in Tables 7, 8 and 9.
Table 10. QS World rank list of the top 20 universities
in 2009
2. PERFORMANCE RANKING OF SCIENTIFIC PAPERS
FOR WORLD UNIVERSITIES
Table 7. Rank of Universities of Serbia 2010
According to the latest edition of the 2009 Performance Ranking of Scientific Papers for World Universities, published by the Higher Education Evaluation & Accreditation Council of Taiwan, the rank list of the top 10 universities in the world space of higher education in October 2009 was the following:
Table 11. Performance Ranking of Scientific Papers for World Universities in 2009
This annual ranking project began in 2007 and
evaluates and ranks the scientific paper
performance for the top 500 universities
worldwide. Three criteria represented by eight
indicators were used to assess a university’s
overall scientific paper performance:
research productivity (20%),
research impact (30%), and
research excellence (50%); the weighting is sketched below.
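As an illustration of this weighting, the following minimal sketch in Python combines three already-normalised criterion scores into an overall score; the function and score names, the 0-100 scale and the simple linear aggregation are illustrative assumptions and do not reproduce the full HEEACT methodology [7].

# Hypothetical sketch: combining the three criteria with the 20/30/50
# weighting described above. Criterion scores are assumed to be already
# normalised to a 0-100 scale; all names are illustrative only.
WEIGHTS = {
    "research_productivity": 0.20,
    "research_impact": 0.30,
    "research_excellence": 0.50,
}

def overall_score(scores: dict) -> float:
    """Weighted sum of the three criterion scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Example: a university with productivity 70, impact 80 and excellence 90
# receives 0.2*70 + 0.3*80 + 0.5*90 = 83.0.
print(overall_score({"research_productivity": 70,
                     "research_impact": 80,
                     "research_excellence": 90}))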
The QS World University Rankings™ have
become popular since they were launched in
2004. The QS rank list of the top 20 universities
in the world space of higher education looks as
follows [6]:
Table 10. QS World rank list of the top 20 universities in 2009
3. CONCLUSION
On the basis of the performed analysis I think
that we do not have enough time to achieve the
goals of the Lisbon Declaration in the European
Higher Education Area. I propose the adoption
of a new (Lisbon) strategy in the European
Higher Education Area in the year 2010.
ACKNOWLEDGMENT
I would like to acknowledge the great helpfulness of Prof. Dr. Veljko Milutinovic and his encouragement of my overall research agenda in the field of higher education and academic ranking of world universities.
REFERENCES
[1] Gyula Mester, "Academic Ranking of World Universities 2009/2010", Proceedings of the VIPSI Conference, pp. 136, Amalfi, Italy, 2010.
[2] Gyula Mester, "The Lisbon Strategy 2000 in Higher Education of Europe", Proceedings of the International Conference on Advances in the Internet, Processing, Systems, and Interdisciplinary Research, VIPSI 2009, pp. 1-5, ISBN: 86-7466-117-3, Belgrade, Serbia, 2009.
[3] http://www.arwu.org/index.jsp
[4] http://www.webometrics.info
[5] Aguillo, I. F., Ortega, J. L. & Fernández, M. (2008). Webometric Ranking of World Universities: Introduction, Methodology, and Future Developments. Higher Education in Europe, 33(2/3): 234-244.
[6] http://www.topuniversities.com
[7] http://ranking.heeact.edu.tw/en-us/2009/Page/Methodology
[8] http://www.google.com
[9] Gyula Mester, Dusan Bobera, "On the Need for Faster Inclusion of Serbian Higher Education in the European Higher Education and Research Area" (in Serbian), Proceedings of TREND 2007, pp. 169-172, Kopaonik, Serbia, 2007.
[10] Gyula Mester, "A Proposal for Improving the Status of Serbian Higher Vocational Schools in the Bologna System of Studies" (in Serbian), Proceedings of the Conference TREND 2010, pp. 58-62, Kopaonik, Serbia, 2010.
Biography
Dr. Gyula Mester received his D. Sc.
degree in Engineering from the University
of Novi Sad in 1977. Currently, he is a
Professor at the University of Szeged,
Department of Informatics, Hungary. He is
the author of 168 research papers. His
professional activities include R&D in different fields of robotics engineering: intelligent mobile robots, humanoid robotics and sensor-based remote control. He is an invited reviewer for several scientific journals and the author of several books. He is the coordinator of the Robotics Laboratory at the University of Szeged within the European Robotics Research Network.
His CV has been published in the Marquis “Who’s Who in the
World 1997”.
Visual and Aural:
Visualization of Harmony in Music with Colour
Bojan Klemenc, Peter Ciuha, Lovro Šubelj and Marko Bajec
Faculty of Computer and Information Science, University of Ljubljana
ABSTRACT—Music is strongly intertwined with everyday life; however, its inner structure may not be comprehensible to everyone. Using other senses, such as vision, can help us understand music better and produce a synergy between the two senses. For this purpose we
designed a prototype visualization that shows
the structure of music and represents harmony
with colour by connecting similar aspects in music and visual perception. We improve current
visualization methods by calculating a common
colour for a group of concurrent tones based on
harmonic relationships between tones. Moreover we extend the colour calculation to broader
temporal segments to enable visualization of harmonic structure of a piece. The basis for mapping of tones to colour is the key-spanning circle
of thirds combined with the colour wheel. The
resulting visualization is rendered in real time
and can be interactively explored.
Index Terms— music visualization, colour, concurrent tones, MIDI
1. INTRODUCTION
Visualizing data is a challenge. Visualization helps us to grasp what would otherwise be difficult to comprehend and may enable us to see patterns that would remain unnoticed without visualization. It should not include redundant elements and it should be intuitive. We
have to search for an appropriate mapping of source data
into visual dimensions. In this paper we focus on a specific domain of visualizing music. In the case of music
we are dealing with a stream of sound data. The basic
data unit we use is a musical tone, so the input to the
visualization is a stream of tones. The stream does not
necessarily represent music – it can be a stream of arbitrary tones, as only a small subset of possible streams is usually referred to as music. However, the visualization has to account for these as well and visualise them
appropriately.
As the aim is to make the visualization meaningful and useful in practice, we have to explore the possibilities of different mappings. We try to find interconnecting aspects
of sound and visual perception. In accordance with this
idea, we developed a prototype visualization that connects similar aspects of music and visual perception.
The input to the visualization tool is in MIDI format.
The basis for the visualization is a modified piano roll
notation, which uses spatial dimensions for visualising
time, pitch and instruments. Harmony, which is one of the
most important aspects in tonal music, is represented
with colour. In comparison to existing related visualizations that use colour to denote pitch classes or a
predefined set of chords, our visualization takes into account that concurrently sounding tones are not only perceived as separate, but also as a whole. For this purpose
the musical piece is segmented into time slices and each
segment is assigned a colour based on a method using
vector addition inside a key spanning circle of thirds
assigned to the colour wheel [6, 2]. As human perception of harmony is not limited to a moment in time, we expanded the method to encompass a broader time range and used it to visualise the harmonic structure of broader
temporal segments.
The resulting visualization offers a view of the composition as a whole. Additionally, it can be observed in real time while listening to the source data, which enables the user to make a more direct connection between the source and the visualization, thus enabling faster comprehension.
The rest of the paper is organised as follows. In Section 2 we review relevant related work, in Section 3 we give a detailed explanation of our visualization, and details about the implementation are given in Section 4. The resulting visualization is reviewed and discussed in Section 5, and concluding remarks are given in Section 6.
2. RELATED WORK
There are many possibilities for mapping tonal data
or whole musical structures into visual elements. Some
of them are only aesthetically pleasing, such as the transformation of a physical property of sound, like amplitude, into visual effects. However, the real value is in visualizations of music that offer additional information that
may otherwise stay unnoticed or be difficult to understand by a musically untrained listener.
A well-known visualization is musical notation; however, it takes years of training for someone to look at a
score and know what it sounds like. An intuitive visualization is comprised of a time axis and an axis with
some other value of interest. In the case of using time on the x-axis and pitch on the y-axis, we get a piano roll notation, which is used as the basis for some visualizations. Colour
usage also varies throughout different visualizations.
Smith and Williams [13] discussed a MIDI-based visualization of music in 3-dimensional space, using colour to denote timbre (timbre is also called tone colour). The Music Animation Machine [7] encompasses a number of visualizations, including piano
roll and Tonnetz. It also uses colours to mark pitch
classes. The assignment of colour to pitch class is based
on assigning the colour wheel to the circle of fifths. A similar assignment was proposed by Scriabin (at the beginning of
the 20th century). The basic idea of this assignment is
that closely related keys or tones are mapped into related colours. Prior to Scriabin, a commonly used mapping was colour to pitch, already used by Newton. However, it is not well suited to representing harmony because adjacent tones are weakly harmonically related. An outline of the historical development of mappings of colour to
pitch classes is given by Wells [14].
The comp-i system [9] expands the piano roll notation into three dimensions to allow the user to visually
explore the source MIDI dataset and offers a view of
the structure of the music as a whole, additionally allowing the user to explore the hierarchy of the music using the ConeTree
visualization [11]. Mardirossian and Chew [8] visualise
the tonal distribution of a piece by using Lerdahl's two-dimensional pitch space – they divide the piece into uniform slices and use a key-finding algorithm to determine the most likely key for each slice. Keys are
coloured by aligning the colour wheel and the circle of
fifths. Bergstrom's Isochords [1] visualization highlights
consonant intervals between tones and chords at a given
time. It is based on Tonnetz grid and offers a view of
the changing of the harmony over time. Sapp [12] visualizes the hierarchy of key regions of a given composition, where the horizontal axis represents time and the vertical axis represents the duration of the key-finding algorithm's sliding
window. Colour hues are assigned to keys by taking
a part of the circle of fifths and mapping it into the
colour wheel. A summary of visualizations is given by
Isaacson [5].
Figure 1: The key-spanning circle of thirds assigned to the colour wheel.
3. VISUALIZING HARMONY WITH COLOUR
3.1 Assignment of colours to musical tones
The term harmony encompasses consonance (especially of concurrently sounding tones), but more broadly
it also involves the study of tonal progressions. The
perception of consonance and dissonance of concurrent
tones is related to the ratios of the tone frequencies [3].
In order of rising dissonance, the most consonant interval between two tones is unison, with a tone ratio of 1:1, followed by the octave (ratio of 1:2), perfect fifth (2:3), major third (4:5), minor third (5:6), etc. Tones with
simple (small integer) frequency ratios are perceived as
similar – unison is made up of two identical tones, the similarity of octaves is also called octave equivalence, and in consequence two tones that lie an octave apart belong to the same pitch class. Following a series of perfect fifths from a chosen tone (belonging to a certain pitch class), after 12 steps we arrive at roughly the same pitch class.
In this way we can generate all 12 pitch classes of the
chromatic scale. These pitch classes can be organised
in a circle of fifths, where two adjacent tones are a perfect fifth (or a perfect fourth in the opposite direction) apart. Similar tones are close together and dissimilar tones are on opposite sides. In addition to representing tones, the
circle of fifths can also represent tonalities.
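As an aside, this construction can be sketched in a few lines of Python; the note names and the starting tone C are illustrative assumptions, not part of the method itself.

# Minimal sketch: generating the 12 pitch classes by a series of perfect
# fifths (7 semitones each, taken modulo the octave). Starting from C is
# an arbitrary illustrative choice.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def circle_of_fifths(start=0):
    """Return the 12 pitch classes ordered by successive perfect fifths."""
    return [(start + 7 * step) % 12 for step in range(12)]

print([NOTE_NAMES[pc] for pc in circle_of_fifths()])
# ['C', 'G', 'D', 'A', 'E', 'B', 'F#', 'C#', 'G#', 'D#', 'A#', 'F']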
Because we want to map similar tones to similar colours,
the colour wheel is assigned to the circle of fifths. In
the colour wheel, colours that are perceived as similar are close together, while complementary colours are
on opposite sides. With such mapping, the difference or similarity between two colours is much more important than the psychological meaning of the colours, so in consequence the initial orientation and alignment of the colour wheel and the circle of fifths can be chosen arbitrarily. Our initial assignment is shown in Figure 1.
Figure 2: Visualization of C major and F♯ major triads played successively. (a) Without broader temporal segments. (b) Visualised with broader temporal segments; the dissonance of the sequence is visible through grey layers surrounding the chords.
Figure 3: Visualization of a C major triad played as a broken chord on the left and as a block chord on the right. (a) Without broader temporal segments. (b) With broader temporal segments.
3.2 Calculating common colour for concurrent tones
Concurrent tones are not perceived only as entirely separate, but also as a whole [10]. To model this perception
we can calculate a common colour for a group of tones.
To reflect the difference between dissonant tone combinations, which are perceived as unpleasant and unstable, and consonant ones, which are perceived as pleasant, dissonant combinations are represented by unsaturated colours and consonant ones by saturated colours.
Combinations in between are also possible. Colour hue
should represent similarity of the tone combinations.
Each tone of the 12-tone chromatic scale is represented
by a vector originating in the centre of the circle and
pointing towards the appropriate pitch class. To calculate a common colour for a combination of tones, the
vectors are added together. The direction of the resultant vector represents the hue and the length represents
the saturation. This method does not produce satisfactory results for every combination because, although the circle of fifths shows the similarity of unison, octave, perfect fifth and perfect fourth, it does not show the similarity of
major and minor thirds. To account for this we use a
revised method [6] for calculating colour of concurrent
tones that uses key-spanning circle of thirds [4] instead
of the circle of fifths. The key-spanning circle of thirds
is made up of two circles of fifths slightly rotated with respect to each other, so that the clockwise neighbour of a tone in the circle denoted with capital letters is
its major third (Figure 1).
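To make the idea concrete, the following minimal Python sketch implements the vector-addition step on the plain circle of fifths; the hue alignment is arbitrary and the key-spanning circle of thirds of the revised method [6] is not reproduced here, so this is only an approximation of the method described above.

# Illustrative sketch of the vector-addition idea: each pitch class is a
# unit vector on the circle of fifths; the resultant of a tone combination
# gives the hue (direction) and saturation (length). The plain circle of
# fifths and the arbitrary hue alignment are simplifying assumptions.
import math
import colorsys

def pitch_class_angle(pitch_class):
    """Angle of a pitch class on the circle of fifths (radians)."""
    position = (pitch_class * 7) % 12        # order along the circle of fifths
    return 2.0 * math.pi * position / 12.0

def common_colour(pitch_classes):
    """Return (hue, saturation) in [0, 1] for a group of concurrent tones."""
    x = sum(math.cos(pitch_class_angle(pc)) for pc in pitch_classes)
    y = sum(math.sin(pitch_class_angle(pc)) for pc in pitch_classes)
    saturation = math.hypot(x, y) / max(len(pitch_classes), 1)
    hue = (math.atan2(y, x) / (2.0 * math.pi)) % 1.0
    return hue, saturation                    # saturation grows with consonance

# C major triad (C, E, G) -> fairly saturated; C and F# (tritone) -> grey.
for tones in [(0, 4, 7), (0, 6)]:
    h, s = common_colour(tones)
    print(tones, colorsys.hsv_to_rgb(h, s, 1.0))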
3.3 Common colour of broader temporal segments
The method for calculating colours works on concurrent tones – the piece has to be segmented into small time slices, with each analysed separately. But the concept
of harmony more broadly encompasses more than just
the consonance of concurrently sounding tones; it also includes
tonal progressions. If we have a series of random major
chords, each chord’s colour would be fully saturated,
but the sequence itself may be dissonant (Figure 2(a)
shows C major and F♯ major triads being played in succession – each triad is consonant, but the sequence is
dissonant). Broken chords are coloured tone by tone,
although they are a spread-out variant of a block chord (Figure 3(a) shows a C major triad being played first as a broken chord and as a block chord thereafter; the yellow-coloured E tone in the broken chord visualization
is noticeable). To address these problems neighbouring
segments are joined to form broader segments and the
colour is calculated for each joined segment using the
method for calculating the colour of concurrent tones.
The size of the joining window can be adjusted.
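A possible sketch of this joining step, reusing the common_colour routine from the previous sketch, is given below; representing each time slice as the set of sounding pitch classes and the simple fixed-size window are assumptions made for illustration.

# Illustrative sketch: joining neighbouring time slices into broader
# segments and computing each joined segment's colour with common_colour
# (defined in the previous sketch). The slice representation and the
# fixed-size window are assumptions made for illustration.
def broader_segment_colours(slices, window_size=4):
    """Colour of each broader segment obtained by joining `window_size` slices."""
    colours = []
    for start in range(0, len(slices), window_size):
        window = slices[start:start + window_size]
        joined = set().union(*window)          # all pitch classes in the window
        colours.append(common_colour(joined))  # (hue, saturation) of the segment
    return colours

# A broken C major chord followed by a broken F# major chord, one tone per slice.
slices = [{0}, {4}, {7}, {0, 4, 7}, {6}, {10}, {1}, {6, 10, 1}]
print(broader_segment_colours(slices, window_size=4))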
3.4 Integrating colour with spatial dimensions
The basis for visualization is the piano roll notation.
In the piano roll notation the x-axis represents time
and the y-axis represents pitch. As a particular pitch
may be played by instruments with different timbre at
the same time, we extended the visualization with a z-axis representing instruments. Each tone is drawn as
a cylinder of fixed thickness with varying opacity depending on the loudness of the tone in a given moment –
silent tones are almost transparent, while loud tones are
opaque. Decaying tones get gradually more transparent. The colour of tones varies and depends on colours
of the segments. As very small segments are impractical
for real-time visualization, they are extended to reduce
calculations and render time. The boundary between
two extended segments is one of the following events: the start of a new tone, the end of a tone, or an explicit change of loudness. Colour is calculated at the beginning and at the end of the segment; the colour values for the inside of the segment are linearly interpolated between the beginning
and the end colours. This greatly reduces calculation
time as in most cases change between two minimal segments is just gradual decay of tones.
The harmonic structure of broader temporal segments is visualised by drawing semi-transparent layers around the tones (Figures 2(b) and 3(b)). The colour of the layer
is determined by joining the segments with appropriate
size of the joining window and calculating the colour for
the joined broad segment. The factor of transparency of
the layers is dependent on the number of joining window
sizes to be displayed at a time (transparency of layers
increases with their number). For performance reasons
the number of layers and maximum joining window size
is limited.
Figure 4: The main visualization window displaying the extended piano roll visualization of
an excerpt from Smetana’s Vltava. The harmonic relationships between concurrent tones
and broader temporal segments are shown with
colour.
4. IMPLEMENTATION
The rendering of the visualization is done in OpenGL, as the volume of data that needs to be rendered in real time can become large for some pieces. The visualization takes MIDI data as input, which is sufficient because the colour calculation method takes the 12 tones of the chromatic scale as input. Tones that lie outside of the 12-tone chromatic scale are displayed with the proper height in the 3-dimensional space; however, for the purpose of calculating colour they are rounded to the nearest tone in the scale. Another reason for using MIDI is that it eliminates the problem of extracting tones from recorded sound. Input data is processed and rendered in real time, allowing live input and observation of the results.
Figure 4 shows the main window of the visualization
tool.
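A small illustrative sketch of this input handling is given below; representing detuned tones as fractional MIDI note numbers is an assumption made for illustration, since real MIDI encodes detuning in separate pitch-bend messages.

# Illustrative sketch: mapping MIDI input to the 12 pitch classes used by
# the colour calculation. Fractional note numbers stand in for detuned
# tones here purely for illustration.
def pitch_class(midi_note):
    """Round to the nearest chromatic tone and reduce to a pitch class 0-11."""
    return int(round(midi_note)) % 12

print(pitch_class(60))     # 0  (middle C)
print(pitch_class(61.4))   # 1  (slightly sharp C#, rounded to C#)
print(pitch_class(66.6))   # 7  (rounded up to G)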
5. RESULTS AND DISCUSSION
The purpose of our visualization is to show harmonic
relationships with colour. For example, consonant tone combinations like C and G, C and F, and C and E have saturated colours, while dissonant combinations like C and F♯ and C and C♯ have low saturation. Related triads like C major, A minor and G major have similar colour hues (red to magenta), while C major and C minor, which are not harmonically related, have distant hues (magenta and
blue respectively). Complex tone combinations involving dissonant tones result in low saturated colours.
As tone loudness is also taken into account when calculating colour, the resulting visualization has smooth
transitions of colour and changes in transparency. This
is especially noticeable in the visualization of decaying
of the tones.
As perception of harmony extends also in the time dimension, colour for broader temporal segments is also calculated and displayed. This solves the problems with broken chords, arpeggios and dissonant sequences of tones or chords, as can be seen in Figures 3 and 2. In this way, tones that are played in sequence instead of concurrently are given a properly coloured “context”. Colours of broader temporal segments of appropriate length also point to the possible chord for that part of the piece.
Figure 5 depicts some examples of visualization of different musical pieces. In Figure 5(a) we can see slowly changing colours that indicate progression through related chords, but at the end the colour settles in the violet of D minor, in which the piece is written. The piece in Figure 5(b) is centred around the orange colour of D major. The arpeggio in the middle has a properly coloured context, although the constituent tones have varying colours. Goldsmith's Star Trek Theme employs a lot of key modulation, which can be seen as stable regions of one dominating colour and sudden changes of hue between the regions (Figure 5(c)). The use of dissonance depends on music styles – some avoid dissonances, some use it in very short segments that are afterwards resolved to stable consonances, and some use it very extensively. Prokofiev's Toccata in D minor has extensive regions of dissonance that can be seen as grey areas in Figure 5(d). The dissonant part consists of consonant and dissonant concurrent tone combinations, but the calculation of common colour for broader temporal segments results in a dominant grey colour.
Other types of music genres, like popular music, jazz and folk music, can be visualised without problems. Sounds from instruments without definable pitch (i.e. most percussion instruments) are omitted. The input to the visualization can be an arbitrary stream of tonal data, so performances with mistakes or even random input can be visualised. Mistakes in performance are noticeable when compared to properly performed pieces, for example in differences in colour. Random input results in numerous dissonances and the dominant colour is grey.
(a) Excerpt from Brahms’s Ballades, Op. 10
(b) Excerpt from Tchaikovsky’s Waltz of the Flowers
(c) Excerpt from Goldsmith’s Star Trek Theme
(d) Excerpt from Prokofiev’s Toccata in D minor, Op. 11
Figure 5: Examples of visualization of different compositions demonstrating the representation of
harmony with colour.
6. CONCLUSIONS
The proposed visualization of music strives not only
to be aesthetically pleasing but to reveal the structure of
music and show harmonic relationships in music using
colour. To achieve this it uses a mapping that translates similarity in perception of tones to similarity in
perception of colour. We use a method based on vector
addition inside the key-spanning circle of thirds, which
takes a group of tones as input and calculates a common
colour for the group. This functions in a similar way to how our
auditory system perceives a group of tones as a whole,
sometimes even completely merging the tones. However, given a resulting colour it is not possible to figure
out which tones were used to calculate it. Nevertheless
this is similar to the way colour perception works, where light with different spectra may produce the same colour sensation. The original method for calculating colour considered only concurrently sounding tones. As human perception of harmony is not limited to a moment in time, we expanded the method to encompass a broader time range and used it to visualise the harmonic structure of broader temporal segments of different lengths.
The aim of the proposed visualization is to enable easier understanding and learning of harmony in music, to provide an overview of the whole composition, to compare it with other compositions, and to see what we may have missed by only listening. The visualization approaches these goals by creating a synergy between two distinct senses.
The approach is still open for improvement. For instance, although major and parallel minor chords are differentiated by minor differences in hue, the psychological difference is bigger. Further work could also be done on improving the method for calculating the colour of tones to include more than only the 12 tones of the chromatic scale, or on visualising rhythm as well.
7. REFERENCES
[1] T. Bergstrom, K. Karahalios, and J. C. Hart,
“Isochords: Visualizing structure in music,” in
GI’07: Proceedings of Graphics Interface 2007,
2007, pp. 297–304.
[2] P. Ciuha, B. Klemenc, and F. Solina,
“Visualization of concurrent tones with colour,”
2010, submitted to ACM Multimedia 2010.
[3] D. Deutsch, The Psychology of Music, 2nd ed.
Academic Press, 1998.
[4] G. Gatzsche, M. Mehnert, D. Gatzsche, and
K. Brandenburg, “A symmetry based approach
for musical tonality analysis,” in 8th International
Conference on Music Information Retrieval
(ISMIR 2007), Vienna, Austria, 2007.
[5] E. J. Isaacson, “What you see is what you get: on
visualizing music,” in ISMIR, 2005, pp. 389–395.
[6] B. Klemenc, “Visualization of music on the basis
of translation of concurrent tones into color
space,” Dipl. Ing. thesis, Faculty of Computer and
Information Science, University of Ljubljana,
Slovenia, 2008.
[7] S. Malinowski. (2007) Music animation machine.
[Online]. Available: http://www.musanim.com
[8] A. Mardirossian and E. Chew, “Visualizing music:
Tonal progressions and distributions,” in 8th
International Conference on Music Information
Retrieval, Vienna, Austria, September 2007.
[9] R. Miyazaki, I. Fujishiro, and R. Hiraga,
“Exploring midi datasets,” in SIGGRAPH 2003
conference on Sketches & applications. New
York, NY, USA: ACM Press, 2003.
[10] R. Parncutt, Harmony: A Psychoacoustical
Approach. Springer-Verlag, 1989, ch. 2.
[11] G. G. Robertson, J. D. Mackinlay, and S. K.
Card, “Cone trees: animated 3d visualizations of
hierarchical information,” in Proceedings of the
ACM Conference on Human Factors in
Computing Systems (CHI ’91), 1991, pp. 189–194.
[12] C. S. Sapp, “Harmonic visualizations of tonal
music,” in ICMC’01: Proceedings of the
International Computer Music Conference 2001,
2001, pp. 419–422.
[13] S. M. Smith and G. N. Williams, “A visualization
of music,” in VIS’97: Proceedings of the 8th
conference on Visualization 1997, 1997, pp.
499–503.
[14] A. Wells, “Music and visual color: A proposed
correlation,” in Leonardo, vol. 13, 1980, pp.
101–107.
Reviewers by Countries
Argentina
Olsina, Luis; National University of La Pampa
Ovando, Gabriela P.; Universidad Nacional de
Rosario
Rossi, Gustavo; Universidad Nacional de La Plata
Australia
Abramov, Vyacheslav; Monash University
Begg, Rezaul; Victoria University
Bem, Derek; University of Western Sydney
Betts, Christopher; Pegacat Computing Pty. Ltd.
Buyya, Rajkumar; The University of Melbourne
Chapman, Judith; Australian University Limited
Chen, Yi-Ping Phoebe; Deakin University
Hammond, Mark; Flinders University
Henman, Paul; University of Queensland
Palmisano, Stephen; University of Wollongong
Ristic, Branko; Science and Technology Organisation
Sajjanhar, Atul; Deakin University
Sidhu, Amandeep; University of Technology, Sydney
Sudweeks, Fay; Murdoch University
Austria
Derntl, Michael; University of Vienna
Hug, Theo; University of Innsbruck
Loidl, Susanne; Johannes Kepler University Linz
Stockinger, Heinz; University of Vienna
Sutter, Matthias; University of Innsbruck
Walko, Zoltan
Brazil
Parracho, Annibal; Universidade Federal Fluminense
Traina, Agma; University of Sao Paulo
Traina, Caetano; University of Sao Paulo
Vicari, Rosa; Federal University of Rio Grande
Belgium
Huang, Ping; European Commission
Canada
Fung, Benjamin; Simon Fraser University
Grayson, Paul; York University
Gray, Bette; Alberta Education
Memmi, Daniel; UQAM
Neti, Sangeeta; University of Victoria
Nickull, Duane; Adobe Systems, Inc.
Ollivier-Gooch, Carl; The University of British
Columbia
Paulin, Michele; Concordia University
Plaisent, Michel; University of Quebec
Reid, Keith; Ontario Ministry of Agriculture
Shewchenko, Nicholas; Biokinetics and Associates
Steffan, Gregory; University of Toronto
Vandenberghe, Christian; HEC Montreal
Croatia
Jagnjic, Zeljko; University of Osijek
Czech Republic
Kala, Zdenek; Brno University of Technology
Korab, Vojtech; Brno University of technology
Lhotska, Lenka; Czech Technical University
Cyprus
Kyriacou, Efthyvoulos; University of Cyprus
Denmark
Bang, Joergen; Aarhus University
Edwards, Kasper; Technical University Denmark
Orngreen, Rikke; Copenhagen Business School
Estonia
Kull, Katrin; Tallinn University of Technology
Reintam, Endla; Estonian Agricultural University
Finland
Lahdelma, Risto; University of Turku
Salminen, Pekka; University of Jyvaskyla
France
Bournez, Olivier
Cardey, Sylviane; University of Franche-Comte
Klinger, Evelyne; LTCI – ENST, Paris
Roche, Christophe; University of Savoie
Valette, Robert; LAAS - CNRS
Germany
Accorsi, Rafael; University of Freiburg
Glatzer, Wolfgang; Goethe-University
Gradmann, Stefan; Universitat Hamburg
Groll, Andre; University of Siegen
Klamma, Ralf; RWTH Aachen University
Wurtz, Rolf P.; Ruhr-Universitat Bochum
Greece
Katzourakis, Nikolaos; Technical University of Athens
Bouras, Christos J.; University of Patras and RACTI
Hungary
Nagy, Zoltan; Miklos Zrinyi National Defense
University
India
Pareek, Deepak; Technology4Development
Scaria, Vinod; Institute of Integrative Biology
Shah, Mugdha; Mansukhlal Svayam
Ireland
Eisenberg, Jacob; University College Dublin
Israel
Feintuch, Uri; Hadassah-Hebrew University
Italy
Badia, Leonardo; IMT Institute for Advanced Studies
Berrittella, Maria; University of Palermo
Carpaneto, Enrico; Politecnico di Torino
Japan
Hattori, Yasunao; Shimane University
Livingston, Paisley; Linghan University
Srinivas, Hari; Global Development Research Center
Obayashi, Shigeru; Institute of Fluid Science, Tohoku
University
Mexico
Morado, Raymundo; University of Mexico
Netherlands
Mills, Melinda C.; University of Groningen
Pires, Luís Ferreira; University of Twente
New Zealand
Anderson, Tim; Van Der Veer Institute
Philippines
Castolo, Carmencita; Polytechnic University
Philippines
Poland
Kopytowski, Jerzy; Industrial Chemistry Research
Institute
Portugal
Cardoso, Jorge; University of Madeira
Natividade, Eduardo; Polytechnic Institute of Coimbra
Oliveira, Eugenio; University of Porto
Republic of Korea
Ahn, Sung-Hoon; Seoul National University
Romania
Moga, Liliana; “Dunarea de Jos” University
Serbia
Mitrovic, Slobodan; Otorhinolaryngology Clinic
Stanojevic, Mladen; The Mihailo Pupin Institute
Ugrinovic, Ivan; Fadata, d.o.o.
Singapore
Tan, Fock-Lai; Nanyang Technological University
Slovenia
Kocijan, Jus; Jozef Stefan Institute and University of
Nova Gorica
South Korea
Kwon, Wook Hyun; Seoul National University
Spain
Barrera, Juan Pablo Soto; University of Castilla
Gonzalez, Evelio J.; University of La Laguna
Perez, Juan Mendez; Universidad de La Laguna
Royuela, Vicente; Universidad de Barcelona
Vizcaino, Aurora; University of Castilla-La Mancha
Vilarrasa, Clelia Colombo; Open University of
Catalonia
Sweden
Johansson, Mats; Royal Institute of Technology
Switzerland
Niinimaki, Marko; Helsinki Institute of Physics
Pletka, Roman; AdNovum Informatik AG
Rizzotti, Sven; University of Basel
Specht, Matthias; University of Zurich
Taiwan
Lin, Hsiung Cheng; Chienkuo Technology University
Shyu, Yuh-Huei; Tamkang University
Sue, Chuan-Ching; National Cheng Kung
University
Ukraine
Vlasenko, Polina; EERC-Kyiv
United Kingdom
Ariwa, Ezendu; London Metropolitan University
Biggam, John; Glasgow Caledonian University
Coleman, Shirley; University of Newcastle
Conole, Grainne; University of Southampton
Dorfler, Viktor; Strathclyde University
Engelmann, Dirk; University of London
Eze, Emmanuel; University of Hull
Forrester, John; Stockholm Environment Institute
Jensen, Jens; STFC Rutherford Appleton Laboratory
Kolovos, Dimitrios S.; The University of York
McBurney, Peter; University of Liverpool
Vetta, Atam; Oxford Brookes University
Westland, Stephen; University of Leeds
WHYTE, William Stewart; University of Leeds
Xie, Changwen; Wicks and Wilson Limited
USA
Bach, Eric; University of Wisconsin
Bazarian, Jeffrey J.; University of Rochester School
Bolzendahl, Catherine; University of California
Bussler, Christoph; Cisco Systems, Inc.
Charpentier, Michel; University of New Hampshire
Chester, Daniel; Computer and Information Sciences
Chong, Stephen; Cornell University
Collison, George; The Concord Consortium
DeWeaver, Eric; University of Wisconsin - Madison
Ellard, Daniel; Network Appliance, Inc
Gaede, Steve; Lone Eagle Systems Inc.
Gans, Eric; University of California
Gill, Sam; San Francisco State University
Gustafson, John L.; ClearSpeed Technology
Hunter, Lynette; University of California Davis
Iceland, John; University of Maryland
Kaplan, Samantha W.; University of Wisconsin
Langou, Julien; The University of Tennessee
Liu, Yuliang; Southern Illinois University Edwardsville
Lok, Benjamin; University of Florida
Minh, Chi Cao; Stanford University
Morrissey, Robert; The University of Chicago
Mui, Lik; Google, Inc
Rizzo, Albert; University of Southern California
Rosenberg, Jonathan M.; University of Maryland
Shaffer, Cliff; Virginia Tech
Sherman, Elaine; Hofstra University
Snyder, David F.; Texas State University
Song, Zhe; University of Iowa
Wei, Chen; Intelligent Automation, Inc.
Yu, Zhiyi; University of California
Venezuela
Candal, Maria Virginia; Universidad Simon Bolívar
IPSI Team
Advisors for IPSI Developments and Research:
Zoran Babovic, Darko Jovic, Aleksandar Crnjin,
Marko Stankovic, Marko Novakovic
Authors of papers are responsible for the contents and layout of their papers.
Welcome to IPSI BgD Conferences and Journals!
http://www.internetconferences.net
http://www.internetjournals.net
CIP – Katalogizacija u publikaciji
Narodna biblioteka Srbije, Beograd
ISSN 1820 – 4503 =
The IPSI BGD Transactions on Internet
Research
COBISS.SR - ID 119128844