
A (sub)graph isomorphism algorithm for matching large graphs

2004, IEEE Transactions on Pattern Analysis and Machine Intelligence

We present an algorithm for graph isomorphism and subgraph isomorphism suited to large graphs. A first version of the algorithm was presented in a previous paper, where we examined its performance on the isomorphism of small and medium-sized graphs. The algorithm is improved here to reduce its spatial complexity and to achieve better performance on large graphs; its features are analyzed in detail, with special reference to time and memory requirements. We present the results of tests performed on a publicly available database of synthetically generated graphs and on graphs from a real application dealing with technical drawings; they confirm the effectiveness of the approach, especially when working with large graphs.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 26, NO. 10, OCTOBER 2004

Luigi P. Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento

Index Terms: Graph-subgraph isomorphism, large graphs, attributed relational graphs.

1 INTRODUCTION

In recent years, the scientific community active in pattern analysis, pattern recognition, and computer vision has considered graphs with increasing interest, and applications employing graphs have multiplied. Graphs are commonly used to provide structural descriptions of images by decomposing them into parts and associating graph nodes and branches with the components and their relationships. Handwritten characters, ideograms, symbols in documents, and 3D scenes, to mention a few examples, have been described in this way [1], [2], [3], [4], [5]. From the point of view of pattern analysis and recognition, the most important problem in graph processing is matching graphs or subgraphs in order to compare them. An extensive review of graph matching algorithms for pattern recognition was recently published [6].
In exact graph matching, a strict correspondence between the two graphs is sought. Another basic research problem, called inexact matching, extends the matching concepts to the case in which the similarity between two graphs, rather than their exact correspondence, is of interest [2]. The most common inexact algorithms (error-correcting algorithms) use a set of editing operations, such as node and branch insertion, deletion, or substitution, in order to find an exact matching between the two graphs [7], [8]. Algorithms that evaluate the distance between graphs in order to estimate their degree of similarity have also been proposed (e.g., see [9]).

In this paper, attention is devoted to exact matching. Besides being part of error-correcting matching procedures, exact matching may be of interest in different pattern analysis and recognition contexts, and thus deserves attention from the pattern recognition community. As is well known, among the different types of graph matching (monomorphism, isomorphism, graph-subgraph isomorphism), subgraph isomorphism is an NP-complete problem, while it is still an open question whether graph isomorphism is also NP-complete. The exponential time requirement of matching algorithms has been the main impediment for applications requiring graphs of large size (hundreds or thousands of nodes).

• L.P. Cordella, P. Foggia, and C. Sansone are with the Dipartimento di Informatica e Sistemistica, Università di Napoli "Federico II," Via Claudio 21, I-80125 Napoli, Italy. E-mail: {cordel, foggiapa, carlosan}@unina.it.
• M. Vento is with the Dipartimento di Ingegneria dell'Informazione e di Ingegneria Elettrica, Università di Salerno, Via Ponte Don Melillo 1, I-84084 Fisciano (SA), Italy. E-mail: [email protected].
Manuscript received 16 Apr. 2002; revised 27 Jan. 2004; accepted 17 Feb. 2004. Recommended for acceptance by E. Hancock. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 116354.

Low-complexity algorithms suited for matching large graphs have been a subject of research during the last three decades. Some of the proposed algorithms reduce the overall computational complexity of the matching process by imposing restrictions on the graphs (e.g., polynomial algorithms for trees, planar graphs, or bounded valence graphs [10]). An alternative approach is to use an adequate representation of the search process and to prune unprofitable paths in the search space, without imposing any restriction on the graph structure.

A procedure that significantly reduces the size of the search space is the backtracking algorithm proposed by Ullmann [11]. This algorithm, devised for both graph isomorphism and subgraph isomorphism, is still one of the most commonly used for exact graph matching. In [12], it is compared with other algorithms and proves the most convenient in terms of matching time in the case of one-to-one matching. During the process, the algorithm allows the integrated comparison of semantic information. For these reasons, in the following we systematically compare our results with those of Ullmann's algorithm.

Among graph isomorphism algorithms, it is also necessary to mention the Nauty algorithm [13], which transforms the graphs to be matched into a canonical form before checking for isomorphism. Although it is considered one of the fastest graph isomorphism algorithms available, it has been shown that there are categories of graphs for which it requires exponential time. Furthermore, it cannot be used for solving the graph-subgraph isomorphism problem.
A rather recent method [14] attempts to reduce the overall computational cost when matching a sample graph against a large set of prototypes; it achieves a matching time quadratic in the graph size, but at the price of exponential memory requirements and preprocessing time. Other existing techniques, such as nondeterministic ones (e.g., [15]), are powerful enough to reduce the complexity, in most cases, from exponential to polynomial, but are not guaranteed to find an exact and optimal solution.

In this paper, we propose a deterministic matching method for verifying both isomorphism and subgraph isomorphism. The algorithm has general validity, since no constraints are imposed on graph topology. A state space representation (SSR) of the matching process is used, and a set of five feasibility rules for pruning the search tree is introduced. The adopted representation allows the syntactic and semantic comparison of the pairs of nodes to be matched to be carried out simultaneously. With respect to a preliminary version of the algorithm, described in [16] and referred to as the VF algorithm, the main improvement is that the data structures employed during the exploration of the search space are organized so as to significantly reduce memory requirements. Thus, the algorithm is suitable for matching graphs with thousands of nodes and branches. Accurate testing has been performed on a publicly available database of synthetically generated graphs [17] and on attributed graphs obtained from a real application in the field of technical drawings. A comparative experimental analysis completes the performance characterization of the algorithm.

The paper is organized as follows: Section 2 gives a short description of the improved graph-matching algorithm, named VF2, and presents the results of the theoretical analysis of its overall efficiency, in terms of computational and spatial complexity.
Section 3 is devoted to algorithm testing and to a comparative analysis of the obtained results. Final notes and conclusions are in Section 4.

2 THE VF2 ALGORITHM

A matching process between two graphs $G_1 = (N_1, B_1)$ and $G_2 = (N_2, B_2)$ consists in the determination of a mapping $M$ which associates nodes of $G_1$ with nodes of $G_2$ and vice versa, according to some predefined constraints. Generally, the mapping $M$ is expressed as the set of pairs $(n, m)$ (with $n \in G_1$ and $m \in G_2$), each representing the mapping of a node $n$ of $G_1$ to a node $m$ of $G_2$. A mapping $M \subset N_1 \times N_2$ is said to be an isomorphism iff $M$ is a bijective function that preserves the branch structure of the two graphs. A mapping $M \subset N_1 \times N_2$ is said to be a graph-subgraph isomorphism iff $M$ is an isomorphism between $G_2$ and a subgraph of $G_1$.

0162-8828/04/\$20.00 © 2004 IEEE. Published by the IEEE Computer Society.

Fig. 1. The VF2 matching algorithm.

The process of finding the mapping function can be suitably described by means of a State Space Representation (SSR) [18]. Each state $s$ of the matching process can be associated with a partial mapping solution $M(s)$, which contains only a subset of $M$. $M(s)$ uniquely identifies two subgraphs of $G_1$ and $G_2$, say $G_1(s)$ and $G_2(s)$, obtained by selecting from $G_1$ and $G_2$ only the nodes included in $M(s)$ and the branches connecting them. In the following, we denote by $M_1(s)$ and $M_2(s)$ the sets of nodes of $G_1(s)$ and $G_2(s)$, and by $B_1(s)$ and $B_2(s)$ the corresponding sets of branches. According to these definitions, a transition from a generic state $s$ to a successor $s'$ represents the addition of a pair $(n, m)$ of matched nodes to the partial graphs associated with $s$ in the SSR.
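The definitions above can be made concrete with a small checker for a complete mapping. This is only a sketch under stated assumptions: graphs are passed as `(nodes, branches)` pairs mirroring $G = (N, B)$, the mapping is a dictionary from nodes of $G_1$ to nodes of $G_2$, and "subgraph" is read as the node-induced subgraph, in line with the construction of $G_1(s)$ and $G_2(s)$ from the mapped nodes and the branches connecting them. The function names are hypothetical and do not come from the paper.

```python
def induced_branches(branches, nodes):
    """Branches whose two endpoints both lie in `nodes`."""
    return {(a, b) for (a, b) in branches if a in nodes and b in nodes}


def is_graph_subgraph_isomorphism(g1, g2, mapping):
    """True if `mapping` ({n: m}) is an isomorphism between G2 and the
    subgraph of G1 induced by the mapped nodes (assumed reading)."""
    (n1, b1), (n2, b2) = g1, g2
    if not set(mapping) <= set(n1):
        return False
    images = set(mapping.values())
    # The mapping must be injective and must cover every node of G2.
    if len(images) != len(mapping) or images != set(n2):
        return False
    induced = induced_branches(set(b1), set(mapping))
    # Branch structure must be preserved in both directions.
    return {(mapping[a], mapping[b]) for a, b in induced} == set(b2)


def is_isomorphism(g1, g2, mapping):
    """Isomorphism is the special case in which the mapping covers G1 too."""
    return len(g1[0]) == len(g2[0]) and is_graph_subgraph_isomorphism(g1, g2, mapping)


if __name__ == "__main__":
    g1 = ({0, 1, 2, 3}, {(0, 1), (1, 2), (2, 0), (2, 3)})
    g2 = ({"a", "b", "c"}, {("a", "b"), ("b", "c"), ("c", "a")})
    print(is_graph_subgraph_isomorphism(g1, g2, {0: "a", 1: "b", 2: "c"}))
    print(is_graph_subgraph_isomorphism(g1, g2, {1: "a", 2: "b", 3: "c"}))
```

In the example, the directed triangle of $G_2$ is found among nodes 0, 1, 2 of $G_1$, while the second mapping fails because the branch corresponding to `("c", "a")` is missing in $G_1$.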
Among all the possible SSR states, only a small subset is consistent with the desired morphism type, in the sense that there are no conditions that preclude the possibility of reaching a complete solution. It can be proven that the consistency condition, in the case of isomorphism or graph-subgraph isomorphism, is that the partial graphs $G_1(s)$ and $G_2(s)$ associated with $M(s)$ are isomorphic. Our algorithm introduces a set of rules able to verify the consistency conditions, so that only consistent states are generated. Moreover, the number of states generated in the process can be further reduced by adding a set of rules (which we call k-look-ahead rules) that check in advance whether a consistent state $s$ has no consistent successors after $k$ steps. Hereinafter, all the mentioned rules will be called feasibility rules.

For convenience, let us introduce the feasibility function $F(s, n, m)$, which is true if the addition of the pair $(n, m)$ to a state $s$ satisfies all the feasibility rules. The above feasibility rules depend only on the structure of the input graphs. However, if the input graphs have node and branch attributes, these must also be taken into account. Thus, the most general form of the feasibility function is:

$$F(s, n, m) = F_{syn}(s, n, m) \wedge F_{sem}(s, n, m), \tag{1}$$

where $F_{syn}$ (syntactic feasibility) depends only on the structure of the graphs, and $F_{sem}$ (semantic feasibility) depends on the attributes.

A high-level description of the algorithm we propose is outlined in Fig. 1. In the initial state $s_0$, the mapping function does not contain any component, i.e., $M(s_0) = \emptyset$. For each intermediate state $s$, the algorithm computes the set $P(s)$ (see Section 2.1 for more details) of the node pairs that are candidates to be added to the current state $s$.
For each pair $p$ belonging to $P(s)$, the feasibility rules are evaluated; if they succeed, i.e., $F(s, n, m)$ is true with $p = (n, m)$, the successor state $s' = s \cup p$ is computed and the whole process is applied recursively to $s'$. Note that the algorithm explores the search graph in the SSR according to a depth-first search strategy.

With this simple formulation, a state can be reached through different paths. To prevent the algorithm from regenerating states that have already been generated during the matching process, a special procedure for generating successors is used. An arbitrary total order relation (denoted by $\prec$) is defined on those nodes of $G_2$ which belong to the set $P(s)$. Since the node insertion order in the partial solution $M(s)$ does not influence the resulting state, the algorithm ignores any pair $(n_i, m_j)$ in $P(s)$ if this set already contains a node $m_k \prec m_j$. This simple strategy allows the algorithm to generate each state only once. In the following section, we examine how the set $P(s)$ is defined; then, in Section 2.2, we address the definition of the feasibility rules.

2.1 Computation of the Candidate Pairs Set $P(s)$

The set $P(s)$ of all the possible pairs that are candidates to be added to the current state is obtained by considering first the sets of the nodes directly connected to $G_1(s)$ and $G_2(s)$. Let us denote by $T_1^{out}(s)$ and $T_2^{out}(s)$ the sets of nodes, not yet in the partial mapping, that are the destination of branches starting from $G_1(s)$ and $G_2(s)$, respectively; similarly, we denote by $T_1^{in}(s)$ and $T_2^{in}(s)$ the sets of nodes, not yet in the partial mapping, that are the origin of branches ending in $G_1(s)$ and $G_2(s)$. The set $P(s)$ is made of all the node pairs $(n, m)$ with $n$ belonging to $T_1^{out}(s)$ and $m$ to $T_2^{out}(s)$, unless one of these two sets is empty. In this case, the set $P(s)$ is likewise obtained by considering $T_1^{in}(s)$ and $T_2^{in}(s)$, respectively.
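As an illustration of the depth-first exploration and of the candidate set just described, the following sketch computes the terminal sets from scratch at every step and keeps only the pairs whose $G_2$ node is the smallest one under the chosen order. It rests on several assumptions that are not in the text: graphs are dictionaries `{node: set of successors}`, node labels are comparable so that Python's `min` can play the role of the total order, the search stops when every node of $G_2$ is mapped, and the feasibility test is passed in as a callback. The fallback to all unmatched nodes, used for graphs that are not connected, is the case discussed in the paragraph that follows. All names are hypothetical.

```python
def terminal_sets(succ, matched):
    """Return (T_out, T_in) for a graph given as {node: set of successors}."""
    t_out = {v for u in matched for v in succ[u]} - matched
    t_in = {u for u in succ if u not in matched and succ[u] & matched}
    return t_out, t_in


def candidate_pairs(succ1, succ2, mapping):
    """Candidate set P(s) for the partial mapping {n: m}."""
    m1, m2 = set(mapping), set(mapping.values())
    t1_out, t1_in = terminal_sets(succ1, m1)
    t2_out, t2_in = terminal_sets(succ2, m2)
    if t1_out and t2_out:
        c1, c2 = t1_out, t2_out
    elif t1_in and t2_in:
        c1, c2 = t1_in, t2_in
    else:
        # Assumed fallback: every node not yet in the partial mapping.
        c1, c2 = set(succ1) - m1, set(succ2) - m2
    if not c2:
        return []
    m = min(c2)  # keep only the smallest G2 node under the total order
    return [(n, m) for n in sorted(c1)]


def match(succ1, succ2, feasible, mapping=None):
    """Depth-first enumeration of complete mappings of G2 into G1."""
    mapping = {} if mapping is None else mapping
    if len(mapping) == len(succ2):  # assumed goal test: all of G2 is mapped
        yield dict(mapping)
        return
    for n, m in candidate_pairs(succ1, succ2, mapping):
        if feasible(mapping, n, m):
            mapping[n] = m          # move to the successor state
            yield from match(succ1, succ2, feasible, mapping)
            del mapping[n]          # backtrack


if __name__ == "__main__":
    g1 = {0: {1}, 1: {2}, 2: set()}
    g2 = {"a": {"b"}, "b": set()}
    print(candidate_pairs(g1, g2, {}))
    print(candidate_pairs(g1, g2, {0: "a"}))
```

Because only the minimum $G_2$ node is paired at each level, two different insertion orders can never lead to the same partial mapping. Recomputing the terminal sets at each call is wasteful and is done here only for readability.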
In the presence of graphs that are not connected, all of the above sets may be empty for some state $s$. In this case, the set of candidate pairs making up $P(s)$ is the set $P^d(s)$ of all the pairs of nodes contained neither in $G_1(s)$ nor in $G_2(s)$.

2.2 Feasibility Rules

Five feasibility rules are defined: $R_{pred}$, $R_{succ}$, $R_{in}$, $R_{out}$, and $R_{new}$. The first two rules check the consistency of the partial solution $M(s')$ obtained by adding the considered candidate pair $(n, m)$ to the current partial solution $M(s)$. The remaining three rules are introduced for pruning the search tree; in particular, $R_{in}$ and $R_{out}$ perform a 1-look-ahead in the search process, and $R_{new}$ a 2-look-ahead. In conclusion, the proposed feasibility function is:

$$F_{syn}(s, n, m) = R_{pred} \wedge R_{succ} \wedge R_{in} \wedge R_{out} \wedge R_{new}. \tag{2}$$

Given a graph $G = (N, B)$ and a node $n \in N$, the sets containing the predecessors and the successors of $n$ will be denoted by $\mathrm{Pred}(G, n)$ and $\mathrm{Succ}(G, n)$, respectively. Also, in the following definitions, we will use the sets $T_1(s) = T_1^{in}(s) \cup T_1^{out}(s)$ and $\tilde{N}_1(s) = N_1 - M_1(s) - T_1(s)$. Similar expressions hold for $T_2$ and $\tilde{N}_2$.
Here are the formal definitions of the five rules for the subgraph isomorphism case:

$$R_{pred}(s, n, m) \iff \big(\forall n' \in M_1(s) \cap \mathrm{Pred}(G_1, n)\ \exists m' \in \mathrm{Pred}(G_2, m) \mid (n', m') \in M(s)\big) \wedge \big(\forall m' \in M_2(s) \cap \mathrm{Pred}(G_2, m)\ \exists n' \in \mathrm{Pred}(G_1, n) \mid (n', m') \in M(s)\big), \tag{3}$$

$$R_{succ}(s, n, m) \iff \big(\forall n' \in M_1(s) \cap \mathrm{Succ}(G_1, n)\ \exists m' \in \mathrm{Succ}(G_2, m) \mid (n', m') \in M(s)\big) \wedge \big(\forall m' \in M_2(s) \cap \mathrm{Succ}(G_2, m)\ \exists n' \in \mathrm{Succ}(G_1, n) \mid (n', m') \in M(s)\big), \tag{4}$$

$$R_{in}(s, n, m) \iff \big(\mathrm{Card}(\mathrm{Succ}(G_1, n) \cap T_1^{in}(s)) \geq \mathrm{Card}(\mathrm{Succ}(G_2, m) \cap T_2^{in}(s))\big) \wedge \big(\mathrm{Card}(\mathrm{Pred}(G_1, n) \cap T_1^{in}(s)) \geq \mathrm{Card}(\mathrm{Pred}(G_2, m) \cap T_2^{in}(s))\big), \tag{5}$$

$$R_{out}(s, n, m) \iff \big(\mathrm{Card}(\mathrm{Succ}(G_1, n) \cap T_1^{out}(s)) \geq \mathrm{Card}(\mathrm{Succ}(G_2, m) \cap T_2^{out}(s))\big) \wedge \big(\mathrm{Card}(\mathrm{Pred}(G_1, n) \cap T_1^{out}(s)) \geq \mathrm{Card}(\mathrm{Pred}(G_2, m) \cap T_2^{out}(s))\big), \tag{6}$$

$$R_{new}(s, n, m) \iff \mathrm{Card}(\tilde{N}_1(s) \cap \mathrm{Pred}(G_1, n)) \geq \mathrm{Card}(\tilde{N}_2(s) \cap \mathrm{Pred}(G_2, m)) \wedge \mathrm{Card}(\tilde{N}_1(s) \cap \mathrm{Succ}(G_1, n)) \geq \mathrm{Card}(\tilde{N}_2(s) \cap \mathrm{Succ}(G_2, m)). \tag{7}$$

For graph isomorphism instead of subgraph isomorphism, the rules $R_{succ}$ and $R_{pred}$ keep the same form, while in the rules $R_{in}$, $R_{out}$, and $R_{new}$ the $\geq$ operator must be replaced by $=$.

2.3 Semantic Feasibility

When we turn our attention to attributed graphs, the inclusion of node/branch attributes in the matching algorithm can be performed in two ways, depending on whether we have symbolic or (real-valued) numeric attributes. For symbolic attributes, which in some cases are derived from numeric attributes through a quantization process, we suppose that a compatibility relation $\approx$ is defined between two node/branch attributes. For some applications, $\approx$ may coincide with the equality relation, while in other cases a more "tolerant" definition may be necessary. Each time we check the feasibility of a new pair, the attributes of the nodes and branches being added are tested for semantic compatibility.
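A small sketch may help fix ideas about the compatibility test before it is stated formally. It assumes, purely for illustration, that node and branch attributes are stored in dictionaries (branch attributes keyed by `(source, destination)`), that separate predicates are supplied for nodes and for branches, and that a branch of $G_1$ with no counterpart in $G_2$ counts as incompatible (in the algorithm proper that case is already handled by the syntactic rules). The helper names `equal`, `within`, and `f_sem` are hypothetical.

```python
def equal(a, b):
    """Compatibility as plain equality, e.g. for symbolic labels."""
    return a == b


def within(tolerance):
    """Build a more tolerant relation: values may differ by up to `tolerance`."""
    return lambda a, b: abs(a - b) <= tolerance


def f_sem(attrs1, attrs2, mapping, n, m, node_ok, branch_ok):
    """Semantic test for adding the pair (n, m) to the partial mapping.

    attrs1 / attrs2 are (node_attributes, branch_attributes) pairs of dicts.
    """
    nodes1, branches1 = attrs1
    nodes2, branches2 = attrs2
    if not node_ok(nodes1[n], nodes2[m]):
        return False
    for n_prev, m_prev in mapping.items():
        # Check branches between the new node and each node already matched,
        # in both directions.
        for e1, e2 in (((n, n_prev), (m, m_prev)), ((n_prev, n), (m_prev, m))):
            if e1 in branches1:
                if e2 not in branches2:
                    return False  # assumed: a missing counterpart is a mismatch
                if not branch_ok(branches1[e1], branches2[e2]):
                    return False
    return True


if __name__ == "__main__":
    a1 = ({0: "A", 1: "B"}, {(0, 1): 10.0})
    a2 = ({"x": "A", "y": "B"}, {("x", "y"): 10.4})
    print(f_sem(a1, a2, {0: "x"}, 1, "y", equal, within(0.5)))
    print(f_sem(a1, a2, {0: "x"}, 1, "y", equal, within(0.1)))
```

With a tolerance of 0.5 the two branch values 10.0 and 10.4 are accepted, while a tolerance of 0.1 rejects the pair.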
Formally, we can define:

$$F_{sem}(s, n, m) \iff n \approx m \wedge \forall (n', m') \in M(s),\ (n, n') \in B_1 \Rightarrow (n, n') \approx (m, m') \wedge \forall (n', m') \in M(s),\ (n', n) \in B_1 \Rightarrow (n', n) \approx (m', m). \tag{8}$$

For numeric attributes, we exploit this information in two ways. First, a compatibility relation is defined by thresholding the absolute difference of the attributes being matched, leading to a semantic feasibility function analogous to (8). Furthermore, a cost function is introduced to give a quantitative evaluation of the dissimilarity between two nodes or branches. The algorithm, in its exploration of the search space, saves only the matching that obtains the minimum total cost. The cost is computed for each state $s$ as the sum of the cost of its parent state and of the costs due to the newly added nodes and branches. Since the latter are assumed to be nonnegative, the total cost of a state is greater than or equal to the costs of all its ancestors. We can use this information to prune all the states whose cost is greater than the cost of the best goal state reached so far, further reducing the search space.

2.4 Data Structures and Implementation Issues

In order to make the algorithm run with acceptable time and space complexity on large graphs as well, it is important to employ well-devised data structures for the computation of $P(s)$ and of $F(s, n, m)$. In the actual implementation, the following data structures are used:

• Two vectors, core_1 and core_2, whose dimensions correspond to the number of nodes in $G_1$ and $G_2$, respectively, containing the current mapping; in particular, core_1[n] contains the index of the node paired with n, if n is in $M_1(s)$, and the distinguished value NULL_NODE otherwise. The same encoding is used for core_2.
• Four vectors, in_1, out_1, in_2, out_2, whose dimensions are equal to the number of nodes in the corresponding graphs, describing the membership of the terminal sets.
In particular, in_1[n] is nonzero if n is either in $M_1(s)$ or in $T_1^{in}(s)$; similar definitions hold for the other three vectors. The actual value stored in the vectors is the depth in the SSR tree of the state in which the node entered the corresponding set.

Using the vectors described above, testing membership in the various sets requires constant time. It follows that the computation of $P(s)$ can be done in a time that is, in the worst case, proportional to $|N_1| + |N_2|$, while the computation of $F(s, n, m)$ can be performed in a time proportional to the number of branches involving $n$ and $m$.

It is important to note that all the vectors have the following property: if an element is nonnull in a state $s$, it will remain nonnull in all the states descending from $s$. This property, together with the depth-first strategy of the search, avoids the need to store a separate copy of the vectors for each state: when the algorithm backtracks, it restores the previous values of the vectors.

The memory requirement, with respect to the number of nodes $N$, is considerably lower than in other similar algorithms. In fact, apart from the six vectors, which are shared among the states, each state needs a constant (and small) amount of memory, and the depth-first search strategy ensures that there can be at most $N$ states in memory at a time. It follows that the memory required is $\Theta(N)$, with a small constant factor.

Table 1 summarizes the time and spatial complexity of our algorithm compared with that of Ullmann's algorithm, as can be deduced from [11] and [12], in the best and worst case. The analytical estimation of the computational complexity in the average case is not a simple task unless some very restrictive assumptions are made. For this reason, we have performed a set of tests aimed at evaluating the average time required by the matching in the case of both isomorphism and graph-subgraph isomorphism, as reported in the following section.
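The shared vectors and their restoration on backtracking can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: graphs are assumed to be lists of successor lists with nodes numbered from 0, self-loops are assumed absent, `NULL_NODE` is represented by -1, and the `State` class with its method names is hypothetical. The `feasible` method evaluates the five syntactic rules of Section 2.2 by visiting only the branches incident to the two candidate nodes, using `>=` for the subgraph case and `==` when `exact=True`.

```python
import operator

NULL_NODE = -1  # stands in for the distinguished NULL_NODE value


def invert(succ):
    """Build predecessor lists from successor lists."""
    pred = [[] for _ in succ]
    for u, vs in enumerate(succ):
        for v in vs:
            pred[v].append(u)
    return pred


class State:
    """Vectors shared by all states of one depth-first search."""

    def __init__(self, succ1, succ2):
        self.succ = (succ1, succ2)
        self.pred = (invert(succ1), invert(succ2))
        sizes = (len(succ1), len(succ2))
        self.core = tuple([NULL_NODE] * k for k in sizes)  # core_1, core_2
        self.tin = tuple([0] * k for k in sizes)           # in_1, in_2
        self.tout = tuple([0] * k for k in sizes)          # out_1, out_2
        self.depth = 0

    def _touched(self, g, node):
        """Vector/neighbour pairs affected by adding or removing `node`."""
        return ((self.tin[g], self.pred[g][node]),
                (self.tout[g], self.succ[g][node]))

    def add_pair(self, n, m):
        """Move to the successor state obtained by adding (n, m)."""
        self.depth += 1
        for g, node, partner in ((0, n, m), (1, m, n)):
            self.core[g][node] = partner
            for vec, nbrs in self._touched(g, node):
                for v in [node] + nbrs:
                    if vec[v] == 0:
                        vec[v] = self.depth  # stamp with the entry depth

    def backtrack(self, n, m):
        """Undo the most recent add_pair(n, m)."""
        for g, node in ((0, n), (1, m)):
            for vec, nbrs in self._touched(g, node):
                for v in [node] + nbrs:
                    if vec[v] == self.depth:
                        vec[v] = 0
            self.core[g][node] = NULL_NODE
        self.depth -= 1

    def feasible(self, n, m, exact=False):
        """Syntactic feasibility of (n, m) in the current state."""
        counts = []
        for g, node, other in ((0, n, m), (1, m, n)):
            c = [0] * 6  # (pred, succ) x (in, out, new) neighbour counts
            for k, nbrs in enumerate((self.pred[g][node], self.succ[g][node])):
                other_nbrs = (self.pred, self.succ)[k][1 - g][other]
                for v in nbrs:
                    w = self.core[g][v]
                    if w != NULL_NODE:
                        if w not in other_nbrs:  # R_pred / R_succ violated
                            return False
                    else:
                        is_in, is_out = self.tin[g][v] != 0, self.tout[g][v] != 0
                        c[3 * k] += is_in
                        c[3 * k + 1] += is_out
                        c[3 * k + 2] += not (is_in or is_out)
            counts.append(c)
        compare = operator.eq if exact else operator.ge
        return all(compare(a, b) for a, b in zip(*counts))  # R_in, R_out, R_new


if __name__ == "__main__":
    s = State([[1], [2], [0]], [[1], [2], [0]])
    s.add_pair(0, 0)
    print(s.tin[0], s.tout[0], s.feasible(1, 1), s.feasible(2, 1))
```

Because every entry stamped at the current depth was written by the latest `add_pair`, resetting exactly those entries restores the previous state without keeping per-state copies, which is the property the text relies on for its memory bound.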
Time complexity has been obtained considering that in the best case our algorithm visits $N$ states, while in the worst case $N!$ states need to be explored.

TABLE 1. Spatial and Time Complexity of VF2 and of Ullmann's Algorithm in the Best and Worst Case

TABLE 2. A Comparison between VF2 and Nauty as a Function of the Graph Size and of the Kind of the Graphs

3 EXPERIMENTAL RESULTS

We have systematically compared the results of the VF2 algorithm with those obtained on the same data by Ullmann's algorithm, for the reasons mentioned in Section 1. Although an implementation of the latter algorithm was already available on the Web (ftp://ftp.iam.unibe.ch/pub/Tools/GUB_toolkit.tar.Z), for our testing we developed a more efficient implementation and made it available on the Web as well (http://amalfi.dis.unina.it/graph/). Moreover, for the isomorphism case, we also compared the results obtained by Ullmann's algorithm and by ours with those obtained by the Nauty algorithm. In particular, in our tests we used version 2.0b9 of the Nauty algorithm, made available by B.D. McKay at the URL: http://cs.anu.edu.au/~bdm/nauty.

For the isomorphism case, the database used for testing the algorithms' performance was made of 10,000 pairs of isomorphic graphs. This is part of a wider database of synthetically generated graphs, especially developed for benchmarking purposes [17] and available on the Web (http://www.iapr.org/benchmarks.html). On the other hand, the performance of the graph-subgraph matching algorithm has been evaluated in the context of a real problem: the detection of component parts in large images. Namely, mechanical line drawings and topographic maps have been considered.
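The synthetic graph families used in the isomorphism tests, which Section 3.1 describes in detail, can be approximated with a few small generators. This is a rough sketch and not the benchmark database's own generator: the parameter names `eta` (probability that a branch exists between two distinct nodes) and `rho` (extra branches per node added to a mesh) are labels chosen here, the orientation of mesh branches (east and south) is an assumption, and no attempt is made to guarantee connectivity.

```python
import random


def random_graph(n, eta, rng):
    """Directed graph in which each ordered pair of distinct nodes is joined
    independently with probability eta."""
    return {(u, v) for u in range(n) for v in range(n)
            if u != v and rng.random() < eta}


def mesh_2d(rows, cols):
    """Regular 4-connected mesh; branches point east and south (assumed)."""
    edges = set()
    for r in range(rows):
        for c in range(cols):
            u = r * cols + c
            if c + 1 < cols:
                edges.add((u, u + 1))
            if r + 1 < rows:
                edges.add((u, u + cols))
    return edges


def irregular_mesh(rows, cols, rho, rng):
    """Regular mesh plus int(rho * N) extra branches between uniformly chosen
    distinct nodes; rho must be small enough for the extra branches to fit."""
    edges = mesh_2d(rows, cols)
    n = rows * cols
    target = len(edges) + int(rho * n)
    while len(edges) < target:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((u, v))  # duplicates are ignored by the set
    return edges


if __name__ == "__main__":
    rng = random.Random(0)
    print(len(mesh_2d(3, 4)), len(irregular_mesh(3, 4, 0.5, rng)))
```

A pair of isomorphic test graphs can then be obtained by relabeling the nodes of a generated graph with a random permutation.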
3.1 Graph Isomorphism

The performance of the three algorithms has been evaluated on the following kinds of graphs: randomly connected graphs (3,000 pairs), regular and irregular 2D meshes (4,000 pairs), and bounded valence graphs (3,000 pairs). Each category contains pairs of graphs of different sizes, ranging from a few dozen to about 1,000 nodes. For each size and kind of graph, 100 different pairs have been considered. A brief description of each category follows.

Randomly connected graphs are graphs where edges connect nodes without any structural regularity. To generate these graphs, we have adopted the same model proposed in [11]: it fixes the value $\eta$ of the probability that an edge is present between two distinct nodes. The probability distribution is assumed to be uniform, and the edges are independent.

2D mesh graphs are considered for simulating applications dealing with regular structures, such as those operating at the lower levels of a vision task. The considered meshes are 4-connected (i.e., each node is connected only with the nodes to its north, south, east, and west) by directed edges.

Irregular 2D meshes have been introduced for simulating the behavior of the algorithms in the presence of slightly distorted meshes. These have been obtained from regular 2D meshes by the addition of $\rho N$ edges (where $\rho$ is a positive constant), each connecting nodes that have been randomly determined according to a uniform distribution.

Bounded valence graphs model those applications in which each object (i.e., a node) establishes a fixed number of relations (edges) with other objects. Three different values of the valence $v$ (3, 6, and 9) have been considered.

Table 2 summarizes the obtained results (see footnote 1), showing the algorithm that achieves the best performance for each combination of graph size and type. The table shows that VF2 performs best in 56 of the 100 considered combinations, while in the remaining 44 cases the best algorithm is Nauty.
Moreover, although it is not evident from the table, in the cases in which Nauty obtains the best performance, VF2 is always the second best; on the other hand, there are six cases in which both VF2 and Ullmann's algorithm outperform Nauty. From the analysis of the table, it appears that Nauty is more convenient on randomly connected graphs that exhibit no regular structure, especially when the edge density becomes high. This kind of graph, however, does not adequately represent the graph structures found in many applications, where the graphs often show some form of regularity. On the other hand, for graphs with a more regular structure, VF2 is more efficient, especially for large graph sizes.

Footnote 1. Further experimental results can be found at the URL: http://amalfi.dis.unina.it/graph.

3.2 Graph-Subgraph Isomorphism of Attributed Graphs

In order to test our algorithm in the context of a graph-subgraph isomorphism application, we have employed a set of attributed graphs derived from large line drawings according to the method described in [20]. In particular, we have used two publicly available images (see footnote 2), representing, respectively, a mechanical drawing (ENGINE-2) and a cadastral map (MAP-1). From each image, a set of subimages corresponding to the connected components of the image was extracted and represented as graphs. Features of the obtained graphs and subgraphs are shown in Table 3. Fig. 2 shows the MAP-1 image with some of the component parts considered.

Footnote 2. A CD-ROM with the images was made available by M. Burge and W. Kropatsch during the SSPR workshop held in Sydney in August 1998.

TABLE 3. Some Features of Graphs and Subgraphs Obtained from the Test Images

Fig. 2. (a) The MAP-1 image and (b) some of the parts used as subgraphs.

Fig. 3. Matching times for the subgraphs of (a) the MAP-1 image and (b) the ENGINE-2 image.

In the adopted encoding of the images, graph branches represent strokes and graph nodes represent stroke junctions or end points. Nodes and branches are labeled with numeric attributes characterizing the absolute position of the points and the shape and orientation of the strokes connecting them. For our test, we have neglected node attributes and used stroke length and orientation as branch attributes. A simple semantic feasibility rule has been defined, allowing two branches to match if they are roughly similar. A semantic feasibility rule requiring the equality of the corresponding attributes would of course have been more effective in reducing the matching effort but, as already mentioned, an inexact rule provides a more realistic estimate of the algorithm's behavior in real applications. The feasibility rule assumes that two branches are similar if their lengths differ by less than 30 percent and their orientations differ by less than 30 degrees.

The matching times of the VF2 algorithm have been compared with those obtained with Ullmann's algorithm, modified so as to take into account the same semantic feasibility rule. Fig. 3 reports the performance of the two algorithms on the two considered images. It can be seen that our algorithm performs significantly better, especially when the size of the subgraphs exceeds about 20 nodes. In fact, while the matching time for Ullmann's algorithm rises rapidly with the number of subgraph nodes, the time needed by our algorithm is almost independent of the number of nodes. The time ratio reaches four orders of magnitude for subgraphs of more than 100 nodes.

4 CONCLUSIONS

We have presented and evaluated, both analytically and experimentally, a graph matching algorithm whose computational complexity is reduced through the use of a set of feasibility rules during the matching process.
The algorithm is tailored for dealing with large graphs without making particular assumptions on the nature of the graphs to be matched, and it can be used for both isomorphism and graph-subgraph isomorphism. Another distinctive feature of the algorithm is its ability to deal with Attributed Relational Graphs (ARGs) as well, profitably exploiting the information held by the semantic part of the ARG in order to further reduce the matching time. The achievement seems of particular interest since almost none of the algorithms presented in the literature so far satisfy all the above requirements together.

REFERENCES

[1] I. Rocha and T. Pavlidis, “A Shape Analysis Model with Application to a Character Recognition System,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, pp. 393-404, 1994.
[2] L.G. Shapiro and R.M. Haralick, “Structural Description and Inexact Matching,” IEEE Trans. Pattern Analysis and Machine Intelligence, no. 3, pp. 505-519, 1981.
[3] L.P. Cordella and M. Vento, “Symbol Recognition in Documents: A Collection of Techniques?” Int’l J. Document Analysis and Recognition, vol. 3, pp. 73-88, 2000.
[4] L. Jianzhuang and L.Y. Tsui, “Graph-Based Method for Face Identification from a Single 2D Line Drawing,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 10, pp. 1106-1119, 2001.
[5] J. Llados, E. Marti, and J.J. Villanueva, “Symbol Recognition by Error-Tolerant Subgraph Matching between Region Adjacency Graphs,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 10, pp. 1137-1143, 2001.
[6] D. Conte, P. Foggia, C. Sansone, and M. Vento, “Thirty Years of Graph Matching in Pattern Recognition,” Int’l J. Pattern Recognition and Artificial Intelligence, vol. 18, no. 3, pp. 265-298, 2004.
[7] L.P. Cordella, P. Foggia, C. Sansone, and M. Vento, “Subgraph Transformations for the Inexact Matching of Attributed Relational Graphs,” Computing, vol. 12, pp. 43-52, 1998.
[8] W.H. Tsai and K.S. Fu, “Subgraph Error-Correcting Isomorphisms for Syntactic Pattern Recognition,” IEEE Trans. Systems, Man, and Cybernetics, vol. 13, pp. 48-62, 1983.
[9] L.G. Shapiro and R.M. Haralick, “A Metric for Comparing Relational Descriptions,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 7, pp. 90-94, 1985.
[10] E.M. Luks, “Isomorphism of Graphs of Bounded Valence can be Tested in Polynomial Time,” J. Computer System Science, pp. 42-65, 1982.
[11] J.R. Ullmann, “An Algorithm for Subgraph Isomorphism,” J. Assoc. for Computing Machinery, vol. 23, pp. 31-42, 1976.
[12] B.T. Messmer, “Efficient Graph Matching Algorithms for Preprocessed Model Graphs,” PhD Thesis, Inst. of Computer Science and Applied Mathematics, Univ. of Bern, 1996.
[13] B.D. McKay, “Practical Graph Isomorphism,” Congressus Numerantium, vol. 30, pp. 45-87, 1981.
[14] H. Bunke and B.T. Messmer, “Efficient Attributed Graph Matching and Its Application to Image Analysis,” Proc. Image Analysis and Processing, pp. 45-55, 1995.
[15] W.J. Christmas, J. Kittler, and M. Petrou, “Structural Matching in Computer Vision Using Probabilistic Relaxation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 749-764, 1995.
[16] L.P. Cordella, P. Foggia, C. Sansone, and M. Vento, “Evaluating Performance of the VF Graph Matching Algorithm,” Proc. 10th Int’l Conf. Image Analysis and Processing, pp. 1172-1177, Sept. 1999.
[17] P. Foggia, C. Sansone, and M. Vento, “A Database of Graphs for Isomorphism and Sub Graph Isomorphism Benchmarking,” Proc. Third IAPR TC-15 Int’l Workshop Graph Based Representations, pp. 176-188, 2001.
[18] N.J. Nilsson, Principles of Artificial Intelligence. Springer-Verlag, 1982.
[19] B.T. Messmer and H. Bunke, “A Decision Tree Approach to Graph and Subgraph Isomorphism Detection,” Pattern Recognition, vol. 32, pp. 1979-1998, 1999.
[20] M. Burge and W.G. Kropatsch, “A Minimal Line Property Preserving Representation for Line Images,” Computing, vol. 62, no. 4, pp. 355-368, 1999.