
Graph-Based Knowledge Representation for GIS Data

Manuel Pech Palacio1, David Sol1, Jesús González2
{sp205175, sol}@mail.udlap.mx, [email protected]
1 Universidad de las Américas-Puebla
2 Instituto Nacional de Astrofísica Óptica y Electrónica
Puebla, México

Proceedings of the Fourth Mexican International Conference on Computer Science (ENC’03) 0-7695-1915-6/03 $17.00 © 2003 IEEE

Abstract

This paper presents a proposal to create a graph representation for GIS data that uses both spatial and non-spatial data and also includes the spatial relations between spatial objects. Because graphs are a powerful and flexible knowledge representation, we will be able to combine spatial and non-spatial data at the same time; this is one of the strengths of the proposal. We hope to apply this knowledge representation to the data mining process with GIS data, including three types of spatial relations: topological, orientation and distance.

1. Introduction

In recent years, human capabilities for generating and collecting data have grown rapidly. The explosive growth in data and databases has created a need for techniques and tools that can transform the data into useful information and knowledge. Initially, the goal of these techniques and tools was to discover knowledge that could exist in relational data. Nowadays, with the growth of applications that deal with georeferenced data, the management and analysis of spatial data have become noticeably more important.

Spatial data has many characteristics that distinguish it from relational data. For example, it carries topological, distance, and direction information organized by multidimensional spatial index structures. Another difference is the query language that is used to access spatial data. The complexity of the spatial data types is another important feature.

Different approaches have been developed for knowledge discovery from spatial data. We briefly present some of them next.

Generalization [22][14]. Data and objects often contain detailed information at primitive concept levels. It is often desirable to summarize a large set of data and present it at a high concept level. Generalization assumes the existence of background knowledge in the form of concept hierarchies. In the case of a spatial database, there can be two kinds of concept hierarchies, thematic and spatial. Lu et al. [22] extended attribute-oriented induction to spatial databases and presented two algorithms: spatial-data-dominant and non-spatial-data-dominant generalization.

Clustering [16][23][28][26]. Clustering can be defined as the process of grouping physical or abstract objects into classes of similar objects. Spatial data clustering identifies clusters, or densely populated regions, according to some measurement in a large, multidimensional data set.

Spatial associations [19][11]. In many situations it is desirable to explore spatial associations to discover rules which associate one or more spatial objects with other spatial objects. Various kinds of spatial predicates can constitute a spatial association rule. Examples include topological relations like intersects, overlap and disjoint; spatial orientations like left_of and west_of; and distance information such as close_to or far_away.

Approximation and aggregation [17]. Clustering approaches try to answer questions like where the clusters in the spatial database are located. Another problem is to find out why the clusters are there. We can rephrase the question to ask about the characteristics of the clusters in terms of the objects that are close to them. To answer it, we need to analyze the objects in the cluster and the objects close to them.

Finally, there are three other methods to discover knowledge in datasets:

- Mining an image database [12][11] can be viewed as another approach to spatial data mining.
- Classification learning [20] is the task of assigning an object to a class from a given set of classes based on the attribute values of the object.
- Spatial trend detection [9] can describe a regular change of one or more non-spatial attributes of an object that changes its position in time.

The remainder of this paper is organized as follows: Sections 2 and 3 present basic topics about spatial and non-spatial data mining. Section 4 describes three types of spatial relations between spatial objects. The Subdue system is described in section 5. In section 6 we present our graph-based knowledge representation for GIS data. Section 7 shows our conclusions.

2. Spatial data mining

Spatial data describes information about the space occupied by objects. Spatial data is continuously obtained by diverse types of applications such as GISs, medical applications and computerized cartography. Consequently, analyzing the data by manual techniques is often a hard task, due to the large volume of data as well as its complexity. To deal with this problem, different methods have been proposed and applied to discover knowledge in spatial data. These methods have been implemented using techniques from different fields like machine learning, database technology and statistics.

Spatial data mining is defined as the discovery of implicit and previously unknown knowledge in spatial databases [13]. Representative characteristics, structures or clusters, and spatial associations are examples of knowledge discovered from spatial data.

Geographic data in general consists of thematic and spatial data [1]. Thematic data is alphanumeric data related to spatial objects. Spatial data, on the other hand, is described using two different properties: geometry and topology. According to [1], spatial location and size are considered geometric properties, whereas adjacency (object A is located to the right of object B) and inclusion (object A is included in object B) are considered topological properties.
The methods to discover knowledge can focus on the thematic properties of the spatial objects in a spatial database, on their spatial properties, or on both.

3. Non-spatial data mining

Data mining can be seen as the search for hidden patterns that may exist in databases [23]. The explosive growth in the generation of data and its collection in databases has created a need for techniques and tools that can extract useful information and knowledge from it. Knowledge discovery in databases refers to the task of finding interesting knowledge, regularities, or high-level information from datasets, which can then be analyzed from different angles. People working in many different fields, including database systems, knowledge-base systems, artificial intelligence, machine learning and statistics, have shown great interest in data mining.

Some data mining techniques apply to structural data and others to non-structural data. Structural data is defined as data that describes the relations among the objects described in the data. We can see the data objects as variables in the attribute-value representation, but now we also have relations among those variables.

4. Spatial relations

In [8], Martin Ester et al. introduce three types of spatial relations: topological, distance and direction relations. They are called binary relations since they are determined between pairs of objects.

The authors define topological relations as those which are invariant under topological transformations: if both objects are rotated, translated or scaled simultaneously, the relations are preserved. They present a definition of topological relations derived from the 9-intersection model [5][6][7]. The topological relations between two objects are disjoint, meets, overlaps, equals, covers, covered-by, contains, and inside, as we show in figure 1. Each element in the figure describes a different topological spatial relation.

Figure 1. Topological relations: A disjoint B, A meets B, A contains B, A inside B, A equals B, A covers B, A coveredBy B, A overlaps B.

The second type of relation refers to distance relations. These relations compare the distance between two objects with a given constant using arithmetic operators like <, >, and =. The distance between two objects is defined as the minimum distance between them (see figure 2).

Figure 2. Distance relations: A close to B, A far from B.

The authors define a direction relation A R B of two spatial objects using one representative point of the source object A and all points of the destination object B. Several kinds of direction relations can be defined, depending on the number of points that are considered in the source and the destination objects. The representative point of a source object may be the center of the object or a point on its boundary. The representative point is used as the origin of a virtual coordinate system, and its quadrants define the directions. Examples are shown in figure 3. For instance, object D is to the south of object C and to the east of object A.

Figure 3. Direction relations: B north A, C northeast A, D east A, D south C, A west D.

Voronoi diagram

The Voronoi diagram (figure 4) is considered one of the fundamental data structures in computational geometry. Given some number of points in the plane, their Voronoi diagram divides the plane according to the nearest-neighbor rule. This rule states that each point is associated with the region of the plane that is closest to it.

A definition of a Voronoi diagram [2] can be stated as follows. Let S denote a set of n points (sites) in the plane. For two distinct sites p, q ∈ S, the dominance of p over q is defined as the subset of the plane that is at least as close to p as to q:

$$\mathrm{dom}(p, q) = \{\, x \in \mathbb{R}^2 \mid \delta(x, p) \le \delta(x, q) \,\}$$

The Euclidean distance function is denoted by δ. dom(p, q) is a closed half plane bounded by the perpendicular bisector of p and q. The bisector separates all the points of the plane closer to p from those that are closer to q; it is also known as the separator of p and q. The region of a site p ∈ S is the portion of the plane lying in all of the dominances of p over the remaining sites in S.

Because each region is an intersection of half planes, the regions are convex polygons. The boundary of a region consists of at most n - 1 edges (maximal straight-line segments) and vertices (endpoints of the edges). Each point on an edge is equidistant from exactly two sites, and each vertex is equidistant from at least three, as we can see in figure 4. This polygonal partition of the plane is called the Voronoi diagram.

Figure 4. Voronoi diagram.

5. The Subdue system

The Subdue system [15][27] (developed at the University of Texas at Arlington) is a general data mining tool that can be applied to any domain that can be represented as a graph. It discovers substructures that compress the original database and finds interesting structural concepts in the data. A substructure is a connected subgraph within the graph. By replacing previously discovered substructures in the data, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue can use a constrained inexact graph match that considers similar, but not identical, instances of a substructure as a match. Subdue uses the minimum description length (MDL) principle to guide the search towards more appropriate substructures.

The Subdue system uses a graph-based representation. Objects in the data (concepts) become vertices or small subgraphs in the graph, and relationships between objects become directed or undirected edges in the graph. This graph representation serves as input to the Subdue system.

Figure 5 shows an example of an input database and its graph representation. The example is presented in terms of the house domain, where a house is defined as a triangle on a square. T represents a triangle, S a square, C a circle and R a rectangle. The objects in the figure (e.g., T1, S1, R1) become labeled vertices in the graph, and the relationships (e.g., on(S1, R1), shape(C1, circle)) become labeled edges. The graph representation of the substructure discovered by Subdue from this data is shown in figure 6, where Subdue found four instances of a triangle on a square.

Figure 5. Graph representation of the house domain (input database and input graph).

Figure 6. Substructure and instances discovered from the house domain by Subdue. The substructure is an object with shape triangle on an object with shape square; its four instances are (T1, S1), (T2, S2), (T3, S3) and (T4, S4).

An instance of a substructure in an input graph consists of a set of vertices and edges from the input graph that match the graphical definition of the substructure. A neighboring edge of a substructure instance is an edge in the input graph that is not contained in the instance, but is connected to at least one vertex in the instance. An external connection of an instance of a substructure is a neighboring edge of the instance that is connected to at least one vertex not contained in the instance.

After a substructure is discovered, each instance of the substructure in the input graph is replaced by a single vertex representing the entire substructure, as we show in figure 7, where the substructure discovered by Subdue (object shape triangle on object shape square) was labeled S1. Subdue continues the search for the best substructure until all possible substructures have been considered or the amount of computation exceeds a given limit.

Figure 7. Graph representation of the house domain after the substructure replacement.

6. Graph-based knowledge representation

In our previous work [24][25] we applied the Subdue system to non-spatial data using a spatial dominant approach. Now we propose a knowledge representation to model GIS data using graphs. Our idea is to create a graph-based model that represents spatial and non-spatial data and to use the model to generate a dataset composed of both types of data. We can then apply a data mining technique (e.g., the Subdue system) to spatial and non-spatial data at the same time and obtain enriched results (patterns found through data mining) that consider both kinds of data about objects as well as the spatial relations among them.

In order to enrich the spatial data mining process it is advisable to take into account all of the elements that are used in a geographic representation (i.e., spatial objects, descriptive attributes and the relationships between them). These relationships enrich the spatial analysis processes. For example, we could find out the most important characteristics of the geometric objects located at some distance from a particular point, or identify the representative pattern of houses located along the boundaries of a highway which crosses some region of the state of Puebla.
Another example of the application of this technology concerns the risk zones near the Popocatépetl volcano. In this case it would be important to know the characteristics of the evacuation routes which would be used in situations of volcanic activity: what are the soil characteristics of the evacuation routes, and could they withstand the atmospheric conditions and the passage of vehicles in an emergency situation?

As we have seen, an important characteristic of spatial data is that the attributes of the neighbors of a specific object may have an influence on the object itself. Three types of spatial relations will be taken into account for the data mining tasks in the model: topological, orientation, and distance relations.

We initially use the 4-intersection model to describe the topological relations. In our future work we plan to use the 9-intersection model for the topological relations. In these models, the topological relations between two objects A and B are defined in terms of the intersections of object A’s interior (A°), boundary (∂A) and exterior (A⁻) with object B’s interior (B°), boundary (∂B) and exterior (B⁻); the 4-intersection model considers only the interiors and boundaries. The exterior of an object is represented by its complement. The nine intersections are:

$$
\begin{array}{c|ccc}
 & A^{\circ} & \partial A & A^{-} \\
\hline
B^{\circ} & A^{\circ} \cap B^{\circ} & \partial A \cap B^{\circ} & A^{-} \cap B^{\circ} \\
\partial B & A^{\circ} \cap \partial B & \partial A \cap \partial B & A^{-} \cap \partial B \\
B^{-} & A^{\circ} \cap B^{-} & \partial A \cap B^{-} & A^{-} \cap B^{-}
\end{array}
$$

However, this model fails to distinguish certain disjoint relations and also to identify the topological relations between two entities with holes [3]. In figure 8 we present three different disjoint relations between object A and object B; they nevertheless have the same 9-intersection matrix due to the infinite complements of the objects. Their intersection matrices show that the complements play no role in distinguishing the disjoint relations.

Figure 8. Different disjoint relations (cases i, ii and iii) with the same 9-intersection matrix [3].

Chen et al. [3] proposed the Voronoi-based 9-intersection model (V9I) as a modified version of the point-set-based 9-intersection model to improve this situation. The modification is made by replacing the exterior of a spatial object (its complement) with its Voronoi region. The Voronoi region of an entity has a special meaning: it is the influence region of the entity and is defined as the area containing all locations closer to the entity than to any other. The interaction model based on Voronoi diagrams can be described as an extension of the 4- and 9-intersection topological models.

Using the V9I model it is possible to distinguish disjoint relations, since each object has a limited number of neighbors instead of having relations with all other objects. In figure 9 we present an example of the resulting matrices of two disjoint objects using the 9-intersection and the Voronoi 9-intersection models. In the second case, the intersection of the exterior of A with the exterior of B is the only one that is not empty.

Figure 9. Distinguishing disjoint relations with V9I [3] (9-intersection and Voronoi 9-intersection matrices).

As we have mentioned, we initially use the 4-intersection model to describe the topological relations, and we plan to use the 9-intersection model in a second phase. Once we have defined the basic graph-based representation we will extend it to use the Voronoi-based 9-intersection model.

In the graph-based model the spatial data will be represented by vertices and edges. Vertices will be used to represent the spatial objects and their attributes (data describing the objects). The number of vertices of the graph will be determined by:

$$n + \sum_{i=1}^{n} \mathit{numAttributesPerObject}(n_i)$$

where n is the number of spatial objects included in the dataset and n_i denotes the i-th object.

Edges will represent the spatial relations between two particular objects (binary relations). The capability of the model to represent the relations between these objects will have a great impact on the results of the data mining processes. The world is described by objects and the relations between them; we can regard the relations as the elements describing how the objects interact with each other. The number of edges of the graph will be determined by:

$$\sum_{i=1}^{n} \mathit{numAttributesPerObject}(n_i) + \mathit{numRelationsAmongObjects}$$

The proposed model is shown in figure 10, where we have two spatial objects which are connected through topological, distance, and direction relations. This knowledge representation has the potential to create graphs using both spatial and non-spatial data and also the spatial relations between the spatial objects.

Figure 10. Proposed schema: two spatial objects, each linked by attribute edges to value vertices, and connected to each other by a topological relation, a distance relation and a direction relation.

Combining spatial and non-spatial data in a dataset is one of the strengths of the model. Some mining approaches [22] apply data mining techniques first to the non-spatial data and then to the spatial data, or in the inverse order. We propose to apply data mining techniques to datasets that include both spatial and non-spatial data as a whole.
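As a worked illustration of the two counting formulas, consider the two-object schema of figure 10. The attribute counts below are an assumption made only for this example (two attributes per object); the three relations (topological, distance and direction) are the ones named in the schema.

```latex
\begin{aligned}
\text{vertices} &= n + \sum_{i=1}^{n} \mathit{numAttributesPerObject}(n_i)
                 = 2 + (2 + 2) = 6 \\
\text{edges}    &= \sum_{i=1}^{n} \mathit{numAttributesPerObject}(n_i) + \mathit{numRelationsAmongObjects}
                 = (2 + 2) + 3 = 7
\end{aligned}
```

Each attribute contributes one value vertex and one attribute edge, which is why the same sum appears in both counts.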
Figure 11 shows an example of a database composed of ten objects and their spatial relations. There are seven houses (objects A, B, C, D, E, F, and G), a lake (object H), a river (object J), and a road (object I). As we can see, there are five houses near the lake; two of them touch the boundary of the lake. The river touches the boundary of the lake and the boundary of the road. Additionally, there are two houses near the road that, unlike the others, are not near the lake. Looking at this figure, we see that most of the houses are near the lake, and we may generalize this distribution of the houses as a pattern (e.g., most houses are located near a place where there is water).

Non-spatial analysis may answer questions such as what the characteristics of the houses near a lake are (e.g., whether the houses are built using some special material, what their safety restrictions are, or on what type of soil they were built). Spatial analysis, on the other hand, may answer questions such as where the clusters of houses are and what the distribution of the objects in the area of analysis is.

Figure 11. Spatial database representing objects of the real world.

By using the proposed graph-based model we generate a graph like the one shown in figure 12. The vertices in the graph represent either the spatial objects (e.g., house, lake, road, and river) or the attributes describing the objects (e.g., the object’s name). Following the schema there are twenty vertices in the graph: ten vertices represent the spatial objects and the other ten represent their attributes.

Figure 12. Representing spatial data, non-spatial data and spatial relations in graph format.

The edges in the graph represent either the spatial relations between the objects or the name of an attribute of the object (e.g., name).
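The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function `build_graph`, the vertex ids such as `A.name`, and in particular the assignment of relations to individual houses are assumptions, since the text states only how many houses are near or touch each object, not which ones.

```python
def build_graph(objects, relations):
    """Build a labeled graph from spatial objects and binary spatial relations.

    objects:   {object_id: {attribute_name: value}}
    relations: [(object_id_a, relation_label, object_id_b)]
    Returns (vertices, edges): vertices maps vertex id -> label,
    edges is a list of (source id, target id, edge label).
    """
    vertices = {}
    edges = []
    for obj_id, attributes in objects.items():
        vertices[obj_id] = "object"              # one vertex per spatial object
        for attr_name, value in attributes.items():
            value_id = f"{obj_id}.{attr_name}"   # one value vertex per attribute
            vertices[value_id] = str(value)
            edges.append((obj_id, value_id, attr_name))
    for id_a, label, id_b in relations:
        edges.append((id_a, id_b, label))        # one edge per spatial relation
    return vertices, edges


# Ten objects (houses A-G, lake H, road I, river J), each with a single "name" attribute.
objects = {name: {"name": name} for name in "ABCDEFGHIJ"}

# Hypothetical assignment of the relations described in the text:
# five houses near the lake, two of them touching it; the river touching
# the lake and the road; two houses near the road.
relations = [
    ("A", "near", "H"), ("B", "near", "H"), ("C", "near", "H"),
    ("D", "near", "H"), ("E", "near", "H"),
    ("A", "touch", "H"), ("B", "touch", "H"),
    ("J", "touch", "H"), ("J", "touch", "I"),
    ("F", "near", "I"), ("G", "near", "I"),
]

vertices, edges = build_graph(objects, relations)
print(len(vertices))  # 20: ten object vertices plus ten attribute-value vertices
print(len(edges))     # 21 under the assumed relations: 10 attribute edges + 11 relation edges
```

The vertex count of twenty matches the figure quoted in the text; the edge count depends entirely on the assumed relation list.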
In figure 12, for example, the spatial relation between the lake and the road is represented by the edge labeled “touch”, which tells us that there is a spatial relation between the objects and, more specifically, that their boundaries touch. The number of edges in the graph will be the number of spatial relations between the objects plus the number of attributes describing each object.

Once we have created the graph, it will be used as input for a graph-based data mining system (e.g., the Subdue system). As we mentioned in section 5, the Subdue system can work with datasets from any domain that can be represented as a graph (graph-based data representation), but it has not been tested with data from a geographic database. Consequently, we propose to analyze this system in order to determine its capabilities to mine spatial and non-spatial data together; if required, we can add to Subdue the capabilities needed to deal with geographic data.

7. Conclusions

In this paper we propose a graph-based data representation for spatial and non-spatial data that includes spatial relations between objects (topological, distance, and direction relations). The model will enrich the spatial data mining process because it will allow the creation of datasets composed of the three basic elements used in a geographic representation. The 4-intersection model is used to describe the topological relations. In a second phase, our idea is to integrate the 9-intersection model and extend it using the Voronoi-based 9-intersection model. We have already tried the Subdue system with non-spatial data (using a spatial dominant approach [24][25]), and we are now improving the results by applying Subdue to datasets composed of spatial and non-spatial data.
As we mentioned, our proposal consists of generating the capability to analyze geometric attributes among real-world elements in order to find behaviors and regularities as well. Our methodology will include mechanisms for processing geometric data in order to represent them in the graph model, where data with traditional attributes and geometric attributes are combined to describe the regular behavior of elements of the real world.

8. References

[1] Adhikary, Junas. Knowledge Discovery in Spatial Databases - Progress and Challenges. School of Computing Science, Simon Fraser University. 1996.
[2] Aurenhammer, Franz. Voronoi diagrams - a survey of a fundamental geometric data structure. ACM Computing Surveys. 1991.
[3] Chen, Jun, Zhilin Li, Chengming Li, C. M. Gold. Describing Topological Relations with Voronoi-based 9-Intersection Model. 1999.
[4] Chen, Ming-Syan, Jiawei Han, Philip S. Yu. Data Mining: An Overview from Database Perspective. 1996.
[5] Egenhofer, Max J. A model for detailed binary topological relationships. National Center for Geographic Information and Analysis and Department of Surveying Engineering, Department of Computer Science, University of Maine. 1993.
[6] Egenhofer, Max J., J. R. Herring. Categorizing binary topological relationships between regions, lines, and points in geographic databases. Technical Report, Department of Surveying Engineering, University of Maine, Orono. 1991.
[7] Egenhofer, Max J., Robert D. Franzosa. On the equivalence of topological relations. Research Article, Int. J. Geographical Information Systems. 1995.
[8] Ester, Martin, Alexander Frommelt, Hans-Peter Kriegel, Jörg Sander. Spatial Data Mining: Database Primitives, Algorithms and Efficient DBMS Support. Submitted to Special Issue on “Integration of Data Mining with Database Technology”, Data Mining and Knowledge Discovery, an International Journal, Kluwer Academic Publishers. 1999.
[9] Ester, Martin, Alexander Frommelt, Hans-Peter Kriegel, Jörg Sander. Algorithms for Characterization and Trend Detection in Spatial Databases. Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD 98), New York City, NY. 1998.
[10] Ester, Martin, Hans-Peter Kriegel, Jörg Sander. Spatial Data Mining: A Database Approach. Proceedings of the Fifth Int. Symposium on Large Spatial Databases (SSD 97), Berlin, Germany, Lecture Notes in Computer Science, Springer. 1997.
[11] Fayyad, Usama, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, Eds. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, Menlo Park, CA. 1996.
[12] Fayyad, Usama, P. Smyth. Image Database Exploration: Progress and Challenges. In Proceedings of 1993 Knowledge Discovery in Databases Workshop. AAAI Press, Menlo Park, CA. 1993.
[13] Frawley, W. J., G. Piatetsky-Shapiro, C. J. Matheus. Knowledge Discovery in Databases: An Overview. In Piatetsky-Shapiro G., W. J. Frawley. Knowledge Discovery in Databases, AAAI/MIT Press, Menlo Park. 1991.
[14] Han, Jiawei, Yandong Cai, Nick Cercone. Knowledge Discovery in Databases: An attribute-oriented approach. Proceedings of the 18th International Conference on Very Large Databases (VLDB 92), British Columbia, Canada. 1992.
[15] Holder, L. B., D. J. Cook, J. Gonzalez, and I. Jonyer. Structural Pattern Recognition in Graphs, to appear in Pattern Recognition and String Matching (D. Chen and X. Cheng, eds.), Kluwer Academic Publishers. 2002.
[16] Kaufman, Leonard, Peter J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, Inc. 1990.
[17] Knorr, Edwin M., Raymond T. Ng. Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining. IEEE Trans. Knowledge and Data Engineering. 1996.
[18] Kolatch, Erica. Clustering Algorithms for Spatial Databases: A Survey. Department of Computer Science, University of Maryland, College Park. 2001.
[19] Koperski, Krzysztof, Jiawei Han. Discovery of Spatial Association Rules in Geographic Information Databases. Proceedings of the 4th International Symposium on Spatial Databases (SSD 95), Springer-Verlag, Berlin. 1995.
[20] Koperski, Krzysztof, Jiawei Han, Nebojsa Stefanovic. An Efficient Two-Step Method for Classification of Spatial Data. Proceedings of the Symposium on Spatial Data Handling (SDH 98), Vancouver, Canada. 1998.
[21] Laurini, Robert, D. Thompson. Fundamentals of Spatial Information Systems. Academic Press. 1992.
[22] Lu, Wei, Jiawei Han, Beng Chin Ooi. Discovery of General Knowledge in Large Spatial Databases. Proceedings of Far East Workshop on Geographic Information Systems, Singapore. 1993.
[23] Ng, Raymond T., Jiawei Han. Efficient and Effective Clustering Methods for Spatial Data Mining. Proceedings of the 20th Very Large Databases Conference (VLDB 94), Santiago, Chile. 1994.
[24] Pech Palacio, Manuel. Tesis para obtener el grado de Maestro en Ciencias con Especialidad en Sistemas Computacionales. Departamento de Ingeniería en Sistemas Computacionales, Universidad de las Américas, Puebla. Mayo 2002.
[25] Pech Palacio, Manuel, David Sol, Jesús González. Adaptation and Use of Spatial and Non-Spatial Data Mining. Proceedings of GEOPRO 2002, Instituto Politécnico Nacional. 2002.
[26] Sheikholeslami, Gholamhosein, Surojit Chatterjee, Aidong Zhang. WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases. Proceedings of the 24th Very Large Databases Conference (VLDB 98), New York, NY. 1998.
[27] Subdue System. University of Texas at Arlington. Internet site, last visited February 2003. http://ailab.uta.edu/subdue/.
[28] Zhang, Tian, Raghu Ramakrishnan, Miron Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Canada. 1996.

9. Acknowledgement

Project 38257-H. Habitar y vivir. Análisis del espacio habitacional de la ciudad de Puebla 1690-1890. Universidad de las Américas Puebla’s excellence scholarship.