Visual Data Mining Techniques
*An earlier version of this paper, with a focus on visualization techniques and their classification (see Section I), has been published in [21].
Figure 43.1 Classification of visual data mining techniques along three dimensions: the data to be visualized (one-dimensional, two-dimensional, ...), the visualization technique, and the interaction technique.
criteria [20] (Fig. 43.1): the data to be visualized, the visualization technique, and the interaction technique used.

The data type to be visualized [32] may be 1D data, such as temporal (time-series) data; 2D data, such as geographical maps; multidimensional data, such as relational tables; text and hypertext, such as news articles and web documents; hierarchies and graphs, such as telephone calls and web documents; or algorithms and software.

The visualization technique used may be classified as standard 2D/3D displays, such as bar charts and x-y plots; geometrically transformed displays, such as the hyperbolic plane [36] (Fig. 43.2a) and parallel coordinates [18]; icon-based displays, such as Chernoff faces [9] and stick figures [24,23] (Fig. 43.2c); dense pixel displays, such as the recursive pattern [4] (Fig. 43.2b) and circle segments [5]; and stacked displays, such as treemaps [31,19] (Fig. 43.2d) and dimensional stacking [37]. The third dimension of the classification is the interaction technique used. Interaction techniques allow users to directly navigate and modify the visualizations, as well as select subsets of the data for further operations. Examples include dynamic projection, interactive filtering, interactive zooming, interactive distortion, interactive linking, and brushing.
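To make the taxonomy concrete, the following minimal sketch renders one of the geometrically transformed displays mentioned above, parallel coordinates [18], for a small multidimensional table. It assumes pandas and matplotlib are available; the column names and values are illustrative only.

```python
# Minimal parallel-coordinates sketch for multidimensional (tabular) data.
# Assumes pandas and matplotlib are installed; data values are illustrative.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# A tiny relational table: each row is a record, each column an attribute.
df = pd.DataFrame({
    "income":  [30, 45, 80, 95, 40, 70],
    "age":     [25, 37, 52, 61, 29, 48],
    "debt":    [10, 20,  5,  2, 15,  8],
    "segment": ["A", "A", "B", "B", "A", "B"],  # class used only for coloring
})

# Each record becomes a polyline across the parallel axes (one axis per attribute).
parallel_coordinates(df, class_column="segment", colormap="viridis")
plt.title("Parallel coordinates (geometrically transformed display)")
plt.show()
```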
Note that the three dimensions of our classification (data type to be visualized, visualization technique, and interaction technique) can be assumed to be orthogonal. Orthogonality means that any of the visualization techniques may be used in conjunction with any of the interaction techniques for any data type. Note also that a specific system may be designed to support different data types and that it may use a combination of visualization and interaction techniques. More details can be found in Keim and Ward [21].

43.2 Methodology of Visual Data Mining

The data analyst typically specifies first some parameters to restrict the search space; data mining is then performed automatically by an algorithm; and finally the patterns found by the automatic data-mining algorithm are presented to the data analyst on the screen. For data mining to be effective, it is important to include the human in the data exploration process and to combine the flexibility, creativity, and general knowledge of the human with the enormous storage capacity and the computational power of today's computers. Since there is a huge
Figure 43.2 Some popular information visualization techniques. (a) Geometrically transformed displays: interactive visualization of high-dimensional data using the hyperbolic plane [36]; genre separation in movie space (red 'x' marks science fiction, black 'D' marks animation, and green '+' marks movies belonging to both genres). © ACM. (b) Dense pixel displays: Recursive Pattern [4], based on a generic back-and-forth recursive arrangement schema that represents each data value as a colored pixel and each attribute in a separate subwindow (the example visualization shows the stock prices for Dow Jones, Gold, IBM, and US Dollar over almost seven consecutive years; the seven vertical bars correspond to the seven years (level-3 patterns) and the subdivision of the bars to the 12 months within each year (level-2 patterns); the coloring maps high attribute values (stock prices) to light colors and low attribute values to dark colors). (c) Iconic displays: Stick Figures [24,23], visualization of multidimensional data using properties of the angle and/or length of the limbs (US Census data: median household income and age of householder). (d) Stacked displays: Treemaps [31,19], splitting the screen into rectangles in alternating horizontal and vertical directions at each level (the example visualization shows a hierarchical file system of a large hard disk).
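The stacked-display idea of Fig. 43.2d can be sketched in a few lines: a slice-and-dice treemap recursively splits a rectangle in alternating horizontal and vertical directions, with each child's area proportional to its size. The sketch below is a minimal, illustrative implementation with an invented file-system-like hierarchy, not the layout used by any particular tool.

```python
# Slice-and-dice treemap layout: split a rectangle among children proportionally
# to their sizes, alternating the split direction at each level of the hierarchy.
# The hierarchy below is purely illustrative.

def treemap(node, x, y, w, h, horizontal=True, out=None):
    """Return a list of (name, x, y, w, h) rectangles for all leaves of `node`."""
    out = [] if out is None else out
    children = node.get("children")
    if not children:                       # leaf: emit its rectangle
        out.append((node["name"], x, y, w, h))
        return out
    total = sum(c["size"] for c in children)
    offset = 0.0
    for c in children:
        frac = c["size"] / total
        if horizontal:                     # slice along the x-axis
            treemap(c, x + offset * w, y, frac * w, h, not horizontal, out)
        else:                              # dice along the y-axis
            treemap(c, x, y + offset * h, w, frac * h, not horizontal, out)
        offset += frac
    return out

root = {"name": "disk", "children": [
    {"name": "docs", "size": 40, "children": [
        {"name": "papers", "size": 30}, {"name": "notes", "size": 10}]},
    {"name": "media", "size": 60, "children": [
        {"name": "photos", "size": 45}, {"name": "music", "size": 15}]},
]}

for name, x, y, w, h in treemap(root, 0, 0, 100, 100):
    print(f"{name:7s} at ({x:5.1f}, {y:5.1f})  size {w:5.1f} x {h:5.1f}")
```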
Visual data exploration usually follows a three-step process: overview first, zoom and filter, and then details-on-demand (which has been called the Information Seeking Mantra [32]). First, the data analyst needs to get an overview of the data. In the overview, the data analyst identifies interesting patterns or groups in the data and focuses on one or more of them. For analyzing the patterns, the data analyst needs to drill down and access details of the data. Visualization technology may be used for all three steps of the data exploration process. Visualization techniques are useful for showing an overview of the data, allowing the data analyst to identify interesting subsets. In this step, it is important to keep the overview visualization while focusing on the subset using another visualization technique. An alternative is to distort the overview visualization in order to focus on the interesting subsets. This can be performed by dedicating a larger percentage of the display to the interesting subsets while decreasing screen utilization for uninteresting data. To further explore the interesting subsets, the data analyst needs a drill-down capability in order to observe the details about the data. Note that visualization technology not only provides the base visualization techniques for all three steps but also bridges the gaps between the steps. Visual data mining can be seen as a hypothesis-generation process; the visualizations of the data allow the data analyst to gain insight into the data and come up with new hypotheses. The verification of the hypotheses can also be done via data visualization, but may also be accomplished by automatic techniques from statistics, pattern recognition, or machine learning. As a result, visual data mining usually allows faster data exploration and often provides better results, especially in cases where automatic data-mining algorithms fail. In addition, visual data exploration techniques provide a much higher degree of user satisfaction and confidence in the findings of the exploration. This fact leads to a high demand for visual exploration techniques and makes them indispensable in conjunction with automatic exploration techniques.

Visual data mining is based on an automatic part, the data-mining algorithm, and an interactive part, the visualization technique. There are three common approaches to integrating the human in the data exploration process, realizing different kinds of visual data mining (Fig. 43.3); a minimal code sketch of the three approaches appears after Fig. 43.3 below:

- Preceding Visualization (PV): Data is visualized in some visual form before running a data-mining algorithm. By interacting with the raw data, the data analyst has full control over the analysis in the search space. Interesting patterns are discovered by exploring the data.

- Subsequent Visualization (SV): An automatic data-mining algorithm performs the data-mining task by extracting patterns from a given dataset. These patterns are visualized to make them interpretable for the data analyst. Subsequent visualizations enable the data analyst to provide feedback: based on the visualization, the data analyst may want to return to the data-mining algorithm and use different input parameters to obtain better results.

- Tightly Integrated Visualization (TIV): An automatic data-mining algorithm performs an analysis of the data but does not produce the final results. A visualization technique is used to present the intermediate results of the data exploration process. The combination of some automatic data-mining algorithms and visualization techniques enables specified user feedback for the next data-mining run. The data analyst then identifies the interesting patterns in the visualization of the intermediate results based on his domain knowledge. A motivation of this approach is to achieve independence of the data-mining algorithms from the application. A given automatic data-mining algorithm can be very useful in one domain but may have drawbacks in some other domain. Since there is no automatic data-mining algorithm (with one parameter setting) suitable for all application domains, tightly integrated
Figure 43.3 The three approaches: visualization of the data with interaction (PV), a DM-algorithm followed by visualization of the result (SV), and DM-algorithm steps 1 to n interleaved with visualization and interaction before the result is produced (TIV).
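The difference between the three approaches is essentially where the analyst's interaction enters the loop. The following sketch is purely illustrative: "mining" is reduced to a toy threshold filter, "visualization" to printing, and interaction to simple callbacks; none of the names correspond to a real system.

```python
# Schematic, runnable sketch of PV, SV, and TIV. "Mining" is a toy pattern
# extractor (values above a threshold), "visualization" is textual output,
# and user interaction is simulated by callbacks.

def visualize(values, label):
    print(f"{label}: " + " ".join(f"{v:.1f}" for v in values))

def mine(values, threshold):
    return [v for v in values if v > threshold]          # toy "pattern": outliers

def preceding_visualization(values, choose_threshold):
    visualize(values, "raw data")                        # PV: look at the data first,
    return mine(values, choose_threshold(values))        # then let the analyst pick parameters

def subsequent_visualization(values, threshold, accept):
    patterns = mine(values, threshold)                   # SV: run the algorithm first,
    visualize(patterns, "patterns")                      # then visualize its result;
    if not accept(patterns):                             # feedback may trigger a rerun
        return subsequent_visualization(values, threshold * 0.8, accept)
    return patterns

def tightly_integrated_visualization(values, threshold, adjust, n_steps=3):
    for step in range(n_steps):                          # TIV: visualize the intermediate
        patterns = mine(values, threshold)               # result of every step and let
        visualize(patterns, f"step {step}")              # feedback steer the next one
        threshold = adjust(patterns, threshold)
    return patterns

data = [1.0, 1.2, 0.9, 4.5, 1.1, 5.2, 0.8]
preceding_visualization(data, choose_threshold=lambda v: 3.0)
subsequent_visualization(data, threshold=5.0, accept=lambda p: len(p) >= 2)
tightly_integrated_visualization(data, 6.0, adjust=lambda p, t: t if p else t * 0.7)
```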
important parameter is the support of an association rule, which is defined as the percentage of transactions in which the items co-occur. Let I = {i1, ..., in} be a set of items and let D be a set of transactions, where each transaction T is a set of items such that T ⊆ I. An association rule is an implication of the form X ⇒ Y, where X ⊆ I, Y ⊆ I, and X, Y ≠ ∅. The confidence c is defined as the percentage of transactions that contain Y, given X. The support is the percentage of transactions that contain both X and Y. For given support and confidence levels, there are efficient algorithms to determine all association rules [1]. A problem, however, is that the resulting set of association rules is usually very large, especially for low support and confidence levels. Using higher support and confidence levels may not be effective, since useful rules may then be overlooked.
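As a worked illustration of these definitions (not taken from the chapter), the short sketch below computes the support and confidence of a candidate rule X ⇒ Y over a toy list of market-basket transactions; the item names and the rule are invented.

```python
# Support and confidence of an association rule X => Y over a set of transactions.
# Support    = fraction of transactions containing X and Y together.
# Confidence = fraction of transactions containing X that also contain Y.
# The transactions and the candidate rule below are toy examples.

transactions = [
    {"heineken", "chicken", "sardines"},
    {"heineken", "coke"},
    {"heineken", "chicken"},
    {"coke", "sardines"},
    {"heineken", "chicken", "coke", "sardines"},
]

def support(X, Y):
    both = sum(1 for t in transactions if X <= t and Y <= t)
    return both / len(transactions)

def confidence(X, Y):
    contains_x = [t for t in transactions if X <= t]
    if not contains_x:
        return 0.0
    return sum(1 for t in contains_x if Y <= t) / len(contains_x)

X, Y = {"heineken", "chicken"}, {"sardines"}
print(f"support    = {support(X, Y):.2f}")     # 2/5 = 0.40
print(f"confidence = {confidence(X, Y):.2f}")  # 2/3 = 0.67
```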
Visualization techniques have been used to overcome this problem and to allow an interactive selection of good support and confidence levels. Fig. 43.4 shows SGI MineSet's Rule Visualizer [17], which maps the left- and right-hand sides of the rules to the x- and y-axes of the plot and shows the confidence as the height of the bars and the support as the height of the discs. The color of the bars shows the interestingness of the rule. Using the visualization, the user is able to see groups of related rules and the impact of different confidence and support levels. The number of rules that can be visualized, however, is limited, and the visualization does not support combinations of items on the left- or right-hand side of the association rules.

Fig. 43.5 shows two alternative visualizations called mosaic and double-decker plots [15]. The basic idea is to partition a rectangle on the y-axis according to one attribute and make the regions proportional to the sum of the corresponding data values. Compared to bar charts, mosaic plots use the height of the bars instead of the width to show the parameter value. Then each resulting area is split in the same way according to a second attribute. The coloring reflects the percentage of data items that fulfill a third attribute. The visualization shows the support and confidence values of all rules of the form X1 X2 ⇒ Y. Mosaic plots are restricted to two attributes on the left side of the association rule. Double-decker plots can be used to show more than two attributes on the left side. The idea is to show a hierarchy of attributes on the bottom (Heineken, Coke, and chicken in the example shown in Fig. 43.5) corresponding to the left-hand side of the association rules; the
Figure 43.4 MineSet's Association Rule Visualizer [17] maps the left- and right-hand sides of the rules to the x- and y-axes of the plot and shows the confidence as the height of the bars and the support as the height of the discs; the color of the bars shows the interestingness of the rule (the example visualization shows market-basket data for customer buying patterns). © SGI
Figure 43.5 Association rule visualization [15] partitions a rectangle on the y-axis according to one attribute and makes the regions proportional to the sum of the corresponding data values. © ACM. (a) Mosaic plot: 2D mosaic plot of attributes Ax1 and Ax2; highlighting shows up in the mosaic plot as a third dimension. (b) Double-decker plot: the example visualization shows a hierarchy of supermarket basket items: Heineken, Coke, chicken, and sardines.
bars on the top correspond to the number of items in the corresponding subset of the database and therefore visualize the support of the rule. The colored areas in the bars correspond to the percentage of data transactions that contain an additional item (sardines, in Fig. 43.5) and therefore correspond to the confidence. Other approaches to association rule visualization include graphs with nodes corresponding to items and arrows corresponding to implications, as used in DBMiner [16], and association matrix visualizations to cluster related rules [12].
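To make the mosaic construction described above concrete, the sketch below builds a simple two-attribute mosaic from a toy transaction table and shades each tile by the fraction of transactions that also contain a third item (the "highlighting"). The data and layout conventions are invented for illustration and do not reproduce the plots of [15].

```python
# Toy mosaic-plot construction in the spirit of the description above:
# partition the unit square along the y-axis by attribute A (band heights
# proportional to counts), split each band along x by attribute B, and shade
# each tile by the fraction of its transactions that also satisfy attribute C.
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from collections import Counter

# Invented market-basket data: (has_heineken, has_chicken, has_sardines)
rows = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 0),
        (1, 1, 1), (0, 1, 0), (1, 0, 1), (0, 0, 0), (1, 1, 0)]

fig, ax = plt.subplots()
counts_a = Counter(r[0] for r in rows)              # first split: bands along y
y = 0.0
for a in sorted(counts_a):
    height = counts_a[a] / len(rows)
    band = [r for r in rows if r[0] == a]
    counts_b = Counter(r[1] for r in band)          # second split: tiles along x
    x = 0.0
    for b in sorted(counts_b):
        width = counts_b[b] / len(band)
        cell = [r for r in band if r[1] == b]
        frac_c = sum(r[2] for r in cell) / len(cell)    # "highlighting": third attribute
        ax.add_patch(Rectangle((x, y), width, height,
                               facecolor=f"{1.0 - frac_c:.2f}",  # darker = higher fraction
                               edgecolor="black"))
        x += width
    y += height

ax.set_ylabel("heineken (0 bottom / 1 top)")
ax.set_xlabel("chicken (0 left / 1 right)")
ax.set_title("Toy mosaic plot; shading = fraction of transactions with sardines")
plt.show()
```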
43.4 Classification

Classification is the process of developing a classification model based on a training dataset with known class labels. To construct the classification model, the attributes of the training dataset are analyzed and an accurate description or model of the classes, based on the attributes available in the dataset, is developed. The class descriptions are then used to classify data for which the class labels are unknown. Classification is sometimes also called supervised learning because the training set is used to teach the system how to classify the data. There are many algorithms for solving classification tasks. The most popular approaches are algorithms that inductively construct decision trees. Examples are ID3 [25], CART [7], ID5 [34,35], C4.5 [26], SLIQ [22], and SPRINT [30]. In addition, there are approaches that use neural networks, genetic algorithms, or Bayesian networks to solve the classification problem. Since most algorithms work as black-box approaches, it is often difficult to understand and optimize the decision model. Problems such as over-fitting or tree pruning are difficult to tackle.

Visualization techniques can help to overcome these problems. The decision tree visualizer in SGI's MineSet system [17] shows an overview of the decision tree together with important parameters such as the attribute value distributions. The system allows an interactive selection of the attributes shown and helps the user understand the decision tree. A more sophisticated approach that also helps in decision tree construction is visual classification, as proposed by Ankerst et al. [3]. The basic idea is to show each attribute value by a colored pixel and arrange them in bars. The pixels of each attribute bar are sorted separately and the attribute with the purest value distribution is selected as the split attribute of the decision tree; the procedure is repeated until all leaves correspond to pure classes (a minimal sketch of this split-attribute selection follows the list below). An example of the decision tree resulting from this process is shown in Fig. 43.7. Compared to a standard visualization of a decision tree, additional information is provided that is helpful for explaining and analyzing the decision tree, namely:

- Size of the nodes (number of training records corresponding to the node)
Figure 43.6 MineSet's Decision Tree Visualizer [17] displays decision trees as 3D landscapes; each node contains bars whose height, color, and disk correspond to important parameters. © SGI
Figure 43.7 Visual Classification [3] shows each attribute value as a colored pixel and arranges them in bars (the example shows a visualization of a decision tree for the DNA segment training data from the Statlog benchmark, which has 19 attributes). © ACM
- Quality of the split (purity of the resulting partitions)
- Class distribution (frequency and location of the training instances of all classes).
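The split-attribute selection described above can be sketched as follows: for each candidate attribute, sort the training records by that attribute, view the resulting sequence of class labels as the "attribute bar", and pick the attribute whose best single split point yields the purest partitions (gini impurity is used here). This is only an illustrative reconstruction of the idea, not the algorithm of [3]; the data and names are invented.

```python
# Pick the decision-tree split attribute whose best split point yields the
# purest partitions. Sorting an attribute column and reading off the class
# labels mimics the sorted "attribute bar" of colored pixels described above.
from collections import Counter

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(records, labels, attr):
    """Return (weighted impurity, split value) of the best binary split on attr."""
    # The sorted attribute column is the "attribute bar": each position carries
    # a class label (a colored pixel in the visual classification display).
    order = sorted(range(len(records)), key=lambda i: records[i][attr])
    best = (float("inf"), None)
    for k in range(1, len(order)):                      # try every split position
        left = [labels[i] for i in order[:k]]
        right = [labels[i] for i in order[k:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(order)
        if impurity < best[0]:
            best = (impurity, records[order[k]][attr])
    return best

# Invented training data: two numeric attributes, binary class label.
records = [{"income": 20, "age": 55}, {"income": 85, "age": 52},
           {"income": 30, "age": 31}, {"income": 90, "age": 30},
           {"income": 25, "age": 40}, {"income": 75, "age": 48}]
labels = ["low", "high", "low", "high", "low", "high"]

scores = {a: best_split(records, labels, a) for a in ("income", "age")}
split_attr = min(scores, key=lambda a: scores[a][0])
print(scores)
print("chosen split attribute:", split_attr)   # the attribute with the purest split
```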
Some of this information might also be provided by annotating the standard visualization of a decision tree (for example, annotating the nodes with the number of records or the gini index), but this approach clearly fails for more complex information such as the class distribution. In general, visualizations can help us to better understand the classification models and to easily interact with the classification algorithms in order to optimize the model generation and classification process.

43.5 Clustering

Clustering is the process of finding a partitioning of the dataset into homogeneous subsets called clusters. Unlike classification, clustering is unsupervised learning. This means that the classes are unknown and no training set with class labels is available. A wide range of clustering
understanding the clustering and guiding the clustering process.

Another interesting approach is the HD-Eye system [14]. The HD-Eye system considers the clustering problem a partitioning problem and supports a tight integration of advanced clustering algorithms and state-of-the-art visualization techniques, allowing the user to directly interact in the crucial steps of the clustering process. The crucial steps are the selection of dimensions to be considered, the selection of the clustering paradigm, and the partitioning of the dataset. Novel visualization techniques are employed to help the user identify the most interesting projections and subsets as well as the best separators for partitioning the data. Fig. 43.10 shows an example screenshot of the HD-Eye system with its basic visual components for cluster separation. The separator tree represents the clustering model produced so far in the clustering process. The abstract iconic displays (top-right and bottom-middle in Fig. 43.10) visualize the partitioning potential of a large number of projections. The properties are based on histogram information of the point density in the projected space. The number of data points belonging to the maximum corresponds to the color of the icon. The color follows a given color table ranging from dark colors for large maxima to bright colors for small maxima. The measure of how well a maximum is separated from the others corresponds to the shape of the icon, and the degree of separation varies from sharp spikes for well-separated maxima to blunt spikes for badly separated maxima. The color- and curve-based point density displays present the density of the data and allow a better understanding of the data distribution, which is crucial for an effective partitioning of the data. The visualizations are used to decide which dimensions are used for the partitioning. In addition, the partitioning can be specified interactively
Figure 43.10 HD-Eye screenshot [14] showing different visualizations of projections and the separator tree. Clockwise from the top: separator tree, iconic representation of 1D projections, 1D projection histogram, 1D color-based density plots, iconic representation of multidimensional projections, and color-based 2D density plot (the example visualization shows a large molecular biology dataset). © IEEE
directly within the visualizations, allowing the user to define nonlinear partitionings.
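The icon properties described above are driven by 1D point-density histograms of the projected data. The sketch below illustrates that ingredient only: it takes a 1D projection, builds a histogram, finds the local maxima, and scores how well the two strongest maxima are separated. The scoring formula and the data are invented for illustration and are not HD-Eye's actual separation measure.

```python
# Histogram-based view of a 1D projection: locate density maxima and compute a
# crude separation score (depth of the valley between the two largest maxima).
# Data and the separation measure are illustrative only.
import random

random.seed(0)
# Two synthetic 1D clusters standing in for a projection of some dataset.
points = [random.gauss(-2.0, 0.5) for _ in range(300)] + \
         [random.gauss(3.0, 0.7) for _ in range(200)]

def histogram(values, n_bins=30):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

def local_maxima(counts):
    return [i for i in range(1, len(counts) - 1)
            if counts[i] >= counts[i - 1] and counts[i] > counts[i + 1]]

counts = histogram(points)
maxima = sorted(local_maxima(counts), key=lambda i: counts[i], reverse=True)

if len(maxima) >= 2:
    a, b = sorted(maxima[:2])                       # the two strongest maxima
    valley = min(counts[a:b + 1])                   # lowest density between them
    separation = 1.0 - valley / min(counts[a], counts[b])
else:
    separation = 0.0                                # a single maximum: nothing to separate

print("histogram:", counts)
print("maxima at bins:", maxima[:2], "separation score: %.2f" % separation)
```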
43.6 Text

With the growing importance of electronic media for storing and exchanging text documents, there is also a growing interest in tools that can help us find and sort information included in the text documents. Text documents are semistructured data, in that they are neither completely unstructured nor completely structured. For example, a document may contain some structured fields, such as title, authors, publication date, length, and category, as well as largely unstructured text components, such as abstract and content. Text mining is the process of finding patterns in text databases and may be defined as the process of analyzing text to extract information from it. Text mining recognizes that complete understanding of natural-language text, a long-standing goal of computer science, is not immediately attainable and focuses on extracting a small amount of information from text with high reliability. The goals of the text-mining process are automatic document clustering/categorization, assignment of keywords to text documents, topic identification and tracking in ordered (time) sequences of text documents, searching documents based on content categories and not only keywords, generation and analysis of user profiles based on the usage of text databases, and other related problems. A wide range of automatic text-mining algorithms have been proposed in the literature over the last few decades [10,11].

An interesting visual data-mining approach is ThemeRiver [13]. The ThemeRiver visualization depicts thematic variations over time within a large collection of documents. The thematic changes are shown in the context of a timeline and corresponding external events. The document collection's timeline, selected thematic content, and thematic strength are indicated by the river's directed flow, composition, and changing width, respectively. The directed flow from left to right is interpreted as movement through time, and the horizontal distance between two points on the river defines a time
Figure 43.11 ThemeRiver [13]: visualization of thematic changes in documents (the example visualization shows Castro data from November 1959 through June 1961). © IEEE
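A ThemeRiver-style view can be approximated with a stacked stream chart: each theme contributes a band whose width at a given time step reflects the number of documents mentioning that theme. The sketch below uses matplotlib's stackplot with invented theme names and counts; it only approximates the river metaphor and is not the ThemeRiver system [13].

```python
# ThemeRiver-like stacked stream: band width per theme ~ documents per time step.
# Theme names and counts are invented for illustration.
import matplotlib.pyplot as plt

months = list(range(12))                     # a one-year timeline
theme_counts = {
    "oil":       [2, 3, 5, 9, 14, 10, 6, 4, 3, 2, 2, 1],
    "trade":     [4, 4, 3, 3,  4,  6, 9, 8, 5, 4, 3, 3],
    "elections": [1, 1, 1, 2,  2,  3, 4, 7, 12, 9, 5, 2],
}

fig, ax = plt.subplots()
# baseline="sym" centers the stack around zero, giving the river-like silhouette.
ax.stackplot(months, *theme_counts.values(),
             labels=theme_counts.keys(), baseline="sym", alpha=0.8)
ax.set_xlabel("time (month)")
ax.set_ylabel("documents per theme")
ax.legend(loc="upper left")
ax.set_title("ThemeRiver-style view of thematic strength over time")
plt.show()
```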
6. H. H. Bock. Automatic Classification. Vandenhoeck and Ruprecht, Göttingen, 1974.
7. L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA, 1984.
8. S. Card, J. Mackinlay, and B. Shneiderman. Readings in Information Visualization. Morgan Kaufmann, 1999.
9. H. Chernoff. The use of faces to represent points in k-dimensional space graphically. Journal of the American Statistical Association, 68:361-368, 1973.
10. J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2001.
11. D. J. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press, 2001.
12. M. Hao, M. Hsu, U. Dayal, S. F. Wei, T. Sprenger, and T. Holenstein. Market basket analysis visualization on a spherical surface. Visual Data Exploration and Analysis Conference, San Jose, CA, 2001.
13. S. Havre, B. Hetzler, L. Nowell, and P. Whitney. ThemeRiver: Visualizing thematic changes in large document collections. Transactions on Visualization and Computer Graphics, 2001.
14. A. Hinneburg, D. Keim, and M. Wawryniuk. HD-Eye: Visual mining of high-dimensional data. IEEE Computer Graphics and Applications, 19(5), 1999.
15. H. Hofmann, A. Siebes, and A. Wilhelm. Visualizing association rules with interactive mosaic plots. SIGKDD Int. Conf. on Knowledge Discovery & Data Mining (KDD 2000), Boston, MA, 2000.
16. D. T. Inc. DBMiner. http://www.dbminer.com, 2001.
17. S. G. Inc. MineSet. http://www.sgi.com/software/mineset, 2001.
18. A. Inselberg and B. Dimsdale. Parallel coordinates: A tool for visualizing multi-dimensional geometry. In Proc. Visualization '90, San Francisco, CA, pages 361-370, 1990.
19. B. Johnson and B. Shneiderman. Treemaps: A space-filling approach to the visualization of hierarchical information. In Proc. Visualization '91 Conf., pages 284-291, 1991.
20. D. Keim. Visual exploration of large databases. Communications of the ACM, 44(8):38-44, 2001.
21. D. Keim and M. Ward. Visual Data Mining Techniques. Book chapter in: Intelligent Data Analysis, an Introduction, by D. Hand and M. Berthold. Springer Verlag, 2nd edition, 2002.
22. M. Mehta, R. Agrawal, and J. Rissanen. SLIQ: A fast scalable classifier for data mining. Conf. on Extending Database Technology (EDBT), Avignon, France, 1996.
23. R. M. Pickett. Visual Analyses of Texture in the Detection and Recognition of Objects. Academic Press, New York, 1970.
24. R. M. Pickett and G. G. Grinstein. Iconographic displays for visualizing multidimensional data. In Proc. IEEE Conf. on Systems, Man and Cybernetics, IEEE Press, Piscataway, NJ, pages 514-519, 1988.
25. J. R. Quinlan. Induction of decision trees. Machine Learning, pages 81-106, 1986.
26. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, Los Altos, CA, 1993.
27. R. M. Rohrer, J. L. Sibert, and D. S. Ebert. A shape-based visual interface for text retrieval. IEEE Computer Graphics and Applications, 19(5):40-47, 1999.
28. H. Schumann and W. Müller. Visualisierung: Grundlagen und allgemeine Methoden. Springer, 2000.
29. D. W. Scott. Multivariate Density Estimation. Wiley and Sons, 1992.
30. J. Shafer, R. Agrawal, and M. Mehta. SPRINT: A scalable parallel classifier for data mining. Conf. on Very Large Databases, 1996.
31. B. Shneiderman. Tree visualization with tree-maps: A 2D space-filling approach. ACM Transactions on Graphics, 11(1):92-99, 1992.
32. B. Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In Visual Languages, 1996.
33. B. Spence. Information Visualization. Pearson Education Higher Education publishers, UK, 2000.
34. P. E. Utgoff. Incremental induction of decision trees. Machine Learning, 4:161-186, 1989.
35. P. E. Utgoff, N. C. Berkman, and J. A. Clouse. Decision tree induction based on efficient tree restructuring. Machine Learning, 29:5-44, 1997.
36. J. Walter and H. Ritter. On interactive visualization of high-dimensional data using the hyperbolic plane. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 123-131, 2002.
37. M. O. Ward. XmdvTool: Integrating multiple methods for visualizing multivariate data. In Proc. Visualization '94, Washington, DC, pages 326-336, 1994.
38. C. Ware. Information Visualization: Perception for Design. Morgan Kaufmann, 2000.
39. L. Yan. Interactive exploration of very large relational data sets through 3D dynamic projections. SIGKDD Int. Conf. on Knowledge Discovery & Data Mining (KDD 2000), Boston, MA, 2000.