
A ROUGH CLUSTER ANALYSIS OF SHOPPING ORIENTATION DATA

Kevin Voges, University of Canterbury
Nigel Pope, Griffith University
Mark Brown, University of Queensland

Track: Marketing Research and Research Methodologies

Abstract

This paper describes the application of a new technique, rough clustering, to the problem of market segmentation. Rough clustering produces different solutions from k-means analysis because of the possibility of multiple cluster membership of objects. Traditional clustering methods generate extensional descriptions of groups, which show which objects are members of each cluster. Clustering techniques based on rough sets theory generate intensional descriptions, which outline the main characteristics of each cluster. In this study, a rough cluster analysis was conducted on a sample of 437 responses from a larger study of the relationship between shopping orientation (the general predisposition of consumers toward the act of shopping) and intention to purchase products via the Internet. The cluster analysis was based on five measures of shopping orientation: enjoyment, personalization, convenience, loyalty, and price. The rough clusters obtained provide interpretations of the different shopping orientations present in the data without the restriction of attempting to fit each object into only one segment. Such descriptions can be an aid to marketers attempting to identify potential segments of consumers.

Introduction

Cluster analysis is an important technique in market research. The accepted purpose of this technique is to group individuals or objects into clusters so that objects in the same cluster are more similar to one another than they are to objects in other clusters (Hair, Anderson, Tatham & Black 1998, p. 470). Comprehensive reviews of the technique can be found in Punj and Stewart (1983) and Arabie and Hubert (1994). A common method of cluster analysis is the k-means approach, where data points are randomly selected as initial seeds or centroids, and the remaining data points are assigned to the closest centroid on the basis of the distance between them (MacQueen 1967). The aim is to obtain maximal homogeneity within subgroups or clusters, and maximal heterogeneity between clusters.
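The k-means procedure just described can be sketched in a few lines. The following is a minimal illustration for reference only; the function name and the use of NumPy are choices made for this sketch, not part of the paper.

```python
import numpy as np

def k_means(data, k, n_iter=100, seed=0):
    """Minimal k-means: random initial seeds, assign to the closest centroid."""
    rng = np.random.default_rng(seed)
    # Randomly select k data points as the initial seeds (centroids).
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to the closest centroid (Euclidean distance).
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids
```

Note that each object receives exactly one label; relaxing this restriction is precisely what distinguishes the rough clustering approach developed below.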

K-means cluster analysis is less affected by data idiosyncrasies than hierarchical clustering techniques (Punj & Stewart 1983), but the approach still suffers from many of the problems associated with traditional statistical analysis methods. These methods were developed for use with variables that are normally distributed and that have an equal variance-covariance matrix in all groups. In most realistic marketing data sets, neither of these conditions necessarily holds. Over the last few years a number of new techniques have been developed that make few assumptions about the statistical characteristics of the data being analysed (Hruschka 1986, 1993; Kowalczyk & Piasta 1998; Matsatsinis, Hatzis & Samaras 1998; Voges 1997; Voges & Pope 2000).


This paper describes the application of one of these techniques, rough clustering, to a segmentation analysis of on-line shopping orientations.

Rough Clustering

Rough clustering (do Prado, Engel & Filho 2002; Voges, Pope & Brown 2002) is an extension of the theory of rough or approximation sets, introduced by Pawlak (1982, 1991). Rough sets theory is based on the assumption that information is associated with every record in the data matrix (in rough sets terminology, every object of the information system). This information is expressed by means of variables (in rough sets terminology, attributes) that serve as descriptions of the objects. None of the traditional assumptions of multivariate analysis are relevant, as the data are treated from the perspective of set theory (that is, as object descriptors rather than variables with statistical properties). For introductions to the theory of rough sets, see Pawlak (1991), Lin and Cercone (1997), or Munakata (1998).

The complete information system expresses all the knowledge available about the objects being studied. More formally, the information system is a pair, S = (U, A), where U is a non-empty finite set of objects called the universe and A = {a₁, ..., aⱼ} is a non-empty finite set of attributes on U. With every attribute a ∈ A we associate a set Vₐ such that a: U → Vₐ. The set Vₐ is called the domain or value set of a (eg. the attribute gender would have a value set of {male, female}).

The initial detailed data contained in the information system is used as the basis for the development of subsets of the data that are coarser or rougher than the original set. As with any data analysis technique, detail is lost, but the removal of detail is controlled to uncover the underlying characteristics of the data. The technique works by "lowering the degree of precision in data, based on a rigorous mathematical theory. By selecting the right roughness or precision of data, we will find the underlying characteristics" (Munakata 1998, p. 141).

A core concept of rough sets is that of equivalence between objects (called indiscernibility). Objects in the information system about which we have the same knowledge form an equivalence relation. These equivalence relations divide the universe into partitions, which can then be used to build new subsets. Let S = (U, A) be an information system, and let B ⊆ A and X ⊆ U. We can describe the subset X using only the information contained in the attribute values from the subset B by constructing two subsets, referred to as the B-lower and B-upper approximations of X, and denoted as B_*(X) and B^*(X), respectively, where:

B_*(X) = { x | [x]_B ⊆ X }   and   B^*(X) = { x | [x]_B ∩ X ≠ ∅ }

and [x]_B denotes the equivalence class of x under the indiscernibility relation defined by B. This definition of a rough set in terms of two other sets is the simple but powerful insight contributed by Pawlak, and has led to numerous publications exploring its implications. A third set is also useful in analysis: the boundary region, which is the difference between the upper approximation and the lower approximation.
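As a minimal illustration of these definitions, the sketch below builds the equivalence classes and the two approximations for a toy information system. The data, function names, and dict-based representation are assumptions made for the example, not material from the paper.

```python
def indiscernibility_classes(objects, B):
    """Partition the universe into equivalence classes [x]_B: two objects
    are indiscernible if they agree on every attribute in B."""
    classes = {}
    for obj, attrs in objects.items():
        key = tuple(attrs[a] for a in B)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())

def lower_approx(objects, B, X):
    """B_*(X): union of the equivalence classes wholly contained in X."""
    return set().union(*[c for c in indiscernibility_classes(objects, B) if c <= X])

def upper_approx(objects, B, X):
    """B^*(X): union of the equivalence classes that intersect X."""
    return set().union(*[c for c in indiscernibility_classes(objects, B) if c & X])

# Toy information system: four objects described by two attributes.
U = {
    "o1": {"gender": "male",   "buys_online": "yes"},
    "o2": {"gender": "male",   "buys_online": "yes"},
    "o3": {"gender": "female", "buys_online": "no"},
    "o4": {"gender": "female", "buys_online": "yes"},
}
X = {"o1", "o4"}   # the subset of interest
B = ["gender"]     # describe X using gender alone
print(lower_approx(U, B, X))  # set(): no gender class lies entirely inside X
print(upper_approx(U, B, X))  # {'o1','o2','o3','o4'}: both classes touch X
print(upper_approx(U, B, X) - lower_approx(U, B, X))  # the boundary region
```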


Rough clustering is a simple extension of the notion of rough sets, involving two additional requirements: an ordered value set of attributes and a distance measure (Voges, Pope & Brown 2002). The value set is ordered to allow a meaningful distance measure, and clusters of objects are formed on the basis of their distance from each other, in a manner similar to agglomerative hierarchical clustering (Hair et al. 1998). However, an object can belong to more than one cluster. Clusters can then be defined by a lower approximation (objects exclusive to that cluster) and an upper approximation (all objects in the cluster, including those that are also members of other clusters), in a similar manner to rough sets. An introduction to rough clustering, and a detailed comparison between rough clustering and k-means clustering, can be found in Voges, Pope and Brown (2002).

For any pair of objects, p and q, the distance between the objects is defined as:

D(p, q) = Σ_{1 ≤ j ≤ N} | R(p, j) − R(q, j) |

where R(p, j) is the value of attribute j for object p. That is, the absolute differences between the values of each object pair's attributes are summed (the Euclidean distance is not appropriate, given that the attributes are not necessarily metric). The distance measure ranges from 0 (indicating indiscernible objects) to a maximum determined by the number of attributes and the size of the value set for each attribute. In the study discussed below, there are five attributes ranging from 1 to 7, so the maximum possible distance between any two objects is 30. As will be seen later, considering small distances of up to 5 is sufficient to form viable clusters.

The algorithm for producing rough clusters is quite straightforward, and is described as follows (Voges, Pope & Brown 2002); a sketch of the procedure in code is given after the numbered steps. Initially, a distance matrix for all object pairs is calculated. All object pairs at interobject distance D, where D steps from 0 to a determined maximum, are then identified. Each object pair (aᵢ, aⱼ) can be in one of three situations in relation to current cluster membership, with the following consequences:

1. Neither object has been assigned to any prior cluster. A new cluster is started with aᵢ and aⱼ as the first members.

2. Both objects are currently assigned to clusters. Object aᵢ is assigned to object aⱼ's earliest cluster, and object aⱼ is assigned to object aᵢ's earliest cluster. The earliest cluster is the first cluster to which the object was assigned.

3. One object, aᵢ, is assigned to a cluster and the other object, aⱼ, is not. Object aⱼ is assigned to object aᵢ's earliest cluster.
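A minimal sketch of this procedure, under the description above, might look as follows; the function names, the tuple-based data representation, and the max_d parameter are illustrative assumptions rather than the authors' implementation.

```python
from itertools import combinations

def distance(p, q):
    """Rough clustering distance: sum of absolute attribute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def rough_cluster(data, max_d):
    """Form rough clusters from a list of attribute tuples.
    Returns a list of clusters (sets of object indices); an object may
    appear in more than one cluster."""
    # Distance for every object pair, visited in order of increasing D.
    pairs = [(distance(data[i], data[j]), i, j)
             for i, j in combinations(range(len(data)), 2)]
    clusters = []   # each cluster is a set of object indices
    earliest = {}   # object index -> its earliest (first) cluster
    for d, i, j in sorted(pairs):
        if d > max_d:
            break
        if i not in earliest and j not in earliest:
            # Case 1: neither object is in any cluster -- start a new one.
            clusters.append({i, j})
            earliest[i] = earliest[j] = len(clusters) - 1
        elif i in earliest and j in earliest:
            # Case 2: both assigned -- each joins the other's earliest cluster.
            clusters[earliest[j]].add(i)
            clusters[earliest[i]].add(j)
        else:
            # Case 3: exactly one assigned -- the unassigned object joins
            # its partner's earliest cluster.
            a, b = (i, j) if i in earliest else (j, i)
            clusters[earliest[a]].add(b)
            earliest[b] = earliest[a]
    return clusters
```

The approximations then follow directly from the output: the upper approximation of a cluster is its full member set, and its lower approximation is the subset of members that belong to no other cluster in the solution.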


An application of the rough clustering technique to on-line shopping orientation data is reported in the following section.

Application

A rough cluster analysis was conducted on a sample of 437 responses from a larger study of the relationship between shopping orientation, perceived risk and intention to purchase products via the Internet (Brown 1999; Brown, Pope & Voges 2003). The rough cluster analysis was based on five measures of shopping orientation: enjoyment, personalization, convenience, loyalty, and price. All measures were constructed as multi-item Likert-type scales with responses ranging from "strongly disagree" to "strongly agree". As rough clustering requires ordered discrete data, the multi-item scores were mapped onto an ordered attribute with a range of seven, with each value of the attribute representing 14 to 15 percent of the data set. This process was found to provide the most comprehensible rough cluster solution. Discretization is the process by which variable values are combined to form a new (smaller) range of values. How best to do this is an ongoing issue in many of these new approaches to data analysis (Komorowski, Pawlak, Polkowski & Skowron 1999).

Results and Conclusion

Rough clustering produces more clusters than standard cluster analysis (Voges, Pope & Brown 2002). The number of clusters required to describe the data is a function of the interobject distance (D). In addition, the lower approximation of each cluster depends on the number of clusters selected for the solution: the more clusters in the solution, the higher the chance that an object belongs to more than one of them, which moves the object from the lower approximation to the boundary region and reduces the size of the lower approximation.

A number of factors need to be considered when determining the best maximum value for D and the best number of clusters to include in the solution. A solution with too few clusters will not provide a useful interpretation of the partitioning of the data. On the other hand, too many clusters will make interpretation difficult. In addition, the degree of overlap between the clusters needs to be minimised to ensure that each cluster identified provides additional information to aid with interpretation. One way to achieve this is to maximise the sum of the lower approximations of the clusters included in the solution. Determining a good cluster solution requires a trade-off between these factors.

Table 1: Size of Upper and Lower Approximations for Values of Maximum D From 2 to 6

[Table 1 layout: one row per cluster k and, for each maximum D from 2 to 6, three columns: the size of the cluster's upper approximation |B^*(k)|, the size of its lower approximation |B_*(k)| in the 12-cluster solution, and |B_*(k)| in the 6-cluster solution, with the sums of the lower approximations given at the bottom of the table; see the source below for the full figures.]

(Source: Voges, Pope & Brown 2002, p. 218)

Table 1 is an example of this trade-off between the number of clusters and cluster overlap, showing two representative situations, a 12-cluster solution and a 6-cluster solution (the full analysis obtains all solutions between these values, but only these two are shown for clarity). The table shows the number of objects in each cluster as the maximum D is progressively increased from 2 to 6.
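The quantities tabulated in Table 1 can be computed directly from the output of a procedure like the rough_cluster sketch above. The helpers below, including their names, are illustrative assumptions rather than code from the paper.

```python
from collections import Counter

def approximation_sizes(clusters):
    """For each cluster k in a solution, return (|B^*(k)|, |B_*(k)|): the
    upper approximation is the full member set; the lower approximation is
    the subset of members appearing in no other cluster of the solution."""
    membership = Counter(obj for c in clusters for obj in c)
    return [(len(c), sum(1 for obj in c if membership[obj] == 1))
            for c in clusters]

def sum_of_lower(clusters, n_clusters):
    """Selection criterion: sum of the lower approximations over the
    first n_clusters clusters of the solution."""
    sizes = approximation_sizes(clusters[:n_clusters])
    return sum(lower for _upper, lower in sizes)
```

Scanning max_d and n_clusters with sum_of_lower yields the kind of comparison summarised in Table 1.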


The rough clustering algorithm as described above is run several times, with the value of maximum D increased each time. For each maximum value of D shown in the table, the first column (|B^*(k)|) shows the size of the upper approximation of each cluster (ie. the number of objects in the cluster, including those objects that could be in other clusters). The next two columns (|B_*(k)|) show the sizes of the lower approximations (ie. the number of objects unique to each cluster): the second column shows |B_*(k)| when 12 clusters are considered, and the third shows |B_*(k)| when 6 clusters are considered. As the maximum D is increased, more objects are assigned to more clusters (if this process were continued, all objects would eventually be assigned to every cluster). As Table 1 shows, a maximum value of D of 6 produces several large clusters with very few unique members (the lower approximation, |B_*(k)|, is quite small, zero in some cases).

For the first two values of maximum D, the sum of the lower approximations (shown at the bottom of the table) is larger for the 12-cluster solution than for the 6-cluster solution. Further investigation showed that for these two values of maximum D, the sum of the lower approximations continues to increase as the number of clusters is increased. This shows that for these two solutions the algorithm is mainly performing Step 1 (ie. as new objects are incorporated into the solution, they are assigned to new clusters). For a maximum D value of 4, this pattern reverses: the sum of the lower approximations for the first six clusters is larger than the sum for the first twelve clusters, and the sum of the lower approximations then decreases as the value of maximum D increases. This shows that for maximum values of D of 4 and above, the rough clustering algorithm is starting to assign objects on the basis of Steps 2 and 3 (ie. as new objects are incorporated into the solution, they are assigned to existing clusters). This suggests that the best maximum value for D is at least 4.

For the next stage of the analysis, a nine-cluster solution with an interobject distance of five was obtained. These values were chosen as they produced a solution containing 90% of the data set, with a sum of the lower approximations of 108 (ie. 108 uniquely assigned objects; see Voges, Pope & Brown 2002).

Table 2: Cluster Means and Standard Deviations for Shopping Orientations (D = 5)

Cluster   Enjoyment      Loyalty        Price          Convenience    Personalizing
          Mean (sd)      Mean (sd)      Mean (sd)      Mean (sd)      Mean (sd)
1         -0.22 (1.70)    1.48 (1.39)   -0.64 (1.77)    0.23 (1.77)    0.58 (1.67)
2         -0.01 (1.98)   -0.69 (1.64)    1.03 (1.58)   -1.84 (1.19)   -0.33 (1.73)
3         -0.05 (1.93)    1.07 (1.43)   -0.71 (1.52)   -1.13 (1.32)    0.54 (1.57)
4          0.21 (1.69)    0.93 (1.35)   -0.91 (1.48)    0.38 (1.64)   -0.17 (1.52)
5          1.26 (1.43)    0.95 (1.46)   -0.06 (1.43)    0.93 (1.74)    0.07 (1.99)
6         -0.26 (1.48)   -1.28 (1.45)    1.10 (1.63)    1.27 (1.41)   -0.68 (1.67)
7          0.92 (1.51)    0.15 (1.82)    1.60 (1.20)    0.84 (1.59)    1.56 (1.41)
8         -1.32 (1.47)   -0.01 (1.56)   -0.66 (1.80)   -0.39 (1.90)    1.50 (1.23)
9         -0.86 (1.50)    0.64 (1.54)   -0.82 (1.55)   -1.57 (1.25)    0.53 (1.67)

(Source: Voges, Pope & Brown 2002, p. 221)
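Centroids like those in Table 2 can be derived from the clusters returned by the earlier sketch. The paper does not state whether the means are taken over the upper or the lower approximation, so the sketch below, which uses the full upper approximation (all members, including shared objects), is one plausible reading rather than the authors' method.

```python
import numpy as np

def cluster_centroids(data, clusters):
    """Mean and sample standard deviation per attribute for each cluster,
    computed over the cluster's upper approximation (all members,
    including objects shared with other clusters)."""
    data = np.asarray(data, dtype=float)
    stats = []
    for members in clusters:
        rows = data[sorted(members)]
        stats.append((rows.mean(axis=0), rows.std(axis=0, ddof=1)))
    return stats
```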


Table 2 presents the cluster centroids for the rough cluster solution. The table shows that multiple shopping orientations are identifiable, with shoppers being differentiated by the degree to which they responded to statements regarding shopping orientation. Numbers above 0.90 suggest a strong positive orientation towards that particular aspect of shopping, while numbers below -0.90 suggest a strong negative orientation.

Cluster 1 shows a strong loyalty orientation. Cluster 2 can be interpreted as a shopper looking for the best price, who is prepared to sacrifice convenience to obtain that price. Cluster 3 shows loyalty at the expense of convenience, and Cluster 4 shows loyalty at the expense of price. Three of the clusters show complex combinations of orientations: Cluster 5 shows positive scores on enjoyment, loyalty and convenience; Cluster 6 shows a concern for price and convenience at the expense of loyalty; and Cluster 7 shows concern for enjoyment, price and personal shopping. Two of the clusters form combinations of orientations that are more difficult to interpret, or at best reflect a negative shopping orientation: Cluster 9 has a negative value for convenience, and Cluster 8 shows concern for personal shopping at the expense of enjoyment.

Rough clustering solutions are different from k-means solutions because of the possibility of multiple cluster membership of objects. Rough clustering provides a more flexible solution to the clustering problem, and can be conceptualized as extracting concepts from the data, rather than strictly delineated subgroupings (Pawlak 1991). Traditional clustering methods generate extensional descriptions of groups (ie. which objects are members of each cluster), whereas clustering techniques based on rough sets theory generate intensional descriptions (ie. the main characteristics of each cluster) (do Prado et al. 2002). These concepts provide interpretations of the different shopping orientations present in the data. Such concepts can be an aid to marketers attempting to describe potentially new segments of consumers.

References

Arabie, P & Hubert, L 1994, 'Cluster analysis in marketing research', in Advanced Methods of Marketing Research, ed. RP Bagozzi, Blackwell, Cambridge, MA.

Brown, MR 1999, 'Buying or browsing? An investigation of the relationship between shopping orientation, perceived risk, and intention to purchase products via the Internet', PhD thesis, Griffith University.

Brown, MR, Pope, NKLl & Voges, KE 2003, 'Buying or browsing? An exploration of shopping orientations and online purchase intentions', European Journal of Marketing, vol. 37, no. 11/12.

do Prado, HA, Engel, PM & Filho, HC 2002, 'Rough clustering: An alternative to find meaningful clusters by using the reducts from a dataset', in Rough Sets and Current Trends in Computing, Third International Conference, RSCTC 2002, eds JJ Alpigini, JF Peters, J Skowronek & N Zhong, Springer, Berlin.

Hair, JF, Anderson, RE, Tatham, RL & Black, WC 1998, Multivariate Data Analysis, 5th edn, Prentice-Hall International, London.

Hruschka, H 1986, 'Market definition and segmentation using fuzzy clustering methods', International Journal of Research in Marketing, vol. 3, pp. 117-134.

Hruschka, H 1993, 'Determining market response functions by neural network modeling: A comparison to econometric techniques', European Journal of Operational Research, vol. 66, pp. 27-35.


Komorowski, J, Pawlak, Z, Polkowski, L & Skowron, A 1999, 'Rough sets: A tutorial', in Rough-Fuzzy Hybridization: A New Trend in Decision Making, eds SK Pal & A Skowron, Springer, Singapore.

Kowalczyk, W & Piasta, Z 1998, 'Rough-set inspired approach to knowledge discovery in business databases', in Research and Development in Knowledge Discovery and Data Mining, eds X Wu, R Kotagiri & KB Korb, Springer, Berlin.

Lin, TY & Cercone, N (eds) 1997, Rough Sets and Data Mining: Analysis for Imprecise Data, Kluwer, Boston.

MacQueen, J 1967, 'Some methods for classification and analysis of multivariate observations', in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, eds LM Le Cam & J Neyman, University of California Press, Berkeley, CA.

Matsatsinis, NF, Hatzis, CN & Samaras, AP 1998, 'Identifying consumers' preferences using artificial neural network techniques', in Managing in Uncertainty: Theory and Practice, eds C Zopounidis & PM Pardalos, Kluwer, Dordrecht, Netherlands.

Munakata, T 1998, Fundamentals of the New Artificial Intelligence: Beyond Traditional Paradigms, Springer-Verlag, New York.

Pawlak, Z 1982, 'Rough sets', International Journal of Computer and Information Sciences, vol. 11, no. 5, pp. 341-356.

Pawlak, Z 1991, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer, Boston.

Punj, G & Stewart, DW 1983, 'Cluster analysis in marketing research: Review and suggestions for application', Journal of Marketing Research, vol. 20, May, pp. 134-148.

Voges, KE 1997, 'Using evolutionary algorithm techniques for the analysis of data in marketing', Cyber-Journal of Sport Marketing, vol. 1, no. 2, pp. 66-82.

Voges, KE & Pope, NKLl 2000, 'An overview of data mining techniques from an adaptive systems perspective', in Visionary Marketing for the Twenty-first Century: Facing the Challenge - Proceedings of ANZMAC 2000, ed. A O'Cass, Australian and New Zealand Marketing Academy, Brisbane, Australia.

Voges, KE, Pope, NKLl & Brown, MR 2002, 'Cluster analysis of marketing data examining on-line shopping orientation: A comparison of k-means and rough clustering approaches', in Heuristics and Optimization for Knowledge Discovery, eds HA Abbass, RA Sarker & CS Newton, Idea Group Publishing, Hershey, PA.
