
Comparing decision bound and exemplar models of categorization

1993, Perception & Psychophysics


Perception & Psychophysics
1993, 53 (1), 49-70

Comparing decision bound and exemplar models of categorization

W. TODD MADDOX and F. GREGORY ASHBY
University of California, Santa Barbara, California

The performance of a decision bound model of categorization (Ashby, 1992a; Ashby & Maddox, in press) is compared with the performance of two exemplar models. The first is the generalized context model (e.g., Nosofsky, 1986, 1992) and the second is a recently proposed deterministic exemplar model (Ashby & Maddox, in press), which contains the generalized context model as a special case. When the exemplars from each category were normally distributed and the optimal decision bound was linear, the deterministic exemplar model and the decision bound model provided roughly equivalent accounts of the data. When the optimal decision bound was nonlinear, the decision bound model provided a more accurate account of the data than did either exemplar model. When applied to categorization data collected by Nosofsky (1986, 1989), in which the category exemplars are not normally distributed, the decision bound model provided excellent accounts of the data, in many cases significantly outperforming the exemplar models. The decision bound model was found to be especially successful when (1) single subject analyses were performed, (2) each subject was given relatively extensive training, and (3) the subject's performance was characterized by complex suboptimalities. These results support the hypothesis that the decision bound is of fundamental importance in predicting asymptotic categorization performance and that the decision bound models provide a viable alternative to the currently popular exemplar models of categorization.

Decision bound models of categorization (Ashby, 1992a; Ashby & Maddox, in press) assume that the subject learns to assign responses to different regions of perceptual space.
When categorizing an object, the subject determines in which region the percept has fallen and then emits the associated response. The decision bound is the partition between competing response regions. In contrast, exemplar models assume that the subject computes the sum of the perceived similarities between the object to be categorized and every exemplar of each relevant category (Medin & Schaffer, 1978; Nosofsky, 1986). Categorization judgments are assumed to depend on the relative magnitude of these various sums.

This article compares the ability of decision bound and exemplar models to account for categorization response probabilities in seven different experiments. The aim is not to compare the performance of the decision bound model with all versions of exemplar theory; clearly this is beyond the scope of any single article. Rather, the goal is to compare the decision bound model with one of the most highly successful and widely tested versions of exemplar theory, namely, the generalized context model (GCM; Nosofsky, 1986, 1987, 1989; Nosofsky, Clark, & Shin, 1989; Shin & Nosofsky, 1992). In addition, a recently proposed deterministic exemplar model (DEM; Ashby & Maddox, in press), which contains the GCM as a special case, will also be applied to the data.

The GCM, which has been applied to a wide variety of categorization conditions (see Nosofsky, 1992, for a review), provides good quantitative accounts of the data, in many instances accounting for over 98% of the variance. There have been cases, however, in which the model fits were less satisfactory, accounting for less than 85% of the variance in the data (see Nosofsky, 1986, Table 4). Of the seven experiments, the first five involve categories in which the exemplars are normally distributed along each stimulus dimension and single subject analyses are performed.
To date, the GCM has been applied only to data sets in which the category exemplars are not normally distributed, and in only one of these cases (Nosofsky, 1986) have single subject analyses been performed. This article provides the first test of the GCM's ability to account for data from normally distributed categories. The final two data sets were reported by Nosofsky (1986, 1989) and involve experiments in which the category exemplars are not normally distributed and only a small number of exemplars were utilized. Three of the four categorization conditions are identical in the two studies, and the stimulus dimensions are the same. The main difference between the two studies is that single subject analyses were performed on the Nosofsky (1986) data, whereas the Nosofsky (1989) data were averaged across subjects.

The contribution of each author was equal. Parts of this research were presented at the 22nd Annual Mathematical Psychology Meetings at the University of California, Irvine; at the 23rd Annual Mathematical Psychology Meetings at the University of Toronto; and at the 24th Annual Mathematical Psychology Meetings at Indiana University. This research was supported in part by a UCSB Humanities/Social Sciences Research Grant to W.T.M. and by National Science Foundation Grant BNS8819403 to F.G.A. This work has benefited from discussion with Jerome Busemeyer and Richard Herrnstein. We would like to thank Lester Krueger, Robert Nosofsky, Steven Sloman, and John Tisak for their excellent comments on an earlier version of this article. We would also like to thank Cindy Castillo and Marisa Murphy for help in typing the manuscript and Christine Duvauchelle for help in editing it. Correspondence concerning this article should be addressed to F. G. Ashby, Department of Psychology, University of California, Santa Barbara, CA 93106. Copyright 1993 Psychonomic Society, Inc.
DECISION BOUND THEORY

Decision bound theory (also called general recognition theory) rests on one critical assumption, namely, that there is trial-by-trial variability in the perceptual information associated with every stimulus. In the case of threshold level stimuli, this assumption dates back to Fechner (1866) and was exploited fully in signal detection theory (Green & Swets, 1966). However, with the high-contrast stimuli used in most categorization experiments, the assumption might appear more controversial. There are at least two reasons, however, why variability in the percept is expected even in this case. First, it is well known that the number of photons striking each rod or cone in the retina during presentation of a visual stimulus has trial-by-trial variability. In fact, the number has a Poisson probability distribution (Barlow & Mollon, 1982), and one characteristic of the Poisson distribution is that its mean equals its variance. Thus, more intense stimuli are associated with greater trial-by-trial variability at the receptor level. Specifically, the standard deviation of the number of photons striking each rod or cone is proportional to the square root of the mean stimulus intensity. Second, the visual system is characterized by high levels of spontaneous activity. For example, ganglion cells in the optic nerve often have spontaneous firing rates as high as 100 spikes per second.

If this argument is accepted, then a second question to be asked is whether such variability is likely to affect the outcome of categorization judgments. For example, when one is classifying pieces of fruit such as apples or oranges, trial-by-trial variability in perceived color (i.e., in hue) is unlikely to lead to a categorization error. Even if perceptual variability does not affect the outcome of a categorization judgment, however, the existence of such variability has profound effects on the nature of the decision process.
For example, in the presence of perceptual variability, the decision problem in a categorization task is identical to the decision problem in an identification task. In both cases, the subject must learn the many different percepts that are associated with each response. As a consequence, a theory of identification that postulates perceptual variability needs no extra structure to account for categorization data. In contrast, the most widely known versions of exemplar theory, including the context model (Medin & Schaffer, 1978) and the generalized context model (Nosofsky, 1986), postulate a set of decision processes that are unique to the categorization task.

To formalize this discussion, consider an identification task with two stimuli, A and B, which differ on two physical dimensions. In this case, stimulus $i$ ($i$ = A or B) can be described by the vector

$$\mathbf{v}_i = \begin{bmatrix} v_{1i} \\ v_{2i} \end{bmatrix},$$

where $v_{1i}$ and $v_{2i}$ are the values of stimulus $i$ on physical dimensions 1 and 2, respectively. Decision bound theory assumes that the subject misperceives the true stimulus coordinates $\mathbf{v}_i$ because of trial-by-trial variability in the percept (i.e., perceptual noise) and because the mapping from the physical space to the perceptual space may be nonlinear (e.g., because of response compression during sensory transduction; see Note 1). Denote the subject's mean percept of stimulus $i$ by $\mathbf{x}_i$. A natural assumption would be that $\mathbf{x}_i$ is related to $\mathbf{v}_i$ via a power or log transformation. In either case, the subject's percept of stimulus $i$ is represented by

$$\mathbf{x}_{pi} = \mathbf{x}_i + \mathbf{e}_p, \tag{1}$$

where $\mathbf{e}_p$ is a random vector that represents the effects of perceptual noise (see Note 2).

Given the Equation 1 model of the perceptual representation, the next step in building a theory of either identification or categorization is to postulate a set of appropriate decision processes. The ability to identify or categorize accurately is of fundamental importance to the survival of every biological organism. Plants must be categorized as edible or poisonous.
Faces must be categorized as friend or foe. In fact, every adult has had a massive amount of experience identifying and categorizing objects and events. In addition, much anecdotal evidence testifies to the expertise of humans at categorization and identification. For example, despite years of effort by the artificial intelligence community, humans are far better at categorizing speech sounds or handwritten characters than are the most sophisticated machines.

These facts suggest that a promising method for developing a theory of human decision processes in identification or categorization is to study the decision processes of the optimal classifier, that is, of the device that maximizes identification or categorization accuracy. The optimal classifier has a number of advantages over any biological organism, so humans should not be expected to respond optimally. Nevertheless, one might expect humans to use the same strategy as the optimal classifier does, even if they do not apply that strategy with as much success.

The optimal classifier was first studied by R. A. Fisher more than 50 years ago (Fisher, 1936), and today its behavior is well understood (see, e.g., Fukunaga, 1990; Morrison, 1990). Consider an identification or categorization experiment with response alternatives A and B. Because of the Poisson nature of light, even the optimal classifier must deal with trial-by-trial variability in the stimulus information. Suppose the stimulus values recorded by the optimal classifier, on trials when stimulus $i$ is presented, are denoted by $\mathbf{w}_i$. Then the optimal classifier constructs a discriminant function $h_0(\mathbf{w}_i)$ and responds A or B according to the rule:

$$\text{if } h_0(\mathbf{w}_i) \begin{cases} < 0, & \text{then respond A} \\ = 0, & \text{then guess} \\ > 0, & \text{then respond B.} \end{cases} \tag{2}$$

The discriminant function partitions the perceptual space into two response regions. In one region [where $h_0(\mathbf{w}_i) < 0$], response A is given. In the other region [where $h_0(\mathbf{w}_i) > 0$], response B is given.
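The Equation 2 rule can be made concrete with a small sketch. The article does not write out $h_0$ at this point, so the code assumes one standard choice: the log likelihood ratio of two bivariate normal distributions with uncorrelated dimensions and equal base rates. The function names and all numerical values are hypothetical and chosen only for illustration.

```python
import math

def log_normal_pdf_2d(w, mean, sd):
    """Log density of a bivariate normal with uncorrelated dimensions."""
    total = 0.0
    for value, m, s in zip(w, mean, sd):
        z = (value - m) / s
        total += -0.5 * z * z - math.log(s) - 0.5 * math.log(2.0 * math.pi)
    return total

def discriminant(w, mean_a, sd_a, mean_b, sd_b):
    """Assumed h_0(w): log likelihood ratio, negative where A is more likely."""
    return log_normal_pdf_2d(w, mean_b, sd_b) - log_normal_pdf_2d(w, mean_a, sd_a)

def respond(h_value):
    """Equation 2: A if h < 0, B if h > 0, guess exactly on the bound."""
    if h_value < 0:
        return "A"
    if h_value > 0:
        return "B"
    return "guess"

# Hypothetical stimulus distributions (not taken from the article).
MEAN_A, SD_A = (0.0, 0.0), (1.0, 1.0)
MEAN_B, SD_B = (2.0, 0.0), (1.0, 1.0)

if __name__ == "__main__":
    for w in [(0.5, 0.3), (1.8, 0.0)]:
        print(w, respond(discriminant(w, MEAN_A, SD_A, MEAN_B, SD_B)))
```

With the equal variances used here, the quadratic terms of the two log densities cancel and the set of points where the discriminant equals zero is the straight line $w_1 = 1$, midway between the two hypothetical means. Giving the two distributions different standard deviations would leave quadratic terms in the discriminant.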
The partition between the two response regions [where $h_0(\mathbf{w}_i) = 0$] is the decision bound. The position and shape of the decision bound depend on the stimulus ensemble and on the exact distribution of noise associated with each stimulus. If the noise is normally distributed and the task involves only two stimuli, the optimal bound is always either a line or a quadratic curve (see Note 3).

Although identification and categorization are treated equivalently by the optimal classifier, in practice these two tasks provide insights about different components of processing. Identification experiments are good for studying the distribution of perceptual noise (Ashby & Lee, 1991). This is because, with confusable stimuli, small changes in the perceptual distributions can have large effects on identification accuracy (Ennis & Ashby, in press). In contrast, in most categorization experiments, overall accuracy is relatively unaffected by small changes in the perceptual distributions. Most categorization errors occur because the subject incorrectly learned the rule for assigning stimuli to the relevant categories. Thus, categorization experiments are good for testing the hypothesis that subjects use decision bounds.

In many categorization experiments, each category contains only a small number of exemplars (typically fewer than seven). Such a design is not the best for investigating the possibility that subjects use decision bounds, because with so few stimuli, many different bounds will typically yield identical accuracy scores. Ideally, each category would contain an unlimited number of exemplars and competing categories would overlap. In such a case, any change in the decision bound would lead to a change in accuracy. Ashby and his colleagues (Ashby & Gott, 1988; Ashby & Maddox, 1990, 1992) have reported the results of a number of categorization experiments in which the exemplars in each category had values on each stimulus dimension that were normally distributed.
All experiments involved two categories and stimuli that varied on two dimensions. Representative stimuli are shown in Figure 1. For example, in an experiment with the circular stimuli, on each trial a random sample is drawn from either the Category A or the Category B (bivariate) normal distribution. This specifies an ordered pair $(v_{1i}, v_{2i})$. A circle is then presented with diameter $v_{1i}$ and with a radial line of orientation $v_{2i}$. The subject's task is to determine whether the stimulus is a member of Category A or Category B. Feedback is given on each trial. In each experiment, Ashby and his colleagues tested whether each subject's A and B responses were best separated by the optimal bound (see Note 4) or by a bound predicted by one of several popular categorization models (e.g., prototype, independent decisions, and bilinear models).

Figure 1. Representative stimuli from (a) Condition R (rectangular stimuli) and (b) Condition C (circular stimuli) of Application 1: Data Sets 1-5.

Although subjects did not respond optimally, the best predictor of the categorization performance of experienced subjects, across all the experiments, was the optimal classifier. When subjects responded differently than the optimal classifier, the bound that best described their performance was always of the same type as the optimal bound. Specifically, when the optimal bound was a quadratic curve, the best-fitting bound was a quadratic curve, and when the optimal bound was linear, the best-fitting bound appeared to be linear.

These facts suggest that the notion of a decision bound may have some fundamental importance in human categorization. That is, rather than compute similarity to the category prototypes, or add the similarity to all category exemplars, perhaps subjects behave as the optimal classifier and refer directly to some decision bound. According to this view, experienced categorization (i.e., after the decision bound is learned) is essentially automatic.
When faced with a stimulus to categorize, the subject determines the region in which the stimulus representation has fallen and then emits the associated response. Exemplar information is not needed; only a response label is retrieved. This does not mean that exemplar information is unavailable, however, because presumably exemplar information is used to construct the decision bound.

Although the data provide tentative support for the notion of a decision bound, the data also suggest that subjects do not respond optimally. A theory of human categorization must account for the fundamental suboptimalities of human perceptual and cognitive processes. Decision bound theory assumes that subjects attempt to respond optimally but fail because, unlike the optimal classifier, humans (1) do not know the locations of every exemplar within each category, (2) do not know the true parameters of the perceptual noise distributions, (3) sometimes misremember or misconstruct the decision boundary, and (4) sometimes have an irrational bias against a response alternative.

The first two forms of suboptimality cause the subject to use a decision bound that is different from the optimal bound. Specifically, decision bound theory assumes that rather than using the optimal discriminant function $h_0(\mathbf{x}_{pi})$ in the Equation 2 decision rule, the subject uses a suboptimal function $h(\mathbf{x}_{pi})$ that is of the same functional form as the optimal function (i.e., a quadratic curve when the optimal bound is a quadratic curve, and a line when the optimal bound is linear).

The third cause of suboptimal performance in biological organisms is imperfect memory. Trial-by-trial variability in the subject's memory of the decision bound is called criterial noise. In decision bound theory, criterial noise is represented by the random variable $e_c$ (normally distributed with mean 0 and variance $\sigma_c^2$), which is assumed to have an additive effect on the discriminant value $h(\mathbf{x}_{pi})$.
A final cause of suboptimality is a bias against one (or more) response alternative. A response bias can be modeled by assuming that rather than compare the discriminant value to zero, as in the Equation 2 decision rule, the subject compares it to some value $\delta$. A positive value of $\delta$ represents a bias against response alternative B.

In summary, decision bound theory assumes that rather than use the optimal decision rule of Equation 2, the subject uses the following rule:

$$\text{if } h(\mathbf{x}_{pi}) + e_c \begin{cases} < \delta, & \text{then respond A} \\ = \delta, & \text{then guess} \\ > \delta, & \text{then respond B.} \end{cases} \tag{3}$$

As a consequence, the probability of responding A on trials when stimulus $i$ is presented is given by

$$P(A \mid \text{stimulus } i) = P[h(\mathbf{x}_{pi}) + e_c < \delta \mid \text{stimulus } i]. \tag{4}$$

Several versions of this model can be constructed, depending on the form of the decision bound. In this article, we consider (1) the general quadratic classifier, (2) the general linear classifier, and (3) the optimal decision bound model (not to be confused with the optimal classifier discussed above). The three versions will be only briefly introduced here. More detailed derivations of each of these three models are given in Ashby and Maddox (in press).

In an experiment with normally distributed categories, the optimal classifier uses a quadratic decision bound if stimulus variability within Category A differs from the variability within Category B along any stimulus dimension or if the two categories are characterized by a different set of covariances. The general quadratic classifier assumes that the subject attempts to respond optimally but misestimates some of the category means, variances, or covariances and therefore uses a quadratic decision bound that differs from the optimal. With the two perceptual dimensions $x_1$ and $x_2$, the decision bound of the general quadratic classifier satisfies

$$h(x_1, x_2) = a_1 x_1^2 + a_2 x_2^2 + a_3 x_1 x_2 + b_1 x_1 + b_2 x_2 + c_0 \tag{5}$$

for some constants $a_1$, $a_2$, $a_3$, $b_1$, $b_2$, and $c_0$.
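As an illustration of how Equations 1, 3, 4, and 5 combine, the following sketch estimates the Equation 4 response probability by simulation. It assumes, purely for illustration, that perceptual noise is normally distributed, independent across the two dimensions, and of equal variance on both; the function names and parameter values are hypothetical. When $a_1 = a_2 = a_3 = 0$, the quantity $h(\mathbf{x}_{pi}) + e_c$ is itself normally distributed under these assumptions, which gives a closed form that can be used to check the simulation.

```python
import math
import random

def h_quadratic(x1, x2, a1, a2, a3, b1, b2, c0):
    """Equation 5: general quadratic discriminant function."""
    return a1 * x1**2 + a2 * x2**2 + a3 * x1 * x2 + b1 * x1 + b2 * x2 + c0

def prob_a_simulated(mean_percept, coeffs, sd_percept, sd_criterial,
                     delta=0.0, n_trials=100_000, seed=1):
    """Monte Carlo estimate of Equation 4: P[h(x_pi) + e_c < delta]."""
    rng = random.Random(seed)
    count_a = 0
    for _ in range(n_trials):
        # Equation 1: percept = mean percept + perceptual noise (assumed normal).
        x1 = mean_percept[0] + rng.gauss(0.0, sd_percept)
        x2 = mean_percept[1] + rng.gauss(0.0, sd_percept)
        e_c = rng.gauss(0.0, sd_criterial)  # criterial noise
        if h_quadratic(x1, x2, *coeffs) + e_c < delta:  # Equation 3, response A
            count_a += 1
    return count_a / n_trials

def prob_a_linear_exact(mean_percept, b1, b2, c0, sd_percept, sd_criterial,
                        delta=0.0):
    """Closed form for a linear bound: h + e_c is normal, so use the normal CDF."""
    mean_h = b1 * mean_percept[0] + b2 * mean_percept[1] + c0
    sd_h = math.sqrt((b1**2 + b2**2) * sd_percept**2 + sd_criterial**2)
    z = (delta - mean_h) / sd_h
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

if __name__ == "__main__":
    # Hypothetical linear bound x1 - x2 = 0 and hypothetical noise levels.
    mean_percept = (0.3, 0.8)
    print(prob_a_simulated(mean_percept, (0, 0, 0, 1.0, -1.0, 0.0), 0.5, 0.2))
    print(prob_a_linear_exact(mean_percept, 1.0, -1.0, 0.0, 0.5, 0.2))
```

In the linear case the perceptual and criterial noise enter the response probability only through the single standard deviation `sd_h`, which is why the closed form needs only one combined noise term. The simulation accepts any quadratic coefficients, for which no such simple closed form is assumed here.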
An important property of this model is that the effects of perceptual and criterial noise can be estimated uniquely. Separate estimates of perceptual and criterial noise have been obtained in the past (see, e.g., Nosofsky, 1983), but these have required a comparison of performance across several different experiments.

If the variability within Category A is equal to the variability within Category B along both dimensions, and if the two categories are characterized by the same covariance, then the decision bound of the optimal classifier is linear. The general linear classifier assumes that the subject makes the inference (perhaps incorrectly) that these conditions are true, so he or she chooses a linear decision bound; but because the category means, variances, and covariance are unknown, a suboptimal linear bound is chosen. The general linear classifier is a special case of the general quadratic classifier in which the coefficients $a_1$, $a_2$, and $a_3$ in Equation 5 are zero.

At this point, decision bound theory makes no assumptions about categorization at the algorithmic level, that is, about the details of how the subject comes to assign responses to regions. There are a number of possibilities (see Note 5). Some require little computation on the part of the subject. Specifically, it is not necessary for the subject to estimate category means, variances, covariances, or likelihoods, even when responding optimally. In this case, the only requirement is that the subject be able to learn whether a given percept is more likely to have been generated by an exemplar from Category A or B.
EXEMPLAR THEORY

Generalized Context Model

Exemplar theory (see, e.g., Estes, 1986a, 1986b; Smith & Medin, 1981) assumes that, on each trial of a categorization experiment, the subject performs some sort of global match between the representation of the presented stimulus and the memory representation of every exemplar of each category and chooses a response on the basis of these similarity computations. The assumption that the global matching operation includes all members of each category seems viable in the kinds of categorization tasks used in many laboratory experiments, because it is a common experimental practice to construct categories with only four to six exemplars. With natural categories, however, the assumption seems less plausible. For example, when one is deciding that a chicken is a bird, it seems unlikely that one computes the similarity of the chicken in question to every bird one has ever seen. Of course, if performed in parallel, this massive amount of computation may occur, but it certainly disagrees with introspective experience.

Perhaps the most widely known of the exemplar models is the GCM, developed by Medin and Schaffer (1978) and elaborated by Estes (1986a) and Nosofsky (1984, 1986). According to the GCM, the probability that stimulus $i$ is classified as a member of Category A, $P(A \mid i)$, is given by

$$P(A \mid i) = \frac{\beta \sum_{j \in C_A} \eta_{ij}}{\beta \sum_{j \in C_A} \eta_{ij} + (1 - \beta) \sum_{j \in C_B} \eta_{ij}}, \tag{6}$$

where $j \in C_J$ represents all exemplars of Category J, $\eta_{ij}$ is the similarity of stimulus $i$ to exemplar $j$, and $\beta$ is a response bias. The similarity $\eta_{ij}$ between a pair of stimuli is assumed to be a monotonically decreasing function of the psychological distance between point representations of the two stimuli. Thus, the GCM assumes no trial-by-trial variability in the perceptual representation.
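A minimal sketch of the Equation 6 computation follows. It uses the attention-weighted distance and the exponential and Gaussian similarity functions that the article defines in Equations 7 through 9; the function names, exemplar coordinates, and parameter values are hypothetical.

```python
import math

def distance(x_i, x_j, c, w, r):
    """Equation 7: scaled, attention-weighted distance between two points."""
    inner = (w * abs(x_i[0] - x_j[0]) ** r
             + (1.0 - w) * abs(x_i[1] - x_j[1]) ** r)
    return c * inner ** (1.0 / r)

def similarity(d, gaussian=False):
    """Equation 8 (exponential decay) or Equation 9 (Gaussian)."""
    return math.exp(-d * d) if gaussian else math.exp(-d)

def gcm_prob_a(x_i, exemplars_a, exemplars_b, beta, c, w, r=2.0, gaussian=False):
    """Equation 6: biased ratio of summed similarities."""
    sum_a = sum(similarity(distance(x_i, x_j, c, w, r), gaussian)
                for x_j in exemplars_a)
    sum_b = sum(similarity(distance(x_i, x_j, c, w, r), gaussian)
                for x_j in exemplars_b)
    return beta * sum_a / (beta * sum_a + (1.0 - beta) * sum_b)

if __name__ == "__main__":
    # Hypothetical exemplars in a two-dimensional psychological space.
    category_a = [(0.0, 0.0), (0.0, 1.0)]
    category_b = [(2.0, 0.0), (2.0, 1.0)]
    for probe in [(0.2, 0.5), (1.0, 0.5), (1.8, 0.5)]:
        print(probe, gcm_prob_a(probe, category_a, category_b,
                                beta=0.5, c=1.0, w=0.5))
```

Because the probe enters only through fixed point coordinates, the predicted probability is the same on every presentation of the same stimulus, which reflects the absence of perceptual noise in the model. All response variability comes from the probabilistic choice rule.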
The psychological distance between stimuli $i$ and $j$ is given by

$$d_{ij} = c\left[w\,|x_{1i} - x_{1j}|^r + (1 - w)\,|x_{2i} - x_{2j}|^r\right]^{1/r}, \tag{7}$$

where $w$ is the proportion of attention allocated to Dimension 1 and the nonnegative parameter $c$ scales the psychological space. The parameter $c$ can be interpreted as a measure of overall stimulus discriminability and should increase with increased exposure duration or as subjects gain experience with the stimuli (Nosofsky, 1985, 1986). The exponent $r \geq 1$ defines the nature of the distance metric. The most popular cases occur when $r = 1$ (city-block distance) and when $r = 2$ (Euclidean distance).

Two specific functions relating psychological distance to similarity are popular. The exponential decay function assumes that the similarity between stimuli $i$ and $j$ is given by (e.g., Ennis, 1988; Nosofsky, 1988a; Shepard, 1957, 1964, 1986, 1987, in press)

$$\eta_{ij} = \exp(-d_{ij}). \tag{8}$$

In contrast, the Gaussian function assumes (e.g., Ennis, 1988; Ennis, Mullen, & Frijters, 1988; Nosofsky, 1988a; Shepard, 1986, 1987, 1988) that

$$\eta_{ij} = \exp(-d_{ij}^2). \tag{9}$$

In most applications of the GCM, the exponential similarity function is paired with either the city-block or the Euclidean distance metric or else the Gaussian similarity function is paired with the Euclidean metric (e.g., Nosofsky, 1986, 1987). The GCM has accounted successfully for the relationship between identification, categorization, and recognition performance with stimuli constructed from both separable and integral dimensions under a variety of different conditions (see Nosofsky, 1992, for a review).

A Deterministic Exemplar Model

The act of categorization can be subdivided into two components (see, e.g., Massaro & Friedman, 1990). The first involves accessing the category information that is assumed relevant to the decision-making process, and the second involves using this information to select a response. The decision bound model and the GCM differ on both of these components.
First, the decision bound model assumes that the subject retrieves the response label associated with the region in which the stimulus representation falls, whereas the GCM assumes that the subject performs a global similarity match to all exemplars of each category. Second, the decision bound model assumes a deterministic decision process (i.e., Equation 3), whereas the GCM assumes a probabilistic decision process (i.e., Equation 6) that is described by the similarity-choice model of Luce (1963) and Shepard (1957). In a deterministic decision process, the subject always gives the same response, given the same perceptual and cognitive information. In a probabilistic process, the perceptual and cognitive information is used to compute the probability associated with each response alternative. Thus, given the same information, sometimes one response is given and sometimes another.

A poor fit of one model relative to the other could be attributed to either component. Thus, it is desirable to investigate a model that differs from both the GCM and the decision bound model on only one of these two components. Probabilistic versions of the decision bound model could be constructed and so could deterministic versions of exemplar theory. As we will see in the next section, however, the data seem to support a deterministic decision process, and for this reason, it is especially interesting to examine a deterministic version of exemplar theory.

Nosofsky (1991) proposed a deterministic exemplar model in which the summed similarities of the probe to all exemplars of each category are compared and the response associated with the highest summed similarity is given. This model, however, does not contain the GCM as a special case. Ashby and Maddox (in press) proposed a deterministic exemplar model in which the relevant category information is the log of the summed similarity of the probe to all exemplars of each category.
Specifically, the model assumes that the subject uses the decision rule

$$\text{Respond A if } \log\left(\sum \eta_{iA}\right) > \log\left(\sum \eta_{iB}\right); \text{ otherwise respond B,} \tag{10}$$

where $\sum \eta_{iJ}$ represents the summed similarity of stimulus $i$ to all members of Category J. The log is important because the resulting model contains the GCM as a special case. With criterial noise and a response bias $\delta$, the Equation 10 decision rule becomes

$$\text{Respond A if } \log\left(\sum \eta_{iA}\right) - \log\left(\sum \eta_{iB}\right) > \delta + e_c; \text{ otherwise respond B,} \tag{11}$$

where the subject is biased against B if $\delta < 0$. Now, if $e_c$ has a logistic distribution with a mean of 0 and variance $\sigma_c^2$, then the probability of responding A given stimulus $i$ can be shown to equal (Ashby & Maddox, in press)

$$P(A \mid i) = \frac{\beta \left(\sum \eta_{iA}\right)^{\gamma}}{\beta \left(\sum \eta_{iA}\right)^{\gamma} + (1 - \beta)\left(\sum \eta_{iB}\right)^{\gamma}}, \tag{12}$$

where

$$\gamma = \frac{\pi}{\sqrt{3}\,\sigma_c} \quad \text{and} \quad \beta = \frac{e^{-\delta\gamma}}{1 + e^{-\delta\gamma}}.$$

Thus, this model is equivalent to the GCM when $\gamma = 1$. In other words, the GCM can be interpreted as a deterministic exemplar model in which the criterial noise variance $\sigma_c^2 = \pi^2/3$. The $\gamma$ parameter indicates whether response selection is more or less variable than is predicted by the GCM. If $\gamma < 1$, the transition from a small value of $P(A \mid i)$ to a large value is more gradual than is predicted by the GCM; response selection is too variable. If $\gamma > 1$, response selection is less variable than the GCM predicts.

MODEL FITTING AND TESTING

When testing the validity of a model with respect to a particular data set, one must consider two problems. The first is to determine how unknown parameters will be estimated; the second is to determine how well the model describes ("fits") the data. The method of maximum likelihood is probably the most powerful method (see Ashby, 1992b; Wickens, 1982; see also Note 6). Consider an experiment with Categories A and B and a set of $n$ stimuli, $S_1, S_2, \ldots, S_n$. For each stimulus, a particular model predicts the probabilities that the subject will respond A and B, which we denote by $P(A \mid S_i)$ and $P(B \mid S_i)$, respectively.
The results of an experimental session are a set of $n$ responses, $r_1, r_2, \ldots, r_n$, where we arbitrarily set $r_i = 1$ if response A was made to stimulus $i$ and $r_i = 0$ if response B was made. According to the model, and assuming that the responses are independent, the likelihood of observing this set of $n$ responses is

$$L(r_1, r_2, \ldots, r_n) = \prod_{i=1}^{n} P(A \mid S_i)^{r_i}\, P(B \mid S_i)^{1 - r_i}.$$

The maximum likelihood estimators are those values of the unknown parameters that maximize $L(r_1, r_2, \ldots, r_n)$ [denoted $L(r)$ for short].

Maximum likelihood estimates are also convenient when one wishes to evaluate the empirical validity of a model. For example, under the null hypothesis that the model is correct, the statistic

$$G^2 = -2 \ln L(r)$$

has an asymptotic chi-square distribution with degrees of freedom equal to the number of experimental trials ($n$) minus the number of free parameters in the model.

Rather than assess the absolute ability of a model to account for a set of data, it is often more informative to test whether a more general model fits significantly better than a restricted model. Consider two models, $M_1$ and $M_2$. Suppose Model $M_1$ is a special case of Model $M_2$ (i.e., $M_1$ is nested within $M_2$) in the sense that $M_1$ can be derived from $M_2$ by setting some of the free parameters in $M_2$ to fixed values. Let $G_1^2$ and $G_2^2$ be the goodness-of-fit values associated with the two models. Because $M_1$ is a special case of $M_2$, note that $G_2^2$ can never be larger than $G_1^2$. If Model $M_1$ is correct, the statistic $G_1^2 - G_2^2$ has an asymptotic chi-square distribution with degrees of freedom equal to the difference in the number of free parameters between the two models. Using this procedure, one can therefore test whether the extra parameters of the more general model lead to a significant improvement in fit (see Wickens, 1982, for an excellent overview of parameter estimation and hypothesis testing using the method of maximum likelihood).

The $G^2$ tests work well when one model is a special case of the other.
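The likelihood and $G^2$ computations can be sketched as follows. The two toy models are hypothetical stand-ins for a nested pair and are not models from this article: the restricted model fixes the probability of an A response at .5 on every trial, and the general model has one free parameter, a constant probability whose maximum likelihood estimate is the observed proportion of A responses. The response data are invented for illustration.

```python
import math

def log_likelihood(responses, prob_a):
    """ln L(r) for independent responses (r_i = 1 for A, r_i = 0 for B)."""
    total = 0.0
    for r, p in zip(responses, prob_a):
        total += math.log(p) if r == 1 else math.log(1.0 - p)
    return total

def g_squared(responses, prob_a):
    """G^2 = -2 ln L(r); smaller values indicate a better fit."""
    return -2.0 * log_likelihood(responses, prob_a)

def nested_test_statistic(g2_restricted, g2_general):
    """G_1^2 - G_2^2 for a restricted model nested within a general model."""
    return g2_restricted - g2_general

# Hypothetical responses from 10 trials (8 A responses, 2 B responses).
responses = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
n = len(responses)

# Restricted model: P(A) fixed at .5, no free parameters.
g2_restricted = g_squared(responses, [0.5] * n)

# General model: one free parameter, estimated by the proportion of A responses.
p_hat = sum(responses) / n
g2_general = g_squared(responses, [p_hat] * n)

if __name__ == "__main__":
    statistic = nested_test_statistic(g2_restricted, g2_general)
    # Compare with a chi-square critical value for 1 degree of freedom
    # (3.841 at the .05 level).
    print(g2_restricted, g2_general, statistic)
```

The difference statistic is referred to a chi-square distribution with one degree of freedom here because the general model has exactly one more free parameter than the restricted model. With only 10 trials the asymptotic approximation is rough, so the example shows the arithmetic rather than a trustworthy test.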
For example, $G^2$ tests can be used to determine whether the extra parameters of the general quadratic classifier provide a "significant" improvement in fit over the general linear classifier. One goal of this article, however, is to test models that are not special cases of one another (i.e., models that are not nested), such as the exemplar and decision bound models. Fortunately, a goodness-of-fit statistic called Akaike's (1974) information criterion (AIC) has been developed that allows comparisons between models that are not nested. The AIC statistic, which generalizes the method of maximum likelihood, is defined as

$$\mathrm{AIC}(M_i) = -2 \ln L_i + 2N_i = G_i^2 + 2N_i, \tag{13}$$

where $N_i$ is the number of free parameters in Model $M_i$ and $\ln L_i$ is the log likelihood of the data as predicted by Model $M_i$ after its free parameters have been estimated via maximum likelihood. By including a term that penalizes a model for extra free parameters, one can make a comparison across models with different numbers of parameters. The model that provides the most accurate account of the data is the one with the smallest AIC. (See Sakamoto, Ishiguro, & Kitagawa, 1986, or Takane & Shibayama, 1992, for a more thorough discussion of the minimum AIC procedure.)

APPLICATION 1: NORMALLY DISTRIBUTED CATEGORIES

This section reports the results of fitting decision bound and exemplar models to the data from five experiments with normally distributed categories. In each experiment, stimuli like those shown in Figure 1 were used. The dimensions of the rectangles (height and width) have been found to be integral (e.g., Garner, 1974; Wiener-Erlich, 1978), whereas the dimensions of the circles (size and orientation) have been found to be separable (Garner & Felfoldy, 1970; Shepard, 1964). In all experiments, each category was defined by a bivariate normal distribution.
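To make the category structure concrete, the sketch below draws stimuli from bivariate normal categories. The category means, standard deviations, and correlations are hypothetical placeholders and are not the values used in the experiments; equal base rates are assumed for the two categories. The correlated pair is built from two independent standard normal draws.

```python
import math
import random

def sample_stimulus(mean, sd, rho, rng):
    """Draw one (v1, v2) pair from a bivariate normal distribution."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    v1 = mean[0] + sd[0] * z1
    # Mixing z1 into the second coordinate induces correlation rho.
    v2 = mean[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return v1, v2

def sample_trial(categories, rng):
    """Pick a category with equal probability, then draw its stimulus."""
    label = rng.choice(sorted(categories))
    mean, sd, rho = categories[label]
    return label, sample_stimulus(mean, sd, rho, rng)

# Hypothetical category definitions: (means, standard deviations, correlation).
CATEGORIES = {
    "A": ((100.0, 200.0), (50.0, 50.0), 0.0),
    "B": ((200.0, 100.0), (50.0, 50.0), 0.0),
}

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_trial(CATEGORIES, rng))
```

Each draw yields a pair of physical values, such as a width and a height, or a diameter and an orientation, that would define the stimulus shown on one trial.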
Such distributions can be described conveniently by their contours of equal likelihood, which are always circles or ellipses. Although the size of each contour is arbitrary, the shape and location convey important category information. The center of each contour is always the category prototype (i.e., the mean, median, or mode), and the ratio formed by dividing the contour width along Dimension 1 by the width along Dimension 2 is equal to the ratio of the standard deviations along the two dimensions. Finally, the orientation of the principal axis of the elliptical contour provides information about the correlation between the dimensional values.

Data Sets 1 and 2: Linear Optimal Decision Bound

The contours of equal likelihood that describe the categories of the first two data sets are shown in Figure 2. Note that, in both experiments, variability within each category is equal on the two dimensions, and the values on the two dimensions are uncorrelated. In both experiments, the optimal stimulus bound (represented by the broken line in Figure 2) is linear ($V_2 = V_1$ and $V_2 = 450 - V_1$ for the first and second experiments, respectively, where $V_1$ corresponds to the width or size dimension, and $V_2$ corresponds to the height or orientation dimension). A subject perfectly using the optimal bound would correctly classify 80% of the stimuli in each experiment.⁷

Six subjects participated in each experiment, 3 with the rectangular stimuli (Condition R; see Figure 1a) and 3 with the circular stimuli (Condition C; see Figure 1b). At the beginning of every experimental session, each subject was shown the stimulus corresponding to the Category A and Category B distribution means, along with their category labels. Each of these stimuli was presented five times in an alternating fashion, for a total of 10 stimulus presentations. This was followed by 100 trials of practice and then 300 experimental trials.
Feedback was given after every trial. Only the experimental trials were included in the subsequent data analyses. The exact experimental methods were identical to those described by Ashby and Maddox (1992, Experiment 3). All subjects using the circular stimuli completed three experimental sessions. For the rectangular stimuli, 2 subjects from Experiment 1 completed two sessions and 1 subject completed one session, whereas the 3 subjects from Experiment 2 completed four, two, and three sessions, respectively. All subjects achieved at least 75% correct during their last experimental session.

Figure 2. Contours of equal likelihood and optimal decision bounds (broken lines) for Application 1: (a) Data Set 1 and (b) Data Set 2.

Three decision bound models were fitted to the data from each subject's last experimental session: the general linear classifier, the optimal decision bound model, and the general quadratic classifier. Each model assumed that the amount of perceptual noise did not differ across stimuli or stimulus dimensions. In both experiments, exemplars from each category were presented equally often, so there was no a priori reason to expect a response bias toward either category. Thus, the response bias was set to zero in all three models. The general linear classifier has three free parameters: a slope and an intercept parameter that describe the decision bound, and one parameter that represents the combined effect of perceptual and criterial noise.⁸ Because the optimal decision bound model uses the decision bound of the optimal classifier, the slope and intercept are constrained by the shape and orientation of the category contours of equal likelihood; thus, the model has only one free parameter, which represents the sum of the perceptual and criterial noise (see Note 8).
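The three-parameter general linear classifier just described can be illustrated with a minimal Python sketch. Several details are assumptions made for the illustration rather than the authors' implementation: the function names, the convention that response A is given when the noisy percept falls above the bound, and the treatment of the single noise parameter as the standard deviation of the signed distance to the bound. The stimulus values and noise level are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def glc_prob_a(x1, x2, slope, intercept, noise_sd):
    """P(respond A | stimulus) for the linear bound x2 = slope * x1 + intercept.

    Assumption: response A is given when the noisy percept lies above the bound.
    noise_sd is the single combined perceptual-plus-criterial noise parameter.
    """
    # Signed distance from the stimulus to the bound (positive above it).
    h = (x2 - slope * x1 - intercept) / math.sqrt(1.0 + slope ** 2)
    return normal_cdf(h / noise_sd)

# Bound V2 = V1 (the optimal bound for Data Set 1); stimuli are hypothetical.
on_bound = glc_prob_a(100.0, 100.0, slope=1.0, intercept=0.0, noise_sd=20.0)
above_bound = glc_prob_a(100.0, 140.0, slope=1.0, intercept=0.0, noise_sd=20.0)
print(on_bound, above_bound)
```

Fixing `slope` and `intercept` at the optimal values and estimating only `noise_sd` corresponds to the one-parameter optimal decision bound model.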
The general quadratic classifier has seven parameters: five of the six parameters in Equation 5 (one can be set to 1.0 without loss of generality), a perceptual noise parameter, and a criterial noise parameter.

In both experiments, the optimal classifier uses a linear decision bound, and thus decision bound theory predicts that subjects will use some linear decision bound in both experiments. If the theory is correct, the extra parameters of the general quadratic classifier should lead to no significant improvement in fit. Goodness-of-fit tests ($G^2$) strongly supported this prediction. For 10 of the 12 subjects, the three-parameter general linear classifier and the seven-parameter general quadratic classifier yielded identical $G^2$ values. For the other 2 subjects (both from Experiment 2), the general quadratic classifier provided a slightly better absolute fit to the data, but in both cases, the improvement in fit was not statistically significant (p > .25). In order to determine the absolute goodness-of-fit of the general linear classifier, $G^2$ tests were performed between the general linear classifier and the "null" model (i.e., the model that perfectly accounts for the data). For all 12 subjects, the $G^2$ tests were not statistically significant (p > .25). (In other words, the null hypothesis that the general linear classifier is correct could not be rejected.) In addition, for 9 of the 12 subjects, the fits of the general linear classifier were significantly better (p < .05) than those for the optimal decision bound model. Thus, these analyses strongly support the hypothesis that subjects used a suboptimal decision bound that was of the same form as the optimal bound (in this case, linear).

The DEM has four parameters in this application: a scaling parameter c, a bias parameter β, an attention parameter w, and the response selection parameter γ (see Equations 7 and 12).
The GCM has only the first three of these free parameters (γ = 1 in the GCM). Three versions of each model were tested. One version assumed city-block distance and an exponential decay similarity function; one version assumed Euclidean distance and a Gaussian similarity function; and the third assumed Euclidean distance and an exponential similarity function.

Krantz and Tversky (1975) suggested that the perceptual dimensions of rectangles may be shape and area rather than height and width. A transformation from the dimensions of height and width to shape and area is accomplished by rotating the height and width dimensions by 45°. In order to test the shape-area hypothesis, versions of the GCM and DEM that included an additional free parameter corresponding to the degree of rotation were also applied to the data.⁹ The resulting models, which we call the GCM(θ) and DEM(θ), were fitted to the data by using the three distance-similarity function combinations described above. The details of the fitting procedure are described in the Appendix.

Of the three distance-similarity function combinations tested, the Euclidean-Gaussian version of the GCM and GCM(θ) provided the best account of the data for both the rectangular (8 of 12 cases) and circular (10 of 12 cases) stimuli. The Euclidean-Gaussian version of the DEM and DEM(θ) provided the best account of the data for the circular stimuli (11 of 12 cases), and the Euclidean-exponential version fitted best for the rectangular stimuli (9 of 12 cases).
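The distance-similarity combinations and the role of γ can be sketched in Python as follows. Equations 6, 7, and 12 are not reproduced in this section, so the functional forms below are an assumption based on the standard statement of the generalized context model (attention-weighted Minkowski distance, similarity exp(-d^α), and a response rule in which summed similarities are raised to the power γ). The function names, exemplar coordinates, and parameter values are hypothetical, and the rotation parameter θ is not included.

```python
import math

def distance(x, y, c, w, r):
    """Attention-weighted Minkowski distance; r=1 is city-block, r=2 is Euclidean."""
    total = w * abs(x[0] - y[0]) ** r + (1.0 - w) * abs(x[1] - y[1]) ** r
    return c * total ** (1.0 / r)

def similarity(d, alpha):
    """alpha=1 gives exponential decay; alpha=2 gives a Gaussian function."""
    return math.exp(-d ** alpha)

def dem_prob_a(x, exemplars_a, exemplars_b, c, w, beta, gamma, r=2, alpha=2):
    """P(respond A | x) under the assumed rule; gamma = 1 gives the GCM."""
    s_a = sum(similarity(distance(x, y, c, w, r), alpha) for y in exemplars_a)
    s_b = sum(similarity(distance(x, y, c, w, r), alpha) for y in exemplars_b)
    num = beta * s_a ** gamma
    return num / (num + (1.0 - beta) * s_b ** gamma)

# Hypothetical one-exemplar categories and a probe slightly closer to A.
cat_a, cat_b, probe = [(0.0, 0.0)], [(1.0, 0.0)], (0.4, 0.0)
versions = {"city-block/exponential": (1, 1),
            "Euclidean/Gaussian": (2, 2),
            "Euclidean/exponential": (2, 1)}
for name, (r, alpha) in versions.items():
    p = dem_prob_a(probe, cat_a, cat_b, c=1.0, w=0.5, beta=0.5, gamma=1.0,
                   r=r, alpha=alpha)
    print(name, round(p, 3))

p_gcm = dem_prob_a(probe, cat_a, cat_b, c=1.0, w=0.5, beta=0.5, gamma=1.0)
p_dem = dem_prob_a(probe, cat_a, cat_b, c=1.0, w=0.5, beta=0.5, gamma=5.0)
print(round(p_gcm, 3), round(p_dem, 3))
```

In this sketch, raising γ above 1 pushes the predicted probability away from .5, which is the sense in which larger γ values correspond to less variable response selection.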
Table 1 presents the goodness-of-fit values for the general linear classifier and for the best-fitting versions of the GCM, DEM, GCM(θ), and DEM(θ). The DEM [or DEM(θ)] provides the best fit for 3 subjects, and the GCM provides the best fit for 1 subject. For the remaining 8 subjects, the general linear classifier provides the best fit. Note, however, that the general linear classifier performs only slightly better than the DEM or DEM(θ). In general, the fits of the GCM [and GCM(θ)] are worse than those for the DEM [and DEM(θ)]. In fact, the DEM fits better than the GCM(θ) for 11 of the 12 subjects, suggesting that the GCM is more improved by the addition of the γ parameter (a parameter associated with the decision process) than by the addition of the θ parameter (a parameter associated with the perceptual process).

Table 1
Goodness-of-Fit Values (AIC) for Application 1: Data Sets 1 and 2

Condition/Subject     GLC      GCM      DEM    GCM(θ)   DEM(θ)
Data Set 1
  R/1               120.8    153.8    119.9    143.5    118.9
  R/2               119.5    177.3    135.3    159.5    127.9
  R/3                69.1    138.0     70.7    131.2     72.8
  Mean              103.1    156.4    108.6    144.7    106.5
  C/1               167.5    195.3    169.3    187.0    171.7
  C/2               251.1    250.0    245.1    249.0    246.0
  C/3               179.3    199.3    181.9    196.5    183.1
  Mean              199.3    214.9    198.8    210.8    200.3
Data Set 2
  R/1               129.0    155.0    133.6    154.7    131.5
  R/2               191.8    208.5    195.0    194.7    197.1
  R/3               221.2    226.9    223.2    224.2    220.7
  Mean              180.7    196.8    183.9    191.2    183.1
  C/1               172.1    195.8    182.6    178.8    176.3
  C/2               218.4    229.2    225.8    221.0    219.7
  C/3               248.2    248.0    250.7    252.7    250.2
  Mean              212.9    224.3    219.7    217.5    215.4
Mean*               174.0    198.1    176.7    191.6    176.9

Note. Rows correspond to subjects and columns to models. GLC, general linear classifier; GCM, generalized context model; DEM, deterministic exemplar model; GCM(θ), GCM with additional θ parameter; DEM(θ), DEM with additional θ parameter. *Across 12 subjects.
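The minimum-AIC comparison behind these counts can be illustrated with a short Python sketch that picks, for each subject, the model with the smallest AIC in Table 1. The values are transcribed from Table 1; the assignment of values to model columns is an assumption about how the printed table is laid out, and the variable and function names are hypothetical.

```python
MODELS = ["GLC", "GCM", "DEM", "GCM(theta)", "DEM(theta)"]

# AIC values per subject, listed in MODELS order (transcribed from Table 1).
TABLE_1 = {
    "DS1 R/1": [120.8, 153.8, 119.9, 143.5, 118.9],
    "DS1 R/2": [119.5, 177.3, 135.3, 159.5, 127.9],
    "DS1 R/3": [69.1, 138.0, 70.7, 131.2, 72.8],
    "DS1 C/1": [167.5, 195.3, 169.3, 187.0, 171.7],
    "DS1 C/2": [251.1, 250.0, 245.1, 249.0, 246.0],
    "DS1 C/3": [179.3, 199.3, 181.9, 196.5, 183.1],
    "DS2 R/1": [129.0, 155.0, 133.6, 154.7, 131.5],
    "DS2 R/2": [191.8, 208.5, 195.0, 194.7, 197.1],
    "DS2 R/3": [221.2, 226.9, 223.2, 224.2, 220.7],
    "DS2 C/1": [172.1, 195.8, 182.6, 178.8, 176.3],
    "DS2 C/2": [218.4, 229.2, 225.8, 221.0, 219.7],
    "DS2 C/3": [248.2, 248.0, 250.7, 252.7, 250.2],
}

def best_model(aic_values):
    """Name of the model with the smallest AIC."""
    return MODELS[aic_values.index(min(aic_values))]

def family(name):
    """Group each exemplar model with its theta-augmented version."""
    return name.replace("(theta)", "")

counts = {}
for subject, values in TABLE_1.items():
    key = family(best_model(values))
    counts[key] = counts.get(key, 0) + 1
print(counts)
```

Under this reading of the table, the tally agrees with the counts given in the text: 8 subjects for the general linear classifier, 3 for the DEM or DEM(θ), and 1 for the GCM.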
The poor performance of the GCM [and GCM(θ)] apparently occurs because response selection was less variable than predicted by these models, a fact that is reflected in the estimates of the DEM's γ parameter. Table 2 presents the median γ estimates for the best-fitting DEM and DEM(θ) from Table 1. As predicted, in every case the median γ ≥ 1. The γ values are quite large for Data Set 1, especially for the rectangular stimuli. As one might expect, given this result, the difference in goodness of fit between the DEM and GCM(θ) is also quite large in Data Set 1.

Table 2
Median γ Estimates for Best-Fitting DEM and DEM(θ) Reported in Table 1

                 Data Set 1          Data Set 2
Stimuli         DEM    DEM(θ)       DEM    DEM(θ)
Rectangles      5.12    2.59        2.13    1.00
Circles         2.27    2.46        1.40    1.49

Note. Rows correspond to stimuli and columns to models. DEM, deterministic exemplar model; DEM(θ), DEM with additional θ parameter.

Two important conclusions can be drawn from these data. First, the superior fits of the general linear classifier and the DEM over the GCM suggest that response selection is less variable than that predicted by the GCM, especially in Data Set 1. Second, when the optimal decision bound is linear, it is difficult to distinguish between the performance of a deterministic exemplar model and a decision bound model that assumes subjects use linear decision bounds in the presence of perceptual and criterial noise.

Data Sets 3-5: Nonlinear Optimal Decision Bound

If the amount of variability along any dimension or if the amount of covariation between any pair of dimensions differs for the two categories, then the optimal bound will be nonlinear (Ashby & Gott, 1988; Morrison, 1990). To examine the ability of subjects to learn nonlinear decision bounds, Ashby and Maddox (1992) conducted three experiments using normally distributed categories.
Each experiment involved two conditions: one with the rectangular stimuli (Condition R; see Figure 1a) and one with the circular stimuli (Condition C; see Figure 1b). Four subjects participated in each condition. Each subject completed between three and five experimental sessions. The contours of equal likelihood used in Data Sets 3-5 (Ashby & Maddox, 1992, Experiments 1-3) are shown in Figures 3a-3c, respectively. Because the shape and orientation of the Category A and B contours of equal likelihood differ, the optimal bound is highly nonlinear (represented by the broken line in Figures 3a-3c). A subject perfectly using this bound would correctly classify 90%, 78%, and 90% of the stimuli in Data Sets 3-5, respectively. In contrast, a subject perfectly using the most accurate linear bound would correctly classify 65%, 60%, and 75% of the stimuli in the three experiments, respectively. There were large individual differences in accuracy during the final session, but, for 23 of 24 subjects, accuracy exceeded that predicted by the most accurate linear bound. Accuracy ranged from 68% to 82%, 59% to 73%, and 81% to 91%, for Data Sets 3-5, respectively. Details of the experimental procedure and summary statistics are presented in Ashby and Maddox (1992) and will not be repeated here.

Figure 3. Contours of equal likelihood and optimal decision bounds (broken lines) for Application 1: (a) Data Set 3, (b) Data Set 4, and (c) Data Set 5.

The three decision bound models described in the last section were fitted separately to the data from each subject's first and last experimental sessions. Decision bound theory predicts that the best-fitting model should be the general quadratic classifier because the optimal bounds are all quadratic. With normally distributed categories, the optimal decision bound model and the general linear classifier are special cases of the general quadratic classifier, so $G^2$ tests were performed to determine whether the extra parameters of the general quadratic classifier led to a significant improvement in fit. These results, which are reported in Table 3, confirm the superiority of the general quadratic classifier with these data. There is evidence that 3 subjects in Data Set 3 used linear bounds during their first session, but, for the data from the last session, the general linear classifier is rejected in every case. The superiority of the general quadratic classifier over the optimal decision bound model suggests that, except for 2 subjects in Data Set 5, the subjects did not use optimal bounds.

Table 3
Proportion of Times That the General Quadratic Classifier Fits Significantly Better (p < .05) Than the General Linear Classifier or Optimal Decision Bound Model

              General Linear Classifier        Optimal Decision Bound Model
Data Set    First Session   Last Session     First Session   Last Session
3               5/8             8/8              7/8             8/8
4               8/8             8/8              8/8             8/8
5               8/8             8/8              7/8             6/8

Over the three data sets, the best fit clearly is provided by the general quadratic classifier. In fact, the null hypothesis that this is the correct model could not be rejected for any of the subjects in their last session. For the first session, the model was rejected only for 3 subjects in Data Set 3 (p < .05). Thus, in addition to being the best of the three decision bound models, the general quadratic classifier also provides an adequate absolute account of the data. These results support the hypothesis that subjects use a suboptimal decision bound of the same form as the optimal bound (in this case, quadratic).

For the GCM [and GCM(θ)], the Euclidean-exponential version provided the best fit for 38% and 39% of the subjects who classified the rectangular and circular stimuli, respectively.
The Euclidean-exponential version of the DEM [and DEM(θ)] also provided the best fit for 38% and 44% of the subjects who classified the rectangular and circular stimuli, respectively. Table 4 compares the goodness-of-fit values for the general quadratic classifier and the Euclidean-exponential version of the GCM, DEM, GCM(θ), and DEM(θ).

Table 4
Goodness-of-Fit Values (AIC) for Application 1: Data Sets 3-5

                 GQC             GCM             DEM            GCM(θ)          DEM(θ)
C/S          First   Last    First   Last    First   Last    First   Last    First   Last
Data Set 3
  R/1        291.6   326.3   344.5   327.6   337.3   329.4   344.1   323.3   325.2   325.7
  R/2        271.6   306.3   296.7   312.4   295.5   314.0   288.5   311.3   291.1   307.3
  R/3        365.3   199.9   367.2   206.5   368.4   208.4   361.8   197.2   358.6   198.9
  R/4        285.2   255.9   352.2   268.4   312.9   269.1   353.9   261.1   302.6   260.7
  Mean       303.4   272.1   340.2   278.7   328.5   280.2   337.1   273.2   319.4   273.2
  C/1        410.1   273.3   402.4   295.9   404.4   284.1   404.4   296.3   405.6   285.2
  C/2        312.3   216.2   370.0   212.1   371.8   213.2   360.8   213.2   363.6   215.1
  C/3        326.2   217.9   322.4   246.3   319.8   236.8   324.4   238.1   321.6   231.2
  C/4        386.2   273.8   405.8   280.9   402.4   279.1   407.4   281.8   403.3   280.8
  Mean       358.7   245.3   375.2   258.8   374.6   253.3   374.3   257.4   373.5   253.1
Data Set 4
  R/1        326.1   290.7   384.1   304.3   370.6   289.9   384.0   306.5   365.6   287.7
  R/2        294.6   297.9   340.6   325.9   320.1   325.6   331.2   327.2   302.1   319.1
  R/3        246.8   259.6   358.0   275.7   327.6   275.5   345.8   276.5   331.2   275.4
  R/4        312.0   336.8   349.0   362.9   345.4   340.4   346.0   365.2   337.5   342.3
  Mean       294.9   296.3   357.9   317.2   340.9   307.9   351.8   318.9   334.1   306.1
  C/1        257.8   212.2   361.1   338.6   299.1   229.7   359.0   341.3   291.0   217.6
  C/2        328.4   207.7   379.0   286.2   371.8   246.3   381.0   288.3   358.6   239.9
  C/3        242.2   215.2   316.3   282.2   293.0   227.0   318.8   284.3   266.4   232.1
  C/4        185.1   224.5   317.9   341.0   178.1   331.2   321.2   342.8   180.3   309.5
  Mean       253.4   214.9   343.6   312.0   285.5   258.6   345.0   314.2   274.1   249.8
Data Set 5
  R/1        203.2   112.5   209.8   119.4   207.9   110.4   182.4   116.3   184.2    94.1
  R/2        257.1   114.2   241.2   161.9   243.2   146.1   236.4   148.4   226.7   146.7
  R/3        210.0   133.9   204.2   165.5   194.9   160.6   184.6   144.1   187.1   132.2
  R/4        189.7   219.1   205.4   207.3   206.1   206.5   205.4   208.4   206.9   207.7
  Mean       215.0   144.9   215.2   163.5   213.0   155.9   202.2   154.3   201.2   145.2
  C/1        190.9   128.8   213.7   144.5   205.9   143.9   216.2   147.6   207.5   145.8
  C/2        113.5    57.5    93.7    51.3    95.5    42.5    91.4    52.6    92.8    41.3
  C/3         63.0   122.3    75.3   152.1    75.9   127.5    73.8   136.9    77.0   124.1
  C/4         91.6    79.4   100.9    88.9   102.3    84.4    97.6    91.5    97.1    87.2
  Mean       114.8    97.0   120.9   109.2   119.9    99.6   119.8   107.2   118.6    99.6
Mean*        256.7   211.8   292.2   239.9   277.1   225.9   288.4   237.5   270.2   221.2

Note. Rows correspond to subjects and columns to models. C/S, condition/subject; GQC, general quadratic classifier; GCM, generalized context model; DEM, deterministic exemplar model; GCM(θ), GCM with additional θ parameter; DEM(θ), DEM with additional θ parameter. *Across 24 subjects.

For the data from the first session, the general quadratic classifier fits best in 16 of 24 cases, the DEM [or DEM(θ)] fits best in 4 cases, and the GCM [or GCM(θ)] fits best in 4 cases. For the data from the last session, the general quadratic classifier fits best in 16 of 24 cases, the DEM [or DEM(θ)] fits best in 5 cases, and the GCM [or GCM(θ)] fits best in 3 cases.

In Data Sets 1 and 2, where the optimal decision bound was linear, the general linear classifier fits only slightly better than the DEM [or DEM(θ)]. In fact, the average goodness-of-fit value of the general linear classifier was less than 3 AIC points better than the average goodness-of-fit value of the DEM. However, in Data Sets 3-5, where the optimal decision bound is nonlinear, the decision bound model enjoys a clear advantage over the DEM and DEM(θ). For the data from the first experimental session, the average goodness-of-fit value of the general quadratic classifier is more than 20 AIC points better than the average value of the DEM [and about 14 AIC points better than the DEM(θ)].
For the final experimental session, the general quadratic classifier betters the DEM by about 14 AIC points and the DEM(θ) by about 10 AIC points.¹⁰

As with Data Sets 1 and 2, the GCM and GCM(θ) fits are worse than the DEM [and DEM(θ)] fits. However, a closer examination reveals that the largest discrepancy occurs in Data Set 4. One possible explanation for this result is that response selection is less variable in Data Set 4 than in Data Sets 3 and 5. Since larger values of γ are associated with less variability in response selection, this hypothesis predicts that the γ values for Data Set 4 should be larger than those for Data Sets 3 and 5. Table 5 presents the median γ estimates for the best-fitting DEM and DEM(θ) from Table 4. As predicted, the largest median γ values occurred in Data Set 4. Note also that in 9 of the 12 cases reported in Table 5, the median γ value increased from the first to the last experimental session.

Table 5
Median γ Estimates for Best-Fitting DEM and DEM(θ) Reported in Table 4

                              DEM              DEM(θ)
Stimuli       Data Set    First   Last     First   Last
Rectangles    3           1.51    1.59     1.25    1.07
              4           2.71    2.62     2.19    2.80
              5           1.13    1.46     1.14    1.90
Circles       3           1.85    1.60     1.61    2.07
              4           4.29    4.48     3.45    5.67
              5           1.30    2.12     1.03    1.65

Note. Rows correspond to stimuli and columns to models. DEM, deterministic exemplar model; DEM(θ), DEM with additional θ parameter.

Several important conclusions can be drawn from the quantitative analysis of Application 1. First, the GCM was consistently outperformed by the DEM and by the decision bound model. Much of this disparity can be attributed to the fact that when the optimal decision bound was linear, response selection was less variable than predicted by the GCM. When the optimal decision bound was nonlinear, both the GCM and the DEM were outperformed by the decision bound model.

One might ask why the DEM performed more poorly than the decision bound model when the optimal decision bound was nonlinear.
One way to answer this question is to compare their respective $P(A \mid x) = .5$ contours [i.e., the set of all x for which $P(A \mid x) = .5$], under the assumption that no response bias exists. The points that make up this contour favor the two response alternatives equally; in other words, they are equivocal with respect to category membership. As a result, we call such a contour the equivocality contour (Ashby & Maddox, in press). In the decision bound model, the equivocality contour is the unbiased decision bound. In the exemplar model, the equivocality contour is the set of coordinates for which summed similarity to the two categories is equal (i.e., the set of all x satisfying $\sum \eta_{xA} = \sum \eta_{xB}$). If the equivocality contours for the decision bound model and the exemplar model agree, then the performance of the models should be similar.

As throughout this article, assume that the amount of perceptual noise is constant across all perceptual dimensions and is uncorrelated. Ashby and Maddox (in press; see also Nosofsky, 1990) showed that, under these conditions, there exist parameter values that allow the Euclidean-Gaussian exemplar models to mimic exactly the equivocality contour of the optimal decision bound model (in the perceptual space). Thus, if subjects respond optimally, it should be very difficult to discriminate between decision bound and exemplar models.

Although the equivocality contours for the optimal decision bound model and the Euclidean-Gaussian exemplar models are identical in the case of independent perceptual noise, the decision bound and exemplar models treat suboptimality differently. The exemplar models stress the importance of selective attention (i.e., the stretching and shrinking of the perceptual dimensions), which indirectly affects the equivocality contour. The decision bound model assumes that the subject operates on the decision bound directly.
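The exemplar model's equivocality contour can be located numerically. The Python sketch below is an illustration only: it assumes the Euclidean-Gaussian similarity form with scaling constant c and attention weight w, finds one point of the contour by bisection along Dimension 1, and uses hypothetical exemplar coordinates and function names.

```python
import math

def summed_similarity(x, exemplars, c, w):
    """Summed Euclidean-Gaussian similarity of x to one category's exemplars."""
    total = 0.0
    for y in exemplars:
        d2 = c ** 2 * (w * (x[0] - y[0]) ** 2 + (1.0 - w) * (x[1] - y[1]) ** 2)
        total += math.exp(-d2)
    return total

def contour_crossing(x2, exemplars_a, exemplars_b, c, w, lo, hi, tol=1e-10):
    """Bisect along Dimension 1 (x2 held fixed) for equal summed similarity.

    Requires the similarity difference to change sign between lo and hi.
    """
    def diff(x1):
        x = (x1, x2)
        return (summed_similarity(x, exemplars_a, c, w)
                - summed_similarity(x, exemplars_b, c, w))

    f_lo = diff(lo)
    if f_lo * diff(hi) > 0:
        raise ValueError("no sign change on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(mid) * f_lo > 0:
            lo = mid  # mid is on the same side of the contour as lo
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical categories placed symmetrically about the line x1 = 1.
cat_a = [(0.0, 0.0), (0.0, 1.0)]
cat_b = [(2.0, 0.0), (2.0, 1.0)]
crossing = contour_crossing(0.5, cat_a, cat_b, c=1.0, w=0.5, lo=0.0, hi=2.0)
print(round(crossing, 6))
```

Repeating the search for a range of `x2` values traces out the whole contour, and changing `w` shows how selective attention moves it only indirectly.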
When the optimal bound is linear, as in Data Sets 1 and 2, manipulating attention is essentially equivalent to changing the decision bound slope and intercept. However, if the optimal bound is nonlinear, the effects of manipulating attention will be limited. In this case, the general quadratic classifier is more powerful than the exemplar models. Of course, if the direct action of the subject is one of selective attention, the extra power of the general quadratic classifier (in the form of extra free parameters) is wasted. The success of the general quadratic classifier in Data Sets 3-5 supports the hypothesis that the decision bound is a fundamental construct of human categorization.

A second conclusion to be drawn from Application 1 is that the decision bound model consistently outperformed the GCM and DEM. Of course, these results do not falsify exemplar theory. Although we tested an important class of exemplar models, other versions may have been more successful. We can conclude, however, that the decision bound models provide a viable alternative to the exemplar-similarity-based models of categorization.

One weakness of the present analysis is that some rather strong assumptions were needed about the mapping from the stimulus space to the perceptual space. When fitting the GCM, it is common practice to first collect similarity judgments or identification responses on the stimulus ensemble and then to subject the data to some sort of multidimensional scaling (MDS) analysis. The coordinates of the stimuli from the MDS solution are then assumed to estimate the coordinates of the stimuli in the perceptual space of the categorization task (e.g., Nosofsky, 1986).¹¹ This approach is untenable with normally distributed categories, owing to the unlimited number of exemplars. To test the decision bound model more completely, it is of interest to apply the model to data where this sort of MDS-based analysis was performed.
Data of this sort are examined in Application 2, to which we turn now.

APPLICATION 2: NONNORMALLY DISTRIBUTED CATEGORIES

This section compares the performance of the decision bound model with that of the GCM and DEM at predicting categorization data from experiments in which the category exemplars are not normally distributed and contain only a small number of exemplars. Two data sets collected by Nosofsky (1986, 1989) were chosen. In both cases, the complete stimulus ensemble consisted of 16 circles of the type shown in Figure 1 (constructed by combining factorially 4 levels of each dimension). All categorization conditions involved two categories with four exemplars each. Thus, the training ensemble contained 8 stimuli. Feedback was given on each trial. After categorization accuracy reached a criterion level, the ensemble was enlarged to include all 16 stimuli (the additional 8 stimuli were termed "transfer" stimuli by Nosofsky, 1986).

In the Nosofsky (1986) experiment, 2 highly practiced subjects participated in a large number of identification sessions, followed by several sessions of categorization. Because decision bound theory is a theory of the performance of individual subjects, these data are highly appropriate as a test of the theory. The second data set (Nosofsky, 1989), which consists of data averaged across a large number of inexperienced subjects, is less appropriate for testing decision bound theory, but will serve to test the theory's generalizability.

In the Nosofsky experiments (1986, 1989), decision bound theory predicts that the optimal decision bound is neither a linear nor a quadratic function of the dimensional values. Even so, it seems reasonable to assume that subjects might use linear or quadratic bounds in these experiments.
This is because it has been hypothesized that the multivariate normal distribution provides a good model of many natural categories (Ashby, 1992a; see also Fried & Holyoak, 1984), and with normally distributed categories, the optimal decision bound is always linear or quadratic. Thus, if humans frequently categorize at near-optimal levels, they will have much experience with linear and quadratic bounds. If so, it makes sense that they would use these bounds when confronted with the artificial categories constructed by Nosofsky (1986, 1989). We begin by describing the general method used in fitting the various models and then turn to the model comparisons.

General Method

Application of the exemplar models is straightforward. The MDS solution derived from the identification conditions of Nosofsky (1986, 1989) will be used in conjunction with Equations 6, 7, 9, and 12 to generate predicted response probabilities. Following Nosofsky (1986, 1989), the Euclidean distance metric and Gaussian similarity function are assumed.

Two augmented versions of the GCM and DEM were applied to the data as well. The first, proposed by Ashby and Maddox (in press), allowed for oblique perceptual dimensions. The second allowed the scaling constant c (from Equation 7) to differ for training and transfer stimuli. One hypothesis is that experience with category exemplars increases their perceptual dissimilarity (Nosofsky, 1986). If so, c should be larger for training than for transfer stimuli. In many cases, the goodness-of-fit values for these two models were worse than those for the standard GCM and DEM. When the goodness of fit was improved, however, inclusion of these models never affected the qualitative results (see Tables 6 and 9), so they will not be discussed further.

When applying the decision bound model, the MDS coordinates Nosofsky (1986, 1989) obtained from the identification confusions were used as estimates of the perceptual means (i.e., $x_i$ from Equation 1).
As in Application 1, perceptual variability was assumed to be constant across dimensions and to be uncorrelated. Thus, only one perceptual variance parameter was estimated. This is the simplest perceptual representation allowed in decision bound theory, and in light of the results of Ashby and Lee (1991; see Figure 6, p. 161), it is surely incorrect. We chose this perceptual representation for two reasons.

First, Ashby and Perrin (1988) showed that these distributional assumptions produce a dissimilarity metric that is equivalent to the measure used by the GCM and DEM when equal amounts of attention are allocated to each stimulus dimension (i.e., when w = .5; see Equation 7). Nosofsky (1986, 1989) argued that it is necessary to incorporate selective attention components (at least within the framework of the exemplar-similarity model) in order to predict data from several of the "dimensional" (i.e., size and angle) categorization conditions (Nosofsky, 1986, 1989; see Figures 4-6 in the present paper). This selective attention manifests itself as a stretching of distance relations along the attended dimension and a shrinking of distance relations along the unattended dimension. Ashby and Lee (1991, 1992) argue that data in these "dimensional" categorization conditions can be accounted for equally well (at least within the framework of the decision bound model) without postulating any stretching or shrinking of distance relations (i.e., selective attention), but rather by acknowledging the different decision bounds required for identification and categorization. The fact that the decision bound model we propose predicts the same similarity relations as the GCM with no selective attention allows a test of this hypothesis.

Second, such a simple perceptual representation forces the decision bound to account for most of the variance in the data.
Therefore, this is a good method for testing the hypothesis that the decision bound is of fundamental importance in predicting asymptotic categorization performance.

Data Set 6: Nosofsky (1986)

The stimulus dimensions used by Nosofsky (1986) were the same as those shown in Figure 1b, except that only the upper half of the circle and radial line were presented (see Nosofsky, 1989, Figure 2, for an example). Sixteen stimuli were constructed from the factorial combination of 4 levels of circle size and 4 levels of orientation of the radial line. Two subjects participated in a large number of identification sessions, followed by several sessions of categorization.

In the categorization conditions, 4 stimuli were assigned to Category 1 and a different 4 were assigned to Category 2. During the training phase of the experiment, any of these 8 stimuli could appear on a given trial and corrective feedback was provided following the subject's response. During the transfer phase of the experiment, all 16 stimuli were included, and subjects were given corrective feedback only when a training exemplar was presented. The data of interest are those collected during the transfer phase only.

The following four categorization conditions, illustrated in Figure 4 (ignore the line or curve, which will be discussed later), were utilized. (1) Size: Category 1 exemplars were small in size, whereas Category 2 exemplars were large in size. (2) Criss-cross: Category 1 contained exemplars with large size/small angle or small size/large angle, whereas Category 2 contained exemplars with large size/large angle or small size/small angle. (3) Interior-exterior: Category 1 contained exemplars with intermediate size and angles, whereas Category 2 contained exemplars with extreme values on each dimension. (4) Diagonal: Category 1 contained exemplars that fell below a line with a slope of approximately -1, whereas Category 2 contained exemplars that fell above the line. In Figure 4, each training exemplar is labeled with a 1 or 2, depending on its category membership.

Figure 4. MDS coordinates (Euclidean distance) for Nosofsky (1986), Subject 1. Labeled exemplars were presented during training and transfer for each of the four categorization conditions. The line or curve denotes the decision bound predicted by the best-fitting decision bound model.

Model Fits and Comparisons

Table 6 presents the goodness-of-fit values for the GCM, DEM, general quadratic classifier, and general linear classifier for each subject and categorization condition. The GCM provides the best fit for Subject 1 in the dimensional (size) condition, but in the seven other applications, a decision bound model provides the best account of the data. In the criss-cross, interior-exterior, and diagonal conditions, the decision bound model performed substantially better than the GCM or DEM. In fact, across both subjects, the average fit value in these three conditions was 115.4, 204.2, and 186.4 for the decision bound model, the GCM, and the DEM, respectively.¹²

Table 6
Goodness-of-Fit Values (AIC) for Application 2, Data Set 6 (Nosofsky, 1986)

                        Subject 1                                Subject 2
                 Criss-    Interior-                      Criss-    Interior-
Model    Size    Cross     Exterior   Diagonal    Size    Cross     Exterior   Diagonal
Exemplar Models
GCM      70.4    208.3     247.7      131.9      102.6    275.8     231.3      130.3
DEM      71.9    183.6     185.7      129.4      104.8    277.3     224.2      118.4
Decision Bound Models
GQC      76.7    112.3     135.6      110.5      102.6    119.2     111.6      105.8
GLC      73.4  1,925.0     971.0      146.2       97.6  1,237.0     614.0      103.4

Note. Rows correspond to models and columns to subjects and conditions. GCM, generalized context model; DEM, deterministic exemplar model; GQC, general quadratic classifier; GLC, general linear classifier.

Table 7 presents the observed probability of responding "1" for each of the 16 stimuli by subject and condition.
In addition, the predicted probability of responding "1" is presented below the observed response probabilities for the best-fitting decision bound model, GCM, and DEM, respectively. Figures 4 and 5 present the MDS coordinates for each stimulus and the decision bound predicted by the best-fitting decision bound model for each subject in each categorization condition.

The predictions of the three models are most dissimilar for the criss-cross and interior-exterior conditions, so we will examine these conditions in greater detail. For purposes of elaboration, the stimuli can be numbered from 1 to 16. The numbering scheme for the stimuli is presented in Table 8.

First, consider the interior-exterior condition. Stimuli 3, 5, 12, and 14 are all exemplars of Category 2, and they all have approximately the same similarity relations to the exemplars of Category 1. Therefore, exemplar models predict that accuracy should be nearly equal for these stimuli, a prediction that is not supported by the data. The average accuracy for Stimuli 3, 5, and 12 was 72%, but for Stimulus 14 it was only 50%. As a consequence, the exemplar models failed badly for Stimulus 14. The GCM predicted an average accuracy to Stimulus 14 of 72%, and the DEM predicted 67%. In contrast, the decision bound model successfully predicted the low accuracy to Stimulus 14. (The decision bound model predicted an average accuracy of 51%.) It did this by assuming that the decision bound passed close to the mean of the Stimulus 14 perceptual distribution.

Next, consider the criss-cross condition. For each of the transfer stimuli, the two nearest training exemplars are always from the same category, and the average observed probability with which the subjects assigned these stimuli to the same category as these nearest neighbors was .755. The exemplar models can account for these high observed probabilities, but if they do, they must predict that accuracy is near chance for training Stimuli 6, 7, 10, and 11.
This is because these stimuli have the property that the two nearest training exemplars are from the contrasting category. If an exemplar model predicts that the response probabilities of transfer stimuli are dominated by the category membership of their nearest neighbors, it must make the same prediction for training stimuli. In fact, average accuracy for responses to Stimuli 6, 7, 10, and 11 was 67.5%, but the GCM predicted an average accuracy of only 55.5%. In contrast, the decision bound model predicted an average accuracy for responses to Stimuli 6, 7, 10, and 11 of 65.5%.

These results agree with those from Application 1. When the best-fitting decision bound was linear (or nearly linear), the goodness-of-fit difference between the exemplar and decision bound models was small. However, when the best-fitting decision bound was highly nonlinear (as in the criss-cross and interior-exterior conditions), the decision bound model fitted substantially better than the exemplar models.

There is good evidence that both subjects responded suboptimally in this experiment (see Note 13). Suppose that they used a suboptimal decision bound. The only way the exemplar models can account for this fact is by uniformly expanding or contracting the space (by manipulating the parameter c), by stretching or shrinking one of the perceptual dimensions (by manipulating the attention weight w), or by changing the intercept of the P(A) = .5 contour (by manipulating the response bias β).
If the subject's bound is linear, these transformations will be effective, but if the bound is highly nonlinear, these transformations will often be too crude.

Table 7
Observed and Predicted Probability of Responding "1" by Subject and Categorization Condition for Nosofsky (1986)

                        Subject 1                        Subject 2
Stimulus  Row    Size    C-C    I-E   Diag       Size    C-C    I-E   Diag
1         Obs    .982   .033   .074  1.000       .966   .136   .096   .986
          DB     .989   .032   .066   .994       .982   .105   .112   .994
          GCM    .994   .016   .111   .993       .985   .082   .150   .992
          DEM    .993   .018   .076   .994       .986   .081   .120   .993
2         Obs    .996   .126   .325   .920       .979   .371   .372   .930
          DB     .992   .131   .288   .944       .969   .399   .288   .934
          GCM    .996   .242   .485   .918       .970   .359   .401   .936
          DEM    .995   .235   .459   .933       .971   .358   .390   .925
3         Obs    .990   .750   .296   .705       .960   .863   .318   .706
          DB     .992   .766   .375   .711       .972   .798   .282   .644
          GCM    .996   .830   .309   .610       .972   .689   .294   .649
          DEM    .995   .816   .410   .624       .973   .686   .331   .613
4         Obs    .995   .958   .229   .354       .961   .932   .169   .174
          DB     .993   .952   .178   .385       .964   .906   .189   .203
          GCM    .996   .971   .266   .296       .961   .853   .338   .220
          DEM    .996   .965   .227   .294       .962   .856   .344   .213
5         Obs    .765   .264   .143   .973       .852   .421   .264   .976
          DB     .764   .293   .206   .970       .831   .416   .303   .984
          GCM    .759   .215   .199   .974       .824   .325   .210   .978
          DEM    .762   .218   .209   .968       .824   .324   .207   .984
6         Obs    .804   .326   .713   .755       .811   .407   .704   .843
          DB     .815   .291   .663   .766       .797   .435   .737   .820
          GCM    .805   .401   .713   .833       .783   .504   .703   .814
          DEM    .808   .370   .728   .835       .783   .502   .709   .816
7         Obs    .824   .738   .712   .419       .734   .689   .618   .229
          DB     .834   .710   .722   .360       .786   .680   .661   .304
          GCM    .821   .627   .658   .425       .765   .624   .702   .365
          DEM    .823   .645   .712   .440       .766   .621   .719   .322
8         Obs    .818   .799   .384   .172       .764   .732   .293   .061
          DB     .812   .827   .394   .124       .710   .745   .310   .052
          GCM    .794   .733   .425   .143       .686   .640   .506   .073
          DEM    .797   .749   .410   .158       .688   .636   .509   .064
9         Obs    .110   .902   .241   .912       .330   .775   .424   .948
          DB     .089   .885   .246   .903       .349   .797   .400   .964
          GCM    .100   .822   .321   .894       .392   .719   .330   .959
          DEM    .098   .829   .273   .882       .386   .717   .305   .966
10        Obs    .143   .697   .769   .444       .426   .526   .891   .693
          DB     .134   .705   .746   .462       .367   .508   .887   .649
          GCM    .147   .614   .688   .528       .397   .601   .704   .577
          DEM    .145   .649   .723   .514       .394   .599   .715   .622
11        Obs    .140   .229   .695   .105       .297   .290   .731   .157
          DB     .156   .303   .742   .134       .334   .332   .663   .126
          GCM    .167   .393   .652   .138       .354   .482   .635   .151
          DEM    .164   .367   .679   .150       .353   .477   .653   .127
12        Obs    .181   .300   .402   .047       .254   .338   .252   .025
          DB     .155   .257   .362   .042       .248   .329   .261   .020
          GCM    .162   .221   .279   .033       .262   .330   .309   .027
          DEM    .159   .204   .301   .041       .262   .324   .316   .022
13        Obs    .000   .982   .119   .836       .045   .896   .274   .887
          DB     .003   .968   .135   .814       .048   .899   .261   .884
          GCM    .001   .989   .218   .785       .047   .932   .227   .910
          DEM    .001   .988   .143   .781       .045   .935   .198   .917
14        Obs    .000   .772   .504   .271       .060   .547   .497   .344
          DB     .005   .754   .470   .293       .056   .535   .514   .358
          GCM    .002   .785   .282   .331       .053   .669   .277   .280
          DEM    .002   .800   .354   .295       .051   .669   .304   .353
15        Obs    .000   .141   .425   .058       .035   .100   .252   .025
          DB     .007   .139   .449   .067       .054   .112   .314   .045
          GCM    .003   .191   .421   .054       .047   .265   .359   .045
          DEM    .004   .197   .379   .045       .046   .258   .334   .039
16        Obs    .000   .057   .149   .012       .020   .076   .108   .012
          DB     .007   .035   .144   .017       .024   .046   .093   .005
          GCM    .004   .021   .158   .007       .015   .049   .157   .005
          DEM    .004   .025   .113   .007       .014   .044   .124   .004

Note. Rows correspond to stimuli and columns to categorization conditions (C-C, criss-cross; I-E, interior-exterior; Diag, diagonal). Obs: observed probability of responding "1." DB: predicted probability of responding "1" for the best-fitting decision bound model. GCM: predicted probability of responding "1" for the GCM. DEM: predicted probability of responding "1" for the DEM.

Figure 5. MDS coordinates (Euclidean distance) for Nosofsky (1986), Subject 2. Labeled exemplars were presented during training and transfer for each of the four categorization conditions (size, criss-cross, interior-exterior, and diagonal). The line or curve denotes the decision bound predicted by the best-fitting decision bound model.

Table 8
Numbering Scheme for Stimuli From Application 2, Data Sets 6 and 7 (Nosofsky, 1986, 1989)

              Angle
        13   14   15   16
Size     9   10   11   12
         5    6    7    8
         1    2    3    4

Note. See Nosofsky (1986, 1989) for the specific size and angle values.

Data Set 7: Nosofsky (1989)

Our final empirical application is to the categorization data collected by Nosofsky (1989). The stimulus dimensions used by Nosofsky (1989) were identical to those from the 1986 study, although the actual stimuli were somewhat more discriminable. The experimental procedure was the same, but with three important exceptions. First, a large number of subjects were run, and different subjects participated in the identification and categorization conditions. Second, each subject received little training (only one experimental session). Finally, the data were collapsed across subjects. In addition, the interior-exterior condition was replaced with another dimensional condition in which angle was relevant (see Figure 6).

The fact that the categorization data were averaged across subjects causes problems for the decision bound models, because decision bound theory is a theory of individual categorization performance. The theory assumes that the experienced subject utilizes a fixed decision bound and that trial-by-trial fluctuations in performance are due to the effects of perceptual and criterial noise. If 2 subjects each use a linear bound with a different slope, the averaged data will be inconsistent with any linear (or quadratic) bound. In fact, the averaged data will be consistent with decision bound theory only in the special case in which each subject uses a bound that is a simple intercept shift of the others. In this case, the intercept shifts will be absorbed into the criterial and perceptual noise parameters.
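The averaging argument can be illustrated with a small numerical sketch. It is not part of the original analyses: it assumes a linear bound with normally distributed (perceptual plus criterial) noise, so that a single subject's probit-transformed response probabilities are an affine function of the stimulus coordinates, and it measures how far the subject-averaged probabilities depart from that form. The function names, the 4 x 4 grid of stimulus coordinates, and all slopes, intercepts, and noise values are hypothetical choices made only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the cdf and its inverse


def p_respond_1(x, y, slope, intercept, noise_sd=1.0):
    """P(respond "1") for one subject using the bound y = slope * x + intercept.

    Assumption: the subject responds "1" when the noisy percept falls below
    the bound, and the combined perceptual and criterial noise is normal.
    """
    # Signed perpendicular distance from the stimulus to the bound.
    h = (slope * x + intercept - y) / (1.0 + slope ** 2) ** 0.5
    return STD_NORMAL.cdf(h / noise_sd)


def averaged_probit(x, y, subjects):
    """Probit transform of the response probability averaged over subjects."""
    p_bar = sum(p_respond_1(x, y, *s) for s in subjects) / len(subjects)
    return STD_NORMAL.inv_cdf(p_bar)


def max_nonlinearity(subjects, levels=(1, 2, 3, 4)):
    """Largest absolute second difference of the averaged probits on the grid.

    The value is zero exactly when the averaged probits are an affine function
    of (x, y), that is, when a single linear bound describes the averages.
    `levels` must be consecutive integers.
    """
    z = {(x, y): averaged_probit(x, y, subjects) for x in levels for y in levels}
    worst = 0.0
    for x in levels:
        for y in levels:
            if (x + 2, y) in z:  # curvature along x
                worst = max(worst, abs(z[x + 2, y] - 2 * z[x + 1, y] + z[x, y]))
            if (x, y + 2) in z:  # curvature along y
                worst = max(worst, abs(z[x, y + 2] - 2 * z[x, y + 1] + z[x, y]))
            if (x + 1, y + 1) in z:  # interaction between x and y
                worst = max(
                    worst,
                    abs(z[x + 1, y + 1] - z[x + 1, y] - z[x, y + 1] + z[x, y]),
                )
    return worst


# Hypothetical pairs of subjects, each given as (slope, intercept).
same_bound = [(-1.0, 5.0), (-1.0, 5.0)]
intercept_shift = [(-1.0, 4.5), (-1.0, 5.5)]
different_slopes = [(-0.5, 3.75), (-2.0, 7.5)]

for name, group in [
    ("same bound", same_bound),
    ("intercept shift", intercept_shift),
    ("different slopes", different_slopes),
]:
    print(f"{name:18s} max nonlinearity = {max_nonlinearity(group):.4f}")
```

Under these assumptions, the pair that differs only in intercept stays close to the affine form (with only two subjects the absorption into the noise parameters is approximate rather than exact), whereas the pair with different slopes departs from it by roughly an order of magnitude more.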
In the simpler categorization conditions, such as the dimensional (size and angle) conditions, it seems plausible that subjects will use bounds of the same shape but with simple intercept shifts, and so the decision bound models should do better in these conditions than in the criss-cross and diagonal conditions. Certainly, though, we expect performance of the decision bound models to be poorer when the data are averaged across subjects.

Figure 6. MDS coordinates (Euclidean distance) for Nosofsky (1989). Labeled exemplars were presented during training and transfer for each of the four categorization conditions. The line or curve denotes the decision bound predicted by the best-fitting decision bound model.

Model Fits and Comparisons

Table 9 presents the goodness-of-fit values for the GCM, DEM, general quadratic classifier, and general linear classifier for each categorization condition. The GCM provides the best fit for the criss-cross and diagonal conditions, and the general linear classifier provides the best fit for the two dimensional (size and angle) conditions. Table 10 presents the observed probability of responding "1" for each of the 16 stimuli by subject and condition. In addition, the predicted probability of responding "1" is presented (below the observed probabilities) for the best-fitting decision bound model (general quadratic classifier or general linear classifier), GCM, and DEM, respectively. Figure 6 presents the MDS coordinates for each stimulus and the best-fitting decision bound for each categorization condition.

Table 9
Goodness-of-Fit Values (AIC) for Application 2, Data Set 7 (Nosofsky, 1989)

Model    Size    Angle   Criss-Cross   Diagonal
Exemplar Models
GCM      81.5    95.0    101.4         102.4
DEM      87.7    96.9    102.2         104.6
Decision Bound Models
GQC      80.8    86.5    101.9         105.3
GLC      77.4    84.3    483.5         104.0

Note. Rows correspond to models and columns to conditions. GCM, generalized context model; DEM, deterministic exemplar model; GQC, general quadratic classifier; GLC, general linear classifier.

Table 10
Observed and Predicted Probability of Responding "1" by Subject and Categorization Condition for Nosofsky (1989)

Stimulus  Row    Size   Angle  Criss-Cross  Diagonal
1         Obs    .973   .963   .222         .465
          DB     .989   .943   .240         .497
          GCM    .992   .933   .219         .435
          DEM    .991   .937   .220         .434
2         Obs    .985   .572   .405         .775
          DB     .989   .596   .397         .767
          GCM    .991   .554   .367         .782
          DEM    .990   .561   .371         .782
3         Obs    .973   .157   .608         .906
          DB     .985   .174   .584         .884
          GCM    .986   .191   .551         .892
          DEM    .986   .193   .552         .892
4         Obs    .987   .024   .768         .965
          DB     .985   .015   .783         .966
          GCM    .986   .014   .762         .961
          DEM    .985   .014   .755         .961
5         Obs    .870   .988   .460         .222
          DB     .868   .955   .443         .219
          GCM    .865   .956   .400         .219
          DEM    .873   .959   .419         .219
6         Obs    .892   .662   .460         .570
          DB     .848   .644   .502         .474
          GCM    .839   .608   .489         .514
          DEM    .848   .617   .496         .514
7         Obs    .842   .229   .575         .700
          DB     .828   .210   .556         .688
          GCM    .814   .224   .555         .689
          DEM    .823   .227   .554         .689
8         Obs    .853   .024   .635         .814
          DB     .842   .028   .626         .876
          GCM    .825   .031   .601         .852
          DEM    .831   .031   .606         .852
9         Obs    .311   .913   .716         .093
          DB     .291   .915   .670         .107
          GCM    .313   .916   .632         .131
          DEM    .325   .921   .648         .131
10        Obs    .243   .573   .538         .213
          DB     .307   .556   .555         .267
          GCM    .327   .532   .572         .286
          DEM    .339   .541   .572         .286
11        Obs    .244   .134   .446         .488
          DB     .263   .119   .431         .503
          GCM    .282   .134   .474         .465
          DEM    .292   .138   .476         .466
12        Obs    .245   .015   .324         .748
          DB     .257   .010   .344         .788
          GCM    .277   .007   .358         .777
          DEM    .285   .007   .367         .777
13        Obs    .027   .927   .776         .035
          DB     .037   .963   .794         .072
          GCM    .026   .974   .808         .035
          DEM    .032   .976   .811         .035
14        Obs    .034   .573   .595         .100
          DB     .028   .560   .524         .100
          GCM    .020   .542   .564         .098
          DEM    .024   .553   .565         .098
15        Obs    .041   .096   .283         .253
          DB     .020   .118   .295         .230
          GCM    .014   .130   .329         .205
          DEM    .017   .135   .326         .206
16        Obs    .041   .024   .174         .640
          DB     .025   .012   .172         .527
          GCM    .022   .009   .186         .559
          DEM    .026   .009   .181         .560

Note. Rows correspond to stimuli and columns to categorization conditions. Obs: observed probability of responding "1." DB: predicted probability of responding "1" for the best-fitting decision bound model. GCM: predicted probability of responding "1" for the GCM. DEM: predicted probability of responding "1" for the DEM.

The modeling results for Nosofsky (1989) differ in several important ways from the results for Nosofsky (1986). First, the DEM never fit the Nosofsky (1989) data better than the GCM. This result suggests that response selection was neither more nor less variable than predicted by the GCM. The γ values, which ranged from .98 for the angle condition to 1.28 for the criss-cross condition, support this hypothesis. In contrast, the DEM provided a better fit than did the GCM in several of the Nosofsky (1986) conditions (see Table 6). In each of these cases, the γ values differed considerably from γ = 1 (Subject 1, criss-cross, γ = .21, interior-exterior, γ = 3.07; Subject 2, interior-exterior, γ = 1.57, diagonal, γ = 1.51). In light of these results, it is likely that the Nosofsky (1989) results are due to the small amount of training given each subject, to the fact that the data were averaged across subjects, or to some combination of both.

Second, in each condition, the goodness-of-fit values for the best-fitting exemplar and decision bound models are very similar. The biggest difference occurs in the angle condition, where the goodness-of-fit value of the general linear classifier is 10.1 AIC points better than the goodness-of-fit value of the GCM.
This difference is substantially smaller than the differences observed in the Nosofsky (1986) experiment, where the best and worst AIC values differed by as much as 100 or more points.

Third, the exemplar models fit the averaged data from the criss-cross and diagonal conditions (i.e., the Nosofsky, 1989, data) better than the single subject data from these same two conditions (i.e., the Nosofsky, 1986, data). For the decision bound models, however, the fits were similar for the averaged and single subject data sets. For the dimensional (size and angle) conditions, each of the models fit the averaged data about as well as the single subject data. This suggests that in complicated categorization tasks (such as the criss-cross and diagonal conditions), exemplar models may fit data averaged across subjects better than the data of any single subject.

Finally, the fact that the decision bound model accounted for the two dimensional (size and angle) categorization conditions better than the GCM or DEM supports the claim that, at least within the framework of a decision bound approach to categorization, shifts in the decision bound are more important than shifts in selective attention. However, it is possible, as suggested by Nosofsky (1989; see also Ashby & Lee, 1992; Nosofsky & Smith, 1992), that other decision bound models, which assume selective attention shifts (see Nosofsky, 1989, Figure 7, p. 288), might also provide good fits to these data.

SUMMARY AND CONCLUSIONS

The goal of this article was to develop and test a decision bound theory of categorization by (1) applying the model to data from normally as well as nonnormally distributed categories, and (2) comparing the performance of the model with a currently popular exemplar model of categorization. In Application 1, the models were fit to data from five categorization experiments (36 subjects total). In every experiment, the category exemplars were normally distributed and each subject was given extensive training.
The performance of the decision bound model was compared with the currently popular and widely tested GCM (see Nosofsky, 1992, for an extensive review) and with a deterministic exemplar model (DEM; Ashby & Maddox, in press) that contains the GCM as a special case, but includes a parameter that allows it to predict data in which response selection is either more or less variable than what is predicted by the GCM. When the optimal decision bound was linear, the DEM and a decision bound model (the general linear classifier) provided nearly equivalent accounts of the data, and they both significantly outperformed the GCM. When the optimal decision bound was highly nonlinear, both the GCM and the DEM were outperformed by a decision bound model that postulated quadratic decision bounds. Taken together, these results suggest (1) that the poor performance of the GCM is due partly to the fact that it postulates a probabilistic rather than a deterministic decision rule, and (2) that the exemplar models have only a limited ability to account for suboptimal performance. In particular, they are inferior to the decision bound model at accounting for complex suboptimalities.

In Application 2, we began a preliminary investigation of the ability of the decision bound models to account for categorization data in which the exemplars from each category were not normally distributed. When applied to data in which single subjects were given extensive training (Nosofsky, 1986), the decision bound model provided excellent accounts of the data, and in many cases it significantly outperformed the GCM and DEM. When applied to data that were averaged across subjects, each of whom received little training, the performance of the decision bound model was still quite impressive, especially when applied to data from conditions in which only one stimulus dimension was relevant.

Future research could extend the present work in a number of directions.
First, it would be interesting to expand the type of analysis presented in Application 2 to other categorization conditions and stimulus dimensions. Second, it would be of interest to examine the tradeoff between parameters associated with perceptual and decisional processes. Throughout this article, we assumed a very restricted perceptual representation. Although this was appropriate for the goals of the article and yielded good accounts of the data, it is possible that models with more general perceptual representations might provide better fits to the present data sets and might be necessary to predict data from other experiments, especially those involving highly confusable stimuli.

REFERENCES

AKAIKE, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716-723.
ASHBY, F. G. (1992a). Multidimensional models of categorization. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 449-483). Hillsdale, NJ: Erlbaum.
ASHBY, F. G. (1992b). Multivariate probability distributions. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 1-34). Hillsdale, NJ: Erlbaum.
ASHBY, F. G., & GOTT, R. (1988). Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, & Cognition, 14, 33-53.
ASHBY, F. G., & LEE, W. W. (1991). Predicting similarity and categorization from identification. Journal of Experimental Psychology: General, 120, 150-172.
ASHBY, F. G., & LEE, W. W. (1992). On the relationship between identification, similarity, and categorization: Reply to Nosofsky and Smith (1992). Journal of Experimental Psychology: General, 121, 385-393.
ASHBY, F. G., & MADDOX, W. T. (1990). Integrating information from separable psychological dimensions. Journal of Experimental Psychology: Human Perception & Performance, 16, 598-612.
ASHBY, F. G., & MADDOX, W. T. (1992). Complex decision rules in categorization: Contrasting novice and experienced performance. Journal of Experimental Psychology: Human Perception & Performance, 18, 50-71.
ASHBY, F. G., & MADDOX, W. T. (in press). Relations between prototype, exemplar, and decision bound models of categorization. Journal of Mathematical Psychology.
ASHBY, F. G., & PERRIN, N. A. (1988). Toward a unified theory of similarity and recognition. Psychological Review, 95, 124-150.
BARLOW, H. B., & MOLLON, J. D. (1982). Psychophysical measurements of visual performance. In H. B. Barlow & J. D. Mollon (Eds.), The senses (pp. 114-132). Cambridge: Cambridge University Press.
BUSEMEYER, J. R., & MYUNG, I. J. (1992). An adaptive approach to human decision making: Learning theory, decision theory, and human performance. Journal of Experimental Psychology: General, 121, 177-194.
ENNIS, D. M. (1988). Confusable and discriminable stimuli: Comment on Nosofsky (1986) and Shepard (1986). Journal of Experimental Psychology: General, 117, 408-411.
ENNIS, D. M., & ASHBY, F. G. (in press). The relative sensitivities of same-different and identification judgment models to perceptual dependence. Psychometrika.
ENNIS, D. M., MULLEN, K., & FRIJTERS, J. E. R. (1988). Variants of the method of triads: Unidimensional Thurstonian models. British Journal of Mathematical & Statistical Psychology, 41, 25-36.
ESTES, W. K. (1986a). Array models for category learning. Cognitive Psychology, 18, 500-549.
ESTES, W. K. (1986b). Memory storage and retrieval processes in category learning. Journal of Experimental Psychology: General, 115, 155-174.
FECHNER, G. T. (1866). Elements of psychophysics. New York: Holt, Rinehart & Winston.
FISHER, R. A. (1936). The use of multiple measurement in taxonomic problems. Annals of Eugenics, 7, 179-188.
FRIED, L. S., & HOLYOAK, K. J. (1984). Induction of category distributions: A framework for classification learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 10, 234-257.
FUKUNAGA, K. (1990). Statistical pattern recognition (2nd ed.). San Diego, CA: Academic Press.
GARNER, W. R. (1974). The processing of information and structure. New York: Wiley.
GARNER, W. R., & FELFOLDY, G. L. (1970). Integrality of stimulus dimensions in various types of information processing. Cognitive Psychology, 1, 225-241.
GREEN, D. M., & SWETS, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.
HINTZMAN, D. L. (1986). "Schema abstraction" in a multiple-trace memory model. Psychological Review, 93, 411-428.
KRANTZ, D. H., & TVERSKY, A. (1975). Similarity of rectangles: An analysis of subjective similarity. Journal of Mathematical Psychology, 12, 4-34.
LUCE, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (pp. 103-189). New York: Wiley.
MASSARO, D. W., & FRIEDMAN, D. (1990). Models of integration given multiple sources of information. Psychological Review, 97, 225-252.
MEDIN, D. L., & SCHAFFER, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.
MORRISON, D. F. (1990). Multivariate statistical methods (3rd ed.). New York: McGraw-Hill.
NOSOFSKY, R. M. (1983). Information integration and the identification of stimulus noise and criterial noise in absolute judgement. Journal of Experimental Psychology: Human Perception & Performance, 9, 299-309.
NOSOFSKY, R. M. (1984). Choice, similarity, and the context theory of classification. Journal of Experimental Psychology: Learning, Memory, & Cognition, 10, 104-114.
NOSOFSKY, R. M. (1985). Overall similarity and the identification of separable-dimension stimuli: A choice model analysis. Perception & Psychophysics, 38, 415-432.
NOSOFSKY, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.
NOSOFSKY, R. M. (1987). Attention and learning processes in the identification and categorization of integral stimuli. Journal of Experimental Psychology: Learning, Memory, & Cognition, 13, 87-108.
NOSOFSKY, R. M. (1988a). On exemplar-based representations: Reply to Ennis (1988). Journal of Experimental Psychology: General, 117, 412-414.
NOSOFSKY, R. M. (1988b). Similarity, frequency, and the category representation. Journal of Experimental Psychology: Learning, Memory, & Cognition, 14, 54-65.
NOSOFSKY, R. M. (1989). Further tests of an exemplar-similarity approach to relating identification and categorization. Perception & Psychophysics, 45, 279-290.
NOSOFSKY, R. M. (1990). Relations between exemplar-similarity and likelihood models of classification. Journal of Mathematical Psychology, 34, 393-418.
NOSOFSKY, R. M. (1991). Tests of an exemplar model for relating perceptual classification and recognition memory. Journal of Experimental Psychology: Human Perception & Performance, 17, 3-27.
NOSOFSKY, R. M. (1992). Exemplar-based approach to relating categorization, identification and recognition. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 363-393). Hillsdale, NJ: Erlbaum.
NOSOFSKY, R. M., CLARK, S. E., & SHIN, H. J. (1989). Rules and exemplars in categorization, identification, and recognition. Journal of Experimental Psychology: Learning, Memory, & Cognition, 15, 282-304.
NOSOFSKY, R. M., & SMITH, J. E. K. (1992). Similarity, identification, and categorization: Comment on Ashby and Lee (1991). Journal of Experimental Psychology: General, 121, 237-245.
SAKAMOTO, Y., ISHIGURO, M., & KITAGAWA, G. (1986). Akaike information criterion statistics. Dordrecht: Reidel.
SHEPARD, R. N. (1957). Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space. Psychometrika, 22, 325-345.
SHEPARD, R. N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54-87.
SHEPARD, R. N. (1986). Discrimination and generalization in identification and classification: Comment on Nosofsky. Journal of Experimental Psychology: General, 115, 58-61.
SHEPARD, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317-1323.
SHEPARD, R. N. (1988). Time and distance in generalization and discrimination: Reply to Ennis (1988). Journal of Experimental Psychology: General, 117, 415-416.
SHEPARD, R. N. (in press). Integrality versus separability of stimulus dimensions: From an early convergence of evidence to a proposed theoretical basis. In G. R. Lockhead & J. R. Pomerantz (Eds.), The perception of structure.
SHIN, H. J., & NOSOFSKY, R. M. (1992). Similarity-scaling studies of "dot-pattern" classification and recognition. Journal of Experimental Psychology: General, 121, 278-304.
SMITH, E. E., & MEDIN, D. L. (1981). Categories and concepts. Cambridge, MA: Harvard University Press.
STEVENS, S. S. (1975). Psychophysics: Introduction to its perceptual, neural, and social prospects. New York: Wiley.
TAKANE, Y., & SHIBAYAMA, T. (1992). Structure in stimulus identification data. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 335-362). Hillsdale, NJ: Erlbaum.
TVERSKY, A. (1977). Features of similarity. Psychological Review, 84, 327-352.
WICKENS, T. D. (1982). Models for behavior: Stochastic processes in psychology. San Francisco: W. H. Freeman.
WIENER-ERLICH, W. K. (1978). Dimensional and metric structures in multidimensional stimuli. Perception & Psychophysics, 24, 399-414.

NOTES

1. It may be inaccurate to define a nonlinear mapping from the physical space to the perceptual space as a misperception of the true dimensional values.
Although it is true that a psychophysical mapping of this sort may yield a dimensional representation that differs from the physical dimensions of the stimulus, it is generally assumed (e.g., Stevens, 1975) that this mapping is one to one. In other words, each set of physical coordinates corresponds to a unique set of perceptual coordinates. Perceptual noise, on the other hand, is assumed to have a one-to-many mapping (i.e., each presentation of the same stimulus yields a unique perceptual effect).

2. Throughout this article we assume that perceptual noise, represented by the random variable $e_p$, is multivariate normally distributed with mean 0 and variance $\sigma_p^2$.

3. With normally distributed categories, the optimal decision bound is always quadratic. In 2 dimensions, the general quadratic equation is written

$$h_0(\mathbf{w}_i) = a_1 w_{1i}^2 + a_2 w_{2i}^2 + a_3 w_{1i} w_{2i} + b_1 w_{1i} + b_2 w_{2i} + c_0.$$

The equations for an ellipse, a circle, and a line are special cases of the general quadratic equation. For example, when the coefficients $a_1$, $a_2$, and $a_3$ are zero, $h_0(\mathbf{w}_i)$ is linear.

4. In the Ashby and Gott (1988) and Ashby and Maddox (1990) experiments, the optimal bound was linear. In the Ashby and Maddox (1992) experiments, the optimal bound was a quadratic curve.

5. An example is Busemeyer and Myung's (1992) rule competition model, which is composed of two parts. The first is an adaptive network model that describes how individuals choose different types of decision bounds (e.g., linear or quadratic). The second part uses a "hill climbing" algorithm to predict how subjects learn to fine tune their decision rule by adjusting the decision bound parameters.

6. Maximum likelihood estimators have many desirable properties. First, they are always consistent. Second, if an efficient unbiased estimator exists, maximum likelihood estimation will generally find it.

7. When calculating the optimal bound for the circular stimuli, we arbitrarily assumed that one quarter of a semicircle (i.e., π/4 radian) is psychologically equal to one quarter of the screen width (i.e., 250 pixels). If the subject assumes some other relation between the two dimensions, the optimal bound in the perceptual space will have some nonunit slope (see Ashby & Maddox, 1992, for a fuller discussion of this point).

8. In the general linear classifier, the separate effects of perceptual and criterial noise cannot be estimated due to the nonidentifiability of these two parameters.

9. This transformation was accomplished by premultiplying the dimensional values by the matrix

$$W = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},$$

which rotates the perceptual dimensions by θ°.

10. As in Data Sets 1 and 2, the general quadratic classifier was fit with the response bias parameter δ = 0, whereas the fits of the exemplar models include a bias parameter. It is possible that the fits of the exemplar models could be reduced by up to 2 AIC points if one assumes no response bias (i.e., β = .5). There were two cases in which the fit of the DEM(θ) was within 2 AIC points of that for the general quadratic classifier. Refitting the data for these 2 subjects, under the assumption that β = .5, did not improve the fit.

11. Tversky (1977) and many others have argued that this method of data analysis also involves strong (and, in general, untenable) assumptions about the mapping from the stimulus space to the perceptual space.

12. Because category base rates are equal in these data sets, it is possible, as in Application 1, that assuming no response bias could improve the fit of the GCM and DEM. However, even if the bias and attention weight parameters were set equal to .5, the maximum improvement in fit would be 4 AIC points, which would not change any of the qualitative results in Table 6.

13. Ashby and Lee (1991) fitted the optimal decision bound model to these data.
Although it performed somewhat better than the GCM, the optimal model performed more poorly than the suboptimal decision bound models of Table 6.

APPENDIX
Fitting the Exemplar Models

This appendix describes the specific techniques used in fitting the GCM, GCM($\theta$), DEM, and DEM($\theta$) to Application 1: Data Sets 1-5. Two considerations are important. First, one must determine whether repeated presentations of a given exemplar should lead to independent memory traces (e.g., Hintzman, 1986) or to a single memory trace. Exemplar models that assume independent memory traces are called token exemplar models, whereas those that assume a single memory trace are called type exemplar models. Because token models are frequency sensitive (Nosofsky, 1988b) and are thus more powerful, token exemplar models were fitted to the data. Second, one must decide which exemplars to include in the summed similarity computations. The answer to this question differed for the data from the first and last experimental sessions, so each is dealt with in turn.

First Experimental Session
Each experimental session consisted of 5 presentations of the stimulus corresponding to the Category A and B means (10 presentations total), 100 practice trials, and 300 experimental trials. Although the probability of responding Category A for stimulus i, $P(A \mid i)$, was estimated for the 300 experimental trials only, it was assumed that a unique memory trace was formed for the 110 exemplars presented during the preexperimental learning sessions, and thus, these were included in the summed similarity computations. When estimating $P(A \mid i)$ on the ith experimental trial, all exemplars presented on Trials 1 through (i - 1), as well as those presented during the preexperimental learning sessions, were included in the Equation 6 and 12 sums.
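The trial-by-trial bookkeeping described for the first session can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: Equations 6 and 12 are not reproduced in this section, so the similarity function (exponential decay of a weighted Minkowski distance, with Euclidean-Gaussian defaults), the biased relative-similarity response rule, and all function and variable names are assumed here for illustration.

```python
import math

def similarity(x, y, c, w, r=2.0, alpha=2.0):
    """Assumed similarity form: exp(-(c * d) ** alpha), where d is a
    weighted Minkowski-r distance between two 2-D points and w is the
    attention weight on the first dimension."""
    d = (w * abs(x[0] - y[0]) ** r
         + (1.0 - w) * abs(x[1] - y[1]) ** r) ** (1.0 / r)
    return math.exp(-(c * d) ** alpha)

def token_gcm_predictions(pre_trials, exp_trials, c, w, beta):
    """Return the predicted P(A | i) for each experimental trial.

    pre_trials, exp_trials: lists of (stimulus, category) pairs, where
    stimulus is an (x, y) tuple and category is "A" or "B".
    Token assumption: every presentation is stored as its own trace, so
    a repeated stimulus enters the sums once per presentation.
    """
    memory = list(pre_trials)  # traces from the preexperimental trials
    predictions = []
    for stimulus, category in exp_trials:
        # Sums run over the preexperimental traces plus Trials 1..(i - 1).
        sum_a = sum(similarity(stimulus, s, c, w) for s, k in memory if k == "A")
        sum_b = sum(similarity(stimulus, s, c, w) for s, k in memory if k == "B")
        numer = beta * sum_a
        denom = numer + (1.0 - beta) * sum_b
        predictions.append(numer / denom if denom > 0 else 0.5)
        # The current exemplar becomes available from the next trial on.
        memory.append((stimulus, category))
    return predictions

# Toy data (hypothetical): two preexperimental traces, two experimental trials.
pre = [((0.0, 0.0), "A"), ((1.0, 1.0), "B")]
trials = [((0.0, 0.0), "A"), ((0.0, 0.0), "A")]
print(token_gcm_predictions(pre, trials, c=1.0, w=0.5, beta=0.5))
```

Because the model is a token model, the second presentation of the same Category A stimulus raises the predicted probability of an A response: the first presentation has been added to memory as a separate trace.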

Final Experimental Session
Ideally, when fitting the exemplar models to the data from the last experimental session, one might like to include the memory trace of all exemplars encountered by the subject up to that point in the experiment. By the last trial of a fifth experimental session, however, a subject will have seen 2,049 exemplars. It seems plausible that at some sample size, predictions of the exemplar models will asymptote. The most efficient fitting algorithm uses the smallest sample size after asymptote occurs. To determine the point at which additional exemplar information provided no improvement in fit, the data from 8 subjects (those with the highest and lowest accuracies from each experiment and each condition of Data Sets 3 and 4) were fit using a procedure where the similarity of stimulus i was taken with respect to all 410 exemplars from the last session. The AIC value for this model was compared with the AIC value for a model where the similarity was taken with respect to the 820 exemplars from the final and penultimate sessions. This procedure was repeated for the three versions of the GCM model (city-block-exponential, Euclidean-Gaussian, and Euclidean-exponential). The difference between the AIC for both methods was computed for each of the 24 cases (8 subjects × 3 versions). The mean AIC difference was -.044, the standard deviation was 3.013, and the range was -4.4 to 6. These results suggest exemplar information asymptotes at or before 410 samples, and so the similarity computations were taken with respect to the 410 exemplars during each subject's last experimental session.

The Local Minima Problem
To reduce the possibility of local minima in the exemplar model fits, a grid search of the parameter space was conducted. For the GCM, 10 values of each parameter were chosen.
The fit of the model was then recorded at each of these 1,000 (10 × 10 × 10) locations, and the parameter values corresponding to the minimum of these 1,000 values were then used as starting values for the minimization routine. Because the GCM is a special case of the GCM($\theta$) and DEM, the best-fitting values from the GCM were used as starting values for the GCM($\theta$) and DEM in a two-step procedure. First, the best-fitting parameters ($c$, $w$, and $\beta$) were held constant, and the $\gamma$ or $\theta$ parameter (depending on the model) was left free to vary. Once the best-fitting $\gamma$ or $\theta$ parameter was obtained, all four parameters were left free to vary. A similar procedure was used for the DEM($\theta$), but in this case, the best-fitting parameters from the DEM were used as starting values for the DEM($\theta$) in a two-step procedure.

(Manuscript received January 24, 1992; revision accepted for publication June 12, 1992.)
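
The grid-search initialization and two-step start described under The Local Minima Problem can be sketched as follows. This is an illustrative sketch only. The objective functions are toy quadratics standing in for a model's badness-of-fit, the parameter ranges and the starting value of the extra parameter are made up, and the coordinate pattern search is a simple stand-in for the minimization routine, which the appendix does not name.

```python
import itertools

def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive."""
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]

def grid_search_start(objective, grids):
    """Evaluate the objective at every grid combination and return the
    best point (as a list) together with its objective value."""
    best_point, best_value = None, float("inf")
    for point in itertools.product(*grids):
        value = objective(point)
        if value < best_value:
            best_point, best_value = point, value
    return list(best_point), best_value

def pattern_search(objective, start, free, step=0.1, tol=1e-6):
    """Stand-in local minimizer: coordinate pattern search that only
    moves the parameters whose indices are listed in `free`."""
    x = list(start)
    fx = objective(x)
    while step > tol:
        improved = False
        for k in free:
            for delta in (step, -step):
                trial = list(x)
                trial[k] += delta
                f_trial = objective(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step /= 2.0  # shrink the step once no move helps
    return x, fx

# Hypothetical objectives (not real model fits): a three-parameter model
# (c, w, beta) nested within a four-parameter model with one extra parameter.
def toy_three(p):
    c, w, beta = p
    return (c - 2.3) ** 2 + (w - 0.42) ** 2 + (beta - 0.5) ** 2

def toy_four(p):
    return toy_three(p[:3]) + (p[3] - 1.7) ** 2

# 10 values per parameter gives 10 x 10 x 10 = 1,000 grid locations.
grids = [linspace(0.5, 5.0, 10), linspace(0.05, 0.95, 10), linspace(0.05, 0.95, 10)]
start, _ = grid_search_start(toy_three, grids)
base, base_fit = pattern_search(toy_three, start, free=[0, 1, 2])

# Two-step start for the larger model: first vary only the extra
# parameter, then release all four parameters.
step1, _ = pattern_search(toy_four, base + [1.0], free=[3])
full, full_fit = pattern_search(toy_four, step1, free=[0, 1, 2, 3])
print(base, full)
```

Starting the larger model from the nested model's solution means the second fit begins at a point that is already at least as good as the simpler model's best fit, which is the motivation for the two-step procedure.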