
Random variation and systematic biases in probability estimation

2020, Cognitive Psychology

A number of recent theories have suggested that the various systematic biases and fallacies seen in people's probabilistic reasoning may arise purely as a consequence of random variation in the reasoning process. The underlying argument, in these theories, is that random variation has systematic regressive effects, so producing the observed patterns of bias. These theories typically take this random variation as a given, and assume that the degree of random variation in probabilistic reasoning is sufficiently large to account for observed patterns of fallacy and bias; there has been very little research directly examining the character of random variation in people's probabilistic judgement. We describe 4 experiments investigating the degree, level, and characteristic properties of random variation in people's probability judgement. We show that the degree of variance is easily large enough to account for the occurrence of two central fallacies in probabilistic reasoning (the conjunction fallacy and the disjunction fallacy), and that level of variance is a reliable predictor of the occurrence of these fallacies. We also show that random variance in people's probabilistic judgement follows a particular mathematical model from frequentist probability theory: the binomial proportion distribution. This result supports a model in which people reason about probabilities in a way that follows frequentist probability theory but is subject to random variation or noise.

Rita Howe and Fintan Costello
School of Computer Science, University College Dublin, Ireland

Keywords: Variance; Noise; Probability estimation; Conjunction fallacy; Disjunction fallacy

1. Introduction

Researchers over the last 50 years have identified a large number of systematic biases in people's judgments of probability. These biases are typically taken as evidence that people do not follow the normative rules of probability theory when estimating probabilities, but instead use a series of heuristics (mental shortcuts or 'rules of thumb') that sometimes yield reasonable judgments but sometimes lead to severe and systematic errors, causing the observed biases (Kahneman & Tversky, 1982). This 'heuristics and biases' view has had a major impact in psychology (Kahneman & Tversky, 1982; Gigerenzer & Gaissmaier, 2011), economics (Camerer, Loewenstein, & Rabin, 2003), law (Korobkin & Ulen, 2000; Sunstein, 2000), medicine (Dawson & Arkes, 1987; Eva & Norman, 2005) and other fields, and has influenced government policy in a number of countries (Oliver, 2013; Vallgårda, 2012).

Evidence for these systematic biases in people's probabilistic reasoning is very strong. The conclusion that these biases necessarily demonstrate heuristic reasoning processes is, however, less sure. Various researchers have shown that these biases may simply be a consequence of random variation or 'noise' in otherwise rational and normatively correct processes: random variation that produces systematic, directional effects (see e.g. Hilbert, 2012; Johnson, Blumstein, Fowler, & Haselton, 2013; Costello & Watts, 2014; Marchiori, Di Guida, & Erev, 2015; Costello & Watts, 2016).
Support for this view comes from results showing that when people's individual, systematically biased, probabilistic judgements are combined in ways which statistically cancel out noise, those judgements tend to agree closely with the requirements of normative probability theory with no remaining systematic deviation (Costello, Watts, & Fisher, 2018; Costello & Watts, 2018; Fisher & Wolfe, 2014).

While both the heuristic and the random variation approaches can explain observed patterns of bias in probabilistic reasoning, these accounts differ in their predictions about the consistency of such bias. The random variation approach necessarily predicts a large degree of inconsistency in responses, such that if a person is biased on one presentation of a given item, they may not be biased on another. The heuristic or 'rule of thumb' account typically does not consider internal variation in responses or make provision for changes in response to the same stimuli. Representativeness accounts of heuristics, for instance, can account for 'external' variance: fallacy responses will vary between different problems as representativeness covaries with frequency (Kahneman & Tversky, 1982). However, these accounts make no such argument for responses to the same problem. Early in heuristics research, Kahneman and Tversky (1982) rejected the notion of an approach that included responses perturbed by error:

Indeed, the evidence does not seem to support a "truth plus error" model, which assumes a coherent system of beliefs that is perturbed by various sources of distortion and error. Hence we do not share Dennis Lindley's optimistic opinion that "inside every incoherent person there is a coherent one trying to get out," and we suspect that incoherence is more than skin deep (Kahneman & Tversky, 1982, p. 313).

More recent approaches to heuristics argue that a "toolbox" of strategies may be used to solve problems under uncertainty (e.g. Rieskamp & Otto, 2006; Scheibehenne, Rieskamp, & Wagenmakers, 2013). This approach can produce variable responding, but there is no consensus about how strategies are selected, and evidence suggests that single-process models may be preferred over multiple-strategy models (Söllner, Bröder, Glöckner, & Betsch, 2014).

To date there has been little research on the degree of variability in people's probabilistic judgement: 'noisy rational' models of probabilistic reasoning simply assume random variation in people's probability judgement, without investigating its extent or character. In this paper we aim to fill this gap in two ways. First, we give a mathematical model of the form and structure of variance in people's probabilistic judgement; second, we describe four experiments investigating the existence, characteristics, and properties of random variation in people's probabilistic judgement, and the relationship between this variance and systematic judgement bias.
These experiments all focus on the occurrence of two particular systematic biases – the conjunction and disjunction fallacy – in simple tasks where people are asked to estimate the probability of constituent, conjunctive and disjunctive events in a presented set of events. These studies examine the degree of random variation in people's estimates for these probabilities, the extent to which this random variation predicts conjunction and disjunction fallacy occurrence, and the degree to which fallacy responses are themselves randomly variable. These studies also examine specific theoretical predictions about the form which random variation will take in these tasks.

1.1. Biases in reasoning: the conjunction and disjunction fallacies

Perhaps the best-known and most studied bias in probabilistic reasoning is the conjunction fallacy, exemplified by the "Linda problem" of Tversky and Kahneman (1983). In this problem participants read the following statement about Linda:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

and then answer the following question: Which is more probable?

A. Linda is a bank teller.
A∧B. Linda is a bank teller and is active in the feminist movement.

Tversky and Kahneman (1983) found that over 80% of their participants judged A∧B as more likely than A in this and many similar problems. This response violates probability theory, which requires that P(A∧B) ≤ P(A) and P(A∧B) ≤ P(B) must always hold, simply because A∧B cannot occur without A or B themselves occurring. The conjunction A∧B, under the laws of probability, cannot be more likely than the single constituent A; thus when a participant chooses the conjunction A∧B as more probable, they are committing a fundamental violation of rational probabilistic reasoning referred to as the 'conjunction fallacy'. A similarly reliable disjunction fallacy occurs when participants judge the constituents A, B as more likely than the disjunction A∨B (Carlson & Yates, 1989; Bar-Hillel & Neter, 1993).

These widely replicated fallacy results were taken as an indication that humans do not reason in a normative fashion; that is, they don't apply probabilistic rules to real-life contexts. Instead, it was suggested that people employ heuristics or mental shortcuts to solve these problems. The conjunction fallacy, for instance, was suggested to occur because people employed a "representativeness heuristic" when reasoning about conjunctive problems (Tversky & Kahneman, 1983). Under this theory, the fallacy occurs because the person described in the conjunction, A∧B, is more representative of the information presented in the character sketch than the person described by the constituent, A. However, a number of studies have called the validity of the heuristics account into question (Bonini, Tentori, & Osherson, 2004; Sides, Osherson, Bonini, & Viale, 2002). Experiments that manipulated class inclusion, for instance, demonstrated that the fallacy occurs regardless of whether the conjunction is representative or not (Gavanski & Roskos-Ewoldsen, 1991). Other studies have varied response mode (forced choice vs. estimation) or conceptual focus (frequencies vs. probabilities) and found that these factors can greatly affect the fallacy rates observed (Wedell & Moro, 2008; Tversky & Kahneman, 1983; Hertwig & Gigerenzer, 1999; Fiedler, 1988; Reeves & Lockhart, 1993).
More importantly, by manipulating probability values, fallacy rates of 10% to 85% can be found. Fisk and Pidgeon (1996) demonstrated that very high fallacy rates occur where P(A) is high and P(B) is low, and very low fallacy rates occur where both P(A) and P(B) are low. While fallacy rates are generally quite high, a frequent observation in this research is that a small number of participants do not seem overly susceptible to the fallacy. Over a number of conjunction problems, participants rarely have 100% error rates (Stolarz-Fantino, Fantino, Zizzo, & Wen, 2003).

1.2. Variability and cognitive biases

A number of formal probabilistic models have sought to show that a range of biases can be explained as a function of quasi-rational probabilistic reasoning instead of a heuristic process. These models have emphasised the role of random variation, or noise, in the decision-making process. Erev, Wallsten, and Budescu (1994) proposed a model to explain the observation that underconfidence (conservatism) and overconfidence could often be observed in the same judgement tasks. They demonstrated that subjective probability estimates perturbed by error can give this pattern of under- and overconfidence, even when judgements are accurate (also see Budescu, Erev, & Wallsten, 1997). Similarly, Hilbert (2012) proposed a theoretical framework based on noisy information processing. Under this framework, memory-based processes convert observations stored in memory into decisions. By assuming that these processes are subject to noisy variation and that this variation generates systematic patterns of error in decision-making, this approach explains a number of cognitive biases.

These models, however, simply assume the existence of random variation or noise in probabilistic reasoning; they do not describe the form and structure of this variation. Our main theoretical contribution in this paper is to give a mathematical description of variance in probabilistic reasoning. We take as our starting point a general model of noise in a normatively correct reasoning process: the probability theory plus noise model (PTN). This model assumes that people estimate probabilities via a mechanism that is fundamentally rational (following standard frequentist probability theory), but is perturbed in various ways by the systematic effects or biases caused by purely random noise or error. This approach follows a line of research leading back at least to Thurstone (1927) and continued by various more recent researchers (see, e.g. Bearden & Wallsten, 2004; Dougherty, Gettys, & Ogden, 1999; Erev, Wallsten, & Budescu, 1994; Hilbert, 2012). This model explains a wide range of results on bias in people's direct and conditional probability judgments across a range of event types, and identifies various probabilistic expressions in which this bias is 'cancelled out' and for which people's probability judgments agree with the requirements of standard probability theory (see Costello & Mathison, 2014; Costello & Watts, 2014, 2016, 2017, 2018, 2019; Costello, Watts, & Fisher, 2018).

In standard frequentist probability theory the probability of some event A is estimated by drawing a random sample of events, counting the number of those events that are instances of A, and dividing by the sample size to give a sample proportion.
The expected value of these estimates is P(A), the probability of A; individual estimates will vary with a 'binomial proportion' distribution around this expected value (taking N to be the sample size, the binomial proportion distribution is simply the binomial distribution Bin(N, P(A)) rescaled by 1/N to represent sample proportions; see below). The probability theory plus noise model assumes that people estimate the probability of some event A in exactly the same way: by randomly sampling items from memory, counting the number that are instances of A, and dividing by the sample size. If this process were error-free, people's estimates would be expected to have an average value of P(A). Human memory is subject to various forms of random error, however. To reflect this the model assumes that events have some chance d < 0.5 of randomly being read incorrectly: there is a chance d that a ¬A (not A) event will be incorrectly counted as A, and the same chance d that an A event will be incorrectly counted as ¬A. We take PE(A) to represent the probability that a single randomly sampled item from this population will be read as an instance of A (subject to this random error in counting). A randomly sampled event will be counted as A if the event truly is A and is counted correctly (this occurs with probability (1 − d)P(A), since P(A) events are truly A and events have a 1 − d chance of being counted correctly), or if the event is truly ¬A and is counted incorrectly as A (this occurs with probability (1 − P(A))d, since 1 − P(A) events are truly ¬A, and events have a d chance of being counted incorrectly). The population probability of a single randomly sampled item being read as A is therefore

PE(A) = (1 − d)P(A) + (1 − P(A))d = (1 − 2d)P(A) + d    (1)

This equation gives the expected value or predicted average for people's estimates of the probability of some event A. Note that this predicted average embodies a regression towards the center, due to random noise: estimates are systematically biased away from the 'true' probability P(A), such that on average estimates will tend to be greater than P(A) when P(A) < 0.5, will tend to be less than P(A) when P(A) > 0.5, and will tend to equal P(A) when P(A) = 0.5. Since this model of probability estimation gives a central role to random noise (and sampling), it does not predict that all probability estimates will exactly equal the value given in this expression. Instead, the prediction is that, since individual estimates are produced via sampling and are subject to random error, individual estimates will vary randomly around this expected value in an approximately binomial proportion distribution.
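To make this sampling mechanism concrete, the following simulation is a minimal sketch of the process Eq. (1) describes (the values of P(A), d, and the sample size N here are illustrative choices, not parameters estimated in this paper):

    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_estimate(p_a, d, n):
        # One probability estimate: draw n events, each truly A with
        # probability p_a, then misread each event with probability d.
        truly_a = rng.random(n) < p_a
        misread = rng.random(n) < d
        return np.mean(truly_a ^ misread)   # XOR: misreads flip the count

    p_a, d, n = 0.8, 0.1, 20
    estimates = [noisy_estimate(p_a, d, n) for _ in range(10_000)]
    print(np.mean(estimates))   # approaches (1 - 2*d)*p_a + d = 0.74

Individual estimates in this sketch scatter around 0.74 rather than the true value 0.8, illustrating the regression towards the center described above.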
Original versions of this account assumed the same rate of random error for all events (Costello & Watts, 2014). More recent versions (Costello & Watts, 2016, 2018) proposed a higher rate of this random error for complex events (conjunctions A∧B and disjunctions A∨B). This extension allowed for increased regression in complex events, and was primarily intended to explain the wide range of conjunction and disjunction fallacy rates observed in the literature (ranging from 0% fallacy rates for some conjunctions to over 70%): in some cases this increased regression would push conjunctive estimates PE(A∧B) closer to 0.5 than constituent estimates PE(A), producing high conjunction fallacy rates for that conjunction. With this extension the model gave a close fit to data on fallacy rates across the full observed range (Costello & Watts, 2017). This idea of increased error for conjunctive or disjunctive events follows the standard statistical concept of propagation of error, which states that if two variables A and B are subject to random error, then a complex variable (e.g. A∧B) that is a function of those two variables will have a higher rate of error than either variable on its own. To reflect this, the model assumes a rate of random error of d for single events but of d + Δd for conjunctions and disjunctions (where Δd represents a small increase in the rate of random error). The PTN then predicts that the expected value of a conjunction estimate will be

PE(A∧B) = (1 − 2[d + Δd])P(A∧B) + [d + Δd]    (2)

and that for a disjunction estimate will be

PE(A∨B) = (1 − 2[d + Δd])P(A∨B) + [d + Δd]    (3)

with individual estimates varying randomly around these expected values in a binomial proportion distribution. These Δd expressions are simplifying approximations, and were simply taken as given in previous presentations of the PTN model. In the Appendix we extend this model by giving a specific model of the differential effects of random error on combined estimates PE(A∧B) and PE(A∨B), and show that this more precise model can be well approximated by these Δd expressions. The more precise Δd expression assumes that counting for complex items can take place in two separate ways: some familiar complex items can be treated "integrally" and counted as if they are simple events, while other complex items will be treated "separably." In separable cases, there are three possible sources of error: when counting items A, when counting B, and when counting A∧B or A∨B. This more specific model is quite complex: we use the simplifying Δd approximations in the main body of the paper for ease of presentation and to indicate that these error rates are themselves uncertain. Indeed, the main d term in this model is also a simplifying approximation, suggesting as it does the existence of a fixed rate of random error in probabilistic recall (in fact, we expect the error rate itself to vary randomly from moment to moment, depending on a range of extraneous factors).

1.3. Fallacy occurrence

The conjunction (and disjunction) fallacy arise in this model purely as a consequence of this random variation. Assuming without loss of generality that P(B) ≤ P(A), the general idea is that a reasoner's probability estimates for B and A∧B will both vary randomly around their expected values PE(B) and PE(A∧B). This random variation means that some individual estimates of P(A∧B) will fall above individual estimates of P(B), producing a conjunction fallacy response. The closer the expected values PE(B) and PE(A∧B) are to each other, the greater the chance of this fallacy response occurring.
More specifically, this model predicts that the rate of conjunction fallacy responses will increase with the difference between average estimates

PE(A∧B) − PE(B) = (1 − 2[d + Δd])P(A∧B) + [d + Δd] − (1 − 2d)P(B) − d
                = (1 − 2d)[P(A∧B) − P(B)] + Δd[1 − 2P(A∧B)]

(being low when this difference is negative and high when it is positive). When this difference is negative we have PE(A∧B) < PE(B). Since individual estimates of P(A∧B) and P(B) are both perturbed by random noise (which is equally likely to be positive or negative), when this difference is negative we expect that an individual estimate of P(A∧B) will randomly fall above an estimate of P(B) less than 50% of the time, producing a conjunction fallacy rate of less than 50%. Rearranging, we see that this difference will be positive when

Δd[1 − 2P(A∧B)] > (1 − 2d)[P(B) − P(A∧B)]

and when this inequality holds we expect that an individual estimate of P(A∧B) will randomly fall above an estimate of P(B) more than 50% of the time, producing fallacy rates of over 50% (and indeed as high as 85% or 90%) for some events. This model can thus account for the wide range of conjunction fallacy rates seen in experimental studies. In a similar way the model predicts that the rate of disjunction fallacy responses will increase with the difference between average estimates

PE(A) − PE(A∨B) = (1 − 2d)P(A) + d − (1 − 2[d + Δd])P(A∨B) − [d + Δd]
                = (1 − 2d)[P(A) − P(A∨B)] − Δd[1 − 2P(A∨B)]

(being low when this difference is negative and high when it is positive). Since P(A) − P(A∨B) = P(A∧B) − P(B) we have

PE(A) − PE(A∨B) = (1 − 2d)[P(A∧B) − P(B)] − Δd[1 − 2P(A∨B)]

and we see that this model predicts that for a given pair of events A and B, the rate of disjunction fallacy occurrence should be approximately equal to the rate of conjunction fallacy occurrence (subject to a small difference of order Δd).
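These difference expressions can be checked directly by simulation. The sketch below uses illustrative values for d, Δd, N, P(B) and P(A∧B) (not values fitted to any data), and counts how often a noisy conjunction estimate exceeds a noisy constituent estimate:

    import numpy as np

    rng = np.random.default_rng(1)

    def noisy_estimate(p, d, n):
        # Sample proportion for an event with true probability p,
        # each sampled item being misread with probability d.
        return np.mean((rng.random(n) < p) ^ (rng.random(n) < d))

    d, delta_d, n = 0.1, 0.05, 20
    p_b, p_ab = 0.65, 0.55     # P(B) and P(A and B), with P(A and B) <= P(B)

    trials = 10_000
    fallacy_rate = np.mean([
        noisy_estimate(p_ab, d + delta_d, n) > noisy_estimate(p_b, d, n)
        for _ in range(trials)
    ])
    print(fallacy_rate)   # below 0.5 for these values; the rate rises towards
                          # and past 0.5 as PE(A and B) approaches PE(B)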
1.4. The addition law

These conjunction and disjunction fallacy predictions both concern patterns of deviation from the requirements of normative probability theory. Interestingly, by combining these results we obtain a prediction of agreement with one particular requirement of normative probability theory: the addition law. The addition law states that

P(A) + P(B) − P(A∧B) − P(A∨B) = 0

must hold for all events A and B. If we take a single noise rate of d across all forms of probability estimation, we get

PE(A) + PE(B) − PE(A∧B) − PE(A∨B) = (1 − 2d)[P(A) + P(B) − P(A∧B) − P(A∨B)] + 2d − 2d = 0

and the addition law identity should also hold in people's probability estimates according to this model. Taking our more complex d + Δd expressions for conjunctions and disjunctions, we get

PE(A) + PE(B) − PE(A∧B) − PE(A∨B) = 2Δd[P(A∧B) + P(A∨B) − 1] = 2Δd[P(A) + P(B) − 1] ≈ 0

Since −1 ≤ P(A) + P(B) − 1 ≤ 1 necessarily holds, this model predicts that the average or expected value for this identity in people's judgements will fall within 2Δd of zero, and we expect the addition law to hold, on average, in people's probability estimates just as it does in normative probability theory. Note that, as before, this equation gives the expected value or predicted average of the addition law when computed from people's probability estimates for some pair of events A, B. Since individual estimates are produced via sampling and are subject to random error, individual values for this identity are predicted to vary randomly around this expected value.

Note also that the terms in this addition law expression can be rewritten as

PE(A∧B) − PE(A) = PE(B) − PE(A∨B)

and so correspond exactly to the terms predicting conjunction and disjunction fallacy occurrence in the previous section. This model thus predicts simultaneous patterns of deviation from and agreement with the normative requirements of probability theory (deviation in terms of conjunction and disjunction fallacy occurrence; agreement in terms of the addition law).

1.5. Variance of probability estimates

The PTN model simply assumes the existence of random variation in probabilistic reasoning. Here we extend this model to derive predictions about the characteristic properties and degree of variance that should hold in people's probability estimates (if people are estimating probabilities via noisy sampling as that model proposes). As before, we assume that people estimate the probability of some event A by randomly sampling some set of items from memory, counting instances of A in the sample (subject to random error in counting), and dividing by sample size. The variance of the sample count X in this process can be modelled via the binomial distribution. In the binomial distribution the probability of getting x successes in a sample of size N with fixed probability of success p is given by

P(x | N, p) = C(N, x) p^x (1 − p)^(N − x)

where C(N, x) is the binomial coefficient, with the mean value of this sample count being

mean(X) = Σ_{x=0}^{N} x P(x | N, p) = pN

and the variance of this sample count being

var(X) = Σ_{x=0}^{N} (x − pN)² P(x | N, p) = Np(1 − p)    (4)

Since the variance of any random variable is the average squared difference between values of that variable and its mean, the variance of the sample proportion pE (that is, the variance of the proportion of successes in a sample) is given by

var(pE) = Σ_{x=0}^{N} (x/N − pN/N)² P(x | N, p)
        = (1/N²) Σ_{x=0}^{N} (x − pN)² P(x | N, p)
        = Np(1 − p)/N²
        = p(1 − p)/N    (5)

If people are estimating probabilities via the sampling process assumed in the probability theory plus noise model (where their probability estimate for some event A is equal to the proportion of items in a random sample that were counted as instances of A, subject to random noise), then we would expect the variance of probability estimates to approximately follow this expression. More specifically, for event A and noise rate d we would expect the variance of people's probability estimates PE(A) to be

var(pE(A)) = PE(A)(1 − PE(A))/N    (6)

where N is the sample size used when estimating probabilities and PE(A) = (1 − 2d)P(A) + d is the probability of an item being read as A (and where for conjunctive or disjunctive events we use d + Δd, as before). The predicted standard deviation (SD) of probability estimates is then the square root of this variance.
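As a worked illustration of Eq. (6), the predicted SDs for a constituent and a conjunction can be computed directly (the values N = 20, d = 0.1, Δd = 0.05 and the event probabilities below are hypothetical, chosen only to show the direction of the prediction):

    import math

    def predicted_sd(p_true, d, n):
        # Eq. (6): SD = sqrt(PE * (1 - PE) / N), with PE = (1 - 2d) * p_true + d
        pe = (1 - 2 * d) * p_true + d
        return math.sqrt(pe * (1 - pe) / n)

    n, d, delta_d = 20, 0.1, 0.05
    print(predicted_sd(0.8, d, n))            # constituent: PE = 0.74, SD ~ 0.098
    print(predicted_sd(0.7, d + delta_d, n))  # conjunction: PE = 0.64, SD ~ 0.107

Because the higher noise rate pulls PE for complex events towards 0.5, where PE(1 − PE) is largest, this sketch shows why the model tends to predict somewhat higher variance for conjunctions and disjunctions than for their constituents.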
Given this theoretical background, we now describe a series of experiments investigating the degree of random variation in probabilistic judgement, and the relationship between that variation and fallacy occurrence, in two different types of judgement task. Experiments 1 and 2 examine variance and fallacy occurrence in probability estimation for everyday events; Experiments 3 and 4 examine variance and fallacy occurrence in probability estimation for simple visual stimuli. The PTN predicts that the variance within the probability estimates differs depending on whether the estimate is for a constituent, conjunction or disjunction, with higher variance observed for the complex statements. We will test this prediction and look at how variance in responses relates to fallacy rates: whether participants will produce fallacious responses repeatedly, and whether they will be consistent or inconsistent in producing them across stimuli. Experiments 3 and 4 use stimuli with known objective probability values, allowing us to test predictions about the relationship between objective probability value and probability estimates, variance of estimates, and fallacy occurrence.

2. Experiment 1

Experiment 1 sought to investigate the variance in probability estimates using simple natural language estimation tasks. Participants were presented with single weather events ('cold', 'rainy') and conjunctive and disjunctive weather events ('cold and rainy', 'cold or rainy') and asked to estimate the probability or frequency of these weather events. The weather types were presented to participants in a randomised order, and the participants were randomly assigned to one of two groups: frequency questions or probability questions.

This experiment will test a number of predictions about subjective estimates. The main impetus of this paper is to examine the variability in judgements. Here, we will investigate whether participants' estimates agree with probability theory and whether participants will produce noisier estimates for complex statements (conjunctions and disjunctions) than for constituents. Theoretical approaches such as representativeness accounts and averaging models assume that participants do not produce estimates in line with probability theory, while noise models such as the PTN do under certain circumstances. There is a theoretical divide here, broadly speaking, where theories of extensional errors can be classified as those that propose that judgements are produced by a process radically different from probability theory and those that propose that judgements are produced by a process akin to probability theory. Representativeness accounts argue that participant judgements are not consistent with probability theory because they produce fallacies. The PTN, on the other hand, predicts that participants produce fallacies while being consistent with certain aspects of probability theory. We will investigate these claims. An occasional finding in the literature is that question type (whether questions are about event frequency or event probability) affects the rate of fallacy production. We examine this factor here, and ask whether the question type with the higher fallacy rate will also have a higher degree of response variability.

2.1. Materials and method

The materials consisted of sets of questions about the likelihood (frequency, or probability) of a type of weather on a given day. Each set had 7 constituents, 8 conjunctions and 8 disjunctions (see Table 1 for materials). The questions were the same for each participant in each group but displayed in a randomised order. 94 participants were recruited from the student body in exchange for course credit, and were randomly assigned to either the frequency or the probability group. For the frequency group, the participants were asked

Imagine a set of 100 different days, selected at random. On how many of those 100 days do you think the weather in Ireland will be [weather type]?

Participants then indicated their answer using a scale of 0 to 100, where 0 indicated that they thought that there would be [weather type] on zero of those days, while 100 meant that they thought there would be [weather type] on all 100 of those days.
The probability group were asked

What is the probability that the weather will be [weather type] on a randomly selected day in Ireland?

Again, they indicated their answer on a scale of 0–100. Answers of 0 meant that the weather type would never happen, while answers of 100 meant that the weather type was certain to happen on a given day.

Table 1
Constituents, conjunctions, average probability estimates, and total conjunction fallacy counts for Experiment 1. This table gives average probability estimates and total conjunction fallacy counts for constituents and conjunctions used in Experiment 1. Total conjunction fallacy count here is simply the number of participants who gave a probability estimate for a given conjunction that was greater than the estimate they gave for one or other constituent (in subsequent analyses we consider fallacy rates relative to constituent A and constituent B separately). Since there were 94 participants in the experiment in total, we use the cumulative binomial test to ask whether these fallacy counts are consistent with the hypothesis that the conjunction fallacy occurs at a rate of p = 0.5 (the most conservative prediction of a 'noisy averaging' model of the conjunction fallacy). Of 8 conjunctions, 5 had fallacy rates that were inconsistent with this hypothesis at the 0.05 significance level, and 3 were inconsistent at the 0.01 level.

A        B        A∧B                 PE(A)   PE(B)   PE(A∧B)   Total conj. fallacy count (/94)
Warm     Sunny    Warm and Sunny      0.32    0.33    0.26      32‡
Rainy    Cold     Rainy and Cold      0.64    0.65    0.55      36†
Rainy    Warm     Rainy and Warm      0.64    0.32    0.31      35‡
Sunny    Windy    Windy and Sunny     0.33    0.62    0.33      43
Snowy    Cloudy   Snowy and Cloudy    0.13    0.73    0.16      45
Cloudy   Windy    Windy and Cloudy    0.73    0.62    0.61      43
Sunny    Snowy    Snowy and Sunny     0.33    0.13    0.12      17‡
Cloudy   Rainy    Cloudy and Rainy    0.73    0.64    0.59      36†

Note: † probability <0.05 in a cumulative binomial test with N = 94, p = 0.5. ‡ probability <0.01 in a cumulative binomial test with N = 94, p = 0.5.

2.2. Results

Under the PTN, violations of probability theory (conjunction and disjunction fallacies) should arise as a function of two things: probability values and variance. In the results below we examine how these variables contribute to fallacies. It is expected that the participant estimates should be consistent with elements of probability theory despite the production of fallacies. We will examine a number of things: whether judgements are consistent with the addition law, and whether variability is greater for complex items than for simple ones. Representativeness and noise accounts of the fallacies make disparate predictions about these items.

2.2.1. Response mode and fallacy rate

Previous experiments looking at response mode have typically found lower fallacy rates when participants are presented with conjunction and disjunction questions in a frequency format than in a probability format. To test whether such a difference would exist here, each conjunction in the frequency group was paired with the respective conjunction in the probability group and a 2-sample test for equality of proportions was calculated. This found no significant difference in fallacy rates for any of the pairs. The disjunctions in both groups were also paired in this fashion and again the equality of proportions test was calculated. Again, there was no difference in the fallacy rates between the groups.¹ As the two groups produced very similar estimates and fallacy rates, they were collapsed for the purpose of analysis.

¹ Only one pair differed significantly from each other: Cloudy or Rainy, which had a fallacy rate of 50% in the frequency group and 25% in the probability group.
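Both tests used in this subsection are simple to reproduce. The sketch below implements a two-sample proportion z-test for comparing the two groups (the counts 20/47 and 18/47 are made-up numbers, since the group sizes are not reported above) and the cumulative binomial test applied in Table 1, using the actual 36/94 count for 'Rainy and Cold':

    from math import sqrt
    from scipy.stats import binom, norm

    def two_prop_ztest(x1, n1, x2, n2):
        # Two-sample test for equality of proportions (pooled, no correction).
        p_pool = (x1 + x2) / (n1 + n2)
        se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
        z = (x1 / n1 - x2 / n2) / se
        return 2 * norm.sf(abs(z))    # two-sided p-value

    print(two_prop_ztest(20, 47, 18, 47))   # hypothetical fallacy counts

    # Cumulative binomial test from Table 1: chance of 36 or fewer fallacy
    # responses out of 94 if the true fallacy rate were 0.5.
    print(binom.cdf(36, 94, 0.5))           # ~ 0.015, below the 0.05 level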
2.2.2. Estimation and probability theory

We examined whether participant judgements could appear consistent with normative reasoning under certain conditions. An important prediction of the PTN is that participant judgements should be in line with the addition law even while producing fallacies. Under the PTN, the noise in participant judgements should cancel and produce a response that is in compliance with the addition law. The averaged values for each P(A), P(B), P(A∧B) and P(A∨B) estimate were used to test this. The participants' estimates showed good compliance with the addition law. The estimates were close to the expected mean value of 0, with mild deviations from this value. We found an overall value of 0.019 for the estimates. For the frequency group, the average value was 0.035. In the probability group, there was even closer compliance with the addition law: there, the average value was 0.006.

From the addition law, we observe that the sum of estimates for the positive terms, P(A), P(B), should equal the sum of estimates for the negative terms, P(A∧B), P(A∨B):

P(A) + P(B) = P(A∧B) + P(A∨B)

Using this, we constructed a scatterplot to investigate compliance with probability theory. Fig. 1 shows a scatterplot of the positive and negative terms for both groups in Experiment 1. A Deming regression was used to determine how consistent the individual estimates were with probability theory. If the participant estimates are consistent with probability theory, then this regression will produce a line of best fit that follows the line of identity. As the figure shows, values for the addition law are distributed approximately symmetrically around the line of identity, with the line of best fit agreeing closely with the line of identity, as predicted by our model. A JZS Bayes Factor analysis based on a paired t-test of x and y values in this scatterplot (positive terms and negative terms in the addition law) gave strong evidence in favour of the null hypothesis that x and y values were equal (Scaled JZS Bayes Factor = 24.5), supporting the conclusion that the addition law identity holds in individual participant probability estimates. This replicates a range of previous results on the addition law (Costello & Watts, 2014; Costello & Watts, 2016; Costello & Watts, 2018).

2.2.3. Addition law and fallacy rates

The conjunction fallacy rate relative to A should follow the disjunction fallacy rate relative to B, for any pairing of A, B. This arises as a natural consequence of the addition law, which the PTN predicts will be related to the fallacy rates. By rearranging the terms of the addition law, we see that

P(A∧B) − P(A) = P(B) − P(A∨B)

If the participant judgements are consistent with this prediction, then we should see analogous responses to conjunctions and disjunctions; for example, when the conjunction fallacy rate versus A is low, then the disjunction rate versus B should be low. In Table 2, we observe that the related fallacy rates for P(A∧B) vs. P(A) and P(B) vs. P(A∨B) are typically close. A very strong positive correlation was observed between the relative conjunction and disjunction fallacy rates, r = 0.912, p < 0.00001.
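The addition-law analysis above is straightforward to reproduce. In this sketch the positive- and negative-term arrays are hypothetical stand-ins for the per-participant sums, and the Deming regression is fitted with scipy's orthogonal distance regression (equivalent to Deming regression when the two error variances are taken as equal):

    import numpy as np
    from scipy import odr

    # Hypothetical sums: pos = P(A) + P(B), neg = P(A and B) + P(A or B)
    pos = np.array([0.95, 1.10, 0.78, 1.30, 0.88, 1.02])
    neg = np.array([0.97, 1.05, 0.80, 1.28, 0.90, 1.00])

    linear = odr.Model(lambda beta, x: beta[0] * x + beta[1])
    fit = odr.ODR(odr.RealData(pos, neg), linear, beta0=[1.0, 0.0]).run()
    print(fit.beta)   # slope near 1, intercept near 0: the line of identity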
2.2.4. Variability and probability estimates

From the PTN, we expect that the average estimate difference between the constituent and the complex item can be used to predict the resultant fallacy rate for that complex item. To test this prediction, PE(A∧B) − PE(A) was calculated for each constituent and conjunction, and PE(B) − PE(A∨B) was calculated for each constituent and disjunction. These differences were then compared to the fallacy rates. These values are shown in Table 2. A Pearson's correlation was used to examine the relationship between estimate difference and total fallacy rate for each pairing. A strong positive correlation was observed between the average conjunction fallacy rates and the average calculated estimate difference, r = 0.77, p < 0.0005. For the disjunctions, a very strong positive correlation was observed between disjunction fallacy rates and estimate difference, r = 0.92, p < 0.00001.

Fig. 1. Scatterplot of all the positive and negative terms for all estimates and participants in the frequency and probability groups of Experiment 1. The positive term is the sum of the P(A) and P(B) estimates, while the negative term is the sum of the P(A∧B) and P(A∨B) estimates. The correlation between the pairs for the frequency group was r = 0.786, p < 0.00001. The probability group had a correlation value of r = 0.794, p < 0.00001. As the groups were very similar in their estimates, they were collapsed. In the scatterplot, normative probability is represented by the line of identity, shown in grey. A Deming regression was calculated to determine the best-fit line, represented by the black dashed line on the scatterplot. For the addition law to hold, the points must be symmetrically distributed around this line of identity.

Table 2
Restricted estimate difference and fallacy rate. Restricted estimate difference was found by excluding the estimates from participants that had produced a fallacy for a given conjunction and then calculating PE(A∧B) − PE(A) from the remaining estimates for that conjunction. For the disjunctions, participants that produced a disjunction fallacy for a given disjunction had their estimates excluded from the calculation of estimate difference, PE(B) − PE(A∨B), for that disjunction. Positive differences were observed with fallacy rates above 50%, while negative differences were associated with fallacy rates less than 50%. A very strong positive correlation was found for the restricted estimate difference and the conjunction fallacy rate, r = 0.96, p < 0.00001. A strong positive relationship was also found for the disjunction rate and the restricted estimates, r = 0.78, p < 0.00001. These correlations between restricted estimates and the conjunction and disjunction fallacy rates suggest that estimate difference can be used to predict fallacy rates. The PTN predicts that the conjunction rate relative to A should follow the disjunction rate relative to B, for any pairing of A, B. Due to this, any P(A) vs. P(A∧B) conjunction fallacy rate should match the P(B) vs. P(A∨B) disjunction fallacy rate. Below, we see strong indications that this is the case, with a strong positive correlation observed between the relative fallacy rates, r = 0.912, p < 0.00001.
Conjunctions (difference: PE(A∧B) − PE(A))

Constituent   Conjunction          Average difference   Fallacy rate
Cloudy        Snowy and Cloudy     −0.59                4%
Rainy         Rainy and Warm       −0.36                6%
Sunny         Snowy and Sunny      −0.26                10%
Snowy         Snowy and Sunny      −0.07                16%
Windy         Windy and Sunny      −0.35                11%
Cloudy        Windy and Cloudy     −0.17                15%
Cloudy        Cloudy and Rainy     −0.21                21%
Sunny         Warm and Sunny       −0.12                21%
Rainy         Rainy and Cold       −0.15                22%
Cold          Rainy and Cold       −0.18                27%
Rainy         Cloudy and Rainy     −0.12                28%
Warm          Warm and Sunny       −0.12                26%
Warm          Rainy and Warm       −0.17                36%
Windy         Windy and Cloudy     −0.13                44%
Sunny         Windy and Sunny      −0.14                44%
Snowy         Snowy and Cloudy     −0.08                48%

Disjunctions (difference: PE(B) − PE(A∨B))

Constituent   Disjunction          Average difference   Fallacy rate
Snowy         Snowy or Cloudy      −0.56                5%
Warm          Rainy or Warm        −0.38                6%
Snowy         Snowy or Sunny       −0.27                9%
Sunny         Windy or Sunny       −0.34                9%
Windy         Windy or Cloudy      −0.19                22%
Rainy         Cloudy or Rainy      −0.19                24%
Warm          Warm or Sunny        −0.14                32%
Rainy         Rainy or Cold        −0.14                35%
Cold          Rainy or Cold        −0.15                35%
Sunny         Snowy or Sunny       −0.11                36%
Cloudy        Cloudy or Rainy      −0.13                37%
Sunny         Warm or Sunny        −0.13                38%
Rainy         Rainy or Warm        −0.15                39%
Windy         Windy or Sunny       −0.14                44%
Cloudy        Windy or Cloudy      −0.12                44%
Cloudy        Snowy or Cloudy      −0.10                55%

To test whether these correlations held across different sets of participants, we performed 100 random split-half correlations, dividing participants into two randomly chosen equal-sized halves, calculating conjunctive and disjunctive fallacy rates for each pair of events A, B in one half and calculating estimate differences for those pairs in the other half, and measuring the correlation between those measures. There was a strong positive relationship between average estimate difference and conjunction fallacy rate (average r = 0.66, min r = 0.51, p < 0.001 in all cases) and between average estimate difference and disjunction fallacy rate (average r = 0.80, min r = 0.65, p < 0.00001 in all cases).

However, the average difference and the fallacy rate are two measures that are by definition connected: one is a measure of the number of times that P(A∧B) exceeds P(A) for the conjunction, the other a measure of, on average, how much P(A∧B) is larger than P(A) for the conjunction. This also holds for the disjunction, where the average difference is a measure of how much P(A∨B) is smaller than P(A), while the fallacy rate is a measure of how many times P(A∨B) is less than P(A). To address this, these measures were separated and used to predict fallacy rates. The estimates from any participant that had produced a fallacy for a particular conjunction or disjunction were excluded, and the average PE(A∧B) − PE(A) or PE(B) − PE(A∨B) difference was calculated for the participants that had not produced the fallacy. For instance, if participants 3, 5, and 7 had produced a fallacy response for Cloudy vs. Cloudy∧Snowy, their estimates were removed and the PE(A∧B) − PE(A) value for Cloudy vs. Cloudy∧Snowy was then calculated for the participants that had produced no fallacy. This procedure was then repeated for each conjunction and disjunction. Table 2 displays the average difference calculated for the restricted set of estimates and how it relates to the conjunction and disjunction fallacy rate for that pair. Higher fallacy rates were observed when the differences were close to zero, while lower fallacy rates were observed where the differences were much lower than zero. Pearson's correlations were again calculated for estimate difference and fallacy rate. A very strong positive correlation was found for the restricted estimate difference and the conjunction fallacy rate, r = 0.96, p < 0.00001.
A strong positive relationship was also found for the disjunction rate and the restricted estimates, r = 0.78, p < 0.0005.

In general, greater variance was observed for the complex statements than for the constituents. The conjunctions were more variable than their constituent counterparts on 81% of occasions, while the disjunctions were more variable on 56% of occasions. In the probability group, 93% of the conjunctions were more variable than their constituents, while in the frequency group 69% of the conjunctions were more variable. For the disjunctions, the opposite pattern was observed, with 75% of the frequency group's disjunctions showing higher variance, while 38% of the probability group's disjunctions were more variable. Levene's test of homogeneity of variances² was used to determine if any of these were more variable at statistically significant levels. For the conjunctions, 13% were significantly more variable at the 0.05 level, while a further 13% were significant at the 0.1 level. For the disjunctions, Levene's test found that 13% were significantly more variable at the 0.05 level, while a further 10% were variable at the 0.1 level.³

To examine the relationship between probability estimates and variability in producing the fallacies, 95% confidence intervals were constructed for the constituent and complex item from the restricted estimates. Each instance of a fallacy response (where P(A∧B) > P(B) or P(A∨B) < P(A) was observed) was removed, and the confidence intervals were then constructed using the instances where no fallacy occurred. Tables 3 and 4 display these values, in addition to the degree to which the two confidence intervals overlapped. These results demonstrate that for high fallacy rates to occur there must be an overlap in the confidence intervals of the constituent and the complex statement: the constituent and conjunction or disjunction estimates must be close to each other. The closer the estimates get to each other, the more likely the fallacy is to result. Large negative overlaps result in very low fallacy rates, while overlaps around or above 0 result in fallacy rates of approximately 50%. The larger the positive overlap, the greater the fallacy rate. A strong positive correlation was observed between fallacy rate and confidence interval overlap, r = 0.778, p < 0.00001.

² A Shapiro–Wilk test for normality determined that Levene's test was the most appropriate measure for analysis of equality of variance.

³ Note that these differences in the degree of variance for conjunctions and disjunctions are consistent with the binomial variance model, where the variance in estimates for P(X) is a function of the value P(X)(1 − P(X)) (Eq. (6)); this value is only the same for conjunctions A∧B and disjunctions A∨B when P(A) + P(B) = 1 holds.

2.3. Testing averaging models of the conjunction fallacy

Finally, it is worth noting that results from this experiment pose a challenge for one type of heuristic-based approach to conjunctive probability estimation and the conjunction fallacy: an approach where conjunctive probability estimates are produced by averaging constituent probabilities. Approaches following this line initially proposed that the conjunction estimate was simply the mean of the two constituent probabilities (Carlson & Yates, 1989; Fantino, Kulik, Stolarz-Fantino, & Wright, 1997). More recently Nilsson and colleagues (Nilsson, Winman, Juslin, & Hansson, 2009) have proposed a more sophisticated configural cue model, where conjunctive probabilities are computed by a weighted average of the form

P(A∧B) = W min(P(A), P(B)) + (1 − W) max(P(A), P(B)),  0.5 ≤ W ≤ 1    (7)

where a higher weight is given to the lower constituent probability and a lower weight to the higher constituent.
Disjunctive probabilities are computed by an analogous weighted average

P(A∨B) = (1 − W) min(P(A), P(B)) + W max(P(A), P(B)),  0.5 ≤ W ≤ 1

but with the assignments of weights reversed, so that a lower weight is given to the lower constituent probability and a higher weight to the higher constituent. Note that these conjunctive and disjunctive probability values will satisfy the addition law and similar identities, and so this model is consistent with those results (Nilsson, Juslin, & Winman, 2014). Even with this configural weighting, however, the average of two numbers is always greater than the minimum of those two numbers and less than the maximum (except when the numbers are equal). This means that these averaging accounts predict that the conjunction probability will almost always be greater than the lower constituent probability: that the conjunction fallacy will occur for almost every conjunction (and that the disjunction probability will almost always be less than the higher constituent probability: that the disjunction fallacy will occur for almost every disjunction). This is clearly not the case: there are many conjunctions for which the fallacy does not occur at anything close to 100%.

Table 3
Confidence intervals for conjunctions (Experiment 1). The table below displays the confidence intervals for the conjunctions in Experiment 1. 95% confidence intervals were constructed using the restricted estimates for the constituents and conjunctions. From this, we could calculate how much the probability estimates overlapped for each pair. A positive value shows that the estimates for the constituent and conjunction typically overlapped; these were associated with higher fallacy rates. A negative value means that the estimates typically did not overlap; these were associated with low fallacy rates. A strong positive correlation was observed between fallacy rate and confidence interval overlap, r = 0.778, p < 0.00001.

Constituent   Conjunction         P(A) CI low   P(A) CI high   P(A∧B) CI low   P(A∧B) CI high   Overlap   Fallacy
Cloudy        Snowy and Cloudy    0.69          0.76           0.09            0.18             −0.51     4%
Rainy         Rainy and Warm      0.60          0.69           0.24            0.33             −0.27     6%
Sunny         Snowy and Sunny     0.30          0.37           0.04            0.10             −0.20     10%
Windy         Windy and Sunny     0.60          0.68           0.25            0.33             −0.27     11%
Cloudy        Windy and Cloudy    0.71          0.79           0.53            0.62             −0.09     15%
Snowy         Snowy and Sunny     0.30          0.37           0.04            0.10             −0.20     16%
Cloudy        Cloudy and Rainy    0.71          0.79           0.49            0.58             −0.13     21%
Sunny         Warm and Sunny      0.30          0.38           0.19            0.25             −0.05     21%
Rainy         Rainy and Cold      0.61          0.70           0.46            0.55             −0.06     22%
Warm          Warm and Sunny      0.29          0.37           0.17            0.24             −0.05     26%
Cold          Rainy and Cold      0.66          0.74           0.46            0.56             −0.10     27%
Rainy         Cloudy and Rainy    0.61          0.71           0.50            0.59             −0.02     28%
Warm          Rainy and Warm      0.31          0.41           0.15            0.24             −0.07     36%
Windy         Windy and Cloudy    0.64          0.74           0.50            0.60             −0.04     44%
Sunny         Windy and Sunny     0.32          0.43           0.19            0.27             −0.05     44%
Snowy         Snowy and Cloudy    0.07          0.21           0.03            0.10             0.03      48%

Table 4
Confidence intervals for disjunctions (Experiment 1). The table displays the confidence intervals for the frequency and probability groups in Experiment 1.
95% confidence intervals were constructed for the restricted constituent and disjunction estimates. A positive overlap shows the degree to which the estimates for the constituent and disjunction typically overlapped; higher fallacy rates were associated with a larger overlap. A negative value means that the estimates typically did not overlap; these were associated with low fallacy rates. A very strong positive correlation is observed between the CI overlap and fallacy rate, r = 0.874, p < 0.00001.

Constituent   Disjunction         P(B) CI low   P(B) CI high   P(A∨B) CI low   P(A∨B) CI high   Overlap   Fallacy
Snowy         Snowy or Cloudy     0.07          0.13           0.62            0.68             −0.49     5%
Warm          Rainy or Warm       0.26          0.33           0.64            0.71             −0.31     6%
Sunny         Windy or Sunny      0.28          0.34           0.60            0.68             −0.26     9%
Snowy         Snowy or Sunny      0.06          0.12           0.31            0.40             −0.19     9%
Windy         Windy or Cloudy     0.53          0.62           0.73            0.80             −0.11     22%
Rainy         Cloudy or Rainy     0.55          0.65           0.75            0.82             −0.10     24%
Warm          Warm or Sunny       0.23          0.31           0.36            0.46             −0.05     32%
Rainy         Rainy or Cold       0.52          0.62           0.67            0.76             −0.05     34%
Cold          Rainy or Cold       0.54          0.65           0.71            0.78             −0.06     35%
Sunny         Snowy or Sunny      0.26          0.35           0.37            0.47             −0.02     36%
Cloudy        Cloudy or Rainy     0.63          0.72           0.77            0.84             −0.05     37%
Sunny         Warm or Sunny       0.24          0.32           0.35            0.46             −0.03     38%
Rainy         Rainy or Warm       0.51          0.62           0.66            0.76             −0.04     39%
Cloudy        Windy or Cloudy     0.61          0.71           0.74            0.82             −0.03     44%
Windy         Windy or Sunny      0.49          0.59           0.63            0.73             −0.04     44%
Cloudy        Snowy or Cloudy     0.63          0.74           0.74            0.82             0.00      55%

To address this problem, Nilsson et al.'s model also includes a noise component that randomly perturbs conjunctive probability estimates, sometimes moving the conjunctive probability below the lower constituent probability and so eliminating the conjunction fallacy for that estimate (and similarly for disjunctions). Since this noise is random, it has at most a 50% chance of moving a conjunctive probability (produced by averaging) below its lower constituent probability. This 50% chance arises when constituent and conjunctive probabilities are equal: in all other cases the conjunctive probability is greater than its lower constituent, and so the chance of the conjunctive estimate falling below the constituent probability is necessarily less than 50%. This means that this noisy averaging model necessarily predicts that the conjunction fallacy will be predominant (occurring at rates of 50% or higher) for all conjunctions (see Nilsson et al., 2009, p. 521).

We can carry out a conservative assessment of this prediction by using the cumulative binomial test to ask whether the total number of conjunction fallacy occurrences observed in our experiment is consistent with the hypothesis that conjunction fallacy responses occur with a probability of 0.5 (the minimum probability predicted in this 'noisy averaging' account). Applying the cumulative binomial test to the total conjunction fallacy counts given in Table 1 (with N = 94, since there were 94 participants in total, and p = 0.5), we find that the total fallacy rates for 5 out of 8 conjunctions are inconsistent with the noisy averaging hypothesis at the 0.05 significance level, and 3 out of 8 are inconsistent at the p = 0.01 level. For the conjunction 'Rainy and Cold', for example, 36 out of 94 participants gave a conjunction fallacy response. Under the assumption that P(fallacy) = 0.5, the probability of observing a fallacy count of 36 or less in a sample of 94 responses is less than p = 0.05. Similar results hold for the disjunction fallacy.
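The noisy-averaging prediction tested here can also be seen in a small simulation. This sketch uses illustrative constituent probabilities, weight, and noise level (not the parameters of Nilsson et al.'s fitted model): conjunctive estimates are produced by the configural weighted average of Eq. (7) plus zero-mean noise, and the resulting fallacy rate stays at or above 50%:

    import numpy as np

    rng = np.random.default_rng(2)

    def averaging_fallacy_rate(p_a, p_b, w=0.8, noise_sd=0.1, trials=10_000):
        # Conjunctive estimate: configural weighted average (Eq. 7),
        # perturbed by zero-mean noise; count estimates that exceed
        # the lower constituent probability (a conjunction fallacy).
        conj = w * min(p_a, p_b) + (1 - w) * max(p_a, p_b)
        noisy_conj = conj + rng.normal(0.0, noise_sd, trials)
        return np.mean(noisy_conj > min(p_a, p_b))

    print(averaging_fallacy_rate(0.65, 0.35))   # ~ 0.73 here; at or above 0.5
                                                # for any pair of constituents

The lower constituent is treated as noise-free in this sketch for simplicity; since the averaged conjunctive value never falls below the lower constituent, symmetric noise cannot push the fallacy rate below 50%, which is the prediction being tested above.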
2.4. Experiment 1 discussion

As predicted by the PTN, the participants' estimates were consistent with probability theory (in terms of the addition law) while simultaneously deviating from probability theory (in terms of frequent occurrence of the conjunction and disjunction fallacies). Rates of occurrence of the conjunction and disjunction fallacy were closely connected to difference in estimates, variance, and overlap measures (just as predicted by that model). Results showed that participants are typically more variable across a range of conjunction and disjunction estimates than they are for constituents, while the range of fallacy rates observed for the pairings is in line with those observed in previous research (4–48% for conjunctions and 5–55% for disjunctions). This has important implications for the production of fallacies, as high fallacy rates seem to arise from a combination of comparably high variance in the conjunction or disjunction and probability estimates for the constituent that consistently overlap with the conjunction or disjunction probability estimates. Lower fallacy rates are typically observed where the estimates are far apart.

Unlike other findings in the literature, there was no significant observed difference between the frequency and probability groups in their estimates or fallacy rates. This was a consistent observation for both the conjunctions and disjunctions, with most pairs only differing by a few percentage points between the two response modes and no statistically significant difference found upon analysis. Previous research had suggested that fallacy rates could be manipulated by varying the response mode. However, the stimuli sets used in that research were more complex than the simple events used here, which may account for the observed difference.

Experiment 1 has established three important things: that participants can be consistent with probability theory and still commit the fallacies, that variance exists between question types for participants, and that participants can be variable for the same question types. However, most theories of variability emphasise not just that participants will be variable in their responses to different conjunction or disjunction problems, but that they will also be variable for the same conjunction or disjunction problem if it is presented to them repeatedly. To investigate this, participants must provide repeated estimates for the same stimulus, and their 'internal' variability must be examined in relation to their fallacy rates. This will allow us to examine whether variance in these responses arises under these conditions.

3. Experiment 2

This experiment sought to examine the variance in probability estimates for simple natural language estimation tasks as in Experiment 1. Here, we presented constituents, conjunctions and disjunctions repeatedly to participants, asking them for estimates of different types of weather events. Noise accounts of cognitive biases emphasise that biases result from internal noise; that is, a participant will give variable responses when asked for repeated estimates of the same event. In this experiment, each of the weather types was presented to participants repeatedly and in randomised order. Few experiments to date have looked at individual participant variability on the same probability judgements, so repeated judgements will allow us to examine both variability and consistency of fallacy production for each participant.
3.1. Materials and method

The materials consisted of two sets of questions about the likelihood of specific weather conditions on a given day. The sets were designed so that participants assessed weather conditions of high, medium and low likelihood. Set A had four constituents (Windy, Sunny, Snowy, Cloudy), three conjunctions (Windy and Sunny, Windy and Cloudy, Snowy and Cloudy), and three disjunctions (Windy or Sunny, Windy or Cloudy, Snowy or Cloudy). Set B also consisted of four constituents (Warm, Rainy, Cold, Sunny), three conjunctions (Warm and Sunny, Rainy and Cold, Rainy and Warm), and three disjunctions (Warm or Sunny, Rainy or Cold, Rainy or Warm). The questions about the likelihood of the weather conditions appeared on screen and participants submitted each estimate by moving a mark on a slider. Participants were asked 'What is the probability that the weather will be [weather type] on a randomly selected day in Ireland?' The slider had a minimum value of 0 and a maximum value of 100: an estimate of 0 meant zero chance of that particular weather occurring on a given day, and an estimate of 100 meant that the weather was certain to occur. To examine variability in estimates, each of the 10 items in a set was presented 5 times in randomised order, so each participant was asked for 50 probability estimates in total. Unlike experiment 1, participants were asked only for probability responses.

For this experiment, 87 participants were recruited from the student body in exchange for course credit. They were randomly assigned one of the two question sets, given a brief description of the task (assessing the likelihood of weather conditions on a given day), and informed that there was no time limit on task completion. Participants provided probability judgements for the weather statements that appeared on-screen and at no stage had access to their previous responses.

3.2. Results

In total, 6 participants failed to complete the task and were excluded from the final analysis. The results for the remaining 81 participants are given below.

Fig. 2. This figure shows a scatterplot of the positive and negative addition-law terms for all estimates and participants in experiment 2. Positive (P(A) + P(B)) and negative (P(A∨B) + P(A∧B)) terms of the addition law were calculated for each participant (by averaging each participant's 5 estimates for these terms for each pair A, B). The correlation between the pairs was r = 0.88, p < 0.00001. Normative probability is represented by the line of identity, shown in grey. A Deming regression was calculated to determine the best-fit line, represented by the dashed black line on the scatterplot.

3.2.1. Estimation and probability theory

We predict that participant estimates should be consistent with probability theory in terms of the addition law. Initially, averaged values for each of the A, B pairings were used to calculate the addition-law identity. Overall, as in experiment 1, the participants' estimates showed good compliance with the addition law: for all pairings, the values were close to the expected normative value of 0, showing only mild deviations above and below that value, with an overall mean value of 0.004.
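To make the check concrete: the addition law requires P(A) + P(B) − P(A∧B) − P(A∨B) = 0 for every pair of events. A minimal sketch of this computation (Python; the estimates dictionary holds hypothetical averaged estimates on the 0–1 scale, purely for illustration):

    # Addition-law identity: P(A) + P(B) - P(A AND B) - P(A OR B) should be 0.
    # 'estimates' is a hypothetical dict of one participant's averaged estimates.
    estimates = {"A": 0.55, "B": 0.30, "A_and_B": 0.22, "A_or_B": 0.61}

    deviation = (estimates["A"] + estimates["B"]
                 - estimates["A_and_B"] - estimates["A_or_B"])
    print(f"Deviation from the addition law: {deviation:+.3f}")  # 0 if normative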
Positive (P(A) + P(B)) and negative (P(A∨B) + P(A∧B)) terms of the addition law were calculated for each participant (by averaging each participant's 5 estimates for these terms for each pair A, B). For the addition law to hold, the points must be symmetrically distributed around the line of identity. Fig. 2 shows the relationship between these positive and negative terms. A Deming regression was calculated using the participant estimates to investigate whether the estimates were consistent with probability theory. As in experiment 1, values for the addition law are distributed approximately symmetrically around the line of identity, with the line of best fit agreeing closely with the line of identity, as predicted by our model. A JZS Bayes Factor analysis based on a paired t-test of x and y values in this scatterplot gave strong evidence in favour of the null hypothesis that x and y values were equal (scaled JZS Bayes Factor = 11.2), supporting the conclusion that the addition-law identity holds in individual participant probability estimates.

3.2.2. Addition law and fallacy rate

The PTN predicts that the fallacy rates will be related via the addition law, with the conjunction fallacy rate relative to A following the disjunction fallacy rate relative to B, for any pairing A, B. We see strong indications that this is the case in Table 5, with the fallacy rates for P(A∧B) vs P(A) and P(B) vs P(A∨B) strongly correlated across the participants' judgements, r = 0.841, p < 0.00001.

3.2.3. Variability in probability estimation

The PTN predicts that fallacy rates arise as a function of variance in the probability estimates, with higher variance observable in the conjunction and disjunction statements than in the constituents. Because each of the constituents, conjunctions and disjunctions was presented multiple times to participants, we were able to measure the variance for each event and type in the sample and examine how it relates to the observed fallacy rates. We tested this prediction first by calculating the overall estimate difference for each conjunction and disjunction and comparing it to the fallacy rate for that item, and then by comparing a restricted estimate difference to the fallacy rate. The overall estimate difference was calculated for each conjunction using PE(A∧B) − PE(A), and for each disjunction using PE(A) − PE(A∨B). This was then compared to the overall fallacy rate. A Pearson's r correlation found a strong positive relationship between average estimate difference and conjunction fallacy rate, r = 0.862, p < 0.0005, and a strong positive correlation between average estimate difference and disjunction fallacy rate, r = 0.86, p < 0.0005.

As in experiment 1, to test whether these correlations held across different sets of participants, we performed 100 random split-half correlations: dividing participants into two randomly chosen equal-sized halves, calculating conjunctive and disjunctive fallacy rates for each pair of events A, B in one half, calculating estimate differences for those pairs in the other half, and measuring the correlation between those measures. There was a strong positive relationship between average estimate difference and conjunction fallacy rate (average r = 0.83, min r = 0.74, p < 0.00001 in all cases) and between average estimate difference and disjunction fallacy rate (average r = 0.86, min r = 0.75, p < 0.00001 in all cases).
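A minimal sketch of one such split-half iteration (Python with NumPy/SciPy; the fallacy and difference arrays are random stand-in data, purely for illustration of the procedure):

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)

    # Hypothetical data: per-participant fallacy indicators and estimate
    # differences, one column per constituent/conjunction pair
    # (81 participants, 12 pairs; values are random for illustration).
    fallacy = rng.integers(0, 2, size=(81, 12))       # 1 = fallacy response
    difference = rng.normal(0.0, 0.2, size=(81, 12))  # PE(A AND B) - PE(A)

    # One split-half iteration: fallacy rates from one half of participants,
    # average estimate differences from the other half.
    order = rng.permutation(81)
    half_a, half_b = order[:40], order[40:]
    rates = fallacy[half_a].mean(axis=0)
    diffs = difference[half_b].mean(axis=0)
    r, p = pearsonr(diffs, rates)
    print(f"split-half r = {r:.2f}, p = {p:.4f}")

In the analysis reported above, this step was repeated 100 times with fresh random splits and the resulting correlations summarised.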
Each participant in the experiment gave 5 probability estimates for each constituent, each conjunction, and each disjunction. Individual conjunction and disjunction fallacy occurrences for a given constituent/conjunction pair were identified by comparing these repeated estimates in order (if a participant's first estimate for A∧B was greater than their first estimate for A, a fallacy was recorded; if their second estimate for A∧B was greater than their second estimate for A, a fallacy was recorded; and so on).

Table 5
Restricted estimate difference and fallacy rate. The table below displays the difference for each set of restricted estimates and its corresponding fallacy rate for experiment 2. To demonstrate that estimate difference can be used to predict the fallacy rate of a complex item, the two measures were separated as in experiment 1. A significant positive correlation of r = 0.98, p < 0.00001 was observed between the restricted estimate difference and the conjunction fallacy rate, and a significant positive correlation of r = 0.78, p < 0.00001 between the restricted estimate difference and the disjunction fallacy rate. The PTN predicts that the conjunction fallacy rate relative to A should follow the disjunction fallacy rate relative to B, for any pairing A, B. This arises as a natural consequence of the addition law: rearranging its terms gives P(A∧B) − P(A) = P(B) − P(A∨B). Below, we see strong indications that this is the case, with similar fallacy rates for P(A∧B) vs P(A) and P(B) vs P(A∨B) in the participants' judgements. The relative fallacy rates are strongly correlated, r = 0.841, p < 0.001.

PE(A∧B) − PE(A)                                          PE(B) − PE(A∨B)
Constituent  Conjunction    Avg Diff  Fallacy            Constituent  Disjunction    Avg Diff  Fallacy
Cloudy       Cloudy∧Snowy   −0.54     0%                 Snowy        Cloudy∨Snowy   −0.50     0%
Windy        Windy∧Sunny    −0.53     0%                 Sunny        Windy∨Sunny    −0.30     0%
Rainy        Rainy∧Warm     −0.36     2%                 Warm         Rainy∨Warm     −0.27     5%
Cloudy       Cloudy∧Windy   −0.13     13%                Warm         Sunny∨Warm     −0.13     10%
Sunny        Sunny∧Warm     −0.11     20%                Sunny        Sunny∨Warm     −0.10     17%
Windy        Cloudy∧Windy   −0.11     25%                Windy        Cloudy∨Windy   −0.18     20%
Warm         Rainy∧Warm     −0.12     27%                Cloudy       Cloudy∨Windy   −0.12     20%
Cold         Cold∧Rainy     −0.10     27%                Rainy        Cold∨Rainy     −0.16     24%
Warm         Sunny∧Warm     −0.08     27%                Cold         Cold∨Rainy     −0.11     32%
Sunny        Windy∧Sunny    −0.07     45%                Windy        Windy∨Sunny    −0.10     33%
Rainy        Cold∧Rainy     −0.11     49%                Rainy        Rainy∨Warm     −0.16     54%
Snowy        Cloudy∧Snowy   −0.01     65%                Cloudy       Cloudy∨Snowy   −0.02     63%

To address the possibility that this ordering may have influenced responses, we repeated the above split-half correlation test, but randomly shuffled the order of participants' repeated estimates for each constituent event. Even with this random shuffling of repeated responses, there remained a strong positive relationship between average estimate difference and conjunction fallacy rate (average r = 0.85, min r = 0.70, p < 0.00001 in all cases) and between average estimate difference and disjunction fallacy rate (average r = 0.85, min r = 0.74, p < 0.00001 in all cases).

Finally, to address the fact that the fallacy rate and estimate difference are connected, the two measures were separated, with estimate difference used to predict fallacy rate. Participants who had produced fallacies for a given conjunction or disjunction had those sets of estimates excluded from the calculation of estimate differences.
Then the difference PE(A∧B) − PE(A) was calculated for participants who did not produce a conjunction fallacy for a given conjunction, and PE(B) − PE(A∨B) was calculated for all participants who did not produce a disjunction fallacy for a given disjunction. The fallacy rate was calculated over all instances of estimates in the pair. A significant positive correlation of r = 0.98, p < 0.00001 was observed between the restricted estimate difference and the conjunction fallacy rate, and a significant positive correlation of r = 0.78, p < 0.005 between the restricted estimate difference and the disjunction fallacy rate. For differences greater than 0 we see fallacy rates greater than 50%; for differences around 0 we see fallacy rates close to 50%; and for differences less than 0 we see fallacy rates less than 50%. Table 5 displays the restricted differences and fallacy rates.

One prediction of the PTN model is that fallacies should occur inconsistently (that is, participants should be variable in their responses) when the calculated difference between the conjunction and constituent is zero. Inconsistent fallacy production occurred when a participant produced a fallacy response on 1, 2, 3 or 4 of the possible 5 occasions for a given weather type; a consistent response occurred when a participant produced either 0 or 5 fallacy responses on those five occasions. For the sample, 51% of responses were consistent and 49% were inconsistent. Each fallacy response and the corresponding average difference between the conjunction and constituent estimates were calculated; Fig. 3 shows the results. Participants who produced zero fallacy responses (darkest frequency distribution, at the back of the graph) had an average difference between the conjunction and constituent estimates of zero or less. Participants who produced five fallacy responses (lightest frequency distribution, at the front of the graph) had positive average differences. Participants with inconsistent responses had differences grouped around zero, with increasingly positive differences observed as more fallacy responses were made, just as predicted.

Variability. The total conjunction fallacy rate for the sample was 25%. A wide range of fallacy rates was observed for the conjunctions, from 0% to 65% depending on the constituent-conjunction pair. An overall disjunction fallacy rate of 23% was observed for the sample; as with the conjunctions, a wide range of fallacy rates was observed, from 0% to 63% depending on the constituent-disjunction pair. These results can be seen in Table 5. As in experiment 1, 95% confidence intervals were constructed for the constituents and complex items using the estimates where no fallacy had occurred, and the overlap between the two CIs was compared to the fallacy rates. High fallacy rates typically occurred when there was a positive overlap between the confidence intervals for the constituent and the complex item (see Tables 6 and 7). A strong positive correlation was observed between the degree of CI overlap and the fallacy rate for both the conjunction fallacy, r = 0.71, p < 0.01, and the disjunction fallacy, r = 0.79, p < 0.005.
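The overlap measure used throughout is simple to compute. A minimal sketch (Python with SciPy; the interval construction is a standard t-based 95% CI, and the example estimates are hypothetical):

    import numpy as np
    from scipy import stats

    def ci95(x):
        # Standard t-based 95% confidence interval for a mean.
        x = np.asarray(x, dtype=float)
        half = stats.t.ppf(0.975, len(x) - 1) * stats.sem(x)
        return x.mean() - half, x.mean() + half

    # Hypothetical restricted estimates for a constituent and a conjunction.
    constituent = [0.60, 0.55, 0.62, 0.58, 0.65]
    conjunction = [0.50, 0.57, 0.48, 0.55, 0.52]

    c_low, c_high = ci95(constituent)
    j_low, j_high = ci95(conjunction)

    # Positive overlap: the intervals intersect; negative: they are separated.
    overlap = min(c_high, j_high) - max(c_low, j_low)
    print(f"overlap = {overlap:+.3f}")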
Fig. 3. This graph shows the relationship between individual fallacy rate (the number of times a participant produced the conjunction fallacy for a given conjunction/constituent pair across the 5 repetitions of that pair), the average difference PE(A∧B) − PE(A) for that pair, and the frequency with which each fallacy-difference pair occurred in experiment 2. Individual fallacy rates go from 0 at the back of the graph (no fallacy occurrence) to 1 at the front of the graph (fallacy occurrence in all 5 presentations). Average differences PE(A∧B) − PE(A) and individual fallacy rates were calculated for each participant and each A, B pair; each individual block in this graph shows the total number of times, across all participants and pairs, that this difference fell into a given bin and that a given individual fallacy rate was produced. A consistent response occurred when the participant produced zero or five fallacy responses out of five repetitions for a conjunction. The PTN predicts that consistent no-fallacy responses will have negative average differences while consistent fallacy responses will have positive average differences. Average differences were 'binned' in blocks of 0.1, so, for instance, all estimate differences that fell between −0.05 and +0.05 were placed in the '0' bin. In the case of the 100% fallacy rate, a small number of positive average differences fell between 0 and +0.05 and hence were placed in the '0' bin. For fallacy rates between 0% and 100%, the average differences in estimates were most frequently found varying around 0 (the grey bars in the figure).

The further apart the constituent and conjunction or disjunction values were, the less likely we were to observe a fallacy; fallacies were much more likely to occur where the confidence-interval overlap approached or exceeded zero.

Individual variability. As participants had given multiple estimates for each constituent, conjunction and disjunction, we were able to assess each participant's individual variability. In total, each participant had 6 occasions on which we could compare constituent and conjunction variability (e.g. Cloudy vs Cloudy∧Snowy and Snowy vs Cloudy∧Snowy) and 6 occasions on which we could compare constituent and disjunction variability (e.g. Cloudy vs Cloudy∨Snowy and Snowy vs Cloudy∨Snowy). For the conjunctions, 35% of participants were more variable in their individual constituent estimates than in their conjunction estimates; the remaining 65% were equally or more variable in their individual conjunction estimates. For the disjunctions, 30% of participants were more variable in their individual constituent estimates than in their disjunction estimates; the remaining 70% were equally or more variable in their individual disjunction estimates. A summary of these results can be seen in Fig. 4, where individual variance is compared to fallacy rates. Fallacies are more likely to occur when the conjunction or disjunction is more variable than the constituent.

3.3. Experiment 2 discussion

As with experiment 1, we investigated whether participant judgements were in agreement with probability theory. Again, we found strong evidence that their estimates were in line with the addition law, with only minor deviations observed.
While the estimates were consistent with this aspect of probability theory, participants still produced both conjunction and disjunction fallacies at varying rates, depending on the question posed to them.

Table 6
Confidence intervals for conjunction estimates (Exp 2). As in experiment 1, the restricted estimates were used to calculate the 95% confidence intervals. A positive value for the overlap means that the estimates were typically close to each other, while a negative overlap means that the estimates were typically far apart. A reliable positive correlation was observed between fallacy rate and confidence-interval overlap, r = 0.71, p < 0.01.

P(A)     P(A∧B)           P(A) 95% CI      P(A∧B) 95% CI    Overlap   Fallacy
                          Low     High     Low     High
Cloudy   Cloudy∧Snowy     0.64    0.73     0.08    0.15     −0.49     0%
Windy    Windy∧Sunny      0.58    0.69     0.35    0.45     −0.13     0%
Rainy    Rainy∧Warm       0.53    0.65     0.18    0.27     −0.26     2%
Cloudy   Windy∧Cloudy     0.65    0.75     0.51    0.62     −0.03     13%
Sunny    Warm∧Sunny       0.31    0.40     0.20    0.28     −0.03     20%
Windy    Windy∧Cloudy     0.60    0.72     0.49    0.61     0.01      25%
Warm     Warm∧Sunny       0.27    0.36     0.19    0.28     0.01      27%
Cold     Rainy∧Cold       0.56    0.70     0.45    0.60     0.04      27%
Warm     Rainy∧Warm       0.25    0.35     0.14    0.23     −0.02     27%
Sunny    Windy∧Sunny      0.35    0.46     0.28    0.39     0.04      45%
Rainy    Rainy∧Cold       0.51    0.66     0.40    0.56     0.05      49%
Snowy    Cloudy∧Snowy     0.02    0.06     0.02    0.05     0.03      65%

Table 7
Confidence intervals for disjunction estimates (Exp 2). The 95% confidence intervals for the constituent-disjunction pairs in experiment 2. A positive overlap indicates that the estimates for the constituent-disjunction pair overlapped to some degree. A strong positive correlation is observed between the CI overlap and fallacy rate, r = 0.791, p < 0.005. The greater the negative overlap, the lower the fallacy rate; positive overlaps are associated with fallacy rates greater than 50%.

P(B)     P(A∨B)           P(B) 95% CI      P(A∨B) 95% CI    Overlap   Fallacy
                          Low     High     Low     High
Snowy    Cloudy∨Snowy     0.05    0.09     0.51    0.62     −0.42     0%
Sunny    Windy∨Sunny      0.34    0.42     0.64    0.72     −0.22     0%
Warm     Rainy∨Warm       0.26    0.34     0.52    0.63     −0.18     5%
Warm     Warm∨Sunny       0.26    0.33     0.38    0.48     −0.05     10%
Sunny    Warm∨Sunny       0.28    0.37     0.38    0.48     −0.01     17%
Windy    Windy∨Cloudy     0.55    0.68     0.75    0.84     −0.07     20%
Cloudy   Windy∨Cloudy     0.62    0.72     0.75    0.83     −0.03     20%
Rainy    Rainy∨Cold       0.50    0.63     0.67    0.77     −0.04     24%
Cold     Rainy∨Cold       0.52    0.64     0.65    0.75     −0.01     32%
Windy    Windy∨Sunny      0.52    0.65     0.63    0.74     0.02      33%
Rainy    Rainy∨Warm       0.39    0.56     0.57    0.71     −0.01     54%
Cloudy   Cloudy∨Snowy     0.53    0.70     0.55    0.72     0.15      63%

To examine the variance in people's probability estimates for the same items, we repeatedly presented participants with the same judgements and asked them to provide estimates on each occasion. This overwhelmingly demonstrated that participant judgements are noisy: estimates typically varied from one occasion to the next, and the complex statements had more variance than the constituents. Constituents were typically more variable where the fallacy rate was close to 0%, while conjunctions were typically more variable where high fallacy rates were observed. Disjunctions were usually more variable than their constituents regardless of fallacy rate. Additionally, high fallacy rates were commonly observed where the constituent and complex estimates were close to each other. This is consistent with the PTN, which predicts that fallacies arise when the higher variance in the conjunction pushes the conjunction estimate above the constituent estimate.
If the conjunction and constituent are close in value, this is more likely to happen. It has generally been found in the literature that high-low constituent pairings tend to produce the highest fallacy rates, and our results are consistent with this finding. However, here as in the literature, 'high' and 'low' are subjective labels for constituent probabilities, decided a priori by the researcher rather than based on an objective, observable probability value. Further research is needed on whether this pattern would hold if objectively high and low constituents were used.

Furthermore, we observed that participants are frequently inconsistent in producing fallacies for the same stimulus. For the conjunctions, nearly half of all estimate sets were inconsistent, i.e. participants produced a fallacy for some but not all of the repeated estimates for a given stimulus. A 100% fallacy rate for any stimulus was rare, with the majority of consistent responses being at the 0% fallacy rate. If participants were producing their estimates using a heuristic-based approach, we would expect them to consistently produce, or consistently avoid, a fallacy across the repeated estimates.

Analysis of the relationship between fallacy rates and variance in probability estimates demonstrated that fallacies typically occurred where the conjunction or disjunction was more variable than the constituent, and complex statements were typically more variable than constituents. Fallacy rates are a product of both variance in the estimates and the 'true' probability values of the stimuli.

Fig. 4. Fallacy rate and individual variance. The relationship between the difference in variance and the fallacy rate for individual estimates in experiment 2. Each participant gave multiple estimates for the same constituent, conjunction and disjunction, so individual fallacy rates and variance in probability estimates could be calculated for each participant. Fallacies typically occurred when there was a positive overlap in confidence intervals and when there was a positive difference in variance, that is, when the complex item was more variable than the constituent. Low fallacy rates were more likely when there was a negative difference in variance or no overlap between constituent and complex CIs. We observe that, for fallacies to occur, the conjunction or disjunction is typically more variable than the constituent.

Little research has been done where the objective probability is known or even available to researchers. Stimuli such as the 'Linda problem' have only subjective probabilities. For other research that uses real-world events, like the weather events used here or future sporting events (e.g. Teigen, Martinussen, & Lund, 1996), it might be possible to calculate objective values, but at best these are non-stationary, and it is typically truer to say that these stimuli have subjective probability values too. To fully understand the role of probability in producing estimates, its impact on fallacy rates, and whether participants are objectively skilled reasoners, participants must produce estimates for stimuli that have accessible, objective probability values. In situations where participants produce fallacies at varying rates but we have no access to objective probabilities, we cannot fully determine why this range of results exists. We investigate this in experiment 3.
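Before turning to that experiment, it is worth making explicit the quantitative prediction that motivates it. If, as the binomial-proportion account assumes, an estimate behaves like a proportion computed from N noisy samples of the event (N being a free parameter of the model, not something measured in these experiments), then the variance of an estimate of a true probability p is

\[
\operatorname{Var}(\hat{p}) = \frac{p(1-p)}{N},
\]

which is maximal at p = 0.5 and approaches zero as p approaches 0 or 1. Variability in estimates should therefore depend on the probability value of the stimulus itself; this is the prediction examined in experiment 3.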
4. Experiment 3

In experiment 3, we investigated the impact of two factors on variability: probability values and sample size. Typically, research on the conjunction and disjunction fallacies has not used experimental stimuli with observable objective probabilities: most research to date has employed stimuli with subjective probabilities, e.g. scenarios about people. The binomial model predicts that variance in estimates is a function of the probability values of the stimuli. Here, we presented participants with simple judgements where the underlying, objective probability was controlled, allowing us to examine how variability of estimates relates to probability value. In addition, we examined the role of sample size in probability estimation and variability: participants were presented with stimuli that preserved the underlying probabilities while varying the sample size.

To examine the internal variability of participants, we presented them with repeated probability judgements. They saw images, each containing a set number of shapes differing in colour (red, white or green) and configuration (solid or hollow). For each image, participants were asked to estimate the probability of an event (a randomly selected shape being red, for example). The true probabilities of events in these images were held constant across multiple presentations (with the images themselves varying as to the position of the shapes on the screen each time), as described below. Each participant saw multiple presentations of the same probability question (multiple questions for which the objectively correct probability was the same), allowing us to estimate the degree of random variation in participant estimates. Some questions asked about simple events (a shape being red, being hollow, etc.) while others asked about conjunctive and disjunctive events (a shape being red and solid, a shape being white or hollow, etc.). Two distinct sets of images were used, with objective probabilities held constant within each set (see below); images from the two sets were interspersed with each other. Participants answered questions about 460 images in total. Images were on screen only for a short time (2 s), so participants did not have time to count the occurrence of shapes of different types. Images were presented in randomised order.

4.1. Materials

The images consisted of shapes of three colours (C1, C2, and C3) and two shape configurations (S1 and S2) with fixed probabilities. To prevent participants from remembering or recognising images after multiple repetitions, the actual colour varied from image to image: sometimes colour C1 was white, sometimes red and sometimes green, but the objective probability value assigned to C1 remained the same. The colours varied in the same way for C2 and C3. The actual configuration of the shapes also varied from image to image, so sometimes configuration S1 was the solid shapes and sometimes the hollow shapes; as with the colours, the objective probabilities were held constant. Conjunctions and disjunctions were created for a number of combinations of colour and configuration, such as P(C1∧S1), P(C1∨S2) and P(C2∧S1).
For each type (C1, C2, C3, S1, S2, C1∧S1, C1∨S2, etc.) there were 20 images asking participants to estimate the probability of that type. In practice, this meant that participants saw 20 images asking them to estimate the probability of colour C1, 20 images asking them to estimate the probability of colour C2, 20 images asking them to estimate the probability of configuration S1, and so on. Each image presentation included a question to elicit a probability judgement. For the colour questions, participants were presented with questions of the form 'What is the probability of picking a shape that is [colour C1]?' or 'What is the probability of picking a shape that is [colour C2]?' For the configuration questions, questions took the form 'What is the probability of picking a shape that is [configuration S1]?' or 'What is the probability of picking a shape that is [configuration S2]?' The conjunction and disjunction questions took the same form: for instance, the question eliciting a judgement for the objective probability of 0.63 in set 1 was 'What is the probability of picking a shape that is [colour C1 AND configuration S1]?'

4.1.1. Set 1 - probability values

The stimuli in set 1 were designed to investigate how probability values affect variability. In set 1, colour C1 had a fixed probability of 0.7, colour C2 a fixed probability of 0.2, and colour C3 a fixed probability of 0.1. Configuration S1 had a fixed probability of 0.9 and configuration S2 a fixed probability of 0.1. The conjunctions for set 1 were created using the colour and configuration combinations P(C1∧S1), P(C2∧S1), and P(C2∧S2), corresponding to the objective probability values 0.63, 0.18 and 0.02. The disjunctions for set 1 were created using the combinations P(C1∨S1), P(C2∨S1), P(C1∨S2), and P(C2∨S2), corresponding to the objective probability values 0.97, 0.92, 0.73, and 0.28. Participants viewed 220 images of 20 geometric shapes on a computer screen, each with a probability question. Colour C3 was excluded from the probability questions for this set (as both C3 and S2 had the objective probability value 0.1).

4.1.2. Set 2 - sample size

Set 2 was designed to investigate how sample size affects probability estimation and variability. To this end, the probability values were fixed at each level. For set 2, colours C1, C2 and C3 each had a fixed probability of 0.333, and configurations S1 and S2 each had a fixed probability of 0.5. The conjunctions for set 2 had the objective value 0.17 (any combination of C1, C2, C3 with S1 or S2 gives this value), and the disjunctions had the objective value 0.67 (again, any such combination gives this value). Participants viewed 240 images of geometric shapes on a computer screen. Each image consisted of 12, 24, or 36 shapes (levels 1, 2, and 3 respectively). Each of the objective probability values 0.333, 0.5, 0.17, and 0.67 was presented 20 times for each of the 12-, 24- and 36-shape images.
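As a check on these design values: the conjunction and disjunction probabilities quoted for set 1 are exactly those implied by treating colour and configuration as independent, so they follow directly from the constituent probabilities:

\[
\begin{aligned}
P(C1 \wedge S1) &= 0.7 \times 0.9 = 0.63, & P(C1 \vee S1) &= 0.7 + 0.9 - 0.63 = 0.97,\\
P(C2 \wedge S1) &= 0.2 \times 0.9 = 0.18, & P(C2 \vee S1) &= 0.2 + 0.9 - 0.18 = 0.92,\\
P(C2 \wedge S2) &= 0.2 \times 0.1 = 0.02, & P(C1 \vee S2) &= 0.7 + 0.1 - 0.07 = 0.73,\\
& & P(C2 \vee S2) &= 0.2 + 0.1 - 0.02 = 0.28.
\end{aligned}
\]

The same arithmetic gives the set 2 values: 0.333 × 0.5 ≈ 0.17 for the conjunctions and 0.333 + 0.5 − 0.17 ≈ 0.67 for the disjunctions.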
At the bottom of each image was a question asking participants about the probability of some event (shape, colour, or shape/colour conjunction) given the sample shown in the image. This was followed by a slider scale: participants moved the bar on this scale to select their estimated probability for the event in question. A box to the right showed the currently selected probability, paired with a button labelled 'next': clicking that button recorded the participant's probability estimate and moved the participant on to the next screen (see Fig. 5). For ease of use, the slider remained where the participant had placed it as the participant moved on to the next screen.

Fig. 5. Example stimulus image for experiments 3 and 4. The figure displays an example stimulus image from set 1 in experiment 3, in greyscale. While the shape types and colours changed between images, the underlying proportions remained constant. The image shown has a configuration split of 0.9 solid and 0.1 hollow shapes; the colours have fixed probabilities of 0.7, 0.2 and 0.1.

4.2. Procedure

Participants were seated at a screen. Each participant began with a training trial of sample stimuli to familiarise themselves with the task; training trials used different probability combinations from the main experiment. Once participants were comfortable with the task, they moved on to the experimental trials. The static image and the probability question appeared on screen simultaneously. The image was replaced with a blank screen once 2 s had elapsed, to prevent participants from counting the shapes; the associated question remained on-screen until the participants had made their estimate. Participants indicated their estimate by moving a mark on a slider using the mouse or arrow keys. The slider had a minimum value of 0 and a maximum value of 1, and responses were discretised. A box in the corner indicated the exact value of the participant's estimate and updated dynamically as they moved the slider. When participants were satisfied with their answer, they submitted it by clicking a 'Next' button, which also triggered the succeeding image and probability question.

4.3. Results

A total of 9 participants made 460 probability judgements each; their responses and response times were recorded for each judgement. Two participants were excluded from the final analysis for failing to answer over 20% of the questions. This number of participants is consistent with other studies of probability perception (e.g. Gallistel, Krishan, Liu, Miller, & Latham, 2014).

4.3.1. Estimation and probability theory

To test whether there is evidence of normative reasoning in the participants' estimates, we employed the addition law in a variety of ways. The estimates for A, B, and their conjunction and disjunction combinations were used to calculate the addition-law values as in the previous experiments; for estimates compliant with probability theory, the terms should cancel to zero. The addition law was calculated for estimates in both set 1 and set 2. In set 1, the addition law was calculated for the following combinations (footnote 4):

P(C1) + P(S1) − P(C1∧S1) − P(C1∨S1) = 0
P(C2) + P(S1) − P(C2∧S1) − P(C2∨S1) = 0
P(C2) + P(S2) − P(C2∧S2) − P(C2∨S2) = 0

Consistent with the previous experiments, the identities were close to zero, varying minutely around that value; an overall value of 0.037 was found for the sample. Fig. 6 shows the scatterplot of the positive and negative terms of these addition-law identities in participant responses.
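The Deming regression used to assess this relationship (here and in the earlier experiments) has a simple closed form when the error variances in x and y are assumed equal. A minimal sketch with hypothetical data:

    import numpy as np

    def deming_fit(x, y):
        # Deming regression with equal error variance in x and y
        # (equivalent to orthogonal / total least squares).
        x, y = np.asarray(x, float), np.asarray(y, float)
        sxx = np.var(x, ddof=1)
        syy = np.var(y, ddof=1)
        sxy = np.cov(x, y, ddof=1)[0, 1]
        slope = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
        intercept = y.mean() - slope * x.mean()
        return slope, intercept

    # Hypothetical positive and negative addition-law terms.
    pos = [0.9, 1.1, 0.7, 1.3, 1.0]
    neg = [0.88, 1.12, 0.72, 1.27, 1.02]
    print(deming_fit(pos, neg))  # near slope 1, intercept 0 if the law holds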
A Deming regression was calculated using the participant estimates to investigate whether the estimates agreed with the addition-law prediction. As in experiment 1, values for the addition law are distributed approximately symmetrically around the line of identity, with the line of best fit agreeing closely with the line of identity, as predicted by our model. A JZS Bayes Factor analysis based on a paired t-test of x and y values in this scatterplot gave strong evidence in favour of the null hypothesis that x and y values were equal (scaled JZS Bayes Factor = 25.5), again confirming that the addition-law identity holds in individual participant probability estimates.

4.3.2. Addition law and fallacy rate

The PTN predicts that the fallacy rates will be related via the addition law: the conjunction fallacy rate relative to A should follow the disjunction fallacy rate relative to B, for any pairing A, B. This arises as a natural consequence of the addition law. We see strong indications that this is the case, with the fallacy rates for P(A∧B) vs P(A) and P(B) vs P(A∨B) strongly correlated across the participants' judgements in set 1, r = 0.91, p < 0.0005, and set 2, r = 0.832, p < 0.001. This can be observed in Table 8.

4.3.3. Estimate accuracy

Representativeness accounts typically posit that participants are poor estimators of probability. Here, we can investigate how accurate judgements are by comparing them to the objective probabilities. For each of the 11 probability values in set 1, each participant gave 20 estimates. In set 2, the 4 probability values were questioned at 3 different levels, with 20 estimates given for each probability value at each level. The relationship between mean probability estimates and objective probability is displayed in Fig. 7. For each probability value, each participant's average estimate and standard deviation were calculated, as were the average estimate and standard deviation for the sample, and the average deviation from the true probability in percentage points. Some noticeable trends were observed: participants tended to overestimate the low probabilities and underestimate the higher probabilities, and the degree of overestimation for the low constituents was much less than for the low complex statements. For instance, the constituent with a true probability of 0.1 had an average estimate of 0.13, while the conjunction with a true probability of 0.02 had an average estimate of 0.14. Overall, conjunctions were overestimated and disjunctions underestimated: the conjunctions deviated from their objective values by 10 percentage points on average and the disjunctions by 17 percentage points, while the constituents deviated by 7 percentage points on average. Fig. 7 shows the average estimate for each type. For set 2, the conjunctions were overestimated on all occasions, with the average estimate increasing as the stimulus set became more complex. The disjunctions were consistently underestimated, and participants were more accurate in their estimates for the constituents. The 12-shape combinations had the lowest average estimates, the 24-shape estimates were higher than the 12-shape and lower than the 36-shape estimates, and the 36-shape images had the highest mean estimates.

Footnote 4: No estimates were elicited for P(C1∧S2), so the addition law could not be calculated for the combination of P(C1) and P(S2).

Fig. 6. This figure shows a scatterplot of the positive and negative addition-law terms for all estimates and participants in experiment 3. The positive term is the sum of the P(A) and P(B) estimates, while the negative term is the sum of the P(A∨B) and P(A∧B) estimates.
The correlation between the pairs for set 1 was r = 0.794, p < 0.00001; for set 2 it was r = 0.79, p < 0.00001. As the two sets were very similar in their estimates, they were collapsed for the scatterplot. Normative probability is represented by the line of identity, shown in grey; a Deming regression was calculated to determine the best-fit line, shown in black. For the addition law to hold, the points must be symmetrically distributed around the line of identity.

Table 8
Restricted estimate difference and fallacy rate. The table below displays the difference between conjunction, disjunction and constituent estimates and the corresponding fallacy rate for both sets in experiment 3. Estimate difference was found by calculating PE(A∧B) − PE(A) for each conjunction-constituent pair and PE(B) − PE(A∨B) for each disjunction-constituent pair. Differences approaching 0 were observed with fallacy rates above 50%, while negative differences were associated with fallacy rates less than 50%. The PTN predicts that the conjunction fallacy rate relative to A should follow the disjunction fallacy rate relative to B, for any pairing A, B. If participant judgements are consistent with this prediction, then we should see analogous responses to conjunctions and disjunctions; for example, when the conjunction fallacy rate versus A is low, the disjunction fallacy rate versus B should be low. This arises as a natural consequence of the addition law: rearranging its terms gives P(A∧B) − P(A) = P(B) − P(A∨B). Below, we see strong indications that this is the case, with significant correlations for both set 1, r = 0.91, p < 0.05, and set 2, r = 0.83, p < 0.05.

PE(A∧B) − PE(A)                               PE(B) − PE(A∨B)
PO(A)   PO(A∧B)  Avg Diff  Fallacy            PO(B)   PO(A∨B)  Avg Diff  Fallacy
Set 1
–       –        –         –                  0.1     0.73     −0.66     0%
0.9     0.18     −0.70     0%                 0.2     0.92     −0.62     0%
0.2     0.02     −0.10     14%                0.1     0.28     −0.14     6%
0.9     0.63     −0.11     15%                0.7     0.97     −0.12     17%
–       –        –         –                  0.7     0.73     −0.11     18%
0.2     0.18     −0.10     19%                0.2     0.28     −0.11     36%
0.1     0.02     −0.04     42%                0.9     0.92     −0.09     45%
0.7     0.63     −0.06     68%                0.9     0.97     −0.08     71%
Set 2
0.5†    0.17     −0.27     4%                 0.33†   0.67     −0.23     14%
0.5§    0.17     −0.23     4%                 0.33‡   0.67     −0.20     20%
0.5‡    0.17     −0.22     11%                0.33§   0.67     −0.23     22%
0.33‡   0.17     −0.19     13%                0.5‡    0.67     −0.16     23%
0.33§   0.17     −0.19     13%                0.5§    0.67     −0.15     24%
0.33†   0.17     −0.16     19%                0.5†    0.67     −0.15     40%
Note: † 12 shapes, ‡ 24 shapes, § 36 shapes.

4.3.4. Variability in probability estimation

As in the previous experiments, the average estimate difference between each complex item and its constituents was calculated and compared with the fallacy rate for that item. Significant positive correlations were observed for both the conjunction average difference and fallacy rate, r = 0.66, p < 0.05, and the disjunction average difference and fallacy rate, r = 0.73, p < 0.01. A consistent relationship was observed between the average difference and the fallacy rate, with higher fallacy rates associated with positive average differences and lower fallacy rates associated with negative differences.
Fig. 7. This graph displays the average probability estimate vs the objective probability value, by type, for both sets in experiment 3. Any value falling above the line represents an overestimate of the probability value, while values falling below the line represent underestimates of the true value. Largely, conjunctions were overestimated and disjunctions underestimated; constituents tended to be estimated accurately. Note: values from set 1 are represented by white shapes. In set 2, the values for the 12-shape estimates are shown in black, the values for the 24-shape estimates in dark grey, and the values for the 36-shape estimates in light grey.

As before, the restricted estimate difference was calculated and used to predict fallacy rates: the fallacy rate was calculated for each pair, and any instance where a participant had made the fallacy was excluded from the calculation of the average difference. The results can be seen in Table 8. There was a significant positive correlation between the restricted estimate difference and fallacy rate for conjunctions, r = 0.57, p = 0.05, and for disjunctions, r = 0.63, p < 0.05.

Each conjunction and constituent was presented 20 times to each participant. To evaluate the rate at which a participant committed the conjunction fallacy, each conjunction judgement 1…20 was matched in order with its corresponding constituent judgements 1…20, so the first conjunction judgement was matched with the first constituent judgements, and so on. If a particular conjunction judgement exceeded the estimate of either of the corresponding constituents, an instance of the conjunction fallacy was recorded. For each participant, there were six conjunction questions on which the fallacy could be committed, three from set 1 and three from set 2. The average conjunction fallacy rate was 19%. Fallacy rates ranged from 0% to 68% per constituent-conjunction pair, a range in line with those seen in description-based studies (e.g. Stolarz-Fantino et al., 2003). The set-up of this experiment allows us to categorise conjunctions based on their actual probabilities and their underlying constituent probabilities. Participants showed marked differences in performance for each of the six conjunctions they were presented with; Table 8 displays the fallacy-rate breakdown by conjunction type.

As with the conjunction fallacy, each disjunction judgement was matched in sequence with the corresponding constituent judgements, so the first disjunction judgement was matched with the first instances of the relevant constituent judgements. If a disjunctive estimate was less than either of its constituent estimates, it was counted as an instance of the disjunction fallacy. The average disjunction fallacy rate was 24%, with rates ranging from 0% to 71%, consistent with results from description-based research and with simulations of the PTN. The average fallacy rate for each of the 7 possible disjunctions is displayed in Table 8. As for the conjunctions, the objective probability value of the disjunction was not an indicator of fallacy occurrence. Conjunction and disjunction fallacy occurrence varied over the course of presentation, but there was no obvious trend of improvement or deterioration in participants' ability to avoid committing the fallacies (that is, fallacy rates did not decline with task familiarity).
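A minimal sketch of this ordered-matching procedure (Python with NumPy; the estimate arrays are hypothetical stand-ins for one participant's 20 repeated judgements):

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical repeated estimates for one participant (20 presentations each).
    est_A    = rng.normal(0.70, 0.08, 20).clip(0, 1)  # constituent A
    est_B    = rng.normal(0.90, 0.05, 20).clip(0, 1)  # constituent B
    est_conj = rng.normal(0.63, 0.12, 20).clip(0, 1)  # conjunction A AND B

    # Match judgements in order: the i-th conjunction estimate is compared with
    # the i-th estimates of its constituents; a conjunction fallacy is recorded
    # whenever the conjunction estimate exceeds either constituent estimate.
    fallacies = (est_conj > np.minimum(est_A, est_B)).sum()
    print(f"conjunction fallacies: {fallacies} of 20")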
In experiment 2, participants gave 5 repeated estimates of the same probability question, so we could measure the internal variance and the consistency of fallacy production for each participant. In experiment 3, participants gave 20 estimates for each objective probability. Here, an inconsistent response occurred when a participant produced a fallacy on 1–19 of the possible occasions for a conjunction or disjunction; a consistent response occurred when a participant produced a fallacy on 0 or 20 of the occasions. Fallacy response rates were calculated for each participant, along with the average difference in estimates between the conjunction and constituent. These results are displayed in Fig. 8. Participants with low fallacy rates typically had negative differences in estimates, with increasingly positive estimate differences as the fallacy rate rose. The maximum number of fallacies committed by any participant was 17 (of a possible 20). In total, 27% of the fallacy responses were consistent, with a participant either producing a fallacy in all responses for a given item or in no responses for that item (all consistent responses involved no fallacy production), and 73% of the responses were inconsistent (with the same participant sometimes producing fallacy responses for a given item and sometimes not).

Fig. 8. This graph displays the inconsistent fallacy production by the participants in experiment 3. Each participant gave 20 estimates for each constituent-conjunction pair. Calculation of individual fallacy rate, and binning of average differences, was as described in Fig. 3. The PTN predicts that inconsistent estimates should be grouped around zero, with increasingly positive differences as the rate of fallacy production increases, while consistent estimates will have negative average differences for those producing zero errors and positive average differences for those producing twenty errors. Typically the error rate for fallacies was low; however, a large difference could be observed in the rates depending on the underlying probability. The majority of responses were inconsistent. The consistent responses here were all 0-fallacy responses, as no participant made more than 17 fallacy responses for a given conjunction.

Variance. Since each conjunction, disjunction and constituent was presented 20 times to each participant, we can estimate the degree of variance (standard deviation) in estimates for each type. Recall that the PTN model predicts greater variance for the complex combinations than for the constituents. The average SDs revealed that the conjunctions were noisier than their constituent counterparts in 75% of the comparisons; in a breakdown by participant, the conjunctions were more variable on 33% to 75% of occasions, depending on the participant. The average SDs for the disjunctions showed that they were more variable than their constituent counterparts in 100% of the comparisons; this ranged from 64% to 100% of occasions, depending on the participant. This supports the PTN model assumption that conjunction and disjunction fallacies arise from variability in conjunction and disjunction estimates. Overall, the complex combinations had higher average standard deviations than the constituents, and Levene's test found statistical significance in 62% of the comparisons. The conjunctions were found to have statistically higher levels of variance on 17% of occasions, with the constituents statistically more variable than the conjunctions on 8% of occasions. For the disjunctions, Levene's test found significantly higher levels of variance than their constituents in 93% of the comparisons.
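Levene's test for equality of variances is available directly in SciPy. A minimal sketch comparing one participant's constituent and conjunction estimates (hypothetical data, as in the earlier sketches):

    import numpy as np
    from scipy.stats import levene

    rng = np.random.default_rng(2)

    # Hypothetical repeated estimates: the conjunction is given more spread.
    constituent = rng.normal(0.70, 0.05, 20).clip(0, 1)
    conjunction = rng.normal(0.60, 0.12, 20).clip(0, 1)

    # Levene's test: null hypothesis of equal variances across the two samples.
    stat, p = levene(constituent, conjunction)
    print(f"Levene W = {stat:.2f}, p = {p:.4f}")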
95% confidence intervals were constructed for each conjunction-constituent and disjunction-constituent pair. The overlap between the confidence intervals, the difference in their respective SDs, and the fallacy rate can be seen in Tables 9 and 10. A significant positive correlation was observed between the CI overlap and the conjunction fallacy rate, r = 0.598, p < 0.05, and between the CI overlap and the disjunction fallacy rate, r = 0.65, p < 0.05. Higher fallacy rates were observed where the constituent and complex statements were close to each other.

Individual variability. As in experiment 2, the nature of experiment 3 (repeated elicitations of probability estimates) allows us to examine individual variability in participant estimates. In total, there were 12 constituent-conjunction comparisons for each participant, e.g. PE(C1) vs PE(C1∧S1) and PE(S1) vs PE(C1∧S1), and 14 occasions on which constituent and disjunction variability could be compared, e.g. PE(C1) vs PE(C1∨S2) and PE(S2) vs PE(C1∨S2). The relationship between variance and fallacy rates for individual participants is displayed in Fig. 9. Typically, higher fallacy rates were associated with positive differences in SD, that is, with the complex statement being more variable than the constituent.

Table 9
Confidence intervals for conjunction estimates (Exp 3). The 95% confidence intervals for the constituent-conjunction pairs in experiment 3, based on the restricted estimates. A positive value for the overlap means that the estimates were typically close to each other, while a negative overlap means that the estimates were typically far apart. A positive correlation between the restricted CI overlap and fallacy rate was observed, r = 0.598, p < 0.05.

PO(A)   PO(A∧B)   PE(A) 95% CI     PE(A∧B) 95% CI    Overlap   Fallacy
                  Low     High     Low     High
0.9     0.18      0.84    0.87     0.15    0.17      −0.67     0%
0.5†    0.17      0.46    0.49     0.18    0.22      −0.24     4%
0.5§    0.17      0.48    0.50     0.24    0.27      −0.20     4%
0.5‡    0.17      0.43    0.46     0.22    0.25      −0.18     11%
0.33‡   0.17      0.39    0.42     0.21    0.23      −0.16     13%
0.33§   0.17      0.41    0.45     0.23    0.26      −0.15     13%
0.2     0.02      0.21    0.24     0.11    0.13      −0.08     14%
0.9     0.63      0.87    0.89     0.75    0.78      −0.09     15%
0.33†   0.17      0.32    0.36     0.16    0.19      −0.13     19%
0.2     0.18      0.22    0.25     0.13    0.14      −0.08     19%
0.1     0.02      0.13    0.16     0.10    0.12      −0.02     42%
0.7     0.63      0.75    0.80     0.68    0.74      −0.01     68%
Note: † 12 shapes, ‡ 24 shapes, § 36 shapes.

4.4. Experiment 3 discussion

As we observed in experiments 1 and 2, the participant estimates in this experiment were consistent with the addition law of probability theory, showing only the mild deviations observed in the other experiments. Despite the different stimuli used compared to the previous experiments (estimates for language statements vs estimates for visual stimuli), participants still produced estimates consistent with this aspect of normative reasoning. From this, we can assume that the reasoning processes employed in the two scenarios are consistent and the results are comparable.
The set-up of this experiment is relatively novel in work on cognitive biases: to our knowledge, only a small number of studies (e.g. Wedell & Moro, 2008) have investigated conjunction or disjunction fallacies where the underlying probability was known, and none have explicitly looked at variance in those responses. It puts us in a position to examine how participant estimates relate to objective values, how fallacy rates are influenced by probability values, and how sample size affects estimation. Overall, participants were most accurate for the constituents, and more accurate for the conjunctions than for the disjunctions. Fallacy rates observed here (0–68% for conjunctions, 0–71% for disjunctions) are in line with those observed in other conjunction/disjunction studies.

We addressed the observation in the literature that 'high-low' constituent pairings produce the highest fallacy rates; we found that this is not the case with objective probabilities. For instance, the constituent 'high-low' pairing of 0.9 and 0.2 produced fallacy rates of only 0% and 19% respectively. In fact, the difference between the constituent and the objective conjunction or disjunction value was a much better indicator of fallacy rate: the closer (in probability value) the constituent was to the conjunction or disjunction, the higher the resulting fallacy rate was likely to be. As in the experiments with description-based stimuli, average estimate difference was a good predictor of fallacy rate, with positive correlations observed for both the conjunctions and the disjunctions. However, this brings into question the exact role of probability values in fallacy responses: if average difference and variance are very good predictors of fallacy rates, then it is possible that probability values play no direct role in fallacy rates, and that fallacies are due entirely to the higher variance in the complex item and the absolute difference between the two values.

Table 10
Confidence intervals for disjunction estimates (Exp 3). The 95% confidence intervals for the constituent-disjunction pairs in experiment 3. The restricted participant estimates were used to calculate the confidence intervals. The larger the negative overlap, the smaller the fallacy rate was likely to be; as the overlap approached 0, the fallacy rate increased. A reliable positive correlation is observed between the restricted CI overlap and fallacy rate, r = 0.65, p < 0.05.

PO(B)   PO(A∨B)   PE(B) 95% CI     PE(A∨B) 95% CI    Overlap   Fallacy
                  Low     High     Low     High
0.2     0.92      0.21    0.23     0.82    0.87      −0.58     0%
0.1     0.73      0.12    0.14     0.78    0.81      −0.64     0%
0.1     0.28      0.12    0.14     0.24    0.28      −0.11     6%
0.33†   0.67      0.30    0.33     0.52    0.57      −0.19     14%
0.7     0.97      0.69    0.73     0.81    0.84      −0.09     17%
0.7     0.73      0.69    0.73     0.81    0.84      −0.08     18%
0.33‡   0.67      0.37    0.40     0.56    0.60      −0.16     20%
0.33§   0.67      0.37    0.41     0.59    0.64      −0.18     22%
0.5‡    0.67      0.40    0.44     0.55    0.60      −0.11     23%
0.5§    0.67      0.46    0.49     0.60    0.65      −0.12     24%
0.2     0.28      0.18    0.20     0.27    0.32      −0.07     36%
0.5†    0.67      0.42    0.47     0.57    0.62      −0.11     40%
0.9     0.92      0.81    0.86     0.91    0.94      −0.05     45%
0.9     0.97      0.75    0.83     0.86    0.89      −0.03     71%
Note: † 12 shapes, ‡ 24 shapes, § 36 shapes.

Fig. 9. This graph shows the relationship between the difference in variance for individual estimates and the fallacy rate for materials in experiment 3. Each participant gave multiple estimates for the same constituent, conjunction and disjunction, so individual fallacy rates and variance in probability estimates could be calculated for each participant. Fallacies typically occurred when there was a positive overlap in confidence intervals and when there was a positive difference in variance, that is, when the complex item was more variable than the constituent. Low fallacy rates were more likely when there was a negative difference in variance or no overlap between the constituent and complex CIs.
The PTN predicts that higher fallacy rates will occur for conjunctions close to 0.5 than at the extremes. Currently, however, we cannot say conclusively that this is the case; in the following experiment, we address this by controlling the distance between constituents and conjunctions.

As the objective probabilities were available in this experiment, we were able to directly test the values predicted by the binomial model against those calculated from the participant estimates. Overall, we observed that the model was consistent with the participant data, with the participants' variance showing the same trends that the binomial model predicts. In this experiment, we observed that the probability value affects the conjunction fallacy rate, with the highest rates observed where p was close to 0.5. This is where the binomial model predicts the greatest variance, and where we observed the greatest variance in estimates. Under the binomial model, the variance in an estimate is determined by its probability value, and a conjunction fallacy response is most likely where P(A∧B) = 0.5 and the P(A) value is further from 0.5 than the conjunction, and so not as variable as the conjunction. The results suggest a pattern consistent with the predictions of the binomial variance model. However, the p values here are not evenly spread across the scale, nor were we able to directly compare the variance (as predicted by the binomial model) for P(A), P(A∧B) and P(A∨B) responses of the same p value, so we cannot draw strong conclusions about the accuracy of the model. We address these issues in the following experiment.

5. Experiment 4

In experiment 3, we examined the role of probability value in estimate variance. Here, we take that further to look at the variability of constituents and conjunctions with the same objective probability value. We aim to examine how making different types of probability judgement affects responses while the objective probability remains the same. To this end, we compare the probability estimates and variance for constituents and conjunctions of the same objective probability value; we expect greater variance, and greater deviation from the objective value, for the conjunctions.

The previous experiments used estimate difference as a predictor of fallacy rate, and the average difference between the constituent and conjunction was shown to be a good indicator of fallacy rate. Here, we control the distance between the constituent and conjunction, so that they are either |0.1| or |0.15| apart. The PTN argues that probability values will impact fallacy rates.
If participants are accurate in their judgement, then we would expect to see differences of approximately −0.1 or −0.15 (depending on the pair) between the constituent-conjunction pairs (PE(A∧B) − PE(A)), and we would expect to see low fallacy rates and minimal difference between the fallacy rates for each of the pairings.

Again, this experiment involves repeatedly presenting participants with images, each containing a set number of shapes differing in colour and configuration. For each image, participants are asked to estimate the probability of some event (e.g., a randomly selected shape being red). The true probability of events in these images was held constant across multiple presentations: each participant saw multiple presentations of images for which the objectively correct probability was the same, allowing us to estimate the degree of random variation in participants' estimates. Some questions asked about simple events (a shape being red, being hollow, etc.) while other questions asked about conjunctive events (a shape being red and solid, etc.). Images were only on screen for a short time (2 s), so participants did not have time to count the occurrences of shapes of different types. Images were presented in randomised order.

5.1. Materials

The material set for this experiment consisted of 192 images, each with 20 shapes of varying types and colours. The images were organised into 7 'probability sets' so that all images in a given set contained the same number of occurrences of some constituent A, the same number of occurrences of some constituent B, and the same number of occurrences of the conjunction A∧B. These event counts (and hence the objective probabilities of the events A, B, and A∧B) were the same in all images in a given set; the counts and probabilities are given in Table 11. However, the actual concrete instantiation of each event varied randomly from image to image within each set (so that in one image in the first set, A would be represented by red and B by solid, and there would be 5 red shapes, 5 solid shapes, and 3 solid red shapes; while in another image in the same set A would be represented by hollow and B by blue, and there would be 5 hollow shapes, 5 blue shapes, and 3 hollow blue shapes; and so on). The position of shapes also varied randomly across images. This variation in event representation and position was designed to ensure that participants could not respond by recalling estimates given for previous images: all images were unique.

There were 24 images for each probability set: 12 presented with a question asking participants to estimate the probability of the single event A (however it was represented in that particular image) and 12 presented with a question asking participants to estimate the probability of the conjunctive event A∧B (however it was represented in that particular image). In addition to these 7 probability sets there was a filler set containing 12 images with single-event questions and 12 with conjunctive-event questions, but with no relation between those single and conjunctive events. The full set of images was presented in random order: images were not grouped according to probability set, and filler images were interspersed throughout. Probability sets were designed so that the probabilities presented to participants would be either 0.15, 0.25, 0.35, 0.5, 0.65, 0.75, 0.85 or 0.95, for both the single event A and the conjunctive event A∧B.
This was to allow direct comparison of single and conjunctive probability estimates for cases where the single event and the conjunctive event had the same underlying objective probability. Participants were only asked to estimate probabilities for event A and event A∧B in each set: estimates for event B were not obtained.

As in Experiment 3, each image was paired with a question asking participants about the probability of some event (shape, colour, or shape/colour conjunction) given the sample shown in the image. This was followed by a slider scale: participants moved the bar on this scale to select their estimated probability for the event in question. A box to the right showed the currently selected probability, paired with a button labelled 'next': clicking that button recorded the participant's probability estimate and moved the participant on to the next screen (see Fig. 5). In this experiment the slider's position was reset to the centre of the probability scale when the participant moved on to the next screen.

5.1.1. Procedure

Participants were seated at a screen. They began with a training trial of sample stimuli to familiarise themselves with the task. Once the participants were comfortable with the task, they moved on to the experimental trials. The static image appeared on screen and was replaced with a black screen with a fixation point once 2 s had elapsed, to prevent the participants from counting the shapes. The probability question appeared once the static experimental image had disappeared, and remained on-screen until the participants had made their estimate. The participants indicated their estimate by moving a slider using their mouse. This slider had a minimum value of 0 and a maximum value of 1. A box in the corner indicated the exact value of the participants' estimate and updated dynamically as they moved the slider. When the participant was satisfied with their answer, they submitted it by clicking on a 'Next' button, which also triggered the succeeding image and probability question.

5.2. Results

In total, 12 participants produced estimates for 192 images. Both their responses and response times were recorded. The results are detailed below. Each conjunction judgement P(A∧B) had its own constituent, P(A), against which we could check conjunction fallacy rates. In addition, it had a value-matched constituent P(C), which had the same objective probability as the conjunction P(A∧B); since the conjunction was not a subset of the constituent C, no fallacy rates could be derived from that comparison.
Table 11
Objective probabilities and average probability estimates for materials in Expt 4. This table shows the event counts, objective probability values, and participants' average probability estimates for events in the 7 probability sets used to construct images in Experiment 4. Every image contained 20 events in total; all images for the first probability set contained 5 instances of event A, 5 instances of event B and 3 instances of A∧B, and so on. The concrete instantiation of each event varied randomly from image to image within each set (so that for one image in the first set, A would be represented by red and B by solid, with 5 red shapes, 5 solid shapes, and 3 solid red shapes in the image; while in another image in the same set A would be represented by hollow and B by blue, with 5 hollow shapes, 5 blue shapes, and 3 hollow blue shapes; and so on). No probability estimates were gathered for event B.

A  | B  | A∧B | PO(A) | PE(A) | PO(B) | PE(B) | PO(A∧B) | PE(A∧B)
5  | 5  | 3   | 0.25  | 0.323 | 0.25  | –     | 0.15    | 0.253
7  | 7  | 5   | 0.35  | 0.387 | 0.35  | –     | 0.25    | 0.330
10 | 10 | 7   | 0.50  | 0.479 | 0.50  | –     | 0.35    | 0.421
13 | 13 | 10  | 0.65  | 0.580 | 0.65  | –     | 0.50    | 0.481
15 | 15 | 13  | 0.75  | 0.705 | 0.75  | –     | 0.65    | 0.604
17 | 17 | 15  | 0.85  | 0.766 | 0.85  | –     | 0.75    | 0.714
19 | 17 | 17  | 0.95  | 0.873 | 0.85  | –     | 0.85    | 0.785

5.2.1. Estimate accuracy

As in experiment 3, we were able to compare the subjective responses to objective population probability values. For each of the 8 objective values, there was a constituent and a conjunction response elicited for that value. The relationship between the average probability estimates and the objective "true" probability values is displayed in Fig. 10. As each objective value has both constituent and conjunction responses, we are able to examine the role that judgement type plays in probability estimation. The following trends were typically observed, regardless of type: probabilities were overestimated for values less than 0.5, estimates for the objective value of 0.5 were the most accurate, and probabilities above 0.5 were underestimated. Fig. 10 also displays the average amount of deviation from the true probability value: constituents deviated less from the true value than conjunctions for values below 0.5, showed a similar amount of deviation at 0.5, and deviated more than conjunctions for values above 0.5.

5.2.2. Variability in probability estimation

As expected, the total conjunction fallacy rate for the sample was relatively low, with an average of 24%. As the objective difference between the constituents and conjunctions was controlled, it was hypothesised that there would be no relationship between average difference and fallacy rates; Pearson's correlation found no significant relationship, r = 0.405, p > 0.05. The fallacy rate and average estimate difference were partitioned and calculated as in the previous experiments. No relationship was observed between the restricted estimate differences and fallacy rate, and no significant correlation was found between the two, r = 0.655, p > 0.05.

As with experiments 2 and 3, each participant saw multiple presentations of each item, which allowed us to test the PTN prediction that participants will produce the fallacy in an inconsistent fashion for the same item. For this experiment, fallacy rates of 0 or 12 (of a possible 12) were counted as consistent fallacy responses, while the occurrence of 1–11 (of a possible 12) fallacies per item was counted as an inconsistent fallacy response. The majority of responses were inconsistent, and no participant had a 100% fallacy rate for any of the conjunctions; the maximum observed fallacy rate by any participant for any of the conjunctions was 75% (9 out of a possible 12). Fig. 11 displays the fallacy rate occurrence and its corresponding average estimate difference.
In total, 13% of the fallacy responses were consistent (no participant produced 12 fallacy responses for any of the conjunctions, so all the consistent responses here are 0-fallacy responses) and 87% of the responses were inconsistent. Participants who produced zero fallacy responses had an average difference of zero or less. Participants who had inconsistent fallacy responses had average differences grouped around zero, with increasingly positive values as the rate of fallacy production increased.

Group Variance. With this experiment, we could examine variance in two ways: variance between the constituent-conjunction pairings (P(A) vs P(A∧B)) and variance between the value-matched constituents and conjunctions (P(C) vs P(A∧B)). Overall, the conjunctions P(A∧B) were more variable than their constituents P(A) on 86% of occasions; Levene's test of equality of variances found that the conjunctions were statistically significantly more variable than the constituents on 72% of occasions. In addition, we matched the constituents P(C) and conjunctions P(A∧B) that had the same objective probability values to compare how judgement type affects variance. For example, the constituent with the objective value of 0.35 was matched to the conjunction with the objective value of 0.35, and their response variances were compared. Overall, the conjunctions were more variable than the value-matched constituents on 75% of occasions.

Fig. 10. Average probability estimate versus objective probability value, by type. Any value falling above the line represents an overestimation of the probability value (in percentage points), while values falling below the line represent underestimation of the true value. Overall, the average deviation (in percentage points) of the constituents from their objective values was 5.7%, while the conjunctions deviated from their objective values by 5.9% on average. Largely, constituents and conjunctions with objective values less than 0.5 were overestimated, while constituents and conjunctions with objective values over 0.5 were underestimated. Conjunctions had greater deviations from the true probability for values less than 0.5; constituents had greater deviations for values greater than 0.5; and a similar amount of deviation from the true probability was observed for constituents and conjunctions around 0.5.

Fig. 11. This graph displays the inconsistent fallacy production by the participants in experiment 4. Each participant gave 12 estimates for each constituent-conjunction pair, and so individual fallacy rates range from 0 to 12. Calculation of individual fallacy rates, and binning of average differences, was as described in Fig. 3. The PTN predicts that the inconsistent estimates should be grouped around zero, with increasingly positive differences as the rate of fallacy production increases, while the consistent estimates should have negative average differences for those that produce zero errors and positive average differences for those that produce twelve errors. Here, most of the fallacies produced fell into the inconsistent category, with a small number of consistent 0-fallacy responses. No participant produced more than 9 (of a possible 12) fallacies for any of the conjunctions.
Again, Levene's test was used to determine if any of the conjunctions were statistically significantly more variable than the value-matched constituents. In this case, statistical significance was found for 38% of the comparisons.

Binomial Variance. Using the participants' measured variance for each probability estimate, we tested the predicted variance values from the binomial model. To predict the variance, we assume that K (the number of successes) is distributed according to the binomial distribution, K ∼ Binomial(N, p). As we are interested in the sample proportion K/N rather than the sample count K, we calculate the variance as p(1 − p)/N, where N = 12 is the number of repetitions of each item. This model predicts that the highest variance will be observed for estimates X where P(X) = 0.5, and that variance should decline as estimates approach P(X) = 0 or P(X) = 1. Each participant's variance was calculated and compared to the predicted value. Fig. 12 displays the measured and predicted variance versus the objective probability. The participant values are distributed around the predicted values in all cases, with lower variance typically found close to 0 and 1 and higher variance found close to the midpoint. The variance values closely follow the predictions of the binomial model: participants typically had low variance where the model predicted low variance and high variance where the model predicted high variance, and the model predictions are a good fit to the data. Polynomial fits were calculated for both the constituents and conjunctions, and both fitted the predicted values well. The measured individual variance for each item was positively correlated with the predicted variance for that item, r = 0.51, p < 0.00001. Observed variance in people's probability estimates for both constituents and conjunctions followed the variance values predicted by the binomial model, with no observable difference between them: a t-test found no significant difference between constituent and conjunction variance, t(95) = −1.156, p > 0.05.

Fig. 12. This figure displays the predicted and measured variance for the average probability values in experiment 4. The predicted variance for a given probability value was calculated using the binomial variance model; those values are shown in black. The participants' variance for each of the probability values is also displayed. The participant values are close to, and distributed around, the predicted variance values. Measured variance peaked around 0.5 and was lowest close to 0 and 1, in line with the model predictions.
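A minimal R sketch of this variance prediction: the probability values below are the objective values used in this experiment, and N = 12 is the number of repetitions per item, as above. The comparison with measured variance is indicated in comments only, since the raw estimate vectors are not reproduced here.

# Sketch: binomial variance prediction p(1 - p)/N, which peaks at p = 0.5
# and falls towards 0 and 1.
N <- 12                                       # repetitions per item
p_true <- c(0.15, 0.25, 0.35, 0.50, 0.65, 0.75, 0.85, 0.95)
predicted_var <- p_true * (1 - p_true) / N
round(predicted_var, 4)
# Measured variance is the sample variance of each participant's repeated
# estimates for an item, e.g. var(estimates_for_item), which can then be
# correlated with predicted_var (reported correlation: r = 0.51).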
5.2.3. Aggregate-level model fitting

This experiment, involving as it does objective probability values for P(A) and P(A∧B), allows us to test the computational fit between our model's predicted means and standard deviations (SDs) for probability estimates PE(A) and PE(A∧B) and the means and SDs of participant responses. To carry out this fit we select values for the noise parameters d and Δd (used to calculate predicted mean probability estimates PE(A) and PE(A∧B) for given objective probabilities P(A) and P(A∧B), as in Eqs. (1) and (2)) and for the sample size parameter N (used to calculate predicted variance for these probability estimates, as in Eq. (6); predicted SD is the square root of this variance). Prior to fitting we can identify a reasonable range of values for these free parameters. We expect the noise rate d to be relatively low (somewhere around 0.1, the best-fitting value in previous computational fits of this model: see Costello & Watts (2017)), and we expect the parameter Δd to be significantly smaller than d. Finally, we expect the sample size parameter N to be somewhere around Miller's 'magical number 7 ± 2' for working memory capacity (Miller, 1956).

We take the best fit between model and data to occur when the Root Mean Squared Difference (RMSD) between predicted and observed mean probability values, and between predicted and observed SDs, is minimised. These values were minimised for parameter values d = 0.1, Δd = 0.02, N = 7. With these parameters the RMSD between participants' mean probability estimates and predicted mean estimates (computed from objective probability values as in Eqs. (1) and (2)) was RMSD = 0.021 (correlation between observed and predicted values r = 0.994, p < 0.00002, across all single and conjunctive events; for single events alone these parameters gave a fit of RMSD = 0.021, r = 0.994; for conjunctive events alone these parameters gave a fit of RMSD = 0.022, r = 0.995). With these parameters the RMSD between the average SD in participants' probability estimates and the predicted SD for those events (computed from objective probability values by taking the square root of the value in Eq. (6)) was RMSD = 0.017 (correlation between observed and predicted SD r = 0.73, p < 0.05, across all single and conjunctive events; for single events alone these parameters gave a fit of RMSD = 0.017, r = 0.76; for conjunctive events alone these parameters gave a fit of RMSD = 0.009, r = 0.889). The model is a good fit to people's average probability estimates, and to the standard deviation in those estimates, for a reasonable set of parameter values.
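One way to carry out the mean-estimate part of this fit is sketched below in R, assuming Eqs. (1) and (2) take the regressive form described in the text and using a simple grid search over d and Δd against the mean estimates in Table 11 (the published fits may have used a different optimisation routine; N is fitted analogously against the observed SDs via sqrt(p(1 − p)/N)).

# Sketch: grid search minimising RMSD between predicted and observed means.
p_single   <- c(0.25, 0.35, 0.50, 0.65, 0.75, 0.85, 0.95)   # objective P(A)
p_conj     <- c(0.15, 0.25, 0.35, 0.50, 0.65, 0.75, 0.85)   # objective P(A and B)
obs_single <- c(0.323, 0.387, 0.479, 0.580, 0.705, 0.766, 0.873)  # Table 11
obs_conj   <- c(0.253, 0.330, 0.421, 0.481, 0.604, 0.714, 0.785)  # Table 11
rmsd <- function(pred, obs) sqrt(mean((pred - obs)^2))
best <- list(err = Inf)
for (d in seq(0, 0.25, by = 0.005)) {          # noise rate d
  for (dd in seq(0, 0.10, by = 0.005)) {       # increase Δd for conjunctions
    pred <- c((1 - 2 * d) * p_single + d,
              (1 - 2 * (d + dd)) * p_conj + (d + dd))
    err <- rmsd(pred, c(obs_single, obs_conj))
    if (err < best$err) best <- list(d = d, dd = dd, err = err)
  }
}
best   # reported best-fitting values were d = 0.1, Δd = 0.02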
We can also fit the Nilsson et al. (2009) configural weighting model of conjunctive probability estimation, as given in Eq. (7), to this experimental data and compare with the fit given by the PTN model. Since the configural weighting model does not address constituent probability estimation (it simply assumes such estimates are available, but does not explain how they are produced) and does not make any specific predictions about variance in estimates, this fit can examine only conjunctive probability estimates. Note that the probability sets used in this experiment were designed so that both constituents P(A) and P(B) had the same objective probability for the first six probability sets: in these sets PE(A) and PE(B) are expected to be equal, and so the configural weighting model predicts that PE(A∧B) = PE(A) = PE(B) should hold irrespective of the weighting parameter W in these cases. Since this weighting parameter W affects only the value of PE(A∧B) in the 7th probability set, a value for W was chosen so that the averaging model exactly matched participants' mean conjunctive probability estimate for that set (see Table 12). As the table shows, estimates produced by the PTN were closer to those produced by participants (lower RMSD, higher correlation r) than those produced by the averaging model, though the correlation between averaging-model conjunctive estimates and participants' average conjunctive estimates was also very high (r = 0.986, versus r = 0.995 for the PTN model). This high correlation produced by the averaging model could well be an artefact of the experimental design. The averaging model predicts that PE(A∧B) will equal PE(A) (for the first 6 probability sets), but these materials were specifically designed so that the objective probability PO(A∧B) exactly tracked the objective probability PO(A) (this design being used to test differences in variance for single and conjunctive events with the same objective probability). This designed-in relationship could explain the observed high correlation between PE(A∧B) and the averaging model's predicted value of PE(A). We investigate and compare model fits further in the next section, where we describe computational fits at the individual level to participants' repeated estimates in Experiments 2, 3, and 4, and carry out model comparisons based on those fits using WAIC.

Table 12
This table shows the event counts, participants' average probability estimates, SDs of the entire set of estimates, and estimates produced by the PTN and averaging models for the 7 probability sets in Experiment 4. Estimates produced by the PTN were closer to those produced by participants (lower RMSD, higher correlation r) than those produced by the averaging model. Note that the averaging model predicts that, for the first 6 probability sets, average estimates PE(A∧B) should have the same value as average estimates PE(A), while for the 7th set the averaging model can assign arbitrary values (and so comparison for that set is meaningless).

A  | B  | A∧B | PE(A) mean | PE(A∧B) participants mean (SD) | PE(A∧B) noise model mean (SD) | PE(A∧B) averaging model mean
5  | 5  | 3   | 0.323      | 0.253 (0.162)                  | 0.224 (0.160)                 | 0.323
7  | 7  | 5   | 0.387      | 0.330 (0.180)                  | 0.303 (0.175)                 | 0.387
10 | 10 | 7   | 0.479      | 0.421 (0.194)                  | 0.382 (0.184)                 | 0.479
13 | 13 | 10  | 0.580      | 0.481 (0.193)                  | 0.50 (0.189)                  | 0.580
15 | 15 | 13  | 0.705      | 0.604 (0.197)                  | 0.618 (0.184)                 | 0.705
17 | 17 | 15  | 0.766      | 0.714 (0.161)                  | 0.698 (0.175)                 | 0.766
19 | 17 | 17  | 0.873      | 0.785 (0.158)                  | 0.778 (0.160)                 | 0.785

RMSD against participants: noise model means 0.022, noise model SDs 0.009, averaging model means 0.070 (the averaging model makes no SD predictions).
r against participants: noise model means 0.995, noise model SDs 0.889, averaging model means 0.986.

5.3. Experiment 4 discussion

In experiment 4, we investigated how judgement type affects probability estimates by presenting participants with constituents, P(C), and conjunctions, P(A∧B), of the same value, and eliciting repeated responses for each. Typically, we see that participants are good at estimating both types of judgement, with only marginal differences in mean estimates for a given objective probability value. The most accurate estimates for both constituents and conjunctions were for PO = 0.5, while estimates for items where p was less than 0.5 were overestimated and those above 0.5 were underestimated. This pattern is consistent with the PTN, where noise has a regressive effect towards 0.5, causing estimates below 0.5 to be overestimated and estimates above 0.5 to be underestimated. In both experiments 3 and 4, we have observed that participants are typically accurate reasoners, particularly for constituents and conjunctions, but we have also observed that they frequently produce inconsistent responses to the same conjunction stimulus, sometimes committing the fallacy and sometimes avoiding it entirely. The participant estimates were accurate for both constituents and conjunctions; however, the conjunctions were more variable than the constituents in the case of both the P(A) and P(C) judgements. Aggregate-level model fitting demonstrates that our model's predictions are a good fit to both the participants' means and SDs. These fits were also performed for the configural weighting model.
While our model performed better, the configural model also produced strong correlations between the model fits and the data. In the following section, we investigate the models' performance in fits at the individual level.

6. Individual level computational model fitting

In this section we describe computational model fits to individual participant responses in Experiments 2, 3 and 4 for both the binomial variance and the configural weighting models. The model fitting process was carried out in Stan, a probabilistic programming language that provides full Bayesian statistical inference with MCMC sampling (Carpenter et al., 2017). Model fitting was carried out simultaneously on individual (repeated) probability estimates for constituents, conjunctions, and disjunctions and on (conjunctive and disjunctive) fallacy occurrence for those individual estimates. The same general framework was used for both models. We first consider model fitting for Experiments 3 and 4 (for which objective probabilities of events are known), and then consider fitting for Experiment 2 (for which objective probabilities of events are not known and must be treated as free parameters in the model-fitting process).

6.1. Binomial variance model fitting

In fitting the binomial variance model to individual repeated probability estimates in a given experiment with known objective probabilities of events (Experiments 3 and 4), we assume 3 free parameters for each participant i: d_simple,i < 0.5 (the noise rate for that participant in estimating probabilities for simple events A and B, assumed to be less than 0.5), d_complex,i < 0.5 (the noise rate for that participant in estimating probabilities for complex events A∧B and A∨B, assumed to be less than 0.5), and N_i (the sample size used by that participant in estimating probabilities). Given the known objective probability for some event A, we assume that participant i's repeated probability estimates for that event are randomly distributed around the mean estimate

P̄_i(A) = (1 − 2·d_simple,i)·P(A) + d_simple,i

(where P(A) is the objective probability for event A), with standard deviation

σ_{A,i} = √( P̄_i(A)·(1 − P̄_i(A)) / N_i )

For modelling purposes we assume the error distribution around the mean estimate is approximately normal, so that participant i's repeated estimates for event A follow the normal distribution

Normal(P̄_i(A), σ_{A,i})    (8)

For complex events A∧B (or A∨B) we assume that participant i's repeated probability estimates are randomly distributed around the mean estimates

P̄_i(A∧B) = (1 − 2·d_complex,i)·P(A∧B) + d_complex,i    (9)

P̄_i(A∨B) = (1 − 2·d_complex,i)·P(A∨B) + d_complex,i    (10)

(where P(A∧B) and P(A∨B) are the known objective probabilities for those events), with standard deviations

σ_{A∧B,i} = √( P̄_i(A∧B)·(1 − P̄_i(A∧B)) / N_i )

σ_{A∨B,i} = √( P̄_i(A∨B)·(1 − P̄_i(A∨B)) / N_i )

and participant i's repeated estimates for events A∧B and A∨B follow the normal distributions

Normal(P̄_i(A∧B), σ_{A∧B,i})    (11)

Normal(P̄_i(A∨B), σ_{A∨B,i})    (12)
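As a concrete illustration, the following R sketch computes the resulting log-likelihood of one participant's repeated estimates for a single event under this model; the parameter and data values shown are illustrative only.

# Sketch: log-likelihood of repeated estimates for a simple event A
# under the binomial variance model.
loglik_single <- function(estimates, p_true, d, N) {
  pe    <- (1 - 2 * d) * p_true + d        # regressive mean estimate
  sigma <- sqrt(pe * (1 - pe) / N)         # binomial-proportion SD
  sum(dnorm(estimates, mean = pe, sd = sigma, log = TRUE))
}
loglik_single(c(0.30, 0.35, 0.28, 0.33), p_true = 0.25, d = 0.1, N = 7)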
The binomial variance model assumes that the relationship 0 ≤ d_simple ≤ d_complex ≤ 0.5 holds for these noise parameters (all noise parameters are less than 0.5, and noise for simple events is no greater than noise for complex events). We implement this in our model by defining two free parameters 0 ≤ d_simple,i < 0.5 and 0 ≤ d_increase,i ≤ 1 for each participant and, given these, calculating d_complex,i as

d_complex,i = d_simple,i + (0.5 − d_simple,i)·d_increase,i

so that every possible value of the free parameters d_simple,i and d_increase,i will produce a value of d_complex,i satisfying 0 ≤ d_simple ≤ d_complex ≤ 0.5, as required.

The binomial variance model fit depends on the objective probability for simple, conjunctive and disjunctive events. In Experiment 2, these objective probabilities are not known. In fitting to Experiment 2, therefore, we augment the model with additional free parameters representing the objective probabilities of these events (which we assume are common across all experimental participants). These free parameters representing (unknown) objective probabilities are constructed to be fully consistent with all normative requirements of probability theory. Recall that Experiment 2 contained two separate sets of events, each containing 4 single events, 3 conjunctions, and 3 disjunctions (9 objective probabilities in total). For each set, 7 free parameters were required to construct these 9 objective probabilities and ensure full consistency with probability theory.

To test the binomial variance model's prediction that d_simple ≤ d_complex will hold, we also carry out secondary fits with a version of the model that simply treats the noise rates d_simple,i and d_complex,i as independent parameters that can take on any value (less than 0.5). This secondary fit allows us to test the binomial variance model's predictions about differential noise rates by comparing degree of fit for constrained (d_simple ≤ d_complex) and unconstrained (d_simple and d_complex independent) versions of the model.
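A minimal sketch of this reparameterisation in R, with an illustrative example value:

# Sketch: enforce 0 <= d_simple <= d_complex <= 0.5 via a reparameterisation.
d_complex_from <- function(d_simple, d_increase) {
  stopifnot(d_simple >= 0, d_simple < 0.5,
            d_increase >= 0, d_increase <= 1)
  d_simple + (0.5 - d_simple) * d_increase
}
d_complex_from(0.10, 0.20)   # 0.10 + 0.4 * 0.2 = 0.18, always in [d_simple, 0.5]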
6.2. Configural weighting model fitting

In fitting the configural weighting model to individual repeated probability estimates, we use the same approach of modelling error in repeated probability estimates as normally distributed around the mean estimate produced by the model. We assume 3 free parameters for each participant i: σ_simple,i (the standard deviation of that participant's repeated probability estimates for simple events A and B), σ_complex,i (the standard deviation of that participant's repeated probability estimates for complex events A∧B and A∨B), and W_i (the participant's weighting parameter used in calculating complex estimates from the configural weighting of simple probabilities). In Experiments 3 and 4, where objective probabilities for events are known, we assume that participant i's repeated probability estimates for some simple event A follow the normal distribution

Normal(P̄_i(A), σ_simple,i)    (13)

where P̄_i(A) is the mean of participant i's probability estimates for that event. Note that the configural weighting model does not give any account of the relationship between constituent probability estimates and objective probability values. To fit the configural weighting model to Experiments 3 and 4, we therefore assume that the mean constituent estimate for a given participant, P̄_i(A), is a linear function of the true probability of event A:

P̄_i(A) = O_i·(1 − S_i) + S_i·P(A)

where the 'scale' parameter 0 ≤ S_i ≤ 1 represents participant i's mapping from the objective probability scale to their own subjective estimate scale, and the offset parameter 0 ≤ O_i ≤ 1 represents the intercept of that mapping (multiplied by (1 − S_i) to ensure all constituent probability estimates fall between 0 and 1). For complex events A∧B and A∨B we assume that participant i's repeated probability estimates are randomly distributed around the mean estimates

P̄_i(A∧B) = W_i·min(P̄_i(A), P̄_i(B)) + (1 − W_i)·max(P̄_i(A), P̄_i(B)),  0.5 ≤ W_i ≤ 1

P̄_i(A∨B) = (1 − W_i)·min(P̄_i(A), P̄_i(B)) + W_i·max(P̄_i(A), P̄_i(B)),  0.5 ≤ W_i ≤ 1

as given in the configural weighting model, and that they follow the normal distributions

Normal(P̄_i(A∧B), σ_complex,i)    (14)

Normal(P̄_i(A∨B), σ_complex,i)    (15)

For Experiment 2, where objective probabilities of events are not known, we fitted the configural weighting model in a way that matched the fitting approach for the binomial variance model: by adding free parameters to represent mean probability estimates for single events. Experiment 2 contained two separate sets, each containing 4 single events, and so we fitted the configural weighting model by adding 4 additional free parameters for each set. These single probabilities could take on any value between 0 and 1.

6.3. Fitting individual conjunction and disjunction fallacy responses

As well as fitting participants' repeated probability estimates for single and conjunctive/disjunctive events, we are also interested in fitting conjunction and disjunction fallacy responses in those estimates. Both the binomial variance and the configural weighting models see conjunction fallacy rates as a function of the difference of means P̄_i(A∧B) − P̄_i(A), and of random variation or noise in estimates. In fitting the binomial variance model we assume that individual probability estimates follow the normal distributions given in Eqs. (8), (11) and (12). This means that the difference between estimates for a constituent A and a conjunction A∧B, for participant i, will follow the distribution

Normal(P̄_i(A∧B) − P̄_i(A), σ_{A,i} + σ_{A∧B,i})

Given this, the probability of a conjunction fallacy for these events in participant i's responses is equal to the probability of obtaining a positive value under this distribution, and this probability is given by

P_i(A∧B > A) = 1 − Φ(0; P̄_i(A∧B) − P̄_i(A), σ_{A,i} + σ_{A∧B,i})    (16)

where Φ is the cumulative distribution function for this normal distribution. Similarly, the probability of a disjunction fallacy occurring is given by

P_i(A∨B < B) = 1 − Φ(0; P̄_i(A∨B) − P̄_i(B), σ_{B,i} + σ_{A∨B,i})    (17)
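A sketch of Eq. (16) in R follows; the mean estimates and SDs passed in are hypothetical placeholders, and the final line shows how occurrence across repeated trials can then be simulated from the resulting probability.

# Sketch: predicted conjunction fallacy probability (Eq. 16): the chance
# that a noisy conjunction estimate exceeds a noisy constituent estimate.
fallacy_prob <- function(mean_conj, mean_con, sd_con, sd_conj) {
  1 - pnorm(0, mean = mean_conj - mean_con, sd = sd_con + sd_conj)
}
p_fall <- fallacy_prob(0.45, 0.50, 0.07, 0.09)
p_fall                          # probability of a fallacy on a single trial
rbinom(12, size = 1, prob = p_fall)   # Bernoulli draws across 12 repetitions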
In fitting the configural weighting model we similarly assume that individual probability estimates follow the normal distributions given in Eqs. (13)–(15), so that the difference between estimates for a constituent A and a conjunction A∧B follows the distribution

Normal(P̄_i(A∧B) − P̄_i(A), σ_simple,i + σ_complex,i)

Given this, the probability of a conjunction fallacy occurring is equal to the probability of obtaining a positive value under this distribution, and this probability is given by

P_i(A∧B > A) = 1 − Φ(0; P̄_i(A∧B) − P̄_i(A), σ_simple,i + σ_complex,i)    (18)

and the probability of a disjunction fallacy occurring is given by

P_i(A∨B < B) = 1 − Φ(0; P̄_i(A∨B) − P̄_i(B), σ_simple,i + σ_complex,i)    (19)

Since in a given item the conjunction fallacy either occurs or does not occur (it is a binary variable), and since the chance of occurrence is a function of the difference P̄_i(A∧B) − P̄_i(A), a natural distributional model for fallacy occurrence is the Bernoulli distribution. In our computational fit for both models, therefore, we represent the distribution of conjunction fallacy occurrences in repeated estimates for events A∧B and A produced by a given participant i as

Bernoulli(P_i(A∧B > A))

and the distribution of disjunction fallacy occurrences in repeated estimates for events A∨B and B produced by a given participant i as

Bernoulli(P_i(A∨B < B))

We thus have a common framework for computational fits of both the binomial variance and configural weighting models to experimental data. In this framework, repeated individual probability estimates are modelled as normally distributed around a mean computed by the model (via the regressive or noisy means with parameters d_simple,i and d_complex,i in the binomial variance model; via weighting of constituent probabilities with parameter W_i in the configural model) with a given standard deviation (calculated from the mean and sample size parameter N_i in the binomial variance model; taken as free parameters σ_simple,i and σ_complex,i in the configural model), while occurrence or non-occurrence of the conjunction and disjunction fallacies is modelled via the Bernoulli distribution parameterised as described above. In this framework the binomial variance model has three free parameters (d_simple,i, d_complex,i and N_i) for each participant, while the configural model has five (σ_simple,i, σ_complex,i, W_i, scale parameter S_i and intercept parameter O_i).

We fit these models to experimental data using Stan, a probabilistic programming language for specifying statistical models that provides full Bayesian inference for continuous-variable models using an adaptive form of Hamiltonian Monte Carlo sampling (Carpenter et al., 2017). Stan probabilistic programs implementing these models for Experiments 2, 3 and 4, along with raw experimental data and R code running computational fits of these models, are available online at https://osf.io/a47ut/. We compared model fit using the Widely Applicable Information Criterion (WAIC; Watanabe, 2010), which takes functional form complexity into account when comparing model fit, implemented in R (Vehtari, Gelman, & Gabry, 2017, 2018). We first describe the results of fitting these models to results from Experiments 3 and 4, for which objective probabilities for events are known; we then describe model fits to results from Experiment 2, for which objective probabilities are not known.

7. Computational fit results

We used Stan to implement the binomial variance and configural weighting models as described above and applied them to participants' individual responses in Experiment 2 (set A and set B), Experiment 3 and Experiment 4.
Experiment 2 asked 40 participants (set A) or 41 participants (set B) to give 5 repeated probability estimates for 4 simple constituent events, 3 conjunctions, and 3 disjunctions, giving 2000 individual estimates in set A and 2050 estimates in set B. Experiment 3 asked 7 participants to give 20 repeated probability estimates for three sets of events, each containing two constituents, one conjunction, and one disjunction, and for one set of events containing 4 constituents, 3 conjunctions and 4 disjunctions, giving 3220 individual estimates in total. Due to a coding error and some network problems, a relatively small number of these responses were not recorded (99 dropped responses out of 3220, or 3% of responses, distributed randomly across all response sets). Since Stan does not handle missing data, we cleaned the set of raw response data by replacing any blank responses in a given participant i's repeated estimates for some event A with the average of the remaining responses given by participant i for that event. Finally, Experiment 4 asked 12 participants to give 12 repeated probability estimates for 7 constituents and 7 conjunctions, giving 2016 individual estimates in total; in this experiment 9 responses were blank (due to network problems) and were replaced with that participant's average estimate for the event in question, as before.

After preparing the data we fitted the two models to each experimental dataset via the stanfit MCMC sampler called from R, with 4 chains and 2000 iterations per chain. Fits for both models converged (rhat = 1) in all cases except when the configural weighting model was applied to the two sets of data from Experiment 2 (even much larger numbers of iterations, up to 20,000, did not produce convergent fits for the configural weighting model on these datasets; given that the binomial variance model converged for these sets, this suggests that the configural weighting model is a poor model for the data in Experiment 2). In all cases the fit was to both individual probability estimates and to individual conjunction and disjunction fallacy occurrences (treated as categorical data as described above).
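In outline, the fitting harness looks like the following R sketch. The Stan file name and data list are placeholders (the actual programs and R code are at the OSF link above), and the sketch assumes the Stan program stores pointwise log-likelihoods in a generated quantity named log_lik.

# Sketch: fitting one model in Stan from R and computing WAIC.
library(rstan)   # interface to the Stan sampler
library(loo)     # WAIC / elpd computations (Vehtari, Gelman, & Gabry)
# 'binomial_variance.stan' and stan_data are placeholders for the programs
# and prepared data lists available in the OSF repository.
fit <- stan(file = "binomial_variance.stan", data = stan_data,
            chains = 4, iter = 2000)
log_lik <- extract_log_lik(fit, parameter_name = "log_lik")
waic(log_lik)    # elpd_WAIC and its SE, as reported in Table 13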
For Experiment 2 we ran a single fit for each model. For Experiments 3 and 4 we ran two fits. On the first run we extracted log likelihoods, and hence WAIC expected log pointwise predictive densities (elpd_WAIC), for the two models across all individual constituent, conjunction and disjunction probability estimates and all individual conjunctive and disjunctive fallacy occurrences. On the second run we extracted log likelihoods and elpd_WAIC values for the two models for individual conjunction and disjunction probability estimates and all individual conjunctive and disjunctive fallacy occurrences (dropping constituent probability estimates, because both models fit these estimates very closely). Note that the difference in extracted log likelihood values does not affect the fit produced, only the data returned for a given fit. Table 13 gives elpd_WAIC values for the two models in each dataset for both runs, alongside the expected log predictive density difference (elpd_DIFF) between the two models and standard errors for these values. Higher log predictive densities indicate better fits, and negative values of elpd_DIFF indicate preference for the first model (binomial variance): the binomial variance model gave a better fit in all cases. Since elpd_DIFF values are approximately normal, we use the Z test to indicate statistically significant differences in model fit. The binomial variance model had a statistically significant advantage in model fit (at p < 0.05 or lower) for Experiment 2 and Experiment 4; for Experiment 3 there was no significant difference in model fit.

Table 13
WAIC expected log predictive density adjusted for number of parameters (elpd_WAIC) for the binomial variance and configural weighting models in Experiments 2, 3 and 4 (deviance equals −2·elpd_WAIC). Target values are constituent, conjunctive and disjunctive probability estimates, and conjunction and disjunction fallacy occurrences. For Experiments 3 and 4, elpd_WAIC is shown for all target values and for all target values excluding constituent estimates (which were easy to fit for both models). For Experiment 2, elpd_WAIC is shown for set A (conjunctions Windy∧Sunny, Snowy∧Cloudy and Windy∧Cloudy, the analogous disjunctions, and constituents) and set B (conjunctions Warm∧Sunny, Rainy∧Cold and Rainy∧Warm, the analogous disjunctions, and constituents). Negative elpd_DIFF values indicate preference for the first model (binomial variance).

Target values | binomial variance elpd_WAIC (SE) | configural weighting elpd_WAIC (SE) | elpd_DIFF (SE)
Expt 2 set A | 115.0 (116.4) | −615.6 (93.7) | −730.6 (172.9)†
Expt 2 set B | −582.7 (99.0) | −820.1 (88.1) | −237.5 (70.0)†
Expt 3 all | 241.0 (261.8) | 219.3 (266.1) | −21.8 (109.4)
Expt 3 excluding single events | −1016.4 (198.1) | −1089.5 (192.7) | −72.1 (104.3)
Expt 4 all | 249.6 (113.0) | 169.6 (114.4) | −80.0 (42.4)‡
Expt 4 excluding single events | −219.3 (87.0) | −319.6 (90.4) | −100.3 (32.2)†

Note: † p < 0.001, ‡ p < 0.05.

7.1. Fitting to individual participants separately

The above analysis compares model fits across all individual participants and responses, and shows an overall advantage for the binomial variance model. These fits implicitly assume that all participants follow the same process of probability estimation, and ask whether that process is better modelled by the binomial variance or the configural weighting approach. It could be argued that different participants might follow different approaches to probability estimation, with some participants following one approach and some the other. To test this proposal, we fit the two models to each individual participant's responses for all target values (excluding constituent estimates, which were easy to fit for both models) in Experiments 3 and 4, and compared model fits for each participant. We did not carry out this process of fitting models separately to participants in Experiment 2, primarily because of difficulties with convergence for the configural weighting model in that experiment. Table 14 shows the results of these separate fits. Testing for statistical significance of differences in fit at the p < 0.05 level (with Bonferroni correction for multiple comparisons, giving an operational criterion of significance of p < 0.0026), we found that fits were indistinguishable for most participants, but significantly in favour of the binomial variance model in two cases.
7.2. Conjunction and disjunction fallacy rate predictions

To assess the level of agreement between observed conjunction and disjunction fallacy rates and the rates predicted by the two models, we extracted observed conjunction and disjunction fallacy rates for each participant and each conjunction/constituent and disjunction/constituent pairing in all experiments. We also extracted values for the expressions in Eqs. (16) and (17) (the binomial variance model's predicted fallacy rates) and for the expressions in Eqs. (18) and (19) (the configural weighting model's predicted fallacy rates) from the model fits described above. We then calculated the correlation between observed and predicted fallacy rates for the two models (see Table 15). For Experiment 4, for example, where there were 12 participants and 12 conjunction/constituent pairs, these numbers represent the correlation between 12 × 12 = 144 observed conjunction fallacy rates (each participant's repeated estimates for each conjunction/constituent pair producing an observed fallacy rate for that participant and that pair, equal to the proportion of times that participant gave a higher estimate for the conjunction than the constituent in those repeated estimates) and 144 predicted fallacy rates produced by the model in question. All correlations were positive and significant at the p < 0.01 level, and correlations produced by the binomial variance model were higher than those produced by the configural weighting model in all cases. Both models tended to overestimate fallacy rates for both conjunctions and disjunctions (see the positive differences between predicted and observed fallacy rates in the table), but the binomial variance model's predicted fallacy rates were closer to the observed rates in all cases.

7.3. Relation between d_simple and d_complex

The binomial variance model predicts that the noise rate for complex events should be higher than the noise rate for simple events: that d_simple < d_complex. The model fits described above impose this requirement on the noise parameters explicitly. To illustrate the relationship between these noise rates in the model, Fig. 13 shows a scatterplot of best-fitting d_simple versus d_complex values for the 81 participants in Experiment 2. As the figure shows, the difference was noticeable across a range of participants. To test the model's prediction that d_simple < d_complex, we reran the model fits described above, but with the binomial variance model modified so that the requirement d_simple < d_complex was not imposed: in this version of the binomial variance model, both d_simple and d_complex could independently take on any value between 0 and 0.5. We compared the model fit obtained with this unconstrained version of the model against the fit obtained with the constrained (d_simple < d_complex) version. If the degree of fit to data for the unconstrained version of the model were noticeably greater than that obtained from the constrained version, that would count as evidence against the model's prediction. Note that we do not expect the constrained version of the model to give a significantly better fit than the unconstrained version, since the unconstrained version can always 'find' parameter values for which d_simple < d_complex happens to hold (and so can match the degree of fit of the constrained model).
Table 16 shows the expected log predictive densities for these two versions of the model across Experiments 2, 3 and 4: the two versions are essentially indistinguishable, which supports the binomial variance account.

Table 14
WAIC expected log predictive density adjusted for number of parameters (elpd_WAIC) for the binomial variance and configural weighting models fit separately to each individual participant's responses in Experiments 3 and 4, for all target values excluding constituent estimates (which were easy to fit for both models). Negative elpd_DIFF values indicate preference for the first model (binomial variance). Fits were indistinguishable for all participants but two, for whom the comparison was significantly in favour of the binomial variance model (at Bonferroni-corrected p < 0.05).

Experiment 3
Participant | binomial variance elpd_WAIC (SE) | configural weighting elpd_WAIC (SE) | elpd_DIFF (SE)
1 | −67.8 (87.3) | −176.4 (68.1) | −108.6 (52.8)
2 | −59.7 (88.7) | −110.9 (83.4) | −51.2 (51.2)
3 | −193.3 (63.8) | −149.9 (72.5) | 43.4 (35.8)
4 | −213.1 (61.9) | −202.4 (65.3) | 10.7 (16.5)
5 | −131.1 (74.0) | −167.6 (72.3) | −36.5 (20.9)
6 | −198.9 (70.4) | −145.0 (75.2) | 54.0 (40.2)
7 | −152.9 (75.5) | −138.0 (76.3) | 14.9 (39.0)

Experiment 4
Participant | binomial variance elpd_WAIC (SE) | configural weighting elpd_WAIC (SE) | elpd_DIFF (SE)
1 | −8.2 (27.9) | −18.2 (28.1) | −10.0 (7.2)
2 | −20.1 (25.1) | −11.5 (29.7) | 8.6 (9.8)
3 | −54.5 (17.3) | −26.8 (24.8) | 27.7 (10.7)
4 | −16.2 (28.5) | −27.7 (27.8) | −11.5 (4.8)
5 | −27.3 (23.4) | −33.9 (28.0) | −6.5 (10.5)
6 | −33.7 (21.6) | −36.3 (23.5) | −2.6 (7.9)
7 | 2.2 (27.0) | −45.5 (22.0) | −47.7 (10.7)†
8 | −30.5 (25.6) | −48.9 (21.0) | −18.3 (7.6)
9 | −1.0 (28.6) | −12.0 (30.5) | −10.9 (8.0)
10 | −28.8 (21.4) | −25.4 (26.0) | 3.4 (6.0)
11 | −1.0 (27.4) | −11.3 (28.9) | −10.3 (8.4)
12 | −0.1 (29.9) | −21.9 (29.2) | −21.8 (5.3)†

Note: † p < 0.0026 (equivalent to p < 0.05 with Bonferroni correction for 19 separate comparisons).

8. General discussion

The aim of this paper was to examine variability in probability estimation and its relationship to two well-known cognitive biases: the conjunction and disjunction fallacies. To this end, we carried out four experiments: the first a study of variance and of how different response formats affect probability judgements; the second a study of the internal variance in probability estimation, tested by giving participants repeated judgement tasks. Both of these experiments used description-based stimuli consistent with other description-based studies of the conjunction and disjunction fallacy. The third experiment focused on the roles of probability values and sample size in variance, estimation accuracy and fallacy rates. The final experiment again looked at the role of probability values in fallacy rates and at how question type influences estimation. Experiments 3 and 4 both employed repeated judgements to understand the internal variance in participant estimates and the effect of p values on that variance. Each had stimuli with observable objective probabilities, so we could investigate estimate accuracy. This makes these experiments somewhat novel in research on cognitive biases; however, fallacy rates observed for both were in line with more traditional research stimuli in this field, so we believe they are appropriate.

Results showed that variability of the estimate is a key indicator of whether a fallacy response will occur. Overall, the complex statements showed higher levels of variability, with statistically significant differences observed on most occasions.
Approximately 70% of the complex statements were more variable than their constituent counterparts across all experiments. A small number of the simple statements had higher variance than the complex statements. For the description-based experiments, this occurred most frequently when the constituent had a high probability and the conjunction CI typically had no overlap with the constituent CI (e.g. P(Cloudy) vs P(Cloudy∧Snowy)). Statistically significantly higher levels of variance were observed in some constituents with extremely low fallacy rates (0–2%). The disjunctive statements were not any more variable than the conjunctive statements, and very similar fallacy rates were recorded in the description experiments: no clear difference in variability was observed between conjunctions and disjunctions. Higher variance in the complex items was also observed with the visual stimuli, with both the conjunctions and disjunctions being more variable than the constituents. As the participants produced repeated estimates in a number of experiments, we could also analyse individual variability and its relation to fallacy rate. A consistent observation across the experiments is that participants who were more variable in their own responses for complex items were more likely to make repeated fallacy responses. In the final experiment, we were able to compare the variance of conjunctions against their constituents (P(A)) and against value-matched single events (P(C)). Here, we saw the same higher variance in the conjunction versus its own constituent (P(A∧B) vs P(A)) that we reported in the previous experiments. The P(A∧B) vs P(C) comparisons also found higher variance for the conjunction, but for individuals there was not a significant difference between them.

Table 15
Each column shows the correlation r between observed fallacy rates (from experimental data) and predicted fallacy rates extracted from model fits for the binomial variance and configural weighting models; mean differences between predicted and observed fallacy rates are shown in brackets. Correlations ran across all participants and all conjunction/constituent (or disjunction/constituent) pairs. All correlations were significant, but the binomial variance model had a higher correlation with observed fallacy rates. Both models tended to overestimate fallacy rates for both conjunctions and disjunctions (positive differences between predicted and observed fallacy rates), but the binomial variance model's predicted fallacy rates were closer to the observed rates in all cases. Note that Experiment 4 did not include disjunctions, and so gives no rates for disjunction fallacy occurrence.

             | binomial variance: conj. fallacy | binomial variance: disj. fallacy | configural weighting: conj. fallacy | configural weighting: disj. fallacy
Expt 2 set A | 0.65 (0.06) | 0.68 (0.08) | 0.63 (0.14) | 0.64 (0.16)
Expt 2 set B | 0.42 (0.14) | 0.45 (0.09) | 0.37 (0.19) | 0.39 (0.18)
Expt 3       | 0.62 (0.20) | 0.68 (0.14) | 0.45 (0.27) | 0.58 (0.18)
Expt 4       | 0.41 (0.14) | –           | 0.28 (0.25) | –

Fig. 13. Scatterplot of d_simple and d_complex values for the 81 participants in Experiment 2, from the binomial variance model fits. The diagonal line is the line of identity.
To date, none of the probabilistic models has included an explicit model of how the variance in estimates behaves. Here, we presented a simple model of variance, based on the binomial distribution, that is capable of capturing the patterns of participant responding. The binomial variance model provides good predictions of participant variance for a given estimate, and it demonstrated the importance of sample size for probability judgements, with estimates taken from larger samples much less variable than estimates taken from smaller samples.

Conjunction fallacy rates across these experiments ranged from 0% to 68% depending on the stimulus. Similar rates of disjunction fallacies were observed, ranging from 0% to 71%. These values are in line both with other research findings and with the predictions of the PTN model. In experiments 1 and 2, it appeared that participants were most likely to produce a fallacy if their subjective estimates for the constituent and conjunction were close to each other: the 65% fallacy rate observed for P(Snowy) vs P(Cloudy∧Snowy) in experiment 2, for example, was the highest observed for that experiment, despite both sets of average estimates being low. Very low fallacy rates were likely to be observed when the constituent and conjunction estimates were unlikely to overlap. Further exploration of this trend in experiment 3 confirmed these findings. In experiment 4, we were able to control fallacy rates by manipulating the distance P(A∧B) − P(A), and low fallacy rates resulted. Here we see that, rather than high constituent values being correlated with low fallacy rates and low constituent values with high fallacy rates, it is the estimate difference between the constituent and complex item (P(A∧B) − P(A) and P(B) − P(A∨B)) that is correlated with the fallacy rate.

Analysis of the participant estimates showed that they were internally variable. Typically, their repeated probability estimates for an item were similar, but not identical, to each other. This variability in estimates meant that participants were often inconsistent in producing fallacies: if they did produce a fallacy, they typically produced it on a number of occasions but not on all of them. Of the possible fallacy responses (e.g. producing the fallacy between one and five times for a given conjunction or disjunction in experiment 2), a fallacy response on every occasion was the least likely to occur. With larger numbers of repetitions, the likelihood of participants producing 100% fallacy rates fell: in experiments 3 and 4, no participant produced a fallacy response on all occasions.

Table 16
WAIC expected log predictive density adjusted for number of parameters (elpd_WAIC) for the standard binomial variance model (constrained so that d_simple < d_complex) and the unconstrained version of that model (d_simple and d_complex take independent values) in Experiments 2, 3 and 4. The two versions of the model show no significant difference in fit in three out of four cases; in the first case, the standard (constrained) model gives a significantly better fit.

Target values | constrained elpd_WAIC (SE) | unconstrained elpd_WAIC (SE) | elpd_DIFF (SE)
Expt 2 set A | 114.9 (116.4) | 68.9 (112.8) | −46.0 (24.6)‡
Expt 2 set B | −582.7 (98.1) | −563.1 (99.4) | 19.5 (15.6)
Expt 3 excluding single events | −1017.4 (198.2) | −1017.0 (198.4) | −0.3 (3.7)
Expt 4 excluding single events | −219.1 (87.0) | −215.0 (88.9) | −4.1 (6.1)

Note: ‡ p < 0.05.
Typically, for a fallacy response to occur, one of two conditions must hold: the constituent and complex item estimates should be close to each other, or the complex item should be more variable than its constituent. A fallacy response may occur when either of these conditions is present, but the highest fallacy rates typically occurred when both held for the same estimates.

One of the most striking results of these experiments (particularly experiments 3 and 4) is that they revealed how good participants are at estimating probabilities. Participants typically produced accurate estimates for all of the estimation tasks presented to them. This sophistication of estimation is somewhat unexpected, particularly for research on cognitive biases, which makes a point of demonstrating the myriad ways in which humans are poor reasoners. What we find here are reasoners who are skilful, even for novel stimuli, with a degree of precision that has not previously been recognised in the literature. In addition, participant estimates were consistent with probability theory, in terms of the Addition Law expression, in all three experiments where that expression could be calculated.7 In all three, we found good compliance with the addition law, with each A, B combination producing values that were close to, and varied around, the required value of 0, alongside significant conjunction and disjunction fallacy occurrence. This demonstrates that high conjunction and disjunction fallacy rates cannot be taken as evidence that people do not reason in a logical and reasonable fashion, that is, that their reasoning is always contrary to probability theory. The results here demonstrate that both can occur concurrently and are not, in fact, contradictory. That probability estimates are simultaneously accurate, consistent with probability theory, and productive of fallacies is a major challenge to heuristics accounts of the fallacy. Currently, noise approaches are better able to account for these results than the more traditional heuristics accounts.

6 That is, the differences P(A ∧ B) − P(A) and P(B) − P(A ∨ B).
7 Participants did not provide disjunction estimates for experiment 4, so the addition law was not calculated for those estimates.

9. Conclusions

The findings of this study can be taken as evidence that cognitive biases can be explained by errors in a rational probabilistic reasoning process rather than by a heuristic process. Humans are good and accurate reasoners about both familiar and novel scenarios, and their failings in reasoning (conjunction and disjunction fallacies in this case) arise from a confluence of high variability in complex items and small differences in probability values. From these observations, we can conclude that probabilistic models are capable of predicting a range of biases and that they provide a coherent framework for future work on reasoning errors.

10. Financial disclosure

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of Competing Interest

The authors declare no potential conflict of interests.

Appendix A

In the main text we assume a noise rate of d for probability estimation for simple events P(A) and an increased rate of d + Δd for complex events P(A ∧ B) and P(A ∨ B). This assumption of an increase in noise Δd for complex events is very much a first approximation.
In this appendix we derive more detailed expressions for the increase in noise rate for complex events in the noisy frequentist model. This model assumes that people estimate the probability of some event A by randomly sampling items from memory, counting the number that are instances of A, and dividing by the sample size. The model assumes that events have some chance d < 0.5 of randomly being read incorrectly; this random error results in an average noisy estimate for the probability of A of

P_E(A) = (1 − 2d) P(A) + d    (A.1)

For conjunctive events A ∧ B this counting process may take place in two different ways. If the complex event is itself a familiar, already known category, we can treat the conjunctive category as 'integral' and simply count items as members of that category directly, just as for simple events. For conjunctive events A ∧ B that are treated in this way we get an average noisy estimate for the probability of A ∧ B of

P_E(A ∧ B | integral) = (1 − 2d) P(A ∧ B) + d    (A.2)

Similarly, for disjunctive events A ∨ B that represent an existing familiar category and can be treated as integral we get an average noisy probability estimate of

P_E(A ∨ B | integral) = (1 − 2d) P(A ∨ B) + d    (A.3)

Note that these expressions satisfy the addition law: substituting Eqs. (A.2) and (A.3) into the Addition Law we get

P_E(A) + P_E(B) − P_E(A ∧ B | integral) − P_E(A ∨ B | integral) = (1 − 2d)(P(A) + P(B) − P(A ∧ B) − P(A ∨ B)) = 0

as required in standard probability theory.

We can also, however, make this decision about membership in A ∧ B (or A ∨ B) by treating the category as 'separable': by separately checking whether the item is an instance of A (subject to noise rate d) and whether that item is an instance of B (subject to the same noise rate d). Items which are read as A and separately read as B are labelled as instances of A ∧ B: the probability estimate for the conjunction is obtained by counting such labelled items and dividing by the sample size (and similarly for disjunctions). There are three possible locations of error in this 'separable' case: error in reading an item as A, error in reading an item as B, and finally, error in reading an item as labelled A ∧ B. This last form of error arises when, for example, an item which was labelled as an instance of A ∧ B is mistakenly read as a non-instance during counting, or, similarly, when an item that was not labelled as an instance of A ∧ B is mistakenly read as an instance during counting. We assume, for simplicity, that all three types of error occur randomly at the same rate, d.

We calculate the noisy probability estimate for A ∧ B under these three sources of error by first giving an expression for the probability of an item being labelled as A ∧ B under the first two forms of error. We then use that expression to get the average noisy estimate P_E(A ∧ B) given random error in the reading of these labels.

We calculate the probability of a given randomly sampled item being labelled as an instance of a separable conjunction A ∧ B as follows. We take P(labelled A ∧ B | A ∧ B) to represent the probability of an item being labelled as A ∧ B, given that the item truly is an instance of A ∧ B; we take P(labelled A ∧ B | A ∧ ¬B) to represent the probability of an item being labelled A ∧ B, given that the item truly is an instance of A but is not an instance of B; and so on.
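Equation (A.1) follows because a true instance of A is read as A with probability 1 − d, while a non-instance is misread as A with probability d, giving an expected proportion of P(A)(1 − d) + (1 − P(A))d = (1 − 2d)P(A) + d. The sketch below is a minimal simulation of this sampling-and-reading process; the probability, noise rate, and sample size are arbitrary illustrative values, not values taken from the experiments.

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_estimate(p_a, d, sample_size):
    """One noisy frequentist estimate of P(A): sample items from memory,
    read each with error rate d (an erroneous read flips the item's
    true status), and return the proportion read as instances of A."""
    truth = rng.random(sample_size) < p_a    # is the item really an A?
    error = rng.random(sample_size) < d      # is this read erroneous?
    return (truth ^ error).mean()            # proportion read as A

p_a, d = 0.2, 0.1  # illustrative values
estimates = [noisy_estimate(p_a, d, 100) for _ in range(50_000)]
print(np.mean(estimates))        # simulated mean estimate
print((1 - 2 * d) * p_a + d)     # Eq. (A.1) prediction: 0.26
```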
We begin by noting that the total probability of a randomly sampled item being labelled A ∧ B is obtained by summing over all possibilities for that item, each weighted by its probability of occurrence:

P(labelled A ∧ B) = P(labelled A ∧ B | A ∧ B) P(A ∧ B) + P(labelled A ∧ B | A ∧ ¬B) P(A ∧ ¬B) + P(labelled A ∧ B | ¬A ∧ B) P(¬A ∧ B) + P(labelled A ∧ B | ¬A ∧ ¬B) P(¬A ∧ ¬B)

An item that is truly an instance of A ∧ B will be labelled A ∧ B, in this separable process, only if both constituent events are read correctly (with no random error); this occurs with probability (1 − d)², and we have

P(labelled A ∧ B | A ∧ B) = (1 − d)²

An item that is truly an instance of A ∧ ¬B (or ¬A ∧ B) will be labelled A ∧ B only if the one constituent is read correctly but the other is read incorrectly; this occurs with probability (1 − d) d, and we have

P(labelled A ∧ B | A ∧ ¬B) = P(labelled A ∧ B | ¬A ∧ B) = (1 − d) d

Finally, an item that is truly an instance of ¬A ∧ ¬B will be labelled A ∧ B only if both constituents are read incorrectly; this occurs with probability d², and we have

P(labelled A ∧ B | ¬A ∧ ¬B) = d²

and substituting into the overall expression above we get

P(labelled A ∧ B) = (1 − d)² P(A ∧ B) + (1 − d) d [P(A ∧ ¬B) + P(¬A ∧ B)] + d² P(¬A ∧ ¬B)

or, simplifying,

P(labelled A ∧ B) = (1 − 2d)² P(A ∧ B) + d (1 − 2d)[P(A) + P(B)] + d²

Finally, with random error at the same rate d in counting instances that have been labelled A ∧ B, we get

P_E(A ∧ B | separable) = (1 − 2d) P(labelled A ∧ B) + d = (1 − 2d)[(1 − 2d)² P(A ∧ B) + d (1 − 2d)[P(A) + P(B)] + d²] + d    (A.4)

A similar derivation gives a probability estimate for a separable disjunction of

P_E(A ∨ B | separable) = (1 − 2d)[(1 − 2d)² P(A ∨ B) + d (1 − 2d)[P(A) + P(B)] + 2d − d²] + d    (A.5)

Note that these expressions approximately satisfy the addition law: substituting Eqs. (A.4) and (A.5) into the Addition Law we get

P_E(A) + P_E(B) − P_E(A ∧ B | separable) − P_E(A ∨ B | separable)
= (1 − 2d)[P(A) + P(B) − (1 − 2d)² [P(A ∧ B) + P(A ∨ B)] − 2d (1 − 2d)[P(A) + P(B)] − 2d]
= (1 − 2d)[P(A) + P(B)][1 − (1 − 2d)² − 2d (1 − 2d)] − 2d (1 − 2d)
= 2d (1 − 2d)[P(A) + P(B)] − 2d (1 − 2d)
= 2d (1 − 2d)[P(A) + P(B) − 1] ≈ 0    (A.6)

and since −1 ≤ P(A) + P(B) − 1 ≤ 1 necessarily holds, the average value of this expression across a wide range of probabilities P(A), P(B) will be 0, just as required in standard probability theory.

To combine these various measures, we need some way of estimating the probability that a given pair of events A and B will be treated separably or integrally. A natural way to estimate this probability is to say that the probability of A and B being treated separably is simply equal to the probability of those events occurring separately from each other, which we can write as

P(separable) = P(A ∧ ¬B) + P(¬A ∧ B)

The higher this probability, the more likely it is that A and B will occur separately, and the more likely it is that A and B will be treated as separable events. Similarly, we can say that the probability of A and B being treated integrally is equal to the probability of those events occurring together (if A occurs, B occurs; if A does not occur, B does not occur; and vice versa), which we can write as

P(integral) = P(A ∧ B) + P(¬A ∧ ¬B) = 1 − P(separable)

The higher this probability, the more likely it is that A and B will only ever be seen together, and so will be treated as a single integral event.
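The separable labelling-and-counting process just described, and the resulting Eq. (A.4), can be checked by direct simulation. The sketch below is a minimal illustration under assumed probability values (any values consistent with probability theory would do): it draws items, reads each constituent with error rate d, labels items read as both A and B, and then counts the labels with the same error rate.

```python
import numpy as np

rng = np.random.default_rng(3)

def separable_conj_estimate(p_a, p_b_given_a, p_b_given_not_a, d, n):
    """One 'separable' noisy estimate of P(A and B): read each constituent
    with error rate d, label items read as both A and B, then count the
    labels with the same read error rate d."""
    a = rng.random(n) < p_a
    b = np.where(a, rng.random(n) < p_b_given_a,
                    rng.random(n) < p_b_given_not_a)
    read_a = a ^ (rng.random(n) < d)        # constituent A read with error d
    read_b = b ^ (rng.random(n) < d)        # constituent B read with error d
    label = read_a & read_b                 # labelled as an A-and-B instance
    counted = label ^ (rng.random(n) < d)   # labels re-read with error d
    return counted.mean()

# Illustrative probabilities, assumed purely for this check
p_a, p_b_a, p_b_na, d = 0.5, 0.6, 0.2, 0.1
p_ab = p_a * p_b_a
p_b = p_a * p_b_a + (1 - p_a) * p_b_na
sims = [separable_conj_estimate(p_a, p_b_a, p_b_na, d, 200) for _ in range(20_000)]

# Eq. (A.4): (1-2d)[(1-2d)^2 P(A^B) + d(1-2d)(P(A)+P(B)) + d^2] + d
pred = (1 - 2*d) * ((1 - 2*d)**2 * p_ab + d * (1 - 2*d) * (p_a + p_b) + d**2) + d
print(np.mean(sims), pred)   # both approximately 0.319 for these values
```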
Given these probabilities we get, as our overall expression for the average noisy probability estimate for a conjunction, the expression

P_E(A ∧ B) = P_E(A ∧ B | separable) P(separable) + P_E(A ∧ B | integral) P(integral)    (A.7)

and similarly

P_E(A ∨ B) = P_E(A ∨ B | separable) P(separable) + P_E(A ∨ B | integral) P(integral)    (A.8)

These expressions are clearly very complex (we are not going to expand them here). One thing worth noting, however, is that these expressions again approximately satisfy the Addition Law; substituting Eqs. (A.7) and (A.8) into the Addition Law we get

P_E(A) + P_E(B) − P_E(A ∧ B) − P_E(A ∨ B)
= P(integral)[P_E(A) + P_E(B) − P_E(A ∧ B | integral) − P_E(A ∨ B | integral)]
+ P(separable)[P_E(A) + P_E(B) − P_E(A ∧ B | separable) − P_E(A ∨ B | separable)]
= 2d (1 − 2d) P(separable)[P(A) + P(B) − 1] ≈ 0    (A.9)

with the last line following from Eq. (A.6). Given the complexity of Eqs. (A.7) and (A.8), we approximate them by assuming a noise rate of d for simple probability estimates P_E(A) and P_E(B) but an increased noise rate of d + Δd for conjunctive and disjunctive estimates, giving

P_E(A ∧ B) ≈ [1 − 2(d + Δd)] P(A ∧ B) + (d + Δd)    (A.10)

P_E(A ∨ B) ≈ [1 − 2(d + Δd)] P(A ∨ B) + (d + Δd)    (A.11)

We use this particular d + Δd approximation for three reasons. First, numerical simulations show that Eqs. (A.7) and (A.8) are regressive towards 0.5 in a way that is systematically stronger than that of matching single probability estimates as given in Eq. (A.1) (see Fig. 14). The use of a Δd increase in error rate captures this increased regression. To conduct these simulations we generated 2000 randomly selected objective probability sets, all consistent with standard probability theory. Probability sets were produced by selecting random values for a triplet of objective probabilities P(B), P(A|B) and P(A|¬B) (chosen uniformly in the range 0…1), and using these values to calculate associated probabilities in each set by applying the equations of probability theory (so that P(A) = P(A|B) P(B) + P(A|¬B)(1 − P(B)), P(A ∧ B) = P(A|B) P(B), and so on). Mean noisy estimates for each set were calculated by applying Eqs. (A.7) and (A.8) to the probabilities in each set (with d = 0.1). Fig. 14 shows the mean noisy estimates produced in this way, and shows that these estimates are systematically more regressive towards 0.5 than matching single probability estimates.

Second, we use this Δd approximation because it can produce noisy probability estimates which agree reasonably well with those produced by the more complex expressions derived above. Using the same numerical simulation data described above, we found a very close fit between Eqs. (A.7) and (A.10) for d = 0.1 and Δd = 0.04 (correlation: r = 0.99, root mean squared difference between values of RMSD = 0.018). Exactly the same close fit was obtained between Eqs. (A.8) and (A.11) with d = 0.1 and Δd = 0.04 (r = 0.99, RMSD = 0.018).
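This simulation is straightforward to reproduce in outline. The sketch below regenerates random probability sets in the same way and compares the exact mixture expression of Eq. (A.7) against the d + Δd approximation of Eq. (A.10); since the probability sets are random, the resulting r and RMSD values will only approximately match those reported above.

```python
import numpy as np

rng = np.random.default_rng(4)
d, delta = 0.1, 0.04   # noise rate d and increase delta (i.e. Δd), as in the text

# 2000 random probability sets consistent with probability theory
p_b = rng.random(2000)
p_a_given_b, p_a_given_nb = rng.random(2000), rng.random(2000)
p_a = p_a_given_b * p_b + p_a_given_nb * (1 - p_b)
p_ab = p_a_given_b * p_b                 # P(A and B)
p_sep = p_a + p_b - 2 * p_ab             # P(separable) = P(A, not B) + P(not A, B)

# Exact mean estimate, Eq. (A.7): mixture of separable (A.4) and integral (A.2)
pe_sep = (1 - 2*d) * ((1 - 2*d)**2 * p_ab
                      + d * (1 - 2*d) * (p_a + p_b) + d**2) + d
pe_int = (1 - 2*d) * p_ab + d
pe_exact = pe_sep * p_sep + pe_int * (1 - p_sep)

# Approximation, Eq. (A.10)
pe_approx = (1 - 2*(d + delta)) * p_ab + (d + delta)

print(np.corrcoef(pe_exact, pe_approx)[0, 1])       # close to 0.99
print(np.sqrt(np.mean((pe_exact - pe_approx)**2)))  # close to 0.02
```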
Fig. 14. Graph of true probability P(A ∧ B) versus mean noisy estimate P_E(A ∧ B) (left) and true probability P(A ∨ B) versus mean noisy estimate P_E(A ∨ B) (right) for 2000 randomly selected probability sets. Probability sets were produced by selecting random values for a triplet of probabilities P(B), P(A|B) and P(A|¬B) (each having a random value chosen uniformly in the range 0…1), and using these values to calculate associated probabilities in each set by applying the equations of probability theory (so that P(A ∧ B) = P(A|B) P(B), P(A) = P(A|B) P(B) + P(A|¬B)(1 − P(B)), and so on). Mean noisy estimates for each set were calculated by applying Eqs. (A.7) and (A.8) to the probabilities in each set (with d = 0.1). The value of P_E(A ∧ B) or P_E(A ∨ B) for each set is represented by a small black dot: the speckled areas represent the distribution of such values associated with each true probability P(A ∧ B) (or P(A ∨ B)). White circles represent noisy estimates produced by applying the 'd' equations (1 − 2d) P(A ∧ B) + d and (1 − 2d) P(A ∨ B) + d, also with d = 0.1. The solid 45° line is the line of identity. The graph shows that mean noisy estimates have a higher degree of regression towards 0.5 (black dots falling above white circles when true probability is below 0.5, and below white circles when true probability is above 0.5). This indicates that the complex expressions in Eqs. (A.7) and (A.8) can be approximated by the simpler '(d + Δd)' expressions given in Eqs. (A.10) and (A.11), with the increase in error Δd capturing this increase in regression.

Similar fits hold for different values of d. Finally, we use this particular Δd approximation because it makes predictions about values of the addition law identity that match those from Eqs. (A.7) and (A.8). Substituting Eqs. (A.10) and (A.11) into the Addition Law expression we get

P_E(A) + P_E(B) − P_E(A ∧ B) − P_E(A ∨ B) = 2Δd [P(A) + P(B) − 1] ≈ 0

and this approximation gives a predicted value for the Addition Law whose form follows that given in Eq. (A.9).

References

Bar-Hillel, M., & Neter, E. (1993). How alike is it versus how likely is it: A disjunction fallacy in probability judgments. Journal of Personality and Social Psychology, 65, 1119.
Bearden, N. J., & Wallsten, T. S. (2004). Minerva-DM and subadditive frequency judgments. Journal of Behavioral Decision Making, 17, 349–363.
Bonini, N., Tentori, K., & Osherson, D. (2004). A different conjunction fallacy. Mind & Language, 19, 199–210.
Budescu, D. V., Erev, I., & Wallsten, T. S. (1997). On the importance of random error in the study of probability judgment. Part I: New theoretical developments. Journal of Behavioral Decision Making, 10, 157–171.
Camerer, C., Loewenstein, G., & Rabin, M. (2003). Advances in behavioral economics. Princeton University Press.
Carlson, B. W., & Yates, J. F. (1989). Disjunction errors in qualitative likelihood judgment. Organizational Behavior and Human Decision Processes, 44, 368–379.
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., ... Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76.
Costello, F., & Mathison, T. (2014). On fallacies and normative reasoning: When people's judgements follow probability theory. Proceedings of the 36th annual meeting of the Cognitive Science Society (pp. 361–366).
Costello, F., & Watts, P. (2014). Surprisingly rational: Probability theory plus noise explains biases in judgment. Psychological Review, 121, 463.
Costello, F., & Watts, P. (2016). People's conditional probability judgments follow probability theory (plus noise). Cognitive Psychology, 89, 106–133.
Costello, F., & Watts, P. (2017). Explaining high conjunction fallacy rates: The probability theory plus noise account. Journal of Behavioral Decision Making, 30, 304–321.
Costello, F., & Watts, P. (2018). Invariants in probabilistic reasoning. Cognitive Psychology, 100, 1–16.
Costello, F., & Watts, P. (2019). The rationality of illusory correlation. Psychological Review, 126, 437.
Costello, F., Watts, P., & Fisher, C. (2018). Surprising rationality in probability judgment: Assessing two competing models. Cognition, 170, 280–297.
Dawson, N. V., & Arkes, H. R. (1987). Systematic errors in medical decision making. Journal of General Internal Medicine, 2, 183–187.
Dougherty, M. R. P., Gettys, C. F., & Ogden, E. E. (1999). Minerva-DM: A memory processes model for judgments of likelihood. Psychological Review, 106, 180–209.
Erev, I., Wallsten, T. S., & Budescu, D. V. (1994). Simultaneous over- and underconfidence: The role of error in judgment processes. Psychological Review, 101, 519.
Eva, K. W., & Norman, G. R. (2005). Heuristics and biases: Biased perspective on clinical reasoning. Medical Education, 39, 870–872.
Fantino, E., Kulik, J., Stolarz-Fantino, S., & Wright, W. (1997). The conjunction fallacy: A test of averaging hypotheses. Psychonomic Bulletin & Review, 4, 96–101.
Fiedler, K. (1988). The dependence of the conjunction fallacy on subtle linguistic factors. Psychological Research, 50, 123–129.
Fisher, C. R., & Wolfe, C. R. (2014). Are people naïve probability theorists? A further examination of the probability theory + variation model. Journal of Behavioral Decision Making, 27, 433–443.
Fisk, J. E., & Pidgeon, N. (1996). Component probabilities and the conjunction fallacy: Resolving signed summation and the low component model in a contingent approach. Acta Psychologica, 94, 1–20.
Gallistel, C. R., Krishan, M., Liu, Y., Miller, R., & Latham, P. E. (2014). The perception of probability. Psychological Review, 121, 96.
Gavanski, I., & Roskos-Ewoldsen, D. R. (1991). Representativeness and conjoint probability. Journal of Personality and Social Psychology, 61, 181.
Gigerenzer, G., & Gaissmaier, W. (2011). Heuristic decision making. Annual Review of Psychology, 62, 451–482.
Hertwig, R., & Gigerenzer, G. (1999). The 'conjunction fallacy' revisited: How intelligent inferences look like reasoning errors. Journal of Behavioral Decision Making, 12, 275–305.
Hilbert, M. (2012). Toward a synthesis of cognitive biases: How noisy information processing can bias human decision making. Psychological Bulletin, 138(2), 211–237.
Johnson, D. D., Blumstein, D. T., Fowler, J. H., & Haselton, M. G. (2013). The evolution of error: Error management, cognitive constraints, and adaptive decision-making biases. Trends in Ecology & Evolution, 28, 474–481.
Kahneman, D. (2003). Maps of bounded rationality: Psychology for behavioral economics. The American Economic Review, 93, 1449–1475.
Kahneman, D., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. Cambridge University Press.
Korobkin, R., & Ulen, T. (2000). Law and behavioral science: Removing the rationality assumption from law and economics. California Law Review, 88, 1051.
Marchiori, D., Di Guida, S., & Erev, I. (2015). Noisy retrieval models of over- and undersensitivity to rare events. Decision, 2, 82.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81.
Nilsson, H., Juslin, P., & Winman, A. (2016). Heuristics can produce surprisingly rational probability estimates: A commentary on Costello and Watts (2014). Psychological Review, 123(1), 103–111.
Nilsson, H., Winman, A., Juslin, P., & Hansson, G. (2009). Linda is not a bearded lady: Configural weighting and adding as the cause of extension errors. Journal of Experimental Psychology: General, 138, 517.
Oliver, A. (2013). From nudging to budging: Using behavioural economics to inform public sector policy. Journal of Social Policy, 42, 685–700.
Reeves, T., & Lockhart, R. S. (1993). Distributional versus singular approaches to probability and errors in probabilistic reasoning. Journal of Experimental Psychology: General, 122, 207.
Rieskamp, J., & Otto, P. E. (2006). SSL: A theory of how people learn to select strategies. Journal of Experimental Psychology: General, 135, 207.
Scheibehenne, B., Rieskamp, J., & Wagenmakers, E.-J. (2013). Testing adaptive toolbox models: A Bayesian hierarchical approach. Psychological Review, 120, 39.
Sides, A., Osherson, D., Bonini, N., & Viale, R. (2002). On the reality of the conjunction fallacy. Memory & Cognition, 30, 191–198.
Söllner, A., Bröder, A., Glöckner, A., & Betsch, T. (2014). Single-process versus multiple-strategy models of decision making: Evidence from an information intrusion paradigm. Acta Psychologica, 146, 84–96.
Stolarz-Fantino, S., Fantino, E., Zizzo, D. J., & Wen, J. (2003). The conjunction effect: New evidence for robustness. The American Journal of Psychology, 116, 15–34.
Sunstein, C. (2000). Behavioral law and economics. Cambridge University Press.
Teigen, K. H., Martinussen, M., & Lund, T. (1996). Linda versus World Cup: Conjunctive probabilities in three-event fictional and real-life predictions. Journal of Behavioral Decision Making, 9, 77–93.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.
Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293.
Vallgårda, S. (2012). Nudge: A new and better way to improve health? Health Policy, 104, 200–203.
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432.
Vehtari, A., Gelman, A., & Gabry, J. (2018). loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models [Computer software]. R package, version 2.1.3, https://mc-stan.org/loo.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11, 3571–3594.
Wedell, D. H., & Moro, R. (2008). Testing boundary conditions for the conjunction fallacy: Effects of response mode, conceptual focus, and problem type. Cognition, 107, 105–136.