Christian Wallmann
Gernot D. Kleiter
Probability Propagation in
Generalized Inference Forms
Abstract.
Probabilistic inference forms lead from point probabilities of the premises
to interval probabilities of the conclusion. The probabilistic version of Modus Ponens, for
example, licenses the inference from P (A) = α and P (B|A) = β to P (B) ∈ [αβ, αβ + 1 − α].
We study generalized inference forms with three or more premises. The generalized Modus
Ponens, for example, leads from P (A1 ) = α1 , . . . , P (An ) = αn and P (B|A1 ∧ · · · ∧ An ) = β
to an according interval for P (B). We present the probability intervals for the conclusions of the generalized versions of Cut, Cautious Monotonicity, Modus Tollens, Bayes’
Theorem, and some System O rules. Recently, Gilio has shown that generalized inference
forms “degrade”—more premises lead to less precise conclusions, i.e., to wider probability
intervals of the conclusion. We also study Adams' probability preservation properties in
generalized inference forms. Special attention is devoted to zero probabilities of the conditioning events. These zero probabilities often lead to different intervals in the coherence
and the Kolmogorov approach.
Keywords: Probability logic, Generalized inference forms, Degradation, Probability
preservation, Coherence.
1. Introduction
While logic studies the propagation of truth values from premises to conclusions, probability logic studies the propagation of probabilities from premises
to conclusions. In probability logic Modus Ponens, for example, has the
form shown on the left hand side of Table 1. On the right hand side the
generalized probabilistic Modus Ponens is shown. In probability logic a conditional A ⇒ B is represented by the “conditional event” B|A. The probabilities of the premises are point probabilities. This assessment is assumed to
be coherent. Usually, the inferred probability of the conclusion is an interval
probability.
Table 1. Probabilistic Modus Ponens with one categorical premise (left)
and generalized probabilistic Modus Ponens with n categorical premises
(right)
    P(E) = α                       P(E1) = α1, ..., P(En) = αn
    P(H|E) = β                     P(H|E1 ∧ ··· ∧ En) = βn
    ------------------             ------------------
    P(H) ∈ [δ′, δ′′]               P(H) ∈ [δn′, δn′′]
Below we study, for different generalized inference forms, the behavior of
the interval [δn′ , δn′′ ] for an increasing number n of premises. We review results
recently obtained for generalized probabilistic inference forms [6,10,14,15].
In these inference forms a degradation is observed. The width of the probability interval of the conclusion increases as the number n of premises increases.
Thus, more premises lead to less precise conclusions. Figure 1 shows a numerical example for the degradation of Modus Ponens. In most inference forms even an "ultimate" degradation occurs: already after the addition of a small number of premises, the interval of the conclusion becomes the non-informative interval [0, 1]. This is a consequence of the fact that already for small n the lower bound of the conjunction probability P(E1 ∧ ··· ∧ En) may be zero.
Because the lower bound of the conjunction probability becomes zero even for a relatively small number of conjuncts, the conditioning events may have zero probabilities. In the Kolmogorov approach, however, conditional probability is undefined in this case. The Kolmogorov approach is therefore not appropriate to investigate generalized inference forms. The case where the conditioning event has zero probability can, however, be treated in the coherence approach of de Finetti [4]. As a consequence, the Kolmogorov and the coherence approach lead to different interval probabilities for the conclusion of generalized inference forms.

Figure 1. Degradation of Modus Ponens: lower (◦) and upper (□) bounds of P(H) (on the y-axis) for an increasing number of categorical premises n (on the x-axis); premise probabilities P(Ei) = 0.8, for i = 1, ..., 8, and P(H|E1 ∧ ··· ∧ En) = 0.5
As already mentioned, in many generalized inference forms the interval
for the conclusion is getting wider as the number of premises increases and
the interval [0, 1] is obtained after a certain number of premises is added.
In probabilistically valid inference forms, however, the probability of the premises is preserved to the conclusion. Are generalized inference forms consequently probabilistically invalid? Different inference forms preserve the probability of their premises to their conclusion to different degrees. Adams distinguished four preservation properties [1]. Each of these preservation properties determines a consequence operation. An inference form is valid with respect to such a consequence operation if and only if it satisfies the corresponding preservation property. We can establish whether
an inference form satisfies a preservation property by considering the lower
probability of its conclusion. Well-known examples are System P [11], which
is associated with probability one-preservation and System O [8,9], which
is closely connected with minimum probability preservation. Modus Ponens,
for instance, is probability one preserving and consequently System P valid.
This can immediately be seen by considering the lower bound of the interval
of the conclusion of Modus Ponens. If P (A) = α = 1 and P (B|A) = β = 1,
then P (B) ≥ αβ = 1. It is important to note that, since they yield different intervals for the conclusion of inference forms, the Kolmogorov and the
coherence approach validate different inference forms.
2. Coherent Conditional Probability
For the treatment of conditioning events with zero probabilities in generalized inference forms, we employ the coherence approach of probability theory
[3,4]. While in the Kolmogorov approach conditional probability is defined
by a ratio of two (unconditional) probabilities, it is a primitive concept in
the coherence approach.
Let L be a Boolean algebra (i.e., L is closed under ¬, ∧, ∨), let |= denote
the classical consequence operation and let ⊤ be the sure event.
Definition 1. A mapping P : L → [0, 1] is a (finitely additive) one-place
probability function iff for all A, B ∈ L
1. P (⊤) = 1,
2. P (A ∨ B) = P (A) + P (B), if |= ¬(A ∧ B).
Definition 2. Let P′ : L → [0, 1] be a one-place probability function and T0 = {B ∈ L : P′(B) ≠ 0}. A mapping P : L × T0 → [0, 1] is the (finitely additive) Kolmogorov conditional probability associated with P′ iff for all A ∈ L and B ∈ T0

P(A|B) = P′(A ∧ B) / P′(B).

For every Kolmogorov conditional probability, if P(B) = P(B|⊤) = P′(B) = 0, then P(A|B) is undefined. The coherence approach does not exclude the case of zero probability of the conditioning event.
Let L be a Boolean algebra, T ⊆ L such that T is closed under disjunction, and T^0 = T \ {A ∈ T : A inconsistent} be a set of conditioning events.

Definition 3. A mapping P : L × T^0 → [0, 1] is a (finitely additive) conditional probability on L × T^0 iff

1. P(H|H) = 1, for every H ∈ T^0,
2. P(·|H) is a (one-place) probability function on L for every H ∈ T^0,
3. P(E ∧ A|H) = P(E|H)P(A|(E ∧ H)), for A, E ∈ L and H, E ∧ H ∈ T^0 (see, e.g., [3]).
Let L′, T′ be arbitrary sets of events.

Definition 4. A mapping P : L′ × T′ → [0, 1] is coherent iff there exist a Boolean algebra L ⊇ L′ and a set T ⊆ L closed under disjunction with T′ ⊆ T^0 such that P can be extended to a conditional probability on L × T^0.
There are several advantages of the coherence approach. First, in the Kolmogorov approach, P(A|B) is defined as the ratio P′(A ∧ B)/P′(B). Knowledge of the probabilities P(A ∧ B) and P(B), however, is not required to assess conditional probabilities in the coherence approach [3]. Second, while in the Kolmogorov approach P(A|B) is undefined if P(B) = 0, in the coherence approach conditioning on (consistent) events with zero probability is possible: P(·|B) is a one-place probability function even if P(B) = 0. As a consequence, for instance, probability one can be updated in the light of events with zero probability, i.e., it is not necessarily the case that P(A|B) = 1 if P(A) = 1 [3].
The interval of coherent probability values for the conclusion of an inference form can be determined by solving sequences of linear systems. This is the content of Theorem 5 below, which provides an alternative characterization of coherence [3, p. 81]. Let P(E1|H1), ..., P(En|Hn) be a probability assessment. If Hi = ⊤, then we write P(Ei) instead of P(Ei|Hi).
Table 2. Constituents C1, ..., C8 and their probabilities x1, ..., x8 for n = 3 events

             C1  C2  C3  C4  C5  C6  C7  C8   Probability
    E         1   1   1   1   0   0   0   0   x1 + x2 + x3 + x4
    F         1   1   0   0   1   1   0   0   x1 + x2 + x5 + x6
    G         1   0   1   0   1   0   1   0   x1 + x3 + x5 + x7
    P(Ci)    x1  x2  x3  x4  x5  x6  x7  x8
A possible outcome or a constituent is a logically consistent conjunction of
the form ±E1 ∧ · · · ∧ ±En ∧ ±H1 ∧ · · · ∧ ±Hn , where ±A ∈ {A, ¬A} for
all events A. If the 2n events are logically independent, then there are 2^{2n} constituents C1, ..., C_{2^{2n}}. The probability of an event E is the sum of the
probabilities of the constituents Cr verifying it, i.e., Cr |= E. Table 2 shows
our notation in the case of three events E, F, G.
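This bookkeeping is easy to mechanize. The following Python sketch (ours, not part of the paper; the numerical values of x1, ..., x8 are arbitrary illustrative choices) enumerates the constituents of Table 2 and recovers the probability of each event as the sum of the probabilities of the constituents verifying it.

    # Constituents for three logically independent events, in the order of
    # Table 2; the assessment x is an arbitrary illustrative choice.
    from itertools import product

    events = ["E", "F", "G"]
    constituents = list(product([1, 0], repeat=3))  # C1 = (1,1,1), ..., C8 = (0,0,0)
    x = [0.30, 0.10, 0.10, 0.10, 0.20, 0.10, 0.05, 0.05]  # x1, ..., x8, summing to 1

    def prob(j):
        # P(event j) = sum of the x_r over all constituents C_r verifying it
        return sum(xr for c, xr in zip(constituents, x) if c[j] == 1)

    for j, name in enumerate(events):
        print(f"P({name}) = {prob(j):.2f}")  # P(E) = 0.60, P(F) = 0.70, P(G) = 0.65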
Theorem 5. (Coletti and Scozzafava [3, p. 81, Theorem 4]) An assessment P(E1|H1), ..., P(En|Hn) is coherent iff there exists a sequence of compatible systems S_α, α = 0, 1, ..., k ≤ n, with unknowns x_r^α ≥ 0:

Σ_{Cr |= Ei ∧ Hi} x_r^α = P(Ei|Hi) · Σ_{Cr |= Hi} x_r^α   (i = 1, ..., n; for α ≥ 1 only those i with Σ_{Cr |= Hi} x_r^{α−1} = 0),

Σ_{Cr |= H_0^α} x_r^α = 1,

where H_0^0 = H1 ∨ ··· ∨ Hn and H_0^α denotes, for α ≥ 1, the union of those Hi for which Σ_{Cr |= Hi} x_r^{α−1} = 0.
Let P be a coherent extension of the assessment P(E1|H1), ..., P(En|Hn). Then any given solution (x_r^α) of the system S_α can be interpreted as a coherent extension of the initial assessment to the family {Cr|H_0^α : Cr |= H_0^α} [2]. To improve readability, we write xi instead of x_i^0 and yi instead of x_i^1.
Example 1. Consider, for example, Predictive Inference. The premises are

P(E) = γ1, P(F) = γ2, P(G) = γ3.

If P(E ∧ F) = x1 + x2 > 0, then we obtain the lower (upper) bound for the predictive probability P(G|E ∧ F) by minimizing (maximizing) the objective function x1/(x1 + x2) in the system S0:

x1 + x2 + x3 + x4 = γ1
x1 + x2 + x5 + x6 = γ2
x1 + x3 + x5 + x7 = γ3
P(G|E ∧ F)(x1 + x2) = x1
x1 + ··· + x8 = 1,   xi ≥ 0.

Solving the linear system shows that

P(G|E ∧ F) ∈ [max{0, (γ1 + γ2 + γ3 − 2)/(γ1 + γ2 − 1)}, min{1, γ3/(γ1 + γ2 − 1)}].

If P(E ∧ F) = x1 + x2 = 0, then H_0^1 in Theorem 5 is E ∧ F. The system S1 is consequently given by

P(G|E ∧ F)(y1 + y2) = y1
y1 + y2 = 1,   y1, y2 ≥ 0.

Solving S1 shows that, if P(E ∧ F) = 0, then P(G|E ∧ F) can attain any value in [0, 1]. Note that in the Kolmogorov approach no corresponding result is obtained, as in this case P(G|E ∧ F) is undefined.
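The bounds of Example 1 can also be checked numerically. The following Python sketch (ours; it assumes SciPy is available and treats only the case P(E ∧ F) = x1 + x2 > 0) linearizes the fractional objective x1/(x1 + x2) by the Charnes-Cooper substitution y = x/(x1 + x2), t = 1/(x1 + x2), so that the objective becomes the linear function y1, and solves the two resulting linear programs.

    import numpy as np
    from scipy.optimize import linprog

    def predictive_bounds(g1, g2, g3):
        # Variables y1, ..., y8, t; linprog keeps all of them nonnegative by default.
        A_eq = np.zeros((5, 9))
        A_eq[0, [0, 1, 2, 3]] = 1; A_eq[0, 8] = -g1  # x1+x2+x3+x4 = g1, scaled by t
        A_eq[1, [0, 1, 4, 5]] = 1; A_eq[1, 8] = -g2  # x1+x2+x5+x6 = g2, scaled by t
        A_eq[2, [0, 2, 4, 6]] = 1; A_eq[2, 8] = -g3  # x1+x3+x5+x7 = g3, scaled by t
        A_eq[3, :8] = 1;           A_eq[3, 8] = -1   # x1 + ... + x8 = 1, scaled by t
        A_eq[4, [0, 1]] = 1                          # normalization y1 + y2 = 1
        b_eq = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
        c = np.zeros(9); c[0] = 1.0                  # objective y1 = P(G|E ∧ F)
        low = linprog(c, A_eq=A_eq, b_eq=b_eq, method="highs")
        high = linprog(-c, A_eq=A_eq, b_eq=b_eq, method="highs")
        return low.fun, -high.fun

    print(predictive_bounds(0.9, 0.9, 0.9))  # approx (0.875, 1.0)

For γ1 = γ2 = γ3 = 0.9 the program returns the interval [0.875, 1], in agreement with the closed-form solution above.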
3. Probability Intervals for Generalized Inference Forms
In this section, we collect results for probabilistic versions of important generalized inference forms [6,14]. We analyze these inference forms with respect
to degradation. If some of the conditioning events have zero probability,
we often obtain different intervals for the coherence and the Kolmogorov
approach. In the coherence approach a proper treatment of this case is possible, so that the probability of the conclusion is always a closed interval. In
the Kolmogorov approach, we obtain in many cases half-open, open, or no
intervals at all.
For the remainder of the paper, we suppose that P is a coherent conditional probability.
3.1. And Rule
Theorem 6. If P(Ei|H) = αi, for i = 1, ..., n, then

P(E1 ∧ ··· ∧ En | H) ∈ [max{0, Σ_{i=1}^n αi − (n − 1)}, min{αi}].
The lower bound of P(E1 ∧ ··· ∧ En+1 | H) is less than or equal to that of P(E1 ∧ ··· ∧ En | H). Equality holds for lower bounds greater than zero if and only if P(En+1|H) = αn+1 = 1. Moreover, if n ≥ Σ_{i=1}^n αi + 1, then the lower bound of P(E1 ∧ ··· ∧ En | H) is 0. We shall soon see that these properties of the conjunction cause the degradation of many other inferences.
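A few lines of Python (ours) make Theorem 6 concrete and exhibit the degradation of the conjunction; with αi = 0.8 the lower bound reaches zero at n = 5, in agreement with Figure 1.

    # Interval of Theorem 6 for P(E1 ∧ ... ∧ En | H).
    def and_rule_interval(alphas):
        lower = max(0.0, sum(alphas) - (len(alphas) - 1))
        upper = min(alphas)
        return lower, upper

    for n in range(1, 7):
        print(n, and_rule_interval([0.8] * n))
    # lower bounds (up to rounding): 0.8, 0.6, 0.4, 0.2, 0.0, 0.0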
3.2. Cautious Monotonicity
The generalized version of the System P rule Cautious Monotonicity is given by
Theorem 7. (Gilio [6]) If P(Ei|E0) = αi, for i = 1, ..., n + 1, then P(En+1|E0 ∧ E1 ∧ ··· ∧ En) ∈ [γ′, γ′′], with

γ′ = max{0, (Σ_{i=1}^{n+1} αi − n) / (Σ_{i=1}^n αi − (n − 1))}   if Σ_{i=1}^n αi − (n − 1) > 0,
γ′ = 0                                                           if Σ_{i=1}^n αi − (n − 1) ≤ 0,

and

γ′′ = min{1, αn+1 / (Σ_{i=1}^n αi − (n − 1))}   if Σ_{i=1}^n αi − (n − 1) > 0,
γ′′ = 1                                         if Σ_{i=1}^n αi − (n − 1) ≤ 0.
Remark 8. Suppose that αn+1 < 1. If P is a Kolmogorov probability and Σ_{i=1}^n αi − (n − 1) ≤ 0, then the upper bound 1 cannot be attained. P(En+1|E0 ∧ E1 ∧ ··· ∧ En) = 1 if and only if P(E0 ∧ E1 ∧ ··· ∧ En) = P(E0 ∧ E1 ∧ ··· ∧ En ∧ En+1). This requires that P(E0 ∧ E1 ∧ ··· ∧ En) = 0 and hence that P(En+1|E0 ∧ E1 ∧ ··· ∧ En) is undefined.
Cautious Monotonicity degrades. As the number of premises increases, the width of the interval of the conclusion increases. Furthermore, if n ≥ Σ_{i=1}^n αi + 1, then P(En+1|E0 ∧ E1 ∧ ··· ∧ En) ∈ [0, 1].
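The following Python sketch (our formulation of Theorem 7) computes the interval and shows how quickly it collapses to [0, 1].

    # Bounds of Theorem 7; alphas = [alpha_1, ..., alpha_n, alpha_{n+1}].
    def cautious_monotonicity_interval(alphas):
        *first_n, a_last = alphas
        n = len(first_n)
        denom = sum(first_n) - (n - 1)  # lower bound of P(E1 ∧ ... ∧ En | E0)
        if denom <= 0:
            return 0.0, 1.0             # the non-informative unit interval
        return max(0.0, (sum(alphas) - n) / denom), min(1.0, a_last / denom)

    print(cautious_monotonicity_interval([0.9, 0.9, 0.9]))    # approx (0.875, 1.0)
    print(cautious_monotonicity_interval([0.8] * 5 + [0.9]))  # (0.0, 1.0): degraded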
3.3. Cut
The generalized version of the System P rule Cut is given by the following theorem. The interval of the conclusion strongly depends on the lower bound σn for the conjunction probability P(E0 ∧ E1 ∧ ··· ∧ En | E0).
Theorem 9. (Gilio [6]) If P(Ei|E0) = αi, for i = 1, ..., n, and P(H|E0 ∧ E1 ∧ ··· ∧ En) = β, then

P(H|E0) ∈ [βσn, βσn + 1 − σn], with σn = max{0, Σ_{i=1}^n αi − (n − 1)}.
Remark 10. If P is a Kolmogorov probability, then the bounds are the same for Σ_{i=1}^n αi − (n − 1) > 0. However, if Σ_{i=1}^n αi − (n − 1) ≤ 0 and 0 < β < 1, then the interval for P(H|E0) is the open interval (0, 1). The value 0 (resp. 1) would require that P(E1 ∧ ··· ∧ En | E0) = 0, as the following equation shows:

P(H|E0) = P(H|E0 ∧ E1 ∧ ··· ∧ En) P(E1 ∧ ··· ∧ En | E0) + P(H|¬(E1 ∧ ··· ∧ En) ∧ E0) P(¬(E1 ∧ ··· ∧ En) | E0).

Therefore, P(E0 ∧ E1 ∧ ··· ∧ En) = P(E1 ∧ ··· ∧ En | E0) P(E0) = 0 and hence P(H|E0 ∧ E1 ∧ ··· ∧ En) is undefined.
Cut degrades. The width of the interval for P(H|E0) increases as the number of premises increases. This follows from the facts that its width is 1 − σn and that σn is monotonically decreasing. Since σn is zero if n ≥ Σ_{i=1}^n αi + 1, the interval for P(H|E0) is the unit interval if the number of premises is sufficiently high.
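Numerically (our sketch of Theorem 9, with illustrative values):

    # Interval of Theorem 9: [beta * sigma_n, beta * sigma_n + 1 - sigma_n].
    def cut_interval(alphas, beta):
        sigma = max(0.0, sum(alphas) - (len(alphas) - 1))
        return beta * sigma, beta * sigma + 1 - sigma

    for n in (1, 3, 5, 10):
        print(n, cut_interval([0.9] * n, 0.8))
    # the width 1 - sigma_n grows with n; at n = 10 (= 0.9n + 1) the
    # interval is the whole unit interval [0, 1]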
3.4. Bayes' Theorem
Suppose that the prior probability of a hypothesis, P(H) = δ, and the likelihoods of the data given the hypothesis H and given the alternative hypothesis ¬H, P(D|H) = β and P(D|¬H) = γ, are given. If P(D) > 0, the posterior probability of the hypothesis H given the data D is obtained by Bayes' Theorem:

P(H|D) = βδ / (βδ + γ(1 − δ)).

The premises of the generalized Bayes' Theorem are P(H) = δ, P(E1|H) = β1, ..., P(En|H) = βn, P(E1|¬H) = γ1, ..., P(En|¬H) = γn. In inferential statistics it is often assumed that the Ei are independent and identically distributed. We require neither conditional independence of the Ei given H nor that P(Ei|H) = P(Ej|H) for i ≠ j. The conclusion of the generalized Bayes' Theorem is P(H|E1 ∧ ··· ∧ En).
Theorem 11. (Wallmann & Kleiter [14], lower bound) Let P(H) = δ and, for all i = 1, ..., n, let P(Ei|H) = βi and P(Ei|¬H) = γi. Then:

• If δ(Σ_{i=1}^n βi − (n − 1)) > 0, then

P(H|E1 ∧ ··· ∧ En) ≥ δ(Σ_{i=1}^n βi − (n − 1)) / (δ(Σ_{i=1}^n βi − (n − 1)) + (1 − δ) min{γi}).

• If δ(Σ_{i=1}^n βi − (n − 1)) ≤ 0, then P(H|E1 ∧ ··· ∧ En) ≥ 0.
Theorem 12. (Wallmann & Kleiter [14], upper bound) Let P(H) = δ and, for all i = 1, ..., n, let P(Ei|H) = βi and P(Ei|¬H) = γi. Then:

• If (1 − δ)(Σ_{i=1}^n γi − (n − 1)) > 0, then

P(H|E1 ∧ ··· ∧ En) ≤ δ min{βi} / (δ min{βi} + (1 − δ)(Σ_{i=1}^n γi − (n − 1))).

• If (1 − δ)(Σ_{i=1}^n γi − (n − 1)) ≤ 0, then P(H|E1 ∧ ··· ∧ En) ≤ 1.
Bayes' Theorem does not degrade (for a counter-example see [14]). However, if n ≥ max{Σ_{i=1}^n βi + 1, Σ_{i=1}^n γi + 1}, then Σ_{i=1}^n βi − (n − 1) ≤ 0 and Σ_{i=1}^n γi − (n − 1) ≤ 0, so that the interval [0, 1] is obtained. There are two special cases in which Bayes' Theorem degrades: first, if identical likelihoods βi = β and γi = γ are assumed; second, if the values γi ∈ [0, 1] are not specified.
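The two theorems translate directly into a small Python sketch (ours); the run below uses identical likelihoods, one of the special cases in which Bayes' Theorem degrades.

    # Posterior bounds of Theorems 11 (lower) and 12 (upper).
    def bayes_bounds(delta, betas, gammas):
        n = len(betas)
        low_num = delta * (sum(betas) - (n - 1))
        lower = low_num / (low_num + (1 - delta) * min(gammas)) if low_num > 0 else 0.0
        up_den = (1 - delta) * (sum(gammas) - (n - 1))
        upper = delta * min(betas) / (delta * min(betas) + up_den) if up_den > 0 else 1.0
        return lower, upper

    for n in (1, 2, 4):
        print(n, bayes_bounds(0.5, [0.9] * n, [0.2] * n))
    # n = 1: a point value, approx 0.818; n >= 2: the upper bound is already 1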
3.5. Modus Tollens
Modus Tollens is the inference from {¬B, A ⇒ B} to the conclusion ¬A.
The result for probabilistic Modus Tollens with two premises within the
Kolmogorov approach has been derived in [13]. Generalized Modus Tollens
is given by
Theorem 13. (Wallmann & Kleiter [14]) If P(¬Ei) = αi, for i = 1, ..., n, and if P(E1 ∧ E2 ∧ ··· ∧ En|H) = β, then P(¬H) ∈ [δ′, 1], with

δ′ = 1 − (1 − α∗)/β                  if α∗ + β > 1,
δ′ = 1 − (Σ_{i=1}^n αi)/(1 − β)      if α∗ + β ≤ 1 and Σ_{i=1}^n αi + β < 1,
δ′ = 0                               if α∗ + β ≤ 1 and Σ_{i=1}^n αi + β ≥ 1,

where α∗ = max{αi}.
Remark 14. If P is a Kolmogorov probability, then the upper bound 1 is never correct: if P(¬H) = 1, then P(H) = 0 and consequently P(E1 ∧ E2 ∧ ··· ∧ En|H) is undefined. Within the coherence approach an assessment of probability 1 for both premises of Modus Tollens is perfectly admissible and leads to probability 1 of the conclusion. A Kolmogorov probability such that P(¬B) = 1 and P(B|A) = 1, however, does not exist: if P(¬B) = 1, then P(A) = 0, and hence P(B|A) is undefined, a contradiction.
Modus Tollens is special because if α∗ + β > 1, then the interval of
its conclusion does not depend on the number of premises n. However, if
α∗ +β ≤ 1, then it does depend on n. Modus Tollens does not degrade. Moreover, contrary to the other inferences considered so far, the unit interval is
not necessarily obtained if the number of premises is large.
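The lower bound of Theorem 13 is easily evaluated (our sketch, with illustrative values); the first call shows the case α∗ + β > 1, in which the bound is independent of n.

    # Lower bound delta' of Theorem 13; the upper bound is always 1.
    def modus_tollens_lower(alphas, beta):
        a_star = max(alphas)
        if a_star + beta > 1:
            return 1 - (1 - a_star) / beta  # does not depend on n
        if sum(alphas) + beta < 1:
            return 1 - sum(alphas) / (1 - beta)
        return 0.0

    print(modus_tollens_lower([0.9] * 4, 0.8))   # 0.875, unchanged as n grows
    print(modus_tollens_lower([0.02] * 4, 0.5))  # 0.84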
3.5.1. Exclusive-Or

System O is weaker than System P [8,9]. It contains weaker forms of the rules And and Or: Weak-And (Wand) and Weak-Or (Wor). Wor is System O equivalent to the rule Exclusive-Or (Xor). Exclusive-Or is the following rule:

If A ⇒ C, B ⇒ C, and |= ¬(A ∧ B), then A ∨ B ⇒ C.
The generalized probabilistic version is given by
Theorem 15. If P (H|Ei ) = αi , for i = 1, . . . , n and |= ¬(Ei ∧ Ej ), for
1 ≤ i < j ≤ n, then
P (H|E1 ∨ · · · ∨ En ) ∈ [min{αi }, max{αi }].
Proof. If |= ¬(Ei ∧ Ej), for 1 ≤ i < j ≤ n, then

P(H|E1 ∨ ··· ∨ En) = P(H ∧ E1|E1 ∨ ··· ∨ En) + ··· + P(H ∧ En|E1 ∨ ··· ∨ En)
= P(H|E1)P(E1|E1 ∨ ··· ∨ En) + ··· + P(H|En)P(En|E1 ∨ ··· ∨ En).

Setting P(Ei|E1 ∨ ··· ∨ En) = 1 for the i with αi = min{αi} (resp. αi = max{αi}) yields the lower (resp. upper) probability for the conclusion.
Remark 16. In the Kolmogorov approach, if αi ≠ αj for some i, j, we obtain the open interval (min{αi}, max{αi}). In this case we cannot set P(Ei|E1 ∨ ··· ∨ En) = 1 for any i, because then P(Ej) = 0 for all j ≠ i and consequently P(H|Ej) is undefined.
Xor does degrade. However, the interval [0, 1] is not necessarily obtained
after addition of a certain number of premises.
4. Probabilistic Validity of Generalized Inference Forms
The key question of this section is whether a certain generalized inference form satisfies one of the probability preservation properties below.
The question can be answered by considering the lower bound of the intervals obtained in Section 3. The Kolmogorov approach and the coherence
approach often yield different lower bounds. As a consequence, an inference
form may satisfy a preservation property relative to one of the approaches
while it does not satisfy it with respect to the other approach.
4.1. Preservation Properties
Adams considered four preservation properties [1, p. 1].
1. Certainty-preservation: If the premises of an inference form have probability 1, then its conclusion has probability 1.
2. High probability-preservation: If the premises are highly probable, then the conclusion is highly probable, i.e., for every δ > 0 there exists an ǫ > 0 such that if P(A) ≥ 1 − ǫ for every premise A, then P(C) ≥ δ for the conclusion C.
3. Positive probability-preservation: If the premises have positive probability, then the conclusion has positive probability.
4. Minimum probability-preservation: The probability of the conclusion is at least as high as the minimum of the probabilities of the premises. Equivalently, for every threshold r: if the probability of each premise is at least r, then the probability of the conclusion is at least r.
The preservation properties above are ordered by strictness: the chain of implications 4 ⇒ 3 ⇒ 2 ⇒ 1 holds, but none of the converse implications is true.
Consider for example Modus Ponens. If P (B|A) = β and P (A) = α, then
the interval for P (B) is [αβ, αβ + 1 − α]. Modus Ponens {A ⇒ B, A} ∴ B
is consequently
1. Certainty preserving: If α = 1 and β = 1, then αβ = 1.
2. High probability preserving: For P(B) ≥ δ, choose 1 − ǫ = √δ.
3. Positive probability preserving: If α > 0 and β > 0, then αβ > 0.
4. Not minimum preserving: In general, it is not the case that αβ ≥
min{α, β}.
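These four observations can be checked numerically; the following lines (ours, with arbitrary illustrative values) evaluate the lower bound αβ of P(B).

    alpha = beta = 0.8
    print(alpha * beta)       # 0.64 < min(alpha, beta) = 0.8: not minimum preserving
    print(alpha * beta > 0)   # True: positive probability preserving
    delta = 0.9
    premise = delta ** 0.5    # choose 1 - eps = sqrt(delta), approx 0.949
    print(premise * premise)  # approx 0.9 = delta: high probability is preserved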
4.2. Certainty-Preservation and High Probability-Preservation
It is important that, while in the Kolmogorov approach certainty-preservation and high probability-preservation differ, they are equivalent in the coherence approach, given the assumption of p-consistent premises [5,7]. We call a set of premises {A1, ..., An} p-consistent iff the assessment P(A1) = ··· = P(An) = 1 is coherent.
Theorem 17. Suppose that {A1, ..., An} is p-consistent. Then, in the coherence approach, the inference from {A1, ..., An} to C is certainty preserving iff it is high probability preserving.
Remark 18. In the coherence approach an inference form has been called
System P valid iff its premises are p-consistent and it is high probability
preserving [5]. Contrary to other approaches, System P validity therefore
requires p-consistent premises. We mention three such approaches. Adams
[1] works with the default assumption: If P (A) = 0, then P (B|A) = 1 for all
B. Hawthorne uses Popper functions. With respect to Popper functions certainty and high probability-preservation are, even without the assumption
of p-consistent premises, equivalent [8]. Hawthorne and Makinson [9] employ
Kolmogorov probability functions. In Section 4.4, we discuss the inference
form Weak-And. It is System P valid with respect to these approaches, but
not with respect to the coherence approach.
The inference from B to A ⇒ B, for example, is certainty preserving relative to the Kolmogorov approach but not high probability preserving. In the coherence approach this inference form is not high probability preserving and therefore, because {B} is p-consistent, not certainty preserving.
The inference forms of Section 3 are certainty preserving relative to the
coherence approach. This is immediately obtained by considering the lower
bound of their conclusion. Consequently, if their premises are p-consistent,
these inference forms are already known to be high probability preserving
in the coherence approach.
To show that an inference form is high probability preserving with respect
to the coherence approach, we can alternatively determine for every probability of the conclusion δ a “high” probability 1−ǫ for the premises, such that
this probability assessment guarantees that the probability of the conclusion
is at least δ. A suitable ǫ can be determined by considering the intervals given
in Section 3. Consider, for example, Modus Tollens. Let δ > 0. In order that P(C) ≥ δ, the lower bound of P(C), i.e., 1 − (1 − α∗)/β, may not be less than δ. Therefore, we solve

1 − (1 − (1 − ǫ))/(1 − ǫ) ≥ δ

for ǫ and obtain 1 − ǫ ≥ 1/(2 − δ). A suitable ǫ for the other inference forms can be determined by the same method. We have
Theorem 19. Let P be a coherent conditional probability. All inference
forms of Section 3 with p-consistent premises are certainty preserving and
(consequently) high probability preserving.
Remark 20. Although generalized inference forms remain high probability preserving, degradation has a striking consequence. Since the lower probability of the conclusion decreases as the number of premises increases, a suitable ǫ decreases with increasing n: to guarantee a high probability of the conclusion, an ever higher probability of the premises is necessary as n grows.
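A small numerical illustration (ours) of this effect for the generalized Modus Ponens, whose lower bound with equal premise probabilities p = 1 − ǫ is p · max{0, np − (n − 1)} by Theorem 9 with E0 = ⊤ (the generalized Modus Ponens of Table 1): a grid search finds the smallest p that still guarantees a conclusion probability of at least δ.

    # Smallest premise probability p = 1 - eps that guarantees
    # P(conclusion) >= delta for the generalized Modus Ponens.
    def required_premise_probability(n, delta, steps=10**5):
        for k in range(steps + 1):
            p = k / steps
            if p * max(0.0, n * p - (n - 1)) >= delta:
                return p
        return None

    for n in (1, 2, 4, 8):
        print(n, required_premise_probability(n, 0.9))
    # approx 0.949, 0.966, 0.980, 0.989: ever higher premise probabilities
    # are needed as n increases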
4.3. Positive Probability-Preservation
For n ≥ 2 premises high probability preservation differs significantly from
positive probability preservation. While all of the generalized inference forms
considered in Section 3 are high probability preserving, none of them—with
the exception of Xor—is positive probability preserving. As already pointed
out, in the case of And, Cautious Monotonicity, Cut, and Bayes’ Theorem
the lower bound of the conclusion is zero if the number of premises n is
sufficiently high.
Moreover, in contrast to their generalizations, some of the inference forms
are positive probability preserving. Cut and Bayes’ Theorem are positive
probability preserving for n = 1. If the sum of the probabilities of the two
premises is different from one, then Modus Tollens is also positive probability preserving.
Theorem 21. If P is a coherent conditional probability, then the generalizations of And, Cautious Monotonicity, Cut, Modus Tollens and Bayes’
Theorem are not positive probability preserving. Generalized Xor is positive
probability preserving.
The Kolmogorov and the coherence approach validate different inference
forms. In the Kolmogorov approach generalized Cut, for instance, is positive
probability preserving, while this is not the case in the coherence approach.
4.4. Minimum Probability-Preservation
Positive probability preservation and minimum probability preservation differ. Contrary to positive probability preservation, Cut with two premises is,
for example, not minimum probability preserving. System O is closely connected with minimum probability preservation (for a description of System
O see [8,9]). All its inference forms are minimum probability preserving.
The converse, however, is not true.
The generalization of Xor is minimum probability preserving.
Theorem 22. Let P be a coherent or a Kolmogorov conditional probability.
Then generalized Xor is minimum probability preserving.
The System O rule Weak-And (Wand) is given by
If A ∧ ¬B ⇒ B and A ⇒ C,
then A ⇒ B ∧ C.
Wand is central to System O and minimum preserving in the Kolmogorov approach. However, a positive probability assessment to A ∧ ¬B ⇒ B
is incoherent. Hence, from the point of view of coherence, System O is not
satisfactory.
Theorem 23. Every probability assessment P with P(B|A ∧ ¬B) > 0 is incoherent. In particular, {A ∧ ¬B ⇒ B, A ⇒ C} is not p-consistent.
Proof. If P is a coherent probability assessment, then P (B|A ∧ ¬B) +
P (¬B|A ∧ ¬B) = 1. Since P (¬B|A ∧ ¬B) = 1, P (B|A ∧ ¬B) = 0.
Remark 24. The premises of Wand are not p-consistent. As a consequence
Wand is not System P valid in the coherence approach (compare Remark
18). In other approaches p-consistency of the premises is not required for
System P validity (compare Remark 18). Consequently, since And is System P valid, and Wand is a special case of And, Wand is System P valid
in these approaches.
If P (B|A ∧ ¬B) > 0, we can conclude in the Kolmogorov framework that
P (A ∧ ¬B) = 0 and hence that P (B|A ∧ ¬B) is undefined. Hence, there is
no Kolmogorov probability such that P (B|A ∧ ¬B) > 0.
5. Conclusions
We have seen that Cautious Monotonicity, Cut, and Exclusive-Or clearly
degrade, and that Bayes’ Theorem (with some exceptions) and Modus
Tollens do not degrade. Moreover, in all the inference forms considered—
with the exception of Modus Tollens and Exclusive-Or—the unit interval is
obtained even with a “small” number of premises. Narrow intervals may be
considered to be better than wide intervals; a more complete knowledge base
may be considered to be better than a truncated one [12]. While in general
the number of premises and the precision of the conclusion may conflict, in
generalized inference forms they often must conflict.
Degradation does not conflict with the property of monotonicity, but its
consequences for information seeking cannot be ignored. On the one hand,
the principle of total evidence leads to the selection of the most “recent”
interval based on the most specific information. This yields wide intervals,
and in many cases even the non-informative [0, 1] interval. On the other
hand, a take-the-best strategy leads to the selection of the tightest interval.
The corresponding interval is based on the seemingly most "relevant" information with n = 1. Since all additional premises are discarded, it would be
counterproductive to seek further information, because it would simply be
useless.
Degradation is neither "good" nor "bad". Solving the conflict between precision and specificity requires counterbalancing (i) the width of an interval, (ii) the amount of information it is based upon, and (iii) the position of the interval. The choice depends on pragmatic conditions. An answer to the question of which interval should rationally be selected seems to lie outside the domain of probability theory.
It might be supposed that degradation disappears if further constraints
are added to the premises. In many cases stochastic independence, for
example, leads to point probabilities of the conclusions. Though often presupposed, independence may be a constraint that is too strong. Exchangeability is a related but much weaker assumption. We have shown that in
many generalized inference forms exchangeability does not prevent degradation [15].
In general, degradation does not make generalized inference forms probabilistically invalid. Each of the inference forms considered in this contribution is high probability preserving. As already pointed out, the lower probability of the conclusion is often zero if the number of premises is large. Therefore, none of the inference forms, with the exception of Exclusive-Or, is positive probability preserving.
Acknowledgements. Supported by the Austrian Science Foundation (I 141G15) and the LogICCC Programme of the European Science Foundation.
We thank two anonymous referees for their valuable comments.
Open Access. This article is distributed under the terms of the Creative
Commons Attribution License which permits any use, distribution, and
reproduction in any medium, provided the original author(s) and the source
are credited.
References
[1] Adams, E. W., Four probability-preserving properties of inferences, Journal of Philosophical Logic 25:1–24, 1996.
[2] Biazzo, V., Gilio, A., and G. Sanfilippo, Coherent conditional previsions and
proper scoring rules, in S. Greco, B. Bouchon-Meunier, G. Coletti, M. Fedrizzi, B.
Matarazzo, and R. R. Yager (eds.), Advances in Computational Intelligence, IPMU
(4), vol. 300 of Communications in Computer and Information Science, Springer,
2012, pp. 146–156.
[3] Coletti, G., and R. Scozzafava, Probabilistic Logic in a Coherent Setting, Kluwer,
Dordrecht, 2002.
[4] De Finetti, B., Theory of Probability, Wiley, London, 1974.
[5] Gilio, A., Probabilistic reasoning under coherence in System P, Annals of Mathematics and Artificial Intelligence 34(1–3):5–34, 2002.
[6] Gilio, A., Generalization of inference rules in coherence-based probabilistic default
reasoning, International Journal of Approximate Reasoning 53:413–434, 2012.
[7] Gilio, A., and G. Sanfilippo, Probabilistic entailment in the setting of coherence:
The role of quasi conjunction and inclusion relation, International Journal of Approximate Reasoning 54(4):513–525, 2013.
[8] Hawthorne, J., On the logic of nonmonotonic conditionals and conditional probabilities, Journal of Philosophical Logic 25:185–218, 1996.
[9] Hawthorne, J., and D. Makinson, The quantitative/qualitative watershed for rules
of uncertain inference, Studia Logica 86:247–297, 2007.
[10] Kleiter, G. D., Ockham’s razor in probability logic, in R. Kruse, M. R. Berthold,
C. Moewes, M. A. Gil, P. Grzegorzewski, and O. Hryniewicz (eds.), Synergies of Soft
Computing and Statistics for Intelligent Data Analysis, Advances in Intelligent Systems and Computation, 190, Springer, 2012, pp. 409–417.
Probability Propagation in Generalized Inference Forms
[11] Kraus, S., D. Lehmann, and M. Magidor, Nonmonotonic reasoning, preferential
models and cumulative logics, Artificial Intelligence 44:167–207, 1990.
[12] Kyburg, H., and C. M. Teng, Uncertain Inference, Cambridge University Press,
Cambridge, 2001.
[13] Wagner, C. G., Modus tollens probabilized, British Journal for the Philosophy of
Science 55:747–753, 2004.
[14] Wallmann, C., and G. D. Kleiter, Beware of too much information, in T. Kroupa,
and J. Vejnarova (eds.), Proceedings of the 9th Workshop on Uncertainty Processing,
WUPES, Faculty of Management, University of Economics, Prague, 2012, pp. 214–225.
[15] Wallmann, C., and G. D. Kleiter, Exchangeability in probability logic, in S.
Greco, B. Bouchon-Meunier, G. Coletti, M. Fedrizzi, B. Matarazzo, and R. R. Yager
(eds.), IPMU (4), vol. 300 of Communications in Computer and Information Science,
Springer, 2012, pp. 157–167.
C. Wallmann, G. D. Kleiter
Department of Psychology
University of Salzburg
Hellbrunnerstr. 34
Salzburg, Austria
[email protected]
G. D. Kleiter
[email protected]