Christian Wallmann
Gernot D. Kleiter
Probability Propagation in
Generalized Inference Forms
Abstract.
Probabilistic inference forms lead from point probabilities of the premises
to interval probabilities of the conclusion. The probabilistic version of Modus Ponens, for
example, licenses the inference from P (A) = α and P (B|A) = β to P (B) ∈ [αβ, αβ + 1 − α].
We study generalized inference forms with three or more premises. The generalized Modus
Ponens, for example, leads from P (A1 ) = α1 , . . . , P (An ) = αn and P (B|A1 ∧ · · · ∧ An ) = β
to an according interval for P (B). We present the probability intervals for the conclusions of the generalized versions of Cut, Cautious Monotonicity, Modus Tollens, Bayes’
Theorem, and some System O rules. Recently, Gilio has shown that generalized inference
forms “degrade”—more premises lead to less precise conclusions, i.e., to wider probability
intervals of the conclusion. We also study Adams' probability preservation properties in
generalized inference forms. Special attention is devoted to zero probabilities of the conditioning events. These zero probabilities often lead to different intervals in the coherence
and the Kolmogorov approach.
Keywords: Probability logic, Generalized inference forms, Degradation, Probability
preservation, Coherence.
1. Introduction
While logic studies the propagation of truth values from premises to conclusions, probability logic studies the propagation of probabilities from premises
to conclusions. In probability logic Modus Ponens, for example, has the
form shown on the left hand side of Table 1. On the right hand side the
generalized probabilistic Modus Ponens is shown. In probability logic a conditional A ⇒ B is represented by the “conditional event” B|A. The probabilities of the premises are point probabilities. This assessment is assumed to
be coherent. Usually, the inferred probability of the conclusion is an interval
probability.
Table 1. Probabilistic Modus Ponens with one categorical premise (left)
and generalized probabilistic Modus Ponens with n categorical premises
(right)
    P(E) = α                       P(E1) = α1, ..., P(En) = αn
    P(H|E) = β                     P(H|E1 ∧ ··· ∧ En) = βn
    ------------------             ------------------
    P(H) ∈ [δ′, δ′′]               P(H) ∈ [δn′, δn′′]
Below we study, for different generalized inference forms, the behavior of
the interval [δn′ , δn′′ ] for an increasing number n of premises. We review results
recently obtained for generalized probabilistic inference forms [6,10,14,15].
In these inference forms a degradation is observed. The width of the probability interval of the conclusion increases as the number n of premises increases.
Thus, more premises lead to less precise conclusions. Figure 1 shows a numerical example for the degradation of Modus Ponens. In most inference forms even an "ultimate" degradation occurs: already after the addition of a small number of premises, the interval of the conclusion becomes the non-informative interval [0, 1]. This is a consequence of the fact that already for small n the lower bound of the conjunction probability P(E1 ∧ ··· ∧ En) may be zero.
Because the lower bound of the conjunction probability becomes zero even for a relatively small number of conjuncts, the conditioning events may have zero probabilities. In the Kolmogorov approach, however, conditional probability is undefined in this case. The Kolmogorov approach is therefore not appropriate to investigate generalized inference forms. The case where the conditioning event has zero probability can, however, be treated in the coherence approach of de Finetti [4]. As a consequence, the Kolmogorov and the coherence approach lead to different interval probabilities for the conclusion of generalized inference forms.

Figure 1. Degradation of Modus Ponens: lower (◦) and upper (□) bounds of P(H) (on the y-axis) for an increasing number of categorical premises n (on the x-axis); premise probabilities P(Ei) = 0.8, for i = 1, ..., 8, and P(H|E1 ∧ ··· ∧ En) = 0.5
As already mentioned, in many generalized inference forms the interval
for the conclusion is getting wider as the number of premises increases and
the interval [0, 1] is obtained after a certain number of premises is added.
In probabilistically valid inference forms, however, the probability of the premises is preserved to the conclusion. Are generalized inference forms consequently probabilistically invalid? Different inference forms preserve the probability of their premises to their conclusion to different degrees. Adams distinguished four preservation properties [1]. Each of these preservation properties determines a consequence operation. An inference form is valid with respect to such a consequence operation if and only if it satisfies the corresponding preservation property. We can establish whether
an inference form satisfies a preservation property by considering the lower
probability of its conclusion. Well-known examples are System P [11], which
is associated with probability one-preservation and System O [8,9], which
is closely connected with minimum probability preservation. Modus Ponens,
for instance, is probability one preserving and consequently System P valid.
This can immediately be seen by considering the lower bound of the interval
of the conclusion of Modus Ponens. If P (A) = α = 1 and P (B|A) = β = 1,
then P (B) ≥ αβ = 1. It is important to note that, since they yield different intervals for the conclusion of inference forms, the Kolmogorov and the
coherence approach validate different inference forms.
2. Coherent Conditional Probability
For the treatment of conditioning events with zero probabilities in generalized inference forms, we employ the coherence approach of probability theory
[3,4]. While in the Kolmogorov approach conditional probability is defined
by a ratio of two (unconditional) probabilities, it is a primitive concept in
the coherence approach.
Let L be a Boolean algebra (i.e., L is closed under ¬, ∧, ∨), let |= denote
the classical consequence operation and let ⊤ be the sure event.
Definition 1. A mapping P : L → [0, 1] is a (finitely additive) one-place
probability function iff for all A, B ∈ L
1. P (⊤) = 1,
2. P (A ∨ B) = P (A) + P (B), if |= ¬(A ∧ B).
Definition 2. Let P′ : L → [0, 1] be a one-place probability function and T0 = {B ∈ L : P′(B) ≠ 0}. A mapping P : L × T0 → [0, 1] is the (finitely additive) Kolmogorov conditional probability associated with P′ iff for all A ∈ L and B ∈ T0

P(A|B) = P′(A ∧ B) / P′(B).

For every Kolmogorov conditional probability, if P(B) = P(B|⊤) = P′(B) = 0, then P(A|B) is undefined. The coherence approach does not exclude the case of zero probability of the conditioning event.
Let L be a Boolean algebra, T ⊆ L such that T is closed under disjunction, and T^0 = T \ {A ∈ T : A inconsistent} be a set of conditioning events.

Definition 3. A mapping P : L × T^0 → [0, 1] is a (finitely additive) conditional probability on L × T^0 iff

1. P(H|H) = 1, for every H ∈ T^0,
2. P(·|H) is a (one-place) probability function on L for every H ∈ T^0,
3. P(E ∧ A|H) = P(E|H)P(A|(E ∧ H)), for A, E ∈ L and H, E ∧ H ∈ T^0 (see, e.g., [3]).
Let L′, T′ be arbitrary sets of events.

Definition 4. A mapping P : L′ × T′ → [0, 1] is coherent iff there exist a Boolean algebra L ⊇ L′ and a set T ⊆ L closed under disjunction with T′ ⊆ T^0 such that P can be extended to a conditional probability on L × T^0.
There are several advantages of the coherence approach. First, in the Kolmogorov approach, P(A|B) is defined as the ratio P′(A ∧ B)/P′(B). Knowledge of the probabilities P(A ∧ B) and P(B), however, is not required to assess conditional probabilities in the coherence approach [3]. Second, while in the Kolmogorov approach P(A|B) is undefined if P(B) = 0, in the coherence approach conditioning on (consistent) events with zero probability is possible: P(·|B) is a one-place probability function even if P(B) = 0. As a consequence, for instance, probability one can be updated in the light of events with zero probability, i.e., it is not necessarily the case that P(A|B) = 1 if P(A) = 1 [3].
The interval of coherent probability values for the conclusion of an inference form can be determined by solving sequences of linear systems. This is the content of Theorem 5 below, which provides an alternative characterization of coherence [3, p. 81]. Let P(E1|H1), ..., P(En|Hn) be a probability assessment. If Hi = ⊤, then we write P(Ei) instead of P(Ei|Hi).
Table 2. Constituents C1, ..., C8 and their probabilities x1, ..., x8 for n = 3 events

             C1  C2  C3  C4  C5  C6  C7  C8   Probability
    E         1   1   1   1   0   0   0   0   x1 + x2 + x3 + x4
    F         1   1   0   0   1   1   0   0   x1 + x2 + x5 + x6
    G         1   0   1   0   1   0   1   0   x1 + x3 + x5 + x7
    P(Ci)    x1  x2  x3  x4  x5  x6  x7  x8
A possible outcome or a constituent is a logically consistent conjunction of
the form ±E1 ∧ · · · ∧ ±En ∧ ±H1 ∧ · · · ∧ ±Hn , where ±A ∈ {A, ¬A} for
all events A. If the 2n events are logically independent, then there are 2^{2n} constituents C1, ..., C_{2^{2n}}. The probability of an event E is the sum of the
probabilities of the constituents Cr verifying it, i.e., Cr |= E. Table 2 shows
our notation in the case of three events E, F, G.
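This bookkeeping is easy to mechanize. The following Python sketch (ours, not part of the paper; the numerical values of x1, ..., x8 are arbitrary illustrative choices) enumerates the constituents of Table 2 and recovers the probability of each event as the sum of the probabilities of the constituents verifying it.

    # Constituents for three logically independent events, in the order of
    # Table 2; the assessment x is an arbitrary illustrative choice.
    from itertools import product

    events = ["E", "F", "G"]
    constituents = list(product([1, 0], repeat=3))  # C1 = (1,1,1), ..., C8 = (0,0,0)
    x = [0.30, 0.10, 0.10, 0.10, 0.20, 0.10, 0.05, 0.05]  # x1, ..., x8, summing to 1

    def prob(j):
        # P(event j) = sum of the x_r over all constituents C_r verifying it
        return sum(xr for c, xr in zip(constituents, x) if c[j] == 1)

    for j, name in enumerate(events):
        print(f"P({name}) = {prob(j):.2f}")  # P(E) = 0.60, P(F) = 0.70, P(G) = 0.65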
Theorem 5. (Coletti and Scozzafava [3, p. 81, Theorem 4]) An assessment P(E1|H1), ..., P(En|Hn) is coherent iff there exists a sequence of compatible systems S_α, α = 0, 1, ..., k ≤ n, with unknowns x_r^α ≥ 0:

Σ_{Cr |= Ei ∧ Hi} x_r^α = P(Ei|Hi) · Σ_{Cr |= Hi} x_r^α   (i = 1, ..., n; for α ≥ 1 only those i with Σ_{Cr |= Hi} x_r^{α−1} = 0),

Σ_{Cr |= H_0^α} x_r^α = 1,

where H_0^0 = H1 ∨ ··· ∨ Hn and H_0^α denotes, for α ≥ 1, the union of those Hi for which Σ_{Cr |= Hi} x_r^{α−1} = 0.
Let P be a coherent extension of the assessment P(E1|H1), ..., P(En|Hn). Then any given solution (x_r^α) of the system S_α can be interpreted as a coherent extension of the initial assessment to the family {Cr|H_0^α : Cr |= H_0^α} [2]. To improve readability, we write xi instead of x_i^0 and yi instead of x_i^1.
Example 1. Consider, for example, Predictive Inference. The premises are

P(E) = γ1, P(F) = γ2, P(G) = γ3.

If P(E ∧ F) = x1 + x2 > 0, then we obtain the lower (upper) bound for the predictive probability P(G|E ∧ F) by minimizing (maximizing) the objective function x1/(x1 + x2) in the system S0:

x1 + x2 + x3 + x4 = γ1
x1 + x2 + x5 + x6 = γ2
x1 + x3 + x5 + x7 = γ3
P(G|E ∧ F)(x1 + x2) = x1
x1 + ··· + x8 = 1,   xi ≥ 0.

Solving the linear system shows that

P(G|E ∧ F) ∈ [max{0, (γ1 + γ2 + γ3 − 2)/(γ1 + γ2 − 1)}, min{1, γ3/(γ1 + γ2 − 1)}].

If P(E ∧ F) = x1 + x2 = 0, then H_0^1 in Theorem 5 is E ∧ F. The system S1 is consequently given by

P(G|E ∧ F)(y1 + y2) = y1
y1 + y2 = 1,   y1, y2 ≥ 0.

Solving S1 shows that, if P(E ∧ F) = 0, then P(G|E ∧ F) can attain any value in [0, 1]. Note that in the Kolmogorov approach no corresponding result is obtained, as in this case P(G|E ∧ F) is undefined.
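The bounds of Example 1 can also be checked numerically. The following Python sketch (ours; it assumes SciPy is available and treats only the case P(E ∧ F) = x1 + x2 > 0) linearizes the fractional objective x1/(x1 + x2) by the Charnes-Cooper substitution y = x/(x1 + x2), t = 1/(x1 + x2), so that the objective becomes the linear function y1, and solves the two resulting linear programs.

    import numpy as np
    from scipy.optimize import linprog

    def predictive_bounds(g1, g2, g3):
        # Variables y1, ..., y8, t; linprog keeps all of them nonnegative by default.
        A_eq = np.zeros((5, 9))
        A_eq[0, [0, 1, 2, 3]] = 1; A_eq[0, 8] = -g1  # x1+x2+x3+x4 = g1, scaled by t
        A_eq[1, [0, 1, 4, 5]] = 1; A_eq[1, 8] = -g2  # x1+x2+x5+x6 = g2, scaled by t
        A_eq[2, [0, 2, 4, 6]] = 1; A_eq[2, 8] = -g3  # x1+x3+x5+x7 = g3, scaled by t
        A_eq[3, :8] = 1;           A_eq[3, 8] = -1   # x1 + ... + x8 = 1, scaled by t
        A_eq[4, [0, 1]] = 1                          # normalization y1 + y2 = 1
        b_eq = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
        c = np.zeros(9); c[0] = 1.0                  # objective y1 = P(G|E ∧ F)
        low = linprog(c, A_eq=A_eq, b_eq=b_eq, method="highs")
        high = linprog(-c, A_eq=A_eq, b_eq=b_eq, method="highs")
        return low.fun, -high.fun

    print(predictive_bounds(0.9, 0.9, 0.9))  # approx (0.875, 1.0)

For γ1 = γ2 = γ3 = 0.9 the program returns the interval [0.875, 1], in agreement with the closed-form solution above.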
3. Probability Intervals for Generalized Inference Forms
In this section, we collect results for probabilistic versions of important generalized inference forms [6,14]. We analyze these inference forms with respect
to degradation. If some of the conditioning events have zero probability,
we often obtain different intervals for the coherence and the Kolmogorov
approach. In the coherence approach a proper treatment of this case is possible, so that the probability of the conclusion is always a closed interval. In
the Kolmogorov approach, we obtain in many cases half-open, open, or no
intervals at all.
For the remainder of the paper, we suppose that P is a coherent conditional probability.
3.1. And Rule
Theorem 6. If P(Ei|H) = αi, for i = 1, ..., n, then

P(E1 ∧ ··· ∧ En | H) ∈ [max{0, Σ_{i=1}^n αi − (n − 1)}, min{αi}].
The lower bound of P(E1 ∧ ··· ∧ En+1 | H) is less than or equal to that of P(E1 ∧ ··· ∧ En | H). Equality holds for lower bounds greater than zero if and only if P(En+1|H) = αn+1 = 1. Moreover, if n ≥ Σ_{i=1}^n αi + 1, then the lower bound of P(E1 ∧ ··· ∧ En | H) is 0. We shall soon see that these properties of the conjunction cause the degradation of many other inferences.
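A few lines of Python (ours) make Theorem 6 concrete and exhibit the degradation of the conjunction; with αi = 0.8 the lower bound reaches zero at n = 5, in agreement with Figure 1.

    # Interval of Theorem 6 for P(E1 ∧ ... ∧ En | H).
    def and_rule_interval(alphas):
        lower = max(0.0, sum(alphas) - (len(alphas) - 1))
        upper = min(alphas)
        return lower, upper

    for n in range(1, 7):
        print(n, and_rule_interval([0.8] * n))
    # lower bounds (up to rounding): 0.8, 0.6, 0.4, 0.2, 0.0, 0.0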
3.2. Cautious Monotonicity
The generalized version of the System P rule Cautious Monotonicity is given by
Theorem 7. (Gilio [6]) If P(Ei|E0) = αi, for i = 1, ..., n + 1, then P(En+1|E0 ∧ E1 ∧ ··· ∧ En) ∈ [γ′, γ′′], with

γ′ = max{0, (Σ_{i=1}^{n+1} αi − n) / (Σ_{i=1}^n αi − (n − 1))}   if Σ_{i=1}^n αi − (n − 1) > 0,
γ′ = 0                                                           if Σ_{i=1}^n αi − (n − 1) ≤ 0,

and

γ′′ = min{1, αn+1 / (Σ_{i=1}^n αi − (n − 1))}   if Σ_{i=1}^n αi − (n − 1) > 0,
γ′′ = 1                                         if Σ_{i=1}^n αi − (n − 1) ≤ 0.
Remark 8. Suppose that αn+1 < 1. If P is a Kolmogorov probability and Σ_{i=1}^n αi − (n − 1) ≤ 0, then the upper bound 1 cannot be attained. P(En+1|E0 ∧ E1 ∧ ··· ∧ En) = 1 if and only if P(E0 ∧ E1 ∧ ··· ∧ En) = P(E0 ∧ E1 ∧ ··· ∧ En ∧ En+1). This requires that P(E0 ∧ E1 ∧ ··· ∧ En) = 0 and hence that P(En+1|E0 ∧ E1 ∧ ··· ∧ En) is undefined.
Cautious Monotonicity degrades. As the number of premises increases, the width of the interval of the conclusion increases. Furthermore, if n ≥ Σ_{i=1}^n αi + 1, then P(En+1|E0 ∧ E1 ∧ ··· ∧ En) ∈ [0, 1].
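The following Python sketch (our formulation of Theorem 7) computes the interval and shows how quickly it collapses to [0, 1].

    # Bounds of Theorem 7; alphas = [alpha_1, ..., alpha_n, alpha_{n+1}].
    def cautious_monotonicity_interval(alphas):
        *first_n, a_last = alphas
        n = len(first_n)
        denom = sum(first_n) - (n - 1)  # lower bound of P(E1 ∧ ... ∧ En | E0)
        if denom <= 0:
            return 0.0, 1.0             # the non-informative unit interval
        return max(0.0, (sum(alphas) - n) / denom), min(1.0, a_last / denom)

    print(cautious_monotonicity_interval([0.9, 0.9, 0.9]))    # approx (0.875, 1.0)
    print(cautious_monotonicity_interval([0.8] * 5 + [0.9]))  # (0.0, 1.0): degraded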
3.3. Cut
The generalized version of the System P rule Cut is given by the following theorem. The interval of the conclusion strongly depends on the lower bound σn for the conjunction probability P(E0 ∧ E1 ∧ ··· ∧ En | E0).
Theorem 9. (Gilio [6]) If P(Ei|E0) = αi, for i = 1, ..., n, and P(H|E0 ∧ E1 ∧ ··· ∧ En) = β, then

P(H|E0) ∈ [βσn, βσn + 1 − σn], with σn = max{0, Σ_{i=1}^n αi − (n − 1)}.
Remark 10. If P is a Kolmogorov probability, then the bounds are the same for Σ_{i=1}^n αi − (n − 1) > 0. However, if Σ_{i=1}^n αi − (n − 1) ≤ 0 and 0 < β < 1, then the interval for P(H|E0) is the open interval (0, 1). The value 0 (resp. 1) would require that P(E1 ∧ ··· ∧ En | E0) = 0, as the following equation shows:

P(H|E0) = P(H|E0 ∧ E1 ∧ ··· ∧ En) P(E1 ∧ ··· ∧ En | E0) + P(H|¬(E1 ∧ ··· ∧ En) ∧ E0) P(¬(E1 ∧ ··· ∧ En) | E0).

Therefore, P(E0 ∧ E1 ∧ ··· ∧ En) = P(E1 ∧ ··· ∧ En | E0) P(E0) = 0 and hence P(H|E0 ∧ E1 ∧ ··· ∧ En) is undefined.
Cut degrades. The width of the interval for P(H|E0) increases as the number of premises increases. This follows from the facts that its width is 1 − σn and that σn is monotonically decreasing. Since σn is zero if n ≥ Σ_{i=1}^n αi + 1, the interval for P(H|E0) is the unit interval if the number of premises is sufficiently high.
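Numerically (our sketch of Theorem 9, with illustrative values):

    # Interval of Theorem 9: [beta * sigma_n, beta * sigma_n + 1 - sigma_n].
    def cut_interval(alphas, beta):
        sigma = max(0.0, sum(alphas) - (len(alphas) - 1))
        return beta * sigma, beta * sigma + 1 - sigma

    for n in (1, 3, 5, 10):
        print(n, cut_interval([0.9] * n, 0.8))
    # the width 1 - sigma_n grows with n; at n = 10 (= 0.9n + 1) the
    # interval is the whole unit interval [0, 1]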
3.4. Bayes' Theorem
Suppose that the prior probability of a hypothesis, P(H) = δ, and the likelihoods of the data given the hypothesis H and given the alternative hypothesis ¬H, P(D|H) = β and P(D|¬H) = γ, are given. If P(D) > 0, the posterior probability of the hypothesis H given the data D is obtained by Bayes' Theorem:

P(H|D) = βδ / (βδ + γ(1 − δ)).

The premises of the generalized Bayes' Theorem are P(H) = δ, P(E1|H) = β1, ..., P(En|H) = βn, P(E1|¬H) = γ1, ..., P(En|¬H) = γn. In inferential statistics it is often assumed that the Ei are independent and identically distributed. We require neither conditional independence of the Ei given H nor that P(Ei|H) = P(Ej|H) for i ≠ j. The conclusion of the generalized Bayes' Theorem is P(H|E1 ∧ ··· ∧ En).
Theorem 11. (Wallmann & Kleiter [14], lower bound) Let P(H) = δ and, for all i = 1, ..., n, let P(Ei|H) = βi and P(Ei|¬H) = γi. Then:

• If δ(Σ_{i=1}^n βi − (n − 1)) > 0, then

P(H|E1 ∧ ··· ∧ En) ≥ δ(Σ_{i=1}^n βi − (n − 1)) / (δ(Σ_{i=1}^n βi − (n − 1)) + (1 − δ) min{γi}).

• If δ(Σ_{i=1}^n βi − (n − 1)) ≤ 0, then P(H|E1 ∧ ··· ∧ En) ≥ 0.
Theorem 12. (Wallmann & Kleiter [14], upper bound) Let P(H) = δ and, for all i = 1, ..., n, let P(Ei|H) = βi and P(Ei|¬H) = γi. Then:

• If (1 − δ)(Σ_{i=1}^n γi − (n − 1)) > 0, then

P(H|E1 ∧ ··· ∧ En) ≤ δ min{βi} / (δ min{βi} + (1 − δ)(Σ_{i=1}^n γi − (n − 1))).

• If (1 − δ)(Σ_{i=1}^n γi − (n − 1)) ≤ 0, then P(H|E1 ∧ ··· ∧ En) ≤ 1.
Bayes' Theorem does not degrade (for a counter-example see [14]). However, if n ≥ max{Σ_{i=1}^n βi + 1, Σ_{i=1}^n γi + 1}, then Σ_{i=1}^n βi − (n − 1) ≤ 0 and Σ_{i=1}^n γi − (n − 1) ≤ 0, so that the interval [0, 1] is obtained. There are two special cases in which Bayes' Theorem degrades: first, if identical likelihoods βi = β and γi = γ are assumed; second, if the values γi ∈ [0, 1] are not specified.
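The two theorems translate directly into a small Python sketch (ours); the run below uses identical likelihoods, one of the special cases in which Bayes' Theorem degrades.

    # Posterior bounds of Theorems 11 (lower) and 12 (upper).
    def bayes_bounds(delta, betas, gammas):
        n = len(betas)
        low_num = delta * (sum(betas) - (n - 1))
        lower = low_num / (low_num + (1 - delta) * min(gammas)) if low_num > 0 else 0.0
        up_den = (1 - delta) * (sum(gammas) - (n - 1))
        upper = delta * min(betas) / (delta * min(betas) + up_den) if up_den > 0 else 1.0
        return lower, upper

    for n in (1, 2, 4):
        print(n, bayes_bounds(0.5, [0.9] * n, [0.2] * n))
    # n = 1: a point value, approx 0.818; n >= 2: the upper bound is already 1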
3.5. Modus Tollens
Modus Tollens is the inference from {¬B, A ⇒ B} to the conclusion ¬A.
The result for probabilistic Modus Tollens with two premises within the
Kolmogorov approach has been derived in [13]. Generalized Modus Tollens
is given by
Theorem 13. (Wallmann & Kleiter [14]) If P(¬Ei) = αi, for i = 1, ..., n, and if P(E1 ∧ E2 ∧ ··· ∧ En|H) = β, then P(¬H) ∈ [δ′, 1], with

δ′ = 1 − (1 − α∗)/β                  if α∗ + β > 1,
δ′ = 1 − (Σ_{i=1}^n αi)/(1 − β)      if α∗ + β ≤ 1 and Σ_{i=1}^n αi + β < 1,
δ′ = 0                               if α∗ + β ≤ 1 and Σ_{i=1}^n αi + β ≥ 1,

where α∗ = max{αi}.
Remark 14. If P is a Kolmogorov probability, then the upper bound 1 is never correct: if P(¬H) = 1, then P(H) = 0 and consequently P(E1 ∧ E2 ∧ ··· ∧ En|H) is undefined. Within the coherence approach an assessment of probability 1 for both premises of Modus Tollens is perfectly admissible and leads to probability 1 of the conclusion. A Kolmogorov probability such that P(¬B) = 1 and P(B|A) = 1, however, does not exist: if P(¬B) = 1, then P(A) = 0, and hence P(B|A) is undefined, a contradiction.
Modus Tollens is special because if α∗ + β > 1, then the interval of
its conclusion does not depend on the number of premises n. However, if
α∗ +β ≤ 1, then it does depend on n. Modus Tollens does not degrade. Moreover, contrary to the other inferences considered so far, the unit interval is
not necessarily obtained if the number of premises is large.
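The lower bound of Theorem 13 is easily evaluated (our sketch, with illustrative values); the first call shows the case α∗ + β > 1, in which the bound is independent of n.

    # Lower bound delta' of Theorem 13; the upper bound is always 1.
    def modus_tollens_lower(alphas, beta):
        a_star = max(alphas)
        if a_star + beta > 1:
            return 1 - (1 - a_star) / beta  # does not depend on n
        if sum(alphas) + beta < 1:
            return 1 - sum(alphas) / (1 - beta)
        return 0.0

    print(modus_tollens_lower([0.9] * 4, 0.8))   # 0.875, unchanged as n grows
    print(modus_tollens_lower([0.02] * 4, 0.5))  # 0.84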
3.5.1. Exclusive-Or

System O is weaker than System P [8,9]. It contains weaker forms of the rules And and Or: Weak-And (Wand) and Weak-Or (Wor). Wor is System O equivalent to the rule Exclusive-Or (Xor). Exclusive-Or is the following rule:

If A ⇒ C, B ⇒ C, and |= ¬(A ∧ B), then A ∨ B ⇒ C.
The generalized probabilistic version is given by
Theorem 15. If P (H|Ei ) = αi , for i = 1, . . . , n and |= ¬(Ei ∧ Ej ), for
1 ≤ i < j ≤ n, then
P (H|E1 ∨ · · · ∨ En ) ∈ [min{αi }, max{αi }].
Proof. If |= ¬(Ei ∧ Ej), for 1 ≤ i < j ≤ n, then

P(H|E1 ∨ ··· ∨ En) = P(H ∧ E1|E1 ∨ ··· ∨ En) + ··· + P(H ∧ En|E1 ∨ ··· ∨ En)
= P(H|E1)P(E1|E1 ∨ ··· ∨ En) + ··· + P(H|En)P(En|E1 ∨ ··· ∨ En).

Setting P(Ei|E1 ∨ ··· ∨ En) = 1 for the i with αi = min{αi} (resp. αi = max{αi}) yields the lower (resp. upper) probability for the conclusion.
Remark 16. In the Kolmogorov approach, if αi ≠ αj for some i, j, we obtain the open interval (min{αi}, max{αi}). In this case we cannot set P(Ei|E1 ∨ ··· ∨ En) = 1 for any i, because then P(Ej) = 0 for all j ≠ i and consequently P(H|Ej) is undefined.
Xor does degrade. However, the interval [0, 1] is not necessarily obtained
after addition of a certain number of premises.
4. Probabilistic Validity of Generalized Inference Forms
The key question of this section is whether a certain generalized inference form satisfies one of the probability preservation properties below.
The question can be answered by considering the lower bound of the intervals obtained in Section 3. The Kolmogorov approach and the coherence
approach often yield different lower bounds. As a consequence, an inference
form may satisfy a preservation property relative to one of the approaches
while it does not satisfy it with respect to the other approach.
4.1. Preservation Properties
Adams considered four preservation properties [1, p. 1].
1. Certainty-preservation: If the premises of an inference form have probability 1, then its conclusion has probability 1.
2. High probability-preservation: If the premises are highly probable, then the conclusion is highly probable, i.e., for every δ > 0 there exists an ǫ > 0 such that if P(A) ≥ 1 − ǫ for every premise A, then P(C) ≥ δ for the conclusion C.
3. Positive probability-preservation: If the premises have positive probability, then the conclusion has positive probability.
4. Minimum probability-preservation: The probability of the conclusion is at least as high as the minimum of the probabilities of the premises. Equivalently, for every threshold r: if the probability of each premise is at least r, then the probability of the conclusion is at least r.
The preservation properties above are ordered by strictness: the chain of implications 4 ⇒ 3 ⇒ 2 ⇒ 1 holds, but none of the converse implications is true.
Consider for example Modus Ponens. If P (B|A) = β and P (A) = α, then
the interval for P (B) is [αβ, αβ + 1 − α]. Modus Ponens {A ⇒ B, A} ∴ B
is consequently
1. Certainty preserving: If α = 1 and β = 1, then αβ = 1.
2. High probability preserving: For P(B) ≥ δ, choose 1 − ǫ = √δ.
3. Positive probability preserving: If α > 0 and β > 0, then αβ > 0.
4. Not minimum preserving: In general, it is not the case that αβ ≥
min{α, β}.
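These four observations can be checked numerically; the following lines (ours, with arbitrary illustrative values) evaluate the lower bound αβ of P(B).

    alpha = beta = 0.8
    print(alpha * beta)       # 0.64 < min(alpha, beta) = 0.8: not minimum preserving
    print(alpha * beta > 0)   # True: positive probability preserving
    delta = 0.9
    premise = delta ** 0.5    # choose 1 - eps = sqrt(delta), approx 0.949
    print(premise * premise)  # approx 0.9 = delta: high probability is preserved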
4.2. Certainty-Preservation and High Probability-Preservation
It is important that, while in the Kolmogorov approach certainty-preservation and high probability-preservation differ, they are equivalent in the coherence approach, given the assumption of p-consistent premises [5,7]. We call a set of premises {A1, ..., An} p-consistent iff the assessment P(A1) = ··· = P(An) = 1 is coherent.
Theorem 17. Suppose that {A1, ..., An} is p-consistent. Then, in the coherence approach, the inference from {A1, ..., An} to C is certainty preserving iff it is high probability preserving.
Remark 18. In the coherence approach an inference form has been called
System P valid iff its premises are p-consistent and it is high probability
preserving [5]. Contrary to other approaches, System P validity therefore
requires p-consistent premises. We mention three such approaches. Adams
[1] works with the default assumption: If P (A) = 0, then P (B|A) = 1 for all
B. Hawthorne uses Popper functions. With respect to Popper functions certainty and high probability-preservation are, even without the assumption
of p-consistent premises, equivalent [8]. Hawthorne and Makinson [9] employ
Kolmogorov probability functions. In Section 4.4, we discuss the inference
form Weak-And. It is System P valid with respect to these approaches, but
not with respect to the coherence approach.
The inference from B to A ⇒ B, for example, is certainty preserving relative to the Kolmogorov approach but not high probability preserving. In the coherence approach this inference form is not high probability preserving and therefore, because {B} is p-consistent, not certainty preserving.
The inference forms of Section 3 are certainty preserving relative to the
coherence approach. This is immediately obtained by considering the lower
bound of their conclusion. Consequently, if their premises are p-consistent,
these inference forms are already known to be high probability preserving
in the coherence approach.
To show that an inference form is high probability preserving with respect
to the coherence approach, we can alternatively determine for every probability of the conclusion δ a “high” probability 1−ǫ for the premises, such that
this probability assessment guarantees that the probability of the conclusion
is at least δ. A suitable ǫ can be determined by considering the intervals given
in Section 3. Consider, for example, Modus Tollens. Let δ > 0. In order that P(C) ≥ δ, the lower bound of P(C), i.e., 1 − (1 − α∗)/β, may not be less than δ. Therefore, we solve

1 − (1 − (1 − ǫ))/(1 − ǫ) ≥ δ

for ǫ and obtain 1 − ǫ ≥ 1/(2 − δ). A suitable ǫ for the other inference forms can be determined by the same method. We have
Theorem 19. Let P be a coherent conditional probability. All inference
forms of Section 3 with p-consistent premises are certainty preserving and
(consequently) high probability preserving.
Remark 20. Although generalized inference forms remain high probability preserving, degradation has a striking consequence. Since the lower probability of the conclusion decreases as the number of premises increases, a suitable ǫ decreases with increasing n: to guarantee a high probability of the conclusion, an ever higher probability of the premises is necessary as n grows.
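A small numerical illustration (ours) of this effect for the generalized Modus Ponens, whose lower bound with equal premise probabilities p = 1 − ǫ is p · max{0, np − (n − 1)} by Theorem 9 with E0 = ⊤ (the generalized Modus Ponens of Table 1): a grid search finds the smallest p that still guarantees a conclusion probability of at least δ.

    # Smallest premise probability p = 1 - eps that guarantees
    # P(conclusion) >= delta for the generalized Modus Ponens.
    def required_premise_probability(n, delta, steps=10**5):
        for k in range(steps + 1):
            p = k / steps
            if p * max(0.0, n * p - (n - 1)) >= delta:
                return p
        return None

    for n in (1, 2, 4, 8):
        print(n, required_premise_probability(n, 0.9))
    # approx 0.949, 0.966, 0.980, 0.989: ever higher premise probabilities
    # are needed as n increases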
4.3. Positive Probability-Preservation
For n ≥ 2 premises high probability preservation differs significantly from
positive probability preservation. While all of the generalized inference forms
considered in Section 3 are high probability preserving, none of them—with
the exception of Xor—is positive probability preserving. As already pointed
out, in the case of And, Cautious Monotonicity, Cut, and Bayes’ Theorem
the lower bound of the conclusion is zero if the number of premises n is
sufficiently high.
Moreover, in contrast to their generalizations, some of the inference forms
are positive probability preserving. Cut and Bayes’ Theorem are positive
probability preserving for n = 1. If the sum of the probabilities of the two
premises is different from one, then Modus Tollens is also positive probability preserving.
Theorem 21. If P is a coherent conditional probability, then the generalizations of And, Cautious Monotonicity, Cut, Modus Tollens and Bayes’
Theorem are not positive probability preserving. Generalized Xor is positive
probability preserving.
The Kolmogorov and the coherence approach validate different inference
forms. In the Kolmogorov approach generalized Cut, for instance, is positive
probability preserving, while this is not the case in the coherence approach.
4.4. Minimum Probability-Preservation
Positive probability preservation and minimum probability preservation differ. Contrary to positive probability preservation, Cut with two premises is,
for example, not minimum probability preserving. System O is closely connected with minimum probability preservation (for a description of System
O see [8,9]). All its inference forms are minimum probability preserving.
The converse, however, is not true.
The generalization of Xor is minimum probability preserving.
Theorem 22. Let P be a coherent or a Kolmogorov conditional probability.
Then generalized Xor is minimum probability preserving.
The System O rule Weak-And (Wand) is given by
If A ∧ ¬B ⇒ B and A ⇒ C,
then A ⇒ B ∧ C.
Wand is central to System O and minimum preserving in the Kolmogorov approach. However, a positive probability assessment to A ∧ ¬B ⇒ B
is incoherent. Hence, from the point of view of coherence, System O is not
satisfactory.
Theorem 23. Every probability assessment P with P(B|A ∧ ¬B) > 0 is incoherent. In particular, {A ∧ ¬B ⇒ B, A ⇒ C} is not p-consistent.
Proof. If P is a coherent probability assessment, then P (B|A ∧ ¬B) +
P (¬B|A ∧ ¬B) = 1. Since P (¬B|A ∧ ¬B) = 1, P (B|A ∧ ¬B) = 0.
Remark 24. The premises of Wand are not p-consistent. As a consequence
Wand is not System P valid in the coherence approach (compare Remark
18). In other approaches p-consistency of the premises is not required for
System P validity (compare Remark 18). Consequently, since And is System P valid, and Wand is a special case of And, Wand is System P valid
in these approaches.
If P (B|A ∧ ¬B) > 0, we can conclude in the Kolmogorov framework that
P (A ∧ ¬B) = 0 and hence that P (B|A ∧ ¬B) is undefined. Hence, there is
no Kolmogorov probability such that P (B|A ∧ ¬B) > 0.
5. Conclusions
We have seen that Cautious Monotonicity, Cut, and Exclusive-Or clearly
degrade, and that Bayes’ Theorem (with some exceptions) and Modus
Tollens do not degrade. Moreover, in all the inference forms considered—
with the exception of Modus Tollens and Exclusive-Or—the unit interval is
obtained even with a “small” number of premises. Narrow intervals may be
considered to be better than wide intervals; a more complete knowledge base
may be considered to be better than a truncated one [12]. While in general
the number of premises and the precision of the conclusion may conflict, in
generalized inference forms they often must conflict.
Degradation does not conflict with the property of monotonicity, but its
consequences for information seeking cannot be ignored. On the one hand,
the principle of total evidence leads to the selection of the most “recent”
interval based on the most specific information. This yields wide intervals,
and in many cases even the non-informative [0, 1] interval. On the other
hand, a take-the-best strategy leads to the selection of the tightest interval.
The corresponding interval is based on the seemingly most "relevant" information with n = 1. Since all additional premises are discarded, it would be
counterproductive to seek further information, because it would simply be
useless.
Degradation is neither "good" nor "bad". Solving the conflict between precision and specificity requires counterbalancing (i) the width of an interval, (ii) the amount of information it is based upon, and (iii) the position of the interval. The choice depends on pragmatic conditions. An answer to the question of which interval should rationally be selected seems to lie outside the domain of probability theory.
It might be supposed that degradation disappears if further constraints
are added to the premises. In many cases stochastic independence, for
example, leads to point probabilities of the conclusions. Though often presupposed, independence may be a constraint that is too strong. Exchangeability is a related but much weaker assumption. We have shown that in
many generalized inference forms exchangeability does not prevent degradation [15].
In general, degradation does not make generalized inference forms probabilistically invalid. Each of the inference forms considered in this contribution is high probability preserving. As already pointed out, the lower probability of the conclusion is often zero if the number of premises is large. Therefore, none of the inference forms, with the exception of Exclusive-Or, is positive probability preserving.
Acknowledgements. Supported by the Austrian Science Foundation (I 141G15) and the LogICCC Programme of the European Science Foundation.
We thank two anonymous referees for their valuable comments.
Open Access. This article is distributed under the terms of the Creative
Commons Attribution License which permits any use, distribution, and
reproduction in any medium, provided the original author(s) and the source
are credited.
References
[1] Adams, E. W., Four probability-preserving properties of inferences, Journal of Philosophical Logic 25:1–24, 1996.
[2] Biazzo, V., Gilio, A., and G. Sanfilippo, Coherent conditional previsions and
proper scoring rules, in S. Greco, B. Bouchon-Meunier, G. Coletti, M. Fedrizzi, B.
Matarazzo, and R. R. Yager (eds.), Advances in Computational Intelligence, IPMU
(4), vol. 300 of Communications in Computer and Information Science, Springer,
2012, pp. 146–156.
[3] Coletti, G., and R. Scozzafava, Probabilistic Logic in a Coherent Setting, Kluwer,
Dordrecht, 2002.
[4] De Finetti, B., Theory of Probability, Wiley, London, 1974.
[5] Gilio, A., Probabilistic reasoning under coherence in System P, Annals of Mathematics and Artificial Intelligence 34(1–3):5–34, 2002.
[6] Gilio, A., Generalization of inference rules in coherence-based probabilistic default
reasoning, International Journal of Approximate Reasoning 53:413–434, 2012.
[7] Gilio, A., and G. Sanfilippo, Probabilistic entailment in the setting of coherence:
The role of quasi conjunction and inclusion relation, International Journal of Approximate Reasoning 54(4):513–525, 2013.
[8] Hawthorne, J., On the logic of nonmonotonic conditionals and conditional probabilities, Journal of Philosophical Logic 25:185–218, 1996.
[9] Hawthorne, J., and D. Makinson, The quantitative/qualitative watershed for rules
of uncertain inference, Studia Logica 86:247–297, 2007.
[10] Kleiter, G. D., Ockham’s razor in probability logic, in R. Kruse, M. R. Berthold,
C. Moewes, M. A. Gil, P. Grzegorzewski, and O. Hryniewicz (eds.), Synergies of Soft
Computing and Statistics for Intelligent Data Analysis, Advances in Intelligent Systems and Computation, 190, Springer, 2012, pp. 409–417.
Probability Propagation in Generalized Inference Forms
[11] Kraus, S., D. Lehmann, and M. Magidor, Nonmonotonic reasoning, preferential
models and cumulative logics, Artificial Intelligence 44:167–207, 1990.
[12] Kyburg, H., and C. M. Teng, Uncertain Inference, Cambridge University Press,
Cambridge, 2001.
[13] Wagner, C. G., Modus tollens probabilized, British Journal for the Philosophy of
Science 55:747–753, 2004.
[14] Wallmann, C., and G. D. Kleiter, Beware of too much information, in T. Kroupa,
and J. Vejnarova (eds.), Proceedings of the 9th Workshop on Uncertainty Processing,
WUPES, Faculty of Management, University of Economics, Prague, 2012, pp. 214–225.
[15] Wallmann, C., and G. D. Kleiter, Exchangeability in probability logic, in S.
Greco, B. Bouchon-Meunier, G. Coletti, M. Fedrizzi, B. Matarazzo, and R. R. Yager
(eds.), IPMU (4), vol. 300 of Communications in Computer and Information Science,
Springer, 2012, pp. 157–167.
C. Wallmann, G. D. Kleiter
Department of Psychology
University of Salzburg
Hellbrunnerstr. 34
Salzburg, Austria
[email protected]
G. D. Kleiter
[email protected]