12th International Conference on Information Fusion
Seattle, WA, USA, July 6-9, 2009
Non-Classical Markov Logic and Network Analysis
Ralph L. Wojtowicz
Metron, Inc.
1818 Library Street, Suite 600
Reston, VA, U.S.A.
[email protected]
Abstract – First-order languages express properties
of entities and their relationships in rich models of
heterogeneous network phenomena. Markov logic is
a set of techniques for estimating the probabilities of
truth values of such properties. This article generalizes Markov logic in order to allow non-classical sets of
truth values. The new methods directly support uncertainties in both data sources and values. The concepts
and methods of categorical logic give precise guidelines
for selecting sets of truth values based on the form of
a network model. Applications to alias detection, cargo
shipping, insurgency analysis, and other problems are
given. Open problems include complexity analysis and
parallelization of algorithms.
Keywords: network analysis, entity resolution, alias
detection, categorical logic, Markov network.
1 Introduction

Markov logic is a set of techniques for estimating the probabilities of truth values of formulae written in first-order languages [4, 9]. In network analysis applications, the formulae describe properties of and relationships or links among entities. The truth values tell whether an entity has a property or whether a link exists. The networks may involve many different sorts of entities and types of links. Estimates are based on the values specified in training and test data. We refer to the special case involving two truth values as classical Markov logic. Data in this case must assign either 'false' or 'true' to all (closed) formulae. In practical applications, however, we may have limited confidence in some intelligence sources or data values.

In this paper we generalize concepts, constructions, and algorithms of [4] to the case of multiple truth values. The resulting methods directly support uncertainties in both data sources and values. Our approach employs the concepts and techniques of categorical logic [8, 15]. This framework gives precise guidelines for selecting sets of truth values based on the form of a network model and it supplies equations for computing with these values.

Interactions between logic and probability have a rich history dating at least to Boole's 1847 treatise [6, 7, 14]. Markov logic, however, does not assign probabilities to logical formulae. It assigns truth values to formulae and builds a probability space from the set of all such assignments. The joint density satisfies conditional independence conditions expressed by a Markov network (Markov random field) [1].

Bayesian approaches to analysis of uncertain networks are under active research. Some have been successfully used in real systems such as Metron's TerrAlert [5]. Markov logic is appealing, however, because it supports a concise formulation of network models and has performed well on challenge problems [4].

Our objectives in this article are to

1. Generalize the concepts and constructions of Markov logic to the case of multiple truth values
2. Adapt Markov logic algorithms to this case
3. Compute illustrative examples
4. Describe challenges and future work.
To meet Objective 1, we must explain how to populate
the Truth Values box of Figure 1 with non-classical values. This entails reformulating the Network Model box
from its setup in [4] and impacts probability calculations in the Markov Network boxes. Since references on
categorical logic are generally “not addressed to those
who are trying to learn [the subject] for the first time,”
[8] we give a sketch in Section 2. [4] and [9], however,
are fine introductions to Markov logic. In Section 3.2 we
discuss our progress on Objective 2 which involves the
Weight Learning and Inference boxes of Figure 1. To
address Objective 3, we apply Markov logic to a simple
social network (Section 4), geopolitics (Section 5), insurgency (Section 6), detecting aliases (Section 7), and
analysis of cargo shipping (Section 8). Readers eager
for applications should proceed to Section 4.
2 First-order languages

Using domain understanding, one builds a network model and identifies a partially ordered set (poset) of truth values. To describe heterogeneous networks and their properties, we use languages containing expressions in finitary, first-order, many-sorted predicate logic with equality [8].

Figure 1: The structure of Markov logic. Arrows indicate that contents of a source box are needed to define or compute contents of the destination. Shaded boxes are placeholders for quantities supplied in later stages. (Boxes: Domain Understanding; Network Model: sorts, function symbols (constants, maps), relation symbols (links, properties, propositions), sequents; Truth Values: partial order, operators; Parameterized Markov Network Template: nodes, cliques, density, weights; Training Data: constants, truth assignments; Weight Learning; Markov Network Template: nodes, cliques, density, weights; Test Data: constants, truth assignments; Inference; Markov Network: nodes, cliques, density, weights, marginals and conditionals.)

2.1 Syntax

A signature Σ consists of sets of sorts, function symbols, and relation symbols. See Figure 1. In network models, the sorts are the kinds of entities. Function symbols may be constants, which name specific entities, or maps, such as unit conversions or other deterministic assignments. Relation symbols include the kinds of links, kinds of entity properties, and propositions about the network.

Each function and relation symbol has a type, which is a finite list [A1, . . . , An] of sorts. The empty type is denoted []. If f is a function symbol, then f : A1 × · · · × An → B indicates that [A1, . . . , An, B] is its type. It specifies the sorts of entities that f takes as input and produces as output. f is a constant if n = 0, in which case we write f : [] → B. If R is a relation symbol, then R ↣ A1 × · · · × An asserts that [A1, . . . , An] is its type. R is a property if n = 1 and a link if n > 1. In the former case, the type specifies the sort of entity that may have the property. In the latter case, it lists the sorts of entities that may be involved in an R link. For n = 0, R is a proposition and we write R ↣ [].

2.1.1 Terms

The terms over a signature Σ are derived names for entities. Terms, along with their types and free variables, are defined recursively. For a term t, t : B indicates that B is its type. FV(t) is its set of free variables.

We assume a (countably infinite) supply of variables x : B for each sort B. FV(x) = {x} for any variable x. If f : A1 × · · · × An → B is a function symbol and t1 : A1, . . . , tn : An are terms, then f(t1, . . . , tn) : B is a term having ∪_{i=1}^{n} FV(ti) as its set of free variables. If f : [] → B is a constant, then f : B is a term with FV(f) = ∅, in which case we write f rather than f().

A term is closed if it has no free variables. The closed terms include constants c and terms f(c1, . . . , cn) with each ci a constant of the appropriate type for the function symbol f.

2.1.2 Formulae

The formulae over a signature are formal statements about network entities, links, and relationships. Formulae, along with their sets of free and bound variables, are defined recursively using rules (i)–(x) below. If ϕ is a formula, then FV(ϕ) and BV(ϕ) are respectively its sets of free and bound variables.
(i) Relations: If R ↣ A1 × · · · × An is a relation symbol and t1 : A1, . . . , tn : An are terms, then R(t1, . . . , tn) is a formula. Its set of free variables is ∪_{i=1}^{n} FV(ti). It has no bound variables.
(ii) Equality: If t1 and t2 are terms of the same type,
then t1 = t2 is a formula. Its set of free variables
is the union of those of t1 and t2 . It has no bound
variables.
(iii) Truth: ⊤ is a formula. It has neither free nor
bound variables.
(iv) Conjunction: If ϕ and ψ are formulae, then so is ϕ ∧ ψ. Its sets of free and bound variables are formed as unions of those of ϕ and ψ.

(v) Falsity: ⊥ is a formula. It has neither free nor bound variables.

(vi) Disjunction: If ϕ and ψ are formulae, then so is ϕ ∨ ψ. Its sets of free and bound variables are formed as unions of those of ϕ and ψ.

(vii) Implication: If ϕ and ψ are formulae, then so is ϕ ⇒ ψ. Its sets of free and bound variables are formed as unions of those of ϕ and ψ.

(viii) Negation: If ϕ is a formula, then so is ¬ϕ. Its sets of free and bound variables coincide with those of ϕ.
(ix) Existential quantification: If ϕ is a formula, then
(∃x)ϕ is a formula. FV (ϕ) \{x} and BV (ϕ) ∪ {x}
are its sets of free and bound variables respectively.
(x) Universal quantification: If ϕ is a formula, then
(∀x)ϕ is a formula. FV (ϕ) \{x} and BV (ϕ) ∪ {x}
are its sets of free and bound variables respectively.
The atomic formulae are those constructed using (i)
and (ii). Formulae constructed from (i)–(iv) are Horn.
Those formed using (i)–(iv) and (ix) are regular. Those
built from (i)–(vi) and (ix) are coherent. First-order
formulae are those constructed using any of the rules.
A formula is closed if it has no free variables. The closed
formulae include atomic formulae such as R(c1 , . . . , cn )
with each ci a constant of the appropriate type for the
relation symbol R, c = c′ with c and c′ constants of the
same type, and compound expressions built from such
atomic formulae using rules (iv) and (vi)–(x).
A context is a finite list ~x = [x1 : A1 , . . . , xn : An ]
of distinct variables. Its length is n and its type is
[A1 , . . . , An ]. The types of the variables need not be
distinct. [] is the empty context. A context ~x is suitable
for a term t if each free variable of t occurs in ~x. Suitable contexts for formulae are similarly defined. The
canonical context for a term or formula consists of the
distinct free variables that occur, listed in the order of
their appearance. A term-in-context is an expression of
the form ~x.t where t is a term and ~x is a context suitable for t. Formulae-in-context are similarly defined.
The nodes of a network model's Markov network are the closed atomic formulae. To compute them we use the operation of substitution of terms for variables. In particular, we substitute constant terms into relation and equality formulae. In Figure 1, the arrows from the function and relation symbol boxes to the nodes box indicate this fact. If ~x.ϕ is a formula-in-context with variables ~x = [x1 : A1, . . . , xn : An] and ~t = [t1 : A1, . . . , tn : An] is a list of terms of the same type as ~x, then

    ϕ[~t/~x]                                                           (1)

denotes the formula obtained by simultaneously substituting each ti for each free occurrence of xi in ϕ after first changing the names of any bound variables in ϕ, if necessary, so that they are distinct from all the free variables that occur in ~t. If each ti is closed, then ϕ[~t/~x] is closed.
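The recursive definitions above translate directly into code. Below is a minimal Python sketch, not taken from [4] or [9], of terms and atomic formulae with free-variable computation and the substitution operation (1); the class names are illustrative, and capture-avoiding renaming is omitted since we only ever substitute closed terms.

from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Var:
    name: str
    sort: str            # the sort B in x : B

@dataclass(frozen=True)
class Fun:
    symbol: str          # function symbol; a constant when args is empty
    args: Tuple          # argument terms

def free_vars(t):
    # FV(x) = {x}; FV(f(t1, ..., tn)) is the union of the FV(ti).
    if isinstance(t, Var):
        return {t}
    return set().union(*(free_vars(a) for a in t.args)) if t.args else set()

@dataclass(frozen=True)
class Rel:
    symbol: str          # relation symbol R; R(t1, ..., tn) is atomic
    args: Tuple          # terms t1, ..., tn

def substitute(phi: Rel, subst: dict) -> Rel:
    # phi[t/x] for atomic phi: replace each free variable per subst.
    def sub_term(t):
        if isinstance(t, Var):
            return subst.get(t, t)
        return Fun(t.symbol, tuple(sub_term(a) for a in t.args))
    return Rel(phi.symbol, tuple(sub_term(a) for a in phi.args))

# Example: ground Smokes(x) at the constant a of sort person.
x = Var("x", "person")
a = Fun("a", ())
print(substitute(Rel("Smokes", (x,)), {x: a}))   # Smokes(a), now closed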
2.1.3 Network models
A sequent σ over a signature Σ is an expression

    ϕ ⊢~x ψ                                                            (2)
where ϕ and ψ are formulae over Σ and ~x is a
context suitable for both formulae. A sequent is
Horn/regular/coherent/first-order if its formulae are of
the corresponding class. ~x is the canonical context for
σ if it consists of the distinct free variables that occur,
listed in the order of their appearance.
A network model K consists of a signature Σ and a set
of sequents over Σ. Alternative names for this structure
are knowledge base [4] and theory [8]. A network model
is Horn/regular/coherent/first-order if all its sequents
are of the corresponding class. The class of the network
model imposes requirements on the sets of truth values
that are suitable for it. The arrow from the sequents
box to Truth Values in Figure 1 indicates this fact. The
example in Section 5 illustrates why we must define network models using sequents rather than the perhaps more familiar if-then rules ϕ ⇒ ψ.
2.2 Semantics
A semantics of a language is a mapping M that assigns mathematical structures to its sorts, function symbols, and relation symbols and which is extended, by recursive definitions, to terms-in-context and formulae-in-context. In the traditional approach, due to Tarski, the assigned structures are sets.¹ A fundamental insight, first illuminated by Lawvere's 1963 thesis, is the fact that, although a network model imposes requirements on semantics, the class of suitable semantic structures is much richer than mere sets. One may, for example, interpret any network model using directed graphs, fuzzy sets, or finite-state discrete-time dynamical systems.

¹ Functions and relations between sets are themselves sets in traditional axiomatizations of set theory as a singly-typed theory.
2.2.1 Categorical semantics

Although logic plays an essential role in artificial intelligence and machine learning [12, 13], the fundamental insight of the late 20th century "has hitherto not been applied in a meaningful way to [these fields]" [2]. This paper is part of a program to research such applications. We sketch the general framework of categorical semantics but focus on the parts needed for the construction in Section 3.1 and for the algorithms in Section 3.2. See D1 and D4 of [8] for details.
A category consists of objects (e.g., sets, graphs, fuzzy sets, or belief spaces) and structure-preserving morphisms between objects (e.g., functions or graph maps) [10, 16]. To interpret a network model K over a signature Σ in a category C, we first assign an object M(A) of C to each sort A of Σ. We extend M to types [A1, . . . , An] using the product object M(A1) × · · · × M(An) (e.g., Cartesian product of sets or product graph). We assign a terminal object (see Section 2.2.2) to the empty type. We assign a morphism M(f) : M(A1) × · · · × M(An) → M(B) to each function symbol f : A1 × · · · × An → B. To each relation symbol R ↣ A1 × · · · × An we assign a subobject M(R) ↣ M(A1) × · · · × M(An) (e.g., subset or subgraph). We extend M to terms-in-context and formulae-in-context by recursive definitions. The end result is that each term-in-context ~x.t with ~x = [x1 : A1, . . . , xn : An] and t : B is assigned a morphism M(~x.t) : M(A1) × · · · × M(An) → M(B) and each formula-in-context ~x.ϕ is assigned a subobject M(~x.ϕ) ↣ M(A1) × · · · × M(An).
2.2.2 Truth values

A partially ordered set (poset) is a pair (Ω, ≤) with Ω a set and ≤ a binary relation on Ω which is reflexive, transitive, and anti-symmetric [8, 11]. To construct a network model's Markov network in Section 3.1 and describe inference algorithms in Section 3.2, we need only give details about semantics of closed formulae-in-context (see Section 2.1.2). A semantics M maps these to subobjects of M([]) where [] is the empty type (see Section 2.1). The only details we need about this are that

• Subobjects of M([]) form a poset (Ω, ≤)
• Elements of Ω are called truth values
• The existence of various upper and lower bounds in (Ω, ≤) determines the network models for which it is a suitable poset of truth values.
In the classical case of set-valued semantics, the set {0}
is a standard choice for M ([]). It has two subsets, the
empty set and itself, to which we assign the descriptive
names ‘false’ and ‘true.’ For directed graph semantics, the single-arrow graph is a suitable choice for M([]), in which case Ω has three truth values. See Figure 2.
Figure 2: In the case of directed graphs, truth values
are the three subgraphs of the single-arrow graph [10].
They form a linearly ordered poset, false < ω < true,
which is a Heyting lattice but not a Boolean algebra.
As shown in Figure 1, domain understanding and the
network model impact the choice of truth values. In
particular, the class of the network model determines
the class of suitable posets (Ω, ≤). The reason for this
is that network model classes are distinguished by the
logical operations (⊤, ∧, ⊥, ∨, ⇒, ¬, etc.) that
occur in their sequents and (Ω, ≤) must support limit
operations that correspond to these logical operations.
Table 1 gives guidelines. A poset is a meet semi-lattice,
for example, if it has a top and each pair x, y of elements has a greatest lower bound. Such posets are
suitable for any network model K which contains only
Horn sequents. This is because the top element gives
semantics of ⊤ while lower bounds compute the logical
operator ∧. Due to the close correspondence, we will
use the same symbol for both the logical and poset operations (e.g., ∧ denotes both ‘and’ and greatest lower
bound). See Section 5 for an application.
    network model class       truth values class
    Horn                      meet semi-lattice
    regular                   meet semi-lattice
    coherent                  distributive lattice
    first-order               Heyting lattice
    classical first-order     Boolean algebra

Table 1: Classes of posets of truth values corresponding to classes of network models. For computational purposes, we consider only finite sets of truth values, in which case Heyting and distributive lattices coincide. Any Boolean algebra [11] (e.g., the classical two-element one) is a suitable choice for any network model.
Table 2 lists the poset operations that correspond to
various logical operations. The Heyting implication,
for example, is
(x ⇒ y) = sup {z | (z ∧ x) ≤ y}
(3)
where existence of the least upper bound is part of
the definition of Heyting lattice [8, 11]. The Heyting
pseudo-complement is ¬x = (x ⇒ ⊥). The linearly ordered set described in Figure 2 (and, in fact, any linearly ordered set) is a Heyting lattice with

    (x ⇒ y) = true if x ≤ y, and (x ⇒ y) = y otherwise.                (4)

In this truth values poset, ¬ω = false and ¬¬ω = true, so it is not a Boolean algebra. Since ω ∨ (¬ω) = ω,
the Law of Excluded Middle does not hold either. In
terms of the interpretation of Section 4, ¬ω is a report that an event is not possible (hence, is false). If
a data source can only report whether or not an event
is possible (i.e., ω) or impossible (i.e., ¬ω), then one
can not make a definitive, positive conclusion based on
data from this source.
    logical operation     operation on truth values in (Ω, ≤)
    ⊤                     top element of Ω
    ∧                     meet (greatest lower bound)
    ⊥                     bottom element of Ω
    ∨                     join (least upper bound)
    ⇒                     Heyting implication
    ¬                     Heyting pseudo-complement

Table 2: Categorical semantics of logical operations on truth values.
Let M be an assignment of truth values to all closed atomic formulae-in-context. As discussed above, the operations in Table 2 extend M to all closed formulae-in-context. Using this observation we define a closed sequent ϕ ⊢[] ψ to be satisfied if M([].ϕ) ≤ M([].ψ). That is, the truth value resulting from evaluating the left side of the sequent is no larger than that obtained from evaluating the right side.
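These operations are easy to realize in code for the three-valued chain of Figure 2. The sketch below, a minimal illustration rather than a general lattice library, encodes false < ω < true as 0 < 1 < 2 and checks the facts quoted above, including satisfaction of a closed sequent.

FALSE, OMEGA, TRUE = 0, 1, 2        # the chain false < omega < true

def meet(x, y): return min(x, y)    # conjunction: greatest lower bound
def join(x, y): return max(x, y)    # disjunction: least upper bound

def implies(x, y):
    # Heyting implication on a linear order, Equation (4).
    return TRUE if x <= y else y

def neg(x):
    # Heyting pseudo-complement: not-x = (x => bottom).
    return implies(x, FALSE)

assert neg(OMEGA) == FALSE                 # not-omega = false
assert neg(neg(OMEGA)) == TRUE             # not-not-omega = true: not Boolean
assert join(OMEGA, neg(OMEGA)) == OMEGA    # the Law of Excluded Middle fails

def satisfied(lhs, rhs):
    # A closed sequent phi |- psi is satisfied iff M(phi) <= M(psi).
    return lhs <= rhs

print(satisfied(OMEGA, TRUE), satisfied(TRUE, OMEGA))   # True False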
3 Markov networks
A Markov network (or Markov random field) consists
of a finite list X1 , . . . , Xn of random variables and an
undirected graph that specifies both a factorization of
the joint distribution of the variables and conditional
independence relations among the marginals [1]. Each
graph node corresponds to a unique Xi . Graph cliques
(completely connected subgraphs) give the conditional
independence relations.
3.1 Markov network of a network model

A network model K together with a suitable poset Ω of truth values induce a Markov network M(K, Ω). The nodes of M are the closed atomic formulae of K. To ensure that the network is finite, we assume that K has only finitely many sorts and relation symbols and that its only function symbols are finitely many constants. The latter assumption avoids the potential for infinite lists of distinct, closed terms such as f(c), f(f(c)), etc. By adopting the unique names convention of [4], we fix the truth values of all closed atomic formulae of form (ii) (see Section 2.1.2): for constants ci and cj, the semantics of ci = cj (in the empty context) is 'true' iff i = j. In classical Markov logic, the domain closure convention of [4] converts existentially quantified formulae to disjunctions and universally quantified ones to conjunctions. The Axiom of Extensionality of set theory does not hold in all semantic categories, however. We consequently assume that no sequents of K have formulae built using rules (ix) or (x) of Section 2.1.2. We further assume that Ω is finite. This ensures that the space of truth value assignments can be equipped with the powerset σ-algebra.

The assumptions above simplify the description of M. It has a node N for each atomic formula R(t1, . . . , tn) built from a relation symbol R ↣ A1 × · · · × An and a list [t1 : A1, . . . , tn : An] of closed terms of the appropriate type. Each node is an index for a random variable taking values in Ω. M has a clique Cσ,~t for each sequent σ = (ϕ ⊢~x ψ) of K (with the canonical context) and each list ~t = [t1 : A1, . . . , tn : An] of closed terms having the same type as ~x. By the conventions of the previous paragraph, such lists of closed terms are lists of constants. Cσ,~t contains the nodes Ni formed by substituting these constants into the relation symbols that occur in ϕ and ψ. Each choice of truth values ωi for these nodes induces a truth value ωϕ for ϕ[~t/~x] (see Equation 1) and ωψ for ψ[~t/~x] by recursive application of the Table 2 operations. We define a potential function fσ,~t on Cσ,~t by

    fσ,~t(~ω) = 1 if ωϕ ≤ ωψ, and 0 otherwise.                         (5)

This generalizes the potential function of [4] but yields the same form of probability density. For each sequent σ = (ϕ ⊢~x ψ) of K, we introduce a weight parameter rσ and define T(σ) to be the set of all lists ~t of constants having the same type as the context ~x. The equations

    P~r(~ω) = (1/Z) exp( Σσ rσ Σ~t∈T(σ) fσ,~t(~ω) )                    (6)

            = (1/Z) exp( Σσ rσ nσ(~ω) )                                (7)

where the weights rσ are model parameters, ~r is the vector of these weights, nσ(~ω) is the count of non-zero terms from the inner sum of (6), and Z is a normalization factor, define a probability density on the set of all truth value assignments. The notation in (6) and (7) hides the fact that ~ω is projected onto the subproduct space of nodes relevant to a given Cσ,~t. This allows us to estimate the rσ from a training data set and then use the same equation to make inferences on different data since the network models, K and K′, differ only by constants. In Figure 1 we use the term Parameterized Markov Network Template to refer to (7) when the weights are unknown and the dimension of ~ω is unspecified. We use the term Markov Network Template to refer to (7) with known weights but unspecified dimension. When the weights are known and the set of constants is fixed, (7) is a Markov Network.
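The grounding step admits a direct implementation under the stated assumptions (finitely many constants, no quantifiers). The Python sketch below, with illustrative names not drawn from [4] or [9], enumerates the constant tuples T(σ) for one sequent and records the clique of ground atoms each tuple touches.

from itertools import product

# A sequent is (context_sorts, lhs_atoms, rhs_atoms), where each atom is
# (relation, variable_indices); constants_by_sort maps a sort to its constants.
def ground_cliques(sequent, constants_by_sort):
    context_sorts, lhs, rhs = sequent
    domains = [constants_by_sort[s] for s in context_sorts]
    for t in product(*domains):    # one clique C_{sigma,t} per tuple in T(sigma)
        clique = {(rel, tuple(t[i] for i in idxs)) for rel, idxs in lhs + rhs}
        yield t, sorted(clique)

# Smokes(x) |-_x Cancer(x) over constants {a, b} of sort person:
sequent = (["person"], [("Smokes", (0,))], [("Cancer", (0,))])
for t, clique in ground_cliques(sequent, {"person": ["a", "b"]}):
    print(t, clique)   # two cliques: {Smokes(a), Cancer(a)}, {Smokes(b), Cancer(b)}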
3.2 Weight-learning and inference

In the previous sections of this article, we generalized the concepts and constructions of Markov logic to the case of multiple truth values. In this section we describe progress on adapting Markov logic algorithms.

3.2.1 Weight-learning

As shown in Figure 1, training data provides a set of constants (entity names) for the network model and an assignment ~ω of truth values (to closed formulae). By assuming a prior distribution on weights, interpreting (7) as the probability of ~ω given ~r, and applying Bayes' Rule, we may calculate a posterior distribution by maximizing (7) with respect to ~r. Solutions of 0 = ∇~r ln(P~r(~ω)) give such maxima. As in [4],

    ∂/∂rσ ln(P~r(~ω)) = nσ(~ω) − n̄σ                                   (8)

where the mean n̄σ is computed with respect to P~r. This calculation scales as |Ω|^n with n determined by the arities of the relation symbols and by the sizes of the sets of constants of the various sorts. Solution of the non-linear system 0 = ∇~r ln(P~r(~ω)) is expensive as well. Efficiently estimating the posterior on ~r using the pseudo-likelihood of [4] also generalizes to the non-classical case, however.

The applications in Sections 6–8 below illustrate the discriminative learning method of [4]. We assume that the formulae that will serve as evidence in test data are known a priori. This partitions the vector ~ω of truth values into disjoint sets, ~u and ~v, with ~v known and ~u to be estimated. As in [4], we may approximate the mean in (8) in this case by nσ(~u*, ~v) where ~u* is the MAP state given ~v. Both of the MAP algorithms, MaxWalkSAT and LazySAT, of [4] generalize to the multiple truth values case. We sketch our adaptation of MaxWalkSAT in Algorithm 1. To simplify the sketch we define

    cost(~ω) = Σσ rσ Σ~t∈T(σ) (1 − fσ,~t(~ω))                          (9)

which is the sum of the weights of the unsatisfied closed sequents in a given Markov network state, and we define ~ω[ω′, N] to be the state that results from setting the truth value to ω′ at position N of ~ω. We define

    ∆(~ω; N, ω′) = cost(~ω[ω′, N]) − cost(~ω)                          (10)

to be the resulting cost change.

Algorithm 1 MaxWalkSAT(K, Ω, ~r, n, m, v, p, ~ω)
Estimate the most probable state ~ω of the network M(K, Ω) given weights ~r. Return 0 if cost(~ω) ≤ v is achieved within n iterations; 1 otherwise. Upon return, populate ~ω with the state estimate.
 1: for i ← 1 to n do
 2:   ~ω ← a random network state
 3:   v′ ← cost(~ω)
 4:   for j ← 1 to m do
 5:     if v′ ≤ v then
 6:       return 0
 7:     end if
 8:     σ, ~t ← a random unsatisfied closed sequent
 9:     if Uniform(0, 1) < p then
10:       N ← a random node from Cσ,~t
11:       ω′ ← a random truth value
12:     else
13:       N, ω′ ← arg min_{N, ω′} ∆(~ω; N, ω′)
14:     end if
15:     v′ ← v′ + ∆(~ω; N, ω′)
16:     ~ω ← ~ω[ω′, N]
17:   end for
18: end for
19: return 1
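Below is a minimal Python sketch of Algorithm 1, not taken from [4] or [9]. It assumes the network has already been grounded into a list of closed sequents, each carrying its weight rσ, the node indices of its clique Cσ,~t, and a satisfaction test implementing fσ,~t; the names GroundSequent, cost, and delta are illustrative, and v is assumed non-negative.

import random

class GroundSequent:
    """A closed sequent: its weight r_sigma, the node indices of its
    clique, and a test that the induced truth values satisfy phi <= psi."""
    def __init__(self, weight, nodes, satisfied):
        self.weight = weight        # r_sigma
        self.nodes = nodes          # tuple of node indices
        self.satisfied = satisfied  # function: state -> bool

def cost(state, sequents):
    # Equation (9): total weight of the unsatisfied closed sequents.
    return sum(s.weight for s in sequents if not s.satisfied(state))

def delta(state, sequents, node, new_val):
    # Equation (10): cost change from setting state[node] = new_val.
    old = state[node]
    before = cost(state, sequents)
    state[node] = new_val
    after = cost(state, sequents)
    state[node] = old
    return after - before

def max_walk_sat(sequents, num_nodes, truth_values, n, m, v, p, rng=random):
    """Algorithm 1: return (0, state) when cost(state) <= v, else (1, state)."""
    state = [truth_values[0]] * num_nodes
    for _ in range(n):
        state = [rng.choice(truth_values) for _ in range(num_nodes)]
        c = cost(state, sequents)
        for _ in range(m):
            if c <= v:
                return 0, state
            unsat = [s for s in sequents if not s.satisfied(state)]
            s = rng.choice(unsat)
            if rng.random() < p:                  # random-walk move
                node = rng.choice(s.nodes)
                new_val = rng.choice(truth_values)
            else:                                 # greedy move over the clique
                node, new_val = min(
                    ((N, w) for N in s.nodes for w in truth_values),
                    key=lambda nw: delta(state, sequents, nw[0], nw[1]))
            c += delta(state, sequents, node, new_val)
            state[node] = new_val
    return 1, state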
3.2.2 Marginal and conditional probabilities

The marginal and conditional probabilities of M(K, Ω) can be found by direct calculation, Markov chain Monte Carlo estimation, and other methods such as MC-SAT [4]. The former two approaches readily adapt to the non-classical case but are not practical for large network models. As it is implemented in [4], MC-SAT relies on the Boolean nature of classical logic. It may be possible to modify this algorithm for applications to general posets of truth values, however.
4 A simple, non-classical social network calculation

To illustrate non-classical Markov logic, we modify the simple example on page 97 of [4]. Define K to be the network model with a single sort person, two unary relation symbols Smokes and Cancer, one sequent Smokes(x) ⊢x Cancer(x), and one constant a. Figure 3 shows the induced Markov network.

Figure 3: Markov network constructed from a simple social network model: two linked nodes, Smokes(a) and Cancer(a).

The poset (Ω, ≤) described in Figure 2 and having three truth values, false < ω < true, is suitable for K. The intermediate truth value ω is interpreted as 'possibly.' A network state is a pair x = (x_Smokes(a), x_Cancer(a)) of Ω values. There are nine such states. (7) gives the probability density on this space and has

    n(x) = 1 if x_Smokes(a) ≤ x_Cancer(a), and 0 otherwise.            (11)

Z = 3(1 + 2e^r) since n(true, false) = n(true, ω) = n(ω, false) = 0 and n(x) = 1 for the other six states. For example, Pr(true, ω) = 1/Z. We may compute the marginal Pr(x_Smokes(a) = ω) = 1/3 then calculate the conditional probabilities

    Pr(x_Cancer(a) = τ | x_Smokes(a) = ω)
        = 1/(1 + 2e^r) if τ = false, and e^r/(1 + 2e^r) otherwise.     (12)

As r → ∞, the τ = false case converges to 0 while the other two conditionals converge to 1/2. To interpret these calculations, assume the sequent has high weight and that data indicates a possibility that a smokes. In this case, a conclusive observation that a is cancer-free has low probability. Positive cancer reports that are either conclusive or that indicate the possibility of the disease have probability close to 1/2.
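The numbers above can be checked by brute force. This is a verification sketch, not part of [4], assuming the chain false < ω < true is encoded as 0 < 1 < 2.

import math
from itertools import product

FALSE, OMEGA, TRUE = 0, 1, 2    # the chain false < omega < true
r = 2.0                         # sequent weight (any value works)

# Equation (11): n(x) = 1 iff x_Smokes(a) <= x_Cancer(a).
def n(x):
    smokes, cancer = x
    return 1 if smokes <= cancer else 0

states = list(product((FALSE, OMEGA, TRUE), repeat=2))
Z = sum(math.exp(r * n(x)) for x in states)
assert abs(Z - 3 * (1 + 2 * math.exp(r))) < 1e-9    # Z = 3(1 + 2 e^r)

def P(x):
    return math.exp(r * n(x)) / Z

# Marginal Pr(Smokes(a) = omega) = 1/3.
marginal = sum(P(x) for x in states if x[0] == OMEGA)
assert abs(marginal - 1 / 3) < 1e-9

# Conditionals of Equation (12).
for tau in (FALSE, OMEGA, TRUE):
    cond = P((OMEGA, tau)) / marginal
    print(tau, cond)    # 1/(1+2e^r) for tau = false, e^r/(1+2e^r) otherwise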
5 Geopolitics

Given a sequent ϕ ⊢~x ψ, we may use classical logic to infer ⊤ ⊢~x (ϕ ⇒ ψ), then simply assert that the formula ϕ ⇒ ψ is true. These steps dispatch sequents in favor of implication formulae. In general, however, the logic of posets of truth values is not classical. Figure 4 gives an example and interpretation. This poset is suitable for Horn or regular network models (see Section 2.1.3) but not for those in which either of the operators ∨ or ⇒ occurs in a sequent. The reason for this is that the inference rules for ⇒ force (Ω, ≤) to be a Heyting lattice (see [8, 15] or Table 2) but such posets satisfy the distributive law:

    x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z).                                   (13)

Substituting x = γ∗, y = ω, and z = γ* into (13), however, yields γ∗ on the left side but γ* on the right.

Figure 4: Poset (Ω, ≤) of truth values with bottom ⊥ and top ⊤; the intermediate values are γ∗, ω, and γ*, with γ∗ < γ* and ω incomparable to both. The poset represents intelligence reports from two sources, A and B. ω is a report of 'possibly' from A. γ∗ and γ* are reports of 'plausibly' and 'likely' from B. ⊤ and ⊥ are conclusive positive and negative data from either source. The conditions ω ∧ γ∗ = ⊥ = ω ∧ γ* model the fact that action is to be taken on the first report received.

Consider a network model with a single sort, Country, a single constant a, and three unary relation symbols, Resources, Hostile, and Weapon. The sequent

    Resources(x) ∧ Hostile(x) ⊢x Weapon(x)                             (14)
models the assertion that a hostile nation with sufficient financial resources will develop a particular weapon system. We assume the poset of truth values shown in Figure 4. A network state is a list of three Ω values x = (x_R(a), x_H(a), x_W(a)) where we have abbreviated the names of the relation symbols. The density (7) has Z = 32 + 93 e^r and we may calculate the conditionals

    Pr(x_W(a) = τ | x_H(a) = ⊤ and x_R(a) = γ∗)
        = e^r/(2 + 3 e^r) if τ ∈ {⊤, γ*, γ∗}, and 1/(2 + 3 e^r) otherwise.   (15)

As r → ∞, the conditional probability converges to 1/3 if γ∗ ≤ τ and to 0 otherwise. In particular, the τ = ω case has limit 0 since the data x_R(a) = γ∗ indicates that the report is from intelligence source B.
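Both the failure of distributivity and the value Z = 32 + 93 e^r can be verified by brute force. The following sketch assumes the Hasse diagram of Figure 4 (⊥ < γ∗ < γ* < ⊤ and ⊥ < ω < ⊤, with ω incomparable to γ∗ and γ*); the names g_lo and g_hi stand for γ∗ and γ*.

import math
from itertools import product

ELEMS = ["bot", "w", "g_lo", "g_hi", "top"]
LEQ = {(a, a) for a in ELEMS}
LEQ |= {("bot", a) for a in ELEMS} | {(a, "top") for a in ELEMS}
LEQ.add(("g_lo", "g_hi"))

def leq(a, b): return (a, b) in LEQ

def meet(a, b):    # greatest lower bound
    lower = [z for z in ELEMS if leq(z, a) and leq(z, b)]
    return max(lower, key=lambda z: sum(leq(u, z) for u in lower))

def join(a, b):    # least upper bound
    upper = [z for z in ELEMS if leq(a, z) and leq(b, z)]
    return max(upper, key=lambda z: sum(leq(z, u) for u in upper))

# Distributivity fails at x = g_lo, y = w, z = g_hi, as in (13).
assert join("g_lo", meet("w", "g_hi")) == "g_lo"
assert meet(join("g_lo", "w"), join("g_lo", "g_hi")) == "g_hi"

# Z for sequent (14) over states (x_R, x_H, x_W).
r = 1.5
def n(x): return 1 if leq(meet(x[0], x[1]), x[2]) else 0
Z = sum(math.exp(r * n(x)) for x in product(ELEMS, repeat=3))
assert abs(Z - (32 + 93 * math.exp(r))) < 1e-6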
6 Insurgency network analysis

Understanding a social network may involve analysis of different links between and properties of individuals. We define a model to infer the probability of entities having an anti-U.S. sentiment based on familial and tribal bonds. The signature has a single sort, person. It has two unary relation symbols, TribeMember and AntiUS, and five binary ones: Mother, Father, Spouse, Son, and Daughter. With the exception of Spouse, the binary relations are intended to be directional, with R(x, y) interpreted as the assertion that x has relationship R to y. Father(x, y), for example, is read 'x is the father of y'. We use discriminative learning (see Section 3.2.1) to make inferences about AntiUS.

The network model has the following sequents, which may express statistical tendencies in real data. (16), for example, asserts that tribal members tend to have anti-U.S. sentiment. (17) says that tribal members tend to intermarry. (22)–(27) express the tendency of anti-U.S. views to be transferred across familial links. (24) and (25), for example, respectively model the transfers from son to parent and from parent to son.

    TribeMember(x) ⊢x AntiUS(x)                                        (16)
    Spouse(x, y) ⊢x,y (TribeMember(x) ⇔ TribeMember(y))                (17)
    Son(x, y) ∧ Spouse(y, z) ⊢x,y,z Son(x, z)                          (18)
    Daughter(x, y) ∧ Spouse(y, z) ⊢x,y,z Daughter(x, z)                (19)
    TribeMember(x) ∧ Father(x, y) ⊢x,y TribeMember(y)                  (20)
    TribeMember(x) ∧ Mother(x, y) ⊢x,y TribeMember(y)                  (21)
    AntiUS(x) ∧ Father(x, y) ⊢x,y AntiUS(y)                            (22)
    AntiUS(x) ∧ Mother(x, y) ⊢x,y AntiUS(y)                            (23)
    AntiUS(x) ∧ Son(x, y) ⊢x,y AntiUS(y)                               (24)
    AntiUS(y) ∧ Son(x, y) ⊢x,y AntiUS(x)                               (25)
    AntiUS(x) ∧ Daughter(x, y) ⊢x,y AntiUS(y)                          (26)
    AntiUS(y) ∧ Daughter(x, y) ⊢x,y AntiUS(x)                          (27)
By including a large number of sequents, we ensure that weight learning will be responsive to data subtleties. The apparent redundancies in (22)–(27), for example, allow the model to distinguish between male and female entities who are descendants but not parents.

Figure 5 shows the network that was used to train a Markov network model having probability density given by (7). This training data represents observations about part of a network of interest or part of a different human network that is assumed to have characteristics similar to the one of interest. We chose Ω = {false, true} and used the Alchemy system [9].

Figure 5: Training data for an insurgency network model. Entities are people. Nodes labeled M and F are male and female, respectively. Circled nodes have anti-U.S. sentiment. Boxed nodes are members of a tribe of interest. Undirected arrows are spousal relationships. Directed arrows show son or daughter relationships.

The resulting Markov network template (see Figure 1) was used to infer probabilities of anti-U.S. sentiment for entities in the test data set shown in Figure 6. The test data represents the unobserved or partially observed part of the human network. Familial links and tribal associations were assumed known in this example. The test results reflect the fact that 1/3 of the entities in the training data were anti-U.S. Note that the chain of three male descendants shows a strengthening of anti-U.S. views down generations of tribal members.

Figure 6: Test and inference results for an insurgency network. As in Figure 5, M and F indicate male and female nodes. Directed and undirected links and boxed nodes are also interpreted as in the previous figure. The number associated with each node is the inferred probability that the given entity has an anti-U.S. sentiment.

7 Alias detection

In some network analysis applications it is useful to distinguish between entities and nodes. The latter are to be construed as ports through which the former communicate. Nodes, for example, may be telephone numbers or IP addresses while entities are humans. Alternatively, nodes may be IP addresses in an ad hoc network context in which entities are MAC addresses [3]. The goal of alias detection is to determine when a single entity is employing more than one node.

7.1 Direct link model

To formulate a primitive alias model, we observe that if nodes m and n are directly linked to the same other nodes, then m and n are alias candidates. Let the network model K have a single sort, Node, two binary relation symbols LinkedTo ↣ Node × Node and Alias ↣ Node × Node, and the following sequents.

    LinkedTo(a, x) ∧ LinkedTo(b, y) ∧ Alias(a, b) ⊢a,b,x,y Alias(x, y) (28)
    Alias(x, y) ∧ Alias(y, z) ⊢x,y,z Alias(x, z)                       (29)
    Alias(x, y) ⊢x,y Alias(y, x)                                       (30)
    Alias(x, y) ∧ LinkedTo(x, a) ⊢a,x,y LinkedTo(y, a)                 (31)

Figure 7 shows training data used to learn the Markov network weights. The left diagram is the LinkedTo data while the right diagram is the Alias data. We use discriminative learning (Section 3.2.1) to make inferences about Alias. The training data includes two nodes, 0 and 5, that are known to be aliases for the same entity. Both are directly linked to the same other nodes. We assumed Ω = {false, true} and used Alchemy [9].

Figure 7: Alias detection training data over nodes 0–5. The diagram on the left shows the links between nodes. The diagram on the right indicates that nodes 0 and 5 are aliases.

The test data appears in Figure 8. An appealing feature of Markov logic is that it performs simultaneous alias detection on all pairs of nodes in this dataset. The conclusion that two particular nodes are aliases for a single entity changes the relationships among nodes linked to either of the aliases. In Figure 8, nodes 1 and 4 are potential aliases because they are linked to the same nodes. This identification makes 2 and 3 look similar, although 3 is connected to 0 while 2 is not. 7 and 5 look alike because there is only one neighbor that they do not share. This identification, however, gives weak evidence that 7 and 3 are aliases since both link to 0 while 3 links to one other entity (if we identify 1 and 4).

Figure 8: Test data and alias detection results. The left diagram shows link data in which we seek to detect aliases. In the right diagram, an edge is drawn between two nodes if there is evidence that they correspond to the same entity. The width of the edge indicates the strength of the evidence.
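The intuition behind the direct link model can be checked outside of Markov logic with a few lines of code. The sketch below is an illustration, not the paper's inference procedure: it scores node pairs by the overlap of their neighbor sets on a hypothetical link set (Figure 7's exact edges are not reproduced here) in which nodes 0 and 5 share all neighbors.

from itertools import combinations

# Toy LinkedTo data (hypothetical): nodes 0 and 5 link to the same nodes.
links = {(0, 1), (0, 2), (0, 3), (5, 1), (5, 2), (5, 3), (4, 1)}

def neighbors(n):
    return {b for a, b in links if a == n} | {a for a, b in links if b == n}

# Jaccard overlap of neighbor sets: a crude stand-in for the evidence that
# sequents (28)-(31) accumulate for Alias(m, n).
for m, n in combinations(range(6), 2):
    nm, nn = neighbors(m) - {n}, neighbors(n) - {m}
    if nm | nn:
        score = len(nm & nn) / len(nm | nn)
        if score > 0.5:
            print(m, n, score)   # prints high-scoring candidates, e.g. (0, 5, 1.0)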
7.2 Local metric models

A class of more expressive alias detection models results from accounting for one or more local metrics such as degree centrality or proximity prestige. (32) is a template for adding such features to the network model.

    Metric(x, v) ∧ Metric(y, v) ⊢x,y,v Alias(x, y)                     (32)

We may simultaneously account for different metrics by including a list of sequents of this form. Each would involve a relation symbol Metric_k for one of the metrics of interest.

To account for distributions of metric values, a distinct weight may be learned for each value that occurs in the data. (33) expresses this enhancement using the + notation of [9].

    Metric(x, +v) ∧ Metric(y, +v) ⊢x,y,+v Alias(x, y)                  (33)

We anticipate that, in applications of such models, keeping the computations of reasonable complexity and maintaining a low false-alarm rate will involve a balance among model expressiveness, training data quality, test data size, and choice of metrics. To use the enhancement (33), one must, moreover, learn or assign weights to all values v that may occur in the test data. If the set of possible values is known and finite (such as the set of shipping ports in Section 8), this is straightforward.
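One practical way to keep the set of metric values finite, sketched below under assumptions of our own (the bucketing scheme and helper name are illustrative, not from [9]), is to discretize a continuous local metric such as degree centrality into a small number of buckets before emitting Metric ground atoms.

from collections import defaultdict

# Hypothetical helper: bucket node degree so that Metric(x, v) atoms range
# over a small finite set of values v, as required by the enhancement (33).
def degree_metric_atoms(links, num_buckets=4):
    degree = defaultdict(int)
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    max_deg = max(degree.values())
    atoms = []
    for node, d in degree.items():
        bucket = min(num_buckets - 1, d * num_buckets // (max_deg + 1))
        atoms.append(("Metric", node, bucket))   # ground atom Metric(node, bucket)
    return atoms

links = [(0, 1), (0, 2), (1, 2), (3, 1), (3, 2)]  # toy LinkedTo data
print(degree_metric_atoms(links))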
8 Transaction pattern analysis

We analyze container shipping transaction data to find entities that have similar shipping patterns and ones with patterns resembling those of known threats. We define a network model K that has five sorts: Company, CargoCode, Port, City, and Country. The model has a unary relation symbol Threat ↣ Company and five binary relation symbols Similar ↣ Company × Company, Ships ↣ Company × CargoCode, UsesPort ↣ Company × Port, City ↣ Company × City, and Country ↣ Company × Country. Ships(x, a) and UsesPort(x, p) respectively indicate that company x ships commodity a and uses port p. The City and Country relations are company address fields. We use discriminative learning (Section 3.2.1) to make inferences about the property Threat and the link Similar.

The sequents of the network model are listed in (34)–(41). These capture statistical tendencies in training and test data but are unlikely to be satisfied by all data elements. (34)–(37) and (40)–(41) characterize what it means for shipping patterns to be similar. Companies are similar if they ship the same commodities, use the same ports, and have the same country and city of origin. Similarity is symmetric and transitive. (38) and (39) characterize the notion of a threat. A company is a threat if its shipping pattern is similar to that of a likely threat. We chose Ω = {false, true} and used the Alchemy system [9]. The modifier + in (35)–(37) is an Alchemy feature that forces a different weight to be learned for each port, city, and country that occurs in the data. The fact that two companies both use a major port, such as Rotterdam, may reflect a different signature than use of a small, remote port.

    Ships(x, a) ∧ Ships(y, a) ⊢x,y,a Similar(x, y)                     (34)
    UsesPort(x, +p) ∧ UsesPort(y, +p) ⊢x,y,+p Similar(x, y)            (35)
    City(x, +c) ∧ City(y, +c) ⊢x,y,c Similar(x, y)                     (36)
    Country(x, +c) ∧ Country(y, +c) ⊢x,y,c Similar(x, y)               (37)
    Similar(x, y) ∧ Threat(x) ⊢x,y Threat(y)                           (38)
    Similar(x, y) ∧ Threat(y) ⊢x,y Threat(x)                           (39)
    Similar(x, y) ⊢x,y Similar(y, x)                                   (40)
    Similar(x, y) ∧ Similar(y, z) ⊢x,y,z Similar(x, z)                 (41)

Figure 9 shows test results on a small data set. Nodes 0–8 are known threats. Note that the model identifies nodes 9 and 29 as probable threats having transaction patterns similar to those of known threats. Analysis of the results reveals that the algorithm finds instances in which a single company has more than one database key, associates a manufacturer and a favored shipper, and detects cases in which two (or more) distinct companies have similar shipping patterns.

Figure 9: Entities with similar shipping patterns and with patterns like those of known threats. Each node corresponds to a company. Node gray scales indicate threat probabilities based on the model and training data. Darker nodes have higher probability. Nodes 0–8 were training data threats. Links indicate similar shipping patterns. Thicker links indicate greater similarity.
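The + modifier of (35)–(37) can be read operationally: it expands one sequent template into a family of per-value sequents, each with its own learned weight. The sketch below is our own illustration of that expansion, assuming a known finite set of ports; the port names and dictionary layout are hypothetical.

# Expand UsesPort(x, +p) ^ UsesPort(y, +p) |- Similar(x, y) into one weighted
# template per port, so weight learning can treat a major hub differently
# from a small, remote port.
ports = ["Rotterdam", "Aqaba", "Karachi"]          # illustrative values
expanded = [
    {"port": p,
     "lhs": [("UsesPort", "x", p), ("UsesPort", "y", p)],
     "rhs": [("Similar", "x", "y")],
     "weight": None}                               # to be learned per port
    for p in ports
]
for seq in expanded:
    print(seq["port"], seq["lhs"], "=>", seq["rhs"])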
9 Conclusions
We have generalized the concepts and constructions of Markov logic to allow (closed) formulae to take values in any partially ordered set (poset) of truth values that is suitable for the network model of interest. The Boolean poset, false < true, of classical logic is suitable for any network model but, by utilizing more general posets, the new methods directly support uncertainties in both data sources and values. We have taken steps toward adapting Markov logic algorithms to the non-classical case. Adapting the MC-SAT inference algorithm is an open problem, however. Analysis of the computational complexity of Markov logic techniques, and the degrees to which they may be parallelized, are needed. We applied Markov logic to a variety of scenarios including some arising from real-world problems.
Acknowledgment
This work was partially supported by Air Force Office
of Scientific Research grant FA9550-08-1-0411.
References

[1] C. M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag, 2006.

[2] S. Bringsjord, personal communication.

[3] S. Cheshire, B. Aboba, and E. Guttman, Dynamic Configuration of IPv4 Link-Local Addresses, IETF, RFC 3927, 2005.

[4] P. Domingos, S. Kok, D. Lowd, H. Poon, M. Richardson, and P. Singla, "Markov Logic," in Probabilistic Inductive Logic Programming, L. De Raedt, P. Frasconi, K. Kersting, and S. Muggleton, Eds., Springer-Verlag, 2008.

[5] G. Godfrey, J. Cunningham, and T. Tran, "A Bayesian, Nonlinear Particle Filtering Approach for Tracking the State of Terrorist Operations," Intelligence and Security Informatics, IEEE, 23–24 May 2007, pp. 350–355.

[6] J. Y. Halpern, Reasoning about Uncertainty, MIT Press, 2003.

[7] M. Jackson, A Sheaf Theoretic Approach to Measure Theory, Dissertation, University of Pittsburgh, 2006.

[8] P. Johnstone, Sketches of an Elephant: A Topos Theory Compendium, Oxford University Press, 2002.

[9] S. Kok, P. Singla, M. Richardson, and P. Domingos, The Alchemy System for Statistical Relational AI, Technical report, Department of Computer Science and Engineering, University of Washington, Seattle, www.cs.washington.edu/ai/alchemy.

[10] F. W. Lawvere and S. Schanuel, Conceptual Mathematics, 2nd Ed., Cambridge University Press, 2009.

[11] S. Mac Lane and G. Birkhoff, Algebra, 3rd Ed., Chelsea Publishing Company, 1993.

[12] T. M. Mitchell, Machine Learning, McGraw-Hill, 1997.

[13] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 2nd Ed., Prentice-Hall, 2002.

[14] D. Scott and P. Krauss, "Assigning Probabilities to Logical Formulas," in Aspects of Inductive Logic, J. Hintikka and P. Suppes, Eds., North-Holland, 1966.

[15] R. L. Wojtowicz, Categorical Logic as a Foundation for Reasoning under Uncertainty, SBIR Final Report, Metron, Inc., 2005.

[16] R. L. Wojtowicz, "On Transformations between Belief Spaces," in Soft Methods for Handling Variability and Imprecision, D. Dubois, H. Prade, et al., Eds., Springer-Verlag, pp. 313–320, 2008.