12th International Conference on Information Fusion, Seattle, WA, USA, July 6-9, 2009

Non-Classical Markov Logic and Network Analysis

Ralph L. Wojtowicz
Metron, Inc.
1818 Library Street, Suite 600
Reston, VA, U.S.A.
[email protected]

Abstract – First-order languages express properties of entities and their relationships in rich models of heterogeneous network phenomena. Markov logic is a set of techniques for estimating the probabilities of truth values of such properties. This article generalizes Markov logic in order to allow non-classical sets of truth values. The new methods directly support uncertainties in both data sources and values. The concepts and methods of categorical logic give precise guidelines for selecting sets of truth values based on the form of a network model. Applications to alias detection, cargo shipping, insurgency analysis, and other problems are given. Open problems include complexity analysis and parallelization of algorithms.

Keywords: network analysis, entity resolution, alias detection, categorical logic, Markov network.

1 Introduction

Markov logic is a set of techniques for estimating the probabilities of truth values of formulae written in first-order languages [4, 9]. In network analysis applications, the formulae describe properties of and relationships or links among entities. The truth values tell if an entity has a property or whether or not a link exists. The networks may involve many different sorts of entities and types of links. Estimates are based on the values specified in training and test data. We refer to the special case involving two truth values as classical Markov logic. Data in this case must assign either 'false' or 'true' to all (closed) formulae. In practical applications, however, we may have limited confidence in some intelligence sources or data values. In this paper we generalize concepts, constructions, and algorithms of [4] to the case of multiple truth values. The resulting methods directly support uncertainties in both data sources and values. Our approach employs the concepts and techniques of categorical logic [8, 15]. This framework gives precise guidelines for selecting sets of truth values based on the form of a network model and it supplies equations for computing with these values.

Interactions between logic and probability have a rich history dating at least to Boole's 1847 treatise [6, 7, 14]. Markov logic, however, does not assign probabilities to logical formulae. It assigns truth values to formulae and builds a probability space from the set of all such assignments. The joint density satisfies conditional independence conditions expressed by a Markov network (Markov random field) [1]. Bayesian approaches to analysis of uncertain networks are under active research. Some have been successfully used in real systems such as Metron's TerrAlert [5]. Markov logic is appealing, however, because it supports a concise formulation of network models and has performed well on challenge problems [4]. Our objectives in this article are to

1. Generalize the concepts and constructions of Markov logic to the case of multiple truth values
2. Adapt Markov logic algorithms to this case
3. Compute illustrative examples
4. Describe challenges and future work.

To meet Objective 1, we must explain how to populate the Truth Values box of Figure 1 with non-classical values. This entails reformulating the Network Model box from its setup in [4] and impacts probability calculations in the Markov Network boxes.
Since references on categorical logic are generally "not addressed to those who are trying to learn [the subject] for the first time," [8] we give a sketch in Section 2. [4] and [9], however, are fine introductions to Markov logic. In Section 3.2 we discuss our progress on Objective 2, which involves the Weight Learning and Inference boxes of Figure 1. To address Objective 3, we apply Markov logic to a simple social network (Section 4), geopolitics (Section 5), insurgency (Section 6), detecting aliases (Section 7), and analysis of cargo shipping (Section 8). Readers eager for applications should proceed to Section 4.

[Figure 1 (diagram): boxes Domain Understanding; Network Model (sorts; function symbols: constants, maps; relation symbols: links, properties, propositions; sequents); Truth Values (partial order, operators); Parameterized Markov Network Template (nodes, cliques, density, weights); Training Data (constants, truth assignments); Weight Learning; Markov Network Template (nodes, cliques, density, weights); Test Data (constants, truth assignments); Inference; Markov Network (nodes, cliques, density, weights, marginals and conditionals).]
Figure 1: The structure of Markov logic. Arrows indicate that contents of a source box are needed to define or compute contents of the destination. Shaded boxes are placeholders for quantities supplied in later stages.

2 First-order languages

Using domain understanding, one builds a network model and identifies a partially ordered set (poset) of truth values. To describe heterogeneous networks and their properties, we use languages containing expressions in finitary, first-order, many-sorted predicate logic with equality [8].

2.1 Syntax

A signature Σ consists of sets of sorts, function symbols, and relation symbols. See Figure 1. In network models, the sorts are the kinds of entities. Function symbols may be constants, which name specific entities, or maps, such as unit conversions or other deterministic assignments. Relation symbols include the kinds of links, kinds of entity properties, and propositions about the network.

Each function and relation symbol has a type, which is a finite list [A1, . . . , An] of sorts. The empty type is denoted []. If f is a function symbol, then f : A1 × ··· × An → B indicates that [A1, . . . , An, B] is its type. It specifies the sorts of entities that f takes as input and produces as output. f is a constant if n = 0, in which case we write f : [] → B. If R is a relation symbol, then R ↣ A1 × ··· × An asserts that [A1, . . . , An] is its type. R is a property if n = 1 and a link if n > 1. In the former case, the type specifies the sort of entity that may have the property. In the latter case, it lists the sorts of entities that may be involved in an R link. For n = 0, R is a proposition and we write R ↣ [].

2.1.1 Terms

The terms over a signature Σ are derived names for entities. Terms, along with their type and free variables, are defined recursively. For a term t, t : B indicates that B is its type. FV(t) is its set of free variables.

We assume a (countably infinite) supply of variables x : B for each sort B. FV(x) = {x} for any variable x. If f : A1 × ··· × An → B is a function symbol and t1 : A1, . . . , tn : An are terms, then f(t1, . . . , tn) : B is a term having ∪_{i=1}^{n} FV(ti) as its set of free variables. If f : [] → B is a constant, then f : B is a term with FV(f) = ∅, in which case we write f rather than f().

A term is closed if it has no free variables. The closed terms include constants c and terms f(c1, . . . , cn) with each ci a constant of the appropriate type for the function symbol f.
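As an illustrative aside (not part of the original paper), the following Python sketch shows one way to represent a small signature and enumerate its closed terms up to a fixed nesting depth. The sort and symbol names are hypothetical. It also illustrates why a non-constant function symbol yields infinitely many distinct closed terms f(c), f(f(c)), ..., which motivates the restriction to constants in Section 3.1.

```python
# A minimal sketch (ours, not the paper's): a signature and its closed terms.
# Sort and symbol names below are hypothetical examples.
from itertools import product

SORTS = {"Person"}
FUNCTIONS = {
    "alice": ((), "Person"),               # constant: alice : [] -> Person
    "mother_of": (("Person",), "Person"),  # unary map: Person -> Person
}

def closed_terms(max_depth):
    """Enumerate closed terms, grouped by sort, up to a nesting depth."""
    terms = {s: set() for s in SORTS}
    for name, (args, result) in FUNCTIONS.items():
        if not args:                       # constants are the depth-0 terms
            terms[result].add(name)
    for _ in range(max_depth):
        new = {s: set(ts) for s, ts in terms.items()}
        for name, (args, result) in FUNCTIONS.items():
            if args:
                for combo in product(*(terms[a] for a in args)):
                    new[result].add(f"{name}({', '.join(combo)})")
        terms = new
    return terms

print(closed_terms(2)["Person"])
# {'alice', 'mother_of(alice)', 'mother_of(mother_of(alice))'} -- the set keeps
# growing as the depth increases, which is why Section 3.1 admits only
# constants as function symbols.
```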
2.1.2 Formulae

The formulae over a signature are formal statements about network entities, links, and relationships. Formulae, along with their sets of free and bound variables, are defined recursively using rules (i)–(x) below. If ϕ is a formula, then FV(ϕ) and BV(ϕ) are respectively its sets of free and bound variables.

(i) Relations: If R ↣ A1 × ··· × An is a relation symbol and t1 : A1, . . . , tn : An are terms, then R(t1, . . . , tn) is a formula. Its set of free variables is ∪_{i=1}^{n} FV(ti). It has no bound variables.

(ii) Equality: If t1 and t2 are terms of the same type, then t1 = t2 is a formula. Its set of free variables is the union of those of t1 and t2. It has no bound variables.

(iii) Truth: ⊤ is a formula. It has neither free nor bound variables.

(iv) Conjunction: If ϕ and ψ are formulae, then ϕ ∧ ψ is. Its sets of free and bound variables are formed as unions of those of ϕ and ψ.

(v) Falsity: ⊥ is a formula. It has neither free nor bound variables.

(vi) Disjunction: If ϕ and ψ are formulae, then ϕ ∨ ψ is. Its sets of free and bound variables are formed as unions of those of ϕ and ψ.

(vii) Implication: If ϕ and ψ are formulae, then ϕ ⇒ ψ is. Its sets of free and bound variables are formed as unions of those of ϕ and ψ.

(viii) Negation: If ϕ is a formula, then ¬ϕ is. Its sets of free and bound variables coincide with those of ϕ.

(ix) Existential quantification: If ϕ is a formula, then (∃x)ϕ is a formula. FV(ϕ) \ {x} and BV(ϕ) ∪ {x} are its sets of free and bound variables respectively.

(x) Universal quantification: If ϕ is a formula, then (∀x)ϕ is a formula. FV(ϕ) \ {x} and BV(ϕ) ∪ {x} are its sets of free and bound variables respectively.

The atomic formulae are those constructed using (i) and (ii). Formulae constructed from (i)–(iv) are Horn. Those formed using (i)–(iv) and (ix) are regular. Those built from (i)–(vi) and (ix) are coherent. First-order formulae are those constructed using any of the rules.

A formula is closed if it has no free variables. The closed formulae include atomic formulae such as R(c1, . . . , cn) with each ci a constant of the appropriate type for the relation symbol R, c = c′ with c and c′ constants of the same type, and compound expressions built from such atomic formulae using rules (iv) and (vi)–(x).

A context is a finite list ~x = [x1 : A1, . . . , xn : An] of distinct variables. Its length is n and its type is [A1, . . . , An]. The types of the variables need not be distinct. [] is the empty context. A context ~x is suitable for a term t if each free variable of t occurs in ~x. Suitable contexts for formulae are similarly defined. The canonical context for a term or formula consists of the distinct free variables that occur, listed in the order of their appearance.

A term-in-context is an expression of the form ~x.t where t is a term and ~x is a context suitable for t. Formulae-in-context are similarly defined.

The nodes of a network model's Markov network are the closed atomic formulae. To compute them we use the operation of substitution of terms for variables. In particular, we substitute constant terms into relation and equality formulae. In Figure 1, the arrows from the function and relation symbol boxes to the nodes box indicate this fact. If ~x.ϕ is a formula-in-context with variables ~x = [x1 : A1, . . . , xn : An] and ~t = [t1 : A1, . . . , tn : An] is a list of terms of the same type as ~x, then

ϕ[~t/~x]   (1)

denotes the formula obtained by simultaneously substituting each ti for each free occurrence of xi in ϕ, after first changing the names of any bound variables in ϕ, if necessary, so that they are distinct from all the free variables that occur in ~t. If each ti is closed, then ϕ[~t/~x] is closed.
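To make the grounding step concrete, here is a small Python sketch (ours, not the paper's) that substitutes constants for the variables of each relation symbol's canonical context and lists the resulting closed atomic formulae; the relation and constant names are made up for illustration. These closed atomic formulae are exactly the Markov network nodes of Section 3.1.

```python
# Sketch (ours): grounding relation symbols over constants to obtain the
# closed atomic formulae.  All names below are hypothetical.
from itertools import product

# Relation symbols with their types (lists of sorts).
RELATIONS = {
    "Smokes": ["Person"],            # a property
    "Knows":  ["Person", "Person"],  # a link
}
# Constants available for each sort (supplied by training/test data).
CONSTANTS = {"Person": ["a", "b"]}

def closed_atomic_formulae():
    """Substitute constants of the right sorts for the variables of each
    relation symbol, yielding the closed atomic formulae."""
    for rel, sorts in RELATIONS.items():
        for consts in product(*(CONSTANTS[s] for s in sorts)):
            yield f"{rel}({', '.join(consts)})"

print(list(closed_atomic_formulae()))
# ['Smokes(a)', 'Smokes(b)', 'Knows(a, a)', 'Knows(a, b)',
#  'Knows(b, a)', 'Knows(b, b)']
```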
2.1.3 Network models

A sequent σ over a signature Σ is an expression

ϕ ⊢~x ψ   (2)

where ϕ and ψ are formulae over Σ and ~x is a context suitable for both formulae. A sequent is Horn/regular/coherent/first-order if its formulae are of the corresponding class. ~x is the canonical context for σ if it consists of the distinct free variables that occur, listed in the order of their appearance.

A network model K consists of a signature Σ and a set of sequents over Σ. Alternative names for this structure are knowledge base [4] and theory [8]. A network model is Horn/regular/coherent/first-order if all its sequents are of the corresponding class. The class of the network model imposes requirements on the sets of truth values that are suitable for it. The arrow from the sequents box to Truth Values in Figure 1 indicates this fact. The example in Section 5 illustrates why we must define network models using sequents rather than the perhaps more familiar if-then rules ϕ ⇒ ψ.
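The classification of a network model by the connectives appearing in its sequents can be mechanized. The following Python sketch (an illustration of ours, with a toy operator-set representation of formulae) assigns the smallest class — Horn, regular, coherent, or first-order — that accommodates a given set of sequents.

```python
# Sketch (ours): classify a network model by the logical operators that occur
# in its sequents (formula rules (i)-(x) of Section 2.1.2).
HORN_OPS     = {"atomic", "=", "top", "and"}
REGULAR_OPS  = HORN_OPS | {"exists"}
COHERENT_OPS = REGULAR_OPS | {"bottom", "or"}

def classify(sequents):
    """sequents: list of sets of operator names occurring in each sequent."""
    ops = set().union(*sequents) if sequents else set()
    if ops <= HORN_OPS:
        return "Horn"
    if ops <= REGULAR_OPS:
        return "regular"
    if ops <= COHERENT_OPS:
        return "coherent"
    return "first-order"

# Example: the insurgency model of Section 6 uses a bi-implication in sequent
# (17), which involves =>, so the model is first-order; without (17) its
# sequents use only atomic formulae and conjunction and would be Horn.
print(classify([{"atomic", "and"}, {"atomic", "implies"}]))  # first-order
print(classify([{"atomic", "and"}, {"atomic"}]))             # Horn
```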
2.2 Semantics

A semantics of a language is a mapping M that assigns mathematical structures to its sorts, function symbols, and relation symbols and which is extended, by recursive definitions, to terms-in-context and formulae-in-context. In the traditional approach, due to Tarski, the assigned structures are sets (functions and relations between sets are themselves sets in traditional axiomatizations of set theory as a singly-typed theory). A fundamental insight, first illuminated by Lawvere's 1963 thesis, is the fact that, although a network model imposes requirements on semantics, the class of suitable semantic structures is much richer than mere sets. One may, for example, interpret any network model using directed graphs, fuzzy sets, or finite-state discrete-time dynamical systems.

Although logic plays an essential role in artificial intelligence and machine learning [12, 13], the fundamental insight of the late 20th century "has hitherto not been applied in a meaningful way to [these fields]" [2]. This paper is part of a program to research such applications.

2.2.1 Categorical semantics

We sketch the general framework of categorical semantics but focus on the parts needed for the construction in Section 3.1 and for the algorithms in Section 3.2. See D1 and D4 of [8] for details.

A category consists of objects (e.g., sets, graphs, fuzzy sets, or belief spaces) and structure-preserving morphisms between objects (e.g., functions or graph maps) [10, 16]. To interpret a network model K over a signature Σ in a category C, we first assign an object M(A) of C to each sort A of Σ. We extend M to types [A1, . . . , An] using the product object M(A1) × ··· × M(An) (e.g., Cartesian product of sets or product graph). We assign a terminal object (see Section 2.2.2) to the empty type. We assign a morphism M(f) : M(A1) × ··· × M(An) → M(B) to each function symbol f : A1 × ··· × An → B. To each relation symbol R ↣ A1 × ··· × An we assign a subobject M(R) ↣ M(A1) × ··· × M(An) (e.g., subset or subgraph). We extend M to terms-in-context and formulae-in-context by recursive definitions. The end result is that each term-in-context ~x.t with ~x = [x1 : A1, . . . , xn : An] and t : B is assigned a morphism M(~x.t) : M(A1) × ··· × M(An) → M(B) and each formula-in-context ~x.ϕ is assigned a subobject M(~x.ϕ) ↣ M(A1) × ··· × M(An).

2.2.2 Truth values

A partially ordered set (poset) is a pair (Ω, ≤) with Ω a set and ≤ a binary relation on Ω which is reflexive, transitive, and anti-symmetric [8, 11]. To construct a network model's Markov network in Section 3.1 and describe inference algorithms in Section 3.2, we need only give details about semantics of closed formulae-in-context (see Section 2.1.2). A semantics M maps these to subobjects of M([]) where [] is the empty type (see Section 2.1). The only details we need about this are that

• Subobjects of M([]) form a poset (Ω, ≤)
• Elements of Ω are called truth values
• The existence of various upper and lower bounds in (Ω, ≤) determines the network models for which it is a suitable poset of truth values.

In the classical case of set-valued semantics, the set {0} is a standard choice for M([]). It has two subsets, the empty set and itself, to which we assign the descriptive names 'false' and 'true.' For directed graph semantics, the single-arrow graph is a suitable choice for M([]), in which case Ω has three truth values. See Figure 2.

Figure 2: In the case of directed graphs, truth values are the three subgraphs of the single-arrow graph [10]. They form a linearly ordered poset, false < ω < true, which is a Heyting lattice but not a Boolean algebra. [Diagram not reproduced.]

As shown in Figure 1, domain understanding and the network model impact the choice of truth values. In particular, the class of the network model determines the class of suitable posets (Ω, ≤). The reason for this is that network model classes are distinguished by the logical operations (⊤, ∧, ⊥, ∨, ⇒, ¬, etc.) that occur in their sequents and (Ω, ≤) must support limit operations that correspond to these logical operations. Table 1 gives guidelines. A poset is a meet semi-lattice, for example, if it has a top and each pair x, y of elements has a greatest lower bound. Such posets are suitable for any network model K which contains only Horn sequents. This is because the top element gives semantics of ⊤ while lower bounds compute the logical operator ∧. Due to the close correspondence, we will use the same symbol for both the logical and poset operations (e.g., ∧ denotes both 'and' and greatest lower bound). See Section 5 for an application.

  network model class       truth values class
  Horn                      meet semi-lattice
  regular                   meet semi-lattice
  coherent                  distributive lattice
  first-order               Heyting lattice
  classical first-order     Boolean algebra

Table 1: Classes of posets of truth values corresponding to classes of network models. For computational purposes, we consider only finite sets of truth values, in which case Heyting and distributive lattices coincide.

Any Boolean algebra [11] (e.g., the classical two-element one) is a suitable choice for any network model. Table 2 lists the poset operations that correspond to various logical operations. The Heyting implication, for example, is

(x ⇒ y) = sup {z | (z ∧ x) ≤ y}   (3)

where existence of the least upper bound is part of the definition of a Heyting lattice [8, 11]. The Heyting pseudo-complement is ¬x = (x ⇒ ⊥). The linearly ordered set described in Figure 2 (and, in fact, any linearly ordered set) is a Heyting lattice with

(x ⇒ y) = true if x ≤ y, and (x ⇒ y) = y otherwise.   (4)

In this truth-values poset, ¬ω = false and ¬¬ω = true, so it is not a Boolean algebra. Since ω ∨ (¬ω) = ω, the Law of Excluded Middle does not hold either. In terms of the interpretation of Section 4, ¬ω is a report that an event is not possible (hence, is false). If a data source can only report whether or not an event is possible (i.e., ω) or impossible (i.e., ¬ω), then one cannot make a definitive, positive conclusion based on data from this source.

  logical operation     operation on truth values in (Ω, ≤)
  ⊤                     top element of Ω
  ∧                     meet (greatest lower bound)
  ⊥                     bottom element of Ω
  ∨                     join (least upper bound)
  ⇒                     Heyting implication
  ¬                     Heyting pseudo-complement

Table 2: Categorical semantics of logical operations on truth values.

Let M be an assignment of truth values to all closed atomic formulae-in-context. As discussed above, the operations in Table 2 extend M to all closed formulae-in-context. Using this observation we define a closed sequent ϕ ⊢[] ψ to be satisfied if M([].ϕ) ≤ M([].ψ). That is, the truth value resulting from evaluating the left side of the sequent is no larger than that obtained from evaluating the right side.
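The poset operations of Table 2 are straightforward to compute for a finite lattice. The following Python sketch (ours, not the paper's) implements meet, join, the Heyting implication of Equation (3), and the pseudo-complement for the three-element chain of Figure 2, and reproduces ¬ω = false and ¬¬ω = true.

```python
# Sketch (ours): Table 2 operations on a finite truth-value poset, here the
# three-element chain false < omega < true of Figure 2.
ORDER = ["false", "omega", "true"]           # listed from bottom to top
def leq(x, y):  return ORDER.index(x) <= ORDER.index(y)
BOT = ORDER[0]

def meet(x, y):  # greatest lower bound (logical 'and')
    return max((z for z in ORDER if leq(z, x) and leq(z, y)), key=ORDER.index)

def join(x, y):  # least upper bound (logical 'or')
    return min((z for z in ORDER if leq(x, z) and leq(y, z)), key=ORDER.index)

def implies(x, y):  # Heyting implication, Equation (3)
    return max((z for z in ORDER if leq(meet(z, x), y)), key=ORDER.index)

def neg(x):  # Heyting pseudo-complement
    return implies(x, BOT)

print(neg("omega"), neg(neg("omega")))   # false true
print(join("omega", neg("omega")))       # omega (excluded middle fails)
print(implies("true", "omega"))          # omega (Equation (4): the 'y otherwise' case)
```

In principle the same recipe applies to any finite lattice of truth values, such as the Figure 4 poset of Section 5, once ORDER lists the elements in some bottom-to-top linear extension and leq encodes the partial order.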
3 Markov networks

A Markov network (or Markov random field) consists of a finite list X1, . . . , Xn of random variables and an undirected graph that specifies both a factorization of the joint distribution of the variables and conditional independence relations among the marginals [1]. Each graph node corresponds to a unique Xi. Graph cliques (completely connected subgraphs) give the conditional independence relations.

3.1 Markov network of a network model

A network model K together with a suitable poset Ω of truth values induces a Markov network M(K, Ω). The nodes of M are the closed atomic formulae of K. To ensure that the network is finite, we assume that K has only finitely many sorts and relation symbols and that its only function symbols are finitely many constants. The latter assumption avoids the potential for infinite lists of distinct, closed terms such as f(c), f(f(c)), etc. By adopting the unique names convention of [4], we fix the truth values of all closed atomic formulae of form (ii) (see Section 2.1.2): for constants ci and cj, the semantics of ci = cj (in the empty context) is 'true' iff i = j. In classical Markov logic, the domain closure convention of [4] converts existentially quantified formulae to disjunctions and universally quantified ones to conjunctions. The Axiom of Extensionality of set theory does not hold in all semantic categories, however. We consequently assume that no sequents of K have formulae built using rules (ix) or (x) of Section 2.1.2. We further assume that Ω is finite. This ensures that the space of truth value assignments can be equipped with the powerset σ-algebra.

The assumptions above simplify the description of M. It has a node N for each atomic formula R(t1, . . . , tn) built from a relation symbol R ↣ A1 × ··· × An and a list [t1 : A1, . . . , tn : An] of closed terms of the appropriate type. Each node is an index for a random variable taking values in Ω. M has a clique C_{σ,~t} for each sequent σ = (ϕ ⊢~x ψ) of K (with the canonical context) and each list ~t = [t1 : A1, . . . , tn : An] of closed terms having the same type as ~x. By the conventions of the previous paragraph, such lists of closed terms are lists of constants. C_{σ,~t} contains the nodes Ni formed by substituting these constants into the relation symbols that occur in ϕ and ψ. Each choice of truth values ωi for these nodes induces a truth value ω_ϕ for ϕ[~t/~x] (see Equation 1) and ω_ψ for ψ[~t/~x] by recursive application of the Table 2 operations. We define a potential function f_{σ,~t} on C_{σ,~t} by

f_{σ,~t}(~ω) = 1 if ω_ϕ ≤ ω_ψ, and 0 otherwise.   (5)

This generalizes the potential function of [4] but yields the same form of probability density. For each sequent σ = (ϕ ⊢~x ψ) of K, we introduce a weight parameter r_σ and define T(σ) to be the set of all lists ~t of constants having the same type as the context ~x. The equations

P_~r(~ω) = (1/Z) exp( Σ_σ r_σ Σ_{~t ∈ T(σ)} f_{σ,~t}(~ω) )   (6)
        = (1/Z) exp( Σ_σ r_σ n_σ(~ω) )   (7)

where the weights r_σ are model parameters, ~r is the vector of these weights, n_σ(~ω) is the count of non-zero terms from the inner sum of (6), and Z is a normalization factor, define a probability density on the set of all truth value assignments. The notation in (6) and (7) hides the fact that ~ω is projected onto the subproduct space of nodes relevant to a given C_{σ,~t}. This allows us to estimate the r_σ from a training data set and then use the same equation to make inferences on different data, since the network models, K and K′, differ only by constants. In Figure 1 we use the term Parameterized Markov Network Template to refer to (7) when the weights are unknown and the dimension of ~ω is unspecified. We use the term Markov Network Template to refer to (7) with known weights but unspecified dimension. When the weights are known and the set of constants is fixed, then (7) is a Markov Network.
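As a concrete illustration (ours, with hypothetical relation and constant names), the sketch below grounds a single Horn sequent R(x) ∧ S(x) ⊢x T(x) over a set of constants, evaluates the potential (5) for each resulting clique with the poset operations of Table 2, and forms the normalized density of (6)–(7) by enumeration.

```python
# Sketch (ours): clique potentials (5) and the density (6)-(7) for one
# hypothetical sequent  R(x) ^ S(x) |-x T(x)  over constants {a, b} and the
# three-valued chain of Figure 2.
import math
from itertools import product

ORDER = ["false", "omega", "true"]               # truth values, bottom to top
leq   = lambda x, y: ORDER.index(x) <= ORDER.index(y)
meet  = lambda x, y: min(x, y, key=ORDER.index)  # glb in a chain
CONSTANTS = ["a", "b"]
NODES = [f"{R}({c})" for R in ("R", "S", "T") for c in CONSTANTS]

def n_sigma(state):
    """Count the groundings of the sequent satisfied in `state`
    (state maps each node to a truth value)."""
    count = 0
    for c in CONSTANTS:                              # one clique per constant
        lhs = meet(state[f"R({c})"], state[f"S({c})"])   # value of R(c) ^ S(c)
        rhs = state[f"T({c})"]
        count += 1 if leq(lhs, rhs) else 0               # potential (5)
    return count

def density(r):
    """Return the normalized density (7) over all truth-value assignments."""
    states  = [dict(zip(NODES, vals)) for vals in product(ORDER, repeat=len(NODES))]
    weights = [math.exp(r * n_sigma(s)) for s in states]
    Z = sum(weights)
    return [(s, w / Z) for s, w in zip(states, weights)]

probs = density(r=1.0)
print(len(probs), sum(p for _, p in probs))  # 729 states; probabilities sum to 1 (up to rounding)
```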
3.2 Weight-learning and inference

In the previous sections of this article, we generalized the concepts and constructions of Markov logic to the case of multiple truth values. In this section we describe progress on adapting Markov logic algorithms.

3.2.1 Weight-learning

As shown in Figure 1, training data provides a set of constants (entity names) for the network model and an assignment ~ω of truth values (to closed formulae). By assuming a prior distribution on weights, interpreting (7) as the probability of ~ω given ~r, and applying Bayes' Rule, we may calculate a posterior distribution by maximizing (7) with respect to ~r. Solutions of 0 = ∇_~r ln(P_~r(~ω)) give such maxima. As in [4],

∂ ln(P_~r(~ω)) / ∂r_σ = n_σ(~ω) − n̄_σ   (8)

where the mean n̄_σ is computed with respect to P_~r. This calculation scales as |Ω|^n with n determined by the arities of the relation symbols and by the sizes of the sets of constants of the various sorts. Solution of the non-linear system 0 = ∇_~r ln(P_~r(~ω)) is expensive as well. Efficiently estimating the posterior on ~r using the pseudo-likelihood of [4] also generalizes to the non-classical case, however.

The applications in Sections 6–8 below illustrate the discriminative learning method of [4]. We assume that the formulae that will serve as evidence in test data are known a priori. This partitions the vector ~ω of truth values into disjoint sets, ~u and ~v, with ~v known and ~u to be estimated. As in [4], we may approximate the mean in (8) in this case by n_σ(~u*, ~v) where ~u* is the MAP state given ~v.

Both of the MAP algorithms, MaxWalkSAT and LazySAT, of [4] generalize to the multiple truth values case. We sketch our adaptation of MaxWalkSAT in Algorithm 1. To simplify the sketch we define

cost(~ω) = Σ_σ r_σ Σ_{~t ∈ T(σ)} (1 − f_{σ,~t}(~ω))   (9)

which is the sum of the weights of the unsatisfied closed sequents in a given Markov network state, and we define ~ω[ω′, N] to be the state that results from setting the truth value to ω′ at position N of ~ω. We define

∆(~ω; N, ω′) = cost(~ω[ω′, N]) − cost(~ω)   (10)

to be the cost change that results.

Algorithm 1 MaxWalkSAT(K, Ω, ~r, n, m, v, p, ~ω)
Estimate the most probable state ~ω of the network M(K, Ω) given weights ~r. Return 0 if cost(~ω) ≤ v is achieved within n iterations; 1 otherwise. Upon return, populate ~ω with the state estimate.
 1: for i ← 1 to n do
 2:   ~ω ← a random network state
 3:   v′ ← cost(~ω)
 4:   for j ← 1 to m do
 5:     if v′ ≤ v then
 6:       return 0
 7:     end if
 8:     σ, ~t ← a random unsatisfied closed sequent
 9:     if Uniform(0, 1) < p then
10:       N ← a random node from C_{σ,~t}
11:       ω′ ← a random truth value
12:     else
13:       N, ω′ ← arg min_{N, ω′} ∆(~ω; N, ω′)
14:     end if
15:     v′ ← v′ + ∆(~ω; N, ω′)
16:     ~ω ← ~ω[ω′, N]
17:   end for
18: end for
19: return 1
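A minimal executable rendering of Algorithm 1 is given below (our sketch, not the authors' implementation; it returns the final state rather than populating an argument in place). A ground sequent is represented by the nodes of its clique, a weight, and a predicate that reports whether a choice of truth values for those nodes satisfies it; cost and ∆ are Equations (9) and (10). The Smokes/Cancer usage at the end anticipates the Section 4 example and is purely illustrative.

```python
# Sketch (ours): a direct Python rendering of Algorithm 1 (MaxWalkSAT adapted
# to a finite poset of truth values).  `sequents` is a list of ground cliques:
# (nodes, weight, satisfied) where satisfied(state) tests omega_phi <= omega_psi.
import random

def cost(state, sequents):                      # Equation (9)
    return sum(w for nodes, w, sat in sequents if not sat(state))

def delta(state, node, value, sequents):        # Equation (10)
    flipped = dict(state, **{node: value})
    return cost(flipped, sequents) - cost(state, sequents)

def max_walk_sat(nodes, truth_values, sequents, n, m, v, p, rng=random):
    """Return (flag, state): flag is 0 if cost(state) <= v was reached."""
    best = None
    for _ in range(n):
        state = {x: rng.choice(truth_values) for x in nodes}      # line 2
        for _ in range(m):
            if cost(state, sequents) <= v:                        # lines 5-7
                return 0, state
            unsat = [s for s in sequents if not s[2](state)]
            if not unsat:                                         # nothing left to repair
                return 0, state
            clique_nodes, _, _ = rng.choice(unsat)                # line 8
            if rng.random() < p:                                  # lines 9-11
                node, value = rng.choice(clique_nodes), rng.choice(truth_values)
            else:                                                 # line 13: greedy move
                node, value = min(((x, w) for x in clique_nodes
                                   for w in truth_values),
                                  key=lambda nv: delta(state, *nv, sequents))
            state[node] = value                                   # lines 15-16
        best = state
    return 1, best

# Hypothetical use: the Smokes/Cancer model of Section 4 over the chain
# false < omega < true, searching for a zero-cost state.
TV = ["false", "omega", "true"]
leq = lambda x, y: TV.index(x) <= TV.index(y)
seqs = [(["Smokes(a)", "Cancer(a)"], 1.0,
         lambda s: leq(s["Smokes(a)"], s["Cancer(a)"]))]
print(max_walk_sat(["Smokes(a)", "Cancer(a)"], TV, seqs, n=10, m=50, v=0.0, p=0.5))
```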
3.2.2 Marginal and conditional probabilities

The marginal and conditional probabilities of M(K, Ω) can be found by direct calculation, Markov chain Monte Carlo estimation, and other methods such as MC-SAT [4]. The former two approaches readily adapt to the non-classical case but are not practical for large network models. As it is implemented in [4], MC-SAT relies on the Boolean nature of classical logic. It may be possible to modify this algorithm for applications to general posets of truth values, however.

4 A simple, non-classical social network calculation

To illustrate non-classical Markov logic, we modify the simple example on page 97 of [4]. Define K to be the network model with a single sort person, two unary relation symbols Smokes and Cancer, one sequent Smokes(x) ⊢x Cancer(x), and one constant a. Figure 3 shows the induced Markov network.

Figure 3: Markov network constructed from a simple social network model. [Diagram: the two nodes Smokes(a) and Cancer(a) joined by an edge.]

The poset (Ω, ≤) described in Figure 2 and having three truth values, false < ω < true, is suitable for K. The intermediate truth value ω is interpreted as 'possibly.' A network state is a pair x = (x_Smokes(a), x_Cancer(a)) of Ω values. There are nine such states. (7) gives the probability density on this space and has

n(x) = 1 if x_Smokes(a) ≤ x_Cancer(a), and 0 otherwise.   (11)

Z = 3(1 + 2e^r) since n(true, false) = n(true, ω) = n(ω, false) = 0 and n(x) = 1 for the other six states. For example, P_r(true, ω) = 1/Z. We may compute the marginal P_r(x_Smokes(a) = ω) = 1/3 and then calculate the conditional probabilities

P_r(x_Cancer(a) = τ | x_Smokes(a) = ω) = 1/(1 + 2e^r) if τ = false, and e^r/(1 + 2e^r) otherwise.   (12)

As r → ∞, the τ = false case converges to 0 while the other two conditionals converge to 1/2. To interpret these calculations, assume the sequent has high weight and that data indicates a possibility that a smokes. In this case, a conclusive observation that a is cancer-free has low probability. Positive cancer reports that are either conclusive or that indicate the possibility of the disease have probability close to 1/2.
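The numbers quoted above are easy to reproduce by brute-force enumeration. The short Python sketch below (ours) recomputes Z = 3(1 + 2e^r) and the conditional probabilities (12) for a sample weight.

```python
# Sketch (ours): brute-force check of the Section 4 example.  Enumerate the
# nine states over the chain false < omega < true and recover Z = 3(1 + 2 e^r)
# and the conditionals (12).
import math
from itertools import product

TV  = ["false", "omega", "true"]
leq = lambda x, y: TV.index(x) <= TV.index(y)
r = 2.0                                              # any sample weight

states  = list(product(TV, TV))                      # (Smokes(a), Cancer(a))
weights = {s: math.exp(r * (1 if leq(s[0], s[1]) else 0)) for s in states}
Z = sum(weights.values())
print(Z, 3 * (1 + 2 * math.exp(r)))                  # both ~47.33 for r = 2

# Conditional Pr(Cancer(a) = tau | Smokes(a) = omega), Equation (12)
denom = sum(w for s, w in weights.items() if s[0] == "omega")
for tau in TV:
    print(tau, weights[("omega", tau)] / denom)
# false -> 1/(1 + 2 e^r) ~ 0.063;  omega and true -> e^r/(1 + 2 e^r) ~ 0.468
```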
5 Geopolitics

Given a sequent ϕ ⊢~x ψ, we may use classical logic to infer ⊤ ⊢~x (ϕ ⇒ ψ), then simply assert that the formula ϕ ⇒ ψ is true. These steps dispatch sequents in favor of implication formulae. In general, however, the logic of posets of truth values is not classical. Figure 4 gives an example and interpretation. This poset is suitable for Horn or regular network models (see Section 2.1.3) but not for those in which either of the operators ∨ or ⇒ occurs in a sequent. The reason for this is that the inference rules for ⇒ force (Ω, ≤) to be a Heyting lattice (see [8, 15] or Table 2) but such posets satisfy the distributive law:

x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z).   (13)

Substituting x = γ_*, y = ω, and z = γ^* into (13), however, yields γ_* on the left side but γ^* on the right.

Figure 4: Poset (Ω, ≤) of truth values that represents intelligence reports from two sources, A and B. ω is a report of 'possibly' from A. γ_* and γ^* are reports of 'plausibly' and 'likely' from B. ⊤ and ⊥ are conclusive positive and negative data from either source. The conditions ω ∧ γ_* = ⊥ = ω ∧ γ^* model the fact that action is to be taken on the first report received. [Hasse diagram not reproduced: ⊥ at the bottom, ⊤ at the top, with the chain γ_* < γ^* on one side and ω on the other.]

Consider a network model with a single sort, Country, a single constant a, and three unary relation symbols, Resources, Hostile, and Weapon. The sequent

Resources(x) ∧ Hostile(x) ⊢x Weapon(x)   (14)

models the assertion that a hostile nation with sufficient financial resources will develop a particular weapon system. We assume the poset of truth values shown in Figure 4. A network state is a list of three Ω values x = (x_R(a), x_H(a), x_W(a)) where we have abbreviated the names of the relation symbols. The density (7) has Z = 32 + 93e^r and we may calculate the conditionals

P_r(x_W(a) = τ | x_H(a) = ⊤ and x_R(a) = γ_*) = e^r/(2 + 3e^r) if τ ∈ {⊤, γ^*, γ_*}, and 1/(2 + 3e^r) otherwise.   (15)

As r → ∞, the conditional probability converges to 1/3 if γ_* ≤ τ and to 0 otherwise. In particular, the τ = ω case has limit 0 since the data x_R(a) = γ_* indicates that the report is from intelligence source B.
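The Section 5 quantities can also be checked by enumeration. The Python sketch below (ours) encodes the Figure 4 poset with the renamed elements bot, omega, g_lo (for γ_*), g_hi (for γ^*), and top, and recovers Z = 32 + 93e^r and the conditional (15).

```python
# Sketch (ours): brute-force check of the Section 5 example over the
# five-element poset of Figure 4 (bot < omega < top; bot < g_lo < g_hi < top;
# omega incomparable with g_lo and g_hi).
import math
from itertools import product

UP = {  # up-sets: y is in UP[x] iff x <= y
    "bot":   {"bot", "omega", "g_lo", "g_hi", "top"},
    "omega": {"omega", "top"},
    "g_lo":  {"g_lo", "g_hi", "top"},
    "g_hi":  {"g_hi", "top"},
    "top":   {"top"},
}
TV  = list(UP)
leq = lambda x, y: y in UP[x]
def meet(x, y):                    # greatest common lower bound
    lower = [z for z in TV if leq(z, x) and leq(z, y)]
    return max(lower, key=lambda z: len([w for w in lower if leq(w, z)]))

r = 1.5
states = list(product(TV, repeat=3))           # (Resources, Hostile, Weapon)
n      = lambda s: 1 if leq(meet(s[0], s[1]), s[2]) else 0
Z      = sum(math.exp(r * n(s)) for s in states)
print(Z, 32 + 93 * math.exp(r))                # the two values agree, as claimed

# Conditional (15): Pr(Weapon = tau | Hostile = top, Resources = g_lo)
denom = sum(math.exp(r * n(s)) for s in states if s[0] == "g_lo" and s[1] == "top")
for tau in TV:
    p = math.exp(r * n(("g_lo", "top", tau))) / denom
    print(tau, round(p, 3))    # e^r/(2+3e^r) for g_lo, g_hi, top; 1/(2+3e^r) otherwise
```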
6 Insurgency network analysis

Understanding a social network may involve analysis of different links between and properties of individuals. We define a model to infer the probability of entities having an anti-U.S. sentiment based on familial and tribal bonds. The signature has a single sort, person. It has two unary relation symbols, TribeMember and AntiUS, and five binary ones: Mother, Father, Spouse, Son, and Daughter. With the exception of Spouse, the binary relations are intended to be directional with R(x, y) interpreted as the assertion that x has relationship R to y. Father(x, y), for example, is read 'x is the father of y'. We use discriminative learning (see Section 3.2.1) to make inferences about AntiUS.

The network model has the following sequents, which may express statistical tendencies in real data. (16), for example, asserts that tribal members tend to have anti-U.S. sentiment. (17) says that tribal members tend to intermarry. (22)–(27) express the tendency of anti-U.S. views to be transferred across familial links. (24) and (25), for example, respectively model the transfers from son to parent and from parent to son.

TribeMember(x) ⊢x AntiUS(x)   (16)
Spouse(x, y) ⊢x,y (TribeMember(x) ⇔ TribeMember(y))   (17)
Son(x, y) ∧ Spouse(y, z) ⊢x,y,z Son(x, z)   (18)
Daughter(x, y) ∧ Spouse(y, z) ⊢x,y,z Daughter(x, z)   (19)
TribeMember(x) ∧ Father(x, y) ⊢x,y TribeMember(y)   (20)
TribeMember(x) ∧ Mother(x, y) ⊢x,y TribeMember(y)   (21)
AntiUS(x) ∧ Father(x, y) ⊢x,y AntiUS(y)   (22)
AntiUS(x) ∧ Mother(x, y) ⊢x,y AntiUS(y)   (23)
AntiUS(x) ∧ Son(x, y) ⊢x,y AntiUS(y)   (24)
AntiUS(y) ∧ Son(x, y) ⊢x,y AntiUS(x)   (25)
AntiUS(x) ∧ Daughter(x, y) ⊢x,y AntiUS(y)   (26)
AntiUS(y) ∧ Daughter(x, y) ⊢x,y AntiUS(x)   (27)

By including a large number of sequents, we ensure that weight learning will be responsive to data subtleties. The apparent redundancies in (22)–(27), for example, allow the model to distinguish between male and female entities who are descendants but not parents.

Figure 5 shows the network that was used to train a Markov network model having probability density given by (7). This training data represents observations about part of a network of interest or part of a different human network that is assumed to have characteristics similar to the one of interest. We chose Ω = {false, true} and used the Alchemy system [9].

Figure 5: Training data for an insurgency network model. Entities are people. Nodes labeled M and F are male and female, respectively. Circled nodes have anti-U.S. sentiment. Boxed nodes are members of a tribe of interest. Undirected arrows are spousal relationships. Directed arrows show son or daughter relationships. [Node-link diagram not reproduced.]

The resulting Markov network template (see Figure 1) was used to infer probabilities of anti-U.S. sentiment for entities in the test data set shown in Figure 6. The test data represents the unobserved or partially observed part of the human network. Familial links and tribal associations were assumed known in this example. The test results reflect the fact that 1/3 of the entities in the training data were anti-U.S. Note that the chain of three male descendants shows a strengthening of anti-U.S. views down generations of tribal members.

Figure 6: Test and inference results for an insurgency network. As in Figure 5, M and F indicate male and female nodes. Directed and undirected links and boxed nodes are also interpreted as in the previous figure. The number associated with each node is the inferred probability that the given entity has an anti-U.S. sentiment. [Node-link diagram not reproduced; the inferred probabilities range from 0.12 to 0.81.]

7 Alias detection

In some network analysis applications it is useful to distinguish between entities and nodes. The latter are to be construed as ports through which the former communicate. Nodes, for example, may be telephone numbers or IP addresses while entities are humans. Alternatively, nodes may be IP addresses in an ad hoc network context in which entities are MAC addresses [3]. The goal of alias detection is to determine when a single entity is employing more than one node.

7.1 Direct link model

To formulate a primitive alias model, we observe that if nodes m and n are directly linked to the same other nodes, then m and n are alias candidates. Let the network model K have a single sort, Node, two binary relation symbols LinkedTo ↣ Node × Node and Alias ↣ Node × Node, and the following sequents.

LinkedTo(a, x) ∧ LinkedTo(b, y) ∧ Alias(a, b) ⊢a,b,x,y Alias(x, y)   (28)
Alias(x, y) ∧ Alias(y, z) ⊢x,y,z Alias(x, z)   (29)
Alias(x, y) ⊢x,y Alias(y, x)   (30)
Alias(x, y) ∧ LinkedTo(x, a) ⊢a,x,y LinkedTo(y, a)   (31)

Figure 7 shows training data used to learn the Markov network weights. The left diagram is the LinkedTo data while the right diagram is the Alias data. We use discriminative learning (Section 3.2.1) to make inferences about Alias. The training data includes two nodes, 0 and 5, that are known to be aliases for the same entity. Both are directly linked to the same other nodes. We assumed Ω = {false, true} and used Alchemy [9].

Figure 7: Alias detection training data. The diagram on the left shows the links between nodes. The diagram on the right indicates that nodes 0 and 5 are aliases. [Diagrams over nodes 0–5 not reproduced.]

The test data appears in Figure 8.
An appealing feature of Markov logic is that it performs simultaneous alias detection on all pairs of nodes in this dataset. The conclusion that two particular nodes are aliases for a single entity changes the relationships among nodes linked to either of the aliases. In Figure 8, nodes 1 and 4 are potential aliases because they are linked to the same nodes. This identification makes 2 and 3 look similar, although 3 is connected to 0 while 2 is not. 7 and 5 look alike because there is only one neighbor that they do not share. This identification, however, gives weak evidence that 7 and 3 are aliases since both link to 0 while 3 links to one other entity (if we identify 1 and 4).

Figure 8: Test data and alias detection results. The left diagram shows link data in which we seek to detect aliases. In the right diagram, an edge is drawn between two nodes if there is evidence that they correspond to the same entity. The width of the edge indicates the strength of the evidence. [Diagrams over nodes 0–7 not reproduced.]

7.2 Local metric models

A class of more expressive alias detection models results from accounting for one or more local metrics such as degree centrality or proximity prestige. (32) is a template for adding such features to the network model.

Metric(x, v) ∧ Metric(y, v) ⊢x,y,v Alias(x, y)   (32)

We may simultaneously account for different metrics by including a list of sequents of this form. Each would involve a relation symbol Metric_k for one of the metrics of interest.

To account for distributions of metric values, a distinct weight may be learned for each value that occurs in the data. (33) expresses this enhancement using the + notation of [9].

Metric(x, +v) ∧ Metric(y, +v) ⊢x,y,+v Alias(x, y)   (33)

We anticipate that, in applications of such models, keeping the computations of reasonable complexity and maintaining a low false-alarm rate will involve a balance among model expressiveness, training data quality, test data size, and choice of metrics. To use the enhancement (33), one must, moreover, learn or assign weights to all values v that may occur in the test data. If the set of possible values is known and finite (such as the set of shipping ports in Section 8), this is straightforward.
8 Transaction pattern analysis

We analyze container shipping transaction data to find entities that have similar shipping patterns and ones with patterns resembling those of known threats. We define a network model K that has five sorts: Company, CargoCode, Port, City, and Country. The model has a unary relation symbol Threat ↣ Company and five binary relation symbols Similar ↣ Company × Company, Ships ↣ Company × CargoCode, UsesPort ↣ Company × Port, City ↣ Company × City, and Country ↣ Company × Country. Ships(x, a) and UsesPort(x, p) respectively indicate that company x ships commodity a and uses port p. The City and Country relations are company address fields. We use discriminative learning (Section 3.2.1) to make inferences about the property Threat and the link Similar.

The sequents of the network model are listed in (34)–(41). These capture statistical tendencies in training and test data but are unlikely to be satisfied by all data elements. (34)–(37) and (40)–(41) characterize what it means for shipping patterns to be similar. Companies are similar if they ship the same commodities, use the same ports, and have the same country and city of origin. Similarity is symmetric and transitive. (38) and (39) characterize the notion of a threat. A company is a threat if its shipping pattern is similar to that of a likely threat. We chose Ω = {false, true} and used the Alchemy system [9]. The modifier + in (35)–(37) is an Alchemy feature that forces a different weight to be learned for each port, city, and country that occurs in the data. The fact that two companies both use a major port, such as Rotterdam, may reflect a different signature than use of a small, remote port.

Ships(x, a) ∧ Ships(y, a) ⊢x,y,a Similar(x, y)   (34)
UsesPort(x, +p) ∧ UsesPort(y, +p) ⊢x,y,+p Similar(x, y)   (35)
City(x, +c) ∧ City(y, +c) ⊢x,y,c Similar(x, y)   (36)
Country(x, +c) ∧ Country(y, +c) ⊢x,y,c Similar(x, y)   (37)
Similar(x, y) ∧ Threat(x) ⊢x,y Threat(y)   (38)
Similar(x, y) ∧ Threat(y) ⊢x,y Threat(x)   (39)
Similar(x, y) ⊢x,y Similar(y, x)   (40)
Similar(x, y) ∧ Similar(y, z) ⊢x,y,z Similar(x, z)   (41)

Figure 9 shows test results on a small data set. Nodes 0–8 are known threats. Note that the model identifies nodes 9 and 29 as probable threats having transaction patterns similar to those of known threats. Analysis of the results reveals that the algorithm finds instances in which a single company has more than one database key, associates a manufacturer and a favored shipper, and detects cases in which two (or more) distinct companies have similar shipping patterns.

Figure 9: Entities with similar shipping patterns and with patterns like those of known threats. Each node corresponds to a company. Node gray scales indicate threat probabilities based on the model and training data. Darker nodes have higher probability. 0–8 were training data threats. Links indicate similar shipping patterns. Thicker links indicate greater similarity. [Node-link diagram over companies 0–49 not reproduced.]

9 Conclusions

We have generalized the concepts and constructions of Markov logic to allow (closed) formulae to take values in any partially ordered set (poset) of truth values that is suitable for the network model of interest. The Boolean poset, false < true, of classical logic is suitable for any network model but, by utilizing more general posets, the new methods directly support uncertainties in both data sources and values. We have taken steps toward adapting Markov logic algorithms to the non-classical case. Adapting the MC-SAT inference algorithm of [4] remains an open problem, however.
Analysis of the computational complexity of Markov logic techniques, and the degrees to which they may be parallelized, are needed. We applied Markov logic to a variety of scenarios including some arising from real-world problems.

Acknowledgment

This work was partially supported by Air Force Office of Scientific Research grant FA9550-08-1-0411.

References

[1] C. M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag, 2006.
[2] S. Bringsjord, personal communication.
[3] S. Cheshire, B. Aboba, and E. Guttman, Dynamic Configuration of IPv4 Link-Local Addresses, IETF RFC 3927, 2005.
[4] P. Domingos, S. Kok, D. Lowd, H. Poon, M. Richardson, and P. Singla, "Markov logic," in Probabilistic Inductive Logic Programming, L. De Raedt, P. Frasconi, K. Kersting, and S. Muggleton, Eds., Springer-Verlag, 2008.
[5] G. Godfrey, J. Cunningham, and T. Tran, "A Bayesian, nonlinear particle filtering approach for tracking the state of terrorist operations," Intelligence and Security Informatics, IEEE, 23–24 May 2007, pp. 350–355.
[6] J. Y. Halpern, Reasoning About Uncertainty, MIT Press, 2003.
[7] M. Jackson, A Sheaf Theoretic Approach to Measure Theory, Ph.D. dissertation, University of Pittsburgh, 2006.
[8] P. Johnstone, Sketches of an Elephant: A Topos Theory Compendium, Oxford University Press, 2002.
[9] S. Kok, P. Singla, M. Richardson, and P. Domingos, The Alchemy System for Statistical Relational AI, Technical report, Department of Computer Science and Engineering, University of Washington, Seattle, www.cs.washington.edu/ai/alchemy.
[10] F. W. Lawvere and S. Schanuel, Conceptual Mathematics, 2nd Ed., Cambridge University Press, 2009.
[11] S. Mac Lane and G. Birkhoff, Algebra, 3rd Ed., Chelsea Publishing Company, 1993.
[12] T. M. Mitchell, Machine Learning, McGraw-Hill, 1997.
[13] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 2nd Ed., Prentice-Hall, 2002.
[14] D. Scott and P. Krauss, "Assigning probabilities to logical formulas," in Aspects of Inductive Logic, J. Hintikka and P. Suppes, Eds., North-Holland, 1966.
[15] R. L. Wojtowicz, Categorical Logic as a Foundation for Reasoning Under Uncertainty, SBIR Final Report, Metron, Inc., 2005.
[16] R. L. Wojtowicz, "On transformations between belief spaces," in Soft Methods for Handling Variability and Imprecision, D. Dubois, H. Prade, et al., Eds., Springer-Verlag, pp. 313–320, 2008.