An Attractor Neural Network Model of Semantic
Fact Retrieval.
E. Ruppin
Department of Computer Science
School of Mathematical Sciences
Sackler Faculty of Exact Sciences
Tel Aviv University
69978, Tel Aviv, Israel
M. Usher
Department of Physics
School of Physics and Astronomy
Sackler Faculty of Exact Sciences
Tel Aviv University
69978, Tel Aviv, Israel.
October 23, 1996
Abstract
This paper presents an attractor neural network model of semantic fact retrieval, based on Collins and Quillian's original semantic network models. In
the context of modeling a semantic network, a distinction is made between associations linking together objects belonging to hierarchically related semantic
classes, and associations linking together objects and their attributes. Using
a distributed representation leads to some generalization properties that have
a computational advantage. Simulations demonstrate that it is feasible to obtain reasonable response performance for various semantic queries,
and that the temporal pattern of retrieval times obtained in the simulations is
consistent with psychological experimental data. It is therefore shown that
attractor neural networks can be successfully used to model higher-level cognitive phenomena than standard content-addressable pattern recognition.
1 Introduction
This paper presents an attractor neural network (ANN) model of semantic fact retrieval. The static structure of the model represents a Semantic Network (SN), and its
dynamics a Spreading Activation process. Both of these static and dynamic constituents
are of paramount importance in artificial intelligence and cognitive science for the
understanding and modeling of human memory retrieval processes. We shall therefore
demonstrate that attractor neural networks can be successfully used to model higher-level cognitive phenomena than standard content-addressable pattern recognition.
Attractor neural networks are part of the more general Connectionist framework
[1], which has been developed in recent years in parallel to the classic artificial intelligence mainstream of symbolic processing. While in the latter information is stored
at specific memory addresses and processed by explicit rules, in the Connectionist
framework no such distinction between the information and the processing algorithm
exists [2].
The implementation of symbolic, graph-based structures such as semantic nets
using connectionist architectures is straightforward if a local representation is utilized; i.e., every node (representing a concept) in the SN is represented uniquely by
a single processing unit (a neuron) in the NN. Accordingly, every edge in the SN
graph (representing an association) is represented by a connection (a synapse)
in the NN, and the amount of activation of the various nodes is represented by using
continuous-valued neurons. Indeed, models constructed along these lines have already
been presented [3, 4, 5, 6]. However, as noted by Ritter and Kohonen in regard to
such one-to-one semantic modeling [7], "In view of the contemporary neurophysiological data, such a degree of specificity and spatial resolution is highly improbable in
biology". Hinton [8] has claimed that an alternative approach, based on distributed
representations, is more promising. According to this approach, the semantic concepts
are represented by groups (patterns) of neurons, such that each concept is distributed
over many neurons and each neuron participates in representing many different concepts. Following Hinton's original conceptual framework, semantically related objects
receive similar representations. A thorough discussion of the merits of a distributed representation vs. a local one was presented by Hinton, McClelland and Rumelhart [1].
The specific advantages obtained by using distributed representations for our goals
will be further described in the discussion.
In the context of modeling a semantic network, a distinction is made between associations linking together objects belonging to hierarchically related semantic classes,
and associations linking together objects and their attributes. This distinction is made because
we assume that hierarchically related objects receive closely related representations
with respect to a certain distance measure, while no such encoding similarity is reasonable between objects and their corresponding attributes. An association of the
first kind is `zebra is-a mammal', while an association of the latter kind is `zebra has
stripes'. A unique characteristic of a distributed representation is that if we assume
that the encoding is a topology-conforming mapping, i.e., closely related concepts receive closely related representations, then associational linkage between such related
concepts emerges implicitly from their "geometric" proximity in the encoding space.
In contrast, the second kind of association must be explicitly implemented in the
synaptic connections. This distinction is realized by constructing the model from two
interacting subnetworks, one storing the objects and the other the attributes. While
the semantic proximity between objects is reflected in the degree of similarity of their
corresponding encodings, the associations between objects and attributes are realized
by projections between patterns belonging to different subnetworks.
Out of the various existing kinds of distributed connectionist models, we have selected the ANN paradigm. According to this paradigm, cognitive events are realized
as sustained activity states of a neural net, which are attractors of the neural dynamics. These attractors are distributed activity patterns of the whole neural net, and
learning is accomplished by modifying the synaptic connections' strengths via a
Hebbian-like mechanism. ANNs perform content-addressable memory retrieval and
can be used for pattern recognition, since they have the ability to perform error
correction. For example, an ANN has been used to model the retrieval of information from short-term memory, as demonstrated in high-speed scanning experiments of the Sternberg
type [9]. The advantage of using ANNs for such modeling results from the fact that
the error correction is amplified at every processing phase which is based on convergence towards an attractor. In addition, biological data supporting the hypothesis
that memory activation is achieved through sustained neural activity have been found
[10].
Our goal is to account for semantic fact retrieval using the ANN paradigm. The
model is conceived so that response performance and retrieval times consistent with
Collins and Quillian's experimental data can be obtained. As a consequence, several
constraints are imposed upon the model's construction. The recognition and retrieval of
facts (a relation composed of several concepts) would be straightforward if they were
stored as attractors in the network. Nevertheless, when modeling fact retrieval, in
light of economy and language-productivity considerations [11], one cannot store every
fact as an attractor on its own, and therefore another strategy must be pursued:
facts will be represented as spatial-temporal combinations of their constituents, which
themselves will continue to be represented as attractors. Subsequently, the retrieval
of facts is described by a dynamical process performed on this attractor space.
Collins and Quillian [12] originally modeled two fundamental types of queries:
1. An Object-Attribute (OA) query; i.e., asking whether an item (object) A has
a property (attribute) X (e.g., does a canary sing?).
2. An Object-Object (OO) query; i.e., asking whether an item A belongs to the
category (superset) represented by item B (e.g., is the canary a bird?).
In addition to modeling these queries, which are characterized by an upward motion in the semantic tree along the path leading to the
root, we have been able to demonstrate the modeling of other plausible types
of queries (handled previously in Fahlman's NETL system [13]):
3. The Property intersection query; i.e., what item has the properties X, Y, and Z
(e.g., who sings, flies, and has feathers?).
4. The Reverse Category query; i.e., what type (sub-category) of item A has property X (e.g., what kind of bird sings?).
The network's dynamics should realize a process by which semantic queries are
answered. The queries being modeled are composed of either an item (object) and
an attribute, two objects, or two attributes. It is assumed that upon the presentation of
a query, a preprocessing stage exists by which the two components of the presented
query are distinguished and input into the corresponding subnetwork (items or
attributes) by means of external fields. It has already been shown that ANNs can
maintain their retrieval capabilities even if the input is applied as an external field
[14]. It is assumed that a necessary condition for obtaining a positive response is that
a stable state with high similarity to the second input component is achieved. If such
a degree of similarity is not achieved during a certain fixed time, or the network has
converged into a stable state with a low degree of similarity to the second input,
then a negative response occurs.
The remainder of this paper is divided into four sections. In section 2 we briefly review the
Semantic Network and Spreading Activation concepts. In section 3 we outline the
architecture of the model presented and describe its dynamical behavior. In section 4
the results of computer simulations of the model are described, and in section 5 some
computational characteristics of the model are analyzed and the model's performance
is discussed in comparison with the existing psychological experimental data.
2 Preliminaries.
In this section we briefly discuss the static and dynamic features upon which the model
is based. The model's static structure is a semantic network, originated by Quillian
[15, 16, 17] as a mode of knowledge representation. Its primary motivation was the
Economy principle, stating that items are stored without redundancy, as described
further below. Information is represented by a set of nodes, representing concepts,
connected to each other by a set of edges (pointers), which represent relationships
among the concepts represented. Different models of semantic nets have later been
used to represent a variety of kinds of knowledge [18, 19, 20].
The semantic network described by Collins and Quillian [12] is a directed tree-linked structure (see fig. 1) in which the nodes stand for concepts designating classes
of objects, and the edges between them represent a special kind of relationship (designated an AKO (`a kind of') relationship in the AI literature), denoting a category (or
superset) relation. For example, such a directed edge pointing from node A to node B
designates that the concept represented by A (e.g., a dog) belongs to the category
represented by concept B (e.g., the mammals). In addition, every node has another
type of directed edge, pointing to some properties represented by a set of nodes external to the tree structure. The existence of an edge from an object node to a
property node denotes that the object has that property (e.g., the dog barks).
[Figure 1: the semantic tree; circles denote object (item) nodes and squares denote attribute nodes.]
A chief characteristic of such a semantic net is that if there is a property X which
is common to some of the concepts (items) defined in the tree, it can be stored at
the highest possible concept for which all the concepts in the sub-tree spanned by it
still have this property X in common. Thus, a large amount of memory space can be
saved, once a dynamic process is implemented that is characterized by motion along the path
leading from a node through its ancestors. In the model presented, such a dynamic
process has been implemented using spreading activation.
The theory of Spreading Activation [12, 21] has a major role in modeling the
various phenomena gathered in psychological experiments concerning the dynamics
of semantic processing. It is based on the hypothesis that the cognitive activity
evoked when a memory is queried about one of its stored facts can be characterized
by a `spreading out' process. In this respect, by `facts' we mean statements about
stored items or the relations between them (e.g., `the dog barks'). In this process,
it is hypothesized that the activation originates at the nodes representing the concepts
queried about (`dog', `barks', denoted the `initiating nodes') and spreads in
parallel to all the nodes that are connected to one of the initiating nodes. If several
initiating nodes are connected to another node, then the latter can be considered
an `intersection' node, receiving a high level of activation (above a certain threshold)
from all the initiating nodes.
Since our model is based on the ANN paradigm, the spreading of activation occurs
in the concepts' space and not in the neurons' space. The activity of a concept is defined
as the overlap between the pattern representing it and the current network state.
Therefore, although in an ANN only one attractor can be fully activated at once,
a restricted notion of spreading activation can nevertheless be manifested; when
moving from one attractor to another, an intermediate stage arises, consisting of mixtures
of several (weakly activated) patterns. This intermediate `spreading' (divergence)
of activity finally converges into an attractor. It will further be shown that in some
cases this convergence may be viewed as expressing the intersection phenomenon. In
addition, since our model is based on two separate ANN subnetworks, an object and
an attribute may be fully activated at the same moment.
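The overlap measure just described can be sketched numerically. A minimal illustration follows; the pattern size and the corruption level are arbitrary choices for the sketch, not values from the paper's simulations:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

# A stored concept pattern (+/-1 valued) and a partially corrupted network state
xi = rng.choice([-1, 1], size=N)
state = xi.copy()
flip_idx = rng.choice(N, size=100, replace=False)
state[flip_idx] *= -1                 # corrupt 10% of the bits

# The "activity" of the concept is its overlap with the current network state;
# m = 1 means the concept's attractor is fully activated, m ~ 0 means inactive
m = np.dot(xi, state) / N             # here m = (1000 - 2*100)/1000 = 0.8
```

With this definition, several concepts can have intermediate activities simultaneously, which is what allows the restricted spreading of activation described above.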
3 Description of the Model.
An ANN is an assembly of formal neurons connected by synapses. The state of each neuron
is a binary variable S_i, taking the values ±1, which denote the firing and resting states, respectively. The network's state is a vector specifying the binary values of all neurons
at a given moment. Each neuron receives inputs from all the other neurons to which it is
connected, and fires only if the sum of its inputs is above its threshold. This process
may include a stochastic component (noise), which is analogous to the temperature T
in statistical mechanics. When a neuron fires, its output, weighted by the synaptic
strengths, is communicated to the other neurons and, as a consequence, the network's state
evolves. Mathematically this is described as follows; in the deterministic case
S_i(t + 1) = sgn(h_i(t))    (1)

and for the stochastic case, with temperature T, by

S_i(t + 1) = +1 with prob. (1/2)(1 + tanh(h_i/T)), and −1 with prob. (1/2)(1 − tanh(h_i/T))    (2)

where S_i is the state of neuron i and h_i is the local field (the postsynaptic potential)
of neuron i, which is given by

h_i = Σ_{j≠i} J_ij S_j    (3)

where J_ij is the ij element of the connection matrix.
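As a sketch, the update rules (1)-(3) can be written directly. The single-pattern Hebbian matrix used in the toy check below is only an illustration of the dynamics, not the paper's connectivity:

```python
import numpy as np

def update_deterministic(J, S):
    # Eq. (1): S_i(t+1) = sgn(h_i(t)); a zero field is resolved to +1
    h = J @ S                          # eq. (3); J has zero diagonal, so j != i
    return np.where(h >= 0, 1, -1)

def update_stochastic(J, S, T, rng):
    # Eq. (2): S_i(t+1) = +1 with prob. (1/2)(1 + tanh(h_i/T)), else -1
    h = J @ S
    p_plus = 0.5 * (1.0 + np.tanh(h / T))
    return np.where(rng.random(S.size) < p_plus, 1, -1)

# Toy check: a single stored pattern is recovered from a corrupted initial state
rng = np.random.default_rng(1)
N = 200
xi = rng.choice([-1, 1], size=N)
J = np.outer(xi, xi) / N               # one-pattern Hebbian matrix
np.fill_diagonal(J, 0.0)
S = xi.copy()
S[:20] *= -1                           # 10% of the bits corrupted
S = update_deterministic(J, S)         # one step restores the pattern
```

This error-correcting convergence toward an attractor is the content-addressable retrieval property discussed in the introduction.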
The network's trajectories are therefore determined by the matrix describing the
synaptic weights, which reflects prior learning. For specific synaptic weight matrices,
the emergent motion is characterized by convergence to stable pattern configurations
(representing the memorized concepts), which are attractors of the dynamics [22].
Various versions of ANNs whose attractors form a hierarchical structure have been
developed [23, 24]: the hierarchical structure of the stored memory patterns reflects
their proximity in the pattern space according to some distance measure.
The model we propose for fact retrieval from a semantic memory is composed of
two ANN subnetworks of N neurons each (in our simulations we have chosen N = 1000).
The first subnetwork, denoted the items subnetwork, stores a semantic tree of items,
illustrated as circles in fig. 1. The second subnetwork, denoted the attributes subnetwork,
stores the items' properties, depicted as squares in fig. 1.
3.1 The items subnetwork
The stored items are organized in a three-level tree. The top level consists of two
patterns representing the most general items in our semantic set, denoted by ξ^μ,
where μ = 1, 2. Each of these two patterns is generated by independently selecting the value of
every neuron in the network to be +1 with probability q1 and −1 with probability 1 − q1.
The average activity ⟨ξ^μ⟩ for these patterns is therefore a1 = 2q1 − 1.
From each of these patterns, two descendants, denoted by ξ^{μν} (μ, ν = 1, 2), are
generated by probabilistically flipping each bit (neuron) of the ancestor
pattern ξ^μ with probability 1 − q2. Formally stated, this is performed in the following way:

ξ_i^{μν} = ξ_i^μ η_i^{μν}    (4)

where η_i^{μν} is subject to the probability distribution

P(η_i^{μν}) = q2 δ(η_i^{μν} − 1) + (1 − q2) δ(η_i^{μν} + 1)    (5)

where δ(x) is the Dirac delta function, which is zero for x ≠ 0 and whose integral equals
one.
q2 has to be greater than 0.5 so that a positive correlation will exist between the
ancestor and its descendant patterns. In the same fashion, a third level of patterns,
denoted ξ^{μνρ} (μ, ν, ρ = 1, 2), is generated by probabilistically flipping some of the
ancestor's bits, with a similarity parameter q3 also greater than 0.5:

ξ_i^{μνρ} = ξ_i^{μν} η_i^{μνρ}    (6)

where η_i^{μνρ} is subject to the probability distribution

P(η_i^{μνρ}) = q3 δ(η_i^{μνρ} − 1) + (1 − q3) δ(η_i^{μνρ} + 1)    (7)

Subsequently, each pattern belonging to the first level is in correlation a2 = 2q2 − 1
with its direct descendants, and likewise for the next generation with correlation
a3 = 2q3 − 1. It is straightforward that the correlation between a pattern in the
first level and its descendants in the third level is a2·a3, and the correlation between
two patterns belonging to the same ancestor is a2² in the second level and a3² in
the third one. Since we have chosen q2 = q3 = 0.75, we obtain an equal correlation
of 0.5 between each ancestor and descendant. The correlation between two patterns
at the same level, with a common direct ancestor, is 0.25. Observe that in this
structure closely related items obtain similar encodings (where the degree of similarity
is measured by the Hamming distance), which reflects their geometric proximity in the
2^N pattern space.
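The generation procedure of eqs. (4)-(7) and the resulting correlations can be sketched as follows. Taking q1 = 0.5 (unbiased top-level patterns) and an enlarged N are illustrative assumptions here, made only to keep the sampling noise in the overlaps small:

```python
import numpy as np

rng = np.random.default_rng(2)
N, q2, q3 = 2000, 0.75, 0.75

def descend(parent, q):
    # Eqs. (4)-(7): multiply by eta_i = +1 with prob. q and -1 with prob. 1-q,
    # i.e., flip each bit of the ancestor independently with probability 1-q
    eta = np.where(rng.random(parent.size) < q, 1, -1)
    return parent * eta

top = rng.choice([-1, 1], size=N)   # a top-level pattern (q1 = 0.5 assumed)
mid = descend(top, q2)              # second-level descendant
leaf = descend(mid, q3)             # third-level descendant
sib = descend(mid, q3)              # third-level sibling (same direct ancestor)

def overlap(a, b):
    return np.dot(a, b) / N

# Expected values: a2 = 2*q2 - 1 = 0.5 (parent-child), a2*a3 = 0.25
# (grandparent-grandchild), a3**2 = 0.25 (siblings, common direct ancestor)
m_parent = overlap(top, mid)
m_grand = overlap(top, leaf)
m_sib = overlap(leaf, sib)
```

The measured overlaps should match the analytic correlations up to O(1/sqrt(N)) sampling noise.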
The connectivity matrix which embeds this hierarchical structure of attractors is
constructed according to Feigelman and Ioffe [23]:

J_ij = (1/N) [ Σ_{μ=1,2} ξ_i^μ (ξ_j^μ − a1)
  + (1/(1 − a2²)) Σ_{μ,ν=1,2} (ξ_i^{μν} − a2 ξ_i^μ)(ξ_j^{μν} − a2 ξ_j^μ)
  + (1/(1 − a3²)) Σ_{μ,ν,ρ=1,2} (ξ_i^{μνρ} − a3 ξ_i^{μν})(ξ_j^{μνρ} − a3 ξ_j^{μν}) ]    (8)
Using this connection matrix, it can easily be shown that each of the stored patterns
is a stable state, since

h_i = Σ_{j=1}^{N} J_ij S_j = S_i    (9)

when S is one of the stored patterns (ξ^μ, ξ^{μν}, ξ^{μνρ}), whose total number is 2 + 4 + 8 = 14.
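A sketch of eqs. (8)-(9): building the hierarchy-aware matrix for the 14 item patterns and verifying that each is a fixed point of the deterministic dynamics. The 1/N normalization and q1 = 0.5 (so that a1 = 0) are assumptions made for this illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
N, q2, q3 = 1000, 0.75, 0.75
a1, a2, a3 = 0.0, 2*q2 - 1, 2*q3 - 1   # a1 = 0 assumes unbiased top-level patterns

def descend(parent, q):
    # flip each bit of the ancestor independently with probability 1 - q
    return parent * np.where(rng.random(parent.size) < q, 1, -1)

# The 2 + 4 + 8 = 14 item patterns of the three-level tree
top = [rng.choice([-1, 1], size=N) for _ in range(2)]
mid = [descend(top[mu], q2) for mu in range(2) for _ in range(2)]
mid_parent = [top[mu] for mu in range(2) for _ in range(2)]
leaf = [descend(m, q3) for m in mid for _ in range(2)]
leaf_parent = [m for m in mid for _ in range(2)]

# Eq. (8): each level is stored relative to its direct ancestor
J = np.zeros((N, N))
for xi in top:
    J += np.outer(xi, xi - a1)
for xi, par in zip(mid, mid_parent):
    J += np.outer(xi - a2 * par, xi - a2 * par) / (1 - a2**2)
for xi, par in zip(leaf, leaf_parent):
    J += np.outer(xi - a3 * par, xi - a3 * par) / (1 - a3**2)
J /= N
np.fill_diagonal(J, 0.0)

# Eq. (9): every stored pattern should satisfy sgn(J S) = S
stable = all(np.array_equal(np.where(J @ S >= 0, 1, -1), S)
             for S in top + mid + leaf)
```

At this system size the signal term of eq. (9) dominates the finite-size cross-talk, so all 14 patterns come out bitwise stable.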
3.2 The attributes subnetwork
This subnetwork stores the properties (attributes) belonging to the items as attractors. The patterns representing the attributes, denoted by η^l, are uncorrelated; i.e.,
they are generated by randomly selecting the value of each neuron to be ±1 with
equal probability. This network is therefore a Hopfield-type network [22] and its connection matrix is J_ij = (1/N) Σ_{l=1}^{p} η_i^l η_j^l. Since for every item stored in the items subnetwork we
arbitrarily assign three distinct attributes, the total number of stored attributes is
p = 14 · 3 = 42.
3.3 The subnetworks interaction
The two subnetworks are weakly connected through two kinds of synapses. The first
kind, J^AI, projects the activation from the items subnetwork to the
attributes subnetwork, while synapses of the second kind, J^IA, project the activation in the
opposite direction; every item pattern projects through synapses of the first kind to
three attribute patterns, and each of those attribute patterns projects its activity
backwards to it. The first interconnectivity matrix is as follows:

J^AI_ij = (λ1/N) Σ_ξ (η_i^{ξ1} + η_i^{ξ2} + η_i^{ξ3}) ξ_j    (10)

In this formula, neuron i belongs to the attributes subnetwork and neuron j belongs to the items subnetwork. ξ denotes an attractor in the items subnetwork,
so that the sum runs over all item patterns and connects each of them to its three
attribute patterns (η^{ξ1}, η^{ξ2}, η^{ξ3}) in the attributes subnetwork. λ1 denotes the strength of the projection from item to attribute patterns.
The second interconnectivity matrix is:

J^IA_ij = (λ2/N) Σ_ξ ξ_i (η_j^{ξ1} + η_j^{ξ2} + η_j^{ξ3})    (11)

where i belongs to the items subnetwork, j to the attributes subnetwork, and λ2 is the
strength of the projection from attributes to items.
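A sketch of the items-to-attributes projection of eq. (10); the uncorrelated item patterns, the 1/N normalization, and the value of λ1 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
N, n_items, lam1 = 1000, 14, 0.3       # lam1 plays the role of lambda_1

# Item patterns (taken uncorrelated here for brevity) and 3 attributes per item
items = [rng.choice([-1, 1], size=N) for _ in range(n_items)]
attrs = [[rng.choice([-1, 1], size=N) for _ in range(3)] for _ in range(n_items)]

# Eq. (10): items-to-attributes projection matrix; eq. (11) has the mirrored
# structure with strength lambda_2
J_AI = np.zeros((N, N))
for xi, (e1, e2, e3) in zip(items, attrs):
    J_AI += np.outer(e1 + e2 + e3, xi)
J_AI *= lam1 / N

# Holding item 0 in the items subnetwork produces a field in the attributes
# subnetwork aligned with each of item 0's three attribute patterns
h = J_AI @ items[0]
m = np.dot(attrs[0][0], h) / (N * lam1)   # ~1: field points toward this attribute
```

With the tree-correlated patterns of the actual model, the same computation also produces the smaller ancestor-driven "cold noise" contributions discussed in section 3.6.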
3.4 Characteristics of the unified network
The behavior of the network as a whole depends on the values of the couplings λ1 and
λ2. In the extreme case where λ1,2 ≪ 1, every combination of attractor pairs (one from each of the
two subnetworks) is an equally stable attractor. Therefore, the unified network can
practically be regarded as two separate modules. At the other extreme, if λ1,2 = 1
and every item is coupled to a single attribute, then the unified network reduces to
a simple ANN in which every attractor is composed of the correspondingly coupled
subnetwork attractors as substrings. For in-between values of λ1,2, an intermediate
behavior is obtained; every combination of subnetwork attractors is an attractor of
the unified network, but not with equal stability. The most stable attractors of the
unified network (those with the largest basins of attraction) are the ones composed of coupled
subnetwork attractors. This is illustrated by the following examples. If two mutually
coupled patterns are queried about, e.g., S^A = η^{ξ2} and S^I = ξ, then the local field
in the items subnetwork is

h_i = (1 + λ2) ξ_i    (12)

while if the queried patterns are not mutually coupled, e.g., S^A = η^{ζ1} and S^I = ξ
(where η^{ζ1} is an attribute of object ζ), we get

h_i = ξ_i + λ2 ζ_i    (13)

This means that in the first case the attractor is strongly stable, while in the
latter it is less stable, since for neurons at which ξ_i ≠ ζ_i the local field is of magnitude (1 − λ2).
Therefore, for such intermediate values of λ1,2, the unified network is characterized
by weak modularity.
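The stability contrast of eqs. (12)-(13) amounts to simple field arithmetic, sketched below (λ2 = 0.4 is an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(5)
N, lam2 = 1000, 0.4                  # lam2 plays the role of lambda_2

xi = rng.choice([-1, 1], size=N)     # queried item held in the items subnetwork
zeta = rng.choice([-1, 1], size=N)   # a different item, owner of the queried attribute

# Eq. (12): coupled case -- the back-projection is aligned with xi everywhere
h_coupled = xi + lam2 * xi
# Eq. (13): uncoupled case -- the back-projection pulls toward zeta instead
h_uncoupled = xi + lam2 * zeta

mismatch = xi != zeta                # neurons where the two items disagree
```

On the mismatched neurons the field magnitude drops from 1 + λ2 to 1 − λ2, which is exactly the weakened stability the text describes.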
3.5 Input-output strategy
The need to account for the various experimental results of Collins and Quillian
has imposed several methodological constraints guiding the input-output strategy
used in the model. In order to account for the fact that the response times to the
various queries are graded, a uniform motion up the tree structure had to be
implemented. This imposes a first constraint on the input to the objects subnetwork:
the external field representing this input must decay fast enough so that the upward
motion is not disturbed. Second, a criterion for discriminating between true and false
queried facts is needed. Using only the network's convergence into a stable state
is not sufficient, since such convergence could occur even for false facts, therefore
leading to a false response (a false fact answered positively). Thus, the identity of the
stable state arrived at should be examined. In the case of an object-attribute query,
this examination cannot be performed upon the objects subnetwork, since in cases of
property inheritance (e.g., does the canary fly?) the identity of the ancestor pattern
to which the queried attribute is linked is not known in advance. Therefore, this
examination should be performed upon the attributes subnetwork. This, in turn,
leads to the conclusion that the input to the attributes subnetwork cannot decay too
fast. This input has to last long enough for the upward motion on the objects'
tree structure to reach the object to which the queried attribute is linked.
Otherwise, when this object is reached, no memory of the queried attribute is left.
In conclusion, the input-output strategy used is not symmetric. An input is realized by a local external field applied to the appropriate subnetwork. In particular,
the first input is applied as an external field which is initially strong and then decays
(this is equivalent to initializing the network in the corresponding input state), while
the second input is applied as a constant external field. Naturally, the subnetwork
upon which the response is examined depends upon the type of query. In the case
of an object-attribute query, it is assumed that a necessary condition for obtaining
a response is that the attributes subnetwork remains in a stable state for a sufficient
duration; a response is considered positive if during this period a high degree of similarity between the stable state and the second input component is achieved. If such
a degree of similarity is not achieved during a certain fixed time, or the network has
converged into a stable state with a low degree of similarity to the second input,
then a negative response occurs. Similarly, in the case of an object-object query,
the second component of the query is applied as a non-decaying external field to
the objects subnetwork, where convergence is examined. The initial state of both
subnetworks is a "neutral" attractor, with no semantic assignment.
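The asymmetric input strategy can be sketched on a toy attributes subnetwork: starting from a stored "neutral" attractor, a weak constant field toward the queried attribute is not enough by itself, but adding a term that stands in for the item-to-attribute projection lets the network cross the 0.9 similarity threshold. The patterns, couplings (λ1 = 0.3, λ3 = 0.8), and step count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
N, lam1, lam3, threshold = 1000, 0.3, 0.8, 0.9

eta0 = rng.choice([-1, 1], size=N)                      # "neutral" attractor
etas = [rng.choice([-1, 1], size=N) for _ in range(3)]  # queried item's attributes
J = (np.outer(eta0, eta0) + sum(np.outer(e, e) for e in etas)) / N
np.fill_diagonal(J, 0.0)

def run(external, steps=10):
    S = eta0.copy()                  # network initialized in the neutral attractor
    for _ in range(steps):
        S = np.where(J @ S + external >= 0, 1, -1)
    return S

# A constant weak field toward the queried attribute alone: the network stays put
S_weak = run(lam3 * etas[0])
m_neutral = np.dot(eta0, S_weak) / N      # ~1: still in the neutral attractor

# The same field plus a stand-in for the item-to-attribute projection: retrieval
S_full = run(lam3 * etas[0] + lam1 * (etas[0] + etas[1] + etas[2]))
m_attr = np.dot(etas[0], S_full) / N      # crosses the similarity threshold
```

The final overlap against the second input component, compared with the 0.9 threshold, is exactly the positive/negative response criterion stated in the text.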
3.6 Expected network behavior
Four types of queries are defined, and the following scenarios are expected accordingly:
1. Object-attribute query: when a query is presented, the items subnetwork is
initialized by a strong external field with the attractor representing the queried
object (ultimately, this field will decay, as explained above). The equivalent
external field applied to the attributes subnetwork is weaker but of constant value.
By itself it is not sufficient to drive the attributes subnetwork out of its neutral
attractor; nevertheless, due to the projection from the items to the attributes
subnetwork, two scenarios can occur:
(a) In the case of a positive query (a query about a truly stored relationship
(denoted η^q), such as `does a canary fly?'), the local field in the attributes
subnetwork will initially be

h_i = η_i^0 + λ1 (η_i^{ξ1} + η_i^{ξ2} + η_i^{ξ3}) + λ3 η_i^q + R_i    (14)

where η^0 represents the neutral attractor, η^{ξ1}, η^{ξ2}, η^{ξ3} represent the attributes assigned to the queried item ξ, η^q stands for the pattern representing the
queried attribute, and λ3 denotes the strength of the external field caused
by the queried attribute. R_i is a "cold" noise field originating from the projections of other attractors in the items subnetwork which overlap ξ. Most
of these projections will be negligible, except for the term originating from the
direct ancestor, since a pattern's strength of projection is proportional to
its overlap with the actual state of the network. Two cases are further
distinguished:
i. If the queried attribute is directly associated with the queried item
(e.g., `does the canary sing?'), then η^q is one of the three attribute
patterns η^{ξ1}, η^{ξ2}, η^{ξ3}, for example η^{ξ1}. Therefore, for suitable values of the
parameters, since the component of η^{ξ1} in the field h_i is λ1 + λ3,
the subnetwork's state will be attracted into the pattern representing
the queried attribute η^{ξ1}. Consequently, the reverse projection towards the items subnetwork will be directed toward the queried item,
enhancing its stability. Therefore, the unified network will remain in a
stable state (composed of the two queried inputs), which is interpreted
as a positive response.
ii. If the queried attribute is directly associated with an ancestor of the
queried item (e.g., `does the canary have feathers?', which is answered positively in light of semantic inheritance), the external
field caused by the queried attribute is directed towards an attribute
of the queried item's ancestor. Therefore the local field in the attributes subnetwork will be as in equation 14, where η^q is now a pattern
representing an attribute belonging to the queried item's ancestor. At
first, since all the above terms are distinct (pseudo-orthogonal) attractors, the attributes subnetwork is expected to arrive at a mixed
stable state. We have chosen λ3 > λ1, and therefore the major component of the mixed state is η^q, but the other components η^{ξ1}, η^{ξ2}, η^{ξ3} and the
attribute attractor associated with the queried item's direct ancestor
(present in R_i) also exist. Consequently, the reverse projection is diffuse,
but its major component is directed towards the queried item's ancestor because of the large relative weight of the η^q component. Hence,
the items subnetwork state will be destabilized and a transition toward
the ancestor corresponding to η^q will occur. In turn, the projection
towards the attributes subnetwork is directed towards the ancestor's
attributes, which contain η^q, and therefore the unified network will be
stabilized in a state composed of the queried attribute and its corresponding item. Thus a positive response is attained; however, the
response time is longer due to the transition process. When the queried attribute belongs to a second-level ancestor (a "grandfather")
of the queried item, the transition passes (at least partially) through
the direct ancestor and the response time will be even longer. This
happens due to the direct ancestor's component in the noise term R_i
and to the geometric proximity between the queried item and its direct ancestor (i.e., from the geometric aspect, the direct ancestor lies
in between the queried item and the item corresponding to the queried
attribute). Therefore we expect to obtain a cascade of response times,
increasing with the semantic distance between the queried item and
the item corresponding to the queried attribute.
(b) In the case of a negative query (a query about a non-stored relationship
(denoted η^r), such as `does a canary bark?'), the external field caused
by the queried attribute is not directed towards one of the attractors projected by the queried item. The local field is as in equation 14, and
since all the terms are distinct (pseudo-orthogonal) attractors, the attributes
subnetwork is expected to arrive at a mixed stable state. The major component of the mixed state is η^r, but the other components η^{ξ1}, η^{ξ2}, η^{ξ3} and
the attribute attractor associated with the queried item's direct ancestor
(present in R_i) also exist. Consequently, the reverse projection is diffuse,
and the state of the items subnetwork either remains in the initial state
corresponding to the queried item, or a transition towards the state corresponding to the queried item's direct ancestor occurs. The latter occurs
when the queried attribute belongs to a direct sibling of the queried item,
since the local field in the items subnetwork then contains major components
pointing to ξ^{111}, ξ^{112} and ξ^{11}, where, for example, ξ^{111} denotes the queried
item, ξ^{112} the item pointed to by the queried attribute, and ξ^{11} their direct
ancestor. In light of the geometric structure (the ancestor is in between
its descendants) and using the majority rule, we obtain a local field of the same sign
as the ancestor pattern for most (N(1 − (1 − q3)²)) of its
neurons. This can be viewed as an instance of the spreading-activation notion of intersection. However, even this transition will not attract the
attributes subnetwork into the state corresponding to the queried attribute,
since, by equation 14, all the terms continue to be orthogonal. Since the
component of the queried attribute in the mixed state is not sufficiently
large (we have chosen a threshold of 0.9), a negative response will be generated after a fixed amount of time. However, since the coefficients of these
major components are unequal (they depend on the attributes subnetwork
state), the situation is more complex; when the coefficient of
ξ^{112} is large (this occurs when λ3 is large), the items subnetwork may be
attracted into the ξ^{112} pattern, leading the attributes subnetwork into the
attractor η^r representing the queried attribute, and an error is generated.
2.
3.
4.
the behavior of the network for the case of an objectobject query evolves in similar lines to the previous description, therefore we
present only a brief overview of its main aspects. Both inputs are introduced into
the items subnetwork, as explained above; while the first input initializes
the items subnetwork, the second input is introduced as an external field of
strength λ4. In the case of a positive direct query (e.g., `is a canary a bird?')
the external field will enhance the stability of the network's initial state. For
a positive query in general, a transition towards the ancestor occurs
due to the external field. When the second input is two levels higher than the
first, the transition passes through the first item's direct ancestor. Therefore,
a cascade of retrieval times is expected in accordance with the semantic distance
between the inputs. In these cases a positive response is obtained, since the
stable states reached have a high degree of similarity (overlap) with the second
input.
In the case of negative queries (e.g., `is a canary a mammal?'), if the second
input is a sibling of the first input, the network performs a transition towards
the state corresponding to the common ancestor (an intersection). This state is
interpreted as a negative response since its overlap (similarity) with the second
input is below the threshold previously defined. If the second input is semantically far apart from (and not an ancestor of) the first input, a mixed state is reached
and again a negative response is produced. As in the object-attribute case, an
error may be produced, since for large values of λ4 the items subnetwork may
be attracted to the pattern representing the second input.
3. Reverse category query: (what kind of bird sings?) This query requires an
explicit answer (not a yes/no response). The response is achieved
upon the convergence of the items subnetwork to a stable state. As in the
previous case, the first input initializes the items subnetwork while the second
one is introduced as an external field to the attribute network. However, in order
to obtain a correct response, note that a downward traverse should proceed on
the tree structure.
4. Reverse property query: (who sings, flies and has feathers?) As in the
previous case, this query also requires an explicit answer. The query gains
significance when there are attributes each of which belongs to several items
(otherwise the item could already be retrieved by presenting a single attribute).
The query is modeled by presenting the queried attributes as external fields
applied to the attributes subnetwork, while the items subnetwork is initialized with
a random pattern. As in fig. 4, when the queried attributes are 1 and 2,
the attributes subnetwork reaches a mixed state whose projection to the items
subnetwork leads to convergence into 111 (since 111 receives a projection from
both attributes, while 112 receives a projection from only one of them). In this way an intersection
process is used in order to retrieve the requested information.
3.7 An auxiliary bias mechanism.
A possible mechanism by which performance can be improved and the number of
erroneous responses reduced (see section 3) is a dynamic bias field, introduced into
the items subnetwork. It was shown [24, 25] that a bias can enhance the performance
of an ANN. This is done by introducing the additional field:
    h_i = -b ( (1/N) Σ_j S_j - a )        (15)
where a denotes the average activity of a group of learned patterns. This additional
field tends to weakly constrain the activity of the network's state around a. If the
patterns in the tree structure are generated so that every level l is characterized by
a distinct average activity a_l, then by modifying a_l in equation 15 we can enhance
the stability of the patterns in the respective level. We have chosen the value of the
parameter q1 to be 0.25, and therefore the activities of the different levels are a_1 = -0.5,
a_2 = -0.25 and a_3 = -0.125. In equation 15, the first term results from a constant
decrement of the synaptic matrix values while the second is a constant field which
can be interpreted as a uniform threshold. Thus, by controlling this external field,
the network's state motion can be guided upward or downward the tree structure.
Such an external control mechanism can be psychologically motivated as involving an
`attention' process required in accordance with the specific query; for some kinds of
queries (e.g., queries 1, 2) the processing is characterized by upward motion while for
others (e.g., query 3) a downward motion is required.
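As a concrete illustration, the bias field of equation 15 is a single scalar correction applied uniformly to every neuron. A minimal Python sketch (the function name and array conventions are ours, not the paper's):

```python
import numpy as np

def bias_field(S, a, b=5.0):
    # Dynamic bias field of eq. (15): h = -b * ((1/N) * sum_j S_j - a).
    # The field is uniform over the neurons: it penalizes any deviation of
    # the network's mean activity from the target level a.
    return -b * (np.mean(S) - a)

# If the network sits at the mean activity of level 1 (a1 = -0.5) while the
# target is level 2 (a2 = -0.25), the field pushes activity upward:
S = np.full(1000, -1.0)
S[:250] = 1.0                  # mean activity -0.5
print(bias_field(S, a=-0.25))  # positive field: -5 * (-0.5 + 0.25) = 1.25
```

Because the field depends only on the mean activity, it nudges the whole network toward the target level without favoring any particular pattern within that level.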
4 Computer Simulation of the Model
4.1 Technical details
Each of the two simulated sub-networks is composed of 1000 neurons. The subnetworks were constructed according to the prescriptions presented in the model's
description. Their attractor states were randomly generated before every new simulation run, to ensure that the results obtained are independent of a specific set of
attractors. The dynamics were characterized by an asynchronous, stochastic process;
within one update cycle, the units are updated sequentially in random order. After
all units have been updated the next update cycle begins.
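The update scheme described above can be sketched as follows. The paper does not spell out its stochastic update rule, so this sketch assumes conventional Glauber dynamics for ±1 neurons at temperature T; the names and the toy one-pattern demo are ours:

```python
import numpy as np

def update_cycle(S, J, T, rng):
    """One asynchronous update cycle: every unit is updated exactly once,
    in a freshly drawn random order, with stochastic (Glauber) dynamics."""
    for i in rng.permutation(len(S)):
        h = J[i] @ S                               # local field on neuron i
        p_up = 1.0 / (1.0 + np.exp(-2.0 * h / T))  # P(S_i -> +1)
        S[i] = 1 if rng.random() < p_up else -1
    return S

# Tiny demo: a Hebbian matrix storing one pattern keeps the state near it.
rng = np.random.default_rng(0)
N = 500
xi = rng.choice([-1, 1], size=N)
J = np.outer(xi, xi) / N
np.fill_diagonal(J, 0.0)
S = xi.copy()
for _ in range(5):
    update_cycle(S, J, T=0.15, rng=rng)
print(np.mean(S * xi))   # overlap stays close to 1 at this low temperature
```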
We have run the simulations both with and without the bias mechanism. Improved
performance was found when the bias was used, in the sense that a more uniform motion
on the tree structure was achieved. In addition, the number of retrieval errors was
significantly decreased, as will be described further on. The bias was implemented by
modifying the parameter a of equation 15, so that every 10 iterations it equals the
mean activity of the patterns in the following level; e.g., for an upward motion a is:
    a = -0.125   for t < 10
    a = -0.25    for 10 ≤ t < 20        (16)
    a = -0.5     for t ≥ 20
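Equation 16 is just a staged lookup on the update-cycle count; a sketch (the function name is ours):

```python
def target_activity(t):
    # Eq. (16), upward motion: the target mean activity a starts at the
    # bottom level's value and moves toward the root every 10 iterations.
    if t < 10:
        return -0.125   # level 3 (leaves)
    elif t < 20:
        return -0.25    # level 2
    else:
        return -0.5     # level 1
```

For the downward motion required by reverse category queries, the same schedule is traversed in the opposite order.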
The values of the various parameters used are as follows:

λ1, the strength of the projection from items to attributes, was chosen equal to
0.3.

λ2, the strength of the projection from attributes to items, was chosen to be
0.55. λ2 was chosen greater than λ1, since the projection from attributes to items
constitutes an important component of the force driving the activity in the items
sub-network along the path leading from a queried item through its ancestors to
the root of the semantic tree.

The parameters denoting the strengths of the external fields representing queried
attributes and items, λ3 and λ4 respectively, are λ3 = 0.35 and λ4 = 0.2.
λ4 is relatively smaller since it acts directly on the items subnetwork. The bias
parameter was set to b = 5.
The temperature is T = 0.15. Since the dynamics of the network include
stochastic noise, the network's state is characterized by fluctuations even when
a stable state is achieved. Therefore, a concept was considered to be activated
if the overlap of the network's state S with the pattern representing this concept
is larger than a threshold of 0.9 for at least 5 consecutive iterations.
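The activation criterion above (overlap above 0.9 for at least 5 consecutive iterations) can be sketched directly; the function names are ours:

```python
import numpy as np

def overlap(S, pattern):
    # Normalized overlap between the network state and a stored +/-1 pattern.
    return np.mean(S * pattern)

def is_activated(states, pattern, threshold=0.9, persistence=5):
    # `states` holds the network state at each iteration; the concept is
    # activated once its overlap exceeds `threshold` for `persistence`
    # consecutive iterations.
    streak = 0
    for S in states:
        if overlap(S, pattern) > threshold:
            streak += 1
            if streak >= persistence:
                return True
        else:
            streak = 0
    return False
```

The persistence requirement filters out the momentary excursions above threshold that the stochastic noise produces.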
4.2 Illustration of a typical query run.
The dynamics of the uni ed network in response to the presentation of a query is
exempli ed graphically by the following gures, where item patterns are represented
by circles connected by solid lines (corresponding to semantic relations) while the
attribute patterns are represented by squares each of them connected by a dashed
line to the item to which it belongs. The size of the various items and attributes
presented is proportional to the amount of overlap between the current sub-network's
state and the patterns representing them.
Fig. 2 describes the case where an item is queried about an attribute belonging to
its `grandfather' in the semantic tree. The initial state is depicted in fig. 2a, where
in the items subnetwork the pattern representing the queried item is dominant and
the activity of the attributes subnetwork is dispersed among many patterns. Fig. 2b
(after 15 iterations) illustrates a transition of the items subnetwork to a state with a
high degree of overlap with the queried item's direct ancestor. Correspondingly, the
activity of attributes belonging to it is increased. Fig. 2c describes the situation
(after 22 iterations) where a transition to the pattern representing the queried item's
`grandfather' has taken place, and shortly afterwards (after 24 iterations) the attribute
network is seen to converge into a stable state of high overlap with the queried
attribute. Consequently, a correct positive response is generated.
Fig. 3 describes the case where an item is queried about an attribute belonging to
its direct ancestor. The initial state (fig. 3a, after 3 iterations) is as in the previous
query. Fig. 3b (after 10 iterations) illustrates a transition of the items subnetwork
to a state with a high degree of overlap with the queried item's direct ancestor.
Correspondingly, the activity of attributes belonging to it is increased. This trend is
strengthened, as can be seen in fig. 3c (after 13 iterations), where the network has
converged and remains at a state with high overlaps with the patterns representing
the queried components.
The dynamics of the network in response to a negative query are presented in
fig. 4. Observing the initial state (fig. 4a, after 3 iterations), one can see higher activity in
the patterns corresponding to the query's components. Since both inputs drive
the items subnetwork to the queried item's ancestor (fig. 4b, after 14 iterations), it
converges to the corresponding pattern; however, the attributes network remains in a
mixed state and the query is answered with a negative response (fig. 4c, after 22 iterations).
The dynamics of the network's response to object-object queries follow
similar lines; for example, when an input is introduced pointing to a second-level
ancestor, a two-stage transition (passing through the direct ancestor) will occur.

When a reverse category query is presented, a downward motion should be
performed, and in order to implement such motion the direction of the bias modification was inverted. Best performance was achieved when, in addition, the parameters
λ4 and b were set to the values λ4 = 0.4 and b = 7.

Fig. 5 illustrates the case of a reverse property query. It can be seen that the
activity of the item corresponding to the intersection of all three input attributes
grows.
4.3 Performance
In order to examine the performance of the network, the simulations were run 30 times
for each query. For every run, the patterns representing the items and attributes were
randomly generated. The results of the overall performance without introducing bias
are presented in table 1, where, for example, a zero-level object-attribute (OA) query
denotes that the queried attribute belongs directly to the queried item, a one-level
object-object (OO) query denotes that the first component of the query is a direct
descendant of the second one, etc. The best performance rate was achieved with parameter
values of λ1 = 0.35, λ2 = 0.75, λ3 = 0.5, λ4 = 0.4 and T = 0.15.
Query                            % of CR    MT
Positive zero-level OA query      100       2.7
Positive one-level OA query       86.5      9.4
Positive two-level OA query       63.5      15.2
Negative OA query                 80        -
Positive zero-level OO query      100       1
Positive one-level OO query       83        4.5
Positive two-level OO query       47        6.5
Negative OO query                 30        -
Table 1: Performance and response times for the various queries.
CR denotes correct response and MT the mean time of a correct response, counted
in number of iterations.
It can be seen that for positive object-attribute queries a gradual linear cascade
of retrieval times exists; i.e., the larger the semantic distance between the query's
components, the longer it takes for a stable state to be reached. For negative
object-attribute queries, no correlation was found between the semantic distance
between the query's components and the time at which a stable state was reached
in the attribute subnetwork. Several possibilities for determining the response time
of negative queries exist and will be discussed in the next section.

For object-object queries the network's performance deteriorates considerably,
as can be seen in table 1; depending on the value of λ4, either we obtain good
performance for positive queries and poor performance for the negative ones, or vice
versa. It seems that the difference originates from the non-symmetric modeling of the
two queries.
The addition of an auxiliary bias mechanism improves the performance of the
network with regard to its response accuracy, while retaining the cascade of retrieval
times described above. The results of the statistics performed on such runs are
presented in table 2.

We have found that the network implicitly exhibits the priming phenomenon, a
fundamental characteristic inherent to spreading activation theory [26]. Priming
designates the phenomenon by which cognitively active concepts cause the memory
retrieval of related concepts to be faster. According to SA theory this is explained
by the nodes representing the latter concepts receiving a certain amount of baseline
activation from the node representing the cognitively active concept, thus
rendering them more prone to further sub-threshold activation. In the model presented,
Query                            % of CR    MT
Positive zero-level OA query      100       3.1
Positive one-level OA query       100       9.1
Positive two-level OA query       100       15.3
Negative OA query                 96.5      -
Positive zero-level OO query      100       1
Positive one-level OO query       100       15.3
Positive two-level OO query       96.5      22.7
Negative OO query                 100       -
Table 2: Performance and response times for the various queries, with the auxiliary
bias mechanism incorporated.
CR denotes correct response and MT the mean time of a correct response, counted
in number of iterations.
when an input is presented to the network, its retrieval time depends on the residual
state of the network. When the network was initialized with a pattern a and
subsequently another pattern b was presented (as an external field, with λ4 = 0.6),
the convergence time of the network towards the latter pattern was proportional to
the semantic distance between them. For example, for the patterns 111 and 112 we
observe a convergence time of 9 iterations, while when the second pattern is 222 a
convergence time of 13 iterations is observed.
5 Discussion
In this section we discuss several computational aspects of the ANN model presented,
and examine the fit between the simulation results and the psychological data regarding
fact retrieval.
Several advantages originate from using a distributed representation and ANN
dynamics, in comparison with classic symbolic or local connectionist models:
1. Models based on distributed representations function well as content-addressable
memories, since they are capable of performing error correction and are fault
tolerant.
2. As has been described, the network `ascends' the concept hierarchy in a distributed way, by having its state become more like the pattern for a higher-level
concept. In this way, it does not necessarily go through discrete steps corresponding to particular higher concepts (as a symbolic program would do), but
in a sense `mushes' its way up or down the hierarchy. This process seems to us less
rigid and more `brain-like'. A further speculation is to view stored patterns with a
low amount of overlap with the network's current activity state as representing
`sub-conscious' concepts awaiting their turn to become `conscious', i.e., to reach a
high overlap with the network's activity state.
3. Distributed systems automatically give rise to generalization; i.e., when presented with a non-familiar pattern, the network can still relate to it according
to its relative similarity to the various familiar patterns. In our model, when
a new semantic concept is presented, the pattern representing it is stored in
the items subnetwork, and its correct placement in the semantic tree, regarding
its inheritance path and semantic relatedness to other concepts, is determined
implicitly by the representation encoding, without any need for explicit instructions. In general, the semantic structure follows directly from the assumption
that the encoding conserves this `semantic geometry'. There is no need
to specifically denote and learn pointers between items of the semantic tree.
4. The network has the ability to learn new concepts that were not anticipated
at the time the network was constructed; while in classic symbolic models, or
connectionist models entailing local representations, one has to add new nodes
to the network every time a new concept is learned, in the model presented a new
pattern is simply memorized. Our model has the characteristic that when
a new concept is added, it becomes more and more significant as its definition
gets more precise; i.e., if a new concept is added but is not yet known to
have any properties, then no attribute patterns project to this concept's pattern, and thus its basin of attraction will be very small. Only when a pattern
in the items subnetwork receives considerable projection from the attributes
subnetwork does it gain a sufficiently deep and wide basin of attraction, so that its
corresponding item becomes significant.
5. The various attributes are implicitly and dynamically reallocated to the items to
which they belong. This is, however, a virtual process, with no pointers actually
being modified. Observe, for example, a part of the semantic tree containing
an item 11 and its two sons 111 and 112. When the attribute l is first
introduced, for example when the fact `item 111 has the attribute l' is first
stored, its projection is directed only to the item 111. However, if the
fact stating that `l belongs to 112' is subsequently stored, then the projection
from l to the items subnetwork is aligned with the sum (111 + 112). Hence, in light of the
geometric construction, if the attributes subnetwork is in the state l, it
will contribute a local field aligned with the ancestor pattern 11 in N(1 - q3²)
of its neurons, and the attribute can therefore now practically be considered as belonging to
this ancestor. Such a scenario is illustrated in fig. 6, where after introducing a
property belonging to two sibling items, the items subnetwork converges into the
pattern representing their direct ancestor, though no such fact was explicitly
stored by the network. By the same mechanism, if some of these
stored facts are subsequently deleted, the situation is reversed accordingly. This
generalization property is expected to be enhanced in a network where a pattern
can have more than just two direct descendants.
6. By using a specific form of encoding representing different amounts of activity at
different levels of the semantic tree, we were able to use the bias mechanism to
model a general tendency of motion on this tree. We do not have to
perform a costly search process over all the trajectories defined in the tree, but
move in a general direction until the network converges to the correct attractor.
Thus, by modeling such an external `attention' mechanism we were able to
improve the performance of the network considerably. It may be that modifying the
external bias according to some oscillatory function can create such continuous
upward and downward motion until correct convergence is achieved.
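The alignment claim in point 5 above is easy to check numerically. The sketch below uses one simple way of generating hierarchically correlated patterns (each descendant bit flips independently with probability q; this generation scheme and the value of q are our assumptions, chosen for illustration, so the printed agreement is not the paper's exact N(1 - q3²) figure): wherever the summed projection onto the two siblings is nonzero, its sign agrees with the ancestor on the large majority of neurons.

```python
import numpy as np

rng = np.random.default_rng(1)
N, q = 20000, 0.25
ancestor = rng.choice([-1, 1], size=N)

def descend(parent):
    # A child copies each parent bit, flipping it with probability q.
    flip = rng.random(N) < q
    return np.where(flip, -parent, parent)

s111, s112 = descend(ancestor), descend(ancestor)

# An attribute stored for both siblings projects a field along s111 + s112.
field = s111 + s112
nz = field != 0
agreement = np.mean(np.sign(field[nz]) == ancestor[nz])
print(agreement)   # well above chance; roughly 0.9 for q = 0.25
```

This is why an attribute shared by two siblings effectively `migrates' to their common ancestor without any pointer being rewired.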
The ANN paradigm is used to model the concept of `intersection' underlying
spreading activation theory. As shown, an intersection is achieved when the integrated
action of projections from several patterns leads to convergence to a stable state that is
not achievable by the activation of only one of the patterns alone. Thus, intersection is
modeled without any need for explicitly implementing a labelling procedure as in the
original spreading activation models.
The model presented is constrained by the assumption that hierarchical associations are implicitly implemented by geometric similarity alone. Under this
constraint, several additional strategies for modeling fact retrieval phenomena have
been tried, unsuccessfully; we have found that introducing a fatigue component into
the dynamics of the items subnetwork was not sufficient to create the required motion
upward or downward the semantic tree. The same applies to increasing the
temperature T. In both cases, most of the time the network would converge to various
mixed states, not assumed to be of cognitive significance. Another conclusion
of our work is that without bias the network's behavior is sensitive to small variations
in the values of the model parameters, and we have performed quite a number of
simulations in order to find the parameters leading to best performance.
However, when bias is introduced, performance becomes much more reliable and more
robust to modifications of the parameter values. The above-mentioned strategies
were successful only when, removing the geometric constraint, the connections between
the various patterns in the items subnetwork were explicitly implemented by pointers
between the patterns.
Let us now compare the retrieval times (RTs) of queries obtained in our model
with Collins and Quillian's psychological data [12]:
1. The main similar feature is the cascade of RTs obtained for positive queries. As
with the psychological data, when the RTs are plotted against the level of the
query, straight lines are obtained. When the bias mechanism is used, the slope
of the lines can be controlled mainly by changing the bias. If a constant rate of
bias increase is assumed, the lines representing the RTs of OA and OO queries
are predicted to be parallel.
2. The zero-level OO query has a shorter RT than predicted by straight-line extrapolation. This is explained by Collins and Quillian by assuming that this query
is treated by pattern recognition and not by semantic retrieval. According to
our model, this phenomenon is expected, since when the two inputs are identical
the network is initialized to the desired stable end-state and no convergence
time is needed.
3. According to the psychological experimental data, OA and OO queries are represented by two parallel lines, with the OA line translated upwards relative to
the OO line by Δt = 225 ms. This shift time, which is three times larger than
the time of a one-level ascent in the tree structure, is explained by Collins
and Quillian as the time needed for performing a search process for the queried
attribute. While ANN implementations of this scheme are possible in principle, they were not realized in our model, since no search process is explicitly
defined. The relatively longer OA RTs mentioned above are explained in
our model by the additional time needed to allow for the convergence of the
attributes subnetwork. However, this leads to a shorter expected shift than
the one experimentally observed. Yet, since we assume weak modularity, it is
plausible that processes involving both subnetworks entail a constant delay due
to the flow of information between the subnetworks, and thus the actual value
of the observed shift Δt can be larger.
4. As described, no correlation was found between the RT of a negative query
and the semantic distance between its components. However, it should be
noted that besides the strategy used in the model presented, there are several
other plausible strategies for determining the time at which a query is considered
`unretrievable' and a negative response is generated:

(a) The negative response is generated upon convergence of the items subnetwork into an attractor which is the root of the tree structure and whose
semantic meaning is `negative response'. However, according to such a
strategy, negative RTs for patterns closer to the root are expected to be
shorter, which was not observed experimentally.

(b) Negative facts could be directly stored in the network. This strategy is
compatible with our model, since projections from items towards the respective attributes (and vice versa) may be negative, and it could be interesting to investigate it in the future.
Considerable simplifications have been made in the construction of the model
presented. It would be interesting to perform further research focused on modeling
semantic networks not limited to simple tree structures, and to investigate the dynamics
of the model when associations between concepts that are neither hierarchically related
nor the property of one another (e.g., between a zebra and a crossroad) are
entered. We have shown that constructing a model for semantic fact retrieval using
a distributed ANN approach is indeed feasible and enables a quite fair simulation of
the corresponding psychological experimental results. Using such an approach, we
have demonstrated that some properties of the cognitive dynamics predicted by the
model are obtained implicitly, without any need for their explicit formulation.
References
[1] Parallel Distributed Processing: Explorations in the Microstructure of Cognition,
Vol. I, D.E. Rumelhart and J.L. McClelland (Eds.), Cambridge, MA: MIT Press,
1986.
[2] Smolensky P., Connectionist Modeling: Neural Computations/Mental Connections. Chapter 2, 49-67, in Neural Connections, Mental Computation, Nadel L.,
Cooper L.A., Culicover P. and Harnish M. (Eds.), A Bradford Book, MIT Press,
Cambridge, Mass., 1989.
[3] Fahlman S., Representing Implicit Knowledge. In G.E. Hinton and J.A. Anderson
(Eds.), Parallel Models of Associative Memory, Erlbaum.
[4] Hollbach Weber S., A connectionist model of conceptual representation. IJCNN
I, 477-483, 1989.
[5] Shiue L.C. and Grondin R.O., Neural processing of semantic networks. IJCNN II,
589, 1989.
[6] Jagota A. and Jakubowicz O., Knowledge Representation in a Multi-Layered
Hopfield Network. IJCNN I, 435-442, 1989.
[7] Ritter H. and Kohonen T., Self-Organizing Semantic Maps. Biol. Cybern. 61,
241-254, 1989.
[8] Hinton G.E., Implementing semantic networks in parallel hardware, In G.E. Hinton and J.A. Anderson (Eds.), Parallel Models of Associative Memory, Erlbaum.
[9] Amit D.J., Sagi D. and Usher M., Architecture of Attractor Neural Networks
Performing Cognitive Fast Scanning. To appear in Networks.
[10] Miyashita Y. and Chang H.S., Neuronal correlate of pictorial short-term memory
in the primate temporal cortex. Nature 331, 68, 1988.
[11] Fodor J. and Pylyshyn Z.W., Connectionism and cognitive architecture: a critical
analysis. Cognition 28: 3-71, 1988.
[12] Collins A.M. and Quillian M.R., Retrieval Time From Semantic Memory, Journal
of verbal learning and verbal behavior, 8, 240-248, 1969.
[13] Fahlman S., NETL: A system for representing real-world knowledge. MIT Press,
Cambridge, MA, 1979.
[14] Amit D.J., Parisi G. and Nicolis S., Neural potentials as stimuli for attractor
neural networks. Network, 1, 75 - 88, 1990.
[15] Quillian M.R., Word concepts: A theory and simulation of some basic semantic
capabilities. Behavioral Sci. 12, 410-430, 1967.
[16] Quillian M.R., \Semantic Memory", in Semantic Information Processing,
M.Minsky (Ed.), MIT press, Cambridge, Mass, 1968.
[17] Quillian M. R., The Teachable Language Comprehender: A simulation program
and theory of language. Communications of the ACM, 12, 459-476, 1969.
[18] Schubert L., \Extending the Expressive Power of Semantic Networks," Arti cial
Intelligence, Vol. 7, No. 2, 1976.
[19] Findler N.V., (Ed.), Associative Networks: Representation and Use of Knowledge
by Computer, Academic Press, New York, 1979.
[20] Schank R. C. and Colby K., (Eds.), Computer Models of Thought and Language,
W.H. Freeman, San Fransisco, CA, 1973.
[21] Collins A.M. and Loftus E.F., A spreading activation theory of semantic processing. Psychological Review, 82, 407-428, 1975.
[22] Hopfield J.J., Neural networks and physical systems with emergent collective
computational abilities. Proc. Nat. Acad. Sci. USA 79, 2554-2558 (1982).
[23] Feigelman M.V. and Ioffe L.B., The augmented models of associative memory:
asymmetric interaction and hierarchy of patterns. Int. Jour. of Mod. Phys. B,
Vol. 1, 51-68 (1987).
[24] Gutfreund H., Neural Networks with Hierarchically Correlated Patterns. ITPSB-86-151 preprint, 1987.
[25] Amit D., Gutfreund H. and Sompolinsky H., Information storage in neural networks with low levels of activity. Phys. Rev. A, 35, 2293-2303 (1987).
[26] Anderson J.R., Spreading Activation, in Anderson J.R. and Kosslyn S.M. (Eds.),
Essays in Learning and Memory. W.H. Freeman and Comp., New York (1984).