
Inferring narrative and intention from playground games

2008 7th IEEE International Conference on Development and Learning


Christopher Crick and Brian Scassellati
Yale University, New Haven CT
Email: [email protected], [email protected]

Abstract—We present a system which observes humans participating in various playground games and infers their goals and intentions through detecting and analyzing their spatiotemporal activity in relation to one another, and then builds a coherent narrative out of the succession of these intentional states. We show that these narratives capture a great deal of essential information about the observed social roles, types of activity and game rules by demonstrating the system's ability to correctly recognize and group together different runs of the same game, while differentiating them from other games. Furthermore, the system can use the narratives it constructs to learn and theorize about novel observations, allowing it to guess at the rules governing the games it watches. For example, after watching several different games, the system figures out on its own that Tag-like games require close physical proximity in order for the role of "it" to swap from one person to another. Thus a rich and layered trove of social, intentional and cultural information can be drawn out of extremely impoverished and low-context trajectory data.

I. INTRODUCTION

Every schoolchild knows that teachers have eyes in the backs of their heads. From across the playground at recess, out of the corner of an eye, they always notice when a friendly game of tag transforms into something superficially similar but much more likely to end in stained clothes, torn jeans and tears. We all have this capacity; we infer a great deal about the intentions, goals, social relationships and rules of interaction from watching people interact, even from such a distance that only gross body movement can really be seen. How else could we enjoy a football game from the upper bleachers?

What's more, we do not need even that much context – we are driven to recognize dramatic situations even when presented with extremely simple cues. Animated boxes on a flat white screen are enough to trigger this inference process, and we happily spin narratives of anthropomorphized black squares playing with, hiding from, bullying and swindling one another.

Making sense of very low-context motion data is an important cognitive task that we perform every day, and yet it depends on very little information from the world – so little, in fact, that we can have some hope of designing computational processes that manipulate this manageable quantity of data to accomplish similar results. A computer or robot system able to watch groups of people perform tasks or play games and figure out their goals and the rules by which they act has a number of obvious applications: entertainment, assistive technology, monitoring and safety. In addition, we hope that the process of building systems that can perform this kind of inference with real people and in real-world situations may help illuminate and ground the psychological questions of how these capabilities develop and operate in humans.

Our system begins with positional data taken from real humans playing games with one another, derived from an installed laboratory sensor network. True, this represents a bit of a shortcut – humans have no sensory apparatus that provides us with instantaneous physical location in a 3-d coordinate space.
But we do have a huge visual computational apparatus that manages to translate retinal impulses into coherent, persistent, reliable object concepts. At the current state of the art in computational perception, this is still not possible – though work like ours may help to make it so.

The system uses the raw moment-by-moment trajectory data to hypothesize about the current intentional states of the people involved, in terms of their tendency to move toward and away from one another. It then creates a narrative by stringing together sequences of these hypothesized intentional states and noticing relevant facts about the moments where the players' intentions shift. Armed with this information, the system can reliably classify and differentiate a number of playground games: Tag, Smear, Keepaway and Catch. Furthermore, by comparing the raw data to the narrative sequences it develops to describe what it saw, the system can infer some of the hidden rules of the games it sees. For instance, it works out by itself that in Tag, a person can become the new "it" only when the old "it" approaches close enough to tag.

II. RELATED WORK

Our approach begins with the pioneering work of Heider and Simmel more than half a century ago [1]. They found that even simple moving geometric shapes are often perceived in animate, goal-directed terms. Humans tend to attribute goals, roles, social relationships, histories and intentions to such shapes, even when the available context is extremely impoverished. This is notable because such experiences seem to reflect automatic (and even irresistible) visual processing and are insulated to some degree from other aspects of cognition. Such percepts seem relatively unaffected by perceivers' intentions and beliefs, but are tightly controlled by subtle aspects of the displays themselves (for a review, see [2]). In addition, such percepts seem to occur cross-culturally [3], and even in infancy [4], but they can be disrupted by neuropsychological conditions such as autism spectrum disorder [5], [6] and amygdala damage [7].

Perhaps because such phenomena seem to lie at an interesting intersection of perception and cognition, they have attracted the interest of cognitive psychologists [8], social psychologists [9], [10], developmental psychologists [11], [12], cognitive neuroscientists [13], vision researchers [14], anthropologists [3] and computer scientists [15]. A central goal of this work has been to identify the visual cues that trigger the perception of animacy, such as sudden direction and speed changes [14], interactions with spatial contexts [16], apparent violations of Newtonian mechanics [17], or other characteristic movements [18].

The specific analysis undertaken by our system – hypothesizing vectors of attraction and repulsion between agents and objects in the world in order to explain the causal relationships we note in an interaction – relates to the dynamics-based model of causal representation proposed by Wolff [19] and to Talmy's theory of force dynamics [20]. Humans can explain many events and interactions by invoking a folk-physics notion of force vectors acting upon objects and agents; our system explicitly generates these systems of forces in order to make sense of the events it witnesses.

The system discussed in this paper carries forward research we reported last year [21].
That work demonstrated that an analysis based on motion trajectories could recognize events in agreement with human evaluations of the same data, an insight that the current research has leveraged into a much more sophisticated system capable of evaluating a variety of scenarios, extracting important narrative elements, making subtle comparisons and generating meaningful interpretations in an unsupervised fashion.

III. SETUP

A. Detecting Motion

One of the major strengths of our approach is that the trajectory information upon which the system operates derives from real-world human interaction in an ethologically valid context. To obtain this data, we employ a system of ultrasonic beacons, sensors and triangulation software to track the locations of the game participants and objects to within a few centimeters, generating five position reports per sensor per second – fast enough to create fairly smooth motion trajectories at normal human running speeds, and to notice important events in significantly less than a second. These beacons send messages to one another using a simultaneous radio broadcast and ultrasound chirp, and the receiving units calculate distance by comparing the difference in arrival times between the two signals. The system then triangulates the position of each participant from sensors in known locations throughout the room; each participant wore a baseball cap affixed with a uniquely-identified sensor. (For a complete description of the sensor hardware, see [22]; the embedded control and data analysis software are our own.) The sensor system produces a five-dimensional vector of data: the identity of an object, its x, y and z coordinates in the room, and the time at which these coordinates were measured. Further details of the operation and deployment of the network can be found in [21].

B. Characterizing Attraction and Repulsion

The basic narrative building blocks employed by our system are sets of hypothesized attractions and repulsions derived from a folk-physical interpretation of the observed data, as suggested by the social force-dynamic theory described in [19] and [20]. For each agent and object in the observed environment, the system calculates the "influence" of the other people and objects on its perceived two-dimensional motion, expressed as constants in a pair of differential equations:

    V^n_{x_i} = \frac{c_{x_j}(x^n_j - x^n_i)}{d^n_{ij}} + \frac{c_{x_k}(x^n_k - x^n_i)}{d^n_{ik}} + \cdots    (1)

(and similarly for the y dimension). We obtain the (noisy) velocities in the x and y directions, as well as the positions of the other agents and objects, directly from the sensor data:

    V^n_{x_i} = \frac{x^{n+1}_i - x^n_i}{t^{n+1} - t^n}    (2)

(again, also for the y dimension). Here, V^n_{x_i} represents the x component of agent i's velocity at time n. x^n_i, x^n_j and x^n_k are the x coordinates of agents i, j and k respectively, at time n. Likewise, d^n_{ij} and d^n_{ik} are the Euclidean distances between i and j or i and k at time n. This results in an underconstrained set of equations; to solve for the constants, we collect all of the data points falling within a short window of time and find a least-squares best fit. For further details, see [21]. Given the number of participants in our evaluation scenarios and the 5 Hz rate at which our sensor network furnished position reports, this window needed to be 0.5 seconds long – still a source of temporal inaccuracy, but much improved over the two seconds reported in our previous work.
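For concreteness, a minimal sketch of this windowed fit follows (an illustration rather than our deployed implementation; the data layout and names are assumed, and for brevity it fits a single constant per pair shared across the x and y dimensions, where the full system fits each dimension separately):

    import numpy as np

    def fit_influence_constants(window, agent):
        # window: list of (t, frame) pairs, where frame maps an entity id
        # to its (x, y) position; agent is the entity whose motion we explain.
        ts = [t for t, _ in window]
        frames = [f for _, f in window]
        others = [e for e in frames[0] if e != agent]
        rows, targets = [], []
        for n in range(len(frames) - 1):
            dt = ts[n + 1] - ts[n]
            p_i = np.asarray(frames[n][agent], dtype=float)
            v_i = (np.asarray(frames[n + 1][agent], dtype=float) - p_i) / dt  # Eq. (2)
            for dim in (0, 1):  # one equation for x, one for y
                row = []
                for j in others:
                    p_j = np.asarray(frames[n][j], dtype=float)
                    d = max(np.linalg.norm(p_j - p_i), 1e-6)  # Euclidean distance d_ij
                    row.append((p_j[dim] - p_i[dim]) / d)
                rows.append(row)
                targets.append(v_i[dim])
        # Stack the windowed equations of the form (1) and solve by least squares.
        c, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(targets), rcond=None)
        return dict(zip(others, c))  # c > 0 suggests attraction, c < 0 repulsion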
C. Identifying Intentions Over Time

Each constant determined by the process described above represents in some fashion the influence of one particular agent or object on the motion of another at a particular point in time. Some of these may be spurious relationships, while others capture something essential about the motivations and intentions of the agents involved. To determine the long-term relationships that do represent essential motivational information, we next assemble these basic building blocks – the time-stamped pairwise constants that describe instantaneous attraction and repulsion between each agent and object in the room – into a probabilistic finite state automaton, each state representing a set of intentions that extend over time. At any particular point in time, any particular agent may be attracted or repelled or remain neutral with respect to each other object and agent in the room; this is characterized by the pairwise constants found in the previous step. The system assumes that the actors in the room remain in a particular intentional state as long as the pattern of hypothesized attractions, repulsions and neutralities remains constant, discounting noise. A particular state, then, might be that A is attracted by B and neutral toward C, B is repelled by A and neutral toward C, and C is repelled by A and neutral toward B. This state might occur, for instance, in the game of tag when A is it and has decided to chase B.

[Fig. 1. The system's level of belief in a few intentional states, evolving as new information arrives. At time n−1, the system believes that A is intending to chase B, while B and C are fleeing from A. At time n, new data arrives that shows A continuing to chase B, but C moving sideways. Accordingly, the system's belief that B and C both want to escape from A declines (top circles), while the belief that C is neutral (middle circles) increases. More of the same kind of data would have to come in before the system would switch its belief to the middle-circle state, at which point it would review its observational history to determine the point where C actually stopped running from A. The bottom circles represent B and C chasing A, while A wants to evade B – a situation that the system currently views as very unlikely.]

The system maintains an evolving set of beliefs about the intentions of the people it observes, modeled as a probability distribution over all of these possible states. As new data comes in, the current belief distribution is adjusted, and the system assumes that the most likely alternative reflects the current state of the game:

    Bel^n(S) = \frac{Bel^{n-1}(S)\,\bigl(1 + \lambda \sum_{c \in S} s(c^n)\bigr)}{Z}    (3)

Here, the belief in any particular state S at time n is the belief in that state at time n − 1, modified by the current observation. c^n is the value at time n of one of the pairwise relationship constants derived from the data in the previous step; the function s is a sign function that returns 1 if the constant's sign and the intention represented by the current state agree, −1 if they disagree, and 0 if the state is neutral toward the pairwise relationship represented by the constant. λ is a "learning rate" constant which affects the tradeoff between the system's sensitivity to error and its decision-making speed. Through trial and error, we found a value of λ = 0.08 to yield the best results. Finally, Z is a normalizing constant obtained by summing the updated belief values across all of the states in the system. For an example of how the beliefs of a few of these states evolve, see Figure 1.
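One round of the update in Eq. 3 can be sketched as follows (again illustrative, with assumed data structures):

    LAMBDA = 0.08  # the learning rate arrived at by trial and error

    def update_beliefs(beliefs, states, constants):
        # beliefs:   state id -> current degree of belief (sums to 1)
        # states:    state id -> {(i, j): intent}, where intent is +1 if the
        #            state holds that i is attracted to j, -1 if repelled,
        #            and 0 if the state is neutral about the pair
        # constants: (i, j) -> pairwise constant from the latest data window
        updated = {}
        for sid, state in states.items():
            vote = 0
            for pair, c in constants.items():
                intent = state.get(pair, 0)
                if intent == 0:
                    continue  # neutral pairs never vote; normalization handles them
                vote += 1 if (c > 0) == (intent > 0) else -1
            updated[sid] = beliefs[sid] * (1 + LAMBDA * vote)
        Z = sum(updated.values())  # the normalizing constant of Eq. 3
        return {sid: b / Z for sid, b in updated.items()}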
The function s depends on the signs, not the values, of the constants obtained in the previous step. It is perfectly straightforward to adjust the function to take account of the magnitudes of the social forces involved, but we discovered that doing so increased the effect of sensor noise unacceptably, without causing a noticeable improvement in performance. Notice that the system never explicitly handles ambivalence in its computations – the states that accommodate neutral attitudes never update their beliefs based on the value of any constant that may exist between the neutral pair. Instead, the degree of belief in these states changes owing to normalization with other states that are changing based on attraction and repulsion. This works perfectly well, however, because it amounts to a decision to "default" to a neutral state – if our degree of belief in both pairwise attraction and repulsion falls, then naturally our belief in a state of ambivalence between the pair should increase.

This is not an approach that deals easily with large numbers of participants. The number of states in the system depends exponentially on the number of objects and agents being tracked, since each state represents a particular configuration of attractions, repulsions and neutralities between each agent pair. Although the number of states is perfectly manageable with four participants, it will eventually become too computationally complex to represent this way. However, there is no reason to believe that humans are any more capable of keeping track of many people's simultaneous motivations and goals, either. Making sense of crowd interactions may require a whole different set of assumptions, representations and computational machinery.

D. Constructing Narrative

The process described in the preceding section converts instantaneous, noisy velocity vectors into sustained beliefs about the intentional situation that pertains during a particular phase of a witnessed interaction. As the action progresses, so too do the system's beliefs evolve, and as those beliefs change, the sequence of states becomes a narrative describing the scenario in progress. This narrative can be analyzed statistically to identify the action in progress, differentiate it from other possible activities, and also provide the system with clues to use in unsupervised feature detection.

When the system finds that it now believes the action it is watching has changed state, it retroactively updates its beliefs – the point at which the belief changed is not the actual time point at which the witnessed behavior changed. Rather, several rounds of data collection caused the change in belief, so if the system is to reason about an actual event that changed the participants' intentional state, it must backtrack to the point where the new data began to arrive. This it accomplishes by looking at the derivative of the beliefs in the two states. The system assumes that the actual event occurred at

    \max_t \left( \frac{d}{dt} Bel_t(S_{old}) \ge 0 \;\vee\; \frac{d}{dt} Bel_t(S_{new}) \le 0 \right)    (4)

that is, the most recent time point when either the system's belief in the old state was not decreasing or the system's belief in its new state was not increasing.
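This backtracking step can be sketched over a recorded belief history (data layout assumed):

    def backtrack_event_time(history, old_state, new_state):
        # history: list of (t, beliefs) pairs in time order, where beliefs
        # maps a state id to its degree of belief at time t.
        # Walk backwards to the most recent time at which the old state's
        # belief was not yet falling or the new state's not yet rising (Eq. 4).
        for k in range(len(history) - 1, 0, -1):
            t_prev, b_prev = history[k - 1]
            t, b = history[k]
            d_old = (b[old_state] - b_prev[old_state]) / (t - t_prev)
            d_new = (b[new_state] - b_prev[new_state]) / (t - t_prev)
            if d_old >= 0 or d_new <= 0:
                return t
        return history[0][0]  # the trend extends to the start of the record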
The system can now string these states and durations into an actual (admittedly dull) narrative: "State 13 for 4.6 seconds, then state 28 for 3.9 seconds, then state 4 for 7.3 seconds..." It can translate the states into actual English: "A was chasing B for 4.6 seconds while C was running from A, then B was chasing C for 3.9 seconds while A stood by..." It can collect statistics about which states commonly follow which others (a prerequisite for developing the ability to recognize distinct activities). And it has identified points in time where important events take place, which will allow the system to notice information about the events themselves.

E. Differentiating Activity

The system has now constructed a narrative: an ordered sequence of states representing the hypothesized intentions of the people participating. These intentions ebb and flow according to the progress of the game and the shifting roles of the actors as they follow the rules of their particular game. How should we evaluate this narrative to determine whether it amounts to anything we would recognize as understanding the game? One way, tentatively explored in [21], is to compare the narrative with ones constructed by human observers of the same events. This is an attractive approach, as it directly compares human and machine performance, but usually only in qualitative fashion. In this work, we instead take a statistical approach and determine whether the generated narrative suffices for the system to distinguish easily between different games. At the same time, the system should be able to recognize that the narratives it constructs for different instances of the same game are sufficiently similar. To do this, we treat the narratives (perhaps naturally) as linguistic constructions, and apply similarity metrics most often used in computational linguistics and automated language translation. In this work, we use Dice's coefficient [23], which compares how often any particular word (or, in our case, intentional state) follows any other in two separate texts (or narratives):

    s = \frac{2 n_t}{n_x + n_y}    (5)

Here, n_t is the number of bigrams (one particular state followed by another) that appear in both narratives, while n_x and n_y are the numbers of bigrams in the first and second narratives, respectively. As an example, take a comparison between two games of Tag. One has 21 unique state transitions of the kind suggested in Figure 1, such as a transition from believing that C is fleeing from A to one where C is neutral toward A. Another game has 26 transitions, and the two games share 20 state transitions in common. In this case, the coefficient would be 2(20)/(21 + 26) ≈ 0.85. This is obviously an extremely simple heuristic, and indeed Dice's coefficient does not perform as well as state-of-the-art statistical comparisons for linguistic analysis [24], but it has the advantage of simplicity and does not rely on a large narrative corpus to function.
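In code, the comparison is compact (a sketch; narratives are assumed to be simple lists of state identifiers):

    def dice_similarity(narrative_a, narrative_b):
        # Collect the unique state-transition bigrams of each narrative.
        bigrams_a = set(zip(narrative_a, narrative_a[1:]))
        bigrams_b = set(zip(narrative_b, narrative_b[1:]))
        shared = bigrams_a & bigrams_b
        # Eq. 5: twice the shared bigram count over the two narratives' totals.
        return 2 * len(shared) / (len(bigrams_a) + len(bigrams_b))

With 21 and 26 unique transitions sharing 20 in common, this returns 40/47 ≈ 0.85, matching the worked example above.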
F. Unsupervised Rule Recognition

Building step by step – from applying force dynamics to real-world data to create hypotheses about individual intentional states, through establishing a probabilistic belief about the sequence of intentional states in a particular narrative, to comparing and contrasting narratives to find similarities and differences – our system has by now developed a fairly sophisticated and rich understanding of the social framework of the activities that it has watched. Furthermore, it possesses the tools to make its own educated guesses at contextual elements separate from the trajectory information from which it built its narratives. We demonstrate this by letting the system look at relative position information (not relative motion, which is what it has used up to this point) and allowing it to form its own hypotheses about whether position is significant to the rules of the games it observes.

The system has already identified the moments where the intentional states of the actors change during gameplay. Why do these changes happen? Something in the rules of the game, most likely – and something to do with relative or absolute position is a good guess. Associated with each particular change in intentional state (the state bigrams discussed in Section III-E), the system collects statistics on the absolute position of each actor and the relative distances between them all. If the mean of the relative distances is small (say, µ < 30 cm) or, likewise, if the standard deviation of the absolute positions is small (σ < 30 cm), then it reports that it has found a potential rule, such as: in this game, and similar ones, the roles of chaser and chased switch only when the two come into close contact. In other words, Tag!
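A sketch of this check (the thresholds are those quoted above; the data structures are assumptions for illustration):

    import statistics

    NEAR = 30.0  # cm; threshold for "small" means and standard deviations

    def find_location_rules(transition_stats):
        # transition_stats maps a state bigram (old, new) to the measurements
        # gathered at every occurrence of that transition:
        #   stats["dists"][(i, j)] -> list of relative distances between i and j
        #   stats["positions"][i]  -> list of (x, y) absolute positions of i
        rules = []
        for bigram, stats in transition_stats.items():
            for pair, dists in stats["dists"].items():
                if statistics.mean(dists) < NEAR:
                    rules.append((bigram, "close contact", pair))
            for actor, points in stats["positions"].items():
                sx = statistics.pstdev([p[0] for p in points])
                sy = statistics.pstdev([p[1] for p in points])
                if max(sx, sy) < NEAR:
                    rules.append((bigram, "stationary", actor))
        return rules

Applied to Tag, the "close contact" branch fires for the chaser-swap transitions, the pattern reported later in Table III.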
IV. EXPERIMENTAL VALIDATION

A. Games

To evaluate the performance of the system, we instructed test subjects to play several common playground games for periods of three minutes at a time. The sensor network tracked four objects during each period of play; games were played either by four people or by three people and a ball. The system had no independent knowledge of which objects were human and which were inanimate. Six trials of each game were played, three each by two independent groups of people (undergraduate and graduate students). The games were:

• Tag: The person designated as "it" chases and tags one of the other players, at which point that player becomes "it" and the roles switch (Figure 2).
• Smear: One of the players carries a ball while the others attempt to catch and steal the ball for themselves. Whoever carries the ball is the target of everyone else in the game.
• Keepaway: Two players throw the ball back and forth, while a third tries to intercept the passes. If successful, the interceptor becomes a thrower and the person responsible for throwing the bad pass must now attempt to intercept (Figure 3).
• Catch: Each player throws the ball to one of the other two players in turn.

[Fig. 2. A four-person game of Tag, plotted as x and y position (in cm) over time (in sec) for Players A–D. All four people are constantly in motion, looping about the room as they chase and are chased in turn.]

[Fig. 3. A three-person game of Keepaway, plotted as x and y position (in cm) over time (in sec) for Players A–C and the ball. The ball is constantly in motion, as is one of the players, while the other two players are relatively stationary. The identity of the moving player changes several times during the game.]

B. Activity Differentiation

TABLE I. SIMILARITY METRICS (DICE'S COEFFICIENT)

              Tag     Smear   Keepaway   Catch
    Tag       0.80    0.62    0.38       0.28
    Smear     0.62    0.72    0.54       0.30
    Keepaway  0.38    0.54    0.66       0.66
    Catch     0.28    0.30    0.66       0.86

For each game, we calculated the similarity metric between it and each of the other games – five other instances of the same game and six of each of the others. Table I shows the averages of these intra- and inter-game similarity metrics. Tag, Smear and Catch were all significantly more similar to themselves than to any other game. The exception, Keepaway, was indistinguishable from Catch. Interestingly, the converse was not true – although an average game of Keepaway is as different from other Keepaway games as from Catch, games of Catch are much more similar to each other than to Keepaway.

TABLE II. CLASSIFICATION RATES

    Tag       85%
    Smear     77%
    Keepaway  53%
    Catch     83%

Table II depicts a performance measure based on classification rates. For each game, we looked at the five games that the system judged most similar and scored how many of those were actually the same game. If performance were perfect, then for each of the six trials of each game the system would pick out all five of the other games of the same type, leading to 30 correct answers per game. In none of the games was the performance this good, but all are well above random (22%). Keepaway once again underperforms. In any event, these results show that the system was indeed able to perceive the intentions of the game participants and construct narratives that captured essential similarities and differences between and among games. Given any two games, by hypothesizing about the intentions of the players and stringing together intentional states into a narrative, the system could usually tell whether the two scenarios told similar stories, and thus whether they were likely different versions of the same game or different games altogether. Interestingly, it was even able to work out that certain games, though different, were more closely related than others – Tag and Smear, which both involve chasing and tagging other people, looked more like each other than like Keepaway and Catch, which both involve tossing a ball around, and vice versa.

C. Rule Recognition

The average number of narrative state transitions detected by the system across the 24 three-minute games was 28 – not a huge corpus to analyze for location-based rules. However, the system already knows which scenarios are similar enough to be considered the same game, enlarging its pool of data significantly. For each game, the system averaged together the absolute and relative location statistics for the five most similar games, found as described in the previous section. The system identified all of the following (by displaying noteworthy state transitions and associated measurements, which the authors have translated into text for illustration):

• Tag and Smear: when the role of chaser switches between players, those two people are close together.
• Smear: the two chased objects are always next to each other (presumably the ball and the ball carrier).
• Catch: three people are always standing in the same place whenever the ball changes direction.

TABLE III. SAMPLE LOCATION STATISTICS FOR TAG (DISTANCES IN CM)

    Transition               µ(d_AB)   µ(d_AC)   µ(d_BC)
    A chasing → B chasing    21.6      104.8     184.3
    A chasing → C chasing    58.3      18.0      112.5
    B chasing → A chasing    28.4      90.5      83.8
    B chasing → C chasing    107.9     141.6     21.4
    C chasing → A chasing    217.9     29.3      191.4
    C chasing → B chasing    106.7     137.1     29.3

Table III shows an abridged example of the relative location data used to make these determinations (for clarity, only three players are shown, not four). For the six games most similar to a particular exemplar of Tag, and for every type of intentional state transition, the table depicts the average distance (in cm) between the three players. In every case, the two players who are swapping roles are located close to one another.
V. CONCLUSION

Humankind, it has been suggested [25], should more appropriately be called Homo narrans: storytelling man. We are driven to tell stories to make sense of the world, even or perhaps especially when we have only sparse information to go on. We have built a computational system that begins to demonstrate how a rich set of inferences and conclusions can be drawn from rudimentary motion data, leading to largely correct judgments about activity classification and rule inference. To do this, it hypothesizes about human intentions and constructs narratives around real-world social interaction. The better the story it can tell, the more powerful the conclusions it can draw about the activities it sees.

We are working to expand the range of activity and the detail of the narrative structure available to our machines. We have demonstrated the system's ability to pick out important rules governing the interactions it witnesses, but so far those judgments are made entirely after the fact. We hope the system will soon be able to incorporate this information to improve its own process of narrative construction – for example, if the system has begun to guess that touching is an important component of Tag, then it should expect touching at particular points, and that expectation should affect the evolution of its beliefs about the state of the game. Furthermore, a rigorous treatment of narrative expectation should also help to wean the system away from the abiological sensor systems we currently use. If we have good guesses about where to look next, the visual object-tracking problem may become a little easier. Finally, we plan to integrate this system more closely with robotic participants in the environment. If there is anything people enjoy more than telling stories about other people, it is telling stories that include themselves. The same may be true of robots.

ACKNOWLEDGMENT

Support for this work was provided by a National Science Foundation CAREER award (#0238334) and NSF award #0534610 (Quantitative Measures of Social Response in Autism). This research was supported in part by a software grant from QNX Software Systems Ltd, and by the Sloan Foundation.

REFERENCES

[1] F. Heider and M. Simmel, "An experimental study of apparent behavior," American Journal of Psychology, vol. 57, pp. 243–259, 1944.
[2] B. J. Scholl and P. D. Tremoulet, "Perceptual causality and animacy," Trends in Cognitive Sciences, vol. 4, no. 8, pp. 299–309, 2000.
[3] H. C. Barrett, P. M. Todd, G. F. Miller, and P. W. Blythe, "Accurate judgments of intention from motion cues alone: A cross-cultural study," Evolution and Human Behavior, vol. 26, pp. 313–331, 2005.
[4] P. Rochat, T. Striano, and R. Morgan, "Who is doing what to whom? Young infants' developing sense of social causality in animated displays," Perception, vol. 33, pp. 355–369, 2004.
[5] A. Klin, "Attributing social meaning to ambiguous visual stimuli in higher functioning autism and Asperger syndrome: The social attribution task," Journal of Child Psychology and Psychiatry, vol. 41, pp. 831–846, 2000.
[6] M. D. Rutherford, B. F. Pennington, and S. J. Rogers, "The perception of animacy in young children with autism," Journal of Autism and Developmental Disorders, vol. 36, pp. 983–992, 2006.
[7] A. S. Heberlein and R. Adolphs, "Impaired spontaneous anthropomorphizing despite intact perception and social knowledge," Proceedings of the National Academy of Sciences, vol. 101, pp. 7487–7491, 2004.
[8] G. Gigerenzer and P. M. Todd, Simple Heuristics That Make Us Smart. Oxford University Press, 1999.
[9] R. A. Mar and C. N. Macrae, "Triggering the intentional stance," in Empathy and Fairness, G. Bock and J. Goode, Eds. Chichester, UK: John Wiley & Sons, 2006, pp. 110–132.
[10] T. Wheatley, S. C. Milleville, and A. Martin, "Understanding animate agents: Distinct roles for the social network and mirror system," Psychological Science, vol. 18, pp. 469–474, 2007.
[11] V. Dasser, I. Ulbaek, and D. Premack, "The perception of intention," Science, vol. 243, pp. 365–367, 1989.
[12] G. Gergely, Z. Nadasdy, G. Csibra, and S. Biro, "Taking the intentional stance at 12 months of age," Cognition, vol. 56, pp. 165–193, 1995.
[13] S.-J. Blakemore, P. Boyer, M. Pachot-Clouard, A. Meltzoff, C. Segebarth, and J. Decety, "The detection of contingency and animacy from simple animations in the human brain," Cerebral Cortex, vol. 13, pp. 837–844, 2003.
[14] P. D. Tremoulet and J. Feldman, "Perception of animacy from the motion of a single object," Perception, vol. 29, pp. 943–951, 2000.
[15] V. Gaur and B. Scassellati, "Which motion features induce the perception of animacy?" in Proceedings of the 2006 IEEE International Conference on Development and Learning, Bloomington, Indiana, 2006.
[16] P. D. Tremoulet and J. Feldman, "The influence of spatial context and the role of intentionality in the interpretation of animacy from motion," Perception & Psychophysics, vol. 68, no. 6, pp. 1047–1058, 2006.
[17] R. Gelman, F. Durgin, and L. Kaufman, "Distinguishing between animates and inanimates: Not by motion alone," in Causal Cognition: A Multidisciplinary Debate. Oxford: Clarendon Press, 1995, pp. 150–184.
[18] A. Michotte, "The emotions regarded as functional connections," in Feelings and Emotions: The Mooseheart Symposium. New York: McGraw-Hill, 1950, pp. 114–125.
[19] P. Wolff, "Representing causation," Journal of Experimental Psychology, vol. 136, pp. 82–111, 2007.
[20] L. Talmy, "Force dynamics in language and cognition," Cognitive Science, vol. 12, pp. 49–100, 1988.
[21] C. Crick, M. Doniec, and B. Scassellati, "Who is it? Inferring role and intent from agent motion," in Proceedings of the 6th IEEE International Conference on Development and Learning, 2007.
[22] N. B. Priyantha, "The Cricket indoor location system," Ph.D. dissertation, Massachusetts Institute of Technology, 2005.
[23] L. R. Dice, "Measures of the amount of ecologic association between species," Ecology, vol. 26, pp. 297–302, 1945.
[24] F. J. Och and H. Ney, "Improved statistical alignment models," in Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 2000.
[25] J. D. Niles, Homo Narrans: The Poetics and Anthropology of Oral Literature. University of Pennsylvania Press, 1999.