Syllabus
Declarative Knowledge, Inferential Knowledge, Issues in Knowledge Representation, Frame Problem, Forward Vs. Backward Reasoning. First Order Predicate Logic, Propositional Logic, Predicate Logic, Inference Rules, Resolution, Unification, Structured Knowledge Representation, Semantic Network, Conceptual Graph, Frame Structure, Conceptual Dependency, Script. AI Problem, Problem Space and Search, Means-End Analysis, Breadth First Search, Depth First Search, Hill Climbing Search, Best First Search, A* Algorithm. Learning, Rote Learning, Learning by Taking Advice, Learning by Problem Solving, Inductive Learning, Explanation Based Learning. Expert System, Application Areas of Expert Systems, Structure of Expert Systems, Characteristics of Expert Systems, MYCIN Case Study. Fuzzy Logic, Memory Organisation, Neural Network, Genetic Algorithm, Matching.
TABLE OF CONTENTS
UNIT 1 : Introduction to AI
1.1 What is Artificial Intelligence?
1.2 Is AI Possible?
1.3 Some AI Tasks
1.4 What Can We Do with AI?
1.5 AI Techniques
1.5.1 Knowledge Representation
1.5.2 Search
1.6 The Underlying Assumption
UNIT 2 : Knowledge Representation
2.1 What to Represent?
2.2 Application of Knowledge in AI
2.3 Properties for Knowledge Representation Systems
2.4 Approaches to Knowledge Representation
2.4.1 Simple Relational Knowledge
2.4.2 Inheritable Knowledge
2.5 Inferential Knowledge
2.6 Procedural Vs. Declarative Knowledge
2.7 Issues in Knowledge Representation
2.8 The Frame Problem
2.9 Forward Vs. Backward Reasoning
UNIT 3 : First Order Predicate Logic
3.1 Logic
3.2 Introduction to Propositional Logic
3.3 Predicate Logic
3.3.1 Introduction to Predicate Logic
3.3.2 Predicate Logic: Semantics
3.4 Quantification
3.5 Well-Formed Formula for First Order Predicate Logic
3.6 From Wff to Proposition
3.7 Transcribing English to Predicate Logic Wffs
3.8 Properties of Statements
3.9 Inference Rules
3.10 Resolution
3.11 Conversion to Clausal Form
3.12 Unification
UNIT 4 : Structured Knowledge Representation
4.1 Semantic Network
4.2 Conceptual Graph
4.3 Frame Structures
4.4 Conceptual Dependency
4.5 Scripts
UNIT 5 : Problem, Problem Space and Search
5.1 Search and Control Strategies
5.2 Preliminary Concepts
5.3 Water Container Problem
5.4 Production System
5.5 Problem Characteristics
5.6 Means-End Analysis
5.7 Problem Reduction
5.8 Uninformed or Blind Search
5.8.1 Breadth-First Search
5.8.2 Depth-First Search
5.9 Informed Search
5.9.1 Hill Climbing Methods
5.9.2 Best First Search
5.9.3 A* Algorithm
UNIT 6 : Learning
6.1 What is Learning?
6.2 Types of Learning
6.2.1 Rote Learning
6.2.2 Learning by Taking Advice
6.2.3 Learning by Problem Solving
6.2.4 Inductive Learning
6.2.5 Explanation Based Learning
UNIT 7 : Expert System
7.1 What is an Expert System?
7.2 Expert System Application Areas
7.3 Expert System Structure
7.4 Expert System Characteristics
7.5 Conventional Vs. Expert Systems
7.6 Participants in Expert Systems Development
7.7 Tools for Development of Expert Systems
7.8 MYCIN
UNIT 8 : Matching and Reasoning
8.1 Fuzzy Logic
8.1.1 What is Fuzziness?
8.1.2 Current Applications of Fuzzy Logic
8.1.3 Overview of Fuzzy Logic
8.1.4 Fuzzy Sets
8.1.5 Hedges
8.1.6 Fuzzy Set Operations
8.1.7 Fuzzy Inference
8.2 Memory Organisation
8.3 Neural Networks and Parallel Computation
8.3.1 Neural Network Architectures
8.4 Genetic Algorithm
8.5 Matching
8.5.1 Variable Matching
UNIT 1
INTRODUCTION TO AI
1.1 What is Artificial Intelligence?
1.2 Is AI Possible?
1.3 Some AI Tasks
1.4 What Can We Do with AI?
1.5 AI Techniques
1.5.1 Knowledge Representation
1.5.2 Search
1.6 The Underlying Assumption
1.1 What is Artificial Intelligence?
Intelligence encompasses all of the knowledge and feats, both conscious and unconscious, which we have acquired through study and experience: highly refined sight and sound perception; thought; imagination; the ability to converse, read, write, drive a car, memorize and recall facts, express and feel emotions; and much more. Intelligence is the integrated sum of those feats, which gives us the ability to remember a face not seen for thirty or more years, or to build and send rockets to the moon. It is those capabilities which set Homo sapiens apart from other forms of living things. And as we shall see, the food for this intelligence is knowledge.
Can we ever expect to build systems which exhibit these characteristics? The answer to this question is yes! Today, with the advent of the computer and 50 years of research into AI programming techniques, the dream of smart machines is becoming a reality. Researchers are creating systems which can mimic human thought, understand speech, beat the best human chess player, and perform countless other feats never before possible.
"It is not my aim to surprise or shock you--but the simplest way I can summarize is to say that there are now in the world machines that can think, that can learn and that can create. Moreover, their ability to do these things is going to increase rapidly until--in a visible future--the range of problems they can handle will be coextensive with the range to which the human mind has been applied." --Herbert Simon
In spite of these impressive achievements, we still have not been able to produce a coordinated, autonomous system which possesses some of the basic abilities of a three-year-old child. These include the ability to recognize and remember numerous diverse objects in a scene, to learn new sounds and associate them with objects and concepts, and to adapt readily to many diverse new situations. These are the challenges now facing researchers in AI, and they are not easy ones. They will require important breakthroughs before we can expect to equal the performance of our three-year-old.
1.2 Is AI Possible?
Before we embark on a course in Artificial Intelligence, we should consider for a moment whether automating intelligence is really possible! Artificial intelligence research makes the assumption that human intelligence can be reduced to the (complex) manipulation of symbols, and that it does not matter what medium is used to manipulate these symbols - it does not have to be a biological brain! This assumption does not go unchallenged among philosophers. Some argue that true intelligence can never be achieved by a computer, but requires some human property which cannot be simulated. There are endless philosophical debates on this issue (some on comp.ai.philosophy), brought to public attention again recently in Penrose's book. The most well known contributions to the philosophical debate are Turing's ``Turing test'' paper, and Searle's ``Chinese room''. Very roughly, Turing considered how you would be able to conclude that a machine was really intelligent. He argued that the only reasonable way was to do a test. The test involves a human communicating with another human and with a computer in other rooms, using a computer for the communication. The first human can ask the other human/computer any questions they like, including very subjective questions like ``What do you think of this poem?''. If the computer answers so well that the first human can't tell which of the two others is human, then we say that the computer is intelligent.
Searle argued that just behaving intelligently wasn't enough. He tried to demonstrate this by suggesting a thought experiment (the ``Chinese room''). Imagine that you don't speak any Chinese, but that you have a huge rule book which allows you to look up Chinese sentences and tells you how to reply to them in Chinese. You don't understand Chinese, but can behave in an apparently intelligent way. He claimed that computers, even if they appeared intelligent, wouldn't really be, as they'd be just using something like the rule book of the Chinese room. Many people go further than Searle, and claim that computers will never even be able to appear to be really intelligent (so will never pass the Turing test). There are therefore a number of positions that you might adopt:
• Computers will never even appear to be really intelligent, though they might do a few useful tasks that conventionally require intelligence.
• Computers may eventually appear to be intelligent, but in fact they will just be simulating intelligent behavior, and not really be intelligent.
• Computers will eventually be really intelligent.
• Computers will not only be intelligent, they'll be conscious and have emotions.
My view is that, though computers can clearly behave intelligently in performing certain limited tasks, full intelligence is a very long way off and hard to imagine (though I don't see any fundamental reason why a computer couldn't be genuinely intelligent). However, these philosophical issues rarely impinge on AI practice and research. It is clear that AI techniques can be used to produce useful programs that conventionally require human intelligence, and that this work helps us understand the nature of our own intelligence. This is as much as we can expect from AI for now, and it still makes it a fascinating topic!
1.5 AI Techniques
There are various techniques that have evolved that can be applied to a variety of AI tasks; these will be the focus of this course. These techniques are concerned with how we represent, manipulate and reason with knowledge in order to solve problems:
• Knowledge Representation
• Search
1.5.1 Knowledge Representation
Solving AI problems requires both knowledge of the objects in the domain and knowledge of how to reason in that domain; both these types of knowledge must be represented. Knowledge must be represented efficiently, and in a meaningful way. Efficiency is important, as it would be impossible (or at least impractical) to explicitly represent every fact that you might ever need. There are just so many potentially useful facts, most of which you would never even think of. You have to be able to infer new facts from your existing knowledge, as and when needed, and capture general abstractions which represent general features of sets of objects in the world.
Knowledge must be meaningfully represented so that we know how it relates back to the real world. A knowledge representation scheme provides a mapping from features of the world to a formal language. (The formal language will just capture certain aspects of the world which we believe are important to our problem; we may of course miss out crucial aspects and so fail to really solve our problem, like ignoring friction in a mechanics problem.) Anyway, when we manipulate that formal language using a computer we want to make sure that we still have meaningful expressions which can be mapped back to the real world. This is what we mean when we talk about the semantics of representation languages.
In other words, we can say that an AI technique is a method that exploits knowledge that should be represented in such a way that:
• The knowledge captures generalizations. In other words, it is not necessary to represent separately each individual situation. Instead, situations that share important properties are grouped together. If knowledge does not have this property, inordinate amounts of memory and updating will be required. So we usually call something without this property data rather than knowledge.
• It can be understood by people who must provide it. Although for many programs the bulk of the data can be acquired automatically (for example, by taking readings from a variety of instruments), in many AI domains most of the knowledge a program has must ultimately be provided by people in terms they understand.
• It can easily be modified to correct errors and to reflect changes in the world and in our world view.
• It can be used in a great many situations even if it is not totally accurate or complete.
• It can be used to help overcome its own sheer bulk by helping to narrow the range of possibilities that must usually be considered.
1.5.2 Search
Another crucial general technique required when writing AI programs is search. Often there is no direct way to find a solution to some problem. However, you do know how to generate possibilities. For example, in solving a puzzle you might know all the possible moves, but not the sequence that would lead to a solution. When working out how to get somewhere you might know all the roads/buses/trains, just not the best route to get you to your destination quickly. Developing good ways to search through these possibilities for a good solution is therefore vital. Brute force techniques, where you generate and try out every possible solution may work, but are often very inefficient, as there are just too many possibilities to try. Heuristic techniques are
often better, where you only try the options which you think (based on your current best guess) are most likely to lead to a good solution. Although AI techniques must be designed in keeping with the constraints imposed by AI problems, there is some degree of independence between problems and problem-solving techniques. It is possible to solve an AI problem without using AI techniques (although those solutions are not likely to be very good). And it is possible to apply AI techniques to the solution of non-AI problems. This is likely to be a good thing to do for problems that possess many of the same characteristics as AI problems do.
A knowledge representation language should have at least the following features:
• It should allow you to express the knowledge you wish to represent in the language. For example, suppose you want to represent the fact that ``Richard knows how old he is''. This turns out to be difficult to express in some languages.
• It should allow new knowledge to be inferred from a basic set of facts, as discussed above.
• It should be clear, and have a well-defined syntax and semantics. We want to know what the allowable expressions are in the language, and what they mean. Otherwise we won't be sure if our inferences are correct, or what the results mean. For example, if we have a fact grey(elephant) we want to know whether it means all elephants are grey, some particular one is grey, or what.
Some of these features may be present in recent non-AI representation languages, such as deductive and object-oriented databases. In fact, these systems have been influenced by early AI research on knowledge representation, and there is some promise of further cross-fertilization of ideas, to allow robust, multi-user knowledge/data bases with well-defined semantics and flexible representation and inference capabilities. However, at present the fields are still largely separate, and we will only be discussing basic AI approaches here.
Broadly speaking, there are three main approaches to knowledge representation in AI. The most important is arguably the use of logic. A logic, almost by definition, has a well-defined syntax and semantics, and is concerned with truth-preserving inference. However, using logic to represent things has problems. On the one hand, it may not be very efficient - if we just want a very restricted class of inferences, we may not want the full power of a logic-based theorem prover, for example. On the other hand, representing some common-sense things in a logic can be very hard. For example, in first order predicate logic we can't conclude that something is true one minute, and then later decide that it is not true after all. If we did this it would lead to a contradiction, from which we could prove anything at all! We could decide to use more complex logics which allow this kind of reasoning - there are all sorts of logics out there, such as default logics, temporal logics and modal logics. However, another approach is to abandon the constraints that the use of a logic imposes and use a less clean, but more flexible, knowledge representation language. Two such ``languages'' are structured objects and production systems.
The idea of structured objects is to represent knowledge as a collection of objects and relations, the most important relations being the subclass and instance relations. The subclass relation (as you might expect) says that one class is a subclass of another, while the instance relation says that some individual belongs to some class. We'll use them so that ``X subclass Y'' means that X is a subclass of Y, not that X has a subclass Y. (Some books/approaches use the relation isa to refer to the subclass relation.) So Fred Bloggs is an instance of the class representing AI3 students, while the class of AI3 students is a subclass of the class of third year students. We can then define property inheritance, so that, by default, Fred inherits all the typical attributes of AI3 students, and AI3 students inherit typical attributes of 3rd year students. We'll go into this in much more detail below. Production systems consist of a set of if-then rules, and a working memory.
The working memory represents the facts that are currently believed to hold, while the if-then rules typically state that if certain conditions hold (e.g. certain facts are in the working memory), then some action should be taken (e.g., other facts should be added or deleted). If the only action allowed is to add a fact to working memory then rules may be essentially logical implications, but
generally greater flexibility is allowed. Production rules capture (relatively) procedural knowledge in a simple, modular manner.
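As a concrete illustration of the working-memory/rule cycle just described, here is a minimal production system sketch in Python. The facts and rules are invented for illustration, and the only action allowed is to add a new fact; rules keep firing until nothing new can be added.

# A minimal sketch of a production system: a working memory of facts plus
# if-then rules whose only action here is to add a new fact.
rules = [
    ({"it is raining"}, "the ground is wet"),
    ({"the ground is wet", "temperature below zero"}, "the ground is icy"),
]
working_memory = {"it is raining", "temperature below zero"}

changed = True
while changed:                      # keep firing rules until nothing new is added
    changed = False
    for conditions, action in rules:
        if conditions <= working_memory and action not in working_memory:
            working_memory.add(action)   # the rule "fires"
            changed = True

print(working_memory)
# {'it is raining', 'temperature below zero', 'the ground is wet', 'the ground is icy'}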
UNIT 2
KNOWLEDGE REPRESENTATION
2.1 What to Represent?
2.2 Application of Knowledge in AI
2.3 Properties for Knowledge Representation Systems
2.4 Approaches to Knowledge Representation
2.4.1 Simple Relational Knowledge
2.4.2 Inheritable Knowledge
2.5 Inferential Knowledge
2.6 Procedural Vs. Declarative Knowledge
2.7 Issues in Knowledge Representation
2.8 The Frame Problem
2.9 Forward Vs. Backward Reasoning
Fig.: Two Entities in Knowledge Representation
English or natural language is an obvious way of representing and handling facts. Logic enables us to represent the fact "Spot is a dog" as
dog(Spot)
We could then represent the fact that all dogs have tails with:
∀x: dog(x) → hasatail(x)
We can then deduce:
hasatail(Spot)
Using an appropriate backward mapping function we could generate the English sentence "Spot has a tail". The available mapping functions are not always one-to-one but rather are many-to-many, which is a characteristic of English representations. The sentences "All dogs have tails" and "Every dog has a tail" both say that each dog has a tail, but the first could also say that each dog has more than one tail (try substituting "teeth" for "tails"). When an AI program manipulates the internal representation of facts, these new representations should also be interpretable as new representations of facts.
Consider the classic problem of the mutilated chess board.
Problem: In a normal chess board the opposite corner squares have been eliminated. The given task is to cover all the squares on the remaining board by dominoes so that each domino covers two squares. No overlapping of dominoes is allowed. Can it be done?
Consider three data structures:
Fig.: Mutilated Checkerboard
The first two data structures are illustrated in the diagrams above, and the third data structure is the number of black squares and the number of white squares. The first diagram loses the colour of the squares, and a solution is not easy to see; the second preserves the colours but produces no easier path; whereas counting the number of squares of each colour, giving 32 black squares and 30 white squares, yields an immediate answer of NO: a domino must cover one white square and one black square, so the numbers of squares of each colour must be equal for a positive solution.
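The third data structure makes the answer almost trivial to compute. The following minimal Python sketch (board size and removed corners chosen to match the example) simply counts the two colours.

# Squares are (row, col) pairs; the two opposite corners are removed.
# A domino always covers one black and one white square, so unequal
# colour counts mean no tiling exists.
removed = {(0, 0), (7, 7)}
squares = [(r, c) for r in range(8) for c in range(8) if (r, c) not in removed]

white = sum(1 for r, c in squares if (r + c) % 2 == 0)
black = len(squares) - white

print(white, black)                         # 30 and 32
print("tiling possible?", white == black)   # False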
2.4 Approaches to Knowledge Representation
2.4.1 Simple Relational Knowledge
Figure: Simple Relational Knowledge
We can ask things like:
• Who is dead?
• Who plays Jazz/Trumpet etc.?
This sort of representation is popular in database systems.
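A minimal sketch of simple relational knowledge in code: facts stored as flat records and queried much as in a database. The musicians and attributes below are invented placeholders, not taken from the figure.

musicians = [
    {"name": "Miles Davis", "instrument": "trumpet", "genre": "jazz", "alive": False},
    {"name": "John Zorn",   "instrument": "saxophone", "genre": "jazz", "alive": True},
]

def who_plays(genre):
    # select the names of musicians whose genre matches
    return [m["name"] for m in musicians if m["genre"] == genre]

def who_is_dead():
    return [m["name"] for m in musicians if not m["alive"]]

print(who_plays("jazz"))
print(who_is_dead())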
2.4.2 Inheritable Knowledge
Fig.: Property Inheritance Hierarchy
• Boxed nodes -- objects and values of attributes of objects.
• Values can be objects with attributes, and so on.
• Arrows -- point from an object to its value.
This structure is known as a slot and filler structure, semantic network or a collection of frames. The algorithm to retrieve a value for an attribute of an instance object (sketched in code after the list):
1. Find the object in the knowledge base.
2. If there is a value for the attribute, report it.
3. Otherwise, look for a value of instance; if none, fail.
4. Otherwise, go to that node and find a value for the attribute and then report it.
5. Otherwise, search through using isa until a value is found for the attribute.
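The retrieval algorithm above can be sketched directly. The knowledge base below uses invented objects and attribute names; "instance" and "isa" are the two links described in the text.

# Look on the object itself, then follow "instance" and "isa" links upward
# until a value for the attribute is found.
kb = {
    "Person":      {"isa": None, "legs": 2},
    "Musician":    {"isa": "Person"},
    "Jazz-Player": {"isa": "Musician", "genre": "jazz"},
    "Miles":       {"instance": "Jazz-Player", "instrument": "trumpet"},
}

def get_value(obj, attr):
    node = kb.get(obj)
    while node is not None:
        if attr in node:                                  # value stored directly
            return node[attr]
        parent = node.get("instance") or node.get("isa")  # climb the hierarchy
        node = kb.get(parent) if parent else None
    return None                                           # attribute not found

print(get_value("Miles", "instrument"))  # trumpet (on the object itself)
print(get_value("Miles", "genre"))       # jazz (inherited from Jazz-Player)
print(get_value("Miles", "legs"))        # 2 (inherited from Person)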
2.5 Inferential Knowledge
Represent knowledge as formal logic. "All dogs have tails" becomes:
∀x: dog(x) → hasatail(x)
Advantages:
• A set of strict rules.
  o Can be used to derive more facts.
  o Truths of new statements can be verified.
  o Guaranteed correctness.
• Many inference procedures are available to implement standard rules of logic.
2.6 Procedural Vs. Declarative Knowledge
Declarative knowledge representation:
• Static representation -- knowledge about objects, events etc., and their relationships and states, is simply given.
• Requires a program that knows what to do with the knowledge and how to do it.
Procedural representation:
• The control information necessary to use the knowledge is embedded in the knowledge itself, e.g. how to find relevant facts, make inferences etc.
• Requires an interpreter to follow the instructions specified in the knowledge.
• Knowledge is encoded in procedures:
  o small programs that know how to do specific things, how to proceed;
  o e.g. a parser in a natural language understander has the knowledge that a noun phrase may contain articles, adjectives and nouns. It is represented by calls to routines that know how to process articles, adjectives and nouns.
Advantages:
• Heuristic or domain-specific knowledge can be represented.
• Extended logical inferences, such as default reasoning, are facilitated.
• Side effects of actions may be modelled. Some rules may become false in time; keeping track of this in large systems may be tricky.
Disadvantages:
• Completeness -- not all cases may be represented.
• Consistency -- not all deductions may be correct. e.g. if we know that Fred is a bird we might deduce that Fred can fly; later we might discover that Fred is an emu.
• Modularity is sacrificed. Changes in the knowledge base might have far-reaching effects.
• Cumbersome control information.
(A small sketch contrasting the two styles follows this list.)
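A small illustrative contrast (invented example, not from the text): the same knowledge held declaratively as data interpreted by a general routine, versus procedurally inside a routine of its own.

# Declarative: the fact is plain data; a separate, general program decides
# how to use it.
facts = {("Fred", "is_a"): "bird", ("bird", "can"): "fly"}

def query(subject, relation):
    return facts.get((subject, relation))

# Procedural: the knowledge of *how* to answer is embedded in a procedure.
def can_fly(animal):
    kind = query(animal, "is_a")
    if kind == "bird":
        return True          # default reasoning baked into the code
    return False

print(query("Fred", "is_a"))   # bird   (declarative lookup)
print(can_fly("Fred"))         # True   (procedural inference)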
2.7 Issues in Knowledge Representation
Consider the fact that John Zorn plays in the band Naked City. This can be treated as "John Zorn plays in the band Naked City" or "John Zorn's band is Naked City". Another representation is:
band = Naked City
band-members = John Zorn, Bill Frissell, Fred Frith, Joey Barron
Granularity: at what level should the knowledge be represented, and what are the primitives?
Choosing the Granularity of Representation: primitives are fundamental concepts such as holding, seeing, playing, and as English is a very rich language with over half a million words it is clear we will find difficulty in deciding which words to choose as our primitives in a series of situations.
If Tom feeds a dog then it could become:
feeds(tom, dog)
If Tom gives the dog a bone, it could become:
gives(tom, dog, bone)
Are these the same? In any sense does giving an object food constitute feeding? If
give(x, food) → feed(x)
then we are making progress. But we need to add certain inferential rules.
In the famous program on relationships, consider: Louise is Bill's cousin. How do we represent this?
louise = daughter(brother or sister(father or mother(bill)))
Suppose the cousin is Chris; then we do not know whether Chris is male or female, and so son applies as well as daughter. Clearly the separate levels of understanding require different levels of primitives, and these need many rules to link together apparently similar primitives. Obviously there is a potential storage problem, and the underlying question must be what level of comprehension is needed.
2.8 The Frame Problem
In solving problems we often need to represent a sequence of problem states, and much effort can be wasted visiting these nodes and copying facts, most of which do not change, from one node to another. For example, in the robot world, we could spend a lot of time recording above(Ceiling, Floor) at every node. All of this is, of course, in addition to the real problem of figuring out which facts should be different at each node. This whole problem of representing the facts that change as well as those that do not is known as the frame problem. In some domains, the only hard part is representing all the facts. In others, though, figuring out which ones change is nontrivial. For example, in the robot world, there might be a table with a plant on it under the window. Suppose we move the table to the center of the room. We must also infer that the plant is now in the center of the room but that the window is not.
To support this kind of reasoning, some systems make use of an explicit set of axioms called frame axioms, which describe all the things that do not change when a particular operator is applied in state n to produce state n+1. (The things that do change must be mentioned as part of the operator itself.) Thus, in the robot domain, we might write axioms such as
color(x, y, s1) ∧ move(x, s1, s2) → color(x, y, s2)
which can be read as: if x has color y in state s1 and the operation of moving x is applied in state s1 to produce state s2, then the color of x in s2 is still y. Unfortunately, in any complex domain, a huge number of these axioms becomes necessary. An alternative approach is to make the assumption that the only things that change are the things that must. By "must" here we mean that the change is either required explicitly by the axioms that describe the operator or that it follows logically from some change that is asserted explicitly. This idea of circumscribing the set of unusual things is a very powerful one; it can be used as a partial solution to the frame problem and as a way of reasoning with incomplete knowledge.
But now let us return briefly to the problem of representing a changing problem state. We could do it by simply starting with a description of the initial state and then making changes to that description as indicated by the rules we apply. This solves the problem of the wasted space and time involved in copying the information for each node. And it works fine until the first time the search has to backtrack. Then, unless all the changes that were made can simply be ignored (as they could be if, for example, they were simply additions of new theorems), we are faced with the problem of backing up to some earlier node. But how do we know what changes in the problem state description need to be undone? For example, what do we have to change to undo the effect of moving the table to the center of the room? There are two ways this problem can be solved (the second is sketched in code below):
• Do not modify the initial state description at all. At each node, store an indication of the specific changes that should be made at this node. Whenever it is necessary to refer to the description of the current problem state, look at the initial state description and also look back through all the nodes on the path from the start state to the current state.
• Modify the initial state description as appropriate, but also record at each node an indication of what to do to undo the move should it ever be necessary to backtrack through the node. Then, whenever it is necessary to backtrack, check each node along the way and perform the indicated operations on the state description.
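The second bookkeeping strategy can be sketched as follows; the state, objects and operator are invented for illustration.

# Modify a single state description in place, but record at each node how to
# undo the change if the search later backtracks through it.
state = {"table": "under window", "plant": "on table"}
undo_stack = []   # one undo record per applied operator

def apply_move(obj, new_place):
    undo_stack.append((obj, state[obj]))   # remember the old value
    state[obj] = new_place

def backtrack():
    obj, old_place = undo_stack.pop()      # restore the previous value
    state[obj] = old_place

apply_move("table", "center of room")
print(state["table"])    # center of room
backtrack()
print(state["table"])    # under window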
Sometimes, even these solutions are not enough. We might want to remember, for example, in the robot world, that before the table was moved it was under the window, and after being moved it was in the center of the room. This can be handled by adding to the representation of each fact a specific indication of the time at which that fact was true. This indication is called a state variable. But to apply the same technique to a real-world problem, we would need, for example, separate facts to indicate all the times at which the Statue of Liberty is in New York.
2.9 Forward Vs. Backward Reasoning
The object of a search procedure is to discover a path through a problem space from an initial state to a goal state. There are actually two directions in which such a search could proceed:
• Forward, from the start states.
• Backward, from the goal states.
The production system model of the search process provides an easy way of viewing forward and backward reasoning as symmetric processes. Consider the problem of solving a particular instance of the 8-puzzle. The rules to be used for solving the puzzle can be written as shown in the following figure. Assume the areas of the tray are numbered:
1 2 3
4 5 6
7 8 9
Square 1 empty and Square 2 contains tile n → Square 2 empty and Square 1 contains tile n
Square 1 empty and Square 4 contains tile n → Square 4 empty and Square 1 contains tile n
Square 2 empty and Square 1 contains tile n → Square 1 empty and Square 2 contains tile n
Using these rules we could attempt to solve the puzzle shown below.
Start:
2 8 3
1 6 4
7   5

Goal:
1 2 3
8   4
7 6 5
Reason forward from the initial states: begin by building a tree of move sequences that might be solutions by starting with the initial configuration(s) at the root of the tree. Generate the next level of the tree by finding all the rules whose left sides match the root node and using their right sides to create the new configurations. Generate the next level by taking each node generated at the previous level and applying to it all of the rules whose left sides match it. Continue until a configuration that matches the goal state is generated.
Reason backward from the goal states: begin building a tree of move sequences that might be solutions by starting with the goal configuration(s) at the root of the tree. Generate the next level of the tree by finding all the rules whose right sides match the root node. These are all the rules that, if only we could apply them, would generate the state we wanted. Use the left sides
of the rules to generate the nodes at this second level of the tree. Generate the next level of the tree by taking each node at the previous level and finding all the rules whose right sides match it, then using the corresponding left sides to generate the new nodes. Continue until a node that matches the initial state is generated. This method of reasoning backward from the desired final state is often called goal-directed reasoning.
Four factors influence the question of whether it is better to reason forward or backward:
• Are there more possible start states or goal states? We want to move from the smaller set of states to the larger.
• In which direction is the branching factor greater? We want to proceed in the direction of the smaller branching factor.
• Will the program be asked to justify its reasoning process to the user? If so, proceed in the direction that corresponds more closely to the way the user thinks.
• What kind of event is going to trigger a problem-solving episode? If it is the arrival of a new fact, forward chaining is better. If it is a query to which a response is desired, backward chaining is more natural (a small sketch follows this list).
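A minimal sketch of goal-directed (backward) reasoning with an invented rule base: to establish a goal, either find it among the known facts or find a rule whose right side is the goal and recursively establish its left-side conditions.

rules = [
    (["has_feathers", "lays_eggs"], "is_bird"),
    (["is_bird", "can_fly"], "can_migrate"),
]
facts = {"has_feathers", "lays_eggs", "can_fly"}

def prove(goal):
    if goal in facts:
        return True
    for conditions, conclusion in rules:
        if conclusion == goal and all(prove(c) for c in conditions):
            return True
    return False

print(prove("can_migrate"))   # True: works back from the goal to known facts
print(prove("is_fish"))       # False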
UNIT 3
FIRST ORDER PREDICATE LOGIC
3.1 Logic
3.2 Introduction to Propositional Logic
3.3 Predicate Logic
3.3.1 Introduction to Predicate Logic
3.3.2 Predicate Logic: Semantics
3.4 Quantification
3.5 Well-Formed Formula for First Order Predicate Logic
3.6 From Wff to Proposition
3.7 Transcribing English to Predicate Logic Wffs
3.8 Properties of Statements
3.9 Inference Rules
3.10 Resolution
3.11 Conversion to Clausal Form
3.12 Unification
3.1 Logic
Logic is a language for reasoning. It is a collection of rules we use when doing logical reasoning. Human reasoning has been observed over centuries, from at least the time of the Greeks, and patterns appearing in reasoning have been extracted, abstracted, and streamlined. The foundation of the logic we are going to learn here was laid down by a British mathematician, George Boole, in the middle of the 19th century, and it was further developed and used in an attempt to derive all of mathematics by Gottlob Frege, a German mathematician, towards the end of the 19th century. A British philosopher and mathematician, Bertrand Russell, found a flaw in the basic assumptions of Frege's attempt, but he, together with Alfred Whitehead, developed Frege's work further and repaired the damage. The logic we study today is more or less along this line. In logic we are interested in the truth or falsehood of statements, and in how the truth/falsehood of a statement can be determined from other statements. However, instead of dealing with individual specific statements, we are going to use symbols to represent arbitrary statements so that the results can be used in many similar but different cases. The formalization also promotes clarity of thought and eliminates mistakes. There are various types of logic, such as logic of sentences (propositional logic), logic of objects (predicate logic), logic involving uncertainties, logic dealing with fuzziness, temporal logic etc. Here we are going to be concerned with propositional logic and predicate logic, which are fundamental to all types of logic.
3.2 Introduction to Propositional Logic
Propositional logic is concerned with whether the truth of a sentence follows from that of a set of sentences, and if so, how. Sentences considered in propositional logic are not arbitrary sentences but are the ones that are either true or false, but not both. These kinds of sentences are called propositions. If a proposition is true, then we say it has a truth value of "true"; if a proposition is false, its truth value is "false". For example, "Grass is green" and "2 + 5 = 5" are propositions. The first proposition has the truth value "true" and the second "false". But "Close the door" and "Is it hot outside?" are not propositions. Also "x is greater than 2", where x is a variable representing a number, is not a proposition, because unless a specific value is given to x we cannot say whether it is true or false, nor do we know what x represents. Similarly "x = x" is not a proposition because we don't know what "x" represents, hence what "=" means. For example, while we understand what "3 = 3" means, what does "Air is equal to air" or "Water is equal to water" mean? Does it mean a mass of air is equal to another mass, or the concept of air is equal to the concept of air? We don't quite know what "x = x" means. Thus we cannot say whether it is true or not. Hence it is not a proposition.
Simple sentences which are true or false are basic propositions. Larger and more complex sentences are constructed from basic propositions by combining them with connectives. Thus propositions and connectives are the basic elements of propositional logic. Though there are many connectives, we are going to use the following five basic connectives here: NOT, AND, OR, IF_THEN (or IMPLY), IF_AND_ONLY_IF. They are also denoted by the symbols ¬, ∧, ∨, →, ↔, respectively.
Truth Table
Often we want to discuss properties/relations common to all propositions. In such a case, rather than stating them for each individual proposition, we use variables representing an arbitrary proposition and state properties/relations in terms of those variables. Those variables are called propositional variables. Propositional variables are also considered propositions, since they represent a proposition and hence behave the same way as propositions.
A proposition in general contains a number of variables. For example, (P ∨ Q) contains the variables P and Q, each of which represents an arbitrary proposition. Thus a proposition takes different values depending on the values of the constituent variables. This relationship between the value of a proposition and those of its constituent variables can be represented by a table. It tabulates the value of a proposition for all possible values of its variables, and it is called a truth table. For example, the following table shows the relationship between the values of P, Q and P ∨ Q:

P  Q  (P ∨ Q)
F  F     F
F  T     T
T  F     T
T  T     T
In the table, F represents the truth value false and T true. This table shows that P ∨ Q is false if P and Q are both false, and that it is true in all the other cases.
Meaning of the Connectives
Let us define the meaning of the five connectives (NOT, AND, OR, IMPLIES, IF AND ONLY IF) by showing the relationship between the truth value (i.e. true or false) of composite propositions and those of their component propositions. They are shown using truth tables. In the tables, P and Q represent arbitrary propositions, and true and false are represented by T and F, respectively.
NOT
P   ¬P
T    F
F    T
This table shows that if P is true, then (¬P) is false, and that if P is false, then (¬P) is true.
AND
P  Q  (P ∧ Q)
F  F     F
F  T     F
T  F     F
T  T     T
This table shows that (P ∧ Q) is true if both P and Q are true, and that it is false in any other case. Similarly for the rest of the tables.
OR
P  Q  (P ∨ Q)
F  F     F
F  T     T
T  F     T
T  T     T
IMPLIES
P  Q  (P → Q)
F  F     T
F  T     T
T  F     F
T  T     T
When P → Q is always true, we express that by P ⇒ Q. That is, P ⇒ Q is used when proposition P always implies proposition Q regardless of the values of the variables in them.
IF AND ONLY IF
P  Q  (P ↔ Q)
F  F     T
F  T     F
T  F     F
T  T     T
When P ↔ Q is always true, we express that by P ⇔ Q. That is, ⇔ is used when two propositions always take the same value regardless of the values of the variables in them.
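These connective definitions can be checked mechanically. The following is a minimal Python sketch (the helper names are my own, not from the text) that tabulates a formula over all truth assignments; the same check shows that an argument such as [[P → Q] ∧ P] → Q is a tautology.

from itertools import product

def truth_table(formula, variables):
    # evaluate the formula for every combination of truth values
    rows = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        rows.append((values, formula(**assignment)))
    return rows

# P -> Q is defined as "not P or Q"; P <-> Q as "P == Q".
implies = lambda P, Q: (not P) or Q

for values, result in truth_table(implies, ["P", "Q"]):
    print(values, "->", result)

# A formula is a tautology if it is true on every row.
modus_ponens = lambda P, Q: implies(implies(P, Q) and P, Q)
print(all(result for _, result in truth_table(modus_ponens, ["P", "Q"])))  # True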
Construction of Complex Propositions
Syntax of propositions: first it is shown informally how complex propositions are constructed from simple ones; then a more general way of constructing propositions is given. In everyday life we often combine propositions to form more complex propositions without paying much attention to them. For example, combining "Grass is green" and "The sun is red" we say something like "Grass is green and the sun is red", "If the sun is red, grass is green", "The sun is red and the grass is not green", etc. Here "Grass is green" and "The sun is red" are propositions, and from them, using the connectives "and", "if ... then ..." and "not", slightly more complex propositions are formed. These new propositions can in turn be combined with other propositions to construct more complex propositions. They then can be combined to form even more complex propositions. This process of obtaining more and more complex propositions can be described more generally as follows: Let X and Y represent arbitrary propositions. Then [¬X], [X ∧ Y], [X ∨ Y], [X → Y], and [X ↔ Y] are propositions. Note that X and Y here represent arbitrary propositions. This is actually part of a more rigorous definition of proposition. Example: [P → [Q ∨ R]] is a proposition, and it is obtained by first constructing [Q ∨ R] by applying [X ∨ Y] to propositions Q and R, considering them as X and Y respectively, then by applying [X → Y] to the two propositions P and [Q ∨ R], considering them as X and Y respectively.
Note: Rigorously speaking, X and Y above are place holders for propositions, and so they are not exactly propositions. They are called propositional variables, and propositions formed from them using connectives are called propositional forms. However, we are not going to distinguish them here, and both specific propositions such as "2 is greater than 1" and propositional forms such as (P → Q) are going to be called propositions.
For the proposition P → Q, the proposition Q → P is called its converse, and the proposition ¬Q → ¬P is called its contrapositive. For example, for the proposition "If it rains, then I get wet":
Converse: If I get wet, then it rains.
Contrapositive: If I don't get wet, then it does not rain.
The converse of a proposition is not necessarily logically equivalent to it; that is, they may or may not take the same truth value at the same time. On the other hand, the contrapositive of a proposition is always logically equivalent to the proposition. That is, they take the same truth value regardless of the values of their constituent variables. Therefore, "If it rains, then I get wet" and "If I don't get wet, then it does not rain" are logically equivalent. If one is true then the other is also true, and vice versa.
As we are going to see in the next section, reasoning is done on propositions using inference rules. For example, if the two propositions "if it snows, then the school is closed" and "it snows" are true, then we can conclude that "the school is closed" is true. In everyday life, that is how we reason. To check the correctness of reasoning, we must check whether or not rules of inference have been followed to draw the conclusion from the premises. However, for reasoning in English, or in general for reasoning in a natural language, that is not necessarily straightforward and it often encounters some difficulties. Firstly, connectives are not necessarily easily identified, as we can get a flavor of from the previous topic on variations of if-then statements. Secondly, if the argument becomes complicated, involving many statements in a number of different forms twisted and tangled up, it can easily get out of hand unless it is simplified in some way. One solution for that is to use symbols (and mechanize it). Each sentence is represented by symbols representing building-block sentences and connectives. For example, if P represents "it snows" and Q represents "the school is closed", then the previous argument can be expressed as
[ [ P → Q ] ∧ P ] → Q ,
or
P → Q
P
------
Q
This representation is concise, much simpler and much easier to deal with. In addition, today there are a number of automatic reasoning systems and we can verify our arguments in symbolic form using them. One such system, called TPS, is used for reasoning exercises in this course. For example, we can check the correctness of our argument using it. To convert English statements into a symbolic form, we restate the given statements using the building-block sentences and the connectives of propositional logic (not, and, or, if_then, if_and_only_if), and then substitute the symbols for the building blocks and the connectives.
For example, let P be the proposition "It is snowing", Q be the proposition "I will go to the beach", and R be the proposition "I have time". Then first "I will go to the beach if it is not snowing" is restated as "If it is not snowing, I will go to the beach". Then symbols P and Q are substituted for the respective sentences to obtain ¬P → Q. Similarly, "It is not snowing and I have time only if I will go to the beach" is restated as "If it is not snowing and I have time, then I will go to the beach", and it is translated as (¬P ∧ R) → Q.
The semantics of the logic define the truth values of sentences with logical connectives in terms of the truth values of their component sentences. The truth tables provide a simple semantics for expressions in propositional logic. As sentences can only be true or false, truth tables are very simple. In order to infer new facts in a logic we need to apply inference rules. The semantics of the logic will define which inference rules are universally valid. One useful inference rule is the following (called modus ponens), but many others are possible:
assertion: a
implication: a → b
------------------
conclusion: b
This rule just says that if a → b is true, and a is true, then b is necessarily true. We could prove that this rule is valid using truth tables. Thus we need a more powerful logic to deal with these and other problems. Predicate logic is one such logic, and it addresses these issues among others.
Sentences in predicate logic are constructed (much as in propositional logic) by combining atomic sentences with logical connectives, so the following are all sentences in predicate calculus:
• friends(alison, richard) → likes(alison, richard)
• likes(alison, richard) V likes(alison, waffles)
• ((likes(alison, richard) V likes(alison, waffles)) ∧ ~likes(alison, waffles)) → likes(alison, richard)
Sentences can also be formed using quantifiers to indicate how any variables in the sentence are to be treated. The two quantifiers in predicate logic are ∀ (for all) and ∃ (there exists), so the following are valid sentences:
• ∃X bird(X) ∧ ~flies(X), i.e., there exists some bird that doesn't fly.
• ∀X (person(X) → ∃Y loves(X,Y)), i.e., every person has something that they love.
A sentence should have all its variables quantified. So strictly, an expression like ``∀X loves(X, Y)'', though a well-formed formula of predicate logic, is not a sentence. Formulae with all their variables quantified are also called closed formulae.
To cope with deficiencies of propositional logic we introduce two new features: predicates and quantifiers. A predicate is a verb phrase template that describes a property of objects, or a relationship among objects represented by the variables. For example, the sentences "The car Tom is driving is blue", "The sky is blue", and "The cover of this book is blue" come from the template "is blue" by placing an appropriate noun/noun phrase in front of it. The phrase "is blue" is a predicate and it describes the property of being blue. Predicates are often given a name. For example any of "is_blue", "Blue" or "B" can be used to represent the predicate "is blue" among others. If we adopt B as the name for the predicate "is_blue", sentences that assert an object is blue can be represented as "B(x)", where x represents an arbitrary object. B(x) reads as "x is blue". Similarly the sentences "John gives the book to Mary", "Jim gives a loaf of bread to Tom", and "Jane gives a lecture to Mary" are obtained by substituting an appropriate object for variables x, y, and z in the sentence "x gives y to z". The template "... gives ... to ..." is a predicate and it describes a relationship among three objects. This predicate can be represented by Give( x, y, z ) or G( x, y, z ), for example. Note: The sentence "John gives the book to Mary" can also be represented by another predicate such as "gives a book to". Thus if we use B( x, y ) to denote this predicate, "John gives the book to Mary" becomes B( John, Mary ). In that case, the other sentences, "Jim gives a loaf of bread to Tom", and "Jane gives a lecture to Mary", must be expressed with other predicates.
3.4 Quantification
In general, a quantification is performed on formulas of predicate logic (called wffs), such as x > 1 or P(x), by using quantifiers on variables. There are two types of quantifiers: the universal quantifier and the existential quantifier. The universal quantifier turns, for example, the statement x > 1 into "for every object x in the universe, x > 1", which is expressed as "∀x x > 1". This new statement is true or false in the universe of discourse. Hence it is a proposition once the universe is specified. Similarly, the existential quantifier turns, for example, the statement x > 1 into "for some object x in the universe, x > 1", which is expressed as "∃x x > 1". Again, it is true or false in the universe of discourse, and hence it is a proposition once the universe is specified.
Universe of Discourse
The universe of discourse, also called the universe, is the set of objects of interest. The propositions in predicate logic are statements on objects of a universe. The universe is thus the domain of the (individual) variables. It can be the set of real numbers, the set of integers, the set of all cars on a parking lot, the set of all students in a classroom etc. The universe is often left implicit in practice. But it should be obvious from the context.
The Universal Quantifier
The expression ∀x P(x) denotes the universal quantification of the atomic formula P(x). Translated into the English language, the expression is understood as "For all x, P(x) holds", "for each x, P(x) holds" or "for every x, P(x) holds". ∀ is called the universal quantifier, and ∀x means all the objects x in the universe. If this is followed by P(x), then the meaning is that P(x) is true for every object x in the universe. For example, "All cars have wheels" could be transformed into the propositional form ∀x P(x), where:
• P(x) is the predicate denoting: x has wheels, and
• the universe of discourse is only populated by cars.
Universal Quantifier and Connective AND
If all the elements in the universe of discourse can be listed, then the universal quantification ∀x P(x) is equivalent to the conjunction: P(x1) ∧ P(x2) ∧ P(x3) ∧ ... ∧ P(xn). For example, in the above example of ∀x P(x), if we knew that there were only 4 cars in our universe of discourse (c1, c2, c3 and c4), then we could also translate the statement as: P(c1) ∧ P(c2) ∧ P(c3) ∧ P(c4).
The Existential Quantifier
The expression ∃x P(x) denotes the existential quantification of P(x). Translated into the English language, the expression could be understood as "There exists an x such that P(x)" or "There is at least one x such that P(x)". ∃ is called the existential quantifier, and ∃x means at least one object x in the universe. If this is followed by P(x), then the meaning is that P(x) is true for at least one object x of the universe. For example, "Someone loves you" could be transformed into the propositional form ∃x P(x), where:
• P(x) is the predicate meaning: x loves you, and
• the universe of discourse contains (but is not limited to) all living creatures.
Existential Quantifier and Connective OR
If all the elements in the universe of discourse can be listed, then the existential quantification ∃x P(x) is equivalent to the disjunction: P(x1) ∨ P(x2) ∨ P(x3) ∨ ... ∨ P(xn). For example, in the above example of ∃x P(x), if we knew that there were only 5 living creatures in our universe of discourse (say: me, he, she, rex and fluff), then we could also write the statement as: P(me) ∨ P(he) ∨ P(she) ∨ P(rex) ∨ P(fluff).
An appearance of a variable in a wff is said to be bound if either a specific value is assigned to it or it is quantified. If an appearance of a variable is not bound, it is called free. The extent of the application (effect) of a quantifier, called the scope of the quantifier, is indicated by square brackets [ ]. If there are no square brackets, then the scope is understood to be the smallest wff following the quantification. For example, in ∀x P(x, y), the variable x is bound while y is free. In ∀x [ ∃y P(x, y) ∨ Q(x, y) ], x and the y in P(x, y) are bound, while the y in Q(x, y) is free, because the scope of ∃y is P(x, y). The scope of ∀x is [ ∃y P(x, y) ∨ Q(x, y) ].
How to Read Quantified Formulas
When reading quantified formulas in English, read them from left to right. ∀x can be read as "for every object x in the universe the following holds" and ∃x can be read as "there exists an object x in the universe which satisfies the following" or "for some object x in the universe the following holds". Those do not necessarily give us good English expressions. But they are where we can start. Get the correct reading first, then polish your English without changing the truth values. For example, let the universe be the set of airplanes and let F(x, y) denote "x flies faster than y". Then ∀x ∀y F(x, y) can be translated initially as "For every airplane x the following holds: x is faster than every (any) airplane y". In simpler English it means "Every airplane is faster than every airplane (including itself!)". ∀x ∃y F(x, y) can be read initially as "For every airplane x the following holds: for some airplane y, x is faster than y". In simpler English it means "Every airplane is faster than some airplane". ∃x ∀y F(x, y) represents "There exists an airplane x which satisfies the following (or such that): for every airplane y, x is faster than y". In simpler English it says "There is an airplane which is faster than every airplane" or "Some airplane is faster than every airplane". ∃x ∃y F(x, y) reads "For some airplane x there exists an airplane y such that x is faster than y", which means "Some airplane is faster than some airplane".
Order of Application of Quantifiers
When more than one variable is quantified in a wff, such as ∃y ∀x P(x, y), they are applied from the inside, that is, the one closest to the atomic formula is applied first. Thus
∃y ∀x P(x, y) reads ∃y [ ∀x P(x, y) ], and we say "there exists a y such that for every x, P(x, y) holds" or "for some y, P(x, y) holds for every x". The positions of the same type of quantifiers can be switched without affecting the truth value, as long as there are no quantifiers of the other type between the ones to be interchanged. For example, ∃x ∃y ∃z P(x, y, z) is equivalent to ∃y ∃x ∃z P(x, y, z), ∃z ∃y ∃x P(x, y, z), etc. It is the same for the universal quantifier. However, the positions of different types of quantifiers can not be switched.
For example, ∀x ∃y P(x, y) is not equivalent to ∃y ∀x P(x, y). Let the universe be the set of numbers and let P(x, y) mean that y is greater than x. Then ∀x ∃y P(x, y) reads "for every number x, there is a number y that is greater than x", which is true, while ∃y ∀x P(x, y) reads "there is a number that is greater than every (any) number", which is not true.
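Over a finite, listable universe the quantifier readings above can be checked directly in code: ∀ behaves like the conjunction all() and ∃ like the disjunction any(). The universe, predicate and relation below are invented for illustration.

universe = ["ann", "bob", "cho"]                 # illustrative objects
loves = {("ann", "bob"), ("bob", "cho"), ("cho", "ann")}

P = lambda x: x != "bob"                         # an arbitrary one-place predicate

print(all(P(x) for x in universe))               # ∀x P(x): False (fails for bob)
print(any(P(x) for x in universe))               # ∃x P(x): True

L = lambda x, y: (x, y) in loves                 # L(x, y): "x loves y"

# Order of mixed quantifiers matters:
print(all(any(L(x, y) for y in universe) for x in universe))  # ∀x ∃y L(x, y): True
print(any(all(L(x, y) for x in universe) for y in universe))  # ∃y ∀x L(x, y): False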
3.5 Well-Formed Formula for First Order Predicate Logic --- Syntax Rules
Subjects to be Learned
• wff (well-formed formula)
• atomic formula
• syntax of wff
Contents
Not all strings can represent propositions of predicate logic. Those which produce a proposition when their symbols are interpreted must follow the rules given below, and they are called wffs (well-formed formulas) of the first order predicate logic.
Rules for Constructing Wffs
A predicate name followed by a list of variables, such as P(x, y), where P is a predicate name and x and y are variables, is called an atomic formula. Wffs are constructed using the following rules:
1. True and False are wffs.
2. Each propositional constant (i.e. specific proposition) and each propositional variable (i.e. a variable representing propositions) is a wff.
3. Each atomic formula (i.e. a specific predicate with variables) is a wff.
4. If A, B, and C are wffs, then so are ¬A, (A ∧ B), (A ∨ B), (A → B), and (A ↔ B).
5. If x is a variable (representing objects of the universe of discourse) and A is a wff, then so are ∀x A and ∃x A.
(Note: More generally, arguments of predicates are something called a term. Also, variables representing predicate names (called predicate variables) with a list of variables can form atomic formulas. But we do not get into that here.)
For example, "The capital of Virginia is Richmond." is a specific proposition. Hence it is a wff by Rule 2. Let B be a predicate name representing "being blue" and let x be a variable. Then B(x) is an atomic formula meaning "x is blue". Thus it is a wff by Rule 3 above. By applying Rule 5 to B(x), ∀x B(x) is a wff and so is ∃x B(x). Then by applying Rule 4 to them, ∀x B(x) ∧ ∃x B(x) is seen to be a wff. Similarly, if R is a predicate name representing "being round", then R(x) is an atomic formula. Hence it is a wff. By applying Rule 4 to B(x) and R(x), a wff B(x) ∧ R(x) is obtained. In this manner, larger and more complex wffs can be constructed following the rules given above. Note, however, that strings that can not be constructed by using those rules are not wffs. For example, ∀xB(x)R(x) and ∀B(x) are NOT wffs, NOR are B(R(x)) and B(∃x R(x)). One way to check whether or not an expression is a wff is to try to state it in English. If you can translate it into a correct English sentence, then it is a wff.
More examples: To express the fact that Tom is taller than John, we can use the atomic formula taller(Tom, John), which is a wff. This wff can also be part of some compound statement such as taller(Tom, John) → ¬taller(John, Tom), which is also a wff.
∀x taller(x, Tom), ∃x taller(x, Tom), and ∀x ∃y taller(x, y) are all wffs, among others. However, taller(∀x, John) and taller(Tom Mary, Jim), for example, are NOT wffs.
x P(x).
However, the wff ∃x N(x) is satisfiable but not valid. Note that the negation of a valid wff is unsatisfiable, for a valid wff is equivalent to true, hence its negation is false.
Equivalence
Two wffs W1 and W2 are equivalent if and only if W1 ↔ W2 is valid, that is, if and only if W1 ↔ W2 is true for all interpretations. For example, ¬∀x P(x) and ∃x ¬P(x) are equivalent for any predicate name P. So are ∀x [ P(x) ∧ Q(x) ] and [ ∀x P(x) ∧ ∀x Q(x) ] for any predicate names P and Q.
• To be precise, it is not for every interpretation but for the ones that "make sense". For example, you don't consider the universe of the set of people for the predicate x > 1 as an interpretation. Also, an interpretation assigns a specific predicate to each predicate variable. A rigorous definition of interpretation etc. is, however, beyond the scope of this course.
"2 is even" is E(2) More difficult translation: In these translations, properties and relationships are mentioned for certain type of elements in the universe such as relationships between integers in the universe of numbers rather than the universe of integers. In such a case the element type is specified as a precondition using if_then construct. Examples: In the examples that follow the universe is the set of numbers including real numbers, and complex numbers. I(x), E(x) and O(x) representing "x is an integer", "x is even", and "x is odd", respectively. "All integers are even" is transcribed as x [ I(x) E(x)] It is first restated as "For every object in the universe (meaning for every number in this case) if it is integer, then it is even". Here we are interested in not any arbitrary object(number) but a specific type of objects, that is integers. But if we write x it means "for any object in the universe". So we must say "For any object, if it is integer .." to narrow it down to integers. "Some integers are odd" can be restated as "There are objects that are integers and odd", which is expressed as x [ I(x) E(x)] For another interpretation of this sentence see a note "A number is even only if it is integer" becomes x [ E(x) I(x)] "Only integers are even" is equivalent to "If it is even, then it is integer". Thus it is translated to x [ E(x) I(x)] Proving Things in Predicate Logic To prove things in predicate calculus we need two things. First we need to know what inference rules are valid - we can't keep going back to the formal semantics when trying to draw a simple inference! Second we need to know a good proof procedure that will allow us to prove things with the inference rules in an efficient manner. When discussing propositional logic we noted that a much used inference rule was modus ponens: A, ApB --B This rule is a sound rule of inference for predicate logic. Given the semantics of the logic, if the premises are true then the conclusions are guaranteed true. Other sound inference rules include modus tollens (if A p B is true and B is false then conclude ~ A), and-elimination (if A B is true then conclude both A is true and B is true), and lots more. In predicate logic we need to consider how to apply these rules if the expressions involved have variables. For example we would like to be able to use the facts X (man(X) p mortal(X)) and man(socrates) and conclude mortal(socrates). To do this we can use modus ponens, but allow universally quantified sentences to be matched with other sentences (like in Prolog). So, if we
have a sentence ∀X A → B and a sentence C, then if A and C can be matched or unified we can apply modus ponens.
Representing Things in Predicate Logic
Your average AI programmer/researcher may not need to know the details of predicate logic semantics or proof theory, but they do need to know how to represent things in predicate logic, and what expressions in predicate logic mean. Formally we've already gone through what expressions mean, but it may make more sense to give a whole bunch of examples. This section gives a list of logical expressions paired with English descriptions, then some unpaired logical or English expressions - you should try to work out for yourself how to represent the English expressions in logic, and what the logic expressions mean in English. There may be exam questions of this sort.
∃x Table(x) ∧ ~numberoflegs(x,4) -- "There is some table that doesn't have 4 legs"
∀x (macintosh(x) → ~realcomputer(x)) -- "No macintosh is a real computer" or "If something is a macintosh then it's not a real computer"
∀x glaswegian(x) → (supports(x,rangers) V supports(x,celtic)) -- "All Glaswegians support either Celtic or Rangers"
∃x small(x) ∧ on(x,table) -- "There is something small on the table"
Now try these for yourself:
"All elephants are grey"
"Every apple is either green or yellow"
"There is some student who is intelligent"
∀x red(x) ∧ on(x,table) → small(x)
~∃x brusselsprout(x) ∧ tasty(x)
[Note: When asked to translate English statements into predicate logic you should NOT use set
SOME EQUIVALENCE LAWS
Idempotent Laws: P ∨ P ≡ P ; P ∧ P ≡ P
Commutative Laws: P ∨ Q ≡ Q ∨ P ; P ∧ Q ≡ Q ∧ P
Distributive Laws: P ∨ (Q ∧ R) ≡ (P ∨ Q) ∧ (P ∨ R) ; P ∧ (Q ∨ R) ≡ (P ∧ Q) ∨ (P ∧ R)
Associative Laws: P ∨ (Q ∨ R) ≡ (P ∨ Q) ∨ R ; P ∧ (Q ∧ R) ≡ (P ∧ Q) ∧ R
Absorptive Laws: A ∨ (A ∧ B) ≡ A ; A ∧ (A ∨ B) ≡ A
De Morgan's Laws: ~(P ∨ Q) ≡ ~P ∧ ~Q ; ~(P ∧ Q) ≡ ~P ∨ ~Q
Conditional elimination: P → Q ≡ ~P ∨ Q
Bi-conditional elimination: P ↔ Q ≡ (P → Q) ∧ (Q → P)
TRUTH TABLE FOR EQUIVALENT SENTENCES
P      Q      ~P     ~P ∨ Q   P → Q   Q → P   (P → Q) ∧ (Q → P)
true   true   false  true     true    true    true
true   false  false  false    false   true    false
false  true   true   true     true    false   false
false  false  true   true     true    true    true
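These equivalences can also be checked mechanically by enumerating truth assignments. The short Python sketch below is only an illustration added to this section; it verifies the conditional-elimination and bi-conditional-elimination laws for every combination of truth values.

from itertools import product

def implies(p, q):
    # P -> Q is false only when P is true and Q is false
    return (not p) or q

def iff(p, q):
    # P <-> Q holds when both directions of the implication hold
    return implies(p, q) and implies(q, p)

for p, q in product([True, False], repeat=2):
    # Conditional elimination: P -> Q is equivalent to ~P V Q
    assert implies(p, q) == ((not p) or q)
    # Bi-conditional elimination: (P -> Q) & (Q -> P) is true exactly when P and Q agree
    assert iff(p, q) == (p == q)
    print(p, q, implies(p, q), iff(p, q))
print("both equivalence laws hold for all truth assignments")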
2. Modus tollens: From ~Q and P → Q, infer ~P.
Premise: ~Q
Implication: P → Q
Conclusion: ~P
For example: If there is fire here, then there is oxygen here. There is no oxygen here. Therefore, there is no fire here.
3. Chain rule: From P → Q and Q → R, infer P → R. Or
Premise: P → Q
Premise: Q → R
Conclusion: P → R
For example,
Given: (programmer likes LISP) → (programmer hates COBOL)
And: (programmer hates COBOL) → (programmer likes recursion)
Conclusion: (programmer likes LISP) → (programmer likes recursion)
4. Substitution: If s is a valid sentence, then s', derived from s by a consistent substitution of propositions in s, is also valid. For example, the sentence P ∨ ~P is valid; by substitution, (Q ∧ R) ∨ ~(Q ∧ R) is also valid.
B. Non-deductive inference rules are those inference rules which are not certain.
1. Abductive Inference: Abductive inference is based on the use of known causal knowledge to explain or justify a (possibly invalid) conclusion. Given the truth of proposition Q and the implication P → Q, conclude P. For example, people who have had too much to drink tend to stagger when they walk. Therefore, it is not unreasonable to conclude that a person who is staggering is drunk, even though this may be an incorrect conclusion; people may stagger when they walk for other reasons, including dizziness from twirling in circles or from some physical problem. We may represent abductive inference with the following, where the c over the implication arrow is meant to indicate a possible causal relationship.
assertion: Q
implication: P →c Q
conclusion: P
Abductive inference is useful when known causal relations are likely and deductive inferencing is not possible for lack of facts.
2. Inductive Inference: Inductive inference is based on the assumption that a recurring pattern, observed for some events or entities, implies that the pattern is true for all entities in the class. For example, after seeing a few white swans, we incorrectly infer that all swans are white. We can represent inductive inference using the following description.
P(a1), P(a2), ..., P(ak) ⊢ ∀x P(x)
Inductive inference, of course, is not a valid form of inference, since it is not usually the case that all objects of a class can be verified as having a particular property. Even so, this is an important and commonly used form of inference.
3. Analogical Inference: Analogical inference is a form of experiential inference. Situations or entities which are alike in some respects tend to be similar in other respects. Thus, when we find that a situation (object) A is related in certain ways to B, and A' is similar in some context to A, we conclude that there is some B' which has a similar relation to A' in this context. We depict this form of inference with the following description, where the r above the implication symbol means "is related to".
P →r Q
P' →r Q'
Analogical inference, like abductive and inductive inference, is a useful but invalid form of commonsense inference.
3.10 Resolution
The most well known general proof procedure for predicate calculus is resolution. Resolution is a sound proof procedure for proving things by refutation - if you can derive a contradiction from ~P then P must be true. In resolution theorem proving, all statements in the logic are transformed into a normal form involving disjunctions of atomic expressions or negated atomic expressions (e.g., ~dog(X) V animal(X)). This allows new expressions to be deduced using a single inference rule. Basically, if we have an expression A1 V A2 ... V An V ~C and an expression B1 V B2 ... V Bm V C, then we can deduce a new expression A1 V A2 ... V An V B1 V B2 ... V Bm. This single inference rule can be applied in a systematic proof procedure. Resolution is a sound proof procedure: if we prove something using it, we can be sure it is a valid conclusion. However, there are many other things to worry about when looking at a proof procedure. It may not be complete (i.e., we may not always be able to prove something is true even if it is true) or decidable (the procedure may never halt when trying to prove something that is false). Variants of resolution may be complete, but no proof procedure based on predicate logic is decidable. And of course, it may just not be computationally efficient: it may eventually prove something, but take such a long time that it is just not usable. The efficiency of a proof will often depend as much on how you formulate your problem as on the general proof procedure used, but it is still an important issue to bear in mind. Resolution itself is very simple. Given two clauses C1 and C2 with no variables in common, if there is a literal l1 in C1 which is a complement of a literal l2 in C2, then both l1 and l2 are deleted and a new clause C, the disjunction of the remaining literals of the two clauses, is formed. The clause C is called the resolvent of C1 and C2. For example, to resolve the two clauses (~P V Q) and (~Q V R) we write
~P V Q, ~Q V R
⊢ ~P V R
Several types of resolution are possible depending on the number and types of parents. We define a few of these types below.
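As a concrete illustration of this single inference rule, the following Python sketch (not part of the original text) represents a propositional clause as a frozenset of literal strings, with "~" marking negation, and computes the resolvents of two clauses.

def negate(literal):
    # "~P" -> "P", "P" -> "~P"
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolve(c1, c2):
    # Return every binary resolvent of the two clauses: for each complementary
    # pair of literals, delete the pair and disjoin the remaining literals.
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return resolvents

# The example from the text: resolving ~P V Q with ~Q V R gives ~P V R.
print(resolve(frozenset({"~P", "Q"}), frozenset({"~Q", "R"})))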
Binary Resolution: Two clauses having complementary literals are combined as disjuncts to produce a single clause after deleting the complementary literals. For example, the binary resolvent of ~P(x,a) V Q(x) and ~Q(b) V R(x) is just ~P(b,a) V R(b).
Unit resulting (UR) resolution: A number of clauses are resolved simultaneously to produce a unit clause. All except one of the clauses are unit clauses, and that one clause has exactly one more literal than the total number of unit clauses. For example, resolving the set {~MARRIED(x,y) V ~MOTHER(x,z) V FATHER(y,z), MARRIED(sue,joe), ~FATHER(joe,bill)}, where the substitution F = {sue/x, joe/y, bill/z} is used, results in the unit clause ~MOTHER(sue,bill).
Linear resolution: When each resolved clause Ci is a parent to the clause Ci+1 (i = 1, 2, ..., n-1) the process is called linear resolution. For example, given a set S of clauses with C0 ∈ S, Cn is derived by a sequence of resolutions: C0 with some clause B0 to get C1, then C1 with some clause B1 to get C2, and so on until Cn has been derived.
Linear input resolution: If one of the parents in linear resolution is always from the original set of clauses (the Bi), we have linear input resolution. For example, given the set of clauses S = {P V Q, ~P V Q, P V ~Q, ~P V ~Q}, let C0 = (P V Q). Choosing B0 = ~P V Q from the set S and resolving this with C0, we obtain the resolvent Q = C1. B1 must now be chosen from S, and the resolvent of C1 and B1 becomes C2, and so on.
of existentially quantified variables with Skolem functions and deletion of the respective quantifiers, is then accomplished as follows:
1. If the first (leftmost) quantifier in an expression is an existential quantifier, replace all occurrences of the variable it quantifies with an arbitrary constant not appearing elsewhere in the expression and delete the quantifier. This same procedure should be followed for all other existential quantifiers not preceded by a universal quantifier, in each case using a different constant symbol in the substitution.
2. For each existential quantifier that is preceded by one or more universal quantifiers (is within the scope of one or more universal quantifiers), replace all occurrences of the existentially quantified variable by a function symbol not appearing elsewhere in the expression. The arguments assigned to the function should match all the variables appearing in each universal quantifier which precedes the existential quantifier. This existential quantifier should then be deleted. The same process should be repeated for each remaining existential quantifier, using a different function symbol and choosing function arguments that correspond to all universally quantified variables that precede the existentially quantified variable being replaced.
An example will help to clarify this process. Given the expression
∃u ∀v ∀x ∃y [P(f(u),v,x,y) ∧ Q(u,v,y)]
the Skolem form is determined as
∀v ∀x [P(f(a),v,x,g(v,x)) ∧ Q(a,v,g(v,x))].
In making the substitutions, it should be noted that the variable u, appearing after the first existential quantifier, has been replaced in the second expression by the arbitrary constant a. This constant did not appear elsewhere in the first expression. The variable y has been replaced by the function symbol g having the variables v and x as arguments, since both of these variables are universally quantified to the left of the existential quantifier for y. Replacement of y by an arbitrary function with arguments v and x is justified on the basis that y, following v and x, may be functionally dependent on them.
Step 5. Move all universal quantifiers to the left of the expression and put the expression on the right into CNF.
Step 6. Eliminate all universal quantifiers and conjunctions, since they are retained implicitly. The resulting expressions (the expressions previously connected by the conjunctions) are clauses, and the set of such expressions is said to be in clausal form.
As an example of this process, let us convert the expression
∃x ∀y [∀z P(f(x),y,z) → (∃u Q(x,u) ∧ ∃v R(y,v))]
into clausal form. After application of step 1 we have
∃x ∀y [~∀z P(f(x),y,z) V (∃u Q(x,u) ∧ ∃v R(y,v))].
After application of step 2 we obtain
∃x ∀y [∃z ~P(f(x),y,z) V (∃u Q(x,u) ∧ ∃v R(y,v))].
After application of step 4 (step 3 is not required)
∀y [~P(f(a),y,g(y)) V (Q(a,h(y)) ∧ R(y,l(y)))].
After application of step 5 the result is
∀y [(~P(f(a),y,g(y)) V Q(a,h(y))) ∧ (~P(f(a),y,g(y)) V R(y,l(y)))].
Finally, after application of step 6 we obtain the clausal form
~P(f(a),y,g(y)) V Q(a,h(y))
~P(f(a),y,g(y)) V R(y,l(y))
3.12 Unification
Resolution works on the principle of identifying complementary literals in two clauses and deleting them, thereby forming a new clause (the resolvent). The process is simple and straightforward when one has identical literals; in other words, for clauses containing no variables, resolution is easy. There are three major types of substitutions, viz.
1. Substitution of a variable by a constant.
2. Substitution of a variable by another variable.
3. Substitution of a variable by a function that does not contain the same variable.
A substitution that makes two clauses resolvable is called a unifier, and the process of identifying such a unifier is carried out by the unification algorithm. We can also define the same in another way: any substitution that makes two or more expressions equal is called a unifier for the expressions. The unification algorithm tries to find the Most General Unifier (MGU) between a given set of atomic formulae. For example, to unify P(f(a,x),y,y) and P(x,b,z) we first rename variables so that the two predicates have no variables in common. This can be done by replacing the x in the second predicate with u to give P(u,b,z). Next, we compare the two symbol by symbol from left to right until a disagreement is found. A disagreement can be between two different variables, a nonvariable term and a variable, or two nonvariable terms. If no disagreement is found, the two are identical and we have succeeded. If a disagreement is found and both items are nonvariable terms, unification is impossible, so we have failed. If both are variables, one is replaced throughout by the other. Finally, if the disagreement is between a variable and a nonvariable term, the variable is replaced by the entire term. Of course, in this last step, replacement is possible only if the term does not contain the variable that is being replaced. This matching process is repeated until the two are unified or until a failure occurs. For the two predicates P above, a disagreement is first found between the term f(a,x) and the variable u. Since f(a,x) does not contain the variable u, we replace u with f(a,x) everywhere it occurs in the literal. This gives a substitution set of {f(a,x)/u} and the partially matched predicates P(f(a,x),y,y) and P(f(a,x),b,z). Proceeding with the match, we find the next disagreement pair, y and b, a variable and a term, respectively. Again, we replace the variable y with the term b and update the substitution list to get {f(a,x)/u, b/y}. The final disagreement pair is two variables. Replacing the variable in the second literal with the one in the first, we get the substitution set {f(a,x)/u, b/y, y/z} or, equivalently, {f(a,x)/u, b/y, b/z}. This procedure always gives a most general unifier when one exists.
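The unification algorithm described above can be sketched in a few lines of Python. In this illustrative version (the term encoding is an assumption made here, not a standard one), a variable is a string beginning with "?", a constant is any other string, and a compound term or predicate is a tuple whose first element is the function or predicate name.

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def substitute(t, theta):
    # Apply the substitution set theta to a term.
    if is_var(t):
        return substitute(theta[t], theta) if t in theta else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, theta) for a in t[1:])
    return t

def occurs(v, t, theta):
    # The variable v may not be replaced by a term that contains v.
    t = substitute(t, theta)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, theta) for a in t[1:])

def unify(t1, t2, theta=None):
    # Return a most general unifier of t1 and t2, or None if unification fails.
    theta = {} if theta is None else theta
    t1, t2 = substitute(t1, theta), substitute(t2, theta)
    if t1 == t2:
        return theta
    if is_var(t1):
        return None if occurs(t1, t2, theta) else {**theta, t1: t2}
    if is_var(t2):
        return unify(t2, t1, theta)
    if isinstance(t1, tuple) and isinstance(t2, tuple) and \
       t1[0] == t2[0] and len(t1) == len(t2):
        for a, b in zip(t1[1:], t2[1:]):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None   # disagreement between two different nonvariable terms

# The example from the text: unify P(f(a,x), y, y) with P(u, b, z),
# where the second predicate's x has already been renamed to u.
print(unify(("P", ("f", "a", "?x"), "?y", "?y"), ("P", "?u", "b", "?z")))
# {'?u': ('f', 'a', '?x'), '?y': 'b', '?z': 'b'}, i.e. {f(a,x)/u, b/y, b/z}

The printed substitution agrees with the set derived by hand in the text.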
4.1 Semantic Network 4.2 Conceptual Graph 4.3 Frame Structures 4.4 Conceptual Dependency 4.5 Scripts
4.1 Semantic Network
[Figure: A semantic network for Moving-vehicle. Scooter and Motor-bike are linked by is-a arcs to Two-wheeler, which is-a Moving-vehicle; Moving-vehicle has Brakes, an Engine, an Electrical-System and a Fuel-system.]
From this, it is possible for us to say that a scooter is a two-wheeler and that it is a moving vehicle. The network also shows that a moving vehicle needs an engine (petrol, diesel or any other engine), a fuel system to sustain the engine running, an electrical system for its lights, horn and also for initial ignition (in the case of petrol vehicles), and brakes (of course, very important).
Unlike FOPL, there is neither a generally accepted syntax nor a generally accepted semantics for associative networks. Such rules tend to be designer dependent and vary greatly from one implementation to another.
Classification of Nodes in a Semantic Net
Generally, the nodes in a semantic net are classified as:
Generic nodes.
Individual or instance nodes.
A generic node is a very general node. In the above figure, for the semantic network of Moving-vehicle, Two-wheeler is a generic node because many two-wheelers exist. On the contrary, individual or instance nodes explicitly denote specific entities; in the above figure Scooter is an individual node, a very specific instance of a two-wheeler. A number of arc relations have become common among users. They include such predicates as is-a, member-of, subset-of, ako (a kind of), has-parts, instance-of, agent, etc. Less common arcs have also been used to express modality relations (time, manner, mood), linguistic case relations (theme, source, goal), logical connectives (or, not, and, implies), quantifiers (all, some), set relations (superset, subset, member), attributes, and quantification (ordinal, count). One particular arc or link, the is-a link, has taken on a special meaning. It signifies that a scooter is a two-wheeler and a motor-bike is a two-wheeler. The is-a relationship occurs in many representations of worlds: Bill is a student, a cat is a furry animal, a tree is a plant, and so on. The is-a link is most often used to represent the fact that an object is of a certain type (predication) or to express the fact that one type is a subtype of another (for example, conditional quantification). The semantic network structure permits the implementation of property inheritance, a form of inference. Nodes which are members or subsets of other nodes may inherit properties from their higher-level ancestor nodes. For example, from the following network it is possible to infer that a mouse has hair and drinks milk.
[Figure: An inheritance example - mouse is-a rodent, rodent is-a mammal; mammal has hair and drinks milk.]
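Property inheritance along is-a links is easy to implement. The Python sketch below is a minimal illustration (the dictionary encoding of the two example networks is an assumption of this sketch): to find a property of a node, look at the node itself and then climb its is-a chain.

semantic_net = {
    "mouse":          {"is-a": "rodent"},
    "rodent":         {"is-a": "mammal"},
    "mammal":         {"has": ["hair"], "drinks": "milk"},
    "scooter":        {"is-a": "two-wheeler"},
    "motor-bike":     {"is-a": "two-wheeler"},
    "two-wheeler":    {"is-a": "moving-vehicle"},
    "moving-vehicle": {"has": ["engine", "fuel-system",
                               "electrical-system", "brakes"]},
}

def get_property(node, prop):
    # Look the property up on the node itself, inheriting along is-a links.
    while node is not None:
        attrs = semantic_net.get(node, {})
        if prop in attrs:
            return attrs[prop]
        node = attrs.get("is-a")
    return None

print(get_property("mouse", "has"))      # ['hair'] - inherited from mammal
print(get_property("mouse", "drinks"))   # milk
print(get_property("scooter", "has"))    # inherited from moving-vehicle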
4.2 Conceptual Graph
Conceptual graphs provide formal building blocks for semantic networks which, when linked together in a coherent way, form a more complex knowledge structure. An example of such a graph, which represents the sentence "Ram is eating soup with a spoon", is depicted below.
[Figure: Conceptual graph for "Ram is eating soup with a spoon", with concept boxes [PERSON:ram], [EAT], [FOOD:soup] and [SPOON], and relation ovals (AGENT), (OBJECT) and (INSTRUMENT).]
In this figure concepts are enclosed in boxes and relations between the concepts are enclosed in ovals. The direction of an arrow corresponds to the order of the arguments in the relation it connects: the last or nth arc (argument) points away from the circled relation and all other arcs point toward the relation. Concept symbols refer to entities, actions, properties, or events in the world. A concept may be individual or generic. Individual concepts have a type field followed by a referent field. The concept [PERSON:ram] has type PERSON and referent ram. Referents like ram and soup in the figure make these individual concepts, since they refer to specific entities. EAT and SPOON have no referent fields since they are generic concepts which refer to unspecified entities. Relations like AGENT, OBJECT, INSTRUMENT, and PART are taken from a collection of standard relations. New concepts and relations can also be defined from these basic ones. A linear conceptual graph, which is easier to present as text, can also be given. The linear form equivalent to the above sentence is
[PERSON:ram]<-(AGENT)<-[EAT]-
(OBJECT)->[FOOD:soup]
(INSTRUMENT)->[SPOON]
where square brackets have replaced concept boxes and parentheses have replaced relation circles. Some examples of conceptual graphs follow.
Example 1. "John is going to Boston by bus."
In DF (the display form), concepts are represented by rectangles: [Go], [Person: John], [City: Boston], and [Bus]. Conceptual relations are represented by circles or ovals: (Agnt) relates [Go] to the agent John, (Dest) relates [Go] to the destination Boston, and (Inst) relates [Go] to the instrument bus.
The above figure could be read as three English sentences:
Go has an agent which is a person John.
Go has a destination which is a city Boston.
Go has an instrument which is a bus.
The linear form (LF) for CGs is intended as a more compact notation than DF, but with good human readability. It is exactly equivalent in expressive power to the abstract syntax and the display form. Following is the LF for the above figure:
[Go]-
(Agnt)->[Person: John]
(Dest)->[City: Boston]
(Inst)->[Bus].
In this form, the concepts are represented by square brackets instead of boxes, and the conceptual relations are represented by parentheses instead of circles. A hyphen at the end of a line indicates that the relations attached to the concept are continued on subsequent lines.
Example 2. A person is between a rock and a hard place.
In LF, the above figure may be represented in the following form:
[Person]<-(Betw)<-1-[Rock]
        <-2-[Place]->(Attr)->[Hard].
Example 3. Tom believes that Mary wants to marry a sailor.
[Figure: A conceptual graph containing a nest of two contexts.]
In the above figure, Tom is the experiencer (Expr) of the concept [Believe], which is linked by the theme relation (Thme) to a proposition that Tom believes. The proposition box contains another conceptual graph, which says that Mary is the experiencer of [Want], which has as its theme a situation that Mary hopes will come to pass. That situation is described by another nested graph, which says that Mary (represented by the concept [⊤]) marries a sailor. The dotted line, called a coreference link, shows that the concept [⊤] in the situation box refers to the same individual as the concept [Person: Mary] in the proposition box. Following is the linear form of this figure:
[Person: Tom]<-(Expr)<-[Believe]->(Thme)->
[Proposition: [Person: Mary *x]<-(Expr)<-[Want]->(Thme)->
[Situation: [?x]<-(Agnt)<-[Marry]->(Thme)->[Sailor] ]].
4.3 Frame Structures
An example of a simple frame for Ram is depicted in the following figure:
(ram
 (PROFESSION (VALUE professor))
 (AGE (VALUE 40))
 (WIFE (VALUE sita))
 (CHILDREN (VALUE love kush))
 (ADDRESS
  (STREET (VALUE 7))
  (CITY (VALUE audhya))
  (STATE (VALUE uttarpradesh))
  (ZIP (VALUE 124507))))
From the above figure it will be seen that a frame may have any number of slots, and a slot may have any number of facets, each with any number of values. This gives a very general framework from which to build a variety of knowledge structures. The slots in a frame specify general or specific characteristics of the entity which the frame represents, and sometimes they include instructions on how to apply or use the slot values. Typically, a slot contains information such as attribute-value pairs, default values, conditions for filling a slot, pointers to other related frames, and procedures that are activated when needed for different purposes. Facets (subslots) describe some knowledge or procedure about the attribute in the slot. A facet may take many forms, such as:
a constraint value; for example, the constraint that the 'age' slot has to be an integer between 0 and 120;
a default value; for example, unless there is contrary evidence it is assumed that all people like sambal belacan;
an If-added procedure: executes when new information is placed in the slot;
an If-removed procedure: executes when information is deleted from the slot;
an If-needed procedure: executes when new information is needed from the slot, but the slot is empty;
an If-changed procedure: executes when information changes.
Procedural attachments are called demons. They are used to derive slot values. An important aspect of procedural attachments is that they can be used to direct the reasoning process. Taking another example:
(ford
 (AKO (VALUE car))
 (COLOR (VALUE silver))
 (MODEL (VALUE 4-door))
 (GAS-MILEAGE (DEFAULT fget))
 (RANGE (VALUE if-needed))
 (WEIGHT (VALUE 2500))
 (FUEL-CAPACITY (VALUE 18)))
The Ford frame illustrated in the above figure has attribute-value slots (COLOR: silver, MODEL: 4-door, and the like), a slot which takes a default value for GAS-MILEAGE, and a slot with an attached if-needed procedure. The value fget in the GAS-MILEAGE slot is a function call to fetch a default value from another frame, such as the general car frame of which ford is a-kind-of (AKO). When the value of this slot is evaluated, the fget function is activated. When fget finds no value for gas mileage, it recursively looks for a value in ancestor frames until a value is found. The if-needed value in the RANGE slot is a procedure name that, when called, computes the driving range of the Ford as a function of gas mileage and fuel capacity. Slots with attached procedures such as fget and if-needed are called procedural attachments or demons. They are run automatically when a value is needed but not provided in a slot.
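The behaviour of fget and the if-needed demon can be imitated with a small amount of Python. The class below is only a sketch of the idea (the slot values, such as a gas mileage of 22, are made up for illustration): a missing VALUE facet triggers either an attached procedure or a recursive look-up in the a-kind-of ancestor frame.

class Frame:
    # A frame holds slots; each slot is a dictionary of facets such as
    # VALUE and IF-NEEDED.  ako points at the a-kind-of ancestor frame.
    def __init__(self, name, ako=None, slots=None):
        self.name, self.ako, self.slots = name, ako, slots or {}

    def fget(self, slot):
        facets = self.slots.get(slot, {})
        if "VALUE" in facets:
            return facets["VALUE"]
        if "IF-NEEDED" in facets:          # demon: compute the value on demand
            return facets["IF-NEEDED"](self)
        if self.ako is not None:           # otherwise inherit from the ancestor
            return self.ako.fget(slot)
        return None

def compute_range(frame):
    # driving range = gas mileage (miles per gallon) * fuel capacity (gallons)
    return frame.fget("GAS-MILEAGE") * frame.fget("FUEL-CAPACITY")

car = Frame("car", slots={"GAS-MILEAGE": {"VALUE": 22}})
ford = Frame("ford", ako=car, slots={
    "COLOR":         {"VALUE": "silver"},
    "MODEL":         {"VALUE": "4-door"},
    "FUEL-CAPACITY": {"VALUE": 18},
    "RANGE":         {"IF-NEEDED": compute_range},
})

print(ford.fget("GAS-MILEAGE"))   # 22, inherited from the general car frame
print(ford.fget("RANGE"))         # 396, computed by the if-needed demon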
SPEAK -- Utter a sound. e.g. say.
ATTEND -- Focus a sense on a stimulus. e.g. listen, watch.
MOVE -- Movement of a body part by its owner. e.g. punch, kick.
GRASP -- Actor grasping an object. e.g. clutch.
INGEST -- Actor ingesting an object. e.g. eat.
EXPEL -- Actor getting rid of an object from the body. e.g. ????.
CONCEPTUAL CASES (ALL ACTIONS INVOLVE ONE OR MORE OF THESE)
O -- Objective Case.
R -- Recipient/Donor Case.
I -- Instrumental Case. e.g. eat with a spoon.
D -- Destination or Directive Case. e.g. going home.
CONCEPTUAL TENSES (TIME OF ACTION OR STATE OF BEING)
Conditional (c), Continuing (k), Finished Transition (tf), Future (f), Interrogative (?), Negative (/), Past (p), Present (nil), Start Transition (ts), Timeless (delta), Transition (t).
CONCEPTUAL DEPENDENCIES
Semantic rules for the formation of dependency structures, such as the relationship between an actor and an event or between a primitive action and an instrument. Examples:
1. Bird flew. (PP <=> ACT: an actor performs an action)  Bird <=>(p) PTRANS
2. Ram is a student. (PP <=> PP: an object belongs to a class)  Ram <=> student
3. Shyam pushed the door. (ACT <-o- PP: an action has an object)  Shyam <=>(p) PROPEL <-o- door
4. Joe gave Sue a flower. (ACT <-R-: an action has a recipient and a donor)  Joe <=>(p) ATRANS <-o- flower <-R- (to Sue, from Joe)
5. Joe ate some soup. (ACT <-I-: an action has an instrument)  Joe <=>(p) INGEST <-o- soup <-I- (do with spoon)
Advantages of CD:
Using these primitives involves fewer inference rules.
Many inference rules are already represented in the CD structure.
The holes in the initial structure help to focus on the points still to be established.
Disadvantages of CD:
Knowledge must be decomposed into fairly low-level primitives.
It may be impossible or difficult to find the correct set of primitives.
A lot of inference may still be required.
Representations can be complex even for relatively simple actions. Consider: Dave bet Frank five pounds that Wales would win the Rugby World Cup. Complex representations require a lot of storage.
Applications of CD:
MARGIE (Meaning Analysis, Response Generation and Inference on English) -- a model of natural language understanding.
SAM (Script Applier Mechanism) -- uses scripts to understand stories. See the next section.
PAM (Plan Applier Mechanism) -- uses plans to understand stories.
Schank et al. developed all of the above.
4.5 SCRIPTS
A script is a structure that prescribes a set of circumstances which could be expected to follow on from one another. It is similar to a thought sequence or a chain of situations which could be anticipated. It could be considered to consist of a number of slots or frames but with more specialised roles. Scripts are beneficial because:
Events tend to occur in known runs or patterns.
Causal relationships between events exist.
Entry conditions exist which allow an event to take place.
Prerequisites exist upon events taking place, e.g. when a student progresses through a degree scheme or when a purchaser buys a house.
The components of a script include:
Entry Conditions -- these must be satisfied before events in the script can occur.
Results -- conditions that will be true after events in the script occur.
Props -- slots representing objects involved in the events.
Roles -- persons involved in the events.
Track -- variations on the script. Different tracks may share components of the same script.
Scenes -- the sequence of events that occur. Events are represented in conceptual dependency form.
Scripts are useful in describing certain situations such as robbing a bank. This might involve:
Getting a gun.
Holding up a bank.
Escaping with the money.
Here the Props might be:
Gun, G.
Loot, L.
Bag, B.
Getaway car, C.
The Roles might be:
Robber, S.
Cashier, M.
Bank Manager, O.
Policeman, P.
The Entry Conditions might be:
S is poor.
S is destitute.
The Results might be:
S has more money.
O is angry.
M is in a state of shock.
P is shot.
There are 3 scenes: obtaining the gun, robbing the bank and the getaway.
[Figure: Simplified bank robbing script.]
Some additional points to note on scripts:
If a particular script is to be applied it must be activated, and the activating depends on its significance.
If a topic is mentioned in passing then a pointer to that script could be held.
If the topic is important then the script should be opened.
The danger lies in having too many active scripts, much as one might have too many windows open on the screen or too many recursive calls in a program.
Provided events follow a known trail we can use scripts to represent the actions involved and use them to answer detailed questions.
Different trails may be allowed for different outcomes of scripts (e.g. the bank robbery goes wrong).
Advantages of Scripts:
Ability to predict events.
A single coherent interpretation may be built up from a collection of observations.
Disadvantages:
Less general than frames.
May not be suitable to represent all kinds of knowledge.
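A script is, at bottom, just a structured collection of slots, so it can be written down directly as data. The Python sketch below encodes the bank-robbery script from this section; the individual scene events are invented here purely for illustration, and answering a simple question amounts to looking up the relevant slot.

bank_robbery_script = {
    "track": "successful robbery",
    "props": ["gun G", "loot L", "bag B", "getaway car C"],
    "roles": ["robber S", "cashier M", "bank manager O", "policeman P"],
    "entry_conditions": ["S is poor", "S is destitute"],
    "results": ["S has more money", "O is angry",
                "M is in a state of shock", "P is shot"],
    "scenes": {
        "obtaining the gun": ["S obtains gun G"],
        "robbing the bank":  ["S enters the bank", "S threatens M with G",
                              "M gives the loot L to S", "S puts L in bag B"],
        "the getaway":       ["S leaves in getaway car C"],
    },
}

def answer(slot):
    # A detailed question about the story is answered by consulting the script.
    return bank_robbery_script.get(slot, "not covered by this script")

print(answer("entry_conditions"))
print(answer("results"))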
5.1 Search and Control Strategies 5.2 Preliminary Concepts 5.3 Water Container Problem 5.4 Production System 5.5 Problem Characteristics 5.6 Means-end analysis 5.7 Problem Reduction 5.8 Uninformed or Blind Search 5.8.1 Breadth-First Search 5.8.2 Depth-First Search 5.9 Informed Search 5.9.1 Hill Climbing Methods 5.9.2 Best First Search 5.9.3 A* Algorithm
nodes from a parent node. Once this is done, a child is said to be generated and the parent is said to be explored. The process of generating all of the children of a parent is also known as expanding the node. A search procedure is a strategy for selecting the order in which nodes are generated and a given path selected. Search problems may be classified by the information used to carry out a given strategy. In blind or uninformed search, no preference is given to the order of successor node generation and selection; the path selected is blindly or mechanically followed, and no information is used to determine the preference of one child over another. In informed or directed search, some information about the problem space is used to compute a preference among the children for exploration and expansion. Before proceeding with a comparison of strategies, we consider next some typical search problems.
5. Pour water from the 3l container into the 4l container until the 4l container is full.
6. Pour water from the 4l container into the 3l container until the 3l container is full.
7. Pour all the water from the 3l container into the 4l container.
8. Pour all the water from the 4l container into the 3l container.
Preconditions need to be satisfied before an operator can be applied.
EXAMPLE: Rule 1 can be applied if there is less than 4l of water in the 4l container.
IF there is less than 4l in the 4l container THEN fill the 4l container.
Adding pre-conditions to operators => generation of production rules. The forward form of rule 1:
IF (x,y | x<4) THEN (4,y)
The forward set of production rules:
R1 IF (x,y | x<4) THEN (4,y)
R2 IF (x,y | y<3) THEN (x,3)
R3 IF (x,y | x>0) THEN (0,y)
R4 IF (x,y | y>0) THEN (x,0)
R5 IF (x,y | x+y>=4 and y>0 and x<4) THEN (4, y-(4-x))
R6 IF (x,y | x+y>=3 and x>0 and y<3) THEN (x-(3-y), 3)
R7 IF (x,y | x+y<=4 and y>0) THEN (x+y, 0)
R8 IF (x,y | x+y<=3 and x>0) THEN (0, x+y)
In certain states, more than one rule can be applied. EXAMPLE: (4,0) satisfies the preconditions of R2, R3 and R6.
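The eight production rules can be written out almost verbatim as precondition/action pairs. The Python sketch below is a minimal illustration of this rule set: a state is a pair (x, y) holding the contents of the 4l and 3l containers, and each rule has a precondition test and an action.

RULES = [
    ("R1", lambda x, y: x < 4,                    lambda x, y: (4, y)),
    ("R2", lambda x, y: y < 3,                    lambda x, y: (x, 3)),
    ("R3", lambda x, y: x > 0,                    lambda x, y: (0, y)),
    ("R4", lambda x, y: y > 0,                    lambda x, y: (x, 0)),
    ("R5", lambda x, y: x + y >= 4 and y > 0 and x < 4,
                                                  lambda x, y: (4, y - (4 - x))),
    ("R6", lambda x, y: x + y >= 3 and x > 0 and y < 3,
                                                  lambda x, y: (x - (3 - y), 3)),
    ("R7", lambda x, y: x + y <= 4 and y > 0,     lambda x, y: (x + y, 0)),
    ("R8", lambda x, y: x + y <= 3 and x > 0,     lambda x, y: (0, x + y)),
]

def applicable(state):
    # Names of all rules whose preconditions the state satisfies.
    x, y = state
    return [name for name, pre, _ in RULES if pre(x, y)]

def successors(state):
    # Apply every applicable rule and return the resulting states.
    x, y = state
    return [(name, act(x, y)) for name, pre, act in RULES if pre(x, y)]

print(applicable((4, 0)))    # ['R2', 'R3', 'R6'], as claimed above
print(successors((0, 0)))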
So far, our definition of a production system has been very general. It encompasses a great many systems, including the water jug problem solver. It also encompasses a family of general production system interpreters, including:
Basic production system languages, such as OPS5 and ACT.
More complex, often hybrid systems called expert system shells, which provide complete (relatively speaking) environments for the construction of knowledge-based expert systems.
General problem-solving architectures like SOAR, a system based on a specific set of cognitively motivated hypotheses about the nature of problem solving.
All of these systems provide the overall architecture of a production system and allow the programmer to write rules that define particular problems to be solved.
R & (~P → Q)
(~P → Q) & R
(~~P V Q) & R
(P V Q) & R
(Q V P) & R
As a simple example, suppose the General Problem Solver is given the initial propositional logic object Li = (R & (~P → Q)) and the goal object Lg = ((Q V P) & R). To determine Lg from Li requires a few simple transformations. The system first determines the difference between the two expressions and then systematically reduces these differences until Lg is obtained from Li or failure occurs. For example, a comparison of Li and Lg reveals the difference that R is on the left in Li but on the right in Lg. This causes a subgoal to be set up to reduce this difference. The subgoal, in turn, calls for an application of the reduction method, namely to rewrite Li in the equivalent form L1 = ((~P → Q) & R). The rest of the solution process follows the path indicated in the sequence of transformations above.
The key idea in means-ends analysis is to reduce differences. The purpose of means-ends analysis is to identify a procedure that causes a transition from the current state to the goal state, or at least to an intermediate state. Here is the general procedure:
To perform means-ends analysis,
Until the goal is reached or no more procedures are available:
Describe the current state, the goal state, and the difference between the two.
Use the difference between the current state and the goal state, possibly with the description of the current or goal state, to select a promising procedure.
Use the promising procedure and update the current state.
If an acceptable solution is found, announce it; otherwise, announce failure.
Difference-Procedure Tables Often Determine the Means
> The key idea in means-ends analysis is to reduce differences.
> Means-ends analysis is often mediated via difference-procedure tables.
The difference-procedure table determines what to do, leaving the descriptions of the current state and destination state with no purpose other than to specify the origin and destination for the appropriate procedure.
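The general procedure above can be turned into a short program. The following Python sketch is only an illustration of the control loop (the states, facts and procedures in the example table are invented): a state is a set of facts, the difference is the set of goal facts not yet achieved, and a difference-procedure table maps each kind of difference to a procedure that reduces it.

def difference(current, goal):
    return goal - current            # the facts still to be achieved

def means_ends(current, goal, table, max_steps=20):
    for _ in range(max_steps):
        diff = difference(current, goal)
        if not diff:
            return current           # goal reached: announce the solution
        fact = sorted(diff)[0]       # describe one outstanding difference
        procedure = table.get(fact)  # select a promising procedure for it
        if procedure is None:
            raise RuntimeError("no procedure reduces the difference %r" % fact)
        current = procedure(current) # apply it and update the current state
    raise RuntimeError("goal not reached: announce failure")

# An invented difference-procedure table for a trivial travel problem.
table = {
    "have-ticket": lambda s: s | {"have-ticket"},
    "at-airport":  lambda s: s | {"at-airport"},
}
print(means_ends(frozenset({"at-home"}),
                 frozenset({"at-airport", "have-ticket"}), table))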
5.7 Problem-Reduction
Sometimes it is possible to convert difficult goals into one or more easier-to-achieve subgoals. Each subgoal, in turn, may be divided still more finely into one or more lower-level subgoals. The most typical example is in computer programming: problem reduction is ubiquitous in programming. Most real-world programs consist of a collection of specialized procedures; each time one procedure calls another, it effects a problem-reduction step. The key idea in problem reduction is to explore a goal tree. A goal tree is a semantic tree in which nodes represent goals and branches indicate how you can achieve goals by solving one or more subgoals. A goal tree consists of:
And goals, all of which must be satisfied;
Or goals, one of which must be satisfied.
[Figure: An And-Or goal tree; one of the goal nodes is "Steal TV set".]
1. Place the starting node s on the queue.
2. If the queue is empty, return failure and stop.
3. If the first element on the queue is a goal node g, return success and stop. Otherwise,
4. Remove and expand the first element from the queue and place all its children at the end of the queue in any order.
5. Return to step 2.
The time complexity of breadth-first search is O(b^d), where b is the branching factor and d is the depth of the shallowest goal. The space complexity is also O(b^d).
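The queue-based procedure above translates directly into code. The Python sketch below is illustrative only; it adds a visited set (which the listed steps omit) so that states reachable by more than one path are not generated repeatedly.

from collections import deque

def breadth_first_search(start, goal_test, expand):
    queue = deque([start])                 # step 1: start node on the queue
    visited = {start}
    while queue:                           # step 2: fail when the queue is empty
        node = queue.popleft()
        if goal_test(node):                # step 3: goal node at the front
            return node
        for child in expand(node):         # step 4: expand, children to the rear
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return None                            # step 5 is the loop itself

# Toy example: search the integers for 11, expanding n into 2n and 2n+1.
print(breadth_first_search(1, lambda n: n == 11,
                           lambda n: [2 * n, 2 * n + 1]))   # 11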
Generally, two categories of problems use heuristics:
1. Problems for which no exact algorithms are known and one needs to find an approximate and satisfying solution, e.g., computer vision, speech recognition, etc.
2. Problems for which exact solutions are known but computationally infeasible, e.g., chess, etc.
The following algorithms make use of heuristic evaluation functions.
[Figure: A search tree with a heuristic value attached to each node, used to illustrate the hill-climbing search.]
The hill-climbing technique is used in some activity or other in our day-to-day chores. Some examples:
1. While listening to somebody playing the flute on a transistor radio, the tone and volume controls are adjusted in a way that makes the music melodious.
2. While tuning the carburettor of a scooter, the accelerator is raised to its maximum once and the carburettor is tuned so that the engine keeps running for a considerably long period of time.
3. An electronics expert, while assembling a transistor radio for the first time, tunes the radio set in mid-afternoon, when the signal is weak, for proper reception.
Problems of the Hill-Climbing Technique
1. Local maximum: A state that is better than all its neighbours but not so when compared to states that are farther away.
[Figure: Local maximum.]
2. Plateau: A flat area of the search space, in which all neighbours have the same value.
[Figure: Plateau.]
3. Ridge: Described by the Oxford English Dictionary as a long and narrow stretch of elevated ground, or a narrow elevation or raised path running along or across a surface, this is an area of the search space which must be traversed very carefully, because movement in any direction might keep one at the same level or result in a fast descent.
[Figure: Ridge.]
In order to overcome these problems, adopt one of the following methods or a combination of them:
Backtracking for local maxima. Backtracking helps in undoing what has been done so far and permits trying a totally different path to attain the global peak.
A big jump is the solution to escape from a plateau; a huge jump is recommended because on a plateau all neighbouring points have the same value.
Trying different paths at the same time is the solution for circumventing ridges.
The problems encountered with hill climbing can be avoided using a best-first search approach.
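As a small illustration of the technique and of the random-restart remedy, here is a Python sketch of steepest-ascent hill climbing (the objective function and neighbourhood used in the example are made up). Each restart climbs until no neighbour is better, which is exactly where a local maximum or plateau would trap a single run.

import random

def hill_climbing(value, neighbours, random_state, restarts=5):
    best = None
    for _ in range(restarts):
        current = random_state()                 # a big jump to a fresh start
        while True:
            uphill = [n for n in neighbours(current) if value(n) > value(current)]
            if not uphill:
                break                            # local maximum or plateau
            current = max(uphill, key=value)     # steepest-ascent move
        if best is None or value(current) > value(best):
            best = current
    return best

# Toy example: maximise f(x) = -(x - 7)^2 over the integers 0..20.
f = lambda x: -(x - 7) ** 2
print(hill_climbing(
    value=f,
    neighbours=lambda x: [n for n in (x - 1, x + 1) if 0 <= n <= 20],
    random_state=lambda: random.randint(0, 20)))   # 7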
[Figure: A search tree with an evaluation function value attached to each node, used to illustrate best-first search.]
First, the start node S is expanded. It has three children A, B and C with values 3, 6 and 5 respectively. These values approximately indicate how far each node is from the goal. The child with the minimum value, namely A, is chosen. The children of A are generated: D and E with values 9 and 8. The search process now has four nodes to consider, i.e., node D with value 9, node E with value 8, node B with value 6 and node C with value 5. Of them, node C has the minimal value, and it is expanded to give node H with value 7. At this point, the nodes available for search are (D : 9), (E : 8), (B : 6) and (H : 7), where (N : v) indicates that N is the node and v is its evaluation value. Of these, B is minimal and hence B is expanded to give (F : 12) and (G : 14).
At this juncture, the nodes available for search are (D : 9), (E : 8), (H : 7), (F : 12) and (G : 14), out of which (H : 7) is minimal and is expanded to give (I : 5) and (J : 6). The entire search process is summarised in the table below.
Children generated | Available nodes | Node chosen
(A : 3), (B : 6), (C : 5) | (A : 3), (B : 6), (C : 5) | (A : 3)
(D : 9), (E : 8) | (B : 6), (C : 5), (D : 9), (E : 8) | (C : 5)
(H : 7) | (B : 6), (D : 9), (E : 8), (H : 7) | (B : 6)
(F : 12), (G : 14) | (D : 9), (E : 8), (H : 7), (F : 12), (G : 14) | (H : 7)
(I : 5), (J : 6) | (D : 9), (E : 8), (F : 12), (G : 14), (I : 5), (J : 6) | (I : 5)
(K : 1), (L : 0), (M : 2) | (D : 9), (E : 8), (F : 12), (G : 14), (J : 6), (K : 1), (L : 0), (M : 2) | Search stops as the goal L has been reached
As you can see, best-first search jumps around in the search graph to identify the node with the minimal evaluation function value. The algorithm is given below.
1. Place the starting node s on the queue.
2. If the queue is empty, return failure and stop.
3. If the first element on the queue is a goal node g, return success and stop. Otherwise,
4. Remove the first element from the queue, expand it and compute the estimated goal distance for each child. Place the children on the queue and rearrange all queue elements in ascending order of estimated goal distance.
5. Return to step 2.
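Using a priority queue ordered by the evaluation function value, the algorithm can be sketched as follows. The graph and heuristic values below are taken from the worked example above (the start node S is given an arbitrary value, since it is expanded first in any case); the sketch is illustrative rather than a definitive implementation.

import heapq

def best_first_search(start, h, expand, goal_test):
    open_list = [(h(start), start)]          # queue ordered by evaluation value
    closed = set()
    while open_list:
        _, node = heapq.heappop(open_list)   # node with the minimal value
        if goal_test(node):
            return node
        if node in closed:
            continue
        closed.add(node)
        for child in expand(node):
            if child not in closed:
                heapq.heappush(open_list, (h(child), child))
    return None

graph = {"S": "ABC", "A": "DE", "B": "FG", "C": "H", "H": "IJ", "I": "KLM"}
h = {"S": 10, "A": 3, "B": 6, "C": 5, "D": 9, "E": 8, "F": 12, "G": 14,
     "H": 7, "I": 5, "J": 6, "K": 1, "L": 0, "M": 2}
print(best_first_search("S", lambda n: h[n],
                        lambda n: list(graph.get(n, "")),
                        lambda n: n == "L"))   # L, expanding S, A, C, B, H, I in that order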
5.9.3 A* Algorithm
In best-first search, we brought in a heuristic value called the evaluation function value. It is a value that estimates how far a particular node is from the goal. Apart from the evaluation function values, one can also bring in cost functions. Cost functions indicate how many resources like time, energy and money have been spent in reaching a particular node from the start. While evaluation function values deal with the future, cost function values deal with the past. Since cost function values have really been expended, they are more concrete than evaluation function values. If it is possible to obtain both the evaluation function values and the cost function values, then the A* algorithm can be used. The basic principle is to sum the cost and the evaluation function value for a state to get its goodness worth, and to use this as the yardstick instead of the evaluation function value alone, as in best-first search. The sum of the evaluation function value and the cost along the path leading to that state is called the fitness number. Consider the following figure again with the same evaluation function values. Now associated with each node are three numbers: the evaluation function value, the cost function value and the fitness number. The fitness number, as stated earlier, is the total of the evaluation function value and the cost function value. For example, consider node K: its fitness number is 20, which is obtained as follows:
(Evaluation function of K) + (Cost function involved from start node S to node K)
= 1 + (Cost from S to C + Cost from C to H + Cost from H to I + Cost from I to K)
= 1 + 6 + 5 + 7 + 1 = 20.
While best-first search uses the evaluation function value alone for choosing the best node to expand, A* uses the fitness number for this purpose.
[Figure: The same search tree annotated with, at each node, the evaluation function value, the cost function value along the path from S, and the fitness number; L is the goal node.]
The algorithm for A* is as follows:
1. Put the initial node on a list START.
2. If (START is empty) or (START = GOAL), terminate the search.
3. Remove the first node from START. Call this node a.
4. If (a = GOAL), terminate the search with success.
5. Else, if node a has successors, generate all of them. Estimate the fitness number of each successor by totalling the evaluation function value and the cost function value. Sort the list by fitness number.
6. Name the new list START 1.
7. Replace START with START 1.
8. Go to step 2.
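A compact way to realise these steps is to keep the START list as a priority queue ordered by fitness number. The Python sketch below is illustrative; the small graph and heuristic values are made up here (they are not the figure's), with h an estimate of the remaining distance and each successor carrying the cost of the step.

import heapq

def a_star(start, h, successors, goal_test):
    # Each queue entry is (fitness number, cost so far, node, path taken).
    open_list = [(h(start), 0, start, [start])]
    best_cost = {start: 0}
    while open_list:
        fitness, cost, node, path = heapq.heappop(open_list)
        if goal_test(node):
            return path, cost
        for child, step_cost in successors(node):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(child, float("inf")):
                best_cost[child] = new_cost
                heapq.heappush(open_list,
                               (new_cost + h(child), new_cost, child, path + [child]))
    return None

edges = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("C", 5)],
         "B": [("C", 1)], "C": [("G", 3)]}
h = {"S": 6, "A": 5, "B": 3, "C": 2, "G": 0}
print(a_star("S", lambda n: h[n], lambda n: edges.get(n, []),
             lambda n: n == "G"))   # (['S', 'A', 'B', 'C', 'G'], 7)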
UNIT 6 LEARNING
6.1 What is Learning? 6.2 Types of Learning 6.2.1 Rote Learning 6.2.2 Learning by Taking Advice 6.2.3 Learning by Problem Solving 6.2.4 Inductive Learning 6.2.5 Explanation Based Learning
6.1 What is Learning?
Learning is an area of AI that focuses on processes of self-improvement. Information processes that improve their performance or enlarge their knowledge bases are said to learn.
Intelligence implies that an organism or machine must be able to adapt to new situations.
It must be able to learn to do new things.
This requires knowledge acquisition, inference, updating/refinement of the knowledge base, acquisition of heuristics, applying faster searches, etc.
A simple model of learning systems:
Simon [1983] has proposed that learning denotes "... changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time." As thus defined, learning covers a wide range of phenomena. At one end of the spectrum is skill refinement (people get better at many tasks simply by practicing). At the other end of the spectrum lies knowledge acquisition. Many AI programs draw heavily on knowledge as their source of power.
How can we learn? Many approaches have been taken to attempt to provide a machine with learning capabilities. This is because learning tasks cover a wide range of phenomena. Listed below are a few examples of how one may learn. We will look at these in detail shortly.
Skill refinement -- one can learn by practicing, e.g. playing the piano.
Knowledge acquisition -- one can learn by experience and by storing the experience in a knowledge base. One basic example of this type is rote learning.
Taking advice -- similar to rote learning, although the knowledge that is input may need to be transformed (or operationalised) in order to be used effectively.
Problem solving -- if we solve a problem, one may learn from this experience. The next time we see a similar problem we can solve it more efficiently. This does not usually involve gathering new knowledge but may involve reorganisation of data or remembering how to achieve the solution.
Induction -- one can learn from examples. Humans often classify things in the world without knowing explicit rules. This usually involves a teacher or trainer to aid the classification.
Discovery -- here one learns knowledge without the aid of a teacher.
Analogy -- if a system can recognise similarities in information already stored, then it may be able to transfer some knowledge to improve the solution of the task in hand.
Automated Advice Taking
The following steps summarise this method:
Request -- this can be a simple question asking for general advice, or more complicated, e.g. identifying shortcomings in the knowledge base and asking for a remedy.
Interpret -- translate the advice into an internal representation.
Operationalise -- translated advice may still not be usable, so this stage seeks to provide a representation that can be used by the performance element.
Integrate -- when knowledge is added to the knowledge base, care must be taken that bad side-effects are avoided, e.g. the introduction of redundancy and contradictions.
Evaluate -- the system must assess the new knowledge for errors, contradictions, etc.
The steps can be iterated.
For example, making dinner can be described as: lay the table, cook dinner, serve dinner. We could treat laying the table as one action even though it involves a sequence of actions. The STRIPS problem solver employed macro-operators in its learning phase. Consider a blocks-world example in which ON(C,B) and ON(A,TABLE) are true. STRIPS can achieve ON(A,B) in four steps: UNSTACK(C,B), PUTDOWN(C), PICKUP(A), STACK(A,B). STRIPS now builds a macro-operator MACROP with preconditions ON(C,B), ON(A,TABLE), postconditions ON(A,B), ON(C,TABLE) and the four steps as its body. MACROP can now be used in future operations, but it is not very general. The above can easily be generalised, with variables used in place of the blocks.
Learning by Chunking
Chunking involves similar ideas to macro-operators and originates from psychological ideas on memory and problem solving. The computational basis is in production systems (studied earlier). SOAR is a system that uses production rules to represent its knowledge. It also employs chunking to learn from experience.
Basic outline of SOAR's method:
When SOAR solves problems, it fires productions; these are stored in long-term memory.
Some firings turn out to be more useful than others.
When SOAR detects a useful sequence of firings, it creates a chunk.
A chunk is essentially a large production that does the work of an entire sequence of smaller ones.
Chunks may be generalised before storing.
A training example -- what the learning system sees in the world.
A goal concept -- a high-level description of what the program is supposed to learn.
An operational criterion -- a description of which concepts are usable.
A domain theory -- a set of rules that describe relationships between objects and actions in a domain.
From these, EBL computes a generalisation of the training example that is sufficient not only to describe the goal concept but also to satisfy the operational criterion. This has two steps:
Explanation -- the domain theory is used to prune away all unimportant aspects of the training example with respect to the goal concept.
Generalisation -- the explanation is generalised as far as possible while still describing the goal concept.
7.1 What is Expert System? 7.2 Expert System Application Area 7.3 Expert System Structure 7.4 Expert System Characteristics 7.5 Conventional Vs Expert Systems 7.6 Participants in Expert Systems Development 7.7 Tools For Development of Expert System 7.8 MYSIN
Speed: variable for the human expert, consistent for the expert system. Cost: high for the human expert, affordable for the expert system.
We build expert systems for two reasons:
1. to replace an expert
2. to assist an expert
Reasons for replacing an expert:
Make expertise available after hours or in other locations.
Automate a routine task requiring an expert.
The expert is retiring or leaving.
The expert is expensive.
Reasons for assisting an expert:
Aiding the expert in some routine task to improve productivity.
Aiding the expert in some difficult task to effectively manage the complexities.
Making available to the expert information that is difficult to recall.
1. The Knowledge Base (LTM)
The key bottleneck in developing an expert system. It contains everything necessary for understanding, formulating and solving a problem: facts and heuristics. The most popular approach to representing domain knowledge is production rules.
Rule 1: IF car won't start THEN problem in electrical system
Rule 2: IF problem in electrical system AND battery voltage is below 10 volts THEN bad battery
2. Working Memory (STM)
Contains facts about a problem that are discovered during consultation with the expert system. The system matches this information with the knowledge contained in the knowledge base to infer new facts, and the conclusions reached enter the working memory. Thus, working memory contains information either supplied by the user or inferred by the system.
3. Inference Engine
The knowledge processor, modelled after the expert's reasoning process. It is the processor in an expert system that matches the facts contained in the working memory with the domain knowledge contained in the knowledge base to draw conclusions about the problem. It taps the knowledge base and working memory to derive new information and solve problems.
4. The User Interface
The user communicates with the expert system through the user interface. It allows the user to query the system, supply information and receive advice. The aim is to provide the same form of communication facilities as provided by the expert, but it normally has less capability for understanding natural language and general knowledge.
Example of the inference process: a consultation with Mr. Mech (an ES), based on the previous rules.
MM: The car cannot start?
User: YES
User asserts: The car cannot start.
MM asserts: Problem in electrical system.
MM: Is the battery voltage below 10 volts?
User: YES
User asserts: The battery voltage is below 10 volts.
MM asserts: Bad battery.
The Explanation Facility
A trademark of expert systems is the ability to explain their reasoning; it is an additional component of the expert system. An ES can provide an explanation of:
1. WHY it is asking a question
2. HOW it reached some conclusion
Explaining WHY. Example:
MM: Will the car not start?
Person: WHY
MM: If I know that the car won't start then I usually assume the problem is in the electrical system.
The expert responds with what they might conclude from the answer. An ES responds to a WHY query by displaying the rule it is currently pursuing.
Explaining HOW
Besides giving final results, an expert system can explain how it arrived at a result. Conventional programs work on well-defined problems, but expert systems do not, so justification of the system's findings is needed. Example:
MM: The battery is bad.
Person: HOW
MM: Since your car won't start, I assumed there was a problem with the electrical system. Once I found the battery voltage was below 10 volts, I knew the battery was bad.
The ES responds by tracing back through the rules that fired to reach the conclusion. This trace is a map of the system's line of reasoning.
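The match-fire cycle of the inference engine, together with the trace that supports HOW explanations, can be sketched in a few lines of Python. This is only an illustration built from the two Mr. Mech rules above; facts are plain strings, and the trace records which premises produced each conclusion.

RULES = [
    ({"car won't start"}, "problem in electrical system"),
    ({"problem in electrical system", "battery voltage is below 10 volts"},
     "bad battery"),
]

def forward_chain(facts):
    # Repeatedly match rules against working memory until nothing new fires.
    working_memory, trace = set(facts), {}
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= working_memory and conclusion not in working_memory:
                working_memory.add(conclusion)
                trace[conclusion] = premises
                changed = True
    return working_memory, trace

def explain_how(conclusion, trace):
    premises = trace.get(conclusion)
    if premises is None:
        return conclusion + " was supplied by the user."
    return "I concluded '%s' because %s." % (conclusion,
                                             " and ".join(sorted(premises)))

memory, trace = forward_chain({"car won't start",
                               "battery voltage is below 10 volts"})
print(memory)
print(explain_how("bad battery", trace))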
Conclusion: Ahmad takes panadol
8. Reasons Heuristically
Experts are adept at drawing on their experience to help them efficiently solve the current problem. Typical heuristics used by experts:
I always check the electrical system first.
People rarely get a cold during the summer.
If I suspect cancer, then I always check the family history.
9. Makes Mistakes
Expert systems can make mistakes. Since the knowledge of the expert has to be captured as closely as possible in the expert system, like its human counterpart it can make mistakes.
10. Thrives on Reasonable Complexity
The problem should be reasonably complex, not too easy or too difficult.
11. Focuses Expertise
Most experts are skilful at solving problems within their narrow area of expertise, but have limited ability outside this area.
7.5 Conventional Vs Expert Systems
Conventional systems:
Programs do not make mistakes (only programmers do).
Do not usually explain why input data are needed or how conclusions were drawn.
The system operates only when it is completed.
Execution is done on a step-by-step (algorithmic) basis.
Expert systems:
The system can operate with only a few rules (as a first prototype).
Execution is done by using heuristics and logic.
Can operate with incomplete or uncertain information.
Effective manipulation of large knowledge bases.
Representation and use of knowledge.
7.7 Tools For Development of Expert System
1. Languages
Expert systems may be written in symbolic languages, such as LISP or PROLOG, or in conventional high-level languages such as FORTRAN, C and PASCAL.
LISP: All the large early expert systems were developed in LISP (List Processing) or in a tool written in LISP. LISP deals with symbols.
PROLOG: Research on logic programming culminated in the seventies with the invention of the PROLOG language, which means Programming in Logic. A PROLOG program can be thought of as a database of facts and rules.
2. Expert System Shells
A shell is a program that can be used to build expert systems. An expert system shell performs three major functions:
1. Assists in building the knowledge base by allowing the developer to insert knowledge into knowledge representation structures.
2. Provides methods of inference or deduction that reason on the basis of information in the knowledge base and new facts input by the user.
3. Provides an interface that allows the user to set up a reasoning task and query the system about its reasoning strategy.
3. AI Environments or Toolkits
More expensive and powerful than either languages or shells. The advantage of using toolkits is that they provide a variety of knowledge representation techniques, such as rules and frames (with inheritance).
The following are actual figures for the different development tools used by expert system builders in the UK:
Conventional languages 11%
AI languages 23%
Expert system shells 56%
Toolkits 11%
7.8 MYCIN: AN ES FOR THE DIAGNOSIS AND TREATMENT OF MENINGITIS AND BACTEREMIA INFECTIONS
MYCIN was developed at Stanford University in the mid-1970s. It was the first large expert system to perform at the level of a human expert, and it is used as a benchmark by expert system developers. It provides consultative advice about bacteremia and meningitis. Bacteremia is an infection that involves bacteria in the blood. Meningitis is an infection that inflames the membranes enveloping the brain and spinal cord. Both can be fatal and therefore need a quick response, but positive identification of the causative organism normally takes 24-48 hours. Doctors must therefore often begin treatment in the
absence of lab results. The domain is very complex, and a doctor needs the advice of an expert on bacteremia and meningitis.
How does MYCIN work?
MYCIN begins by initiating a dialogue.
The physician responds to the questions.
MYCIN provides a diagnosis and a prescription.
What does MYCIN reason from?
Laboratory results of body fluid analyses.
Symptoms that the patient is displaying.
Characteristics of the patient, such as age, sex, etc.
A MYCIN consultation proceeds in 2 phases:
1. A diagnosis is made to identify the most likely infectious organisms.
2. One or more drugs (antibiotics) are prescribed.
A Consultation with MYCIN
1. Please enter information about the patient. (Name: Sally, Age: 42 years, Sex: Female, Race: Melayu)
2. Are there any positive cultures obtained from SALLY? No
3. Are there any pending cultures of SALLY? Yes
4. Site: CSF; date collected: 3-Jun-75; stain examined: Yes; method: N/A
5. Stain of smear (type NONE if no organism was seen on the smear): None
Unable to reach a conclusion from the lab tests, MYCIN, like a physician, continues asking questions.
6. Has Sally recently had symptoms of persistent headache or other abnormal neurologic symptoms (dizziness, lethargy, etc.)? Yes
7. Has SALLY recently had objective evidence of abnormal neurologic signs documented by physician observation or examination? Yes
Questions 6 and 7 follow the traditional approach of diagnosing by symptoms. MYCIN concludes from the answers by giving a first conclusion: The CSF cultures will be considered to be associated with MENINGITIS. The following questions will identify the particular organism that causes the infection.
8. Is the meningitis a hospital-acquired infection? Yes
9. Is SALLY's clinical history consistent with EPIGLOTTITIS? No
10. Is SALLY's clinical history consistent with OTITIS-MEDIA?
No
11. Has the patient ever been to an area that is endemic for coccidiomycoses? Unknown
12. Is Sally a burn patient? Yes
(WHY) The type of infection may be bacterial.
(HOW) The list of rules that conclude that the infection may be bacterial will be displayed.
The final phase is the prescription of antibiotics.
13. Does SALLY have a clinically significant allergic reaction to any antimicrobial agent? No
14. Is Sally pregnant or breast-feeding? No
15. Do you have reason to suspect that SALLY may have impaired renal functions? No
16. SALLY's weight in kilograms? 51.4
My preferred therapy recommendation is as follows. Give the following combination:
1. ETHAMBUTAL. Dose: 1.28 g (13.0 100-mg tablets) q24h PO for 60 days, then 770 mg (7.5 100-mg tablets) q24h PO. Comments: periodic vision screening tests are recommended for optic neuritis.
2. INH. Dose: 513 mg (5.0 100-mg tablets) q24h PO.
3. RIFAMPIN. Dose: 600 mg PO q24h. Comments: administer dose on an empty stomach.
HOW GOOD AN EXPERT IS MYCIN?
In one complex evaluation, 8 independent evaluators compared MYCIN with 9 human diagnosticians on 10 difficult cases of meningitis. The task used was the selection of drugs for cases of meningitis before the causative agents had been identified. The evaluation had two phases:
1. MYCIN and the 9 human experts evaluated the 10 cases.
2. Each of them prescribed medications.
Two evaluative criteria were used to see whether the prescriptions:
1. Would be effective against the actual bacteria after they were finally identified.
2. Adequately covered other possible bacteria while avoiding over-prescribing.
Result:
Criterion 1: MYCIN and 3 of the human experts consistently prescribed therapy that would have been effective for all 10 cases.
Criterion 2: MYCIN received the highest ratings: it was correct in 65% of the cases, whereas the human experts ranged from 42.5% to 62.5%.
MYCIN's strength is based on 4 factors:
1. MYCIN's knowledge base is extremely detailed because it was acquired from the best human practitioners.
2. MYCIN does not overlook anything or forget any details. It considers every possibility.
3. MYCIN never jumps to conclusions or fails to ask for key pieces of information.
4. MYCIN is maintained at a major medical center and is consequently completely current.
MYCIN represents 50 man-years of effort.
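To make the rule-based consultation described above concrete, here is a minimal, hypothetical Python sketch of how a MYCIN-style system chains IF-THEN rules from a physician's answers to a conclusion with an attached certainty factor. The rules, questions and certainty values are invented for illustration; they are not part of MYCIN's actual knowledge base.

# A minimal, hypothetical sketch of a MYCIN-style rule consultation.
# The rule contents and certainty factors below are illustrative only.

RULES = [
    # (conclusion, list of required findings, certainty factor)
    ("infection is MENINGITIS", ["culture site is CSF", "abnormal neurologic signs"], 0.7),
    ("organism is BACTERIAL",   ["infection is MENINGITIS", "patient is a burn patient"], 0.6),
]

def ask(finding, answers):
    """Ask the physician a yes/no question, as MYCIN does in its dialogue."""
    return answers.get(finding, False)

def conclude(answers):
    facts = {}
    for conclusion, premises, cf in RULES:
        # a premise holds if already concluded or confirmed by the physician
        if all(facts.get(p) or ask(p, answers) for p in premises):
            facts[conclusion] = cf          # attach the rule's certainty factor
    return facts

# Example consultation: the physician's answers drive the diagnosis.
print(conclude({"culture site is CSF": True,
                "abnormal neurologic signs": True,
                "patient is a burn patient": True}))

The real MYCIN used several hundred such rules and combined the certainty factors of all rules supporting a conclusion rather than simply attaching a single value.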
8.1 Fuzzy Logic 8.1.1 What is Fuzziness? 8.1.2 Current Application of Fuzzy Logic 8.1.3 Overview of Fuzzy Logic 8.1.4 Fuzzy Sets 8.1.5 Hedges 8.1.6 Fuzzy Set Operations 8.1.7 Fuzzy Inference 8.2 Memory Organisation 8.3 Neural Networks and Parallel Computation 8.3.1 Neural Network Architectures 8.4 Genetic Algorithm 8.5 Matching 8.5.1 Variable Matching
For example:
The degree to which a person is young is a fuzzy event rather than a random event. Suppose you have been in a desert for a week without a drink and you come upon two bottles, A and B, marked with the following information:
P(A belongs to the set of drinkable liquids) = 0.9
B's membership in the fuzzy set of drinkable liquids = 0.9
Which one would you choose?
Some unrealistic and realistic quotes:
Q: What was the weather like yesterday in San Francisco?
A1: Oh! The temperature was -5.5 degrees centigrade.
A2: Oh! It was really cold.
Experts rely on common sense to solve problems. This type of knowledge is exposed when experts describe problems with vague terms. Examples of vague terms:
- When it is really/quite hot ...
- If a person is very tall he is suitable for ...
- Only a very small person can enter that hole.
- I am quite young.
- Mr. Azizi drives his car moderately fast.
How can we represent and reason with vague terms in a computer?
One example application is a meteorological expert system in China for determining areas in which to establish rubber tree orchards.
Linguistic Variables
Fuzzy terms are called linguistic variables (or fuzzy variables).
Linguistic Variable    Typical Values
Temperature            hot, cold
Height                 short, medium, tall
Weight                 light, heavy
Speed                  slow, creeping, fast
Conclusion: add a little cold water.
The possible numerical values of a linguistic variable are called its UNIVERSE OF DISCOURSE. Example: the universe of discourse for the linguistic variable speed in rule R1 is the range [0, 100 mph]. Thus the phrase "speed is slow" occupies a section of the variable's universe of discourse - it is a fuzzy set (slow).
Example:
Consider young people (roughly age <= 10). If a person's age is 5, assign a membership value of 0.9; if 13, a value of 0.1. Here Age is the linguistic variable and young is one of its fuzzy sets. Other fuzzy sets on the same variable are old and middle-aged.
We can use the same method for other height descriptions such as short or medium. Multiple fuzzy sets defined on the same universe of discourse are referred to as fuzzy subsets, and a membership value in each set is assigned to a given object (refer to Fig. 2). An individual with height 5.5 is a medium person with membership value 1, and at the same time a member of the short and tall sets with membership value 0.25. A single object is a partial member of multiple sets.
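The idea of one object belonging partially to several fuzzy subsets can be written down directly. The Python sketch below uses the values from the worked example above (0.25 / 1 / 0.25 at height 5.5); the membership values at the other heights are assumptions, not taken from the text's Fig. 2.

# A minimal sketch of the fuzzy subsets short, medium and tall defined on the
# same universe of discourse (heights in feet).

U = [5.0, 5.5, 6.0, 6.5, 7.0]                 # universe of discourse for height

short  = {5.0: 1.00, 5.5: 0.25, 6.0: 0.00, 6.5: 0.00, 7.0: 0.00}
medium = {5.0: 0.00, 5.5: 1.00, 6.0: 0.25, 6.5: 0.00, 7.0: 0.00}
tall   = {5.0: 0.00, 5.5: 0.25, 6.0: 0.75, 6.5: 1.00, 7.0: 1.00}

h = 5.5
print(short[h], medium[h], tall[h])           # 0.25 1.0 0.25
# A single individual is a partial member of several fuzzy subsets at once.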
8.1.5 HEDGES
We have learned how to capture and represent vague linguistic terms using fuzzy sets. In normal conversation we add further vagueness by using adverbs (hedges) such as:
Concentration (Very)
Concentration reduces the membership values, with the greatest effect on elements that have smaller membership values:
μCON(A)(x) = (μA(x))^2
Given the fuzzy set of tall persons, we can create a new set of very tall persons.
Example:
Tall = {0/5, 0.25/5.5, 0.76/6, 1/6.5, 1/7}
Very tall = {___/5, ___/5.5, ___/6, ___/6.5, ___/7}
Dilation (somewhat)
Dilation increases the membership values, raising elements with small membership values more than elements with high membership values:
μDIL(A)(x) = (μA(x))^0.5
Example:
Tall = {0/5, 0.25/5.5, 0.76/6, 1/6.5, 1/7}
Somewhat tall = {___/5, ___/5.5, ___/6, ___/6.5, ___/7}
Intensification (Indeed)
Intensification intensifies the meaning of a phrase by increasing the membership values above 0.5 and decreasing those below 0.5:
μINT(A)(x) = 2(μA(x))^2            for 0 <= μA(x) <= 0.5
μINT(A)(x) = 1 - 2(1 - μA(x))^2    for 0.5 <= μA(x) <= 1
Example:
Short = {1/5, 0.8/5.5, 0.5/6, 0.2/6.5, 0/7}
Indeed short = {___/5, ___/5.5, ___/6, ___/6.5, ___/7}
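The three hedge operators are easy to apply mechanically. The short Python sketch below implements them for the discrete sets used in the examples above; the helper names con, dil and intens are mine, and the printed sets show how the formulas transform the membership values.

# A small sketch of the three hedges applied to discrete fuzzy sets.

tall  = {5.0: 0.0, 5.5: 0.25, 6.0: 0.76, 6.5: 1.0, 7.0: 1.0}
short = {5.0: 1.0, 5.5: 0.80, 6.0: 0.50, 6.5: 0.2, 7.0: 0.0}

def con(A):                                   # concentration: "very"
    return {x: m ** 2 for x, m in A.items()}

def dil(A):                                   # dilation: "somewhat"
    return {x: m ** 0.5 for x, m in A.items()}

def intens(A):                                # intensification: "indeed"
    return {x: (2 * m ** 2 if m <= 0.5 else 1 - 2 * (1 - m) ** 2)
            for x, m in A.items()}

print(con(tall))      # very tall
print(dil(tall))      # somewhat tall
print(intens(short))  # indeed short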
Union
The union of two sets comprises the elements that belong to one or both sets:
μ(A ∪ B)(x) = max(μA(x), μB(x))    for all x in X
Example:
Tall = {0/5, 0.2/5.5, 0.5/6, 0.8/6.5, 1/7}
Short = {1/5, 0.8/5.5, 0.5/6, 0.2/6.5, 0/7}
μ(tall ∪ short) attains its highest values at the limits and its lowest value in the middle.
Tall or short can mean ________
Complementation (Not)
The complement ~A is found using the following operation:
μ~A(x) = 1 - μA(x)
Example:
Short = {1/5, 0.8/5.5, 0.5/6, 0.2/6.5, 0/7}
Not short = {___/5, ___/5.5, ___/6, ___/6.5, ___/7}
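As with the hedges, the union and complement operators can be computed element by element. The Python sketch below applies them to the tall and short sets from the examples above; the function names are illustrative, not a standard library API.

# A minimal sketch of the fuzzy union and complement operations.

tall  = {5.0: 0.0, 5.5: 0.2, 6.0: 0.5, 6.5: 0.8, 7.0: 1.0}
short = {5.0: 1.0, 5.5: 0.8, 6.0: 0.5, 6.5: 0.2, 7.0: 0.0}

def union(A, B):                        # "tall OR short"
    return {x: max(A[x], B[x]) for x in A}

def complement(A):                      # "NOT short"
    return {x: 1 - A[x] for x in A}

print(union(tall, short))   # highest at the limits, lowest in the middle
print(complement(short))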
1. Fuzzification
The crisp input values are assigned to the appropriate input fuzzy variables and converted to degrees of membership.
2. Propagation (Inference)
Fuzzy rules are applied to the fuzzy variables: the degrees of membership computed in the condition part are propagated to the fuzzy variables in the conclusion part (max-min and max-product inference).
3. De-fuzzification
The resultant degrees of membership for the fuzzy variables are converted back into crisp values.
Example: Assume two cars travelling at the same speed along a straight road. The distance between the cars becomes one of the factors the second driver uses when braking to avoid a collision. The following rules might be used by the second driver:
IF the distance between the cars is very small AND the speed of the car is high
THEN brake very hard to reduce speed.
IF the distance between the cars is slightly long AND the speed of the car is not too low
THEN brake moderately hard to reduce speed.
The rules above feature:
- linguistic variables:
- fuzzy subsets:
- connectives:
- hedges:
These three steps are illustrated in the sketch that follows the list of components below.
Fuzzy logic controllers are built up from 4 main components:
- Fuzzifier
- Fuzzy rule base
- Fuzzy inference engine
- Defuzzifier
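The Python sketch below walks through fuzzification, propagation and de-fuzzification for the braking example. The membership functions, the simplified second rule and the output centroids are assumptions made for illustration; a real controller would be tuned from the full rule base.

# A hedged sketch of the three inference steps for the braking example.

def mu_small_distance(d):      # metres; fully "small" below 5 m (assumed shape)
    return max(0.0, min(1.0, (20.0 - d) / 15.0))

def mu_high_speed(v):          # km/h; fully "high" above 100 km/h (assumed shape)
    return max(0.0, min(1.0, (v - 40.0) / 60.0))

def brake_force(d, v):
    # 1. Fuzzification: crisp inputs -> degrees of membership
    small, high = mu_small_distance(d), mu_high_speed(v)

    # 2. Propagation: min for AND gives the firing strength of each rule
    fire_hard     = min(small, high)            # IF small AND high THEN brake hard
    fire_moderate = min(1 - small, 1 - high)    # simplified "otherwise" rule

    # 3. De-fuzzification: weighted average of assumed output centroids
    hard_centre, moderate_centre = 0.9, 0.4     # braking effort in [0, 1]
    total = fire_hard + fire_moderate
    if total == 0:
        return 0.0
    return (fire_hard * hard_centre + fire_moderate * moderate_centre) / total

print(brake_force(d=6.0, v=110.0))   # close and fast -> brake hard (about 0.9)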
The problem is that if a script contains too many details, it will not be matched and retrieved correctly when a new, similar situation arises. In general, it is difficult to know which script to retrieve. One reason for this is that scripts are too monolithic: it is hard to do any kind of partial matching, and it is also hard to modify a script. More recent work reduces scripts to individual scenes, which can be shared across multiple structures. Stereotypical sequences of scenes are strung together into memory organization packets (MOPs).

Usually, three distinct MOPs encode knowledge about an event sequence. One MOP represents the physical sequence of events, such as entering a dentist's office, sitting in the waiting room, reading a magazine, sitting in the dentist's chair, and so on. Another MOP represents the set of social events that take place; these are events that involve personal interactions. A third MOP revolves around the goals of the person in the particular episode. Any of these MOPs may be important for understanding new situations.

MOPs organize scenes, and they themselves are further organized into higher-level MOPs. For example, the MOP for visiting the office of a professional may contain a sequence of abstract general scenes, such as talking to an assistant, waiting, and meeting. High-level MOPs contain no actual memories, so where do they come from? New MOPs are created upon the failure of expectations. When we use scripts for story understanding, we are able to locate interesting parts of the story by noticing places where events do not conform to the script's expectations. In a MOP-based system, if an expectation is repeatedly violated, then the MOP is generalized or split. Eventually, episodic memories can fade away, leaving only a set of generalized MOPs. These MOPs look something like scripts, except that they share scenes with one another.

Let's look at an example. The first time you go to the dentist, you must determine how things work from scratch, since you have no prior experience. In doing so, you store detailed accounts of each scene and string them together into a MOP. The next time you visit the dentist, that MOP provides certain expectations, which are mostly met. You are able to deal with the situation easily and make inferences that you could not make the first time. If any expectation fails, this provides grounds for modifying the MOP. Now, suppose you later visit a doctor's office. As you begin to store episodic scenes, you notice similarities between these scenes and scenes from the dentist MOP. Such similarities provide a basis for using the dentist MOP to generate expectations. Multiple trips to the doctor will result in a doctor MOP that is slightly different from the dentist MOP. Later experiences with visiting lawyers and government officials will result in other MOPs. Ultimately, the structure shared by all of these MOPs will cause a generalized MOP to appear. Whenever you visit a professional's office in the future, you can use the generalized MOP to provide expectations.

With MOPs, memory is both a constructive and a reconstructive process. It is constructive because new experiences create new memory structures. It is reconstructive because even if the details of a particular episode are lost, the MOP provides information about what was likely to have happened. The ability to do this kind of reconstruction is an important feature of human memory. There are several MOP-based computer programs.
CYRUS is a program that contains episodes taken from the life of a particular individual. CYRUS can answer questions that require a significant amount of memory reconstruction. The IPP program accepts stories about terrorist attacks and stores them in an episodic memory. As it notices similarities in the stories, it creates general memory structures. These structures improve its ability to understand. MOPTRANS uses a MOP-based memory to understand sentences in one language and translate them into another.
After completing medical school at Yale, Warren McCulloch, along with the mathematician Walter Pitts, proposed a hypothesis to explain the fundamentals of how neural networks make the brain work. Based on experiments with neurons, McCulloch and Pitts showed that neurons might be considered devices for processing binary numbers. An important part of mathematical logic, binary numbers (represented as 1s and 0s, or true and false) were also the basis of the electronic computer. This link is the basis of computer-simulated neural networks, also known as parallel computing.

A century earlier, in 1854, the true/false nature of binary numbers was theorized by George Boole in his postulates concerning the Laws of Thought. Boole's principles make up what is known as Boolean algebra, the collection of logic concerning the AND, OR and NOT operands. For example, according to the Laws of Thought (for this example consider all apples red):
- "Apples are red" is true.
- "Apples are red AND oranges are purple" is false.
- "Apples are red OR oranges are purple" is true.
- "Apples are red AND oranges are NOT purple" is also true.
Boole also assumed that the human mind works according to these laws: it performs logical operations that can be reasoned about. Ninety years later, Claude Shannon applied Boole's principles to circuits, the blueprint for electronic computers. Boole's contribution to the future of computing and Artificial Intelligence was immeasurable, and his logic is the basis of neural networks.

McCulloch and Pitts, using Boole's principles, wrote a paper on neural network theory. The thesis dealt with how networks of connected neurons could perform logical operations. It also stated that, on the level of a single neuron, the release or failure to release an impulse is the basis by which the brain makes true/false decisions. Using the idea of feedback theory, they described the loop between the senses, the brain and the muscles, and likewise concluded that memory could be defined as the signals in a closed loop of neurons. Although we now know that logic in the brain occurs at a level higher than McCulloch and Pitts theorized, their contributions were important to AI because they showed how the firing of signals between connected neurons could cause the brain to make decisions. McCulloch and Pitts's theory is the basis of artificial neural network theory. Using this theory, they designed electronic replicas of neural networks to show how electronic networks could generate logical processes. They also stated that neural networks might, in the future, be able to learn and recognize patterns. The results of their research and two of Wiener's books served to increase enthusiasm, and laboratories of computer-simulated neurons were set up across the country.

Two major factors have inhibited the development of full-scale neural networks. First, because of the expense of constructing a machine to simulate neurons, it was costly even to build neural networks with the number of neurons in an ant. Although the cost of components has decreased, such a computer would have to grow thousands of times larger to be on the scale of the human brain. The second factor is current computer architecture: the standard von Neumann computer, the architecture of nearly all computers, lacks an adequate number of pathways between components. Researchers are now developing alternative architectures for use with neural networks.

Even with these inhibiting factors, artificial neural networks have produced some impressive results. Frank Rosenblatt, experimenting with computer-simulated networks, was able to create a machine that could mimic the human thinking process and recognize letters. But with new top-down methods becoming popular, parallel computing was put on hold. Now neural networks are making a return, and some researchers believe that with new computer architectures, parallel computing and the bottom-up theory will be a driving factor in creating artificial intelligence.
[Figure: a single neuron node with inputs x1, x2, x3, weights w1, w2, w3, weighted sum x*w, and output y.]
When the combined input x*w to a node exceeds its threshold, the node fires and produces an output y, an activation function value placed on the node's output links. This output may then be the input to other nodes, or the final output response of the network. The following figure illustrates three layers of interconnected nodes. The first layer serves as the input layer, receiving inputs from some set of stimuli. The second layer (called the hidden layer) receives input from the first layer and produces a pattern of inputs to the third layer, the output layer. The pattern of outputs from the final layer is the network's response to the input stimulus patterns. Input links to layer j (j = 1, 2, 3) have weights wij for i = 1, 2, ..., n. A general multilayer network having n nodes (the number of rows) in each of m layers (the number of columns of nodes) will have its weights represented as an n x m matrix W. Using this representation, nodes having no interconnecting links have a weight value of zero. Networks consisting of more than three layers would, of course, be correspondingly more complex than the network depicted in the following figure.
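A forward pass through such a layered network can be written compactly with one weight matrix per layer, as in the Python sketch below. The layer sizes, weight values and the sigmoid activation are assumptions for illustration, not values taken from the figure.

# A minimal sketch of a forward pass through a three-layer network,
# using a weight matrix per layer and a sigmoid activation.

import math

def activation(s):                       # sigmoid used as the activation function
    return 1.0 / (1.0 + math.exp(-s))

def layer(inputs, W):
    # W[i][j] is the weight on the link from input node i to node j of this layer
    n_out = len(W[0])
    return [activation(sum(inputs[i] * W[i][j] for i in range(len(inputs))))
            for j in range(n_out)]

x = [1.0, 0.5, -1.0]                                 # input layer (stimuli)
W_hidden = [[0.2, -0.5], [0.4, 0.1], [-0.3, 0.8]]    # 3 inputs -> 2 hidden nodes
W_output = [[0.7], [-0.6]]                           # 2 hidden -> 1 output node

hidden = layer(x, W_hidden)
y = layer(hidden, W_output)
print(y)                                 # the network's response to the input pattern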
A neural network can be thought of as a black box that transforms the input vector x to the output vector y where the transformation performed is the result of the pattern of connections and weights, that is, according to the values of the weight matrix W. Consider the vector product
x * w = Σ xi wi    (summed over i = 1, ..., n)
[Figure: a three-layer network; the input nodes x1, x2, ... connect to the nodes of layer 2 and layer 3 through weights w11, w12, w13, w21, w22, w31, w32, ....]
There is a geometric interpretation of this product. It is equivalent to projecting one vector onto the other in n-dimensional space. This notion is depicted in the following figure for the two-dimensional case. The magnitude of the resultant vector is given by
x * w = |x||w| cos θ
where |x| denotes the norm, or length, of the vector x. Note that this product is maximum when both vectors point in the same direction, that is, when θ = 0. The product is a minimum when they point in opposite directions, that is, when θ = 180 degrees. This illustrates how the vectors in the weight matrix W influence the inputs to the nodes in a neural network.
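The relationship between the vector product and the angle between the vectors can be checked numerically. In the short Python sketch below the vectors x and w are arbitrary examples chosen to point in the same direction, so θ = 0 and the two expressions agree.

# A short check of the geometric interpretation: x * w equals |x||w| cos(theta).

import math

def dot(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

def norm(v):
    return math.sqrt(dot(v, v))

x = [3.0, 4.0]
w = [6.0, 8.0]                                   # same direction as x
theta = math.acos(dot(x, w) / (norm(x) * norm(w)))
print(dot(x, w), norm(x) * norm(w) * math.cos(theta))   # both 50.0, theta = 0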
generation, while the poorer performing structures are discarded. The basic cycle is illustrated in the following figure.
[Figure: the basic genetic algorithm cycle - generate an initial population S0; the structures perform the given tasks repeatedly; performance (utility) values are assigned to the knowledge structures; a new population is generated from the best performing structures.]
The inversion operation concatenates the tail of a string to the head of the same string. Thus, if the sixth position were selected in (x1x2x3x4x5x6x7x8), the inverted string would be x7x8x1x2x3x4x5x6. A third operator, mutation, is used to ensure that all locations of the rule space are reachable, that is, that every potential rule in the rule space is available for evaluation. This ensures that the selection process does not get caught in a local minimum. For example, it may happen that using only the crossover and inversion operators produces a set of structures that are better than all their local neighbors but not optimal in a global sense. This can happen because crossover and inversion may not be able to produce some undiscovered structures. The mutation operator can overcome this by simply selecting any bit position in a string at random and changing it. This operator is typically used only infrequently, to prevent random wandering in the search space.
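The crossover, inversion and mutation operators can be sketched directly on bit strings, as in the Python example below. The crossover point, inversion point and mutation rate are arbitrary illustrative choices; the inversion call reproduces the x7x8x1...x6 example from the text.

# A small sketch of the three genetic operators applied to strings.

import random

def crossover(a, b, point):
    # exchange the tails of two parent strings at the crossover point
    return a[:point] + b[point:], b[:point] + a[point:]

def inversion(s, point):
    # concatenate the tail of the string to its head, e.g. x7x8x1...x6 for point 6
    return s[point:] + s[:point]

def mutation(s, rate=0.01):
    # flip each bit with a small probability, so every point in the rule space is reachable
    return "".join(b if random.random() > rate else ("1" if b == "0" else "0") for b in s)

parent1, parent2 = "11010110", "00111001"
print(crossover(parent1, parent2, 4))
print(inversion("12345678", 6))          # -> "78123456", matching the example above
print(mutation(parent1, rate=0.25))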
8.5 Matching
Matching is the process of comparing two or more structures to discover their likenesses or differences. The structures may represent a wide range of objects, including physical entities, words or phrases in some language, complete classes of things, general concepts, relations between complex entities, and the like. The representations will be given in one or more of the formalisms such as FOPL, networks, or some other scheme, and matching will involve comparing the component parts of such structures.

Matching is used in a variety of programs for different reasons. It may serve to control the sequence of operations, to identify or classify objects, to determine the best of a number of different alternatives, or to retrieve items from a database. It is an essential operation in such diverse programs as speech recognition, natural language understanding, vision, learning, automated reasoning, planning, automatic programming, and expert systems, as well as many others.

In its simplest form, matching is just the process of comparing two structures or patterns for equality. The match fails if the patterns differ in any aspect. For example, a match between the two character strings acdebfba and acdebeba fails on an exact match since the strings differ in the sixth character position.

In more complex cases the matching process may permit transformations in the patterns in order to achieve an equality match. The transformation may be a simple change of some variables to constants, or it may amount to ignoring some components during the match operation. For example, a pattern-matching variable such as ?x may be used to permit a successful match between the two patterns (a b (c d) e) and (a b ?x e) by binding ?x to (c d). Such matches are usually restricted in some way, however, as is the case with the unification of two clauses, where only consistent bindings are permitted. Thus, two patterns such as (a b (c d) e f) and (a b ?x e ?x) would not match, since ?x cannot be bound to two different constants.

In some extreme cases, a complete change of representational form may be required in either one or both structures before a match can be attempted. This will be the case, for example, when one visual object is represented as a vector of pixel gray levels and the objects to be matched
are represented as descriptions in predicate logic or some other high-level statements. A direct comparison is impossible unless one form has been transformed into the other.

In subsequent chapters we will see examples of many problems where exact matches are inappropriate and some form of partial matching is more meaningful. Typically in such cases one is interested in finding a best match between pairs of structures. This will be the case in object classification problems, for example, when object descriptions are subject to corruption by noise or distortion. In such cases, a measure of the degree of match may also be required.

Other types of partial matching may require finding a match between certain key elements while ignoring all other elements in the pattern. For example, a human language input unit should be flexible enough to recognize any of the following three statements as expressing a preference for the low-calorie food item:
I prefer the low-calorie choice.
I want the low-calorie item.
The low-calorie one please.
Recognition of the intended request can be achieved by matching against key words in a template containing "low-calorie" and ignoring other words except, perhaps, negative modifiers.

Finally, some problems may call for a form of fuzzy matching, where an entity's degree of membership in one or more classes is appropriate. Some classification problems apply here if the boundaries between the classes are not distinct and an object may belong to more than one class.

The following figure illustrates the general match process, in which an input description is being compared with other descriptions. As stressed earlier, the term object is used here in a general sense; it does not necessarily imply physical objects. All objects are represented in some formalism such as a vector of attribute values, propositional logic or FOPL statements, rules, frame-like structures, or some other scheme. Transformations, if required, may involve simple instantiations or unifications among clauses, or more complex operations such as transforming a two-dimensional scene into a description in some formal language. Once the descriptions have been transformed into the same schema, the matching process is performed element by element using a relational or other test (such as equality or ranking). The test results may then be combined in some way to provide an overall measure of similarity. The choice of measure will depend on the match criteria and the representation scheme employed.
[Figure: the general match process, with stages for representation, transformations, the matcher, and the match result.]
The output of the matcher is a description of the match. It may be as simple as a yes or no response or a list of variable bindings, or as complicated as a detailed annotation of the similarities and differences between the matched objects. To summarize, matching may be exact, used with or without pattern variables, partial, or fuzzy, and any matching algorithm will be based on such factors as:
- the choice of representation scheme for the objects being matched,
- the criteria for matching (exact, partial, fuzzy, and so on),
- the choice of measure required to perform the match in accordance with the chosen criteria, and
- the type of match description required for output.
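A pattern matcher with variables and consistent bindings, as described above, can be sketched in a few lines. The Python code below is a minimal illustration: the tuple representation and the "?"-prefix convention for variables are assumptions, not a standard library API.

# A minimal sketch of exact matching with pattern variables and consistent
# bindings, in the spirit of the (a b ?x e) example above.

def match(pattern, data, bindings=None):
    """Return a dict of variable bindings if pattern matches data, else None."""
    bindings = dict(bindings or {})
    if len(pattern) != len(data):
        return None
    for p, d in zip(pattern, data):
        if isinstance(p, str) and p.startswith("?"):      # a pattern variable
            if p in bindings and bindings[p] != d:         # inconsistent binding
                return None
            bindings[p] = d
        elif p != d:                                       # constants must be equal
            return None
    return bindings

print(match(("a", "b", "?x", "e"), ("a", "b", ("c", "d"), "e")))     # {'?x': ('c', 'd')}
print(match(("a", "b", "?x", "e", "?x"), ("a", "b", "c", "e", "f"))) # None: ?x cannot bind twice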
to 1. However, the difference between successive values does not necessarily have any quantitative meaning.
Binary variables: qualitative discrete variables which may assume only one of two values, such as 0 or 1, good or bad, yes or no, high or low.
Interval variables: quantitative variables which take on numeric values and for which equal differences between values have the same significance. For example, real numbers corresponding to temperature, or integers corresponding to an amount of money, are considered interval variables.