Notes
You will need to have very good results to pass.
If you fail, you have to take the course or drop it!
Specifically: AP exams (oral) some time next week (see me for a date)
Be prepared to answer elementary questions about: discrete mathematics, terms, substitution, abstract interpretation, computation, recursion, termination, elementary complexity, Standard ML, types, formal languages, Boolean expressions (possible subjects of the exam)
Warning: you should be very sure of yourself to try (genius in C++ insufficient)
Although advanced placement is possible, it will be very hard to pass the AP test. Passing an AP does not just mean that you have a passing grade, but very good grades in all the topics that we cover. This will be very hard to achieve, even if you have studied a year of Computer Science at another university (different places teach different things in the first year). You can still take the exam, but you should keep in mind that this means considerable work for the instructor.
2.2.2 Homeworks, Submission, and Cheating
Homework assignments
Goal: Reinforce and apply what is taught in class.
Homeworks: will be small individual problem/programming/proof assignments (but take time to solve)
Admin: To keep things running smoothly
Homeworks will be posted on PantaRhei
Homeworks are handed in electronically in grader (plain text, Postscript, PDF, ...)
Go to the recitations, discuss with your TA (they are there for you!)
Homework discipline:
Start early! (many assignments need more than one evening's work)
Don't start by sitting at a blank screen
Humans will be trying to understand the text/code/math when grading it.
Homework assignments are a central part of the course; they allow you to review the concepts covered in class and practice using them.
Homework Submissions, Grading, Tutorials
Submissions: We use Heinrich Stamerjohann's grader system
Submit all homework assignments electronically to https://jgrader.de
You can log in with your Jacobs account (you should have one!)
Feedback/grades to your submissions
Get an overview over how you are doing! (do not leave this until the midterm)
Tutorials: select a tutorial group and actually go to it regularly
to discuss the course topics after class (GenCS needs preparation and postparation)
to discuss your homework after submission (to see what the problem was)
to find a study group (probably the most determining factor of success)
The next topic is very important, and you should take it very seriously, even if you think that it is just a self-serving regulation made by the faculty.
All societies have their rules, written and unwritten ones, which serve as a social contract among their members, protect their interests, and optimize the functioning of the society as a whole. This is also true for the community of scientists worldwide. This society is special, since it balances intense cooperation on joint issues with fierce competition. Most of the rules are largely unwritten; you are expected to follow them anyway. The code of academic integrity at Jacobs is an attempt to put some of the aspects into writing.
It is an essential part of your academic education that you learn to behave like academics,
i.e. to function as a member of the academic community. Even if you do not want to become
a scientist in the end, you should be aware that many of the people you are dealing with have
gone through an academic education and expect that you (as a graduate of Jacobs) will behave
by these rules.
The Code of Academic Integrity
Jacobs has a Code of Academic Integrity
this is a document passed by the faculty (our "law of the university")
you have signed it last week (we take this seriously)
It mandates good behavior and penalizes bad from both faculty and students
honest academic behavior (we don't cheat)
respect and protect the intellectual property of others (no plagiarism)
treat all Jacobs members equally (no favoritism)
this is to protect you and build an atmosphere of mutual respect
academic societies thrive on reputation and respect as primary currency
The Reasonable Person Principle (one lubricant of academia)
we treat each other as reasonable persons
the other's requests and needs are reasonable until proven otherwise
To understand the rules of academic societies it is central to realize that these communities are driven by economic considerations of their members. However, in academic societies, the primary good that is produced and consumed consists in ideas and knowledge, and the primary currency involved is academic reputation³. Even though academic societies may seem altruistic (scientists share their knowledge freely, even investing time to help their peers understand the concepts more deeply), it is useful to realize that this behavior is just one half of an economic transaction. By publishing their ideas and results, scientists sell their goods for reputation. Of course, this can only work if ideas and facts are attributed to their original creators (who gain reputation by being cited). You will see that scientists can become quite fierce and downright nasty when confronted with behavior that does not respect others' intellectual property.
One special case of academic rules that affects students is the question of cheating, which we will cover next.
Cheating [adapted from CMU:15-211 (P. Lee, 2003)]
There is no need to cheat in this course!! (hard work will do)
cheating prevents you from learning (you are cutting your own flesh)
if you are in trouble, come and talk to me (I am here to help you)
We expect you to know what is useful collaboration and what is cheating
you will be required to hand in your own original code/text/math for all assignments
you may discuss your homework assignments with others, but if doing so impairs your ability to write truly original code/text/math, you will be cheating
copying from peers, books or the Internet is plagiarism unless properly attributed (even if you change most of the actual words)
more on this as the semester goes on ...
There are data mining tools that monitor the originality of text/code.
We are fully aware that the border between cheating and useful, legitimate collaboration is difficult to find and will depend on the special case. Therefore it is very difficult to put this into firm rules. We expect you to develop a firm intuition about behavior with integrity over the course of your stay at Jacobs.
2.2.3 Resources
³ Of course, this is a very simplistic attempt to explain academic societies, and there are many other factors at work there. For instance, it is possible to convert reputation into money: if you are a famous scientist, you may get a well-paying job at a good university, ...
Textbooks, Handouts and Information, Forum
No required textbook, but course notes, posted slides
Course notes in PDF will be posted at http://kwarc.info/teaching/GenCS1.html
Everything will be posted on Planet GenCS (notes + assignments + course forum)
announcements, contact information, course schedule and calendar
discussion among your fellow students (careful, I will occasionally check for academic integrity!)
http://gencs.kwarc.info (follow instructions there)
if there are problems send e-mail to [email protected]
No Textbook: Due to the special circumstances discussed above, there is no single textbook that covers the course. Instead we have a comprehensive set of course notes (this document). They are provided in two forms: as a large PDF that is posted at the course web page, and on the Planet GenCS system. The latter is actually the preferred method of interaction with the course materials, since it allows you to discuss the material in place, play with notations, give feedback, etc. The PDF file is for printing and as a fallback, in case the Planet GenCS system, which is still under development, develops problems.
Software/Hardware tools
You will need computer access for this course (come see me if you do not have a computer of your own)
we recommend the use of standard software tools
the emacs and vi text editors (powerful, flexible, available, free)
UNIX (Linux, MacOSX, cygwin) (prevalent in CS)
FireFox (just a better browser (for Math))
learn how to touch-type NOW (reap the benefits earlier, not later)
Touch-typing: You should not underestimate the amount of time you will spend typing during your studies. Even if you consider yourself fluent in two-finger typing, touch-typing will give you a factor of two in speed. This ability will save you at least half an hour per day once you master it, which can make a crucial difference in your success.
Touch-typing is very easy to learn: if you practice about an hour a day for a week, you will re-gain your two-finger speed and from then on start saving time. There are various free typing tutors on the network. At http://typingsoft.com/all_typing_tutors.htm you can find a number of programs, most for Windows, some for Linux. I would probably try Ktouch or TuxType.
Darko Pesikan recommends the TypingMaster program. You can download a demo version from http://www.typingmaster.com/index.asp?go=tutordemo
You can find more information by googling something like "learn to touch-type". (go to http://www.google.com and type these search terms)
Next we come to a special project that is going on in parallel to teaching the course. I am using the course materials as a research object as well. This gives you an additional resource, but may affect the shape of the course materials (which now serve a double purpose). Of course I can use all the help on the research project I can get.
Experiment: E-Learning with OMDoc/PantaRhei
My research area: deep representation formats for (mathematical) knowledge
Application: E-learning systems (represent knowledge to transport it)
Experiment: Start with this course (drink my own medicine)
Re-represent the slide materials in OMDoc (Open Math Documents)
Feed it into the PantaRhei system (http://trac.mathweb.org/planetary)
Try it on you all (to get feedback from you)
Tasks (Unfortunately, I cannot pay you for this; maybe later)
help me complete the material on the slides (what is missing/would help?)
I need to remember what I say, examples on the board. (take notes)
Benefits for you (so why should you help?)
you will be mentioned in the acknowledgements (for all that is worth)
you will help build better course materials (think of next year's freshmen)
2.3 Motivation and Introduction
Before we start with the course, we will have a look at what Computer Science is all about. This will guide our intuition in the rest of the course.
Consider the following situation: Jacobs University has decided to build a maze made of high hedges on the campus green for the students to enjoy. Of course not just any maze will do; we want a maze where every room is reachable (unreachable rooms would waste space), and we want a unique solution to the maze (this makes it harder to crack).
What is Computer Science about?
For instance: Software! (a hardware example would also work)
Example 2: writing a program to generate mazes.
We want every maze to be solvable. (should have a path from entrance to exit)
Also: We want mazes to be fun, i.e.,
We want maze solutions to be unique
We want every room to be reachable
How should we think about this?
There are of course various ways to build such a maze; one would be to ask the students from biology to come and plant some hedges, and have them re-plant them until the maze meets our criteria. A better way would be to make a plan first, i.e. to get a large piece of paper, and draw a maze before we plant. A third way is obvious to most students:
An Answer: "Let's hack"
However, the result would probably be the following:
[picture: 2am in the IRC Quiet Study Area]
If we just start hacking before we fully understand the problem, chances are very good that we will waste time going down blind alleys and garden paths, instead of attacking problems. So the main motto of this course is:
"no, let's think"
The GIGO Principle: Garbage In, Garbage Out (ca. 1967)
"Applets, Not Craplets"(tm) (ca. 1997)
Thinking about a problem will involve thinking about the representations we want to use (after all, we want to work on the computer), which computations these representations support, and what constitutes a solution to the problem.
This will also give us a foundation to talk about the problem with our peers and clients. Enabling students to talk about CS problems like a computer scientist is another important learning goal of this course.
We will now exemplify the process of thinking about the problem on our maze example. It shows that there is quite a lot of work involved before we write our first line of code. Of course, sometimes explorative programming also helps to understand the problem, but we would consider this as part of the thinking process.
Thinking about the problem
Idea: Randomly knock out walls until we get a good maze
Think about a grid of rooms separated by walls.
Each room can be given a name.
Mathematical Formulation:
a set of rooms: {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p}
pairs of adjacent rooms that have an open wall between them.
Example 3: For example, ⟨a, b⟩ and ⟨g, k⟩ are pairs.
Abstractly speaking, this is a mathematical structure called a graph.
Of course, the thinking process always starts with an idea of how to attack the problem. In our case, this is the idea of starting with a grid-like structure and knocking out walls, until we have a maze which meets our requirements.
Note that we have already used our first representation of the problem in the drawing above: we have drawn a picture of a maze, which is of course not the maze itself.
Definition 4: A representation is the realization of real or abstract persons, objects, circumstances, events, or emotions in concrete symbols or models. This can be by diverse methods, e.g. visual, aural, or written; as a three-dimensional model, or even by dance.
Representations will play a large role in the course; we should always be aware whether we are talking about "the real thing" or a representation of it (chances are that we are doing the latter in computer science). Even though it is important to always be able to distinguish representations from the objects they represent, we will often be sloppy in our language, and rely on the ability of the reader to distinguish the levels.
From the pictorial representation of a maze, the next step is to come up with a mathematical representation; here as sets of rooms (actually room names as representations of rooms in the maze) and room pairs.
Why math?
Q: Why is it useful to formulate the problem so that mazes are room sets/pairs?
A: Data structures are typically defined as mathematical structures.
A: Mathematics can be used to reason about the correctness and efficiency of data structures and algorithms.
A: Mathematical structures make it easier to think, to abstract away from unnecessary details, and to avoid "hacking".
The advantage of a mathematical representation is that it models the aspects of reality we are interested in, in isolation. Mathematical models/representations are very abstract, i.e. they have very few properties: in the first representational step we took, we abstracted from the fact that we want to build a maze made of hedges on the campus green. We disregarded properties like maze size, which kind of bushes to take, and the fact that we need to water the hedges after we have planted them. In the abstraction step from the drawing to the set/pairs representation, we abstracted from further (accidental) properties, e.g. that we have represented a square maze, or that the walls are blue.
As mathematical models have very few properties (this is deliberate, so that we can understand all of them), we can use them as models for many concrete, real-world situations.
Intuitively, there are few objects that have few properties, so we can study them in detail. In our case, the structures we are talking about are well-known mathematical objects, called graphs.
We will study graphs in more detail in this course, and cover them at an informal, intuitive level here to make our points.
Mazes as Graphs
Definition 5: Informally, a graph consists of a set of nodes and a set of edges. (a good part of CS is about graph algorithms)
Definition 6: A maze is a graph with two special nodes.
Interpretation: Each graph node represents a room, and an edge from node x to node y indicates that rooms x and y are adjacent and there is no wall in between them. The first special node is the entry, and the second one the exit of the maze.
Our example maze can be represented as
⟨{⟨a, e⟩, ⟨e, i⟩, ⟨i, j⟩, ⟨f, j⟩, ⟨f, g⟩, ⟨g, h⟩, ⟨d, h⟩, ⟨g, k⟩, ⟨a, b⟩, ⟨m, n⟩, ⟨n, o⟩, ⟨b, c⟩, ⟨k, o⟩, ⟨o, p⟩, ⟨l, p⟩}, a, p⟩
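To make this concrete, this is how one might write the maze down in Standard ML, the programming language we will use later in the course (a sketch of ours, not from the slides: rooms are encoded as strings, and the maze as a triple of an edge list, the entry, and the exit):

  val maze =
    ([("a","e"), ("e","i"), ("i","j"), ("f","j"), ("f","g"), ("g","h"),
      ("d","h"), ("g","k"), ("a","b"), ("m","n"), ("n","o"), ("b","c"),
      ("k","o"), ("o","p"), ("l","p")],
     "a", "p")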
Mazes as Graphs (Visualizing Graphs via Diagrams)
Graphs are very abstract objects; we need a good, intuitive way of thinking about them. We use diagrams, where the nodes are visualized as dots and the edges as lines between them.
Our maze can be visualized as such a diagram. (picture omitted)
Note that the diagram is a visualization (a representation intended for humans to process visually) of the graph, and not the graph itself.
Now that we have a mathematical model for mazes, we can look at the subclass of graphs that correspond to the mazes that we are after: unique solutions and all rooms are reachable! We will concentrate on the first requirement now and leave the second one for later.
Unique solutions
Q: What property must the graph have for the maze to have a solution?
A: A path from a to p.
Q: What property must it have for the maze to have a unique solution?
A: The graph must be a tree.
Trees are special graphs, which we will now define.
Mazes as trees
Definition 7: Informally, a tree is a graph
with a unique root node, and
each node having a unique parent.
Definition 8: A spanning tree is a tree that includes all of the nodes.
Q: Why is it good to have a spanning tree?
A: Trees have no cycles! (needed for uniqueness)
A: Every room is reachable from the root!
So, now that we know what we are looking for, we can think about a program that would find spanning trees given a set of nodes in a graph. But since we are still in the process of thinking about the problem, we do not want to commit to a concrete program, but think about programs in the abstract (this gives us license to abstract away from many concrete details of the program and concentrate on the essentials).
The computer science notion for a program in the abstract is that of an algorithm, which we will now define.
Algorithm
Now that we have a data structure in mind, we can think about the algorithm.
Definition 9: An algorithm is a series of instructions to control a (computation) process.
Example 10 (Kruskal's algorithm, a graph algorithm for spanning trees):
Randomly add a pair to the tree if it won't create a cycle. (i.e. tear down a wall)
Repeat until a spanning tree has been created.
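To illustrate the idea (not as part of the notes), here is a minimal sketch in Standard ML; the names kruskal and reachable are our own inventions, the candidate walls are assumed to arrive in random order, and the cycle test is done naively by checking reachability in the tree built so far:

  fun reachable (edges, x, y) =
    let
      fun neighbours v =
        List.mapPartial
          (fn (a, b) => if a = v then SOME b
                        else if b = v then SOME a
                        else NONE) edges
      fun search (visited, []) = false
        | search (visited, v :: rest) =
            v = y orelse
            (if List.exists (fn w => w = v) visited
             then search (visited, rest)
             else search (v :: visited, neighbours v @ rest))
    in
      search ([], [x])
    end

  (* add each candidate wall opening unless it would create a cycle *)
  fun kruskal ([], tree) = tree
    | kruskal ((x, y) :: cands, tree) =
        if reachable (tree, x, y)
        then kruskal (cands, tree)           (* would create a cycle: skip *)
        else kruskal (cands, (x, y) :: tree) (* tear down the wall *)

The expensive part is clearly the reachability test; the Union-Find algorithm discussed below improves exactly on this.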
Definition 11: An algorithm is a collection of formalized rules that can be understood and executed, and that lead to a particular endpoint or result.
Example 12: An example for an algorithm is a recipe for a cake; another one is a rosary, a kind of chain of beads used by many cultures to remember the sequence of prayers. Both the recipe and the rosary represent instructions that specify what has to be done step by step. The instructions in a recipe are usually given in natural language text and are based on elementary forms of manipulations like "scramble an egg" or "heat the oven to 250 degrees Celsius". In a rosary, the instructions are represented by beads of different forms, which represent different prayers. The physical (circular) form of the chain allows it to represent a possibly infinite sequence of prayers.
The name algorithm is derived from the word al-Khwarizmi, the last name of a famous Persian mathematician. Abu Ja'far Mohammed ibn Musa al-Khwarizmi was born around 780 and died around 845. One of his most influential books is "Kitab al-jabr wa'l-muqabala" or "Rules of Restoration and Reduction". It introduced algebra, with the very word being derived from a part of the original title, namely "al-jabr". His works were translated into Latin in the 12th century, introducing this new science also in the West.
The algorithm in our example sounds rather simple and easy to understand, but the high-level formulation hides the problems, so let us look at the instructions in more detail. The crucial one is the task to check whether we would be creating cycles.
Of course, we could just add the edge and then check whether the graph is still a tree, but this would be very expensive, since the tree could be very large. A better way is to maintain some information during the execution of the algorithm that we can exploit to predict cyclicity before altering the graph.
Creating a spanning tree
When adding a wall to the tree, how do we detect that it won't create a cycle?
When adding wall ⟨x, y⟩, we want to know if there is already a path from x to y in the tree.
In fact, there is a fast algorithm for doing exactly this, called Union-Find.
Definition 13 (Union-Find Algorithm): The Union-Find algorithm successively puts nodes into an equivalence class if there is a path connecting them. Before adding an edge ⟨x, y⟩ to the tree, it makes sure that x and y are not in the same equivalence class.
Example 14: a partially constructed maze (picture omitted)
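A minimal sketch of Union-Find in Standard ML might look as follows (our own illustration with assumed names unionFind, find, and union; rooms are assumed to be numbered 0 to n-1, and real implementations would add path compression and union by rank):

  fun unionFind n =
    let
      (* initially every node is its own equivalence class *)
      val parent = Array.tabulate (n, fn i => i)
      fun find i =
        let val p = Array.sub (parent, i)
        in if p = i then i else find p end
      (* union returns false iff i and j were already in the same class,
         i.e. iff adding the edge <i, j> would close a cycle *)
      fun union (i, j) =
        let val (ri, rj) = (find i, find j)
        in if ri = rj then false
           else (Array.update (parent, ri, rj); true)
        end
    in
      (find, union)
    end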
Now that we have made some design decisions for solving our maze problem, it is an important part of "thinking about the problem" to determine whether these are good choices. We have argued above that we should use the Union-Find algorithm rather than a simple generate-and-test approach based on its expense, which we interpret temporally for the moment. So we ask ourselves:
How fast is our Algorithm?
Is this a fast way to generate mazes?
How much time will it take to generate a maze?
What do we mean by "fast" anyway?
In addition to finding the right algorithms, Computer Science is about analyzing the performance of algorithms.
In order to get a feeling for what we mean by a fast algorithm, we do some preliminary computations.
Performance and Scaling
Suppose we have three algorithms to choose from. (which one to select?)
Systematic analysis reveals performance characteristics.
For a problem of size n (i.e., detecting cycles out of n nodes) we have

  n         | 100n µs | 7n² µs  | 2ⁿ µs
  1         | 100 µs  | 7 µs    | 2 µs
  5         | .5 ms   | 175 µs  | 32 µs
  10        | 1 ms    | .7 ms   | 1 ms
  45        | 4.5 ms  | 14 ms   | 1.1 years
  100       | ...     | ...     | ...
  1 000     | ...     | ...     | ...
  10 000    | ...     | ...     | ...
  1 000 000 | ...     | ...     | ...
What?! One year?
2¹⁰ = 1 024 (so 2¹⁰ µs is about 1 ms)
2⁴⁵ = 35 184 372 088 832 (3.5 × 10¹³ µs = 3.5 × 10⁷ s, about 1.1 years)
we denote all times that are longer than the age of the universe with −

  n         | 100n µs | 7n² µs | 2ⁿ µs
  1         | 100 µs  | 7 µs   | 2 µs
  5         | .5 ms   | 175 µs | 32 µs
  10        | 1 ms    | .7 ms  | 1 ms
  45        | 4.5 ms  | 14 ms  | 1.1 years
  100       | 100 ms  | 7 s    | 10¹⁶ years
  1 000     | 1 s     | 12 min | −
  10 000    | 10 s    | 20 h   | −
  1 000 000 | 1.6 min | 2.5 mo | −
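The 1.1 years entry is easy to verify; a two-line sanity check in Standard ML (our own snippet):

  val steps = Math.pow (2.0, 45.0)                            (* ~3.5e13 microseconds *)
  val years = steps * 1.0e~6 / (60.0 * 60.0 * 24.0 * 365.0)   (* ~1.1 years *)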
So it does make a difference for larger problems which algorithm we choose. Considerations like the one we have shown above are very important when judging an algorithm. These evaluations go by the name of complexity theory.
We will now briefly preview other concerns that are important to computer science. These are essential when developing larger software packages. We will not be able to cover them in this course, but leave them to the second-year courses, in particular software engineering.
Modular design
By thinking about the problem, we have strong hints about the structure of our program:
Grids, Graphs (with edges and nodes), Spanning trees, Union-Find.
With disciplined programming, we can write our program to reflect this structure.
Modular designs are usually easier to get right and easier to understand.
Is it correct?
How will we know if we implemented our solution correctly?
What do we mean by "correct"?
Will it generate the right answers?
Will it terminate?
Computer Science is about techniques for proving the correctness of programs.
Let us summarize!
The science in CS: not hacking, but
thinking about problems abstractly,
selecting good structures and obtaining correct and fast algorithms/machines,
implementing programs/machines that are understandable and correct.
In particular, the course "General Computer Science" is not a programming course; it is about being able to think about computational problems and learning to talk to others about these problems.
3 Elementary Discrete Math
3.1 Mathematical Foundations: Natural Numbers
We have seen in the last section that we will use mathematical models for objects and data structures throughout Computer Science. As a consequence, we will need to learn some math before we can proceed. But we will study mathematics for another reason: it gives us the opportunity to study rigorous reasoning about abstract objects, which is needed to understand the "science" part of Computer Science.
Note that the mathematics we will be studying in this course is probably different from the mathematics you already know; calculus and linear algebra are relatively useless for modeling computations. We will learn a branch of math called "discrete mathematics"; it forms the foundation of computer science, and we will introduce it with an eye towards computation.
Let's start with the math!
Discrete Math for the moment
Kenneth H. Rosen: Discrete Mathematics and Its Applications, McGraw-Hill, 1990 [Ros90].
Harry R. Lewis and Christos H. Papadimitriou: Elements of the Theory of Computation, Prentice Hall, 1998 [LP98].
Paul R. Halmos: Naive Set Theory, Springer Verlag, 1974 [Hal74].
The roots of computer science are old, much older than one might expect. The very concept of computation is deeply linked with what makes mankind special. We are the only animal that manipulates abstract concepts and has come up with universal ways to form complex theories and to apply them to our environments. As humans are social animals, we do not only form these theories in our own minds, but we also found ways to communicate them to our fellow humans.
The most fundamental abstract theory that mankind shares is the use of numbers. This theory of numbers is detached from the real world in the sense that we can apply the use of numbers to arbitrary objects, even unknown ones. Suppose you are stranded on a lonely island where you see a strange kind of fruit for the first time. Nevertheless, you can immediately count these fruits. Also, nothing prevents you from doing arithmetic with some fantasy objects in your mind. The question in the following sections will be: what are the principles that allow us to form and apply numbers in these general ways? To answer this question, we will try to find general ways to specify and manipulate arbitrary objects. Roughly speaking, this is what computation is all about.
Something very basic:
Numbers are symbolic representations of numeric quantities.
There are many ways to represent numbers. (more on this later)
Let's take the simplest one. (about 8,000 to 10,000 years old)
We count by making marks on some surface.
For instance //// stands for the number four. (be it in 4 apples, or 4 worms)
Let us look at the way we construct numbers a little more algorithmically: these representations are those that can be created by the following two rules.
o-rule: consider an empty space.
s-rule: given a row of marks or an empty space, make another / mark at the right end of the row.
Example 15: For ////, apply the o-rule once and then the s-rule four times.
Definition 16: We call these representations unary natural numbers.
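The two rules translate directly into a datatype in Standard ML (the programming language of this course); this is a sketch of ours anticipating the abstract data types introduced later, with the identifier o' chosen because SML already reserves o for function composition:

  datatype unat = o' | s of unat   (* o-rule: empty space; s-rule: one more / *)

  val four = s (s (s (s o')))      (* the unary natural number //// *)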
In addition to manipulating normal objects directly linked to their daily survival, humans also invented the manipulation of place-holders or symbols. A symbol represents an object or a set of objects in an abstract way. The earliest examples for symbols are the cave paintings showing iconic silhouettes of animals like the famous ones of Cro-Magnon. The invention of symbols is not only an artistic, pleasurable "waste of time" for mankind, but it had tremendous consequences. There is archaeological evidence that in ancient times, namely at least some 8000 to 10000 years ago, men started to use tally bones for counting. This means that the symbol "bone" was used to represent numbers. The important aspect is that this bone is a symbol that is completely detached from its original down-to-earth meaning, most likely of being a tool or a waste product from a meal. Instead it stands for a universal concept that can be applied to arbitrary objects.
Instead of using bones, the slash / is a more convenient symbol, but it is manipulated in the same way as in the most ancient times of mankind. The o-rule allows us to start with a blank slate or an empty container like a bowl. The s- or successor-rule allows us to put an additional bone into a bowl with bones, respectively to append a slash to a sequence of slashes. For instance //// stands for the number four, be it in 4 apples, or 4 worms. This representation is constructed by applying the o-rule once and then the s-rule four times.
Σ_{i=o}^{o} nᵢ = o and Σ_{i=o}^{s(m)} nᵢ = n_{s(m)} ⊕ Σ_{i=o}^{m} nᵢ.
Definition 41: The multiplication operation ⊙ can be defined by the equations n ⊙ o = o and n ⊙ s(m) = n ⊕ (n ⊙ m).
Definition 42: The product operation can be defined by the equations Π_{i=o}^{o} nᵢ = s(o) and Π_{i=o}^{s(m)} nᵢ = n_{s(m)} ⊙ Π_{i=o}^{m} nᵢ.
Definition 43: The exponentiation operation can be defined by the equations exp(n, o) = s(o) and exp(n, s(m)) = n ⊙ exp(n, m).
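Transcribed into Standard ML over the unat type sketched above (our own code; the function add stands in for the addition operation ⊕, whose definition falls on the omitted pages):

  fun add (n, o') = n                     (* n + 0 = n *)
    | add (n, s m) = s (add (n, m))       (* n + s(m) = s(n + m) *)

  fun mul (n, o') = o'                    (* Definition 41: n * 0 = 0 *)
    | mul (n, s m) = add (n, mul (n, m))  (* n * s(m) = n + (n * m) *)

  fun exp (n, o') = s o'                  (* Definition 43: n ^ 0 = 1 *)
    | exp (n, s m) = mul (n, exp (n, m))  (* n ^ s(m) = n * n^m *)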
3.2 Talking (and writing) about Mathematics
Before we go on, we need to learn how to talk and write about mathematics in a succinct way. This will make our task of understanding much easier.
Talking about Mathematics (MathTalk)
Definition 44: Mathematicians use a stylized language that
uses formulae to represent mathematical objects,
uses math idioms for special situations (e.g. iff, hence, let ... be ..., then ...),
classifies statements by role (e.g. Definition, Lemma, Theorem, Proof, Example).
We call this language mathematical vernacular.
Definition 45: Abbreviations for mathematical statements:
∧ and ∨ are common notations for "and" and "or"
"not" is in mathematical statements often denoted with ¬
∀x.P (∀x ∈ S.P) stands for "condition P holds for all x (in S)"
∃x.P (∃x ∈ S.P) stands for "there exists an x (in S) such that proposition P holds"
∄x.P (∄x ∈ S.P) stands for "there exists no x (in S) such that proposition P holds"
∃¹x.P (∃¹x ∈ S.P) stands for "there exists one and only one x (in S) such that proposition P holds"
"iff" as abbreviation for "if and only if", symbolized by ⇔
the symbol ⇒ is used as a shortcut for "implies"
Observation: With these abbreviations we can use formulae for statements.
Example 46: ∀x.∃y.x = y ⇔ ¬(x ≠ y) reads
"For all x, there is a y, such that x = y, iff (if and only if) it is not the case that x ≠ y."
EdNote: think about how to reactivate this example.
We will use mathematical vernacular throughout the remainder of the notes. The abbreviations will mostly be used in informal communication situations. Many mathematicians consider it bad style to use abbreviations in printed text, but approve of them as parts of formulae (see e.g. Definition 3.3 for an example).
To keep mathematical formulae readable (they are bad enough as it is), we like to express mathematical objects in single letters. Moreover, we want to choose these letters to be easy to remember, e.g. by choosing them to remind us of the name of the object or to reflect the kind of object (is it a number or a set, ...). Thus the 50 (upper/lowercase) letters supplied by most alphabets are not sufficient for expressing mathematics conveniently. Thus mathematicians use at least two more alphabets.
The Greek, Curly, and Fraktur Alphabets (Homework)
Homework: learn to read, recognize, and write the Greek letters
A α alpha      B β beta       Γ γ gamma
Δ δ delta      E ε epsilon    Z ζ zeta
H η eta        Θ θ theta      I ι iota
K κ kappa      Λ λ lambda     M μ mu
N ν nu         Ξ ξ xi         O o omicron
Π π pi         P ρ rho        Σ σ sigma
T τ tau        Υ υ upsilon    Φ φ phi
X χ chi        Ψ ψ psi        Ω ω omega
we will need them, when the other alphabets give out.
BTW, we will also use the curly Roman and Fraktur alphabets:
𝒜, ℬ, 𝒞, 𝒟, ℰ, ℱ, 𝒢, ℋ, ℐ, 𝒥, 𝒦, ℒ, ℳ, 𝒩, 𝒪, 𝒫, 𝒬, ℛ, 𝒮, 𝒯, 𝒰, 𝒱, 𝒲, 𝒳, 𝒴, 𝒵
𝔄, 𝔅, ℭ, 𝔇, 𝔈, 𝔉, 𝔊, ℌ, ℑ, 𝔍, 𝔎, 𝔏, 𝔐, 𝔑, 𝔒, 𝔓, 𝔔, ℜ, 𝔖, 𝔗, 𝔘, 𝔙, 𝔚, 𝔛, 𝔜, ℨ
On our way to understanding functions
We need to understand sets first.
3.3 Naive Set Theory
We now come to a very important and foundational aspect of Mathematics: sets. Their importance comes from the fact that all (known) mathematics can be reduced to understanding sets. So it is important to understand them thoroughly before we move on.
But understanding sets is not as trivial as it may seem at first glance. So we will just represent sets by various descriptions. This is called "naive set theory", and indeed we will see that it gets us into trouble when we try to talk about very large sets.
Understanding Sets
Sets are one of the foundations of mathematics, and one of the most difficult concepts to get right axiomatically.
Definition 47: A set is "everything that can form a unity in the face of God". (Georg Cantor, 1845-1918)
For this course: no definition; just intuition. (naive set theory)
To understand a set S, we need to determine what is an element of S and what isn't.
Notations for sets (so we can write them down):
listing the elements within curly brackets: e.g. {a, b, c}
describing the elements by a property: {x | x has property P}
stating element-hood (a ∈ S) or not (b ∉ S).
Warning: Learn to distinguish between objects and their representations! ({a, b, c} and {b, a, a, c} are different representations of the same set)
Now that we can represent sets, we want to compare them. We can simply define relations between sets using the three set description operations introduced above.
Relations between Sets
set equality: A ≡ B :⇔ ∀x. x ∈ A ⇔ x ∈ B
subset: A ⊆ B :⇔ ∀x. x ∈ A ⇒ x ∈ B
proper subset: A ⊂ B :⇔ (∀x. x ∈ A ⇒ x ∈ B) ∧ (A ≠ B)
superset: A ⊇ B :⇔ ∀x. x ∈ B ⇒ x ∈ A
proper superset: A ⊃ B :⇔ (∀x. x ∈ B ⇒ x ∈ A) ∧ (A ≠ B)
We want to have some operations on sets that let us construct new sets from existing ones. Again, we can define them.
Operations on Sets
union: A ∪ B := {x | x ∈ A ∨ x ∈ B}
union over a collection: Let I be a set and Sᵢ a family of sets indexed by I, then ⋃_{i∈I} Sᵢ := {x | ∃i ∈ I. x ∈ Sᵢ}. We write ⋃_{i=1}^{n} Sᵢ for ⋃_{i∈[1,n]} Sᵢ.
intersection: A ∩ B := {x | x ∈ A ∧ x ∈ B}
intersection over a collection: Let I be a set and Sᵢ a family of sets indexed by I, then ⋂_{i∈I} Sᵢ := {x | ∀i ∈ I. x ∈ Sᵢ}. We write ⋂_{i=1}^{n} Sᵢ for ⋂_{i∈[1,n]} Sᵢ.
set difference: A\B := {x | x ∈ A ∧ x ∉ B}
the power set: 𝒫(A) := {S | S ⊆ A}
the empty set: ∀x. x ∉ ∅
Cartesian product: A × B := {⟨a, b⟩ | a ∈ A ∧ b ∈ B}; we call ⟨a, b⟩ a pair.
n-fold Cartesian product: A₁ × ... × Aₙ := {⟨a₁, ..., aₙ⟩ | ∀i. aᵢ ∈ Aᵢ}; we call ⟨a₁, ..., aₙ⟩ an n-tuple.
n-dim Cartesian space: Aⁿ := {⟨a₁, ..., aₙ⟩ | aᵢ ∈ A}; we call ⟨a₁, ..., aₙ⟩ a vector.
EdNote: need to define the big operators for sets.
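For finite sets represented as duplicate-free lists, these operations have direct Standard ML counterparts. The following sketch is ours, not from the notes, and the names member, union, inter, diff, and cartesian are assumptions:

  fun member (x, ys) = List.exists (fn y => y = x) ys
  fun union (xs, ys) = xs @ List.filter (fn y => not (member (y, xs))) ys
  fun inter (xs, ys) = List.filter (fn x => member (x, ys)) xs
  fun diff (xs, ys) = List.filter (fn x => not (member (x, ys))) xs
  fun cartesian (xs, ys) = List.concat (map (fn x => map (fn y => (x, y)) ys) xs)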
These operator definitions give us a chance to reflect on how we do definitions in mathematics.
3.3.1 Definitions in Mathtalk
Mathematics uses a very effective technique for dealing with conceptual complexity. It usually starts out by discussing simple, basic objects and their properties. These simple objects can be combined into more complex, compound ones. Then it uses a definition to give a compound object a new name, so that it can be used like a basic one. In particular, the newly defined object can be used to form compound objects, leading to more and more complex objects that can be described succinctly. In this way mathematics incrementally extends its vocabulary by adding layers and layers of definitions onto very simple and basic beginnings. We will now discuss four definition schemata that will occur over and over in this course.
Definition 48: The simplest form of definition schema is the simple definition. This just introduces a name (the definiendum) for a compound object (the definiens). Note that the name must be new, i.e. it may not have been used for anything else; in particular, the definiendum may not occur in the definiens. We use the symbols := (and the inverse =:) to denote simple definitions in formulae.
Example 49: We can give the unary natural number //// the name 4. In a formula we write this as 4 := //// or //// =: 4.
Definition 50: A somewhat more refined form of definition is used for operators on and relations between objects. In this form, the definiendum is the operator or relation applied to n distinct variables v₁, ..., vₙ as arguments, and the definiens is an expression in these variables. When the new operator is applied to arguments a₁, ..., aₙ, then its value is the definiens expression where the vᵢ are replaced by the aᵢ. We use the symbol := for operator definitions and :⇔ for pattern definitions.
EdNote: maybe better to mark up pattern definitions as binding expressions, where the formal variables are bound.
Example 51: The following is a pattern definition for the set intersection operator ∩:
A ∩ B := {x | x ∈ A ∧ x ∈ B}
The pattern variables are A and B, and with this definition we have e.g. ∅ ∩ ∅ = {x | x ∈ ∅ ∧ x ∈ ∅}.
Definition 52: We now come to a very powerful definition schema. An implicit definition (also called definition by description) is a formula A, such that we can prove ∃¹n.A, where n is a new name.
Example 53: ∀x. x ∉ ∅ is an implicit definition for the empty set ∅. Indeed we can prove unique existence of ∅ by just exhibiting ∅ and showing that for any other set S with ∀x. x ∉ S we have S ≡ ∅. Indeed S cannot have elements, so it has the same elements as ∅, and thus S ≡ ∅.
Sizes of Sets
We would like to talk about the size of a set. Let us try a definition:
Definition 54: The size #(A) of a set A is the number of elements in A.
Intuitively we should have the following identities:
#({a, b, c}) = 3
#(N) = ∞ (infinity)
#(A ∪ B) ≤ #(A) + #(B) (with equality iff A ∩ B = ∅)
#(A ∩ B) ≤ min(#(A), #(B))
#(A × B) = #(A) · #(B)
But how do we prove any of them? (what does "number of elements" mean anyway?)
Idea: We need a notion of "counting", associating every member of a set with a unary natural number.
Problem: How do we associate elements of sets with each other? (wait for bijective functions)
But before we delve into the notions of relations and functions that we need to associate set members and counting, let us now look at very large sets, and see where this gets us.
Sets can be Mind-boggling
sets seem so simple, but are really quite powerful (no restriction on the elements)
There are very large sets, e.g. "the set 𝒮 of all sets"
contains the ∅, and for each object O we have {O}, {{O}}, {O, {O}}, ... ∈ 𝒮
contains all unions, intersections, power sets, ...
contains itself: 𝒮 ∈ 𝒮 (scary!)
Let's make 𝒮 less scary
A less scary 𝒮?
Idea: how about the set 𝒮' of all sets that do not contain themselves?
Question: is 𝒮' ∈ 𝒮'? (were we successful?)
suppose it is, then we must have 𝒮' ∉ 𝒮', since we have explicitly taken out the sets that contain themselves
suppose it is not, then we have 𝒮' ∈ 𝒮', since all other sets are elements.
In either case, we have 𝒮' ∈ 𝒮' iff 𝒮' ∉ 𝒮', which is a contradiction! (Russell's Antinomy [Bertrand Russell '03])
Does MathTalk help?: no: 𝒮' := {m | m ∉ m}
MathTalk allows statements that lead to contradictions, but are legal wrt. vocabulary and grammar.
We have to be more careful when constructing sets! (axiomatic set theory)
for now: stay away from large sets. (stay naive)
Even though we have seen that naive set theory is inconsistent, we will use it for this course. But we will take care to stay away from the kind of large sets that we needed to construct the paradox.
3.4 Relations and Functions
Now we will take a closer look at two very fundamental notions in mathematics: functions and relations. Intuitively, functions are mathematical objects that take arguments (as input) and return a result (as output), whereas relations are objects that take arguments and state whether they are related.
We have already encountered functions and relations as set operations, e.g. the elementhood relation ∈, which relates a set to its elements, or the powerset function that takes a set and produces another (its powerset).
Relations
Definition 55: R ⊆ A × B is a (binary) relation between A and B.
Definition 56: If A = B then R is called a relation on A.
Definition 57: R ⊆ A × B is called total iff ∀x ∈ A. ∃y ∈ B. ⟨x, y⟩ ∈ R.
Definition 58: R⁻¹ := {⟨y, x⟩ | ⟨x, y⟩ ∈ R} is the converse relation of R.
Note: R⁻¹ ⊆ B × A.
The composition of R ⊆ A × B and S ⊆ B × C is defined as S ∘ R := {⟨a, c⟩ ∈ A × C | ∃b ∈ B. ⟨a, b⟩ ∈ R ∧ ⟨b, c⟩ ∈ S}
Example 59: relations ⊆, =, has_color
Note: we do not really need ternary, quaternary, ... relations
Idea: Consider A × B × C as A × (B × C) and ⟨a, b, c⟩ as ⟨a, ⟨b, c⟩⟩
we can (and often will) see ⟨a, b, c⟩ and ⟨a, ⟨b, c⟩⟩ as different representations of the same object.
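Finite relations can be represented as lists of pairs; here is a small Standard ML sketch of ours for the converse and composition operations (the names converse and compose are assumptions):

  fun converse r = map (fn (x, y) => (y, x)) r

  (* S o R = { <a,c> | exists b. <a,b> in R and <b,c> in S } *)
  fun compose (s, r) =
    List.concat (map (fn (a, b) =>
      List.mapPartial (fn (b', c) => if b = b' then SOME (a, c) else NONE) s) r)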
We will need certain classes of relations in the following, so we introduce the necessary abstract properties of relations.
Properties of binary Relations
Definition 60: A relation R ⊆ A × A is called
reflexive on A, iff ∀a ∈ A. ⟨a, a⟩ ∈ R
symmetric on A, iff ∀a, b ∈ A. ⟨a, b⟩ ∈ R ⇒ ⟨b, a⟩ ∈ R
antisymmetric on A, iff ∀a, b ∈ A. (⟨a, b⟩ ∈ R ∧ ⟨b, a⟩ ∈ R) ⇒ a = b
transitive on A, iff ∀a, b, c ∈ A. (⟨a, b⟩ ∈ R ∧ ⟨b, c⟩ ∈ R) ⇒ ⟨a, c⟩ ∈ R
an equivalence relation on A, iff R is reflexive, symmetric, and transitive
a partial order on A, iff R is reflexive, antisymmetric, and transitive on A
a linear order on A, iff R is transitive and for all x, y ∈ A with x ≠ y either ⟨x, y⟩ ∈ R or ⟨y, x⟩ ∈ R
Example 61: The equality relation is an equivalence relation on any set.
Example 62: The ≤ relation is a linear order on N. (all elements are comparable)
Example 63: On sets of persons, the mother-of relation is a non-symmetric, non-reflexive relation.
Example 64: On sets of persons, the ancestor-of relation is a partial order that is not linear.
We denote the set of all ground constructor terms of sort A with 𝒯_g^A(𝒜) and use 𝒯_g(𝒜) := ⋃_{A∈𝒮} 𝒯_g^A(𝒜).
Definition 130: If t = c(t') then we say that the symbol c is the head of t (write head(t)). If t = a, then head(t) = a; head(⟨t₁, t₂⟩) is undefined.
Notation 131: We will write c(a, b) instead of c(⟨a, b⟩). (cf. binary function)
The main purpose of ground constructor terms will be to represent data. In the data type from Example 126 the ground constructor term s(s(o)) can be used to represent the unary natural number 2. Similarly, in the abstract data type from Example 127, the term cons(s(s(o)), cons(s(o), nil)) represents the list [2, 1].
Note that to be a good data representation format for a set S of objects, ground constructor terms need to
cover S, i.e. for every object s ∈ S there should be a ground constructor term that represents s,
be unambiguous, i.e. we can decide equality by just looking at them, i.e. objects s ∈ S and t ∈ S are equal iff their representations are.
But this is just what our Peano Axioms are for, so abstract data types come with specialized Peano axioms, which we can paraphrase as
Peano Axioms for Abstract Data Types
Idea: Sorts represent sets!
Axiom 132: if t is a constructor term of sort T, then t ∈ T
Axiom 133: equality on constructor terms is trivial
Axiom 134: only constructor terms of sort T are in T (induction axioms)
Example 135 (An Abstract Data Type of Truth Values): We want to build an abstract data type for the set {T, F} of truth values and various operations on it: we have looked at the abbreviations ∧, ∨, ¬, ⇒ for "and", "or", "not", and "implies". These can be interpreted as functions on truth values: e.g. ¬(T) = F, ... We choose the abstract data type ⟨{B}, {[T: B], [F: B]}⟩, and have the abstract procedures
⟨∧::B × B → B ; {∧(T, T) ⇝ T, ∧(T, F) ⇝ F, ∧(F, T) ⇝ F, ∧(F, F) ⇝ F}⟩,
⟨∨::B × B → B ; {∨(T, T) ⇝ T, ∨(T, F) ⇝ T, ∨(F, T) ⇝ T, ∨(F, F) ⇝ F}⟩,
⟨¬::B → B ; {¬(T) ⇝ F, ¬(F) ⇝ T}⟩,
⟨⇒::B × B → B ; {⇒(x, y) ⇝ ∨(¬(x), y)}⟩.
Note that "A implies B" holds iff A is false or B is true.
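Rendered in Standard ML, this abstract data type might look as follows (a sketch of ours; SML's built-in bool with andalso/orelse provides the same functionality, but we spell out the data type to match the example):

  datatype B = T | F

  fun andB (T, T) = T | andB _ = F
  fun orB  (F, F) = F | orB _ = T
  fun notB T = F | notB F = T
  fun impliesB (x, y) = orB (notB x, y)   (* x => y iff (not x) or y *)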
Subterms
Idea: Well-formed parts of constructor terms are constructor terms again. (maybe of a different sort)
Definition 136: Let 𝒜 be an abstract data type and s and t be terms over 𝒜, then we say that s is an immediate subterm of t, iff t = f(s) or t = ⟨s, b⟩ or t = ⟨b, s⟩.
Definition 137: We say that s is a subterm of t, iff s = t or there is an immediate subterm t' of t, such that s is a subterm of t'.
Example 138: f(a) is a subterm of the terms f(a) and h(g(⟨f(a), f(b)⟩)), and an immediate subterm of h(f(a)).
Now that we have established how to represent data, we will develop a theory of programs, which will consist of directed equations in this case. We will do this as theories often are developed: we start off with a very first theory that will not meet the expectations, but the test will reveal how we have to extend the theory. We will iterate this procedure of theorizing, testing, and theory adapting as often as is needed to arrive at a successful theory.
4.4.2 A First Abstract Interpreter
Let us now come up with a first formulation of an abstract interpreter, which we will refine later when we understand the issues involved. Since we do not yet, the notions will be a bit vague for the moment, but we will see how they work on the examples.
But how do we compute?
Problem: We can define functions, but how do we compute them?
Intuition: We direct the equations (left to right) and use them as rules.
Definition 139: If s, t ∈ 𝒯_g^T(𝒮⁰, 𝒟) are ground constructor terms and head(s) = f, then we call s ⇝ t a rule for f.
Example 140: turn λ(nil) = o and λ(cons(n, l)) = s(λ(l))
into λ(nil) ⇝ o and λ(cons(n, l)) ⇝ s(λ(l))
Definition 141: Let 𝒜 := ⟨𝒮⁰, 𝒟⟩, then call a triple ⟨f::A → R ; ℛ⟩ an abstract procedure, iff ℛ is a set of rules for f. A is called the argument sort and R is called the result sort of ⟨f::A → R ; ℛ⟩.
Definition 142: A computation of an abstract procedure p is a sequence of ground constructor terms t₁ ⇝ t₂ ⇝ ... according to the rules of p. (whatever that means)
Definition 143: An abstract computation is a computation that we can perform in our heads. (no real-world constraints like memory size or time limits)
Definition 144: An abstract interpreter is an imagined machine that performs (abstract) computations, given abstract procedures.
The central idea here is what we have seen above: we can define functions by equations. But of course when we want to use equations for programming, we will have to take away some of the freedom of applying them that was useful for proving properties of functions above. Therefore we restrict them to be applied in one direction only, to make computation deterministic.
An Abstract Interpreter (preliminary version)
Definition 145 (Idea): Replace equals by equals! (this is licensed by the rules)
Input: an abstract procedure ⟨f::A → R ; ℛ⟩ and an argument a ∈ 𝒯_g^A(𝒜).
Output: a result r ∈ 𝒯_g^R(𝒜).
Process: take the term t := f(a); for each rule (l ⇝ r) ∈ ℛ, match t against l.
If l matches a subterm s of t, instantiate r to r', and replace s in t with r'. Take the result as the new argument.
If not, proceed with the next rule.
Repeat this, until no rule applies.
Definition 146: We say that an abstract procedure ⟨f::A → R ; ℛ⟩ terminates (on a ∈ 𝒯_g^A(𝒜)), iff the computation (starting with f(a)) reaches a state where no rule applies.
Let us now see how this works in an extended example; we use the abstract data type of lists from Example 127 (only that we abbreviate unary natural numbers).
Example: the functions ρ and @ on lists
Consider the abstract procedures ⟨ρ::L(N) → L(N) ; {ρ(cons(n, l)) ⇝ @(ρ(l), cons(n, nil)), ρ(nil) ⇝ nil}⟩ and ⟨@::L(N) × L(N) → L(N) ; {@(cons(n, l), r) ⇝ cons(n, @(l, r)), @(nil, l) ⇝ l}⟩.
Then we have the following abstract computation:
ρ(cons(2, cons(1, nil))) ⇝ @(ρ(cons(1, nil)), cons(2, nil)) (ρ(cons(n, l)) ⇝ @(ρ(l), cons(n, nil)) with n = 2 and l = cons(1, nil))
@(ρ(cons(1, nil)), cons(2, nil)) ⇝ @(@(ρ(nil), cons(1, nil)), cons(2, nil)) (ρ(cons(n, l)) ⇝ @(ρ(l), cons(n, nil)) with n = 1 and l = nil)
@(@(ρ(nil), cons(1, nil)), cons(2, nil)) ⇝ @(@(nil, cons(1, nil)), cons(2, nil)) (ρ(nil) ⇝ nil)
@(@(nil, cons(1, nil)), cons(2, nil)) ⇝ @(cons(1, nil), cons(2, nil)) (@(nil, l) ⇝ l with l = cons(1, nil))
@(cons(1, nil), cons(2, nil)) ⇝ cons(1, @(nil, cons(2, nil))) (@(cons(n, l), r) ⇝ cons(n, @(l, r)) with n = 1, l = nil, and r = cons(2, nil))
cons(1, @(nil, cons(2, nil))) ⇝ cons(1, cons(2, nil)) (@(nil, l) ⇝ l with l = cons(2, nil))
Aha: ρ terminates on the argument cons(2, cons(1, nil)).
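In Standard ML, these two abstract procedures correspond to the following functions (our own sketch over an explicit list type; the names app, rev, and nil' are assumptions, the latter chosen to avoid SML's built-in nil):

  datatype nat = o' | s of nat
  datatype natlist = nil' | cons of nat * natlist

  (* @(cons(n,l), r) ~> cons(n, @(l,r));  @(nil, l) ~> l *)
  fun app (nil', l) = l
    | app (cons (n, l), r) = cons (n, app (l, r))

  (* rho(cons(n,l)) ~> @(rho(l), cons(n,nil));  rho(nil) ~> nil *)
  fun rev nil' = nil'
    | rev (cons (n, l)) = app (rev l, cons (n, nil'))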
Now let's get back to theory. Unfortunately we do not yet have the means to write down rules: they contain variables, which are not allowed in ground constructor terms. So what do we do in this situation? We just extend the definition of the expressions we are allowed to write down.
Constructor Terms with Variables
Wait a minute!: what are these rules in abstract procedures?
Answer: pairs of constructor terms. (really constructor terms?)
Idea: variables stand for arbitrary constructor terms. (let's make this formal)
Definition 147: Let ⟨𝒮⁰, 𝒟⟩ be an abstract data type. A (constructor term) variable is a pair of a symbol and a base sort, e.g. x_A, n_N1, x_C3, ...
Definition 148: We denote the current set of variables of sort A with 𝒱_A, and use 𝒱 := ⋃_{A∈𝒮⁰} 𝒱_A for the set of all variables.
Idea: add the following rule to the definition of constructor terms:
variables of sort A ∈ 𝒮⁰ are constructor terms of sort A.
Definition 149: If t is a constructor term, then we denote the set of variables occurring in t with free(t). If free(t) = ∅, then we say t is ground or closed.
To have everything at hand, we put the whole definition onto one slide.
Constructor Terms with Variables: The Complete Definition
Definition 150: Let ⟨𝒮⁰, 𝒟⟩ be an abstract data type and 𝒱 a set of variables, then we call a representation t a constructor term (with variables from 𝒱) of sort T, iff
T ∈ 𝒮⁰ and [t: T] ∈ 𝒟, or
t ∈ 𝒱_T is a variable of sort T ∈ 𝒮⁰, or
T = A × B and t is of the form ⟨a, b⟩, where a and b are constructor terms with variables of sorts A and B, or
t is of the form c(a), where a is a constructor term with variables of sort A and there is a constructor declaration [c: A → T] ∈ 𝒟.
We denote the set of all constructor terms of sort A with 𝒯_A(𝒜; 𝒱) and use 𝒯(𝒜; 𝒱) := ⋃_{A∈𝒮} 𝒯_A(𝒜; 𝒱).
Now that we have extended our model of terms with variables, we will need to understand how to use them in computation. The main intuition is that variables stand for arbitrary terms (of the right sort). This intuition is modeled by the action of instantiating variables with terms, which in turn is the operation of applying a substitution to a term.
4.4.3 Substitutions
Substitutions are very important objects for modeling the operational meaning of variables: applying a substitution to a term instantiates all the variables with terms in it. Since a substitution only acts on the variables, we simplify its representation: we can view it as a mapping from variables to terms that can be extended to a mapping from terms to terms. The natural way to define substitutions would be to make them partial functions from variables to terms, but the definition below generalizes better to later uses of substitutions, so we present the real thing.
Substitutions
Definition 151: Let 𝒜 be an abstract data type and σ ∈ 𝒱 → 𝒯(𝒜; 𝒱), then we call σ a substitution on 𝒜, iff σ(x_A) ∈ 𝒯_A(𝒜; 𝒱), and supp(σ) := {x_A ∈ 𝒱_A | σ(x_A) ≠ x_A} ...
Mutual Recursion
Generally, we can make more than one declaration at one time, e.g.
- val pi = 3.14 and e = 2.71;
val pi = 3.14
val e = 2.71
This is useful mainly for function declarations; consider for instance:
fun even (zero) = true
  | even (suc(n)) = odd (n)
and odd (zero) = false
  | odd (suc(n)) = even (n)
trace: even(4) ⇝ odd(3) ⇝ even(2) ⇝ odd(1) ⇝ even(0) ⇝ true
This mutually recursive definition is somewhat like the children's riddle, where we define the left hand as the hand where the thumb is on the right side, and the right hand as the hand where the thumb is on the left side. This is also a perfectly good mutual recursion; only, in contrast to the even/odd example above, the base cases are missing.
4.6 Even more SML: Exceptions and State in SML
Programming with Effects
Until now, our procedures have been characterized entirely by their values on their arguments (as a mathematical function behaves)
This is not enough; therefore SML also considers effects, e.g. for
input/output: the interesting bit about a print statement is the effect
mutation: allocation and modification of storage during evaluation
communication: data may be sent and received over channels
exceptions: abort evaluation by signaling an exceptional condition
Idea: An effect is any action resulting from an evaluation that is not returning a value. (formal definition difficult)
Documentation: should always address arguments, values, and effects!
Raising Exceptions
Idea: Exceptions are generalized error codes
Example 175: predefined exceptions (exceptions have names)
- 3 div 0;
uncaught exception divide by zero
  raised at: <file stdIn>
- fib(100);
uncaught exception overflow
  raised at: <file stdIn>
Example 176: user-defined exceptions (exceptions are first-class objects)
- exception Empty;
exception Empty
- Empty;
val it = Empty : exn
Example 177: exception constructors (exceptions are just like any other value)
- exception SysError of int;
exception SysError of int;
- SysError;
val it = fn : int -> exn
Programming with Exceptions
Example 178: A factorial function that checks for non-negative arguments (just to be safe)
exception Factorial;
- fun safe_factorial n =
    if n < 0 then raise Factorial
    else if n = 0 then 1
    else n * safe_factorial (n-1)
val safe_factorial = fn : int -> int
- safe_factorial(~1);
uncaught exception Factorial
  raised at: stdIn:28.31-28.40
Unfortunately, this program checks the argument in every recursive call.
Programming with Exceptions (next attempt)
Idea: make use of local function definitions that do the real work
- local
    fun fact 0 = 1 | fact n = n * fact (n-1)
  in
    fun safe_factorial n =
      if n >= 0 then fact n else raise Factorial
  end
val safe_factorial = fn : int -> int
- safe_factorial(~1);
uncaught exception Factorial
  raised at: stdIn:28.31-28.40
This function only checks once, and the local function makes good use of pattern matching. (a standard programming pattern)
Handling Exceptions
Definition 179 (Idea): Exceptions can be raised (through the evaluation pattern) and handled somewhere above. (throw and catch)
Consequence: Exceptions are a general mechanism for non-local transfers of control.
Definition 180 (SML Construct): exception handler: exp handle rules
Example 181: Handling the Factorial exception
fun factorial_driver () =
  let val input = read_integer ()
      val result = toString (safe_factorial input)
  in
    print result
  end
  handle Factorial => print "Out of range."
       | NaN => print "Not a Number!"
For more information on SML: RTFM (read the fine manuals)
Input and Output in SML
Input and output are handled via streams. (think of infinite strings)
There are two predefined streams TextIO.stdIn and TextIO.stdOut. (= keyboard input and screen)
Input: via TextIO.inputLine : TextIO.instream -> string
- TextIO.inputLine(TextIO.stdIn);
sdflkjsdlfkj
val it = "sdflkjsdlfkj" : string
Example 182: the read_integer function (just to be complete)
exception NaN; (* Not a Number *)
fun read_integer () =
  let
    val line = TextIO.inputLine(TextIO.stdIn)
  in
    if is_integer(line) then to_int(line) else raise NaN
  end;
5 Encoding Programs as Strings
With the abstract data types we looked at last, we studied term structures, i.e. complex mathematical objects that were built up from constructors, variables and parameters. The motivation for this is that we wanted to understand SML programs. And indeed we have seen that there is a close connection between SML programs on the one side and abstract data types and procedures on the other side. However, this analysis only holds on a very high level: SML programs are not terms per se, but sequences of characters we type on the keyboard or load from files. We only interpret them to be terms in the analysis of programs.
To drive our understanding of programs further, we will first have to understand more about sequences of characters (strings) and the interpretation process that derives structured mathematical objects (like terms) from them. Of course, not every sequence of characters will be interpretable, so we will need a notion of (legal) well-formed sequence.
5.1 Formal Languages
We will now formally define the concept of strings and (building on that) formal languages.
The Mathematics of Strings
Definition 183: An alphabet A is a finite set; we call each element a ∈ A a character, and an n-tuple s ∈ Aⁿ a string (of length n over A).
Definition 184: Note that A⁰ = {⟨⟩}, where ⟨⟩ is the (unique) 0-tuple. With the definition above we consider ⟨⟩ as the string of length 0 and call it the empty string and denote it with ε.
Note: Sets ≠ Strings, e.g. {1, 2, 3} = {3, 2, 1}, but ⟨1, 2, 3⟩ ≠ ⟨3, 2, 1⟩.
Notation 185: We will often write a string ⟨c₁, ..., cₙ⟩ as c₁...cₙ, for instance abc for ⟨a, b, c⟩.
Example 186: Take A = {h, 1, /} as an alphabet. Each of the symbols h, 1, and / is a character. The vector ⟨/, /, 1, h, 1⟩ is a string of length 5 over A.
Definition 187 (String Length): Given a string s we denote its length with |s|.
Definition 188: The concatenation conc(s, t) of two strings s = ⟨s₁, ..., sₙ⟩ ∈ Aⁿ and t = ⟨t₁, ..., tₘ⟩ ∈ Aᵐ is defined as ⟨s₁, ..., sₙ, t₁, ..., tₘ⟩ ∈ A^(n+m).
We will often write conc(s, t) as s + t or simply st. (e.g. conc(text, book) = text + book = textbook)
We have multiple notations for concatenation, since it is such a basic operation, which is used so often that we will need very short notations for it, trusting that the reader can disambiguate based on the context.
Now that we have defined the concept of a string as a sequence of characters, we can go on to give ourselves a way to distinguish between good strings (e.g. programs in a given programming language) and bad strings (e.g. those with syntax errors). The way to do this is by the concept of a formal language, which we are about to define.
Formal Languages
Definition 189 Let A be an alphabet, then we define the sets A^+ := ∪_{i∈ℕ+} A^i of nonempty strings and A^* := A^+ ∪ {ε} of strings. A subset of A^* is called a formal language over A.
Example 190 If A = {a, b, c}, then A^* contains ε, a, b, c, aa, ab, ba, aab, ...
A string s ∈ A^* is called a substring of a string t ∈ A^*, iff there are u, v ∈ A^* with t = usv.
Definition (lexical order) Let <_A be a partial order on A, then we define s <_lex t, iff s is a proper prefix of t, or s = uav and t = ubw for some u, v, w ∈ A^* and characters a, b ∈ A with a <_A b. We call <_lex the lexical order induced by <_A on A^*.
Theorem 201 <_lex is a partial order. If <_A is defined as a total order, then <_lex is total.
Example 202 The Roman alphabet with a < b < c < ... < z induces the telephone book order
(computer <_lex text, text <_lex textbook)
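As an aside (our remark, not part of the notes): the lexical order on strings over the Roman alphabet is exactly what the SML Basis function String.compare implements, so the examples above can be checked in the REPL:

- String.compare ("computer", "text");
val it = LESS : order
- String.compare ("text", "textbook");
val it = LESS : order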
Even though the definition of the lexical ordering is relatively involved, we know it very well: it is the ordering we know from telephone books.
The next task for understanding programs as mathematical objects is to understand the process of using strings to encode objects. The simplest encodings, or codes, are mappings from strings to strings. We will now study their properties.
5.2 Elementary Codes
The most characterizing property for a code is that if we encode something with this code, then we want to be able to decode it again: We model a code as a function (every character should have a unique encoding), which has a partial inverse (so we can decode). We have seen above that this is the case iff the function is injective; so we take this as the defining characteristic of a code.
Character Codes
Definition 203 Let A and B be alphabets, then we call an injective function c: A → B^+ a character code. The strings in c(A) := {c(a) | a ∈ A} ⊆ B^+ are called codewords.
Definition 204 A code is called binary, iff B = {0, 1}.
Example 205 Let A = {a, b, c} and B = {0, 1}, then c: A → B^+ with c(a) = 0011, c(b) = 1101, c(c) = 0110 is a binary character code, and the strings 0011, 1101, and 0110 are the codewords of c.
Definition 206 The extension of a code (on characters) c: A → B^+ to a function c': A^* → B^* is defined as c'(⟨a_1, ..., a_n⟩) := ⟨c(a_1), ..., c(a_n)⟩.
Example 207 The extension c' of c from the above example on the string bbabc is
c'(bbabc) = 1101 1101 0011 1101 0110
            c(b) c(b) c(a) c(b) c(c)
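The extension operation is easy to realize in SML; the following sketch (ours; the names c and c' are only for illustration) implements the code from Example 205 and its extension:

(* the character code c from Example 205 *)
fun c #"a" = "0011"
  | c #"b" = "1101"
  | c #"c" = "0110"
  | c _    = raise Match;
(* the extension c' concatenates the codewords of all characters *)
fun c' s = String.concat (List.map c (String.explode s));
(* c' "bbabc" evaluates to "11011101001111010110", as in Example 207 *)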
Morse Code
In the early days of telecommunication the Morse Code was used to transmit texts, using long and short pulses of electricity.
Definition 208 (Morse Code) The following table gives the Morse code for the text characters:
A .-     B -...   C -.-.   D -..    E .
F ..-.   G --.    H ....   I ..     J .---
K -.-    L .-..   M --     N -.     O ---
P .--.   Q --.-   R .-.    S ...    T -
U ..-    V ...-   W .--    X -..-   Y -.--
Z --..
1 .----  2 ..---  3 ...--  4 ....-  5 .....
6 -....  7 --...  8 ---..  9 ----.  0 -----
Furthermore, the Morse code uses .-.-.- for full stop (sentence termination), --..-- for comma, and ..--.. for question mark.
Example 209 The Morse Code in the table above induces a character code from the set of text characters above into {., -}^+.
Codes on Strings
Definition 210 A function c': A^* → B^* obtained as the extension of a prefix code c: A → B^+ is a string code.
Proof sketch: Such an extension is injective. The key construction appends a terminal character: given any character code c, define c_#: A → (B ∪ {#})^+ with c_#(a) := c(a)#.
P.1 Assume that c_#(a) is a proper prefix of c_#(b).
P.2 Then c_#(b) contains the character # not only at the end but also somewhere in the middle.
P.3 This contradicts the construction of c_#, so c_# is a prefix code.
Definition 219 (ISO-Latin (ISO/IEC 8859)) 16 extensions of ASCII to 8 bits (256 characters): ISO-Latin 1 = Western European, ISO-Latin 6 = Arabic, ISO-Latin 7 = Greek, ...
Problem: No cursive Arabic, Asian, African scripts, Old Icelandic Runes, Math, ...
Idea: Do something totally different to include all the world's scripts: For a scalable architecture, separate
what characters are available (the character set) from
the bit string-to-character mapping (the character encoding)
The goal of the UniCode standard is to cover all the world's scripts (past, present, and future) and provide efficient encodings for them. The only scripts in regular use that are currently excluded are fictional scripts like the elvish scripts from the Lord of the Rings or Klingon scripts from the Star Trek series.
An important idea behind UniCode is to separate concerns between standardizing the character set, i.e. the set of encodable characters, and the encoding itself.
Unicode and the Universal Character Set
Definition 220 (Twin Standards) A scalable architecture for representing all the world's scripts:
The Universal Character Set, defined by the ISO/IEC 10646 International Standard, is a standard set of characters upon which many character encodings are based.
The Unicode Standard defines a set of standard character encodings, rules for normalization, decomposition, collation, rendering and bidirectional display order.
Definition 221 Each UCS character is identified by an unambiguous name and an integer number called its code point.
The UCS has 1.1 million code points and nearly 100,000 characters.
Definition 222 Most (non-Chinese) characters have code points in [1, 65536] (the basic multilingual plane).
Notation 223 For code points in the Basic Multilingual Plane (BMP), four hexadecimal digits are used, e.g. U+0058 for the character LATIN CAPITAL LETTER X.
Note that there is indeed an issue with space-efficient encoding here. UniCode reserves space for 2^32 (more than a million) characters to be able to handle future scripts. But just simply using 32 bits for every UniCode character would be extremely wasteful: UniCode-encoded versions of ASCII files would be four times as large.
Therefore UniCode allows multiple encodings. UTF-32 is a simple 32-bit code that directly uses the code points in binary form. UTF-8 is optimized for western languages and coincides with ASCII where they overlap. As a consequence, ASCII-encoded texts can be decoded in UTF-8 without changes, but in the UTF-8 encoding we can also address all other UniCode characters (using multi-byte characters).
Character Encodings in Unicode
Definition 224 A character encoding is a mapping from bit strings to UCS code points.
Idea: Unicode supports multiple encodings (but not character sets) for efficiency
Definition 225 (Unicode Transformation Format)
UTF-8, an 8-bit, variable-width encoding, which maximizes compatibility with ASCII.
UTF-16, a 16-bit, variable-width encoding (popular in Asia)
UTF-32, a 32-bit, fixed-width encoding (for safety)
Definition 226 The UTF-8 encoding follows the following encoding scheme:
Unicode              Byte1     Byte2     Byte3     Byte4
U+000000 - U+00007F  0xxxxxxx
U+000080 - U+0007FF  110xxxxx  10xxxxxx
U+000800 - U+00FFFF  1110xxxx  10xxxxxx  10xxxxxx
U+010000 - U+10FFFF  11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
Example 227 $ = U+0024 is encoded as 00100100 (one byte)
¢ = U+00A2 is encoded as 11000010,10100010 (two bytes)
€ = U+20AC is encoded as 11100010,10000010,10101100 (three bytes)
Note how the fixed bit prefixes in the encoding are engineered to determine which of the four cases applies, so that UTF-8 encoded documents can be safely decoded.
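The case distinction can be turned into a small program; the following SML sketch (ours, using only integer arithmetic) computes the UTF-8 byte sequence, as a list of integers, for a given code point:

(* encode a UCS code point cp into its UTF-8 bytes (Definition 226) *)
fun utf8 cp =
    if cp < 0x80 then [cp]                               (* 1 byte  *)
    else if cp < 0x800 then                              (* 2 bytes *)
      [0xC0 + cp div 0x40, 0x80 + cp mod 0x40]
    else if cp < 0x10000 then                            (* 3 bytes *)
      [0xE0 + cp div 0x1000,
       0x80 + (cp div 0x40) mod 0x40,
       0x80 + cp mod 0x40]
    else                                                 (* 4 bytes *)
      [0xF0 + cp div 0x40000,
       0x80 + (cp div 0x1000) mod 0x40,
       0x80 + (cp div 0x40) mod 0x40,
       0x80 + cp mod 0x40];
(* utf8 0x24 = [0x24]; utf8 0xA2 = [0xC2,0xA2]; utf8 0x20AC = [0xE2,0x82,0xAC],
   matching Example 227 *)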
5.4 Formal Languages and Meaning
After we have studied the elementary theory of codes for strings, we will come to string representations of structured objects like terms. For these we will need more refined methods.
As we have started out the course with unary natural numbers and added the arithmetical operations to the mix later, we will use unary arithmetics as our running example and study object.
A formal Language for Unary Arithmetics
Goal: We want to develop a formal language that means something.
Idea: Start with something very simple: Unary Arithmetics
(i.e. ℕ with addition, multiplication, subtraction, and integer division)
E_un is based on the alphabet Σ_un := C_un ∪ V ∪ F²_un ∪ B, where
C_un := {/}^+ is the set of constants (unary numbers /, //, ///, ...),
V is the set of variables (written x1, x42, x1902, ...),
F²_un := {add, sub, mul, div, mod} is the set of (binary) function names, and
B := {(, ), ,} is the set of structural characters.
E_un := ∪_{i∈ℕ} E^i_un, where
E^1_un := C_un ∪ V and
E^{i+1}_un := {a, add(a,b), sub(a,b), mul(a,b), div(a,b), mod(a,b) | a, b ∈ E^i_un}
We call a string in E_un an expression of unary arithmetics.
The first thing we notice is that the alphabet is not just a flat set of characters any more: we have characters with different roles in the alphabet. These roles have to do with the symbols used in the complex objects (unary arithmetic expressions) that we want to encode.
The formal language E_un is constructed in stages, making explicit use of the respective roles of the characters in the alphabet. Constants and variables form the basic inventory in E^1_un; the respective next stage is built up using the function names and the structural characters to encode the applicative structure of the encoded terms.
Note that with this construction E^i_un ⊆ E^{i+1}_un.
A formal Language for Unary Arithmetics (Examples)
Example 228 add(//////,mul(x1902,///)) ∈ E_un
Proof: we proceed according to the definition
P.1 We have ////// ∈ C_un, and x1902 ∈ V, and /// ∈ C_un by definition
P.2 Thus ////// ∈ E^1_un, and x1902 ∈ E^1_un and /// ∈ E^1_un,
P.3 Hence ////// ∈ E^2_un and mul(x1902,///) ∈ E^2_un
P.4 Thus add(//////,mul(x1902,///)) ∈ E^3_un
P.5 And finally add(//////,mul(x1902,///)) ∈ E_un
other examples:
div(x201,add(////,x12))
sub(mul(///,div(x23,///)),///)
what does it all mean? (nothing, E_un is just a set of strings!)
To show that a string s is an expression of unary arithmetics, we have to show that it is in the formal language E_un. As E_un is the union over all the E^i_un, the string s must already be a member of a set E^j_un for some j ∈ ℕ. So we reason by the definition establishing set membership.
Of course, computer science has better methods for defining languages than the ones used here (context-free grammars), but the simple methods used here will already suffice to make the relevant points for this course.
Syntax and Semantics (a first glimpse)
Definition 229 A formal language is also called a syntax, since it only concerns the form of strings.
To give meaning to these strings, we need a semantics, i.e. a way to interpret these.
Idea (Tarski Semantics): A semantics is a mapping from strings to objects we already know and understand (e.g. arithmetics).
e.g. add(//////,mul(x1902,///)) ↦ 6 + (x_1902 · 3) (but what does this mean?)
looks like we have to give a meaning to the variables as well, e.g. x1902 ↦ 3, then add(//////,mul(x1902,///)) ↦ 6 + (3 · 3) = 15
So formal languages do not mean anything by themselves, but a meaning has to be given to them via a mapping. We will explore that idea in more detail in the following.
6 Boolean Algebra
We will now look at a formal language from a different perspective. We will interpret the language of Boolean expressions as formulae of a very simple logic: A logic is a mathematical construct to study the association of meaning to strings and reasoning processes, i.e. to study how humans derive new information and knowledge from existing one.
6.1 Boolean Expressions and their Meaning
In the following we will consider the Boolean Expressions as the language of Propositional Logic, in many ways the simplest of logics. This means we cannot really express very much of interest, but we can study many things that are common to all logics.
Let us try again (Boolean Expressions)
Definition 230 (Alphabet) E_bool is based on the alphabet A := C_bool ∪ V ∪ F¹_bool ∪ F²_bool ∪ B, where C_bool = {0, 1}, F¹_bool = {−} and F²_bool = {+, ∗}. (V and B as in E_un)
Definition 231 (Formal Language) E_bool := ∪_{i∈ℕ} E^i_bool, where E^1_bool := C_bool ∪ V and E^{i+1}_bool := {a, (−a), (a+b), (a∗b) | a, b ∈ E^i_bool}.
Definition 232 Let a ∈ E_bool. The minimal i, such that a ∈ E^i_bool is called the depth of a.
e_1 := ((−x1)+x3) (depth 3)
e_2 := ((−(x1∗x2))+(x3∗x4)) (depth 4)
e_3 := ((x1+x2)+((−((−x1)∗x2))+(x3∗x4))) (depth 6)
(Footnote: until very recently, humans were thought to be the only systems that could come up with complex argumentations. In the last 50 years this has changed: not only do we attribute more reasoning capabilities to animals, but also, we have developed computer systems that are increasingly capable of reasoning.)
Boolean Expressions as Structured Objects
Idea: As strings in E_bool are built up via the union-principle, we can think of them as constructor terms with variables.
Definition 233 The abstract data type B := ⟨{B}, {[1: B], [0: B], [−: B → B], [+: B × B → B], [∗: B × B → B]}⟩
Definition 234 We define the translation σ: E_bool → T_B(B; V) by
σ(1) := 1, σ(0) := 0, σ(x) := x for x ∈ V
σ((−A)) := −(σ(A))
σ((A∗B)) := ∗(σ(A), σ(B)) and σ((A+B)) := +(σ(A), σ(B))
We will use this intuition for our treatment of Boolean expressions and treat the strings and constructor terms synonymously. (σ is a (hidden) isomorphism)
Definition 235 We will write −(A) as −A and ∗(A, B) as A∗B (and similarly for +). Furthermore, we will write variables such as x71 as x_71 and elide brackets for sums and products according to their usual precedences.
Example 236 σ(((−(x1∗x2))+(x3∗x4))) = −(x_1 ∗ x_2) + x_3 ∗ x_4
Warning: Do not confuse + and ∗ (Boolean sum and product) with their arithmetic counterparts. (as members of a formal language they have no meaning!)
Now that we have defined the formal language, we turn to the process of giving the strings a meaning. We make explicit the idea of providing meaning by specifying a function that assigns objects that we already understand to representations (strings) that do not have a priori meaning.
The first step in assigning meaning is to fix a set of objects that we will assign as meanings: the universe (of discourse). To specify the meaning mapping, we try to get away with specifying as little as possible. In our case here, we assign meaning only to the constants and functions and induce the meaning of complex expressions from these. As we have seen before, we also have to assign meaning to variables (which have a different ontological status from constants); we do this by a special meaning function: a variable assignment.
Boolean Expressions: Semantics via Models
Definition 237 A model ⟨U, I⟩ for E_bool is a set U of objects (called the universe) together with an interpretation function I on A with I(C_bool) ⊆ U, I(F¹_bool) ⊆ F(U; U), and I(F²_bool) ⊆ F(U²; U).
Definition 238 A function φ: V → U is called a variable assignment.
Definition 239 Given a model ⟨U, I⟩ and a variable assignment φ, the evaluation function I_φ: E_bool → U is defined recursively: Let c ∈ C_bool, a, b ∈ E_bool, and x ∈ V, then
I_φ(c) = I(c) and I_φ(x) = φ(x)
I_φ((−a)) = I(−)(I_φ(a))
I_φ((a+b)) = I(+)(I_φ(a), I_φ(b)) and I_φ((a∗b)) = I(∗)(I_φ(a), I_φ(b))
Example models:
U = {T, F} with 0 ↦ F, 1 ↦ T, + ↦ ∨, ∗ ↦ ∧, − ↦ ¬.
U = E_un with 0 ↦ /, 1 ↦ //, + ↦ div, ∗ ↦ mod, − ↦ λx.5.
U = {0, 1} with 0 ↦ 0, 1 ↦ 1, + ↦ min, ∗ ↦ max, − ↦ λx.1−x.
Note that all three models on the bottom of the last slide are essentially different, i.e. there is no way to build an isomorphism between them, i.e. a mapping between the universes, so that all Boolean expressions have corresponding values.
To get a better intuition on how the meaning function works, consider the following example. We see that the value for a large expression is calculated by calculating the values for its sub-expressions and then combining them via the function that is the interpretation of the constructor at the head of the expression.
Evaluating Boolean Expressions
Let φ := [T/x1], [F/x2], [T/x3], [F/x4], and I the standard interpretation with 0 ↦ F, 1 ↦ T, + ↦ ∨, ∗ ↦ ∧, − ↦ ¬, then
I_φ((x1 + x2) + (−(x1 ∗ x2) + x3 ∗ x4))
= I_φ(x1 + x2) ∨ I_φ(−(x1 ∗ x2) + x3 ∗ x4)
= (I_φ(x1) ∨ I_φ(x2)) ∨ (I_φ(−(x1 ∗ x2)) ∨ I_φ(x3 ∗ x4))
= (φ(x1) ∨ φ(x2)) ∨ (¬(I_φ(x1 ∗ x2)) ∨ (φ(x3) ∧ φ(x4)))
= (T ∨ F) ∨ (¬(φ(x1) ∧ φ(x2)) ∨ (T ∧ F))
= T ∨ (¬(T ∧ F) ∨ F)
= T ∨ (¬F ∨ F)
= T ∨ (T ∨ F) = T
What a mess!
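The recursive definition of the evaluation function can be written down directly in SML. The following sketch is ours (not from the notes); the datatype models the constructor terms from Definition 233, and we fix the standard model U = {T, F}:

datatype bexp = Zero | One | Var of string
              | Neg of bexp | Sum of bexp * bexp | Prod of bexp * bexp;
(* evaluation under a variable assignment phi : string -> bool *)
fun eval phi Zero         = false
  | eval phi One          = true
  | eval phi (Var x)      = phi x
  | eval phi (Neg a)      = not (eval phi a)
  | eval phi (Sum (a,b))  = eval phi a orelse eval phi b
  | eval phi (Prod (a,b)) = eval phi a andalso eval phi b;
(* the assignment from the slide: x1, x3 |-> T; all others |-> F *)
fun phi "x1" = true | phi "x3" = true | phi _ = false;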
A better mouse-trap: Truth Tables
Truth tables to visualize truth functions:
− |        ∗ | T F      + | T F
T | F      T | T F      T | T T
F | T      F | F F      F | T F
If we are interested in values for all assignments (e.g. of x_123 ∗ x_4 + −(x_123 ∗ x_72)):
assignments       intermediate results                              full
x_4 x_72 x_123    e_1 := x_123 ∗ x_72   e_2 := −e_1   e_3 := x_123 ∗ x_4   e_3 + e_2
F   F    F        F                     T             F                    T
F   F    T        F                     T             F                    T
F   T    F        F                     T             F                    T
F   T    T        T                     F             F                    F
T   F    F        F                     T             F                    T
T   F    T        F                     T             T                    T
T   T    F        F                     T             F                    T
T   T    T        T                     F             T                    T
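With the evaluator from the sketch above, truth tables can be generated mechanically; the following lines (again our sketch) enumerate all assignments to a given variable list and tabulate an expression:

(* all 2^n assignments to the variables in xs *)
fun assignments []        = [fn _ => false]
  | assignments (x :: xs) =
      List.concat
        (List.map (fn phi => [fn y => if y = x then true  else phi y,
                              fn y => if y = x then false else phi y])
                  (assignments xs));
(* one table row per assignment: the variable values and the result *)
fun table xs e =
    List.map (fn phi => (List.map phi xs, eval phi e)) (assignments xs);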
Boolean Algebra
Definition 240 A Boolean algebra is E_bool together with the models
⟨{T, F}, {0 ↦ F, 1 ↦ T, + ↦ ∨, ∗ ↦ ∧, − ↦ ¬}⟩
⟨{0, 1}, {0 ↦ 0, 1 ↦ 1, + ↦ max, ∗ ↦ min, − ↦ λx.1−x}⟩
BTW, the models are equivalent (0 corresponds to F, 1 to T)
Definition 241 We will use B for the universe, which can be either {0, 1} or {T, F}
Definition 242 We call two expressions e_1, e_2 ∈ E_bool equivalent (write e_1 ≡ e_2), iff I_φ(e_1) = I_φ(e_2) for all I and φ.
Theorem 243 e_1 ≡ e_2, iff (e_1 + −e_2) ∗ (−e_1 + e_2) is a theorem of Boolean Algebra.
As we are mainly interested in the interplay between form and meaning in Boolean Algebra, we will often identify Boolean expressions, if they have the same values in all situations (as specified by the variable assignments). The notion of equivalent formulae formalizes this intuition.
Boolean Equivalences
Given a, b, c ∈ E_bool and ◦ ∈ {+, ∗}, let • denote the respective other (dual) operator (• = + if ◦ = ∗, and • = ∗ if ◦ = +).
We have the following equivalences in Boolean Algebra:
a ◦ b ≡ b ◦ a (commutativity)
(a ◦ b) ◦ c ≡ a ◦ (b ◦ c) (associativity)
a ◦ (b • c) ≡ (a ◦ b) • (a ◦ c) (distributivity)
a ◦ (a • b) ≡ a (covering)
(a ◦ b) • (a ◦ −b) ≡ a (combining)
(a ◦ b) • ((−a ◦ c) • (b ◦ c)) ≡ (a ◦ b) • (−a ◦ c) (consensus)
−(a ◦ b) ≡ −a • −b (De Morgan)
6.2 Boolean Functions
We will now turn to the semantical counterparts of Boolean expressions: Boolean functions. These are just n-ary functions on the Boolean values.
Boolean functions are interesting, since they can be used as computational devices; we will study this extensively in the rest of the course. In particular, we can consider a computer CPU as a collection of Boolean functions (e.g. a modern CPU with 64 inputs and outputs can be viewed as a sequence of 64 Boolean functions of arity 64: one function per output pin).
The theory we will develop now will help us understand how to implement Boolean functions (as specifications of computer chips), viewing Boolean expressions as very abstract representations of configurations of logic gates and wiring. We will study the issues of representing such configurations in more detail later.
Boolean Functions
Definition 244 A Boolean function is a function from Bⁿ to B.
Definition 245 Boolean functions f, g: Bⁿ → B are called equivalent (write f ≡ g), iff f(c) = g(c) for all c ∈ Bⁿ. (equal as functions)
Idea: We can turn any Boolean expression into a Boolean function by ordering the variables (use the lexical ordering on the variable names)
Definition 246 Let e ∈ E_bool and {x_1, ..., x_n} the set of variables in e, then we call VL(e) := ⟨x_1, ..., x_n⟩ the variable list of e, iff x_i <_lex x_j for i < j.
Definition 247 Let e ∈ E_bool with VL(e) = ⟨x_1, ..., x_n⟩, then we call the function f_e: Bⁿ → B with f_e: c ↦ I_{φ_c}(e) the Boolean function induced by e, where φ_{⟨c_1,...,c_n⟩}: x_i ↦ c_i.
Theorem 248 e_1 ≡ e_2, iff f_{e_1} = f_{e_2}.
The definition above shows us that in theory every Boolean expression induces a Boolean function. The simplest way to compute this is to compute the truth table for the expression and then read off the function from the table.
Boolean Functions and Truth Tables
The truth table of a Boolean function is defined in the obvious way:
x1 x2 x3   f_{x1 ∗ (−x2 + x3)}
T  T  T    T
T  T  F    F
T  F  T    T
T  F  F    T
F  T  T    F
F  T  F    F
F  F  T    F
F  F  F    F
compute this by assigning values and evaluating
Question: can we also go the other way? (from function to expression?)
Idea: read expressions of a special form from truth tables (Boolean Polynomials)
Computing a Boolean expression from a given Boolean function is more interesting: there are many possible candidates to choose from; after all, any two equivalent expressions induce the same function. To simplify the problem, we will restrict the space of Boolean expressions that realize a given Boolean function by looking only for expressions of a given form.
Boolean Polynomials
Special-form Boolean Expressions:
a literal is a variable or the negation of a variable
a monomial or product term is a literal or the product of literals
a clause or sum term is a literal or the sum of literals
a Boolean polynomial or sum of products is a product term or the sum of product terms
a clause set or product of sums is a sum term or the product of sum terms
For literals x_i, write x_i^1; for −x_i write x_i^0. (these are not exponentials, but intended truth values)
Notation 249 Write x_i x_j instead of x_i ∗ x_j. (like in math)
Armed with this normal form, we can now define a way of realizing Boolean functions.
Normal Forms of Boolean Functions
Definition 250 Let f: Bⁿ → B be a Boolean function and c ∈ Bⁿ, then M_c := ∏_{j=1}^{n} x_j^{c_j} and S_c := ∑_{j=1}^{n} x_j^{1−c_j}
Definition 251 The disjunctive normal form (DNF) of f is ∑_{c ∈ f⁻¹(1)} M_c (also called the canonical sum (written as DNF(f)))
Definition 252 The conjunctive normal form (CNF) of f is ∏_{c ∈ f⁻¹(0)} S_c (also called the canonical product (written as CNF(f)))
x1 x2 x3   f   monomials          clauses
0  0  0    1   x1^0 x2^0 x3^0
0  0  1    1   x1^0 x2^0 x3^1
0  1  0    0                      x1^1 + x2^0 + x3^1
0  1  1    0                      x1^1 + x2^0 + x3^0
1  0  0    1   x1^1 x2^0 x3^0
1  0  1    1   x1^1 x2^0 x3^1
1  1  0    0                      x1^0 + x2^0 + x3^1
1  1  1    1   x1^1 x2^1 x3^1
DNF of f: x1^0 x2^0 x3^0 + x1^0 x2^0 x3^1 + x1^1 x2^0 x3^0 + x1^1 x2^0 x3^1 + x1^1 x2^1 x3^1
CNF of f: (x1^1 + x2^0 + x3^1) ∗ (x1^1 + x2^0 + x3^0) ∗ (x1^0 + x2^0 + x3^1)
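The construction of the DNF can be phrased as a little program; here is an SML sketch of ours that represents vectors in Bⁿ as bool lists and Boolean functions as functions on them, rendering x_j^1 as xj and x_j^0 as -xj:

(* all vectors in B^n *)
fun vectors 0 = [[]]
  | vectors n =
      List.concat (List.map (fn v => [true :: v, false :: v]) (vectors (n-1)));
(* the minterm M_c as a string *)
fun minterm c =
    let fun lit (j, true)  = "x" ^ Int.toString j
          | lit (j, false) = "-x" ^ Int.toString j
        fun go (_, [])      = []
          | go (j, b :: bs) = lit (j, b) :: go (j + 1, bs)
    in String.concatWith "*" (go (1, c)) end;
(* DNF(f): the sum of the minterms M_c with f(c) = 1 *)
fun dnf n f = String.concatWith " + " (List.map minterm (List.filter f (vectors n)));
(* e.g. dnf 2 (fn [a, b] => a andalso b | _ => false) evaluates to "x1*x2" *)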
Normal Boolean Expressions
Definition 253 A monomial or clause is called normal, iff each variable appears at most once.
Note: Any monomial or clause can be reduced to a constant or a normal term.
Given a monomial or clause T_1 ◦ x^{c_1} ◦ T_2 ◦ x^{c_2} ◦ T_3 (◦ ∈ {+, ∗}),
we can rewrite it to T ◦ x^{c_1} ◦ x^{c_2} with T := T_1 ◦ T_2 ◦ T_3 (using commutativity and associativity)
and simplify the subterm x^{c_1} ◦ x^{c_2} according to the tables:
c_1 c_2   x^{c_1} ∗ x^{c_2}   resulting term
0   0     x^0                 T ∗ x^0
0   1     0                   0
1   0     0                   0
1   1     x^1                 T ∗ x^1

c_1 c_2   x^{c_1} + x^{c_2}   resulting term
0   0     x^0                 T + x^0
0   1     1                   1
1   0     1                   1
1   1     x^1                 T + x^1
MinTerms and MaxTerms
Definition 254 An n-variable minterm (maxterm) is a normal monomial (clause) with n literals.
Note: each monomial in the DNF of a Boolean function f is a minterm and each clause in the CNF of f is a maxterm.
Definition 255 Given a Boolean expression e with n variables and a vector x ∈ Bⁿ, the expression e covers x, iff f_e(x) = 1.
Example 256 The expression x_1 ∗ x_2 + x_3 covers ⟨1, 1, 0⟩.
Note: Each minterm in a DNF covers exactly one vector. Namely, m = x_1^{c_1} ∗ x_2^{c_2} ∗ ... ∗ x_n^{c_n} covers the vector ⟨c_1, ..., c_n⟩. So by definition of the DNF, each minterm m in the DNF of a function f: Bⁿ → B covers exactly one x ∈ Bⁿ with f(x) = 1.
In the light of the argument of understanding Boolean expressions as implementations of Boolean functions, the process becomes interesting when realizing specifications of chips. In particular, it also becomes interesting which of the possible Boolean expressions we choose for realizing a given Boolean function. We will analyze the choice in terms of the cost of a Boolean expression.
Costs of Boolean Expressions
Idea: Complexity Analysis is about the estimation of resource needs
if we have two expressions for a Boolean function, which one to choose?
Idea: Let us just measure the size of the expression (after all it needs to be written down)
Better Idea: count the number of operators (computation elements)
Definition 257 The cost C(e) of e ∈ E_bool is the number of operators in e.
Example 258 C(−x_1 + x_3) = 2, C(−(x_1 ∗ x_2) + x_3 ∗ x_4) = 4,
C((x_1 + x_2) + (−(−x_1 ∗ x_2) + x_3 ∗ x_4)) = 7
Definition 259 Let f: Bⁿ → B be a Boolean function, then C(f) := min{C(e) | f = f_e} is the cost of f.
Note: We can find expressions of arbitrarily high cost for a given Boolean function. (e ≡ e ∗ 1)
but how to find such an e with minimal cost for f?
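On the constructor terms of the bexp datatype from the evaluator sketch above, the cost function is a one-line recursion (again our sketch):

(* C(e): the number of operators in e (Definition 257) *)
fun cost Zero         = 0
  | cost One          = 0
  | cost (Var _)      = 0
  | cost (Neg a)      = 1 + cost a
  | cost (Sum (a,b))  = 1 + cost a + cost b
  | cost (Prod (a,b)) = 1 + cost a + cost b;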
6.3 Complexity Analysis for Boolean Expressions
The Landau Notations (aka. big-O Notation)
Definition 260 Let f, g: ℕ → ℕ, we say that f is asymptotically bounded by g, written as f ≤_a g, iff there is an n_0 ∈ ℕ, such that f(n) ≤ g(n) for all n > n_0.
Definition 261 The three Landau sets O(g), Ω(g), Θ(g) are defined as
O(g) = {f | ∃k > 0. f ≤_a k·g}
Ω(g) = {f | ∃k > 0. k·g ≤_a f}
Θ(g) = O(g) ∩ Ω(g)
Intuition: The Landau sets express the shape of growth of the graph of a function.
If f ∈ O(g), then f grows at most as fast as g. ("f is in the order of g")
If f ∈ Ω(g), then f grows at least as fast as g. ("f is at least in the order of g")
If f ∈ Θ(g), then f grows as fast as g. ("f is strictly in the order of g")
Commonly used Landau Sets
Landau set    class name    rank    Landau set   class name    rank
O(1)          constant      1       O(n^2)       quadratic     4
O(log_2(n))   logarithmic   2       O(n^k)       polynomial    5
O(n)          linear        3       O(k^n)       exponential   6
Theorem 262 These O-classes establish a ranking (increasing rank means increasing growth)
O(1) ⊂ O(log_2(n)) ⊂ O(n) ⊂ O(n^2) ⊂ O(n^{k'}) ⊂ O(k^n)
where k' > 2 and k > 1. The reverse holds for the Ω-classes:
Ω(k^n) ⊂ Ω(n^{k'}) ⊂ Ω(n^2) ⊂ Ω(n) ⊂ Ω(log_2(n)) ⊂ Ω(1)
Idea: Use O-classes for worst-case complexity analysis and Ω-classes for best-case.
Examples
Idea: the fastest-growing summand determines the O-class
Example 263 (λn. 263748) ∈ O(1)
Example 264 (λn. 26n + 372) ∈ O(n)
Example 265 (λn. 7n^2 − 372n + 92) ∈ O(n^2)
Example 266 (λn. 857n^10 + 7342n^7 + 26n^2 + 902) ∈ O(n^10)
Example 267 (λn. 3·2^n + 72) ∈ O(2^n)
Example 268 (λn. 3·2^n + 7342n^7 + 26n^2 + 722) ∈ O(2^n)
With the basics of complexity theory well-understood, we can now analyze the cost-complexity of Boolean expressions that realize Boolean functions. We will first derive two upper bounds for the cost of Boolean functions with n variables, and then a lower bound for the cost.
The first result is a very naive counting argument based on the fact that we can always realize a Boolean function via its DNF or CNF. The second result gives us a better complexity with a more involved argument. Another difference between the proofs is that the first one is constructive, i.e. we can read off an algorithm that provides Boolean expressions of the complexity claimed by the theorem for a given Boolean function. The second proof gives us no such algorithm, since it is non-constructive.
An Upper Bound for the Cost of BF with n variables
Idea: Every Boolean function has a DNF and CNF, so we compute its cost.
Example 269 Let us look at the size of the DNF or CNF for f ∈ (B³ → B):
x1 x2 x3   f   monomials          clauses
0  0  0    1   x1^0 x2^0 x3^0
0  0  1    1   x1^0 x2^0 x3^1
0  1  0    0                      x1^1 + x2^0 + x3^1
0  1  1    0                      x1^1 + x2^0 + x3^0
1  0  0    1   x1^1 x2^0 x3^0
1  0  1    1   x1^1 x2^0 x3^1
1  1  0    0                      x1^0 + x2^0 + x3^1
1  1  1    1   x1^1 x2^1 x3^1
Theorem 270 Any f: Bⁿ → B is realized by an e ∈ E_bool with C(e) ∈ O(n · 2ⁿ).
Proof: by counting (constructive proof (we exhibit a witness))
P.1 either e_n := CNF(f) has 2ⁿ/2 clauses or less, or DNF(f) does (monomials)
P.2 take the smaller one; summing/multiplying its monomials/clauses costs at most 2^{n−1} − 1 operators
P.3 there are n literals per clause/monomial e_i, so C(e_i) ≤ 2n − 1
P.4 so C(e_n) ≤ 2^{n−1} − 1 + 2^{n−1}(2n − 1) and thus C(e_n) ∈ O(n · 2ⁿ)
For this proof we will introduce the concept of a realization cost function κ: ℕ → ℕ to save space in the argumentation. The trick in this proof is to make the induction on the arity work by splitting an n-ary Boolean function into two (n−1)-ary functions and estimating their complexity separately. This argument does not give a direct witness in the proof, since to do this we would have to decide which of these two split-parts to pursue at each level. This yields an algorithm for determining a witness, but not a direct witness itself.
We can do better (if we accept complicated witnesses)
Theorem 271 Let κ(n) := max{C(f) | f: Bⁿ → B}, then κ ∈ O(2ⁿ).
Proof: we show that κ(n) ≤ 2^{n+2} − 4 by induction on n
P.1.1 base case: We count the operators in all members of B → B = {f_1, f_0, f_{x1}, f_{−x1}}, so κ(1) = 1 ≤ 2³ − 4.
P.1.2 step case:
P.1.2.1 given f ∈ (Bⁿ → B), then f(a_1, ..., a_n) = 1, iff either
a_n = 0 and f(a_1, ..., a_{n−1}, 0) = 1, or
a_n = 1 and f(a_1, ..., a_{n−1}, 1) = 1
P.1.2.2 Let f_i(a_1, ..., a_{n−1}) := f(a_1, ..., a_{n−1}, i) for i ∈ {0, 1},
P.1.2.3 then there are e_i ∈ E_bool, such that f_i = f_{e_i} and C(e_i) ≤ κ(n−1). (IH)
P.1.2.4 thus f = f_e, where e := −x_n ∗ e_0 + x_n ∗ e_1, so κ(n) ≤ 2κ(n−1) + 4 ≤ 2(2^{n+1} − 4) + 4 = 2^{n+2} − 4.
The next proof is quite a lot of work, so we will first sketch the overall structure of the proof, before we look into the details. The main idea is to estimate a cleverly chosen quantity from above and below, to get an inequality between the lower and upper bounds (the quantity itself is irrelevant except to make the proof work).
A Lower Bound for the Cost of BF with n Variables
Theorem 272 κ ∈ Ω(2ⁿ / log_2(n))
Proof: Sketch (counting again!)
P.1 the cost of a function is based on the cost of expressions.
P.2 consider the set E_n of expressions with n variables of cost no more than κ(n).
P.3 find a lower and an upper bound for #(E_n): Φ(n) ≤ #(E_n) ≤ Ψ(κ(n))
P.4 in particular: Φ(n) ≤ Ψ(κ(n))
P.5 solving for κ(n) yields κ(n) ≥ Ξ(n), so κ ∈ Ω(2ⁿ / log_2(n))
We will expand P.3 and P.5 in the next slides.
A Lower Bound For κ(n)-Cost Expressions
Definition 273 E_n := {e ∈ E_bool | e has n variables and C(e) ≤ κ(n)}
Lemma 274 #(E_n) ≥ #(Bⁿ → B)
Proof:
P.1 For all f_n ∈ (Bⁿ → B) we have C(f_n) ≤ κ(n)
P.2 C(f_n) = min{C(e) | f_e = f_n}; choose e_{f_n} with C(e_{f_n}) = C(f_n)
P.3 all distinct: if e_g = e_h, then f_{e_g} = f_{e_h} and thus g = h.
Corollary 275 #(E_n) ≥ 2^(2ⁿ)
Proof: consider the n-dimensional truth tables
P.1 2ⁿ entries that can be either 0 or 1, so 2^(2ⁿ) possibilities
P.2 so #(Bⁿ → B) = 2^(2ⁿ)
An Upper Bound For κ(n)-cost Expressions
Idea: Estimate the number of E_bool strings that can be formed at a given cost by looking at the length and alphabet size.
Definition 276 Given a cost c, let Λ(e) be the length of e, considering variables as single characters. We define σ(c) := max{Λ(e) | e ∈ E_bool, C(e) ≤ c}
Lemma 277 σ(n) ≤ 5n for n > 0.
Proof: by induction on n
P.1.1 base case: The cost-1 expressions are of the form (v◦w) and (−v), where v and w are variables. So the length is at most 5.
P.1.2 step case: Λ(e) = Λ((e_1◦e_2)) = Λ(e_1) + Λ(e_2) + 3, where C(e_1) + C(e_2) ≤ n − 1, so Λ(e) ≤ 5·C(e_1) + 5·C(e_2) + 3 ≤ 5(n − 1) + 5 = 5n
Corollary 278 max{Λ(e) | e ∈ E_n} ≤ 5·κ(n)
An Upper Bound For κ(n)-cost Expressions
Idea: e ∈ E_n has at most n variables by definition.
Let A_n := {x_1, ..., x_n, 0, 1, −, +, ∗, (, )}, then #(A_n) = n + 7
Corollary 279 E_n ⊆ ∪_{i=0}^{5κ(n)} A_n^i and #(E_n) ≤ ((n+7)^{5κ(n)+1} − 1) / (n + 6)
Proof: Note that the A_n^i are disjoint for distinct i, so
#(∪_{i=0}^{5κ(n)} A_n^i) = Σ_{i=0}^{5κ(n)} #(A_n^i) = Σ_{i=0}^{5κ(n)} (n+7)^i = ((n+7)^{5κ(n)+1} − 1) / ((n+7) − 1)
Solving for κ(n)
((n+7)^{5κ(n)+1} − 1) / (n+6) ≥ 2^(2ⁿ)
(n+7)^{5κ(n)+1} ≥ 2^(2ⁿ) (as (n+7)^{5κ(n)+1} ≥ ((n+7)^{5κ(n)+1} − 1) / (n+6))
(5κ(n) + 1) · log_2(n+7) ≥ 2ⁿ (as log_a(x) = log_b(x) · log_a(b))
5κ(n) + 1 ≥ 2ⁿ / log_2(n+7)
κ(n) ≥ 1/5 · (2ⁿ / log_2(n+7) − 1)
κ(n) ∈ Ω(2ⁿ / log_2(n))
6.4 The Quine-McCluskey Algorithm
After we have studied the worst-case complexity of Boolean expressions that realize given Boolean functions, let us return to the question of computing realizing Boolean expressions in practice. We will again restrict ourselves to the subclass of Boolean polynomials, but this time, we make sure that we find the optimal representatives in this class.
The first step in the endeavor of finding minimal polynomials for a given Boolean function is to optimize monomials for this task. We have two concerns here. We are interested in monomials that contribute to realizing a given Boolean function f (we say they imply f, or are implicants), and we are interested in the cheapest among those that do. For the latter we have to look at a way to make monomials cheaper, and come up with the notion of a sub-monomial, i.e. a monomial that only contains a subset of the literals (and is thus cheaper).
Constructing Minimal Polynomials: Implicants and Sub-monomials
Definition A monomial M is called an implicant of f (write M ≻ f), iff f_M(c) = T implies f(c) = T for all c ∈ Bⁿ; we say that M' dominates M, iff f_M(c) = T implies f_{M'}(c) = T for all c ∈ Bⁿ.
Definition 282 Let M = L_1 ∗ ... ∗ L_n and M' = L'_1 ∗ ... ∗ L'_{n'} be monomials, then M' is called a sub-monomial of M (write M' ⊂ M), iff M' = 1 or
for all j ≤ n', there is an i ≤ n, such that L'_j = L_i, and
there is an i ≤ n, such that L_i ≠ L'_j for all j ≤ n'.
In other words: M' is a sub-monomial of M, iff the literals of M' form a proper subset of the literals of M.
With these definitions, we can convince ourselves that sub-monomials are dominated by their super-monomials. Intuitively, a monomial is a conjunction of conditions that are needed to make the Boolean function f true; if we have fewer of them, then we cannot approximate the truth-conditions of f sufficiently. So we will look for monomials that approximate f well enough and are shortest with this property: the prime implicants of f.
Constructing Minimal Polynomials: Prime Implicants
Lemma 283 If M' ⊂ M, then M' dominates M.
Proof:
P.1 Given c ∈ Bⁿ with f_M(c) = T, we have f_{L_i}(c) = T for all literals L_i in M.
P.2 As M' is a sub-monomial of M, f_{L'_j}(c) = T for each literal L'_j of M'.
P.3 Therefore, f_{M'}(c) = T.
Definition 284 An implicant M of f is a prime implicant of f, iff no sub-monomial of M is an implicant of f.
The following theorem verifies our intuition that prime implicants are good candidates for constructing minimal polynomials for a given Boolean function. The proof is rather simple (if notationally loaded). We just assume the contrary, i.e. that there is a minimal polynomial p that contains a non-prime-implicant monomial M_k; then we can decrease the cost of p while still inducing the given function f. So p was not minimal, which shows the assertion.
Prime Implicants and Costs
Theorem 285 Given a Boolean function f ≠ λx.F and a Boolean polynomial p with f_p = f of minimal cost, i.e. there is no other polynomial p' ≡ p such that C(p') < C(p). Then p solely consists of prime implicants of f.
Proof: The theorem obviously holds for f = λx.T.
P.1 For other f, we have f ≡ f_p where p := Σ_{i=1}^{n} M_i for some n ≥ 1 monomials M_i.
P.2 Now, suppose that some M_k is not a prime implicant of f, i.e., M' ≻ f for some M' ⊂ M_k.
P.3 Let us substitute M_k by M': p' := Σ_{i=1}^{k−1} M_i + M' + Σ_{i=k+1}^{n} M_i
P.4 We have C(M') < C(M_k) and thus C(p') < C(p) (def of sub-monomial)
P.5 Furthermore, M' dominates M_k by Lemma 283, hence f_p(c) = T implies f_{p'}(c) = T.
P.6 In addition, f_{p'}(c) = T implies f(c) = T, as M' ≻ f and M_i ≻ f for all i.
P.7 Hence f_{p'} = f = f_p, i.e. p' ≡ p.
P.8 So p' ≡ p and C(p') < C(p). Therefore, p is not a minimal polynomial, contradicting our assumption.
This theorem directly suggests a simple generate-and-test algorithm to construct minimal polynomials. We will however improve on this using an idea by Quine and McCluskey. There are of course better algorithms nowadays, but this one serves as a nice example of how to get from a theoretical insight to a practical algorithm.
The Quine/McCluskey Algorithm (Idea)
Idea: use this theorem to search for minimal-cost polynomials
Determine all prime implicants (sub-algorithm QMC_1)
choose the minimal subset that covers f (sub-algorithm QMC_2)
Idea: To obtain prime implicants,
start with the DNF monomials (they are implicants by construction)
find sub-monomials that are still implicants of f.
Idea: Look at polynomials of the form p := m∗x_i + m∗(−x_i) (note: p ≡ m)
Armed with the knowledge that minimal polynomials must consist entirely of prime implicants, we can build a practical algorithm for computing minimal polynomials: In a first step we compute the set of prime implicants of a given function, and later we see whether we actually need all of them.
For the first step we use an important observation: for a given monomial m, the polynomial m∗x + m∗(−x) is equivalent to m, and in particular, we can obtain an equivalent polynomial by replacing the partners m∗x and m∗(−x) by the resolvent m. That gives the main idea behind the first part of the Quine-McCluskey algorithm: Given a Boolean function f, we start with a polynomial for f, the disjunctive normal form, and then replace partners by resolvents, until that is impossible.
The algorithm QMC_1 for determining Prime Implicants
Definition 286 Let M be a set of monomials, then
R(M) := {m | (m∗x) ∈ M and (m∗(−x)) ∈ M} is called the set of resolvents of M
R^(M) := {m ∈ M | m has a partner in M} (n∗x_i and n∗(−x_i) are partners)
Definition 287 (Algorithm) Given f: Bⁿ → B,
let M_0 := DNF(f) and for all j > 0 compute (DNF as a set of monomials)
M_j := R(M_{j−1}) (resolve to get sub-monomials)
P_j := M_{j−1} \ R^(M_{j−1}) (get rid of redundant resolution partners)
terminate when M_j = ∅, return P_prime := ∪_{j=1}^{n} P_j
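The resolution step at the heart of QMC_1 is easy to implement; the following SML sketch (ours) represents a monomial as a list of literals (variable index, polarity), kept sorted by variable index:

type monomial = (int * bool) list;
(* resolve two partners: same variables, exactly one differing polarity;
   the result drops that variable *)
fun resolve (m : monomial, m' : monomial) : monomial option =
    if List.map #1 m <> List.map #1 m' then NONE
    else case List.filter (fn (l, l') => l <> l') (ListPair.zip (m, m')) of
             [((x, _), _)] => SOME (List.filter (fn (y, _) => y <> x) m)
           | _ => NONE;
(* one QMC round: all resolvents of a set of monomials (may contain duplicates) *)
fun round ms =
    List.mapPartial resolve
      (List.concat (List.map (fn m => List.map (fn m' => (m, m')) ms) ms));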
We will look at a simple example to fortify our intuition.
Example for QMC_1
x1 x2 x3   f   monomials
F  F  F    T   x1^0 x2^0 x3^0
F  F  T    T   x1^0 x2^0 x3^1
F  T  F    F
F  T  T    F
T  F  F    T   x1^1 x2^0 x3^0
T  F  T    T   x1^1 x2^0 x3^1
T  T  F    F
T  T  T    T   x1^1 x2^1 x3^1
M_0 = {x1^0 x2^0 x3^0 (=: e^0_1), x1^0 x2^0 x3^1 (=: e^0_2), x1^1 x2^0 x3^0 (=: e^0_3), x1^1 x2^0 x3^1 (=: e^0_4), x1^1 x2^1 x3^1 (=: e^0_5)}
M_1 = {x1^0 x2^0 (= R(e^0_1,e^0_2) =: e^1_1), x2^0 x3^0 (= R(e^0_1,e^0_3) =: e^1_2), x2^0 x3^1 (= R(e^0_2,e^0_4) =: e^1_3), x1^1 x2^0 (= R(e^0_3,e^0_4) =: e^1_4), x1^1 x3^1 (= R(e^0_4,e^0_5) =: e^1_5)}
P_1 = ∅
M_2 = {x2^0 (= R(e^1_1,e^1_4)), x2^0 (= R(e^1_2,e^1_3))}
P_2 = {x1^1 x3^1}
M_3 = ∅
P_3 = {x2^0}
P_prime = ∪_{j=1}^{3} P_j = {x1 x3, −x2}
But: even though the minimal polynomial only consists of prime implicants, it need not contain all of them
We now verify that the algorithm really computes what we want: all prime implicants of the Boolean function we have given it. This involves a somewhat technical proof of the assertion below. But we are mainly interested in the direct consequences here.
Properties of QMC_1
Lemma 288 (proof by simple (mutual) induction)
1. all monomials in M_j have exactly n − j literals.
2. M_j contains the implicants of f with n − j literals.
3. P_j contains the prime implicants of f with n − j + 1 literals (for j > 0).
Corollary 289 QMC_1 terminates after at most n rounds.
Corollary 290 P_prime is the set of all prime implicants of f.
Note that we are not finished with our task yet. We have computed all prime implicants of a given Boolean function, but some of them might be unnecessary in the minimal polynomial. So we have to determine which ones are. We will first look at the simple brute-force method of finding the minimal polynomial: we just build all combinations and test whether they induce the right Boolean function. Such algorithms are usually called generate-and-test algorithms.
They are usually the simplest, but not the best algorithms for a given computational problem. This is also the case here, so we will present a better algorithm below.
Algorithm QMC_2: Minimize Prime Implicants Polynomial
Definition 291 (Algorithm) Generate and test!
enumerate S_p ⊆ P_prime, i.e., all possible combinations of prime implicants of f,
form a polynomial e_p as the sum over S_p and test whether f_{e_p} = f and the cost of e_p is minimal
Example 292 P_prime = {x1 x3, −x2}, so e_p ∈ {1, x1 x3, −x2, x1 x3 + −x2}.
Only f_{x1 x3 + −x2} = f, so x1 x3 + −x2 is the minimal polynomial
Complaint: The set of combinations (power set) grows exponentially
For the optimization we need the notion of an essential prime implicant: a prime implicant p of f is called essential, iff it covers some c ∈ Bⁿ with f(c) = T that no other prime implicant of f covers; essential prime implicants must occur in every minimal polynomial. Conversely, since any minimal polynomial has the form Σ_{j=1}^{n} p_j for some n ∈ ℕ and prime implicants p_j, every c ∈ Bⁿ with f(c) = T is covered by some p_j, so a prime implicant whose vectors are all covered by the other prime implicants cannot be essential and may be dropped.
Let us now apply the optimized algorithm to a slightly bigger example.
A complex Example for QMC (Function and DNF)
x1 x2 x3 x4   f   monomials
F  F  F  F    T   x1^0 x2^0 x3^0 x4^0
F  F  F  T    T   x1^0 x2^0 x3^0 x4^1
F  F  T  F    T   x1^0 x2^0 x3^1 x4^0
F  F  T  T    F
F  T  F  F    F
F  T  F  T    T   x1^0 x2^1 x3^0 x4^1
F  T  T  F    F
F  T  T  T    F
T  F  F  F    F
T  F  F  T    F
T  F  T  F    T   x1^1 x2^0 x3^1 x4^0
T  F  T  T    T   x1^1 x2^0 x3^1 x4^1
T  T  F  F    F
T  T  F  T    F
T  T  T  F    T   x1^1 x2^1 x3^1 x4^0
T  T  T  T    T   x1^1 x2^1 x3^1 x4^1
A complex Example for QMC (QMC_1)
M_0 = {x1^0 x2^0 x3^0 x4^0, x1^0 x2^0 x3^0 x4^1, x1^0 x2^0 x3^1 x4^0,
       x1^0 x2^1 x3^0 x4^1, x1^1 x2^0 x3^1 x4^0, x1^1 x2^0 x3^1 x4^1,
       x1^1 x2^1 x3^1 x4^0, x1^1 x2^1 x3^1 x4^1}
M_1 = {x1^0 x2^0 x3^0, x1^0 x2^0 x4^0, x1^0 x3^0 x4^1, x1^1 x2^0 x3^1,
       x1^1 x2^1 x3^1, x1^1 x3^1 x4^1, x2^0 x3^1 x4^0, x1^1 x3^1 x4^0}
P_1 = ∅
M_2 = {x1^1 x3^1}
P_2 = {x1^0 x2^0 x3^0, x1^0 x2^0 x4^0, x1^0 x3^0 x4^1, x2^0 x3^1 x4^0}
M_3 = ∅
P_3 = {x1^1 x3^1}
P_prime = {−x1 −x2 −x3, −x1 −x2 −x4, −x1 −x3 x4, −x2 x3 −x4, x1 x3}
A better Mouse-trap for QMC_1: optimizing the data structure
Idea: Do the calculations directly on the DNF table
x1 x2 x3 x4   monomials
F  F  F  F    x1^0 x2^0 x3^0 x4^0
F  F  F  T    x1^0 x2^0 x3^0 x4^1
F  F  T  F    x1^0 x2^0 x3^1 x4^0
F  T  F  T    x1^0 x2^1 x3^0 x4^1
T  F  T  F    x1^1 x2^0 x3^1 x4^0
T  F  T  T    x1^1 x2^0 x3^1 x4^1
T  T  T  F    x1^1 x2^1 x3^1 x4^0
T  T  T  T    x1^1 x2^1 x3^1 x4^1
Note: the monomials on the right hand side are only for illustration
Idea: do the resolution directly on the left hand side
Find rows that differ only by a single entry. (first two rows)
resolve: replace them by one row, where that entry has an X (canceled literal)
Example 297 ⟨F, F, F, F⟩ and ⟨F, F, F, T⟩ resolve to ⟨F, F, F, X⟩.
A better Mouse-trap for QMC_1: optimizing the data structure
One step of resolution on the table:
x1 x2 x3 x4   monomials
F  F  F  F    x1^0 x2^0 x3^0 x4^0
F  F  F  T    x1^0 x2^0 x3^0 x4^1
F  F  T  F    x1^0 x2^0 x3^1 x4^0
F  T  F  T    x1^0 x2^1 x3^0 x4^1
T  F  T  F    x1^1 x2^0 x3^1 x4^0
T  F  T  T    x1^1 x2^0 x3^1 x4^1
T  T  T  F    x1^1 x2^1 x3^1 x4^0
T  T  T  T    x1^1 x2^1 x3^1 x4^1
becomes
x1 x2 x3 x4   monomials
F  F  F  X    x1^0 x2^0 x3^0
F  F  X  F    x1^0 x2^0 x4^0
F  X  F  T    x1^0 x3^0 x4^1
T  F  T  X    x1^1 x2^0 x3^1
T  T  T  X    x1^1 x2^1 x3^1
T  X  T  T    x1^1 x3^1 x4^1
X  F  T  F    x2^0 x3^1 x4^0
T  X  T  F    x1^1 x3^1 x4^0
Repeat the process until no more progress can be made:
x1 x2 x3 x4   monomials
F  F  F  X    x1^0 x2^0 x3^0
F  F  X  F    x1^0 x2^0 x4^0
F  X  F  T    x1^0 x3^0 x4^1
T  X  T  X    x1^1 x3^1
X  F  T  F    x2^0 x3^1 x4^0
This table represents the prime implicants of f
A complex Example for QMC (QMC_2)
The prime implicant table (PIT):
              FFFF FFFT FFTF FTFT TFTF TFTT TTTF TTTT
−x1 −x2 −x3   T    T    F    F    F    F    F    F
−x1 −x2 −x4   T    F    T    F    F    F    F    F
−x1 −x3 x4    F    T    F    T    F    F    F    F
−x2 x3 −x4    F    F    T    F    T    F    F    F
x1 x3         F    F    F    F    T    T    T    T
−x1 −x2 −x3 is not essential, so we are left with
              FFFF FFFT FFTF FTFT TFTF TFTT TTTF TTTT
−x1 −x2 −x4   T    F    T    F    F    F    F    F
−x1 −x3 x4    F    T    F    T    F    F    F    F
−x2 x3 −x4    F    F    T    F    T    F    F    F
x1 x3         F    F    F    F    T    T    T    T
here −x2 x3 −x4 is not essential, so we are left with
              FFFF FFFT FFTF FTFT TFTF TFTT TTTF TTTT
−x1 −x2 −x4   T    F    T    F    F    F    F    F
−x1 −x3 x4    F    T    F    T    F    F    F    F
x1 x3         F    F    F    F    T    T    T    T
all the remaining ones (−x1 −x2 −x4, −x1 −x3 x4, and x1 x3) are essential
So, the minimal polynomial of f is −x1 −x2 −x4 + −x1 −x3 x4 + x1 x3.
(The following section about KV-maps was only taught until fall 2008; it is included here just for reference.)
6.5 A simpler Method for finding Minimal Polynomials
Simple Minimization: Karnaugh-Veitch Diagram
The QMC algorithm is simple but tedious (not for the back of an envelope)
KV-maps provide an efficient alternative for up to 6 variables
Definition 298 A Karnaugh-Veitch map (KV-map) is a rectangular table filled with truth values induced by a Boolean function. Minimal polynomials can be read off KV-maps by systematically grouping equivalent table cells into rectangular areas of size 2^k.
Example 299 (Common KV-map schemata)
2 vars (square, 2/4-groups):
       A   −A
 B
−B
3 vars (ring, 2/4/8-groups):
      −A−B  −AB  AB  A−B
 C
−C
4 vars (torus, 2/4/8/16-groups):
       −A−B  −AB  AB   A−B
−C−D   m0    m4   m12  m8
−C D   m1    m5   m13  m9
 C D   m3    m7   m15  m11
 C−D   m2    m6   m14  m10
Note: the values are ordered so that exactly one variable flips sign between adjacent cells (Gray Code)
KV-maps Example: E(6, 8, 9, 10, 11, 12, 13, 14)
Example 300
#   A B C D   V
0   F F F F   F
1   F F F T   F
2   F F T F   F
3   F F T T   F
4   F T F F   F
5   F T F T   F
6   F T T F   T
7   F T T T   F
8   T F F F   T
9   T F F T   T
10  T F T F   T
11  T F T T   T
12  T T F F   T
13  T T F T   T
14  T T T F   T
15  T T T T   F
The corresponding KV-map:
       −A−B  −AB  AB  A−B
−C−D   F     F    T   T
−C D   F     F    T   T
 C D   F     F    F   T
 C−D   F     T    T   T
In the red/brown group (the 2×2 block in the columns AB and A−B, rows −C−D and −C D):
A does not change, so include A
B changes, so do not include it
C does not change, so include −C
D changes, so do not include it
So the monomial is A −C
in the green/brown group (the full column A−B) we have A −B
in the blue group (the two C−D cells with B = T) we have B C −D
The minimal polynomial for E(6, 8, 9, 10, 11, 12, 13, 14) is A −B + A −C + B C −D
KV-maps Caveats
groups are always rectangular of size 2^k (no crooked shapes!)
a group of size 2^k induces a monomial of size n − k (the bigger the group, the better)
groups can straddle vertical borders for three variables
groups can straddle horizontal and vertical borders for four variables
picture the n-variable case as an n-dimensional hypercube!
7 Propositional Logic
7.1 Boolean Expressions and Propositional Logic
We will now look at Boolean expressions from a different angle. We use them to give us a very simple model of a representation language for
knowledge (in our context mathematics, since it is so simple), and
argumentation (i.e. the process of deriving new knowledge from older knowledge)
Still another Notation for Boolean Expressions
Idea: get closer to MathTalk
Use ∨, ∧, ¬, ⇒, and ⇔ directly (after all, we do in MathTalk)
construct more complex names (propositions) for variables (use ground terms of sort B in an ADT)
Definition 301 Let Σ = ⟨S, D⟩ be an abstract data type, such that B ∈ S and [¬: B → B], [∨: B × B → B] ∈ D, then we call the set T^g_B(Σ) of ground Σ-terms of sort B a formulation of Propositional Logic.
We will also call this formulation Predicate Logic without Quantifiers and denote it with PLNQ.
Definition 302 Call terms in T^g_B(Σ) without ∨, ∧, ¬, ⇒, and ⇔ atoms. (write A(Σ))
Note: Formulae of propositional logic are Boolean Expressions:
replace A ⇔ B by (A ⇒ B) ∧ (B ⇒ A) and A ⇒ B by ¬A ∨ B, ...
build a print routine that renders ∨ and ¬ as the Boolean + and −, and that turns atoms into variable names. (variables and atoms are countable)
Conventions for Brackets in Propositional Logic
we leave out outer brackets: A ⇒ B abbreviates (A ⇒ B).
implications are right associative: A_1 ⇒ ... ⇒ A_n ⇒ C abbreviates A_1 ⇒ (... ⇒ (A_n ⇒ C)...)
a dot "." stands for a left bracket whose partner is as far to the right as is consistent with existing brackets (A ⇒ .C ∧ D = A ⇒ (C ∧ D))
We will now use the distribution of values of a Boolean expression under all (variable) assignments to characterize them semantically. The intuition here is that we want to understand theorems, examples, counterexamples, and inconsistencies in mathematics and everyday reasoning. (Here (and elsewhere) we will use mathematics (and the language of mathematics) as a test tube for understanding reasoning, since mathematics has a long history of studying its own reasoning processes and assumptions.)
The idea is to use the formal language of Boolean expressions as a model for mathematical language. Of course, we cannot express all of mathematics as Boolean expressions, but we can at least study the interplay of mathematical statements (which can be true or false) with the copula "and", "or" and "not".
Semantic Properties of Boolean Expressions
Definition 303 Let M := ⟨U, I⟩ be our model, then we call e
true under φ in M, iff I_φ(e) = T (write M ⊨_φ e)
false under φ in M, iff I_φ(e) = F
satisfiable in M, iff I_φ(e) = T for some assignment φ
valid in M, iff M ⊨_φ e for all assignments φ (write M ⊨ e)
falsifiable in M, iff I_φ(e) = F for some assignment φ
unsatisfiable in M, iff I_φ(e) = F for all assignments φ
Notation: We write [[e]]^M_φ for I_φ(e), if M = ⟨U, I⟩ (and [[e]]_φ, if e is ground, and [[e]], if M is clear)
Definition 307 (Entailment) (aka. logical consequence)
We say that e entails f (e ⊨ f), iff I_φ(f) = T for all φ with I_φ(e) = T
(i.e. all assignments that make e true also make f true)
Let us now see how these semantic properties model mathematical practice.
In mathematics we are interested in assertions that are true in all circumstances. In our model of mathematics, we use variable assignments to stand for circumstances. So we are interested in Boolean expressions which are true under all variable assignments; we call them valid. We often give examples (or show situations) which make a conjectured assertion false; we call such examples counterexamples, and such assertions falsifiable. We also often give examples for certain assertions to show that they can indeed be made true (which is not the same as being valid yet); such assertions we call satisfiable. Finally, if an assertion cannot be made true in any circumstances we call it unsatisfiable; such assertions naturally arise in mathematical practice in the form of refutation proofs, where we show that an assertion (usually the negation of the theorem we want to prove) leads to an obviously unsatisfiable conclusion, showing that the negation of the theorem is unsatisfiable, and thus the theorem valid.
Example: Propositional Logic with ADT variables
Idea: We use propositional logic to express things about the world
(PLNQ = Predicate Logic without Quantifiers)
Abstract Data Type: ⟨{B, I}, {..., [love: I × I → B], [bill: I], [mary: I], ...}⟩
ground terms:
g_1 := love(bill, mary) (how nice)
g_2 := love(mary, bill) ∧ ¬love(bill, mary) (how sad)
g_3 := love(bill, mary) ∧ love(mary, john) ⇒ hate(bill, john) (how natural)
Semantics: by mapping into known stuff (e.g. I to persons, B to {T, F})
Idea: Import semantics from Boolean Algebra (atoms are variables)
only need a variable assignment φ: A(Σ) → {T, F}
Example 308 I_φ(A ⇒ B) = T, iff I_φ(A) = F or I_φ(B) = T.
Definition 309 (Hilbert calculus H⁰) The axiom schemata
K := P ⇒ Q ⇒ P and S := (P ⇒ Q ⇒ R) ⇒ (P ⇒ Q) ⇒ P ⇒ R
together with the inference rules

A ⇒ B    A                A
------------ MP      ----------- Subst
     B                 [B/X](A)

Let us look at an H⁰ theorem (with a proof): C ⇒ C
Proof:
P.1 (C ⇒ (C ⇒ C) ⇒ C) ⇒ (C ⇒ C ⇒ C) ⇒ C ⇒ C (S with [C/P], [C ⇒ C/Q], [C/R])
P.2 C ⇒ (C ⇒ C) ⇒ C (K with [C/P], [C ⇒ C/Q])
P.3 (C ⇒ C ⇒ C) ⇒ C ⇒ C (MP on P.1 and P.2)
P.4 C ⇒ C ⇒ C (K with [C/P], [C/Q])
P.5 C ⇒ C (MP on P.3 and P.4)
P.6 We have shown that ∅ ⊢_{H⁰} C ⇒ C (i.e. C ⇒ C is a theorem) (is it also valid?)
This is indeed a very simple logic, but one with all of the parts that are necessary:
A formal language: expressions built up from variables and implications.
A semantics: given by the obvious interpretation function.
A calculus: given by the two axioms and the two inference rules.
The calculus gives us a set of rules with which we can derive new formulae from old ones. The axioms are very simple rules; they allow us to derive these two formulae in any situation. The inference rules are slightly more complicated: we read the formulae above the horizontal line as assumptions and the (single) formula below as the conclusion. An inference rule allows us to derive the conclusion, if we have already derived the assumptions.
Now, we can use these inference rules to perform a proof. A proof is a sequence of formulae that can be derived from each other. The representation of the proof in the slide is slightly compactified to fit onto the slide; we will make it more explicit here. We first start out by deriving the formula
(P ⇒ Q ⇒ R) ⇒ (P ⇒ Q) ⇒ P ⇒ R   (1)
which we can always do, since we have an axiom for this formula. Then we apply the rule Subst, where A is this result, B is C, and X is the variable P, to obtain
(C ⇒ Q ⇒ R) ⇒ (C ⇒ Q) ⇒ C ⇒ R   (2)
Next we apply the rule Subst to this, where B is C ⇒ C and X is the variable Q this time, to obtain
(C ⇒ (C ⇒ C) ⇒ R) ⇒ (C ⇒ C ⇒ C) ⇒ C ⇒ R   (3)
And again, we apply the rule Subst; this time B is C and X is the variable R, yielding the first formula in our proof on the slide. To conserve space, we have combined these three steps into one in the slide. The next steps are done in exactly the same way.
The name MP comes from the Latin name "modus ponens" (the mode of putting [new facts]); this is one of the classical syllogisms discovered by the ancient Greeks. The name Subst is just short for substitution, since the rule allows us to instantiate variables in formulae with arbitrary other formulae.
We will now generalize what we have seen in the example so that we can talk about calculi and proofs in other situations and see what was specific to the example.
Derivations and Proofs
Definition 310 A derivation of a formula C from a set H of hypotheses (write H ⊢ C) is a sequence A_1, ..., A_m of formulae, such that
A_m = C (derivation culminates in C), and
for all 1 ≤ i ≤ m, either A_i ∈ H (hypothesis), or there is an inference rule A_{l_1}, ..., A_{l_k} / A_i, where l_j < i for all j ≤ k.
Example 311 In the propositional calculus of natural deduction we have {A} ⊢ B ⇒ A: the sequence is A ⇒ B ⇒ A, A, B ⇒ A; as a tree:

 ----------- Ax
 A ⇒ B ⇒ A        A
 -------------------- ⇒E
        B ⇒ A

Definition 312 A derivation ∅ ⊢_C A is called a proof of A, and if one exists (write ⊢_C A) then A is called a C-theorem.
Definition 313 An inference rule I is called admissible in a calculus C, if the extension of C by I does not yield new theorems.
With formula schemata we mean representations of sets of formulae. In our example above, we used uppercase boldface letters as (meta-)variables for formulae. For instance, the modus ponens inference rule stands for the set of all of its instances.
As an axiom does not have assumptions, it can be added to a proof at any time. This is just what we did with the axioms in our example proof.
In general, formulae can be used to represent facts about the world as propositions; they have a semantics that is a mapping of formulae into the real world (propositions are mapped to truth values.) We have seen two relations on formulae: the entailment relation and the deduction relation. The first one is defined purely in terms of the semantics, the second one is given by a calculus, i.e. purely syntactically.
The main question we must ask ourselves: is there any relation between these relations?
Ideally, both relations would be the same; then the calculus would allow us to infer all facts that can be represented in the given formal language and that are true in the real world, and only those. In other words, our representation and inference is faithful to the world.
A consequence of this is that we can rely on purely syntactical means to make predictions about the world. Computers rely on formal representations of the world; if we want to solve a problem on our computer, we first represent it in the computer (as data structures, which can be seen as a formal language) and do syntactic manipulations on these structures (a form of calculus). Now, if the provability relation induced by the calculus and the validity relation coincide (this will be quite difficult to establish in general), then the solutions of the program will be correct, and we will find all possible ones.
Of course, the logics we have studied so far are very simple, and not able to express interesting facts about the world, but we will study them as a simple example of the fundamental problem of Computer Science: How do the formal representations correlate with the real world?
Properties of Calculi (Theoretical Logic)
Correctness: (provable implies valid)
H ⊢ B implies H ⊨ B (equivalent: ⊢ A implies ⊨ A)
Completeness: (valid implies provable)
H ⊨ B implies H ⊢ B (equivalent: ⊨ A implies ⊢ A)
Goal: ⊢ A iff ⊨ A (provability and validity coincide)
To TRUTH through PROOF (CALCULEMUS [Leibniz 1680])
Within the world of logics, one can derive new propositions (the conclusions, here: "Socrates is mortal") from given ones (the premises, here: "Every human is mortal" and "Socrates is human"). Such derivations are proofs.
Logics can describe the internal structure of real-life facts; e.g. individual things, actions, properties. A famous example, which is in fact as old as it appears, is illustrated in the slide below.
If a logic is correct, the conclusions one can prove are true (= hold in the real world) whenever the premises are true. This is a miraculous fact (think about it!)
The miracle of logics
Purely formal derivations are true in the real world!
7.3 Proof Theory for the Hilbert Calculus
We now show one of the meta-properties (correctness) for the Hilbert calculus H⁰. The statement of the result is rather simple: it just says that the set of provable formulae is a subset of the set of valid formulae. In other words: If a formula is provable, then it must be valid (a rather comforting property for a calculus).
H⁰ is correct (first version)
Theorem 314 ⊢ A implies ⊨ A for all propositions A.
Proof: show by induction over proof length
P.1 Axioms are valid (we already know how to do this!)
P.2 inference rules preserve validity (let's think)
P.2.1 Subst: complicated, see next slide
P.2.2 MP:
P.2.2.1 Let A ⇒ B and A be valid, and φ: V_0 → {T, F} arbitrary
P.2.2.2 then I_φ(A) = F or I_φ(B) = T, since A ⇒ B is valid
P.2.2.3 but I_φ(A) = T ≠ F, since A is valid, so I_φ(B) = T
P.2.2.4 As φ was arbitrary, B is valid.
To complete the proof, we have to prove two more things. The first one is that the axioms are
valid. Fortunately, we know how to do this: we just have to show that under all assignments, the
axioms are satisfied. The simplest way to do this is just to use truth tables.
H⁰ axioms are valid

Lemma 315 The H⁰ axioms are valid.

Proof: We simply check the truth tables

P.1
  P Q | Q ⇒ P | P ⇒ Q ⇒ P
  F F |   T   |     T
  F T |   F   |     T
  T F |   T   |     T
  T T |   T   |     T

P.2
  P Q R | A := P ⇒ Q ⇒ R | B := P ⇒ Q | C := P ⇒ R | A ⇒ B ⇒ C
  F F F |       T        |     T      |     T      |     T
  F F T |       T        |     T      |     T      |     T
  F T F |       T        |     T      |     T      |     T
  F T T |       T        |     T      |     T      |     T
  T F F |       T        |     F      |     F      |     T
  T F T |       T        |     F      |     T      |     T
  T T F |       F        |     T      |     F      |     T
  T T T |       T        |     T      |     T      |     T

c : Michael Kohlhase 185
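To see what such a truth table check amounts to computationally, here is a small Standard ML
sketch (our own illustration, not part of the course materials; all names are invented): it
enumerates all assignments for the variables of a formula and checks the two axioms from
Lemma 315.

  (* propositional formulae with implication only *)
  datatype form = Var of string | Imp of form * form

  (* evaluate a formula under an assignment, given as (name, value) pairs *)
  fun eval phi (Var x)      = (case List.find (fn (y, _) => y = x) phi of
                                  SOME (_, v) => v
                                | NONE => raise Fail ("unbound variable " ^ x))
    | eval phi (Imp (a, b)) = not (eval phi a) orelse eval phi b

  (* all assignments for a list of variable names *)
  fun assignments []        = [[]]
    | assignments (x :: xs) =
        List.concat (map (fn phi => [(x, true) :: phi, (x, false) :: phi])
                         (assignments xs))

  (* validity = truth under all assignments *)
  fun valid vars a = List.all (fn phi => eval phi a) (assignments vars)

  val ax1 = Imp (Var "P", Imp (Var "Q", Var "P"))
  val ax2 = Imp (Imp (Var "P", Imp (Var "Q", Var "R")),
                 Imp (Imp (Var "P", Var "Q"), Imp (Var "P", Var "R")))

  val ax1_valid = valid ["P", "Q"] ax1        (* evaluates to true *)
  val ax2_valid = valid ["P", "Q", "R"] ax2   (* evaluates to true *)

Running valid simply mechanizes the two truth tables above: four assignments for Axiom 1 and
eight for Axiom 2.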
The next result encapsulates the soundness result for the substitution rule, which we still owe. We
will prove the result by induction on the structure of the formula that is instantiated. To get the
induction to go through, we not only show that validity is preserved under instantiation, but we
make a concrete statement about the value itself.
A proof by induction on the structure of the formula is something we have not seen before. It
can be justified by a normal induction over natural numbers; we just take the property of a natural
number n to be that all formulae with n symbols have the property asserted by the theorem. The
only thing we need to realize is that proper subterms have strictly fewer symbols than the terms
themselves.
Substitution Value Lemma and Correctness

Lemma 316 Let A and B be formulae, then I_φ([B/X](A)) = I_ψ(A), where ψ = φ,[I_φ(B)/X]

Proof: by induction on the depth of A (number of nested symbols)

P.1 We have to consider two cases
P.1.1 depth = 0, then A is a variable, say Y:
P.1.1.1 We have two cases
P.1.1.1.1 X = Y: then I_φ([B/X](A)) = I_φ([B/X](X)) = I_φ(B) = ψ(X) = I_ψ(X) = I_ψ(A).
P.1.1.1.2 X ≠ Y: then I_φ([B/X](A)) = I_φ([B/X](Y)) = I_φ(Y) = φ(Y) = ψ(Y) = I_ψ(Y) = I_ψ(A).
P.1.2 depth > 0, then A = C ⇒ D:
P.1.2.1 We have I_φ([B/X](A)) = T, iff I_φ([B/X](C)) = F or I_φ([B/X](D)) = T.
P.1.2.2 This is the case, iff I_ψ(C) = F or I_ψ(D) = T, by inductive hypothesis.
P.1.2.3 In other words, I_ψ(A) = I_ψ(C ⇒ D) = T, iff I_φ([B/X](A)) = T by definition.
P.2 We have considered all the cases and proven the assertion.

c : Michael Kohlhase 186
Armed with the substitution value lemma, it is quite simple to establish the correctness of the
substitution rule. We state the assertion rather succinctly: "Subst preserves validity", which
means that if the assumption of the Subst rule was valid, then the conclusion is valid as well, i.e.
the validity property is preserved.
Correctness of Substitution

Lemma 317 Subst preserves validity.

Proof: We have to show that [B/X](A) is valid, if A is.

P.1 Let A be valid, B a formula, φ: V₀ → {T, F} a variable assignment, and ψ := φ,[I_φ(B)/X].
P.2 then I_φ([B/X](A)) = I_{φ,[I_φ(B)/X]}(A) = T, since A is valid.
P.3 As the argumentation did not depend on the choice of φ, [B/X](A) is valid and we
have proven the assertion.

c : Michael Kohlhase 187
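The substitution [B/X](A) itself is easily realized as a recursive function. The following Standard
ML fragment (again our own sketch, reusing form, eval, and assignments from the fragment
above) also checks one instance of the substitution value lemma; prepending the pair
(X, I_φ(B)) to the assignment list implements φ,[I_φ(B)/X], since lookup finds the first match.

  (* the substitution [B/X](A) *)
  fun subst (b, x) (Var y)      = if y = x then b else Var y
    | subst (b, x) (Imp (c, d)) = Imp (subst (b, x) c, subst (b, x) d)

  (* check I_phi([B/X](A)) = I_psi(A) with psi = phi,[I_phi(B)/X] *)
  val formA = Imp (Var "X", Var "Y")
  val formB = Imp (Var "Y", Var "Y")
  val lemma_instance =
      List.all (fn phi => eval phi (subst (formB, "X") formA) =
                          eval (("X", eval phi formB) :: phi) formA)
               (assignments ["X", "Y"])   (* evaluates to true *)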
The next theorem shows that the implication connective and the entailment relation are closely
related: we can move a hypothesis of the entailment relation into an implication assumption in the
conclusion of the entailment relation. Note that however close the relationship between implication
and entailment may be, the two should not be confused. The implication connective is a syntactic
formula constructor, whereas the entailment relation lives in the semantic realm. It is a relation
between formulae that is induced by the evaluation mapping.
The Entailment Theorem

Theorem 318 If H, A ⊨ B, then H ⊨ (A ⇒ B).

Proof: We show that I_φ(A ⇒ B) = T for all assignments φ with I_φ(H) = T, whenever
H, A ⊨ B

P.1 Let us assume there is an assignment φ with I_φ(H) = T, such that I_φ(A ⇒ B) = F.
P.2 Then I_φ(A) = T and I_φ(B) = F by definition.
P.3 But we also know that I_φ(B) = T, since H, A ⊨ B.
P.4 This contradicts our assumption I_φ(A ⇒ B) = F; in other words, H ⊨ (A ⇒ B).

c : Michael Kohlhase 188
Now, we complete the theorem by proving the converse direction, which is rather simple.
The Entailment Theorem (continued)

Corollary 319 H, A ⊨ B, iff H ⊨ (A ⇒ B)

Proof: In the light of the previous result, we only need to prove that H, A ⊨ B, whenever
H ⊨ (A ⇒ B)

P.1 To prove that H, A ⊨ B we assume that I_φ(H, A) = T.
P.2 In particular, I_φ(A ⇒ B) = T since H ⊨ (A ⇒ B).
P.3 Thus we have I_φ(A) = F or I_φ(B) = T.
P.4 The first cannot hold, so the second does, thus H, A ⊨ B.

c : Michael Kohlhase 189
The entailment theorem has a syntactic counterpart for some calculi. This result shows a close
connection between the derivability relation and the implication connective. Again, the two should
not be confused, even though this time, both are syntactic.
The main idea in the following proof is to generalize the inductive hypothesis from proving A ⇒ B
to proving A ⇒ C, where C is a step in the proof of B. The assertion is then a special case, since
B is the last step in the proof of B.
The Deduction Theorem

Theorem 320 If H, A ⊢ B, then H ⊢ A ⇒ B

Proof: By induction on the proof length

P.1 Let C_1, ..., C_m be a proof of B from the hypotheses H.
P.2 We generalize the induction hypothesis: for all i (1 ≤ i ≤ m) we construct proofs
H ⊢ A ⇒ C_i. (get A ⇒ B for i = m)
P.3 We have to consider three cases
P.3.1 Case 1: C_i is an axiom or C_i ∈ H:
P.3.1.1 Then H ⊢ C_i by construction and H ⊢ C_i ⇒ A ⇒ C_i by Subst from Axiom 1.
P.3.1.2 So H ⊢ A ⇒ C_i by MP.
P.3.2 Case 2: C_i = A:
P.3.2.1 We have already proven ⊢ A ⇒ A, so in particular H ⊢ A ⇒ C_i.
(more hypotheses do not hurt)
P.3.3 Case 3: everything else:
P.3.3.1 C_i is inferred by MP from C_j and C_k = C_j ⇒ C_i for j, k < i
P.3.3.2 We have H ⊢ A ⇒ C_j and H ⊢ A ⇒ C_j ⇒ C_i by IH
P.3.3.3 Furthermore, (A ⇒ C_j ⇒ C_i) ⇒ (A ⇒ C_j) ⇒ A ⇒ C_i by Axiom 2 and Subst
P.3.3.4 and thus H ⊢ A ⇒ C_i by MP (twice).
P.4 We have treated all cases, and thus proven H ⊢ A ⇒ C_i for all i (1 ≤ i ≤ m).
P.5 Note that C_m = B, so we have in particular proven H ⊢ A ⇒ B.

c : Michael Kohlhase 190
In fact (you have probably already spotted this), this proof is not correct. We did not cover all
cases: there are proofs that end in an application of the Subst rule. This is a common situation;
we think we have a very elegant and convincing proof, but upon a closer look it turns out that
there is a gap, which we still have to bridge.
This is what we attempt to do now. The first attempt to prove the Subst case below seems to
work at first, until we notice that the substitution [B/X] would have to be applied to A as well,
which ruins our assertion.
The missing Subst case

Oooops: The proof of the deduction theorem was incomplete
(we did not treat the Subst case)

Let's try:

Proof: C_i is inferred by Subst from C_j for j < i with [B/X].

P.1 So C_i = [B/X](C_j); we have H ⊢ A ⇒ C_j by IH
P.2 so by Subst we have H ⊢ [B/X](A ⇒ C_j). (Oooops! [B/X](A ⇒ C_j) ≠ A ⇒ C_i in general)

c : Michael Kohlhase 191
In this situation, we have to do something drastic, like coming up with a totally different proof.
Instead we just prove the theorem we have been after for a variant calculus.
Repairing the Subst case by repairing the calculus

Idea: Apply Subst only to axioms (this was sufficient in our example)

H¹ Axiom Schemata: (infinitely many axioms)

  A ⇒ B ⇒ A,   (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C

Only one inference rule: MP.

Definition 321 H¹ introduces a (potentially) different derivability relation than H⁰; we
call them ⊢_{H⁰} and ⊢_{H¹}.

c : Michael Kohlhase 192
Now that we have made all the mistakes, let us write the proof in its final form.
Deduction Theorem Redone

Theorem 322 If H, A ⊢_{H¹} B, then H ⊢_{H¹} A ⇒ B

Proof: Let C_1, ..., C_m be a proof of B from the hypotheses H.

P.1 We construct proofs H ⊢_{H¹} A ⇒ C_i for all i (1 ≤ i ≤ m) by induction on i.
P.2 We have to consider three cases
P.2.1 C_i is an axiom or hypothesis:
P.2.1.1 Then H ⊢_{H¹} C_i by construction and H ⊢_{H¹} C_i ⇒ A ⇒ C_i by Ax1.
P.2.1.2 So H ⊢_{H¹} A ⇒ C_i by MP
P.2.2 C_i = A:
P.2.2.1 We have proven ⊢_{H⁰} A ⇒ A (check that this proof also goes through in H¹),
so we have ⊢_{H¹} A ⇒ C_i, and in particular H ⊢_{H¹} A ⇒ C_i
P.2.3 else:
P.2.3.1 C_i is inferred by MP from C_j and C_k = C_j ⇒ C_i for j, k < i
P.2.3.2 We have H ⊢_{H¹} A ⇒ C_j and H ⊢_{H¹} A ⇒ C_j ⇒ C_i by IH
P.2.3.3 Furthermore, (A ⇒ C_j ⇒ C_i) ⇒ (A ⇒ C_j) ⇒ A ⇒ C_i by Axiom 2
P.2.3.4 and thus H ⊢_{H¹} A ⇒ C_i by MP (twice). (no Subst)

c : Michael Kohlhase 193
The deduction theorem and the entailment theorem together allow us to understand the claim that
the two formulations of correctness (A ⊢ B implies A ⊨ B, and ⊢ A implies ⊨ A) are equivalent.
Indeed, if we have A ⊢ B, then by the deduction theorem ⊢ A ⇒ B, and thus ⊨ A ⇒ B by
correctness, which gives us A ⊨ B by the entailment theorem. The other direction and the
argument for the corresponding statement about completeness are similar.
Of course this is still not the version of the proof we originally wanted, since it talks about the
Hilbert calculus H¹, but we can show that H¹ and H⁰ are equivalent.
Indeed, as we will see, the derivability relations induced by the two calculi are the same. So we can
prove the original theorem after all.
The Deduction Theorem for H⁰

Lemma 323 ⊢_{H¹} = ⊢_{H⁰}

Proof:

P.1 All H¹ axioms are H⁰ theorems. (by Subst)
P.2 For the other direction, we need a proof transformation argument:
P.3 We can replace an application of MP followed by Subst by two Subst applications
followed by one MP.
P.4 ... A ⇒ B ... A ... B ... [C/X](B) ... is replaced by
... A ⇒ B ... [C/X](A) ⇒ [C/X](B) ... A ... [C/X](A) ... [C/X](B) ...
P.5 Thus we can push later Subst applications to the axioms, transforming an H⁰ proof
into an H¹ proof.

Corollary 324 H, A ⊢_{H⁰} B, iff H ⊢_{H⁰} A ⇒ B.

Proof Sketch: by MP and ⊢_{H¹} = ⊢_{H⁰}

c : Michael Kohlhase 194
We can now collect all the pieces and give the full statement of the correctness theorem for H⁰.
H⁰ is correct (full version)

Theorem 325 For all propositions A, B, we have A ⊢_{H⁰} B implies A ⊨ B.

Proof:

P.1 By the deduction theorem A ⊢_{H⁰} B, iff ⊢_{H⁰} A ⇒ B,
P.2 by the first correctness theorem this is the case, iff ⊨ A ⇒ B,
P.3 by the entailment theorem this holds, iff A ⊨ B.

c : Michael Kohlhase 195
7.4 A Calculus for Mathtalk

In our introduction to Subsection 7.1 we have positioned Boolean expressions (and propositional
logic) as a system for understanding the mathematical language mathtalk introduced in Sub-
section 3.2. We have been using this language to state properties of objects and prove them all
through this course without making the rules that govern this activity fully explicit. We will rectify
this now: first we give a calculus that tries to mimic the informal rules mathematicians use
in their proofs, and second we show how to extend this calculus of natural deduction to the
full language of mathtalk.
We will now introduce the natural deduction calculus for propositional logic. The calculus was
created in order to model the natural mode of reasoning e.g. in everyday mathematical practice.
This calculus was intended as a counter-approach to the well-known Hilbert-style calculi, which
were mainly used as theoretical devices for studying reasoning in principle, not for modeling
particular reasoning styles.
Rather than using a minimal set of inference rules, the natural deduction calculus provides
two or three inference rules for every connective and quantifier: one introduction rule (an inference
rule that derives a formula with that symbol at the head) and one elimination rule (an inference
rule that acts on a formula with this head and derives a set of subformulae).
Calculi: Natural Deduction (ND₀) [Gentzen30]

Idea: ND₀ tries to mimic human theorem proving behavior (non-minimal)

Definition 326 The ND₀ calculus has rules for the introduction and elimination of
connectives

Introduction, Elimination, and Axiom:

   A   B          A ∧ B           A ∧ B
  ─────── ∧I     ─────── ∧E_l    ─────── ∧E_r      A ∨ ¬A  TND
   A ∧ B            A               B

  [A]¹
   ⋮
   B              A ⇒ B   A
  ─────── ⇒I¹    ──────────── ⇒E
  A ⇒ B               B

TND is used only in classical logic (otherwise constructive/intuitionistic)

c : Michael Kohlhase 196
The most characteristic rule in the natural deduction calculus is the ⇒I rule. It corresponds to
the mathematical way of proving an implication A ⇒ B: we assume that A is true and show B
from this assumption. When we can do this, we discharge (get rid of) the assumption and conclude
A ⇒ B. This mode of reasoning is called hypothetical reasoning.
Let us now consider an example of hypothetical reasoning in action.
Natural Deduction: Examples

Inference with local hypotheses

  [A ∧ B]¹           [A ∧ B]¹
  ──────── ∧E_r      ──────── ∧E_l
     B                  A
  ─────────────────────────── ∧I
            B ∧ A
  ─────────────────────────── ⇒I¹
       A ∧ B ⇒ B ∧ A

c : Michael Kohlhase 197
Another characteristic of the natural deduction calculus is that it has inference rules (introduction
and elimination rules) for all connectives. So we extend the set of rules from Definition 326 for
disjunction, negation and falsity.
More Rules for Natural Deduction

Definition 327 ND₀ has the following additional rules for the remaining connectives.

    A              B                      [A]   [B]
  ────── ∨I_l    ────── ∨I_r               ⋮     ⋮
  A ∨ B          A ∨ B           A ∨ B     C     C
                                ─────────────────── ∨E
                                         C

  [A]
   ⋮
   F             ¬¬A            A   ¬A           F
  ──── ¬I        ──── ¬E        ────── FI       ─── FE
  ¬A               A              F              A

c : Michael Kohlhase 198
The next step now is to extend the language of propositional logic to include the quantifiers ∀
and ∃. To do this, we will extend the language PLNQ with formulae of the form ∀x A and ∃x A,
where x is a variable and A is a formula. This system (which is a little more involved than we
make believe now) is called first-order logic.
Building on the calculus ND₀, we define a first-order calculus for mathtalk by providing intro-
duction and elimination rules for the quantifiers.

First-Order Natural Deduction

Rules for propositional connectives just as always

Definition 328 (New Quantifier Rules) The first-order natural deduction calculus
extends ND₀ by the following four rules

    A               ∀X.A              [B/X](A)               [[c/X](A)]
  ────── ∀I       ────────── ∀E      ────────── ∃I               ⋮
  ∀X.A            [B/X](A)             ∃X.A           ∃X.A       C
                                                     ──────────────── ∃E (c a new constant)
                                                            C

(A literal is an atom A or a negated atom ¬A, i.e. L := A | ¬A.)

c : Michael Kohlhase 204
The idea about literals is that they are atoms (the simplest formulae) that carry around their
intended truth value.
Now we will also review some propositional identities that will be useful later on. Some of
them we have already seen, and some are new. All of them can be proven by simple truth table
arguments.
Test Calculi: Tableaux and Model Generation

Idea: instead of showing ∅ ⊢ Th, show ¬Th ⊢ trouble (use ⊥ for trouble)

Example 332

Tableau Refutation (Validity): ⊨ P ∧ Q ⇒ Q ∧ P

  (P ∧ Q ⇒ Q ∧ P)^F
  (P ∧ Q)^T
  (Q ∧ P)^F
  P^T
  Q^T
  Q^F | P^F
  ⊥   | ⊥

  No Model

Model generation (Satisfiability): ⊨ P ∧ (Q ∨ ¬R) ∧ ¬Q

  (P ∧ (Q ∨ ¬R) ∧ ¬Q)^T
  (P ∧ (Q ∨ ¬R))^T
  (¬Q)^T
  Q^F
  P^T
  (Q ∨ ¬R)^T
  Q^T | (¬R)^T
  ⊥   | R^F

  Herbrand Model {P^T, Q^F, R^F}: φ := {P ↦ T, Q ↦ F, R ↦ F}

Algorithm: Fully expand all possible tableaux, (no rule can be applied)
Satisfiable, iff there are open branches (correspond to models)

c : Michael Kohlhase 205
Tableau calculi develop a formula in a tree-shaped arrangement that represents a case analysis on
when a formula can be made true (or false). Therefore the formulae are decorated with exponents
that hold the intended truth value.
On the left we have a refutation tableau that analyzes a negated formula (it is decorated with the
intended truth value F). Both branches contain an elementary contradiction ⊥.
On the right we have a model generation tableau, which analyzes a positive formula (it is
decorated with the intended truth value T). This tableau uses the same rules as the refutation
tableau, but makes a case analysis of when this formula can be satisfied. In this case we have a
closed branch and an open one, which corresponds to a model.
Now that we have seen the examples, we can write down the tableau rules formally.
Analytical Tableaux (Formal Treatment of T₀)

formula is analyzed in a tree to determine satisfiability
branches correspond to valuations (models)
one per connective

  (A ∧ B)^T          (A ∧ B)^F          (¬A)^T          (¬A)^F          A^α
  ───────── T₀∧      ───────── T₀∨      ────── T₀¬T     ────── T₀¬F     A^β   α ≠ β
     A^T             A^F | B^F            A^F             A^T           ──── T₀cut
     B^T                                                                 ⊥

Use rules exhaustively as long as they contribute new material

Definition 333 Call a tableau saturated, iff no rule applies, and a branch closed, iff it
ends in ⊥, else open. (open branches in saturated tableaux yield models)

Definition 334 (T₀-Theorem/Derivability) A is a T₀-theorem (⊢_{T₀} A), iff there
is a closed tableau with A^F at the root.
Φ ⊆ wff₀(V₀) derives A in T₀ (Φ ⊢_{T₀} A), iff there is a closed tableau starting with A^F
and the formulae of Φ labeled with T.

c : Michael Kohlhase 206
These inference rules act on tableaux and have to be read as follows: if the formulae over the line
appear in a tableau branch, then the branch can be extended by the formulae or branches below
the line. There are two rules for each primary connective, and a branch closing rule that adds the
special symbol ⊥ (for unsatisfiability) to a branch.
We use the tableau rules with the convention that they are only applied, if they contribute new
material to the branch. This ensures termination of the tableau procedure for propositional logic
(every rule eliminates one primary connective).
Definition 335 We will call a closed tableau with the signed formula A^α at the root a tableau
refutation for A^α.
The saturated tableau represents a full case analysis of what is necessary to give A the truth value
α; since all branches are closed (contain contradictions) this is impossible.
Definition 336 We will call a tableau refutation for A^F a tableau proof for A, since it refutes the
possibility of finding a model where A evaluates to F. Thus A must evaluate to T in all models,
which is just our definition of validity.
Thus the tableau procedure can be used as a calculus for propositional logic. In contrast to the
calculus in section ?? it does not prove a theorem A by deriving it from a set of axioms, but
it proves it by refuting its negation. Such calculi are called negative or test calculi. Generally
negative calculi have computational advantages over positive ones, since they have a built-in sense
of direction.
We have rules for all the necessary connectives (we restrict ourselves to ∧ and ¬, since the others
can be expressed in terms of these two via the propositional identities above. For instance, we can
write A ∨ B as ¬(¬A ∧ ¬B), and A ⇒ B as ¬A ∨ B, ...).
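For this ∧/¬ fragment, the tableau procedure is short enough to write down as a program. The
following Standard ML sketch (our own illustration with invented names, not part of the course
materials) expands a list of signed formulae; a branch is represented by the signed formulae
still to be expanded together with the literals collected so far, and the function returns true
iff every branch closes.

  datatype pform = PVar of string | PNeg of pform | PAnd of pform * pform

  (* closed lits todo = true iff all branches through this node contain a contradiction *)
  fun closed lits [] =        (* saturated: look for some A^T and A^F among the literals *)
        List.exists (fn (x, v) =>
          List.exists (fn (y, w) => x = y andalso v <> w) lits) lits
    | closed lits ((PVar x, a) :: rest)          = closed ((x, a) :: lits) rest
    | closed lits ((PNeg a, b) :: rest)          = closed lits ((a, not b) :: rest)
    | closed lits ((PAnd (a, b), true) :: rest)  =       (* T0∧: both on the same branch *)
        closed lits ((a, true) :: (b, true) :: rest)
    | closed lits ((PAnd (a, b), false) :: rest) =       (* T0∨: split the branch *)
        closed lits ((a, false) :: rest) andalso closed lits ((b, false) :: rest)

  (* a tableau proof of A: the tableau for A^F must close *)
  fun proves a = closed [] [(a, false)]

  val p = PVar "P"
  val no_contradiction = proves (PNeg (PAnd (p, PNeg p)))   (* evaluates to true *)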
We will now look at an example. Following our introduction of propositional logic in Exam-
ple 308 we look at a formulation of propositional logic with fancy variable names. Note that
love(mary, bill) is just a variable name like P or X, which we have used earlier.
A Valid Real-World Example

Example 337 "Mary loves Bill and John loves Mary" entails "John loves Mary":

  (love(mary, bill) ∧ love(john, mary) ⇒ love(john, mary))^F
  (love(mary, bill) ∧ love(john, mary))^T
  love(john, mary)^F
  love(mary, bill)^T
  love(john, mary)^T
  ⊥

This tableau is closed, so we have a tableau proof; the entailment theorem (Corollary 319)
then yields the assertion.

Example 338 Is it also true that "Mary loves Bill or John loves Mary" entails "John
loves Mary"?

  (love(mary, bill) ∨ love(john, mary) ⇒ love(john, mary))^F
  (love(mary, bill) ∨ love(john, mary))^T
  love(john, mary)^F
  love(mary, bill)^T | love(john, mary)^T
                     | ⊥

Here the left branch is open; on it we can make I_φ(love(john, mary)) = F.

c : Michael Kohlhase 208
Obviously, the second tableau is saturated, but not closed, so it is not a tableau proof for the
entailment conjecture. We have marked the literals on the open branch green, since they allow us
to read off the conditions of the situation, in which the entailment fails to hold. As we intuitively
argued above, this is the situation, where Mary loves Bill. In particular, the open branch gives
us a variable assignment (marked in green) that satisfies the initial formula. In this case, Mary
loves Bill, which is a situation, where the entailment fails.

9.1.2 Practical Enhancements for Tableaux
Propositional Identities

Definition 339 Let ⊤ and ⊥ be new logical constants with I(⊤) = T and I(⊥) = F
for all assignments φ.

We have the following identities:

  Name              for ∧                        for ∨
  Idempotence       A ∧ A = A                    A ∨ A = A
  Identity          A ∧ ⊤ = A                    A ∨ ⊥ = A
  Absorption I      A ∧ ⊥ = ⊥                    A ∨ ⊤ = ⊤
  Commutativity     A ∧ B = B ∧ A                A ∨ B = B ∨ A
  Associativity     A ∧ (B ∧ C) = (A ∧ B) ∧ C    A ∨ (B ∨ C) = (A ∨ B) ∨ C
  Distributivity    A ∧ (B ∨ C) = A ∧ B ∨ A ∧ C  A ∨ B ∧ C = (A ∨ B) ∧ (A ∨ C)
  Absorption II     A ∧ (A ∨ C) = A              A ∨ A ∧ C = A
  De Morgan's Laws  ¬(A ∧ B) = ¬A ∨ ¬B           ¬(A ∨ B) = ¬A ∧ ¬B
  Double negation   ¬¬A = A
  Definitions       A ⇒ B = ¬A ∨ B               A ⇔ B = (A ⇒ B) ∧ (B ⇒ A)

c : Michael Kohlhase 209
We have seen in the examples above that while it is possible to get by with only the connectives
∧ and ¬, it is a bit unnatural and tedious, since we need to eliminate the other connectives first.
In this section, we will make the calculus less frugal by adding rules for the other connectives,
without losing the advantage of dealing with a small calculus, which is good for making statements
about the calculus.
The main idea is to add the new rules as derived rules, i.e. inference rules that only abbreviate
deductions in the original calculus. Generally, adding derived inference rules does not change the
derivability relation of the calculus, and is therefore a safe thing to do. In particular, we will add
the following rules to our tableau system.
We will convince ourselves that the first rule is a derived rule, and leave the other ones as an
exercise.
Derived Rules of Inference

Definition 340 Let C be a calculus; a rule of inference

  A_1 ... A_n
  ───────────
       C

is called a derived inference rule in C, iff there is a C-proof of A_1, ..., A_n ⊢ C.

Definition 341 We have the following derived rules of inference

  (A ⇒ B)^T       (A ⇒ B)^F       (A ∨ B)^T       (A ∨ B)^F
  ─────────       ─────────       ─────────       ─────────
  A^F | B^T         A^T           A^T | B^T         A^F
                    B^F                              B^F

  (A ⇔ B)^T         (A ⇔ B)^F         A^T
  ─────────         ─────────         (A ⇒ B)^T
  A^T | A^F         A^T | A^F         ─────────
  B^T | B^F         B^F | B^T           B^T

The first rule is derived, since its conclusions can be deduced with the basic rules:

  (A ⇒ B)^T
  (¬(A ∧ ¬B))^T
  (A ∧ ¬B)^F
  A^F | (¬B)^F
      |  B^T

c : Michael Kohlhase 210
With these derived rules, theorem proving becomes quite efficient. With these rules, the tableau
(??) would have the following simpler form:
Tableaux with derived Rules (example)

Example 342

  (love(mary, bill) ∧ love(john, mary) ⇒ love(john, mary))^F
  (love(mary, bill) ∧ love(john, mary))^T
  love(john, mary)^F
  love(mary, bill)^T
  love(john, mary)^T
  ⊥

(A labeled formula A^α is valid under a variable assignment φ, iff I_φ(A) = α.)

Definition 345 A tableau T is satisfiable, iff there is a satisfiable branch P in T, i.e.
if the set of formulae on P is satisfiable.

Lemma 346 Tableau rules transform satisfiable tableaux into satisfiable ones.

Theorem 347 (Correctness) A propositional formula A is valid, if there is a
closed tableau T for A^F.

Proof: by contradiction: Suppose A is not valid.

P.1 then the initial tableau is satisfiable (A^F satisfiable)
P.2 T satisfiable, by our Lemma.
P.3 there is a satisfiable branch (by definition)
P.4 but all branches are closed (T closed)

c : Michael Kohlhase 212

Thus we only have to prove Lemma 346; this is relatively easy to do. For instance for the first
rule: if we have a tableau that contains (A ∧ B)^T and is satisfiable, then it must have a satisfiable
branch. If (A ∧ B)^T is not on this branch, the tableau extension will not change satisfiability, so we
can assume that it is on the satisfiable branch and thus I_φ(A) = T and I_φ(B) = T for the variable
assignment φ that satisfies the branch; hence the branch extended with A^T and B^T is satisfiable
as well.

For the clause normal form transformation we use rules that act on partially transformed
clauses of the form C ∨ (A ∘ B)^α:

  C ∨ (A ⇒ B)^T        C ∨ (A ⇒ B)^F           C ∨ (A ∧ B)^T           C ∨ (A ∧ B)^F
  ──────────────       ──────────────────      ──────────────────      ──────────────
  C ∨ A^F ∨ B^T        C ∨ A^T ; C ∨ B^F       C ∨ A^T ; C ∨ B^T       C ∨ A^F ∨ B^F

c : Michael Kohlhase 216
With these derived rules, theorem proving becomes quite efficient. To get a better understanding
of the calculus, we look at an example: we prove an axiom of the Hilbert calculus we have studied
above.
Example: Proving Axiom S

Example 357 Clause Normal Form transformation

  ((P ⇒ Q ⇒ R) ⇒ (P ⇒ Q) ⇒ P ⇒ R)^F
  (P ⇒ Q ⇒ R)^T ; ((P ⇒ Q) ⇒ P ⇒ R)^F
  P^F ∨ (Q ⇒ R)^T ; (P ⇒ Q)^T ; (P ⇒ R)^F
  P^F ∨ Q^F ∨ R^T ; P^F ∨ Q^T ; P^T ; R^F

  CNF = {P^F ∨ Q^F ∨ R^T, P^F ∨ Q^T, P^T, R^F}

  ⟨{⟨a, e⟩, ⟨e, i⟩, ⟨i, j⟩, ⟨f, j⟩, ⟨f, g⟩, ⟨g, h⟩, ⟨d, h⟩, ⟨g, k⟩, ⟨a, b⟩,
    ⟨m, n⟩, ⟨n, o⟩, ⟨b, c⟩, ⟨k, o⟩, ⟨o, p⟩, ⟨l, p⟩}, a, p⟩

We represented the maze as a graph for clarity.
Now, we are interested in circuits, which we will also represent as graphs.
Let us look at the theory of graphs first (so we know what we are doing)

c : Michael Kohlhase 222
Graphs and trees are fundamental data structures for computer science; they will pop up in many
disguises in almost all areas of CS. We have already seen various forms of trees: formula trees,
tableaux, .... We will now look at their mathematical treatment, so that we are equipped to talk
and think about combinatory circuits.
We will first introduce the formal definitions of graphs (trees will turn out to be special graphs),
and then fortify our intuition using some examples.
if e = ¬e′, then V_e := V_{e′} ∪ {v}, E_e := E_{e′} ∪ {⟨v, v_{r′}⟩}, and f_e := f_{e′} ∪ {⟨v, ¬⟩},
where P_{e′} = ⟨V_{e′}, E_{e′}⟩ is the parse tree of e′, v_{r′} is the root of P_{e′}, and v is an
object not in V_{e′}.

if e = e_1 ∘ e_2 with ∘ ∈ {∗, +}, then V_e := V_{e_1} ∪ V_{e_2} ∪ {v}, E_e :=
E_{e_1} ∪ E_{e_2} ∪ {⟨v, v_{r_1}⟩, ⟨v, v_{r_2}⟩}, and f_e := f_{e_1} ∪ f_{e_2} ∪ {⟨v, ∘⟩}, where the
P_{e_i} = ⟨V_{e_i}, E_{e_i}⟩ are the parse trees of e_i, v_{r_i} is the root of P_{e_i}, and v is an
object not in V_{e_1} ∪ V_{e_2}.

if e ∈ (V ∪ C_bool), then V_e = {e} and E_e = ∅.

Example 384 the parse tree of (x_1 ∗ x_2 + x_3) ∗ x_1 + x_4 is

            +
           / \
          ∗   x_4
         / \
        +   x_1
       / \
      ∗   x_3
     / \
   x_1   x_2

c : Michael Kohlhase 233
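In Standard ML, the parse trees of Boolean expressions are just the constructor trees of a
datatype. The following sketch (our own names, not part of the course materials) encodes the
expression from Example 384 and computes the depth of its parse tree.

  datatype bexp = V of string | C of int
                | Times of bexp * bexp | Plus of bexp * bexp | Compl of bexp

  (* the expression (x1 * x2 + x3) * x1 + x4 from Example 384 *)
  val e = Plus (Times (Plus (Times (V "x1", V "x2"), V "x3"), V "x1"), V "x4")

  (* the depth of the parse tree: the number of nested constructors *)
  fun depth (V _)          = 0
    | depth (C _)          = 0
    | depth (Compl a)      = 1 + depth a
    | depth (Times (a, b)) = 1 + Int.max (depth a, depth b)
    | depth (Plus (a, b))  = 1 + Int.max (depth a, depth b)

  val four = depth e   (* the parse tree above has depth 4 *)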
11.2 Introduction to Combinatorial Circuits

We will now come to another model of computation: combinatorial circuits (also called combina-
tional circuits). These are models of logic circuits (physical objects made of transistors (or cathode
tubes) and wires, parts of integrated circuits, etc.), which abstract from the inner structure of the
switching elements (called gates) and the geometric configuration of the connections. Thus, com-
binatorial circuits allow us to concentrate on the functional properties of these circuits, without
getting bogged down with e.g. configuration or geometric considerations. These can be added to
the models, but are not part of the discussion of this course.
Combinatorial Circuits as Graphs

Definition 385 A combinatorial circuit is a labeled acyclic graph G = ⟨V, E, f_g⟩ with
label set {OR, AND, NOT}, such that

  indeg(v) = 2 and outdeg(v) = 1 for all nodes v ∈ f_g⁻¹({AND, OR})
  indeg(v) = outdeg(v) = 1 for all nodes v ∈ f_g⁻¹({NOT})

We call the set I(G) (O(G)) of initial (terminal) nodes in G the input (output) vertices,
and the set F(G) := V \ (I(G) ∪ O(G)) the set of gates.

Example 386 The graph G_cir1 = ⟨V, E⟩ with input vertices i1, i2, i3, gates g1 (AND),
g2 (OR), g3 (OR), g4 (NOT), and output vertices o1, o2 is a combinatorial circuit.

Definition 387 Add two special input nodes 0, 1 to a combinatorial circuit G to form
a combinatorial circuit with constants. (will use this from now on)

c : Michael Kohlhase 234
So combinatorial circuits are simply a class of specialized labeled directed graphs. As such, they
inherit the nomenclature and equality conditions we introduced for graphs. The motivation for
the restrictions is simple: we want to model computing devices based on gates, i.e. simple compu-
tational devices that behave like logical connectives: the AND gate has two input edges and one
output edge; the output edge has value 1, iff the two input edges do too.
Since combinatorial circuits are a primary tool for understanding logic circuits, they have their
own traditional visual display format. Gates are drawn with special node shapes and edges are
traditionally drawn on a rectangular grid, using bifurcating edges instead of multiple lines, with
blobs distinguishing bifurcations from edge crossings. This graph design is motivated by readability
considerations (combinatorial circuits can become rather large in practice) and the layout of early
printed circuits.
Using Special Symbols to Draw Combinatorial Circuits

[figure: the special symbols for the logic gates AND, OR, and NOT, and the circuit from
the last slide redrawn with them on a rectangular grid, with inputs i1, i2, i3 and outputs
o1, o2]

Junction symbols serve as shorthands for several edges: a bifurcation blob on an edge a
distributes its value to the edges b and c.

c : Michael Kohlhase 235
In particular, the diagram on the lower right is a visualization of the combinatory circuit G_cir1
from the last slide.
To view combinatorial circuits as models of computation, we will have to make the connection
between the gate structure and the input-output behavior more explicit. We will use a tool for
this that we have studied in detail before: Boolean expressions. The first thing we will do is to
annotate all the edges in a combinatorial circuit with Boolean expressions that correspond to the
values on the edges (as a function of the input values of the circuit).
Computing with Combinatorial Circuits

Combinatorial circuits and parse trees for Boolean expressions look similar

Idea: Let's annotate edges in combinatorial circuits with Boolean expressions!

Definition 388 Given a combinatorial circuit G = ⟨V, E, f_g⟩ and an edge e = ⟨v, w⟩ ∈
E, the expression label f_L(e) is defined as

  f_L(⟨v, w⟩) = v                                if v ∈ I(G)
  f_L(⟨v, w⟩) = f_L(⟨u, v⟩)‾                     if f_g(v) = NOT
  f_L(⟨v, w⟩) = f_L(⟨u, v⟩) ∗ f_L(⟨u′, v⟩)       if f_g(v) = AND
  f_L(⟨v, w⟩) = f_L(⟨u, v⟩) + f_L(⟨u′, v⟩)       if f_g(v) = OR

where u, u′ are the input nodes of the gate v. In the example circuit, the edges carry the
labels i1, i2, i3, (i1 ∗ i2), (i2 + i3), ((i1 ∗ i2) + i3), and (i2 + i3)‾.

c : Michael Kohlhase 236
Armed with the expression label of edges we can now make the computational behavior of combi-
natory circuits explicit. The intuition is that a combinatorial circuit computes a certain Boolean
function, if we interpret the input vertices as obtaining the corresponding arguments as values
and passing them on to gates via the edges in the circuit. The gates then compute the result from
their input edges and pass the result on to the next gate or an output vertex via their output
edge.
Computing with Combinatorial Circuits

Definition 389 A combinatorial circuit G = ⟨V, E, f_g⟩ with input vertices i_1, ..., i_n
and output vertices o_1, ..., o_m computes an n-ary Boolean function

  f: {0, 1}^n → {0, 1}^m; ⟨i_1, ..., i_n⟩ ↦ ⟨f_{e_1}(i_1, ..., i_n), ..., f_{e_m}(i_1, ..., i_n)⟩

where e_i = f_L(⟨v, o_i⟩).

Example 390 The circuit example on the last slide defines the Boolean function

  f: {0, 1}³ → {0, 1}²; ⟨i_1, i_2, i_3⟩ ↦ ⟨f_{(i1∗i2)+i3}(i_1, i_2, i_3), f_{(i2+i3)‾}(i_1, i_2, i_3)⟩

For a fully balanced binary tree of depth d, the size lemma gives Σ_{i=0}^{d} 2^i = 2^{d+1} - 1
nodes.

c : Michael Kohlhase 239
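Expression labels give us a direct way to execute a circuit: evaluate the labels of the output
edges under an assignment of the inputs. The following Standard ML sketch (ours; it reuses the
bexp datatype from the parse-tree fragment above and assumes that the second output of
Example 390 carries the complemented label produced by the NOT gate) does just that.

  (* evaluate an expression label over the values 0 and 1 *)
  fun beval phi (V x)          = phi x
    | beval phi (C b)          = b
    | beval phi (Compl a)      = 1 - beval phi a
    | beval phi (Times (a, b)) = Int.min (beval phi a, beval phi b)
    | beval phi (Plus (a, b))  = Int.max (beval phi a, beval phi b)

  (* the function f: {0,1}^3 -> {0,1}^2 of Example 390 *)
  fun f (i1, i2, i3) =
      let fun phi "i1" = i1 | phi "i2" = i2 | phi "i3" = i3
            | phi _    = raise Fail "unknown input"
          val o1 = Plus (Times (V "i1", V "i2"), V "i3")
          val o2 = Compl (Plus (V "i2", V "i3"))
      in (beval phi o1, beval phi o2) end

  val test = f (1, 1, 0)   (* evaluates to (1, 0) *)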
This shows that balanced binary trees grow in breadth very quickly; a consequence of this is that
they are very shallow (and thus compute very fast), which is the essence of the next result.
Depth Lemma for Balanced Trees

Lemma 397 Let G = ⟨V, E⟩ be a balanced binary tree, then dp(G) = ⌊log₂(#(V))⌋.

Proof: by calculation

P.1 Let V′ := V \ W, where W is the set of nodes at level d = dp(G)
P.2 By the size lemma, #(V′) = 2^{d-1+1} - 1 = 2^d - 1
P.3 then #(V) = 2^d - 1 + k, where k = #(W) and 1 ≤ k ≤ 2^d
P.4 so #(V) = c · 2^d where c ∈ ℝ and 1 ≤ c < 2, or 0 ≤ log₂(c) < 1
P.5 thus log₂(#(V)) = log₂(c · 2^d) = log₂(c) + d and
P.6 hence d = log₂(#(V)) - log₂(c) = ⌊log₂(#(V))⌋.

c : Michael Kohlhase 240
Leaves of Binary Trees

Lemma 398 Any binary tree with m leaves has 2m - 1 vertices.

Proof: by induction on m.

P.1 We have two cases
P.1.1 m = 1: then V = {v_r} and #(V) = 1 = 2 · 1 - 1.
P.1.2 m > 1:
P.1.2.1 then any binary tree G with m - 1 leaves has 2m - 3 vertices (IH)
P.1.2.2 To get m leaves, add 2 children to some leaf of G. (add two to get one more)
P.1.2.3 Thus #(V) = 2m - 3 + 2 = 2m - 1.

c : Michael Kohlhase 241
In particular, the size of a binary tree is independent of its form if we fix the number of leaves.
So we can optimize the depth of a binary tree by taking a balanced one without a size penalty.
This will become important for building fast combinatory circuits.

11.3.2 Realizing n-ary Gates

We now use the results on balanced binary trees to build generalized gates as building blocks for
combinational circuits.
n-ary Gates as Subgraphs

Idea: Identify (and abbreviate) frequently occurring subgraphs

Definition 399 AND(x_1, ..., x_n) := ∧_{i=1}^{n} x_i and OR(x_1, ..., x_n) := ∨_{i=1}^{n} x_i

Note: These can be realized as balanced binary trees G_n

Corollary 400 C(G_n) = n - 1 and dp(G_n) = ⌈log₂(n)⌉.

Notation 401 [figure: the gate symbols for the n-ary AND and OR gates]

c : Michael Kohlhase 242
Using these building blocks, we can establish a worst-case result for the depth of a combinatory
circuit computing a given Boolean function.
Worst Case Depth Theorem for Combinatorial Circuits

Theorem 402 The worst case depth dp(G) of a combinatorial circuit G which realizes
a k × n-dimensional Boolean function is bounded by dp(G) ≤ n + ⌈log₂(n)⌉ + 1.

Proof: The main trick behind this bound is that AND and OR are associative and that
the according gates can be arranged in a balanced binary tree.

P.1 The function f corresponding to the output o_j of the circuit G can be transformed
into DNF
P.2 each monomial consists of at most n literals
P.3 the possible negation of inputs for some literals can be done in depth 1
P.4 for each monomial the ANDs in the related circuit can be arranged in a balanced
binary tree of depth ⌈log₂(n)⌉
P.5 there are at most 2^n monomials which can be ORed together in a balanced binary
tree of depth ⌈log₂(2^n)⌉ = n.

c : Michael Kohlhase 243
Of course, the depth result is related to the first worst-case complexity result for Boolean expres-
sions (Theorem 272); it uses the same idea: to use the disjunctive normal form of the Boolean
function. However, instead of using a Boolean expression, we become more concrete here and use
a combinatorial circuit.
An example of a DNF circuit

[figure: a DNF circuit for an output O_j: the inputs X_1, X_2, X_3, ..., X_n pass through
a NOT gate where the corresponding literal is negative (L_i = X_i‾) and directly where it
is positive (L_i = X_i); a layer of n-ary AND gates computes the monomials M_1, M_2,
M_3, ..., M_k, which an n-ary OR gate combines into O_j]

c : Michael Kohlhase 244
In the circuit diagram above, we have of course drawn a very particular case (as an example
for possible others). One thing that might be confusing is that the lower n-ary conjunction
operators look as if they have edges to all the input variables, which a DNF does not have in
general.
Of course, by now, we know how to do better in practice. Instead of the DNF, we can always com-
pute the minimal polynomial for a given Boolean function using the Quine-McCluskey algorithm
and derive a combinatorial circuit from this. While this does not give us any theoretical mileage
(there are Boolean functions where the DNF is already the minimal polynomial), it will greatly
improve the cost in practice.
Until now, we have somewhat arbitrarily concentrated on combinational circuits with AND, OR,
and NOT gates. The reason for this was that we had already developed a theory of Boolean
expressions with the connectives ∗, +, and ‾ that we can use. In practical circuits often other
gates are used, since they are simpler to manufacture and more uniform. In particular, it is
sufficient to use only one type of gate as we will see now.
Other Logical Connectives and Gates

Are the gates AND, OR, and NOT ideal?

Idea: Combine NOT with the binary ones to NAND, NOR (enough?)

  NAND | 1 0         NOR | 1 0
    1  | 0 1   and    1  | 0 0
    0  | 1 1          0  | 0 1

The corresponding logical connectives are written as ⊼ (NAND) and ⊽ (NOR).

We will also need the exclusive or (XOR) connective that returns 1 iff exactly one of its
operands is 1.

  XOR | 1 0
   1  | 0 1
   0  | 1 0

The XOR gate has its own symbol; the logical connective is written as ⊕.

c : Michael Kohlhase 245
The Universality of NAND and NOR

Theorem 403 NAND and NOR are universal; i.e. any Boolean function can be ex-
pressed in terms of them.

Proof: express AND, OR, and NOT via NAND and NOR respectively

  NOT(a)     NAND(a, a)                      NOR(a, a)
  AND(a, b)  NAND(NAND(a, b), NAND(a, b))    NOR(NOR(a, a), NOR(b, b))
  OR(a, b)   NAND(NAND(a, a), NAND(b, b))    NOR(NOR(a, b), NOR(a, b))

[figure: the corresponding combinational circuit diagrams for NOT(a), (a AND b), and
(a OR b), built once from NAND gates and once from NOR gates]

c : Michael Kohlhase 246
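Since each of these equivalences only mentions two Boolean values, we can verify the table of
Theorem 403 by brute-force enumeration. Here is a small Standard ML check (our own sketch
with invented names):

  fun nand (a, b) = not (a andalso b)
  fun nor  (a, b) = not (a orelse b)
  val bools = [true, false]

  (* check the three rows of the table for all inputs *)
  val notRow = List.all (fn a => nand (a, a) = not a andalso
                                 nor (a, a)  = not a) bools
  val andRow = List.all (fn a => List.all (fn b =>
                   nand (nand (a, b), nand (a, b)) = (a andalso b) andalso
                   nor (nor (a, a), nor (b, b))    = (a andalso b)) bools) bools
  val orRow  = List.all (fn a => List.all (fn b =>
                   nand (nand (a, a), nand (b, b)) = (a orelse b) andalso
                   nor (nor (a, b), nor (a, b))    = (a orelse b)) bools) bools
  (* all three evaluate to true *)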
Of course, a simple substitution along these lines will blow up the cost of the circuits by a factor of
up to three and double the depth, which would be prohibitive. To get around this, we would have
to develop a theory of Boolean expressions and complexity using the NAND and NOR connectives,
along with suitable replacements for the Quine-McCluskey algorithm. This would give cost and
depth results comparable to the ones developed here. This is beyond the scope of this course.

11.4 Basic Arithmetics with Combinational Circuits
We have seen that combinational circuits are good models for implementing Boolean functions:
they allow us to make predictions about properties like costs and depths (computation speed),
while abstracting from other properties like geometrical realization, etc.
We will now extend the analysis to circuits that can compute with numbers, i.e. that implement
the basic arithmetical operations (addition, multiplication, subtraction, and division on integers).
To be able to do this, we need to interpret sequences of bits as integers. So before we jump into
arithmetical circuits, we will have a look at number representations.
11.4.1 Positional Number Systems

Positional Number Systems

Problem: For realistic arithmetics we need better number representations than the unary
natural numbers (|n^(unary)| ∈ Θ(n) [number of /es])

Recap: the unary number system
  build up numbers from /es (start with ε and add /)
  addition ⊕ as concatenation (⊙, exp, ... defined from that)

Idea: build a clever code on the unary numbers
  interpret sequences of /es as strings: ε stands for the number 0

Definition 404 A positional number system N is a triple N = ⟨D_b, φ_b, ψ_b⟩ with

  D_b is a finite alphabet of b digits. (b := #(D_b) base or radix of N)
  φ_b: D_b → {ε, /, ..., /^[b-1]} is bijective (first b unary numbers)
  ψ_b: D_b⁺ → {/}*; ⟨n_k, ..., n_1⟩ ↦ ⊕_{i=1}^{k} φ_b(n_i) ⊙ exp(/^[b], /^[i-1])
  (extends φ_b to a string code)

c : Michael Kohlhase 247
In the unary number system, it was rather simple to do arithmetics; the most important oper-
ation (addition) was very simple, it was just concatenation. From this we can implement the
other operations by simple recursive procedures, e.g. in SML or as abstract procedures in abstract
data types. To make the arguments more transparent, we will use special symbols for the arith-
metic operations on unary natural numbers: ⊕ (addition), ⊙ (multiplication), ⊕_{i=1}^{n} (sum
over n numbers), and ⊙_{i=1}^{n} (product over n numbers).
The problem with the unary number system is that it uses enormous amounts of space, when
writing down large numbers. Using the Landau notation we introduced earlier, we see that for
writing down a number n in unary representation we need n slashes. So if |n^(unary)| is the cost
of representing n in unary representation, we get |n^(unary)| ∈ Θ(n). Of course that will never
do for practical chips. We obviously need a better encoding.
If we look at the unary number system from a greater distance (now that we know more CS, we can
interpret the representations as strings), we see that we are not using a very important feature of
strings here: position. As we only have one letter in our alphabet (/), we cannot make use of
positions, so we should use a larger alphabet. The main idea behind a positional number system
N = ⟨D_b, φ_b, ψ_b⟩ is that we encode numbers as strings of digits (characters in the alphabet D_b),
such that the position matters, and to give these encodings a meaning by mapping them into the
unary natural numbers via a mapping ψ_b. This is the same process we did for the logics; we are
now doing it for number systems. However, here, we also want to ensure that the meaning mapping
ψ_b is a bijection, since we want to define the arithmetics on the encodings by reference to the
arithmetical operators on the unary natural numbers.
We can look at this as a bootstrapping process, where the unary natural numbers constitute
the seed system we build up everything from.
Just like we did for string codes earlier, we build up the meaning mapping ψ_b on characters from
D_b first. To have a chance to make ψ_b bijective, we insist that the character code φ_b is a
bijection from D_b to the first b unary natural numbers. Now we extend φ_b from a character code
to a string code; however, unlike earlier, we do not use simple concatenation to induce the string
code, but a much more complicated function based on the arithmetic operations on unary natural
numbers. We will see later that this gives us a bijection between D_b⁺ and the unary natural
numbers.
Commonly Used Positional Number Systems

Example 405 The following positional number systems are in common use.

  name         set   base  digits             example
  unary        N_1   1     /                  /////_1
  binary       N_2   2     0,1                0101000111_2
  octal        N_8   8     0,1,...,7          63027_8
  decimal      N_10  10    0,1,...,9          162098_10 or 162098
  hexadecimal  N_16  16    0,1,...,9,A,...,F  FF3A12_16

Notation 406 attach the base of N to every number from N. (default: decimal)

Trick: Group triples or quadruples of binary digits into recognizable chunks
(add leading zeros as needed)

  110001101011100_2 = 0110_2 0011_2 0101_2 1100_2 = 635C_16
                      (0110_2 = 6_16, 0011_2 = 3_16, 0101_2 = 5_16, 1100_2 = C_16)
  110001101011100_2 = 110_2 001_2 101_2 011_2 100_2 = 61534_8
                      (110_2 = 6_8, 001_2 = 1_8, 101_2 = 5_8, 011_2 = 3_8, 100_2 = 4_8)
  F3A_16 = 1111_2 0011_2 1010_2 = 111100111010_2
  4721_8 = 100_2 111_2 010_2 001_2 = 100111010001_2

c : Michael Kohlhase 248
We have all seen positional number systems: our decimal system is one (for the base 10). Other
systems that are important for us are the binary system (it is the smallest non-degenerate one) and
the octal (base 8) and hexadecimal (base 16) systems. These come from the fact that binary
numbers are very hard for humans to scan. Therefore it became customary to group three or four
digits together and introduce (compound) digits for them. The octal system is mostly relevant
for historic reasons; the hexadecimal system is in widespread use as syntactic sugar for binary
numbers, which form the basis for circuits, since binary digits can be represented physically by
current/no current.
Now that we have defined positional number systems, we want to define the arithmetic operations
on these number representations. We do this by using an old trick in math. If we have
an operation f_T: T → T on a set T and a well-behaved mapping ψ from a set S into T, then
we can pull back the operation f_T to S by defining the operation f_S: S → S by

  f_S := ψ⁻¹ ∘ f_T ∘ ψ,   i.e. f_S(s) := ψ⁻¹(f_T(ψ(s))).

Obviously, this construction can be done whenever ψ is bijective (and thus has an inverse
function). For defining the arithmetic operations on the positional number representations, we
do the same construction, but for binary functions (after we have established that ψ is indeed a
bijection).
The fact that ψ_b is a bijection a posteriori justifies our notation, where we have only indicated the
base of the positional number system. Indeed any two positional number systems are isomorphic:
they have bijections ψ_b into the unary natural numbers, and therefore there is a bijection between
them.
Arithmetics for PNS

Lemma 407 Let N := ⟨D_b, φ_b, ψ_b⟩ be a PNS, then ψ_b is bijective.

Proof: construct (ψ_b)⁻¹ by successive division modulo the base of N.

Idea: use this to define arithmetics on N.

Definition 408 Let N := ⟨D_b, φ_b, ψ_b⟩ be a PNS of base b, then we define a binary
function +_b: N_b × N_b → N_b by x +_b y := (ψ_b)⁻¹(ψ_b(x) ⊕ ψ_b(y)).

Note: The addition rules (carry chain addition) generalize from the decimal system to
general PNS

Idea: Do the same for other arithmetic operations. (works like a charm)

Future: Concentrate on binary arithmetics. (implement into circuits)

c : Michael Kohlhase 249
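If we realize the unary natural numbers by the built-in integers, the pull-back definition of +_b
becomes a three-line Standard ML program (a sketch with our own names; digit strings are lists
of small integers with the most significant digit first).

  (* psi_b: the value of a digit string in base b *)
  fun psi b ds = List.foldl (fn (d, v) => v * b + d) 0 ds

  (* psi_b^{-1} by successive division modulo the base (Lemma 407) *)
  fun unpsi b 0 = [0]
    | unpsi b n = let fun go 0 acc = acc
                        | go m acc = go (m div b) (m mod b :: acc)
                  in go n [] end

  (* x +_b y := psi_b^{-1}(psi_b(x) + psi_b(y)) *)
  fun addb b (x, y) = unpsi b (psi b x + psi b y)

  val example = addb 16 ([5, 11, 3], [0, 0, 13])   (* 5B3 + 00D = 5C0 in base 16 *)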
11.4.2 Adders

The next step is now to implement the induced arithmetical operations into combinational circuits,
starting with addition. Before we can do this, we have to specify which (Boolean) function we
really want to implement. For convenience, we will use the usual decimal (base 10) representations
of numbers and their operations to argue about these circuits. So we need conversion functions
from decimal numbers to binary numbers to get back and forth. Fortunately, these are easy to
come by, since we can compose the bijections from both systems into the unary natural numbers
to get the transformations.
Arithmetic Circuits for Binary Numbers

Idea: Use combinational circuits to do basic arithmetics.

Definition 409 Given the (abstract) number a ∈ ℕ, B(a) denotes from now on the
binary representation of a.
For the opposite case, i.e., the natural number represented by a binary string
a = ⟨a_{n-1}, ..., a_0⟩ ∈ B^n, the notation ⟨⟨a⟩⟩ is used, i.e.,

  ⟨⟨a⟩⟩ = ⟨⟨a_{n-1}, ..., a_0⟩⟩ = Σ_{i=0}^{n-1} a_i · 2^i

Definition 410 An n-bit adder is a circuit computing the function f^n_{+2}: B^n × B^n →
B^{n+1} with

  f^n_{+2}(a, b) := B(⟨⟨a⟩⟩ + ⟨⟨b⟩⟩)

c : Michael Kohlhase 250
If we look at the definition again, we see that we are again using a pull-back construction. These
will pop up all over the place, since they make life quite easy and safe.
Before we actually get a combinational circuit for an n-bit adder, we will build a very useful circuit
as a building block: the half adder (it will take two to build a full adder).
The Half-Adder

There are different ways to implement an adder. All of them build upon two basic
components, the half-adder and the full-adder.

We have C(CSA_n) = 3C(CSA_{n/2}) + C(MUX_{n/2+1}) = 3C(CSA_{n/2}) + 3(n/2 + 1) + 1 =
3C(CSA_{n/2}) + (3/2)n + 4

So, C(CSA_n) is a function that can be handled via the Master's theorem with a = 3, b = 2,
n = b^k, g(n) = (3/2)n + 4, and c = C(CSA_1) = C(FA_1) = 5, whose solution is

  c · a^k + Σ_{i=0}^{k-1} a^i · g(b^{k-i})

thus

  C(CSA_n) = 5 · 3^{log₂(n)} + Σ_{i=0}^{log₂(n)-1} (3^i · (3/2) · (n/2^i) + 4)

Note: a^{log₂(n)} = (2^{log₂(a)})^{log₂(n)} = 2^{log₂(a)·log₂(n)} = (2^{log₂(n)})^{log₂(a)} = n^{log₂(a)}

  C(CSA_n) = 5 · 3^{log₂(n)} + Σ_{i=0}^{log₂(n)-1} (3^i · (3/2) · (n/2^i) + 4)
           = 5 · n^{log₂(3)} + n · Σ_{i=1}^{log₂(n)} (3/2)^i + 4 · log₂(n)
           = 5 · n^{log₂(3)} + 2n · ((3/2)^{log₂(n)+1} - 1) + 4 · log₂(n)
           = 5 · n^{log₂(3)} + 3n · n^{log₂(3/2)} - 2n + 4 · log₂(n)
           = 8 · n^{log₂(3)} - 2n + 4 · log₂(n) ∈ O(n^{log₂(3)})

Theorem 422 The cost and the depth of the conditional sum adder are in the following
complexity classes:

  C(CSA_n) ∈ O(n^{log₂(3)})    dp(CSA_n) ∈ O(log₂(n))

Compare with:

  C(CCA_n) ∈ O(n)    dp(CCA_n) ∈ O(n)

So, the conditional sum adder has a smaller depth than the carry chain adder. This
smaller depth is paid with higher cost.
There is another adder that combines the small cost of the carry chain adder with the
low depth of the conditional sum adder. This carry lookahead adder CLA_n has a cost
C(CLA_n) ∈ O(n) and a depth of dp(CLA_n) ∈ O(log₂(n)).

c : Michael Kohlhase 259
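To fix intuitions, here is a bit-level simulation of the carry chain adder CCA_n in Standard ML
(our own sketch; bits are the integers 0 and 1, least significant bit first, so adding two n-bit
operands yields n+1 result bits, as required by f^n_{+2}).

  (* a full adder: sum bit and carry bit of three input bits *)
  fun fullAdder (a, b, c) = ((a + b + c) mod 2, (a + b + c) div 2)

  (* the carry chain adder: ripple the carry through the bit positions *)
  fun cca ([], [], c)           = [c]
    | cca (a :: ar, b :: br, c) =
        let val (s, c') = fullAdder (a, b, c)
        in s :: cca (ar, br, c') end
    | cca _ = raise Fail "operands must have the same width"

  val sum = cca ([1, 1], [1, 0], 0)   (* 3 + 1 = 4, i.e. [0, 0, 1] *)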
Instead of perfecting the n-bit adder further (and there are lots of designs and optimizations out
there, since this has high commercial relevance), we will extend the range of arithmetic operations.
The next thing we come to is subtraction.

11.5 Arithmetics for Twos Complement Numbers

This of course presents us with a problem directly: the n-bit binary natural numbers we have
used for representing numbers are closed under addition, but not under subtraction: if we have
two n-bit binary numbers B(n) and B(m), then B(n + m) is an (n+1)-bit binary natural number.
If we count the most significant bit separately as the carry bit, then we have an n-bit result. For
subtraction this is not the case: B(n - m) is only an n-bit binary natural number, if m ≤ n
(whatever we do with the carry). So we have to think about representing negative binary natural
numbers first. It turns out that the solution using sign bits that immediately comes to mind is
not the best one.
Negative Numbers and Subtraction

Note: So far we have completely ignored the existence of negative numbers.

Problem: Subtraction is a partial operation without them.

Question: Can we extend the binary number systems for negative numbers?

Simple Solution: Use a sign bit. (additional leading bit that indicates whether the number
is positive)

Definition 423 ((n + 1)-bit signed binary number system)

  ⟨⟨a_n, ..., a_0⟩⟩⁻ :=  ⟨⟨a_{n-1}, ..., a_0⟩⟩    if a_n = 0
                        -⟨⟨a_{n-1}, ..., a_0⟩⟩    if a_n = 1

Note: We need to fix the string length to identify the sign bit. (leading zeroes)

Example 424 In the 8-bit signed binary number system

  10011001 represents -25   (⟨⟨10011001⟩⟩⁻ = -(2⁴ + 2³ + 2⁰))
  00101100 corresponds to a positive number: 44

c : Michael Kohlhase 260
Here we did the naive solution; just as in the decimal system, we added a sign bit, which
specifies the polarity of the number representation. The first consequence of this that we have to
keep in mind is that we have to fix the width of the representation: unlike the representation for
binary natural numbers, which can be arbitrarily extended to the left, we have to know which bit
is the sign bit. This is not a big problem in the world of combinational circuits, since we have a
fixed width of input/output edges anyway.
Problems of Sign-Bit Systems

Generally: An n-bit signed binary number system allows to represent the integers from
-2^{n-1} + 1 to +2^{n-1} - 1.

  2^{n-1} - 1 positive numbers, 2^{n-1} - 1 negative numbers, and the zero

Thus we represent #({⟨⟨s⟩⟩⁻ | s ∈ B^n}) = 2 · (2^{n-1} - 1) + 1 = 2^n - 1 numbers all
in all

One number must be represented twice (But there are 2^n strings of length n.)

  10...0 and 00...0 both represent the zero as -0 = +0.

  signed binary   ℤ
  0 1 1 1         7
  0 1 1 0         6
  0 1 0 1         5
  0 1 0 0         4
  0 0 1 1         3
  0 0 1 0         2
  0 0 0 1         1
  0 0 0 0         0
  1 0 0 0        -0
  1 0 0 1        -1
  1 0 1 0        -2
  1 0 1 1        -3
  1 1 0 0        -4
  1 1 0 1        -5
  1 1 1 0        -6
  1 1 1 1        -7

We could build arithmetic circuits using this, but there is a more elegant way!

c : Michael Kohlhase 261
All of these problems could be dealt with in principle, but together they form a nuisance that at
least prompts us to look for something more elegant. The two's complement representation also
uses a sign bit, but arranges the lower part of the table in the last slide in the opposite order,
freeing the negative representation of the zero. The technical trick here is to use the sign bit (we
still have to take into account the width n of the representation) not as a mirror, but to translate
the positive representation by subtracting 2^n.
The Twos Complement Number System

Definition 425 Given the binary string a = ⟨a_n, ..., a_0⟩ ∈ B^{n+1}, where n > 1. The
integer represented by a in the (n+1)-bit two's complement, written as ⟨⟨a⟩⟩^{2s}_n, is
defined as

  ⟨⟨a⟩⟩^{2s}_n = -a_n · 2^n + ⟨⟨a_{n-1}, ..., a_0⟩⟩
              = -a_n · 2^n + Σ_{i=0}^{n-1} a_i · 2^i

Notation 426 Write B^{2s}_n(z) for the binary string that represents z in the two's com-
plement number system, i.e., ⟨⟨B^{2s}_n(z)⟩⟩^{2s}_n = z.

  2s compl.   ℤ
  0 1 1 1     7
  0 1 1 0     6
  0 1 0 1     5
  0 1 0 0     4
  0 0 1 1     3
  0 0 1 0     2
  0 0 0 1     1
  0 0 0 0     0
  1 1 1 1    -1
  1 1 1 0    -2
  1 1 0 1    -3
  1 1 0 0    -4
  1 0 1 1    -5
  1 0 1 0    -6
  1 0 0 1    -7
  1 0 0 0    -8

c : Michael Kohlhase 262
We will see that this representation has much better properties than the naive sign-bit representa-
tion we experimented with above. The first set of properties are quite trivial; they just formalize
the intuition of moving the representation down, rather than mirroring it.
Properties of Twos Complement Numbers (TCN)

Let b = ⟨b_n, ..., b_0⟩ be a number in the (n + 1)-bit two's complement system, then

  Positive numbers and the zero have a sign bit 0, i.e., b_n = 0 iff ⟨⟨b⟩⟩^{2s}_n ≥ 0.
  Negative numbers have a sign bit 1, i.e., b_n = 1 iff ⟨⟨b⟩⟩^{2s}_n < 0.
  For positive numbers, the two's complement representation corresponds to the normal
  binary number representation, i.e., b_n = 0 implies ⟨⟨b⟩⟩^{2s}_n = ⟨⟨b_{n-1}, ..., b_0⟩⟩
  There is a unique representation of the number zero in the (n + 1)-bit two's complement
  system, namely B^{2s}_n(0) = ⟨0, ..., 0⟩.
  This number system has an asymmetric range R^{2s}_n := {-2^n, ..., 2^n - 1}.

c : Michael Kohlhase 263
The next property is so central for what we want to do, it is upgraded to a theorem. It says that
the mirroring operation (passing from a number to its negative sibling) can be achieved by two
very simple operations: flipping all the zeros and ones, and incrementing.
The Structure Theorem for TCN

Theorem 427 Let a ∈ B^{n+1} be a binary string, then ⟨⟨ā⟩⟩^{2s}_n = -⟨⟨a⟩⟩^{2s}_n - 1, where
ā is the pointwise bit complement of a.

Proof: by calculation using the definitions

  ⟨⟨ā_n, ā_{n-1}, ..., ā_0⟩⟩^{2s}_n
    = -ā_n · 2^n + ⟨⟨ā_{n-1}, ..., ā_0⟩⟩
    = -(1 - a_n) · 2^n + Σ_{i=0}^{n-1} (1 - a_i) · 2^i
    = -(1 - a_n) · 2^n + Σ_{i=0}^{n-1} 2^i - Σ_{i=0}^{n-1} a_i · 2^i
    = -2^n + a_n · 2^n + 2^n - 1 - ⟨⟨a_{n-1}, ..., a_0⟩⟩
    = (-2^n + 2^n) - (-a_n · 2^n + ⟨⟨a_{n-1}, ..., a_0⟩⟩) - 1
    = -⟨⟨a⟩⟩^{2s}_n - 1

c : Michael Kohlhase 264
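The structure theorem is easy to confirm experimentally. The following Standard ML sketch
(ours, with invented names; bit strings are 0/1 lists with the sign bit first) computes ⟨⟨a⟩⟩^{2s}
and checks the theorem on all 4-bit strings.

  fun pow2 0 = 1 | pow2 n = 2 * pow2 (n - 1)
  fun value bits = List.foldl (fn (b, v) => 2 * v + b) 0 bits

  (* <<a>>^{2s}: the head of the list is the sign bit *)
  fun tcn (an :: rest) = ~an * pow2 (length rest) + value rest
    | tcn []           = raise Fail "empty bit string"

  fun compl bits = map (fn b => 1 - b) bits

  fun allBits 0 = [[]]
    | allBits n = List.concat (map (fn bs => [0 :: bs, 1 :: bs]) (allBits (n - 1)))

  (* Theorem 427 on all 4-bit strings *)
  val structure_ok =
      List.all (fn a => tcn (compl a) = ~(tcn a) - 1) (allBits 4)   (* true *)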
A first simple application of the TCN structure theorem is that we can use our existing conversion
routines (for binary natural numbers) to do TCN conversion (for integers).

Application: Converting from and to TCN?

to convert an integer z ∈ ℤ with z ≥ 0 into an (n+1)-bit TCN, generate the n-bit binary
number representation B(z) = ⟨b_{n-1}, ..., b_0⟩ and attach the sign bit 0; for a negative
integer -z, complement all bits of the representation of z - 1 (structure theorem).

Moreover, duplicating the sign bit does not change the value:

  ⟨⟨a_n, a_n, a_{n-1}, ..., a_0⟩⟩^{2s}_{n+1} = ⟨⟨a_n, a_{n-1}, ..., a_0⟩⟩^{2s}_n

Proof: by calculation

  ⟨⟨a_n, a_n, a_{n-1}, ..., a_0⟩⟩^{2s}_{n+1}
    = -a_n · 2^{n+1} + ⟨⟨a_n, a_{n-1}, ..., a_0⟩⟩
    = -a_n · 2^{n+1} + a_n · 2^n + ⟨⟨a_{n-1}, ..., a_0⟩⟩
    = a_n · (-2^{n+1} + 2^n) + ⟨⟨a_{n-1}, ..., a_0⟩⟩
    = a_n · (-2 · 2^n + 2^n) + ⟨⟨a_{n-1}, ..., a_0⟩⟩
    = -a_n · 2^n + ⟨⟨a_{n-1}, ..., a_0⟩⟩
    = ⟨⟨a_n, a_{n-1}, ..., a_0⟩⟩^{2s}_n

c : Michael Kohlhase 268
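The conversion routine is then a direct program, reusing compl, tcn, and pow2 from the sketch
above (again with invented names; the sketch assumes that the value fits into the range).

  (* the n-bit binary representation of 0 <= z < 2^n, most significant bit first *)
  fun toBin n z = List.tabulate (n, fn i => (z div pow2 (n - 1 - i)) mod 2)

  (* an (n+1)-bit TCN for z: prepend sign bit 0, or use the structure
     theorem (<<compl a>> = -<<a>> - 1) for negative z *)
  fun toTCN n z =
      if z >= 0 then 0 :: toBin n z
      else compl (toTCN n (~z - 1))

  val minus5 = toTCN 3 (~5)     (* = [1, 0, 1, 1], and indeed tcn minus5 = ~5 *)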
We will now come to a major structural result for two's complement numbers. It will serve two
purposes for us:
1. It will show that the same circuits that produce the sum of binary numbers also produce
proper sums of two's complement numbers.
2. It states concrete conditions when a valid result is produced, namely when the last two
carry bits are identical.
The TCN Main Theorem

Definition 432 Let a, b ∈ B^{n+1} and c ∈ B with a = ⟨a_n, ..., a_0⟩ and b = ⟨b_n, ..., b_0⟩,
then we call ic_k(a, b, c) the k-th intermediate carry of a, b, and c, iff

  ⟨⟨ic_k(a, b, c), s_{k-1}, ..., s_0⟩⟩ = ⟨⟨a_{k-1}, ..., a_0⟩⟩ + ⟨⟨b_{k-1}, ..., b_0⟩⟩ + c

for some s_i ∈ B.

Theorem 433 Let a, b ∈ B^{n+1} and c ∈ B, then

1. ⟨⟨a⟩⟩^{2s}_n + ⟨⟨b⟩⟩^{2s}_n + c ∈ R^{2s}_n, iff ic_{n+1}(a, b, c) = ic_n(a, b, c).
2. If ic_{n+1}(a, b, c) = ic_n(a, b, c), then ⟨⟨a⟩⟩^{2s}_n + ⟨⟨b⟩⟩^{2s}_n + c = ⟨⟨s⟩⟩^{2s}_n, where
   ⟨⟨ic_{n+1}(a, b, c), s_n, ..., s_0⟩⟩ = ⟨⟨a⟩⟩ + ⟨⟨b⟩⟩ + c.

c : Michael Kohlhase 269
Unfortunately, the proof of this attractive and useful theorem is quite tedious and technical.
Proof of the TCN Main Theorem

Proof: Let us consider the sign bits a_n and b_n separately from the value bits a′ =
⟨a_{n-1}, ..., a_0⟩ and b′ = ⟨b_{n-1}, ..., b_0⟩.

P.1 Then

  ⟨⟨a′⟩⟩ + ⟨⟨b′⟩⟩ + c = ⟨⟨a_{n-1}, ..., a_0⟩⟩ + ⟨⟨b_{n-1}, ..., b_0⟩⟩ + c
                      = ⟨⟨ic_n(a, b, c), s_{n-1}, ..., s_0⟩⟩

and a_n + b_n + ic_n(a, b, c) = ⟨⟨ic_{n+1}(a, b, c), s_n⟩⟩.

P.2 We have to consider three cases

P.2.1 a_n = b_n = 0:
P.2.1.1 ⟨⟨a⟩⟩^{2s}_n and ⟨⟨b⟩⟩^{2s}_n are both positive, so ic_{n+1}(a, b, c) = 0 and furthermore

  ic_n(a, b, c) = 0  iff  ⟨⟨a′⟩⟩ + ⟨⟨b′⟩⟩ + c ≤ 2^n - 1
                     iff  ⟨⟨a⟩⟩^{2s}_n + ⟨⟨b⟩⟩^{2s}_n + c ≤ 2^n - 1

P.2.1.2 Hence,

  ⟨⟨a⟩⟩^{2s}_n + ⟨⟨b⟩⟩^{2s}_n + c = ⟨⟨a′⟩⟩ + ⟨⟨b′⟩⟩ + c = ⟨⟨s_{n-1}, ..., s_0⟩⟩
                                = ⟨⟨0, s_{n-1}, ..., s_0⟩⟩^{2s}_n = ⟨⟨s⟩⟩^{2s}_n

P.2.2 a_n = b_n = 1:
P.2.2.1 ⟨⟨a⟩⟩^{2s}_n and ⟨⟨b⟩⟩^{2s}_n are both negative, so ic_{n+1}(a, b, c) = 1 and furthermore
ic_n(a, b, c) = 1, iff ⟨⟨a′⟩⟩ + ⟨⟨b′⟩⟩ + c ≥ 2^n, which is the case, iff

  ⟨⟨a⟩⟩^{2s}_n + ⟨⟨b⟩⟩^{2s}_n + c = -2^{n+1} + ⟨⟨a′⟩⟩ + ⟨⟨b′⟩⟩ + c ≥ -2^n

P.2.2.2 Hence,

  ⟨⟨a⟩⟩^{2s}_n + ⟨⟨b⟩⟩^{2s}_n + c = (-2^n + ⟨⟨a′⟩⟩) + (-2^n + ⟨⟨b′⟩⟩) + c
                                = -2^{n+1} + ⟨⟨a′⟩⟩ + ⟨⟨b′⟩⟩ + c
                                = -2^{n+1} + ⟨⟨1, s_{n-1}, ..., s_0⟩⟩
                                = -2^n + ⟨⟨s_{n-1}, ..., s_0⟩⟩
                                = ⟨⟨s⟩⟩^{2s}_n

P.2.3 a_n ≠ b_n:
P.2.3.1 Without loss of generality assume that a_n = 0 and b_n = 1.
(then ic_{n+1}(a, b, c) = ic_n(a, b, c))
P.2.3.2 Hence, the sum of ⟨⟨a⟩⟩^{2s}_n and ⟨⟨b⟩⟩^{2s}_n is in the admissible range R^{2s}_n, as

  ⟨⟨a⟩⟩^{2s}_n + ⟨⟨b⟩⟩^{2s}_n + c = ⟨⟨a′⟩⟩ + ⟨⟨b′⟩⟩ + c - 2^n
  and 0 ≤ ⟨⟨a′⟩⟩ + ⟨⟨b′⟩⟩ + c ≤ 2^{n+1} - 1

P.2.3.3 So we have

  ⟨⟨a⟩⟩^{2s}_n + ⟨⟨b⟩⟩^{2s}_n + c = -2^n + ⟨⟨a′⟩⟩ + ⟨⟨b′⟩⟩ + c
                                = -2^n + ⟨⟨ic_n(a, b, c), s_{n-1}, ..., s_0⟩⟩
                                = -(1 - ic_n(a, b, c)) · 2^n + ⟨⟨s_{n-1}, ..., s_0⟩⟩
                                = ⟨⟨(1 - ic_n(a, b, c)), s_{n-1}, ..., s_0⟩⟩^{2s}_n

P.2.3.4 Furthermore, we can conclude that ⟨⟨(1 - ic_n(a, b, c)), s_{n-1}, ..., s_0⟩⟩^{2s}_n = ⟨⟨s⟩⟩^{2s}_n,
as s_n = a_n ⊕ b_n ⊕ ic_n(a, b, c) = 1 ⊕ ic_n(a, b, c) = 1 - ic_n(a, b, c).

P.3 Thus we have considered all the cases and completed the proof.

c : Michael Kohlhase 270

The Main Theorem for TCN again

Given two (n + 1)-bit two's complement numbers a and b, the above theorem tells
us that the result s of an (n + 1)-bit adder is the proper sum in two's complement
representation iff the last two carries are identical.
If not, a and b were too large or too small. In the case that s is larger than 2^n - 1, we
say that an overflow occurred. In the opposite error case of s being smaller than -2^n, we
say that an underflow occurred.

c : Michael Kohlhase 271
11.6 Towards an Algorithmic-Logic Unit

The most important application of the main TCN theorem is that we can build a combinatorial
circuit that can add and subtract (depending on a control bit). This is actually the first instance
of a concrete programmable computation device we have seen so far (we interpret the control
bit as a program, which changes the behavior of the device). The fact that it is so simple (it
only runs two programs) should not deter us; we will come up with more complex things later.
Building an Add/Subtract Unit

Idea: Build a combinational circuit that can add and subtract (sub = 1 means subtract)

If sub = 0, then the circuit acts like an adder (a ⊕ 0 = a)
If sub = 1, let S := ⟨⟨a⟩⟩^{2s}_n + ⟨⟨b̄_{n-1}, ..., b̄_0⟩⟩^{2s}_n + 1 (a ⊕ 1 = ā)

For s ∈ R^{2s}_n the TCN main theorem and the TCN structure theorem together guarantee

  s = ⟨⟨a⟩⟩^{2s}_n + ⟨⟨b̄_{n-1}, ..., b̄_0⟩⟩^{2s}_n + 1
    = ⟨⟨a⟩⟩^{2s}_n - ⟨⟨b⟩⟩^{2s}_n - 1 + 1
    = ⟨⟨a⟩⟩^{2s}_n - ⟨⟨b⟩⟩^{2s}_n

[figure: an (n+1)-bit adder A_{n+1} whose b inputs b_{n-1}, ..., b_0 each pass through an
XOR gate driven by sub, and whose carry-in is sub; the output is s]

Summary: We have built a combinational circuit that can perform 2 arithmetic operations
depending on a control bit.

Idea: Extend this to an arithmetic logic unit (ALU) with more operations
(+, -, *, /, n-AND, n-OR, ...)

c : Michael Kohlhase 272
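In the bit-level simulation from above, the add/subtract unit is one line: XOR the b bits with
sub and feed sub in as the carry. This is a sketch reusing cca from the carry chain adder
fragment; for simplicity it applies the XOR to all bits of b, including the sign bit.

  (* xor of two bits *)
  fun xorBit (a, b) = (a + b) mod 2

  (* sub = 0: a + b;  sub = 1: a + compl(b) + 1, i.e. a - b in two's complement *)
  fun addsub sub (a, b) =
      cca (a, map (fn bi => xorBit (bi, sub)) b, sub)

With LSB-first two's complement operands, addsub 1 computes the difference; to detect over-
and underflow one would additionally compare the last two carries, which this sketch omits.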
In fact, extended variants of the very simple add/subtract unit are at the heart of any computer.
These are called arithmetic logic units.
12 Sequential Logic Circuits and Memory Elements
So far we have considered combinatorial logic, i.e. circuits for which the output depends only
on the inputs. In many instances it is desirable to have the next output depend on the current
output.
Sequential Logic Circuits

In combinational circuits, outputs only depend on inputs (no state)

We have disregarded all timing issues (except for favoring shallow circuits)

Definition 434 Circuits that remember their current output or state are often called
sequential logic circuits.

Example 435 A counter, where the next number to be output is determined by the
current number stored.

Sequential logic circuits need some ability to store the current state

c : Michael Kohlhase 273
Clearly, sequential logic requires the ability to store the current state. In other words, memory
is required by sequential logic circuits. We will investigate basic circuits that have the ability to
store bits of data. We will start with the simplest possible memory element, and develop more
elaborate versions from it.
The circuit we are about to introduce is the simplest circuit that can keep a state, and thus act
as (a precursor to) a storage element. Note that we are leaving the realm of acyclic graphs here.
Indeed, storage elements cannot be realized with combinational circuits as defined above.
RS Flip-Flop
Definition 436 An RS-flipflop (or RS-latch) is constructed by feeding the output
of each of two NOR gates back to an input of the other. The inputs R and S
are referred to as the Reset and Set inputs, respectively.
R S Q Q′ Comment
0 1 1 0  Set
1 0 0 1  Reset
0 0 Q Q′ Hold state
1 1 ? ?  Avoid
Note: the output Q′ is simply the inverse of Q. (supplied for convenience)
Note: An RS-flipflop can also be constructed from NAND gates.
c : Michael Kohlhase 274
To understand the operation of the RS-flipflop we first remind ourselves of the
truth table of the NOR gate:
a b a NOR b
0 0 1
0 1 0
1 0 0
1 1 0
If one of the inputs is 1, then the output is 0, irrespective of the other. To
understand the RS-flipflop, we will go through the input combinations summarized
in the table above in detail. Consider the following scenarios:
S = 1 and R = 0: The output of the bottom NOR gate is 0, and thus Q′ = 0 irrespective of the
other input. So both inputs to the top NOR gate are 0, thus Q = 1. Hence, the input
combination S = 1 and R = 0 leads to the flipflop being set to Q = 1.
S = 0 and R = 1: The argument for this situation is symmetric to the one above, so the outputs
become Q = 0 and Q′ = 1. We say that the flipflop is reset.
S = 0 and R = 0: Assume the flipflop is set (Q = 1 and Q′ = 0); then the output of the top
NOR gate remains at Q = 1 and the bottom NOR gate stays at Q′ = 0. Similarly, when
the flipflop is in a reset state (Q = 0 and Q′ = 1), it will remain there with this input
combination. Therefore, with inputs S = 0 and R = 0, the flipflop remains in its state.
S = 1 and R = 1: This input combination will be avoided; we already have all the functionality
(set, reset, and hold) we want from a memory element.
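We can make this analysis concrete with a little SML sketch (ours, not part of the course code): we model bits as the integers 0 and 1 and iterate the two feedback equations until the outputs stabilize (or a step bound runs out).
fun nor (a, b) = if a = 0 andalso b = 0 then 1 else 0

(* one feedback iteration: Q' = R NOR Q-bar and Q-bar' = S NOR Q *)
fun step (r, s) (q, qb) = (nor (r, qb), nor (s, q))

(* iterate at most n times or until a fixed point is reached *)
fun settle 0 _ state = state
  | settle n inputs state =
      let val state' = step inputs state
      in if state' = state then state else settle (n - 1) inputs state' end

val set  = settle 10 (0, 1) (0, 1)   (* evaluates to (1, 0): Q = 1, i.e. set *)
val hold = settle 10 (0, 0) set      (* evaluates to (1, 0): the state is kept *)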
An RS-flipflop is rarely used in actual sequential logic. However, it is the fundamental building
block for the very useful D-flipflop.
The D-Flipflop: the simplest memory device
Recap: An RS-flipflop can store a state (set Q to 1 or reset Q to 0)
Problem: We would like to have a single data input and avoid the R = S = 1 state.
Idea: Add interface logic to do just this
Definition 437 A D-flipflop is an RS-flipflop with interface logic as below.
E D R S Q Comment
1 1 0 1 1 set Q to 1
1 0 1 0 0 reset Q to 0
0 D 0 0 Q hold Q
The inputs D and E are called the data and enable inputs.
When E = 1, the value of D determines the value of the output Q; when E returns to
0, the most recent input D is remembered.
c : Michael Kohlhase 275
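Behaviorally, the D-flipflop is even simpler to describe; as an SML sketch (ours, abstracting from the internal RS-flipflop and looking at one update per enable period):
(* next state of a D-flipflop with enable e, data d, and current state q *)
fun dflipflop (e, d) q = if e = 1 then d else q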
Sequential logic circuits are constructed from memory elements and combinatorial logic gates.
The introduction of the memory elements allows these circuits to remember their state. We will
illustrate this through a simple example.
Example: On/Off Switch
Problem: Pushing a button toggles an LED between on and off.
(first push switches the LED on, second push off, . . . )
Idea: Use a D-flipflop (to remember whether the LED is currently on or off) and connect its
Q′ output to its D input (the next state is the inverse of the current state)
c : Michael Kohlhase 276
In the on/off circuit, the external inputs (buttons) were connected to the E input.
Definition 438 Such circuits are often called asynchronous, as they keep track of events that
occur at arbitrary instants of time; synchronous circuits, in contrast, operate on a periodic basis,
and the Enable input is connected to a common clock signal.
Random Access Memory Chips
Random access memory (RAM) is used for storing a large number of bits.
RAM is made up of storage elements similar to the D-ipops we discussed.
In principle, each storage element has a unique number or address represented in binary
form.
When the address of the storage element is provided to the RAM chip, the corresponding
memory element can be written to or read from.
We will consider the following questions:
What is the physical structure of RAM chips?
How are addresses used to select a particular storage element?
What do individual storage elements look like?
How are reading and writing distinguished?
c : Michael Kohlhase 277
Address Decoder Logic
Idea: Need a circuit that activates the storage element given the binary address:
At any time, only 1 output line is on and all others are off.
The line that is on specifies the desired element.
Definition 439 The n-bit address decoder ADL^n has n inputs and 2^n outputs;
f_{ADL^n}(a) = ⟨b_1, . . . , b_{2^n}⟩, where b_i = 1 iff i = ⟨a⟩.
Example 440 (Address decoder logic for 2-bit addresses)
c : Michael Kohlhase 278
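Functionally, the decoder is easy to specify; here is a small SML sketch (ours; we represent the address as a number and use 0-based output positions):
fun pow2 0 = 1
  | pow2 n = 2 * pow2 (n - 1)

(* n-bit address decoder: 2^n output lines, of which exactly line a is on *)
fun adl n a = List.tabulate (pow2 n, fn i => if i = a then 1 else 0)

val lines = adl 2 2   (* evaluates to [0, 0, 1, 0] *)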
Storage Elements
Idea (Input): Use a D-flipflop and connect its E input to the ADL output.
Connect the D input to the common RAM data input line. (input only if addressed)
Idea (Output): Connect the flipflop output to the common RAM output line, but first AND it
with the ADL output. (output only if addressed)
Problem: The read process should leave the value of the gate unchanged.
Idea: Introduce a write enable signal (to protect data during reads), AND it with the ADL
output, and connect it to the flipflop's E input.
Definition 441 A storage element is given by the following diagram
c : Michael Kohlhase 279
Remarks
The storage elements are often simplified to reduce the number of transistors.
For example, with care one can replace the ipop by a capacitor.
Also, with large memory chips it is not feasible to connect the data input and output
and write enable lines directly to all storage elements.
Also, with care one can use the same line for data input and data output.
Today, multi-gigabyte RAM chips are on the market.
The capacity of RAM chips doubles approximately every year.
c : Michael Kohlhase 280
Layout of Memory Chips
To take advantage of the two-dimensional nature of the chip, storage elements are ar-
ranged on a square grid. (columns and rows of storage elements)
For example, a 1 Megabit RAM chip consists of 1024 rows and 1024 columns.
We identify a storage element by its row and column coordinates. (AND them for addressing)
Hence, to select a particular storage location, the address information must be translated
into a row and column specification.
The address information is divided into two halves; the top half is used to select the row
and the bottom half is used to select the column.
c : Michael Kohlhase 281
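For the 1 Megabit chip above, this translation is a simple div/mod computation; as a one-line SML sketch (ours):
(* split a 20-bit address into row (top 10 bits) and column (bottom 10 bits) *)
fun rowcol (addr : int) = (addr div 1024, addr mod 1024)

val rc = rowcol 1048575   (* evaluates to (1023, 1023), the last grid cell *)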
13 Machines
13.1 How to build a Computer (in Principle)
In this part of the course, we will learn how to use the very simple computational devices we
built in the last section and extend them to fully programmable devices using the von Neumann
Architecture. For this, we need random access memory (RAM).
For our purposes, we just understand n-bit memory cells as devices that can store n binary
values. They can be written to (after which they store the n values at their n input edges), and
they can be queried: then their output edges have the n values that were stored in the memory
cell. Querying a memory cell does not change the value stored in it.
Our notion of time is similarly simple: in our analysis we assume a series of discrete clock ticks
that synchronize all events in the circuit. We will only observe the circuits on each clock tick and
assume that all computational devices introduced for the register machine complete computation
before the next tick. Real circuits also have a clock that synchronizes events (the clock frequency
(currently around 3 GHz for desktop CPUs) is a common approximation measure of processor
performance), but the assumption of elementary computations taking only one clock tick is wrong
in production systems.
How to Build a Computer (REMA; the Register Machine)
Take an n-bit arithmetic logic unit (ALU)
add registers: a few (named) n-bit memory cells near the ALU
program counter (PC) (points to the current command in the program store)
accumulator (ACC) (the a input and output of the ALU)
add RAM: lots of random access memory (elsewhere)
program store: 2n-bit memory cells (addressed by P : N → B^{2n})
data store: n-bit memory cells (words addressed by D : N → B^n)
add a memory management unit (MMU) (moves values between RAM and registers)
program it in assembler language (the lowest level of programming)
c : Michael Kohlhase 282
We have three kinds of memory areas in the REMA register machine: The registers (our architecture
has two, which is the minimal number; real architectures have more for convenience) are just simple
n-bit memory cells.
The program store is a sequence of up to 2^n 2n-bit memory cells, which can be accessed
(written to and queried) randomly, i.e. by referencing their position in the sequence; we do not have
to access them by some fixed regime, e.g. one after the other, in sequence (hence the name random
access memory: RAM). We address the program store by a function P : N → B^{2n}. The data store
is also RAM, but a sequence of n-bit cells, which is addressed by the function D : N → B^n.
The value of the program counter is interpreted as a binary number that addresses a 2n-bit cell
in the program store. The accumulator is the register that contains one of the inputs to the ALU
before the operation (the other is given as the argument of the program instruction); the result of
the ALU is stored in the accumulator after the instruction is carried out.
Memory Plan of a Register Machine
[Diagram: the CPU with its registers ACC (accumulator), IN1 (index register 1), IN2 (index
register 2), and PC (program counter), connected by load/save paths to the program store
(2n-bit cells holding operation/argument pairs, addresses 0, 1, 2, 3, . . . ) and to the data
store (n-bit cells, addresses 0, 1, 2, 3, . . . ).]
c : Michael Kohlhase 283
The ALU and the MMU are control circuits: they have a set of n-bit inputs, n-bit outputs,
and an n-bit control input. The prototypical ALU, which we have already seen, applies an
arithmetic or logical operator to its regular inputs according to the value of the control input.
The MMU is very similar: it moves n-bit values between the RAM and the registers according
to the value at the control input. We say that the MMU moves the (n-bit) value from a register
R to a memory cell C, iff after the move both have the same value: that of R. This is usually
implemented as a query operation on R and a write operation to C. Both the ALU and the
MMU could in principle encode 2^n operators (or commands); in practice, they have fewer,
since they share the command space.
Circuit Overview over the CPU
[Diagram: the PC addresses the program store; the operation part of the current instruction
feeds the ALU and the control logic, the argument part is the second ALU input; the ALU
reads and writes the ACC.]
c : Michael Kohlhase 284
In this architecture (called the register machine architecture), programs are sequences of 2n-
bit numbers. The first n-bit part encodes the instruction, the second one the argument of the
instruction. The program counter addresses the current instruction (operation + argument).
We will now instantiate this general register machine with a concrete (hypothetical) realization,
which is sufficient for general programming, in principle. In particular, we will need to identify a
set of program operations. We will come up with 18 operations, so we need to set n ≥ 5. It is
possible to do programming with n = 4 designs, but we are interested in the general principles
more than in optimization.
The main idea of programming at the circuit level is to map the operator code (an n-bit binary
number) of the current instruction to the control input of the ALU and the MMU, which will then
perform the action encoded in the operator.
Since it is very tedious to look at binary operator codes (even if we present them as hexadecimal
numbers), it has become customary to use a mnemonic encoding of these in simple word tokens,
which are simpler to read: the assembler language.
Assembler Language
Idea: Store program instructions as n-bit values in program store, map these to control
inputs of ALU, MMU.
Definition 442 The assembler language (ASM) is a mnemonic encoding of the n-bit binary instruction codes.
instruction  effect             PC            comment
LOAD i       ACC := D(i)        PC := PC + 1  load data
STORE i      D(i) := ACC        PC := PC + 1  store data
ADD i        ACC := ACC + D(i)  PC := PC + 1  add to ACC
SUB i        ACC := ACC − D(i)  PC := PC + 1  subtract from ACC
LOADI i      ACC := i           PC := PC + 1  load number
ADDI i       ACC := ACC + i     PC := PC + 1  add number
SUBI i       ACC := ACC − i     PC := PC + 1  subtract number
c : Michael Kohlhase 285
Definition 443 The meaning of the program instructions is specified by their ability to change
the state of the memory of the register machine. So to understand them, we have to trace the
state of the memory over time (looking at a snapshot after each clock tick; this is what we do
in the comment fields in the tables on the next slide). We speak of an imperative programming
language if this is the case.
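To make this notion of state change concrete, here is a small SML sketch (ours, in the style of the compiler code later in this section): a REMA state is a triple of ACC, PC, and the data store, and each of the instructions above maps states to states.
datatype instr = LOAD of int | STORE of int | ADD of int | SUB of int
               | LOADI of int | ADDI of int | SUBI of int

(* the observable state: accumulator, program counter, data store *)
type state = {acc : int, pc : int, d : int -> int}

(* functional update of the data store at address i *)
fun upd d i v = fn j => if j = i then v else d j

fun step ({acc, pc, d} : state) (i : instr) : state =
  case i of
    LOAD j  => {acc = d j,       pc = pc + 1, d = d}
  | STORE j => {acc = acc,       pc = pc + 1, d = upd d j acc}
  | ADD j   => {acc = acc + d j, pc = pc + 1, d = d}
  | SUB j   => {acc = acc - d j, pc = pc + 1, d = d}
  | LOADI n => {acc = n,         pc = pc + 1, d = d}
  | ADDI n  => {acc = acc + n,   pc = pc + 1, d = d}
  | SUBI n  => {acc = acc - n,   pc = pc + 1, d = d}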
Example 444 This is in contrast to the programming language SML that we have looked at
before. There we are not interested in the state of memory. In fact, state is something that we
want to avoid in such functional programming languages for conceptual clarity; we relegated all
things that need state into special constructs: effects.
To be able to trace the memory state over time, we also have to think about the initial state of the
register machine (e.g. after we have turned on the power). We assume the state of the registers
and the data store to be arbitrary (who knows what the machine has dreamt). More interestingly,
we assume the state of the program store to be given externally. For the moment, we may assume
(as was the case with the first computers) that the program store is just implemented as a large
array of binary switches, one for each bit in the program store. Programming a computer at that
time was done by flipping the 2n switches for each instruction. Nowadays, parts of the initial
program of a computer (those that run when the power is turned on and bootstrap the operating
system) are still given in special memory (called the firmware) that keeps its state even when power
is shut off. This is conceptually very similar to a bank of switches.
Example Programs
Example 445 Exchange the values of cells 0 and 1 in the data store
P instruction comment
0 LOAD 0 ACC: = D(0) = x
1 STORE 2 D(2): = ACC = x
2 LOAD 1 ACC: = D(1) = y
3 STORE 0 D(0): = ACC = y
4 LOAD 2 ACC: = D(2) = x
5 STORE 1 D(1): = ACC = x
Example 446 Let D(1) = a, D(2) = b, and D(3) = c; store a + b + c in data cell 4
P instruction comment
0 LOAD 1 ACC: = D(1) = a
1 ADD 2 ACC: = ACC +D(2) = a +b
2 ADD 3 ACC: = ACC +D(3) = a +b +c
3 STORE 4 D(4): = ACC = a +b +c
use LOADI i, ADDI i, SUBI i to set/increment/decrement ACC (impossible otherwise)
c : Michael Kohlhase 286
So far, the problems we have been able to solve are quite simple. They had in common that we had
to know the addresses of the memory cells we wanted to operate on at programming time, which
is not very realistic. To alleviate this restriction, we will now introduce a new set of instructions,
which allow us to calculate with addresses.
Index Registers
Problem: Given D(0) = x and D(1) = y, how do we store y into cell x of the data store?
(impossible so far, as we only have absolute addressing)
Definition 447 (Idea) Introduce more registers and register instructions. (IN1, IN2 suffice)
instruction   effect              PC            comment
LOADIN j i    ACC := D(INj + i)   PC := PC + 1  relative load
STOREIN j i   D(INj + i) := ACC   PC := PC + 1  relative store
MOVE S T      T := S              PC := PC + 1  move register S (source)
                                                to register T (target)
Problem Solution:
P  instruction    comment
0  LOAD 0         ACC := D(0) = x
1  MOVE ACC IN1   IN1 := ACC = x
2  LOAD 1         ACC := D(1) = y
3  STOREIN 1 0    D(x) = D(IN1 + 0) := ACC = y
c : Michael Kohlhase 287
Note that the LOADIN are not binary instructions: this is just a short notation for the unary
instructions LOADIN 1 and LOADIN 2 (and similarly for MOVE S T).
Note furthermore that the addition logic in LOADIN j is simply for convenience (most assembler
languages have it, since working with address offsets is commonplace). We could have always
imitated this by a simpler relative load command and an ADD instruction.
A very important ability we have to add to the language is a set of instructions that allow us to
re-use program fragments multiple times. If we look at the instructions we have seen so far, then
we see that they all increment the program counter. As a consequence, program execution is a
linear walk through the program instructions: every instruction is executed exactly once. The
set of problems we can solve with this is extremely limited. Therefore we add a new kind of
instruction. Jump instructions directly manipulate the program counter by adding the argument
to it (note that this partially invalidates the circuit overview slide above, but we will not worry
about this).
Another very important ability is to be able to change the program execution under certain
conditions. In our simple language, we will only make jump instructions conditional (this is
sufficient, since we can always jump over the respective instruction sequence that we wanted to
make conditional). For convenience, we give ourselves a set of comparison relations (two would
have sufficed, e.g. = and <) that we can use to test.
Jump Instructions
Problem: Until now, we can only write linear programs
(A program with n steps executes n instructions)
Idea: Need instructions that manipulate the PC directly
Definition 448 Let R ∈ {<, =, >, ≤, ≠, ≥} be a comparison relation.
instruction  effect                                           comment
JUMP i       PC := PC + i                                     jump forward i steps
JUMP_R i     PC := PC + i if R(ACC, 0), else PC := PC + 1     conditional jump
Definition 449 (Two more)
instruction  effect        comment
NOP i        PC := PC + 1  no operation
STOP i                     stop computation
c : Michael Kohlhase 288
The final addition to the language are the NOP (no operation) and STOP operations. Both do not
look at their argument (we have to supply one though, so that we fit our instruction format). The
NOP instruction is sometimes convenient if we want to keep jump offsets rational, and the STOP
instruction terminates the program run (e.g. to give the user a chance to look at the results).
Example Program
Now that we have completed the language, let us see what we can do.
Example 450 Let D(0) = n, D(1) = a, and D(2) = b; copy the values of cells
a, . . . , a + n − 1 to cells b, . . . , b + n − 1, where a, b ≥ 3 and |a − b| ≥ n.
P  instruction    comment               P  instruction    comment
0  LOAD 1         ACC := a              10 MOVE ACC IN1   IN1 := IN1 + 1
1  MOVE ACC IN1   IN1 := a              11 MOVE IN2 ACC
2  LOAD 2         ACC := b              12 ADDI 1
3  MOVE ACC IN2   IN2 := b              13 MOVE ACC IN2   IN2 := IN2 + 1
4  LOAD 0         ACC := n              14 LOAD 0
5  JUMP= 13       if n = 0 then stop    15 SUBI 1
6  LOADIN 1 0     ACC := D(IN1)         16 STORE 0        D(0) := D(0) − 1
7  STOREIN 2 0    D(IN2) := ACC         17 JUMP −12       goto step 5
8  MOVE IN1 ACC                         18 STOP 0         stop
9  ADDI 1
Lemma 451 We have D(0) = n − (i − 1), IN1 = a + i − 1, and IN2 = b + i − 1 for
all 1 ≤ i ≤ n + 1. (the program does what we want)
Proof by induction on n.
Definition 452 The induction hypotheses are called loop invariants.
c : Michael Kohlhase 289
13.2 How to build an SML-Compiler (in Principle)
13.2.1 A Stack-based Virtual Machine
In this part of the course, we will build a compiler for a simple functional programming language.
A compiler is a program that examines a program in a high-level programming language and
transforms it into a program in a language that can be interpreted by an existing computation
engine, in our case the register machine we discussed above.
We have seen that our register machine runs programs written in assembler, a simple machine
language expressed in two-word instructions. Machine languages should be designed such that
machine language programs can execute efficiently on the processors that can actually be built.
On the other hand, machine languages should be built so that programs in a variety of high-level
programming languages can be transformed automatically (i.e. compiled) into efficient machine
programs. We have seen that our assembler language ASM is a serviceable, if frugal, approximation
of the first goal for very simple processors. We will now show that it also satisfies the second goal
by exhibiting a compiler for a simple SML-like language.
In the last 20 years, the machine languages for state-of-the-art processors have hardly changed.
This stability was a precondition for the enormous increase of computing power we have witnessed
during this time. At the same time, high-level programming languages have developed consider-
ably, and with them their needs for features in machine languages. This leads to a significant
mismatch, which has been bridged by the concept of a virtual machine.
Definition 453 A virtual machine is a simple machine-language program that interprets a slightly
higher-level program (the byte code) and simulates it on the existing processor.
Byte code is still considered a machine language, just that it is realized via software on a real
computer, instead of running directly on the machine. This allows us to keep the compilers simple
while only paying a small price in efficiency.
In our compiler, we will take this approach: we will first build a simple virtual machine (an ASM
program) and then build a compiler that translates functional programs into byte code.
Virtual Machines
Question: How to run high-level programming languages (like SML) on REMA?
Answer: By providing a compiler, i.e. an ASM program that reads SML programs (as
data) and transforms them into ASM programs.
But: ASM is optimized for building simple, efficient processors, not as a translation target!
Idea: Build an ASM program VM that interprets a better translation target language
(interpret REMA+VM as a virtual machine)
Definition 454 An ASM program VM is called a virtual machine for L(VM), iff VM inputs
an L(VM) program (as data) and runs it on REMA.
Plan: Instead of building a compiler from SML to ASM, build a virtual machine VM for REMA
and a compiler from SML to L(VM). (simpler and more transparent)
c : Michael Kohlhase 290
A Virtual Machine for Functional Programming
We will build a stack-based virtual machine; this will have four components
[Diagram: the four components: stack, program store, command interpreter, and VPC]
The stack is a memory segment operated as a last-in-first-out (LIFO) sequence.
The program store is a memory segment interpreted as a sequence of instructions.
The command interpreter is an ASM program that interprets commands from the pro-
gram store and operates on the stack.
The virtual program counter (VPC) is a register that acts as the pointer to the
current instruction in the program store.
The virtual machine starts with the empty stack and the VPC at the beginning of the
program.
c : Michael Kohlhase 291
A Stack-Based VM language (Arithmetic Commands)
Definition 455 VM arithmetic commands act on the stack
instruction  effect                                      VPC
con i        push i onto the stack                       VPC := VPC + 2
add          pop x, pop y, push x + y                    VPC := VPC + 1
sub          pop x, pop y, push x − y                    VPC := VPC + 1
mul          pop x, pop y, push x · y                    VPC := VPC + 1
leq          pop x, pop y, if x ≤ y push 1, else push 0  VPC := VPC + 1
Example 456 The L(VM) program con 4 con 7 add pushes 7 + 4 = 11 to the stack.
Example 457 Note the order of the arguments: the program con 4 con 7 sub first
pushes 4, and then 7, then pops x and then y (so x = 7 and y = 4), and finally pushes
x − y = 7 − 4 = 3.
Stack-based operations work very well with the recursive structure of arithmetic expres-
sions: we can compute the value of the expression 4 · 3 − 7 · 2 with
con 2 con 7 mul    (7 · 2)
con 3 con 4 mul    (4 · 3)
sub                (4 · 3 − 7 · 2)
c : Michael Kohlhase 292
Note: A feature that we will see time and again is that every (syntactically well-formed) expression
leaves only the result value on the stack. In the present case, the computation never touches the
part of the stack that was present before computing the expression. This is plausible, since the
computation of the value of an expression is purely functional; it should not have an effect on the
state of the virtual machine VM (other than leaving the result, of course).
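We can capture the meaning of the arithmetic fragment in a small SML interpreter (a sketch of ours; the stack is a list with its top at the head, and the pop order matches the slides: x first, then y):
datatype instruction = con of int | add | sub | mul | leq

fun exec (nil, stack) = stack
  | exec (con i :: ir, stack) = exec (ir, i :: stack)
  | exec (add :: ir, x :: y :: s) = exec (ir, (x + y) :: s)
  | exec (sub :: ir, x :: y :: s) = exec (ir, (x - y) :: s)
  | exec (mul :: ir, x :: y :: s) = exec (ir, (x * y) :: s)
  | exec (leq :: ir, x :: y :: s) = exec (ir, (if x <= y then 1 else 0) :: s)
  | exec _ = raise Fail "stack underflow"

val r = exec ([con 4, con 7, sub], nil)   (* evaluates to [3], cf. Example 457 *)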
A Stack-Based VM language (Control)
Definition 458 Control operators
instruction  effect
jp i         VPC := VPC + i
cjp i        pop x; if x = 0, then VPC := VPC + i, else VPC := VPC + 2
halt         stop the computation
cjp is a jump-on-false instruction. (if the condition is false, we jump; else we continue)
Example 459 For conditional expressions we use the conditional jump instructions: We
can express if 1 ≤ 2 then 4 − 3 else 7 · 5 by the program
con 2 con 1 leq cjp 9    (if 1 ≤ 2)
con 3 con 4 sub jp 7     (then 4 − 3)
con 5 con 7 mul          (else 7 · 5)
halt
c : Michael Kohlhase 293
In the example, we first push 2, and then 1 to the stack. Then leq pops (so x = 1), pops again
(making y = 2) and computes x ≤ y (which comes out as true), so it pushes 1; then we continue
(we would jump to the else case on false).
Note: Again, the only effect of the conditional statement is to leave the result on the stack. It
does not touch the contents of the stack at and below the original stack pointer.
A Stack-Based VM language (Imperative Variables)
Definition 460 Imperative access to variables: Let S(i) be the number at stack position i.
instruction  effect             VPC
peek i       push S(i)          VPC := VPC + 2
poke i       pop x; S(i) := x   VPC := VPC + 2
Example 461 The program con 5 con 7 peek 0 peek 1 add poke 1 mul halt
computes 5 · (7 + 5) = 60.
c : Michael Kohlhase 294
Of course the last example is somewhat contrived; this is certainly not the best way to compute
5 · (7 + 5) = 60, but it does the trick.
Extended Example: A while Loop
Example 462 Consider the following program that computes 12! and the correspond-
ing L(VM) program:
var n := 12; var a := 1;      con 12 con 1
while 2 <= n do (             peek 0 con 2 leq cjp 18
  a := a * n;                 peek 0 peek 1 mul poke 1
  n := n - 1;                 con 1 peek 0 sub poke 0
)                             jp −21
return a;                     peek 1 halt
Note that variable declarations only push the values to the stack; (memory allocation)
they are referenced by peeking the respective stack position, and
they are assigned by poking the stack position (we must remember which one).
c : Michael Kohlhase 295
We see that again, only the result of the computation is left on the stack. In fact, the code snippet
consists of two variable declarations (which extend the stack), one while statement (which does
not), and the return statement, which extends the stack again. In this case, we see that even
though the while statement does not extend the stack, it does change the stack below by the
variable assignments (implemented as poke in L(VM)). We will use the example above as guiding
intuition for a compiler from a simple imperative language to L(VM) byte code below. But first we
build a virtual machine for L(VM).
We will now build a virtual machine for L(VM) along the specification above.
A Virtual Machine for L(VM)
We need to build a concrete ASM program that acts as a virtual machine for L(VM).
Choose a concrete register machine size: e.g. 32-bit words (as in a PC)
Choose a memory layout in the data store:
the VM stack: D(8) to D(2^24 − 1) (we need the first 8 cells for VM data)
the L(VM) program store: D(2^24) to D(2^32 − 1)
We represent the virtual program counter VPC by the index register IN1 and the
stack pointer by the index register IN2 (with offset 8).
We will use D(0) as an argument store.
Choose a numerical representation for the L(VM) instructions: (we have lots of space)
halt ↦ 0, add ↦ 1, sub ↦ 2, . . .
c : Michael Kohlhase 296
Recall that the virtual machine VM is an ASM program, so it will reside in the REMA program store.
This is the program executed by the register machine. So both the VM stack and the L(VM) program
have to be stored in the REMA data store (therefore we treat L(VM) programs as sequences of words
and have to do counting acrobatics for instructions of differing length). We somewhat arbitrarily
fix a boundary in the data store of REMA at cell number 2^24 − 1. We will also need a little piece
of scratch-pad memory, which we locate at cells 0-7 for convenience (then we can simply address
with absolute numbers as addresses).
Memory Layout for the Virtual Machine
[Diagram: the REMA program store (2n-bit cells, operation/argument pairs) holds the ASM
program for VM; the data store (n-bit cells) is divided into the scratch area, the VM stack,
and the L(VM) program; the CPU registers are ACC (accumulator), PC (program counter),
IN1 (VM program counter), IN2 (stack pointer), and IN3 (frame pointer).]
c : Michael Kohlhase 297
Extending REMA and ASM
Give ourselves another register IN3 (and the corresponding instructions LOADIN 3,
STOREIN 3, and the MOVE variants for IN3)
We will use a syntactic variant of ASM for transparency:
JUMP and JUMP_R with labels of the form ⟨foo⟩
(relative jump distances are computed automatically)
inc R for MOVE R ACC, ADDI 1, MOVE ACC R (dec R similar)
note that inc R and dec R overwrite the current ACC (take care of it)
All these additions can be eliminated by substitution.
c : Michael Kohlhase 298
With these extensions, it is quite simple to write the ASM code that implements the virtual machine
VM. The first part is a simple jump table, a piece of code that does nothing else than distribute
the program flow according to the (numerical) instruction head. We assume that this program
segment is located at the beginning of the program store, so that the REMA program counter points
to its first instruction. It initializes the VM program counter and the stack pointer to the first
cells of their memory segments. We assume that the L(VM) program is already loaded in its proper
location, since we have not discussed input and output for REMA.
Starting VM: the Jump Table
label    instruction   effect         comment
         LOADI 2^24    ACC := 2^24    load VM start address
         MOVE ACC IN1  VPC := ACC     set VPC
         LOADI 7       ACC := 7       load top of stack address
         MOVE ACC IN2  SP := ACC      set SP
⟨jt⟩     LOADIN 1 0    ACC := D(IN1)  load instruction
         JUMP= ⟨halt⟩                 goto ⟨halt⟩
         SUBI 1                       next instruction code
         JUMP= ⟨add⟩                  goto ⟨add⟩
         SUBI 1                       next instruction code
         JUMP= ⟨sub⟩                  goto ⟨sub⟩
         ...
⟨halt⟩   STOP 0                       stop
         ...
c : Michael Kohlhase 299
Now it only remains to present the ASM programs for the individual L(VM) instructions. We
will start with the arithmetic operations. The code for con is absolutely straightforward: we
increment the VM program counter to point to the argument, read it, and store it to the cell the
(suitably incremented) VM stack pointer points to. Once this has been executed, we increment
the VM program counter again, so that it points to the next L(VM) instruction, and jump back to
the beginning of the jump table.
For the add instruction we have to use the scratch-pad area, since we have to pop two values
from the stack (and we can only keep one in the accumulator). We just cache the first value in
cell 0 of the data store.
Implementing Arithmetic Operators
label   instruction   effect             comment
⟨con⟩   inc IN1       VPC := VPC + 1     point to arg
        inc IN2       SP := SP + 1       prepare push
        LOADIN 1 0    ACC := D(VPC)      read arg
        STOREIN 2 0   D(SP) := ACC       store for push
        inc IN1       VPC := VPC + 1     point to next
        JUMP ⟨jt⟩                        jump back
⟨add⟩   LOADIN 2 0    ACC := D(SP)       read arg 1
        STORE 0       D(0) := ACC        cache it
        dec IN2       SP := SP − 1       pop
        LOADIN 2 0    ACC := D(SP)       read arg 2
        ADD 0         ACC := ACC + D(0)  add cached arg 1
        STOREIN 2 0   D(SP) := ACC       store it
        inc IN1       VPC := VPC + 1     point to next
        JUMP ⟨jt⟩                        jump back
sub, mul, and leq are similar to add.
c : Michael Kohlhase 300
For example, mul could be implemented as follows:
label   instruction   effect             comment
⟨mul⟩   dec IN2       SP := SP − 1
        LOADI 0
        STORE 1       D(1) := 0          initialize result
        LOADIN 2 1    ACC := D(SP + 1)   read arg 1
        STORE 0       D(0) := ACC        initialize counter to arg 1
⟨loop⟩  JUMP= ⟨end⟩                      if counter = 0, we are finished
        LOADIN 2 0    ACC := D(SP)       read arg 2
        ADD 1         ACC := ACC + D(1)  current sum increased by arg 2
        STORE 1       D(1) := ACC        cache result
        LOAD 0
        SUBI 1
        STORE 0       D(0) := D(0) − 1   decrease counter by 1
        JUMP ⟨loop⟩                      repeat addition
⟨end⟩   LOAD 1                           load result
        STOREIN 2 0                      push it on the stack
        inc IN1
        JUMP ⟨jt⟩                        back to the jump table
Note that mul is the only instruction whose corresponding piece of code is not of unit
complexity. For the jump instructions, we do exactly what we would expect: we load the jump
distance and add it to the register IN1, which we use to represent the VM program counter VPC.
Incidentally, we can reuse the code for jp in the conditional jump cjp.
Control Instructions
label   instruction   effect             comment
⟨jp⟩    MOVE IN1 ACC  ACC := VPC
        STORE 0       D(0) := ACC        cache VPC
        LOADIN 1 1    ACC := D(VPC + 1)  load i
        ADD 0         ACC := ACC + D(0)  compute new VPC value
        MOVE ACC IN1  IN1 := ACC         update VPC
        JUMP ⟨jt⟩                        jump back
⟨cjp⟩   dec IN2       SP := SP − 1       update for pop
        LOADIN 2 1    ACC := D(SP + 1)   pop value to ACC
        JUMP= ⟨jp⟩                       perform jump if ACC = 0
        MOVE IN1 ACC                     otherwise, go on
        ADDI 2
        MOVE ACC IN1  VPC := VPC + 2     point to next
        JUMP ⟨jt⟩                        jump back
c : Michael Kohlhase 301
Imperative Stack Operations: peek
label   instruction   effect             comment
⟨peek⟩  MOVE IN1 ACC  ACC := IN1
        STORE 0       D(0) := ACC        cache VPC
        LOADIN 1 1    ACC := D(VPC + 1)  load i
        MOVE ACC IN1  IN1 := ACC
        inc IN2                          prepare push
        LOADIN 1 8    ACC := D(IN1 + 8)  load S(i)
        STOREIN 2 0                      push S(i)
        LOAD 0        ACC := D(0)        load old VPC
        ADDI 2                           compute new value
        MOVE ACC IN1                     update VPC
        JUMP ⟨jt⟩                        jump back
c : Michael Kohlhase 302
Imperative Stack Operations: poke
label   instruction   effect             comment
⟨poke⟩  MOVE IN1 ACC  ACC := IN1
        STORE 0       D(0) := ACC        cache VPC
        LOADIN 1 1    ACC := D(VPC + 1)  load i
        MOVE ACC IN1  IN1 := ACC
        LOADIN 2 0    ACC := D(SP)       pop S(i)'s new value to ACC
        STOREIN 1 8   D(IN1 + 8) := ACC  store it in S(i)
        dec IN2       IN2 := IN2 − 1
        LOAD 0        ACC := D(0)        get old VPC
        ADDI 2        ACC := ACC + 2     add 2
        MOVE ACC IN1                     update VPC
        JUMP ⟨jt⟩                        jump back
c : Michael Kohlhase 303
13.2.2 A Simple Imperative Language
We will now build a compiler for a simple imperative language to warm up for the task of building
one for a functional language. We will write this compiler in SML, since we are most familiar with it.
The first step is to define the language we want to talk about.
A very simple Imperative Programming Language
Plan: Only consider the bare-bones core of a language.
(we are only interested in principles)
We will call this language SW (Simple While Language)
no types: all values have type int; use 0 for false and all other numbers for true.
only worry about abstract syntax (we do not want to build a parser); we will realize
it as an SML data type.
c : Michael Kohlhase 304
The following slide presents the SML data types for SW programs.
Abstract Syntax of SW
Definition 463 type id = string (* identifier *)
datatype exp = (* expression *)
Con of int (* constant *)
| Var of id (* variable *)
| Add of exp* exp (* addition *)
| Sub of exp * exp (* subtraction *)
| Mul of exp * exp (* multiplication *)
| Leq of exp * exp (* less or equal test *)
datatype sta = (* statement *)
Assign of id * exp (* assignment *)
| If of exp * sta * sta (* conditional *)
| While of exp * sta (* while loop *)
| Seq of sta list (* sequentialization *)
type declaration = id * exp
type program = declaration list * sta * exp
c : Michael Kohlhase 305
An SW program (see the next slide for an example) first declares a set of variables (type declaration),
executes a statement (type sta), and finally returns an expression (type exp). Expressions of SW
can read the values of variables, but cannot change them. The statements of SW can read and
change the values of variables, but do not return values (as usual in imperative languages). Note
that SW follows common practice in imperative languages and models the conditional as a state-
ment.
Concrete vs. Abstract Syntax of a SW Program
var n:= 12; var a:= 1;
while 2<=n do
a:= a*n;
n:= n-1
end
return a
([ ("n", Con 12), ("a", Con 1) ],
While(Leq(Con 2, Var"n"),
Seq [Assign("a", Mul(Var"a", Var"n")),
Assign("n", Sub(Var"n", Con 1))]
),
Var"a")
c : Michael Kohlhase 306
As expected, the program is represented as a triple: the first component is a list of declarations, the
second is a statement, and the third is an expression (in this case, the value of a single variable).
We will use this example as the guiding intuition for building a compiler.
Before we can come to the implementation of the compiler, we will need an infrastructure for
environments.
Needed Infrastructure: Environments
Need a structure to keep track of the values of declared identifiers.
(take shadowing into account)
Definition 464 An environment is a finite partial function from keys (identifiers) to
values.
We will need the following operations on environments:
creation of an empty environment (the empty function)
insertion of a key/value pair ⟨k, v⟩ into an environment ϕ (written ϕ, [v/k])
lookup of the value v for a key k in ϕ (i.e. ϕ(k))
Realization in SML by a structure with the following signature:
type 'a env (* 'a is the value type *)
exception Unbound of id (* Unbound *)
val empty : 'a env
val insert : id * 'a * 'a env -> 'a env (* id is the key type *)
val lookup : id * 'a env -> 'a
c : Michael Kohlhase 307
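A minimal realization of this signature (our sketch; the course provides its own implementation elsewhere) uses association lists, where inserting at the front naturally takes care of shadowing:
type 'a env = (id * 'a) list
exception Unbound of id
val empty : 'a env = nil
fun insert (k, v, env) = (k, v) :: env
fun lookup (k, nil) = raise Unbound k
  | lookup (k, (k', v) :: env) = if k = k' then v else lookup (k, env)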
We will also need an SML type for L(VM) programs. Fortunately, this is very simple.
An SML Data Type for L(VM) Programs
type index = int
type noi = int (* number of instructions *)
datatype instruction =
con of int
| add | sub | mul (* addition, subtraction, multiplication *)
| leq (* less or equal test *)
| jp of noi (* unconditional jump *)
| cjp of noi (* conditional jump *)
| peek of index (* push value from stack *)
| poke of index (* update value in stack *)
| halt (* halt machine *)
type code = instruction list
fun wln(con _)=2 | wln(add)=1 | wln(sub)=1 | wln(mul)=1 | wln(leq)=1
  | wln(jp _)=2 | wln(cjp _)=2
  | wln(peek _)=2 | wln(poke _)=2 | wln(halt)=1
fun wlen (xs:code) = foldl (fn (x,y) => wln(x)+y) 0 xs
c : Michael Kohlhase 308
The next slide has the main SML function for compiling SW programs. Its argument is an SW program
(type program) and its result is an expression of type code, i.e. a list of L(VM) instructions. From
there, we only need to apply a simple conversion (which we omit) to numbers to obtain L(VM)
byte code.
Compiling SW programs
SML function from SW programs (type program) to L(VM) programs (type code).
It uses three auxiliary functions for compiling declarations (compileD), statements
(compileS), and expressions (compileE);
these use an environment to relate variable names with their stack index.
The initial environment is created by the declarations.
(therefore compileD returns an environment as part of its value)
type env = index env
fun compile ((ds,s,e) : program) : code =
let
val (cds, env) = compileD(ds, empty, ~1)
in
cds @ compileS(s,env) @ compileE(e,env) @ [halt]
end
c : Michael Kohlhase 309
The next slide has the function for compiling SW expressions. It is realized as a case statement
over the structure of the expression.
Compiling SW Expressions
constants are pushed to the stack.
variables are looked up in the stack by the index determined by the environment (and
pushed to the stack).
arguments to arithmetic operations are pushed to the stack in reverse order.
fun compileE (e:exp, env:env) : code =
case e of
Con i => [con i]
| Var i => [peek (lookup(i,env))]
| Add(e1,e2) => compileE(e2, env) @ compileE(e1, env) @ [add]
| Sub(e1,e2) => compileE(e2, env) @ compileE(e1, env) @ [sub]
| Mul(e1,e2) => compileE(e2, env) @ compileE(e1, env) @ [mul]
| Leq(e1,e2) => compileE(e2, env) @ compileE(e1, env) @ [leq]
c : Michael Kohlhase 310
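As a quick check (ours), compiling n + 1 in an environment that maps "n" to stack index 0:
val env0 = insert ("n", 0, empty)
val c = compileE (Add (Var "n", Con 1), env0)
(* c = [con 1, peek 0, add]: the second argument is compiled first, so the
   arguments end up on the stack in the order the VM instructions expect *)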
Compiling SW statements is only slightly more complicated: the constituent statements and ex-
pressions are compiled first, and then the resulting code fragments are combined by L(VM) control
instructions (as the fragments already exist, the relative jump distances can just be looked up).
For a sequence of statements, we just map compileS over it using the respective environment.
Compiling SW Statements
fun compileS (s:sta, env:env) : code =
case s of
Assign(i,e) => compileE(e, env) @ [poke (lookup(i,env))]
| If(e,s1,s2) =>
let
val ce = compileE(e, env)
val cs1 = compileS(s1, env)
val cs2 = compileS(s2, env)
in
ce @ [cjp (wlen cs1 + 4)] @ cs1 @ [jp (wlen cs2 + 2)] @ cs2
end
| While(e, s) =>
let
val ce = compileE(e, env)
val cs = compileS(s, env)
in
ce @ [cjp (wlen cs + 4)] @ cs @ [jp (~(wlen cs + wlen ce + 2))]
end
| Seq ss => foldr (fn (s,c) => compileS(s,env) @ c) nil ss
c : Michael Kohlhase 311
As we anticipated above, the compileD function is more complex than the other two. It yields an
L(VM) program fragment and an environment as its value and takes a stack index as an additional
argument. For every declaration, it extends the environment by the key/value pair k/v, where k
is the variable name and v is the next stack index (it is incremented for every declaration). Then
the expression of the declaration is compiled and prepended to the value of the recursive call.
Compiling SW Declarations
fun compileD (ds: declaration list, env:env, sa:index): code*env =
case ds of
nil => (nil,env)
| (i,e)::dr => let
val env = insert(i, sa+1, env)
val (cdr,env) = compileD(dr, env, sa+1)
in
(compileE(e,env) @ cdr, env)
end
c : Michael Kohlhase 312
This completes the compiler for SW (except for the byte code generator, which is trivial, and an
implementation of environments, which is available elsewhere). So, together with the virtual
machine for L(VM) we discussed above, we can run SW programs on the register machine REMA.
If we now use the REMA simulator from the exercises, then we can run SW programs on our
computers outright.
One thing that distinguishes SW from real programming languages is that it does not support
procedure declarations. This does not make the language less expressive in principle, but it makes
structured programming much harder. The reason we did not introduce this is that our virtual
machine does not have a good infrastructure that supports it. Therefore we will extend L(VM)
with new operations next.
Note that the compiler we have seen above produces L(VM) programs that have what is often
called memory leaks: variables that we declare in our SW program are not cleaned up before the
program halts. In the current implementation we will not fix this (we would need an instruction
for our VM that pops a variable without storing it anywhere, or one that simply decreases the
virtual stack pointer by a given value), but we will get a better understanding of this when we
talk about static procedures next.
Compiling the Extended Example: A while Loop
Example 465 Consider the following program that computes 12! and the correspond-
ing L(VM) program:
var n := 12; var a := 1;      con 12 con 1
while 2 <= n do (             peek 0 con 2 leq cjp 18
  a := a * n;                 peek 0 peek 1 mul poke 1
  n := n - 1;                 con 1 peek 0 sub poke 0
)                             jp −21
return a;                     peek 1 halt
Note that variable declarations only push the values to the stack; (memory allocation)
they are referenced by peeking the respective stack position, and
they are assigned by poking the stack position (we must remember which one).
c : Michael Kohlhase 313
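It is instructive to check the jump offsets against the compileS clauses above (our calculation): the loop condition compiles to ce = peek 0 con 2 leq with wlen ce = 5, and the loop body to cs with wlen cs = 14 (seven words for each of the two assignments). Hence the compiler emits cjp (wlen cs + 4) = cjp 18 to jump over the body and the backward jump, and jp (−(wlen cs + wlen ce + 2)) = jp −21 to jump back to the condition.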
Definition 466 In general, we need an environment and an instruction sequence to represent a
procedure, but in many cases, we can get by with an instruction sequence alone. We speak of
static procedures in this case.
Example 467 Some programming languages like C or Pascal are designed so that all procedures
can be represented as static procedures. SML and Java do not restrict themselves in this way.
We will now extend the virtual machine by four instructions that allow us to represent static proce-
dures with arbitrary numbers of arguments. We will explain the meaning of these extensions via
an example: the procedure on the next slide, which computes 10^2.
Adding (Static) Procedures
We have a full compiler for a very simple imperative programming language
Problem: No support for subroutines/procedures.
(no support for structured programming)
Extensions to the Virtual Machine
type index = int
type noi = int (* number of instructions *)
type noa = int (* number of arguments *)
type ca = int (* code address *)
datatype instruction =
| proc of noa*noi (* begin of procedure code *)
| arg of index (* push value from frame *)
| call of ca (* call procedure *)
| return (* return from procedure call *)
c : Michael Kohlhase 314
New Commands for L(VM)
Definition 468 proc a l contains information about the number a of arguments and
the length l of the procedure, measured in the number of words needed to store it,
including the length of proc a l itself (3).
Definition 469 arg i pushes the i-th argument from the current frame to the stack.
Definition 470 call p pushes the current program address (opens a new frame), and
jumps to the program address p.
Definition 471 return takes the current frame from the stack and jumps to the previous
program address.
c : Michael Kohlhase 315
Translation of a Static Procedure
Example 472
[proc 2 26,                   (* fun exp(x,n) =      *)
 con 0, arg 2, leq, cjp 5,    (*   if n<=0           *)
 con 1, return,               (*   then 1            *)
 con 1, arg 2, sub, arg 1,    (*   else x*exp(x,n-1) *)
 call 0, arg 1, mul,
 return,                      (* in                  *)
 con 2, con 10, call 0,       (*   exp(10,2)         *)
 halt]                        (* end                 *)
c : Michael Kohlhase 316
Static Procedures (Simulation)
Example 473 We trace the L(VM) program of Example 472 through the virtual machine;
it computes exp(10, 2) = 100.
[proc 2 26,
 con 0, arg 2, leq, cjp 5,
 con 1, return,
 con 1, arg 2, sub, arg 1,
 call 0, arg 1, mul,
 return,
 con 2, con 10, call 0,
 halt]
(The slides animate this trace with a stack snapshot after every step; only the narration
is reproduced here.)
We start with the empty stack. proc jumps over the body of the procedure declaration
(with the help of its second argument).
con 2 and con 10 push the arguments onto the stack.
call 0 pushes the return address (of the call statement in the L(VM) program; 32 here)
and jumps to the first body instruction.
con 0 pushes 0, and arg 2 pushes the second argument (n = 2) onto the stack.
The comparison turns out false, so leq pushes 0; cjp pops the truth value and jumps
(on false) into the else branch.
We first push 1, then we push the second argument (from the call frame position 2),
and we subtract: n − 1 = 1 now lies on top of the stack.
Then we push the first argument (from the call frame position 1).
call jumps to the first body instruction and pushes the return address (22 this time)
onto the stack: this is the recursive call exp(10, 1).
We augment the stack with 0 and the current second argument (n = 1), compare the
top two, and jump ahead (on false) into the else branch again.
We augment the stack again, subtract, and push the first argument; call pushes the
return address and moves the current frame up: the recursive call exp(10, 0).
We augment the stack once more with 0 and n = 0; this time leq compares the top
two numbers and pushes 1, and cjp pops the result and does not jump.
We push the result value 1 (the base case of the recursion).
return interprets the top of the stack as the result: it jumps to the return address
memorized right below the top of the stack, deletes the current frame, and puts the
result back on top of the remaining stack.
arg pushes the first argument (10) from the (new) current frame; mul multiplies, pops
the arguments, and pushes the result 10.
return again deletes the current frame and puts the result back on top of the remaining
stack.
We push argument 1 (in this case 10) once more, multiply the top two numbers, and
push the result 100 to the stack.
return interprets the top of the stack as the result: it jumps to the return address (32
this time), deletes the current frame, and puts the result back on top of the remaining
stack (which is empty here).
We are finally done; the result 100 is on the top of the stack. Note that the stack below
has not changed.
c : Michael Kohlhase 317
What have we seen?
The four new VM commands allow us to model static procedures.
proc a l contains information about the number a of arguments and the length l of the
procedure
arg i pushes the i-th argument from the current frame to the stack.
(note that arguments are stored in reverse order on the stack)
call p pushes the current program address (opens a new frame), and jumps to the pro-
gram address p
return takes the current frame from the stack and jumps to the previous program address
(which is cached in the frame).
call and return jointly have the effect of replacing the arguments by the result of the
procedure.
c : Michael Kohlhase 318
We will now extend our implementation of the virtual machine by the new instructions.
Realizing Call Frames on the Stack
Problem: How do we know what the current frame is? (after all, return has to pop it)
Idea: Maintain another register, the frame pointer (FP), and cache information about
the previous frame and the number of arguments in the frame.
[Diagram: frame layout on the stack: the arguments in reverse order (the last argument
lowest, the first argument at the top of the argument block), above them the argument
number and the previous frame address, and on top the return address; the frame
pointer points to the anchor cell.]
Add two internal cells to the frame that are hidden to the outside. The upper one is
called the anchor cell.
In the anchor cell we store the stack address of the anchor cell of the previous frame.
The frame pointer points to the anchor cell of the uppermost frame.
c : Michael Kohlhase 319
Realizing proc
proc a l jumps over the procedure with the help of the length l of the procedure.
label   instruction   effect             comment
⟨proc⟩  MOVE IN1 ACC  ACC := VPC
        STORE 0       D(0) := ACC        cache VPC
        LOADIN 1 2    ACC := D(VPC + 2)  load length l
        ADD 0         ACC := ACC + D(0)  compute new VPC value
        MOVE ACC IN1  IN1 := ACC         update VPC
        JUMP ⟨jt⟩                        jump back
c : Michael Kohlhase 320
Realizing arg
arg i pushes the i-th argument from the current frame to the stack.
We use the register IN3 for the frame pointer. (extend for the first frame)
label   instruction   effect             comment
⟨arg⟩   LOADIN 1 1    ACC := D(VPC + 1)  load i
        STORE 0       D(0) := ACC        cache i
        MOVE IN3 ACC
        STORE 1       D(1) := FP         cache FP
        SUBI 1
        SUB 0         ACC := FP − 1 − i  compute argument position
        MOVE ACC IN3  FP := ACC          move it to FP
        inc IN2       SP := SP + 1       prepare push
        LOADIN 3 0    ACC := D(FP)       load arg i
        STOREIN 2 0   D(SP) := ACC       push arg i
        LOAD 1        ACC := D(1)        load FP
        MOVE ACC IN3  FP := ACC          recover FP
        MOVE IN1 ACC
        ADDI 2
        MOVE ACC IN1  VPC := VPC + 2     next instruction
        JUMP ⟨jt⟩                        jump back
c : Michael Kohlhase 321
Realizing call
call p pushes the current program address, and jumps to the program address p
(pushing the internal cells first!)
label   instruction    effect                 comment
⟨call⟩  MOVE IN1 ACC
        STORE 0        D(0) := ACC            cache current VPC
        inc IN2        SP := SP + 1           prepare push for later
        LOADIN 1 1     ACC := D(VPC + 1)      load argument p
        ADDI 2^24 + 3  ACC := ACC + 2^24 + 3  add displacement and skip proc a l
        MOVE ACC IN1   VPC := ACC             point to the first instruction
        LOADIN 1 −2    ACC := D(VPC − 2)      stealing a from proc a l
        STOREIN 2 0    D(SP) := ACC           push the number of arguments
        inc IN2        SP := SP + 1           prepare push
        MOVE IN3 ACC   ACC := IN3             load FP
        STOREIN 2 0    D(SP) := ACC           create the anchor cell
        MOVE IN2 IN3   FP := SP               update FP
        inc IN2        SP := SP + 1           prepare push
        LOAD 0         ACC := D(0)            load VPC
        ADDI 2         ACC := ACC + 2         point to the next instruction
        STOREIN 2 0    D(SP) := ACC           push the return address
        JUMP ⟨jt⟩                             jump back
c : Michael Kohlhase 322
Note that with these instructions we have maintained the linear quality: each realization is still a
loop-free code segment. Thus the virtual machine still runs at a speed proportional to that of the
underlying register machine REMA.
Realizing return
return takes the current frame from the stack and jumps to the previous program address
(which is cached in the frame).
label     instruction   effect             comment
⟨return⟩  LOADIN 2 0    ACC := D(SP)       load top value (the result)
          STORE 0       D(0) := ACC        cache it
          LOADIN 2 −1   ACC := D(SP − 1)   load return address
          MOVE ACC IN1  IN1 := ACC         set VPC to it
          LOADIN 3 −1   ACC := D(FP − 1)   load the number n of arguments
          STORE 1       D(1) := D(FP − 1)  cache it
          MOVE IN3 ACC  ACC := FP          ACC = FP
          SUBI 1        ACC := ACC − 1     ACC = FP − 1
          SUB 1         ACC := ACC − D(1)  ACC = FP − 1 − n
          MOVE ACC IN2  IN2 := ACC         SP = ACC
          LOADIN 3 0    ACC := D(FP)       load anchor value
          MOVE ACC IN3  IN3 := ACC         point to the previous frame
          LOAD 0        ACC := D(0)        load the cached result value
          STOREIN 2 0   D(SP) := ACC       push it back on the stack
          JUMP ⟨jt⟩                        jump back
c : Michael Kohlhase 323
Note that (with the exception of mul) all the realizations of the L(VM) instructions are linear code
segments in the assembler code, so they can be executed in constant time. Thus the virtual machine
language is only a constant factor slower than the clock speed of REMA. This is a characteristic of
most virtual machines.
13.2.3 Compiling Basic Functional Programs
We now have the prerequisites to model procedure calls in a programming language. Instead of
adding them to an imperative programming language, we will study them in the context of a func-
tional programming language. For this we choose a minimal core of the functional programming
language SML, which we will call µML. For this language, static procedures as we have seen them
above are enough.
µML, a very simple Functional Programming Language
Plan: Only consider the bare-bones core of a language (we are only interested in principles)
We will call this language µML (micro ML)
no types: all values have type int; use 0 for false and all other numbers for true.
only worry about abstract syntax (we do not want to build a parser); we will realize
it as an SML data type.
c : Michael Kohlhase 324
Abstract Syntax of µML
type id = string                (* identifier *)
datatype exp =                  (* expression *)
   Con of int                   (* constant *)
 | Id of id                     (* argument *)
 | Add of exp * exp             (* addition *)
 | Sub of exp * exp             (* subtraction *)
 | Mul of exp * exp             (* multiplication *)
 | Leq of exp * exp             (* less or equal test *)
 | App of id * exp list         (* application *)
 | If of exp * exp * exp        (* conditional *)
type declaration = id * id list * exp
type program = declaration list * exp
c : Michael Kohlhase 325
Concrete vs. Abstract Syntax of µML
A µML program first declares procedures, then evaluates an expression for the return value.
let
fun exp(x,n) =
if n<=0
then 1
else x*exp(x,n-1)
in
exp(2,10)
end
([
("exp", ["x", "n"],
If(Leq(Id"n", Con 0),
Con 1,
Mul(Id"x", App("exp", [Id"x", Sub(Id"n", Con 1)]))))
],
App("exp", [Con 2, Con 10])
)
c : Michael Kohlhase 326
The next step is to build a compiler for µML into programs in the extended L(VM). Just as above,
we will write this compiler in SML.
Compiling µML Expressions
exception Error of string
datatype idType = Arg of index | Proc of ca
type env = idType env
fun compileE (e:exp, env:env, tail:code) : code =
case e of
Con i => [con i] @ tail
| Id i => [arg((lookupA(i,env)))] @ tail
| Add(e1,e2) => compileEs([e1,e2], env) @ [add] @ tail
| Sub(e1,e2) => compileEs([e1,e2], env) @ [sub] @ tail
| Mul(e1,e2) => compileEs([e1,e2], env) @ [mul] @ tail
| Leq(e1,e2) => compileEs([e1,e2], env) @ [leq] @ tail
| If(e1,e2,e3) => let
val c1 = compileE(e1,env,nil)
val c2 = compileE(e2,env,tail)
val c3 = compileE(e3,env,tail)
in if null tail
then c1 @ [cjp (4+wlen c2)] @ c2
@ [jp (2+wlen c3)] @ c3
else c1 @ [cjp (2+wlen c2)] @ c2 @ c3
end
| App(i, es) => compileEs(es,env) @ [call (lookupP(i,env))] @ tail
c : Michael Kohlhase 327
Compiling µML Expressions (Continued)
and (* mutual recursion with compileE *)
    compileEs (es : exp list, env:env) : code =
      foldl (fn (e,c) => compileE(e, env, nil) @ c) nil es
fun lookupA (i,env) =
  case lookup(i,env) of
      Arg i => i
    | _ => raise Error("Argument expected: " ^ i)
fun lookupP (i,env) =
  case lookup(i,env) of
      Proc ca => ca
    | _ => raise Error("Procedure expected: " ^ i)
c : Michael Kohlhase 328
Compiling µML Expressions (Continued)
fun insertArg (i, (env, ai)) = (insert(i, Arg ai, env), ai+1)
fun insertArgs (is, env) = #1 (foldl insertArg (env,1) is) (* keep only the environment *)
fun compileD (ds: declaration list, env:env, ca:ca) : code*env =
case ds of
nil => (nil,env)
| (i,is,e)::dr =>
let
val env = insert(i, Proc(ca+1), env)
val env = insertArgs(is, env)
val ce = compileE(e, env, [return])
val cd = [proc (length is, 3+wlen ce)] @ ce
(* 3+wlen ce = wlen cd *)
val (cdr,env) = compileD(dr, env, ca + wlen cd)
in
(cd @ cdr, env)
end
c : Michael Kohlhase 329
Compiling µML
fun compile ((ds,e) : program) : code =
let
val (cds,env) = compileD(ds, empty, ~1)
in
cds @ compileE(e,env,nil) @ [halt]
end
handle
Unbound i => raise Error("Unbound identifier: " ^ i)
c : Michael Kohlhase 330
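To see the compiler at work, we can run it on the abstract syntax of the exponentiation program
from above. The following is a minimal usage sketch, assuming the declarations above are in scope;
the names exampleProgram and objectCode are ours:
val exampleProgram : program =
  ([("exp", ["x", "n"],
     If(Leq(Id "n", Con 0),
        Con 1,
        Mul(Id "x", App("exp", [Id "x", Sub(Id "n", Con 1)]))))],
   App("exp", [Con 2, Con 10]))
val objectCode = compile exampleProgram   (* the L(VM) code that computes 2^10 *)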
Where To Go Now?
We have completed a µML compiler, which generates L(VM) code from µML programs.
µML is minimal, but Turing-complete. (it has conditionals and procedures)
c : Michael Kohlhase 331
13.3 A Theoretical View on Computation
Now that we have seen a couple of models of computation, computing machines, programs, . . . ,
we should pause a moment and see what we have achieved.
204
What have we achieved
what have we done? We have sketched
a concrete machine model (combinational circuits)
a concrete algorithm model (assembler programs)
Evaluation: (is this good?)
how does it compare with SML on a laptop?
Can we compute all (string/numerical) functions in this model?
Can we always prove that our programs do the right thing?
Towards Theoretical Computer Science (as a tool to answer these)
look at a much simpler (but less concrete) machine model (Turing Machine)
show that TM can [encode/be encoded in] SML, assembler, Java,. . .
Conjecture 474 [Church/Turing] (unprovable, but accepted)
All non-trivial machine models and programming languages are equivalent
c : Michael Kohlhase 332
The idea we are going to pursue here is a very fundamental one for Computer Science: the Turing
Machine. The main idea is to explore what the simplest (whatever that
may mean) computing machine could be. The answer is quite surprising: we do not need wires,
electricity, silicon, etc.; we only need a very simple machine that can write to and read from a tape,
following a simple set of rules.
Of course such machines can be built (and have been), but this is not the important aspect here.
Turing machines are mainly used for thought experiments, where we simulate them in our heads.
Note that the physical realization of the machine as a box with a (paper) tape is immaterial; it
is inspired by the technology at the time of its inception (the 1930s; the age of ticker-tape
communication).
205
Turing Machines
Idea: Simulate a machine by a person executing a well-dened procedure!
Setup: A person changes the contents of an infinite supply of ordered paper sheets, each of which
can contain one of a finite set of symbols.
Memory: The person needs to remember one of a finite set of states.
Procedure: "If your state is 42 and the symbol you see is a 0, then replace it with a
1, remember the state 17, and go to the following sheet."
c : Michael Kohlhase 333
(Diagram: a hypothetical loop detector is a Turing machine that takes the coded description
of some TM together with an input for that TM and answers "yes, it will halt" or
"no, it will not halt".)
Proof:
P.1 Let us do the argument with SML instead of a TM:
assume that there is a loop detector program written in SML.
(Diagram: analogously, the SML loop detector takes an SML program and an input for that
program and answers "yes, it will halt" or "no, it will not halt".)
c : Michael Kohlhase 338
Testing the Loop Detector Program
Proof:
P.1 The general shape of the loop detector program:
fun will_halt(program,data) =
  ... lots of complicated code ...
  if ( ... more code ...) then true else false;
will_halt : (int -> int) * int -> bool
Test programs behave exactly as we anticipated:
fun halter (n) = 1;
halter : int -> int
fun looper (n) = looper(n+1);
looper : int -> int
will_halt(halter,1);
val it = true : bool
will_halt(looper,1);
val it = false : bool
P.2 Consider the following program:
fun turing (prog) = if will_halt(prog,prog) then looper(1) else 1;
P.3 Yeah, so what? What happens if we feed the turing function to itself?
c : Michael Kohlhase 339
What happens indeed?
Proof:
P.1 fun turing (prog) = if will_halt(prog,prog) then looper(1) else 1;
The turing function uses will_halt to analyze the function given to it:
If the function halts when fed itself as data, the turing function goes into an infinite
loop.
If the function goes into an infinite loop when fed itself as data, the turing function
immediately halts.
P.2 But if the function happens to be the turing function itself, then
the turing function goes into an infinite loop if the turing function halts
(when fed itself as input), and
the turing function halts if the turing function goes into an infinite loop
(when fed itself as input).
P.3 This is a blatant logical contradiction! Thus there cannot be a will_halt function.
c : Michael Kohlhase 340
Universal Turing machines
Note: A Turing machine computes a fixed partial string function.
In that sense it behaves like a computer with a fixed program.
Idea: we can encode the action table of any Turing machine in a string.
Try to construct a Turing machine that expects on its tape
a string describing an action table, followed by
a string describing the input tape, and then
computes the tape that the encoded Turing machine would have computed.
Theorem 479 Such a Turing machine is indeed possible. (e.g. with 2 states and 18 symbols)
Definition 480 We call it a universal Turing machine. (it can simulate any TM)
A UTM accepts a coded description of a Turing machine and simulates the behavior of
the machine on the input data.
The coded description acts as a program that the UTM executes; the UTM's own
internal program is fixed.
The existence of the UTM is what makes computers fundamentally different from other
machines such as telephones, CD players, VCRs, refrigerators, toaster-ovens, or cars.
c : Michael Kohlhase 341
14 Problem Solving and Search
14.1 Problem Solving
In this section, we will look at a class of algorithms called search algorithms. These are algorithms
that help in quite general situations, where there is a precisely described problem that needs to
be solved.
Before we come to the algorithms, we need to get a grip on the problems themselves and the
problem solving process.
The first step is to classify the problem solving process by the amount of knowledge we have
available. It makes a difference whether we know all the factors involved in the problem before
we are actually in the situation. In this case, we can solve the problem in the abstract, i.e. make
a plan before we actually enter the situation (i.e. offline), and then, when the problem arises, only
execute the plan. If we do not have complete knowledge, then we can only make partial plans, and
have to be in the situation to obtain new knowledge (e.g. by observing the effects of our actions or
the actions of others). As this is much more difficult, we will restrict ourselves to offline problem
solving.
Problem solving
Problem: Find algorithms that help solving problems in general.
Idea: If we can describe/represent problems in a standardized way, we may have a chance
to find general algorithms.
We will use the following two concepts to describe problems:
States: A set of possible situations in our problem domain.
Actions: A set of possible actions that get us from one state to another.
Using these, we can view a sequence of actions as a solution if it brings us into a situation
where the problem is solved.
Definition 481 Offline problem solving: Acting only with complete knowledge of prob-
lem and solution.
Definition 482 Online problem solving: Acting without complete knowledge.
Here: we are concerned with offline problem solving only.
c : Michael Kohlhase 342
We will use the following problem as a running example. It is simple enough to t on one slide
and complex enough to show the relevant features of the problem solving algorithms we want to
talk about.
Example: Traveling in Romania
Scenario: On holiday in Romania; currently in Arad; the flight leaves tomorrow from
Bucharest.
Formulate problem: States: the various cities; Actions: drive between cities.
Solution: An appropriate sequence of cities, e.g.: Arad, Sibiu, Fagaras, Bucharest.
c : Michael Kohlhase 343
Problem Formulation
The problem formulation models the situation at an appropriate level of abstraction.
(do not model things like "put on my left sock", etc.)
it describes the initial state (we are in Arad)
it also limits the objectives. (excludes, e.g., the option to stay another couple of weeks.)
Finding the right level of abstraction and the required (not more!) information is often
the key to success.
Definition 483 A problem (formulation) P := ⟨S, O, I, G⟩ consists of a set S of states
and a set O of operators that specify how states can be accessed from each other. Certain
states in S are designated as goal states (G ⊆ S) and there is a unique initial state I.
Definition 484 A solution for a problem P consists of a sequence of actions that bring
us from I to a goal state.
c : Michael Kohlhase 344
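To make Definition 483 concrete, here is a sketch of a problem formulation as an SML record;
the representation and the field names are ours, not part of the definition:
(* a problem formulation <S, O, I, G>: the state set S is the type 'a,
   successors realizes the operators O, isGoal the goal states G *)
type 'a problem = {initial    : 'a,            (* the initial state I *)
                   successors : 'a -> 'a list, (* the operators O     *)
                   isGoal     : 'a -> bool}    (* the goal test for G *)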
Problem types
Single-state problem
observable (at least the initial state)
deterministic (i.e. the successor of each state is determined)
static (states do not change other than by our own actions)
discrete (a countable number of states)
Multiple-state problem:
initial state not/partially observable (multiple initial states?)
deterministic, static, discrete
Contingency problem:
non-deterministic (solution can branch, depending on contingencies)
unknown state space (like a baby, agent has to learn about states and actions)
c : Michael Kohlhase 345
We will explain these problem types with another example. The problem P is very simple: We
have a vacuum cleaner and two rooms. The vacuum cleaner is in one room at a time. The floor
can be dirty or clean.
The possible states are determined by the position of the vacuum cleaner and the information
whether each room is dirty or not. Obviously, there are eight states: S = {1, 2, 3, 4, 5, 6, 7, 8} for
simplicity.
The goal is to have both rooms clean; the vacuum cleaner can be anywhere. So the set G of
goal states is {7, 8}. In the single-state version of the problem, [right, suck] is the shortest solution,
but [suck, right, suck] is also one. In the multiple-state version we have
[right → {2, 4, 6, 8}, suck → {4, 8}, left → {3, 7}, suck → {7}].
Example: vacuum-cleaner world
Single-state Problem:
Start in 5
Solution: [right, suck]
Multiple-state Problem:
Start in {1, 2, 3, 4, 5, 6, 7, 8}
Solution: [right, suck, left, suck]    right → {2, 4, 6, 8}, suck → {4, 8}, left → {3, 7}, suck → {7}
c : Michael Kohlhase 346
Example: vacuum-cleaner world (continued)
Contingency Problem:
Murphy's Law: suck can dirty a clean carpet
Local sensing: dirty/not dirty at the current location only
Start in: {1, 3}
Solution: [suck, right, suck]    suck → {5, 7}, right → {6, 8}, suck → {6, 8}
better: [suck, right, if dirt then suck]    (decide whether in 6 or 8 using local sensing)
c : Michael Kohlhase 347
In the contingency version of P a solution is the following: [suck → {5, 7}, right → {6, 8}, suck → {6, 8}].
Of course, local sensing can help: narrow {6, 8} down to 6 or 8; if we are in the
first, then suck.
Single-state problem formulation
Defined by the following four items:
1. Initial state (e.g. Arad)
2. Successor function S (e.g. S(Arad) = {⟨goZer, Zerind⟩, ⟨goSib, Sibiu⟩, . . .})
3. Goal test (e.g. x = Bucharest (explicit test); noDirt(x) (implicit test))
4. Path cost (optional) (e.g. sum of distances, number of operators executed, etc.)
Solution: A sequence of operators leading from the initial state to a goal state.
c : Michael Kohlhase 348
Path cost: There may be more than one solution and we might want to have the best one in
a certain sense.
Selecting a state space
Abstraction: The real world is absurdly complex, so the
state space must be abstracted for problem solving.
(Abstract) state: Set of real states.
(Abstract) operator: Complex combination of real actions.
Example: "Arad → Zerind" represents a complex set of possible routes.
(Abstract) solution: Set of real paths that are solutions in the real world.
c : Michael Kohlhase 349
State: e.g., we don't care about tourist attractions found in the cities along the way. But this is
problem dependent. In a different problem it may well be appropriate to include such information
in the notion of state.
Realizability: one could also say that the abstraction must be sound wrt. reality.
Example: The 8-puzzle
States: integer locations of tiles
Actions: left, right, up, down
Goal test: = goal state?
Path cost: 1 per move
c : Michael Kohlhase 350
How many states are there? N factorial, so it is not obvious that the problem is in NP. One
needs to show, for example, that polynomial-length solutions always exist. This can be done by
combinatorial arguments on the state space graph. (really?)
Example: Vacuum-cleaner
States: integer dirt and robot locations
Actions: left, right, suck, noOp
Goal test: notdirty?
Path cost: 1 per operation (0 for noOp)
c : Michael Kohlhase 351
Example: Robotic assembly
States: real-valued coordinates of robot joint angles and parts of the object to be assembled
Actions: continuous motions of robot joints
Goal test: assembly complete?
Path cost: time to execute
c : Michael Kohlhase 352
14.2 Search
Tree search algorithms
Simulated exploration of the state space in a search tree by generating successors of already-
explored states (offline algorithm)
procedure Tree-Search (problem, strategy) : <a solution or failure>
<initialize the search tree using the initial state of problem>
loop
if <there are no candidates for expansion> <return failure> end if
<choose a leaf node for expansion according to strategy>
if <the node contains a goal state> return <the corresponding solution>
else <expand the node and add the resulting nodes to the search tree>
end if
end loop
end procedure
c : Michael Kohlhase 353
Tree Search: Example
(Four snapshots of the search tree for Romania: the root Arad is expanded to Sibiu, Timisoara,
and Zerind; these in turn have the successors Arad, Fagaras, Oradea, R. Vilcea; Arad, Lugoj;
and Oradea, Arad.)
c : Michael Kohlhase 354-357
Implementation: States vs. nodes
State: a (representation of) a physical configuration.
Node: a data structure constituting part of a search tree
(includes parent, children, depth, path cost, etc.)
c : Michael Kohlhase 358
Implementation of search algorithms
procedure Tree_Search (problem,strategy)
fringe := insert(make_node(initial_state(problem)))
loop
if fringe <is empty> fail end if
node := first(fringe,strategy)
if NodeTest(State(node)) return State(node)
else fringe := insert_all(expand(node,problem),strategy)
end if
end loop
end procedure
Definition 485 The fringe is a list of nodes that have not been considered yet. It is
ordered by the search strategy. (see below)
c : Michael Kohlhase 359
State gives the state that is represented by node.
Expand creates new nodes by applying all operators of the problem to node; it yields a set of new nodes.
A node is a data structure representing states; it will be explained in a moment.
Make-Queue creates a queue with the given elements.
fringe holds the queue of nodes not yet considered.
Remove-First returns the first element of the queue and, as a side effect, removes it from fringe.
Insert inserts an element into the current fringe queue. This can change the behavior of the
search.
Insert-All performs Insert on a set of elements.
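The following is a minimal executable sketch of this algorithm in SML, using the 'a problem
record sketched after Definition 483; all names are ours, and for simplicity it returns a goal
state instead of the solution path:
(* generic tree search: insertAll decides where new nodes enter the fringe *)
fun treeSearch (insertAll : 'a list * 'a list -> 'a list)
               (p : 'a problem) : 'a option =
  let
    fun loop nil = NONE                        (* no candidates for expansion *)
      | loop (n :: rest) =
          if #isGoal p n then SOME n           (* goal test on the chosen node *)
          else loop (insertAll (#successors p n, rest))
  in
    loop [#initial p]
  end
(* the insertion strategy determines the search (see the next sections):
   new nodes at the back give breadth-first, at the front depth-first search *)
fun bfs p = treeSearch (fn (new, fringe) => fringe @ new) p
fun dfs p = treeSearch (fn (new, fringe) => new @ fringe) p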
Search strategies
Strategy: Defines the order of node expansion.
Important properties of strategies:
completeness: does it always find a solution if one exists?
time complexity: number of nodes generated/expanded
space complexity: maximum number of nodes in memory
optimality: does it always find a least-cost solution?
Time and space complexity are measured in terms of:
b: maximum branching factor of the search tree
d: depth of a solution with minimal distance to root
m: maximum depth of the state space (may be ∞)
c : Michael Kohlhase 360
Complexity here always means worst-case complexity.
Note that there can be infinite branches; see the search tree for Romania.
14.3 Uninformed Search Strategies
Uninformed search strategies
Definition 486 (Uninformed search) Use only the information available in the
problem definition.
Frequently used strategies:
Breadth-first search
Uniform-cost search
Depth-first search
Depth-limited search
Iterative deepening search
c : Michael Kohlhase 361
The opposite of uninformed search is informed or heuristic search. In the example, one could add,
for instance, a preference for cities that lie in the general direction of the goal (here SE).
Uninformed search is important, because many problems do not allow us to extract good heuris-
tics.
Breadth-First Search
Idea: Expand the shallowest unexpanded node.
Implementation: the fringe is a FIFO queue, i.e. successors go in at the end of the queue.
(Six snapshots of breadth-first expansion on the complete binary tree with nodes A-O: first A
is expanded, then B and C, then D, E, F, G, proceeding level by level until the leaves H-O are
reached.)
c : Michael Kohlhase 362-367
We will now apply the breadth-first search strategy to our running example: Traveling in Romania.
Note that we leave out the green dashed nodes that give us a preview of what the search tree
will look like (if expanded). This gives a much more compact presentation.
Breadth-First Search: Romania
(Snapshots of breadth-first expansion: Arad; then its successors Sibiu, Timisoara, Zerind; then
Sibiu's successors Arad, Fagaras, Oradea, R. Vilcea; then Timisoara's successors Arad, Lugoj;
then Zerind's successors Oradea, Arad.)
c : Michael Kohlhase 368-372
Breadth-first search: Properties
Complete: Yes (if b is finite)
Time: 1 + b + b^2 + b^3 + ... + b^d + b(b^d - 1) ∈ O(b^(d+1)), i.e. exponential in d
Space: O(b^(d+1)) (keeps every node in memory)
Optimal: Yes (if cost = 1 per step); not optimal in general
Disadvantage: Space is the big problem. (can easily generate nodes at 5MB/sec, so 24hrs = 430GB)
Optimal?: if the cost varies for different steps, there might be better solutions below the
level of the first solution.
An alternative is to generate all solutions and then pick an optimal one. This works only
if m is finite.
c : Michael Kohlhase 373
The next idea is to let cost drive the search. For this, we will need a non-trivial cost function: we
will take the distance between cities, since this is very natural. Alternatives would be the driving
time, train ticket cost, or the number of tourist attractions along the way.
Of course we need to update our problem formulation with the necessary information.
Romania with Step Costs as Distances
c : Michael Kohlhase 374
Uniform-cost search
Idea: Expand the least-cost unexpanded node.
Implementation: the fringe is a queue ordered by increasing path cost.
Note: Equivalent to breadth-first search if all step costs are equal. (DFS: see below)
(Snapshot: the root node Arad.)
c : Michael Kohlhase 375
Uniform Cost Search: Romania
Idea: Expand the least-cost unexpanded node.
Implementation: the fringe is a queue ordered by increasing path cost.
Note: Equivalent to breadth-first search if all step costs are equal. (DFS: see below)
(Snapshots of uniform-cost expansion with step costs at the nodes: Arad expands to Sibiu (140),
Timisoara (118), Zerind (75); the cheapest node Zerind expands to Oradea (71) and Arad (75);
then Timisoara expands to Arad (118) and Lugoj (111); finally Sibiu expands to Arad (140),
Fagaras (99), Oradea (151), and R. Vilcea (80).)
c : Michael Kohlhase 376-379
Note that we must sum the distances to each leaf. That is why we go back to the first level after
step 3.
Uniform-cost search: Properties
Complete: Yes (if step costs > 0)
Time: number of nodes with path cost less than that of the optimal solution
Space: number of nodes with path cost less than that of the optimal solution
Optimal: Yes
c : Michael Kohlhase 380
If step costs can be negative, the same situation as in breadth-first search can occur: later solutions
may be cheaper than the current one.
If step cost is 0, one can run into infinite branches. UC search then degenerates into depth-first
search, the next kind of search algorithm. Even if we have infinite branches where the sum of
step costs converges, we can get into trouble.
The worst case is often worse than BF search, because large trees with small steps tend to be
searched first. If step costs are uniform, it degenerates to BF search.
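As a sketch of how uniform-cost search arises from the generic tree search above: if nodes are
pairs of a state and its accumulated path cost g, the strategy just keeps the fringe sorted by g,
so the cheapest node is expanded first. The function names are ours:
(* insert a node into a fringe kept sorted by ascending path cost g *)
fun insertByCost (n as (_, g : real), nil) = [n]
  | insertByCost (n as (_, g), (m as (_, g')) :: rest) =
      if g <= g' then n :: m :: rest
      else m :: insertByCost (n, rest)
fun insertAllByCost (new, fringe) = foldl insertByCost fringe new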
Depth-rst search
Idea: Expand the deepest unexpanded node.
Implementation: the fringe is a LIFO queue (a stack), i.e. successors go in at the front of the queue.
Note: Depth-first search can perform infinite cyclic excursions.
Need a finite, non-cyclic search space (or repeated-state checking)
c : Michael Kohlhase 381
Depth-First Search
(Fourteen snapshots of depth-first expansion on the complete binary tree with nodes A-O: the
search descends along A, B, D, H, backtracks, and works its way left to right through the subtrees
until the rightmost branch is reached.)
c : Michael Kohlhase 382-395
Depth-First Search: Romania
(Snapshots of depth-first expansion: Arad; expand Arad → Sibiu, Timisoara, Zerind; expand
Sibiu → Arad, Fagaras, Oradea, R. Vilcea; expand this Arad again → Sibiu, Timisoara, Zerind.
Without repeated-state checking the search can cycle.)
c : Michael Kohlhase 396-399
Depth-rst search: Properties
Complete: Yes if the state space is finite;
No if it contains infinite paths or loops
Time: O(b^m) (we need to explore up to the maximal depth m in any case!)
Space: O(b·m), i.e. linear space (need to store at most m levels with at most b nodes each)
Optimal: No (there can be many better solutions in the
unexplored part of the search tree)
Disadvantage: Time is terrible if m is much larger than d.
Advantage: Time may be much less than breadth-first search if solutions are dense.
c : Michael Kohlhase 400
Iterative deepening search
Depth-limited search: depth-first search with a depth limit
Iterative deepening search: depth-limited search with ever-increasing limits
procedure Tree_Search (problem)
<initialize the search tree using the initial state of problem>
for depth = 0 to ∞
result := Depth_Limited_search(problem,depth)
if result ≠ cutoff return result end if
end for
end procedure
c : Michael Kohlhase 401
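A corresponding SML sketch, again over the 'a problem record from above (names are ours;
note that ids loops forever if no solution exists at any depth):
(* depth-limited depth-first search; NONE plays the role of cutoff *)
fun depthLimited (p : 'a problem) (limit : int) : 'a option =
  let
    fun dfs (n, d) =
      if #isGoal p n then SOME n
      else if d = 0 then NONE                 (* depth limit reached: cut off *)
      else foldl (fn (s, NONE) => dfs (s, d - 1) | (_, found) => found)
                 NONE (#successors p n)
  in
    dfs (#initial p, limit)
  end
(* iterative deepening: re-run depth-limited search with growing limits *)
fun ids p =
  let fun try d = case depthLimited p d of SOME s => SOME s | NONE => try (d + 1)
  in try 0 end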
Iterative Deepening Search at Limit Depths 0, 1, 2, 3
(Snapshots: at limit 0 only A is tested; at limit 1 the tree A, B, C is searched depth-first; at
limit 2 the search descends to D, E, F, G; at limit 3 it reaches the leaves H-O.)
c : Michael Kohlhase 402-405
Iterative deepening search: Properties
Complete: Yes
Time: (d+1)·b^0 + d·b^1 + (d-1)·b^2 + ... + b^d ∈ O(b^(d+1))
Space: O(b·d)
Optimal: Yes (if step cost = 1)
(Depth-First) Iterative-Deepening Search is often used in practice for search spaces of large,
infinite, or unknown depth.
Comparison:
Criterion     Breadth-first   Uniform-cost   Depth-first   Iterative deepening
Complete?     Yes             Yes            No            Yes
Time          b^(d+1)         b^d            b^m           b^d
Space         b^(d+1)         b^d            b·m           b·d
Optimal?      Yes             Yes            No            Yes
c : Michael Kohlhase 406
Note: To find a solution (at depth d) we have to search the whole tree up to depth d. Of course,
since we do not save the search state, we have to re-compute the upper part of the tree for the
next level. This seems like a great waste of resources at first; however, iterative deepening search
tries to be complete without the space penalties.
The space complexity is as good as that of depth-first search, since we are using depth-first
search along the way. Like in breadth-first search, the whole tree on level d (of the optimal solution)
is explored, so optimality is inherited from there. Like breadth-first search, one can modify this
to incorporate uniform-cost search.
As a consequence, variants of iterative deepening search are the method of choice if we do not
have additional information.
Comparison
(Figure: node visiting order of breadth-first search vs. iterative deepening search.)
c : Michael Kohlhase 407
14.4 Informed Search Strategies
Summary: Uninformed Search/Informed Search
Problem formulation usually requires abstracting away real-world details to define a state
space that can feasibly be explored.
Variety of uninformed search strategies.
Iterative deepening search uses only linear space and not much more time than other
uninformed algorithms.
Next Step: Introduce additional knowledge about the problem. (informed search)
Best-first search, A* search
c : Michael Kohlhase 409
This is like UCS, but with an evaluation function related to the problem at hand replacing the
path cost function.
If the heuristic is arbitrary, we expect incompleteness!
It depends on how we measure desirability.
Concrete examples follow.
Romania with step costs in km
c : Michael Kohlhase 410
Greedy search
Definition 487 A heuristic is an evaluation function h on nodes that estimates the cost
from n to the nearest goal state.
Idea: Greedy search expands the node that appears to be closest to the goal.
Example 488 h_SLD(n) = straight-line distance from n to Bucharest
Note: Unlike uniform-cost search, the node evaluation function has nothing to do with
the nodes explored so far:
internal search control → external search control
partial solution cost → goal cost estimation
c : Michael Kohlhase 411
In greedy search we replace the objective cost to construct the current solution with a heuristic,
i.e. a subjective measure that we think gives a good idea of how far we are from a solution. Two
things have shifted:
we went from internal (determined only by features inherent in the search space) to an
external/heuristic cost
instead of measuring the cost to build the current partial solution, we estimate how far we
are from the desired goal
Greedy Search: Romania
(Snapshots of greedy expansion with h_SLD values: Arad (366) expands to Sibiu (253),
Timisoara (329), Zerind (374); Sibiu, having the lowest h, expands to Arad (366), Fagaras (176),
Oradea (380), R. Vilcea (193); Fagaras expands to Sibiu (253) and Bucharest (0), a goal.)
c : Michael Kohlhase 412-415
Greedy search: Properties
Complete: No, it can get stuck in loops; complete in finite space with repeated-state checking
Time: O(b^m)
Space: O(b^m)
Optimal: No
Example 489 Greedy search can get stuck going from Iasi to Oradea:
Iasi → Neamt → Iasi → Neamt → ...
Worst-case time is the same as for depth-first search,
worst-case space the same as for breadth-first search.
But a good heuristic can give dramatic improvement.
c : Michael Kohlhase 416
Greedy Search is similar to UCS. Unlike the latter, the node evaluation function has nothing to
do with the nodes explored so far. This can prevent nodes from being enumerated systematically
as they are in UCS and BFS.
For completeness, we need repeated-state checking, as the example shows. This enforces complete
enumeration of the state space (provided that it is finite), and thus gives us completeness.
Note that nothing prevents all nodes from being searched in the worst case; e.g. if the
heuristic function gives us the same (low) estimate on all nodes except where the heuristic mis-
estimates the distance to be high. So in the worst case, greedy search is even worse than BFS,
where d (depth of first solution) replaces m.
The search procedure cannot be optimal, since the actual cost of the solution is not considered.
For both completeness and optimality, it is therefore necessary to take the actual cost of
partial solutions, i.e. the path cost, into account. This way, paths that are known to be expensive
are avoided.
A* search
Idea: Avoid expanding paths that are already expensive. (make use of the actual cost)
The simplest way to combine heuristic and path cost is to simply add them.
Definition 490 The evaluation function for A* search is f(n) = g(n) + h(n), where g(n) is
the path cost of n and h(n) the estimated cost from n to the nearest goal.
A* search: Admissibility
Definition 492 (Admissibility of heuristic) h(n) is called admissible if
0 ≤ h(n) ≤ h*(n) for all nodes n, where h*(n) is the true cost from n to a goal state.
Theorem 494 A* search with an admissible heuristic is optimal.
Proof:
P.1 Suppose a suboptimal goal G has been generated; then we are in the following situ-
ation:
(Diagram: from the start node, one path leads via the unexpanded node n to the optimal
goal O, another to the suboptimal goal G.)
P.2 Let n be an unexpanded node on a path to an optimal goal O; then
f(G) = g(G) since h(G) = 0
g(G) > g(O) since G is suboptimal
g(O) = g(n) + h*(n) ≥ g(n) + h(n) = f(n) since h is admissible,
so f(G) > f(n), and A* never selects G for expansion.
A* Search Example
(Snapshot: the root Arad (366 = 0 + 366).)
c : Michael Kohlhase 420
A* Search Example
(Snapshots of A* expansion with f = g + h annotations: Arad expands to Sibiu (393=140+253),
Timisoara (447=118+329), Zerind (449=75+374); Sibiu expands to Arad (646=280+366),
Fagaras (415=239+176), Oradea (671=291+380), R. Vilcea (413=220+193); R. Vilcea expands
to Craiova (526=366+160), Pitesti (417=317+100), Sibiu (553=300+253); Fagaras expands to
Sibiu (591=338+253) and Bucharest (450=450+0); Pitesti expands to Bucharest (418=418+0),
Craiova (615=455+160), Sibiu (607=414+193). The optimal goal Bucharest (418) is found.)
c : Michael Kohlhase 421-425
A* search: f-contours
(Figure: A* expands nodes within contours of increasing f-value around the start state.)
A* search: Properties
Complete: Yes (unless there are infinitely many nodes n with f(n) ≤ f(O))
Time: exponential in [relative error in h × length of solution]
Space: same as time (variant of BFS)
Optimal: Yes
The run-time depends on how well h approximates the real cost h*.
c : Michael Kohlhase 427
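In terms of the cost-ordered fringe sketched for uniform-cost search, greedy search and A* only
change the sorting key. This is a hedged sketch with our names, where h is the heuristic and g
the accumulated path cost of a node:
fun greedyKey h (s, g : real) = h s : real   (* greedy: order by h(n) alone *)
fun astarKey  h (s, g : real) = g + h s      (* A*: order by f(n) = g(n) + h(n) *)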
Since the availability of admissible heuristics is so important for informed search (particularly for
A*), let us see how such heuristics can be obtained in practice. We will look at an example, and
then derive a general procedure from that.
Admissible heuristics: Example 8-puzzle
Example 495 Let h_1(n) be the number of misplaced tiles in node n. (h_1(S) = 6)
Example 496 Let h_2(n) be the total Manhattan distance from the desired location of each
tile. (h_2(S) = 2 + 0 + 3 + 1 + 0 + 1 + 3 + 4 = 14)
Observation 497 (Typical search costs) (IDS = iterative deepening search)
nodes explored   IDS         A*(h_1)   A*(h_2)
d = 14           3,473,941   539       113
d = 24           too many    39,135    1,641
c : Michael Kohlhase 428
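The two heuristics are easy to compute. Here is a sketch, assuming (our choice, not fixed by the
slides) that a state is an int vector where index i holds the tile on square i, tile 0 is the blank,
and tile t belongs on square t in the goal state:
fun h1 (s : int vector) =   (* number of misplaced tiles *)
  Vector.foldli (fn (i, t, n) => if t <> 0 andalso t <> i then n + 1 else n) 0 s
fun h2 (s : int vector) =   (* total Manhattan distance on the 3x3 board *)
  Vector.foldli (fn (i, t, n) =>
      if t = 0 then n
      else n + abs (i div 3 - t div 3) + abs (i mod 3 - t mod 3)) 0 s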
Dominance
Definition 498 Let h_1 and h_2 be two admissible heuristics. We say that h_2 dominates
h_1 if h_2(n) ≥ h_1(n) for all n.
Theorem 499 If h_2 dominates h_1, then h_2 is better for search than h_1.
c : Michael Kohlhase 429
Relaxed problems
Finding good admissible heuristics is an art!
Idea: Admissible heuristics can be derived from the exact solution cost of a relaxed
version of the problem.
Example 500 If the rules of the 8-puzzle are relaxed so that a tile can move anywhere,
then we get heuristic h_1.
Example 501 If the rules are relaxed so that a tile can move to any adjacent square,
then we get heuristic h_2.
Key point: The optimal solution cost of a relaxed problem is not greater than the optimal
solution cost of the real problem.
c : Michael Kohlhase 430
Relaxation means to remove some of the constraints or requirements of the original problem, so
that a solution becomes easy to find. The cost of this easy solution can then be used as an
optimistic approximation for the original problem.
14.5 Local Search
Local Search Problems
Idea: Sometimes the path to the solution is irrelevant.
Example 502 (8 Queens Problem) Place 8 queens on a chess board so that no
two queens threaten each other.
This problem has various solutions, e.g. the one on the right.
Definition 503 A local search algorithm is a search algorithm that operates on a single
state, the current state (rather than on multiple paths). (advantage: constant space)
Typically, local search algorithms only move to successors of the current state and do
not retain search paths.
Applications include: integrated circuit design, factory-floor layout, job-shop scheduling,
portfolio management, fleet deployment, . . .
c : Michael Kohlhase 431
Local Search: Iterative improvement algorithms
Definition 504 (Traveling Salesman Problem) Find the shortest trip through a set of
cities such that each city is visited exactly once.
Idea: Start with any complete tour and perform pairwise exchanges.
Definition 505 (n-queens problem) Put n queens on an n×n board such that no two
queens are in the same row, column, or diagonal.
Idea: Move a queen to reduce the number of conflicts.
c : Michael Kohlhase 432
Hill-climbing (gradient ascent/descent)
Idea: Start anywhere and go in the direction of the steepest ascent.
Depth-first search with heuristic and without memory
procedure Hill-Climbing (problem) (* returns a state that is a local maximum *)
local current, neighbor (* nodes *)
current := Make-Node(Initial-State[problem])
loop
neighbor := <a highest-valued successor of current>
if Value[neighbor] < Value[current] return current end if
current := neighbor
end loop
end procedure
Like starting anywhere in the search tree and making a heuristically guided DFS.
Works if solutions are dense and local maxima can be escaped.
c : Michael Kohlhase 433
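The procedure above translates into a few lines of SML; this is a sketch with our names,
parameterized by the value function and the successor relation:
(* follow the highest-valued successor until no successor improves on current *)
fun hillClimb (value : 'a -> int) (successors : 'a -> 'a list) (current : 'a) : 'a =
  let
    fun best nil = NONE
      | best (x :: xs) =
          SOME (foldl (fn (a, b) => if value a > value b then a else b) x xs)
  in
    case best (successors current) of
      NONE => current
    | SOME n => if value n <= value current then current
                else hillClimb value successors n
  end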
In order to understand the procedure on a more intuitive level, let us consider the following
scenario: We are in a dark landscape (or we are blind), and we want to find the highest hill. The
search procedure above tells us to start our search anywhere, and for every step first feel around,
and then take a step into the direction with the steepest ascent. If we reach a place where the
next step would take us down, we are finished.
Of course, this will only get us into local maxima, and has no guarantee of getting us into
global ones (remember, we are blind). The solution to this problem is to re-start the search at
random places (we do not have any information), and hope that one of the random jumps will get
us to a slope that leads to a global maximum.
Example Hill-Climbing with 8 Queens
Idea: The heuristic function h is the number of queens that threaten each other.
Example 506 An 8-queens state with heuristic cost estimate h = 17, showing h-values
for moving a queen within its column.
Problem: The state space has local minima, e.g. the board on the right has h = 1 but
every successor has h > 1.
c : Michael Kohlhase 434
Hill-climbing
Problem: Depending on the initial state, one can get stuck on local maxima/minima and plateaux.
"Hill-climbing search is like climbing Everest in thick fog with amnesia."
Idea: Escape local maxima by allowing some bad or random moves.
Example 507 local search, simulated annealing, . . .
Properties: All are incomplete and non-optimal.
Sometimes they perform well in practice. (if (optimal) solutions are dense)
c : Michael Kohlhase 435
Recent work on hill-climbing algorithms tries to combine complete search with randomization to
escape certain odd phenomena occurring in the statistical distribution of solutions.
Simulated annealing (Idea)
Definition 508 Ridges are ascending successions of local maxima.
Problem: They are extremely difficult to navigate for local search algorithms.
Idea: Escape local maxima by allowing some bad moves, but gradually decrease their
size and frequency.
Annealing is the process of heating steel and letting it cool gradually to give it time to grow
an optimal crystal structure.
Simulated Annealing is like shaking a ping-pong ball occasionally on a bumpy surface to
free it. (so it does not get stuck)
Devised by Metropolis et al., 1953, for physical process modelling.
Widely used in VLSI layout, airline scheduling, etc.
c : Michael Kohlhase 436
Simulated annealing (Implementation)
procedure Simulated-Annealing (problem,schedule) (* a solution state *)
local node, next (* nodes *)
local T (* a temperature controlling the probability of downward steps *)
current := Make-Node(Initial-State[problem])
for t := 1 to ∞
T := schedule[t]
if T = 0 return current end if
next := <a randomly selected successor of current>
ΔE := Value[next] - Value[current]
if ΔE > 0 current := next
else
current := next <only with probability> e^(ΔE/T)
end if
end for
end procedure
A problem schedule is a mapping from time to temperature.
c : Michael Kohlhase 437
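The heart of the procedure is the acceptance rule for the randomly selected successor. As a
sketch (accept and its argument names are ours; rand is a uniform sample from [0,1)):
(* uphill moves are always accepted, downhill ones with probability e^(dE/T) *)
fun accept (dE : real, T : real, rand : real) : bool =
  dE > 0.0 orelse rand < Math.exp (dE / T)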
Properties of simulated annealing
At a fixed temperature T, the state occupation probability reaches the Boltzmann distribution
p(x) = e^(E(x)/kT)
If T is decreased slowly enough, we always reach the best state x*, because
e^(E(x*)/kT) / e^(E(x)/kT) = e^((E(x*)-E(x))/kT) ≫ 1
for small T.
Is this necessarily an interesting guarantee?
c : Michael Kohlhase 438
Local beam search
Idea: Keep k states instead of 1; choose the top k of all their successors.
This is not the same as k searches run in parallel!
(Searches that find good states recruit other searches to join them)
Problem: quite often, all k states end up on the same local hill.
Idea: Choose k successors randomly, biased towards good ones.
(Observe the close analogy to natural selection!)
c : Michael Kohlhase 439
Genetic algorithms (very briefly)
Idea: Use local beam search (keep a population of k), randomly modify the population
(mutation), generate successors from pairs of states (sexual reproduction), and optimize a
fitness function. (survival of the fittest)
The Dual Role of Grammar in XML
NameChar :== Letter | Digit | . | - | _ | : | CombiningChar | Extender
Name :== (Letter | _ | :) (NameChar)*
STag :== < (S) Name ((S) attribute)* (S) >
ETag :== </ (S) Name (S) >
EmptyElementTag :== < (S) Name ((S) attribute)* (S) />
use these to parse a well-formed XML document into a tree data structure
use these to serialize a tree data structure into a well-formed XML document
Idea: Integrate XML parsers/serializers into all programming languages to communicate
trees instead of strings. (more structure = better CS)
c : Michael Kohlhase 504
The Dual Role of Grammar in XML (II)
Idea: We can define our own XML language by defining our own elements and attributes.
Validation: Specify your language with a tree grammar. (works like a charm)
Definition 600 Document Type Definitions (DTDs) are grammars that are built into
the XML framework.
Put <!DOCTYPE foo SYSTEM "foo.dtd"> into the second line of the document to vali-
date it.
Definition 601 RelaxNG is a modern XML grammar/schema framework on top of the
XML framework.
c : Michael Kohlhase 505
RelaxNG, a tree Grammar for XML
Definition 602 Relax NG (RelaxNG: Regular Language for XML Next Generation) is
a tree grammar framework for XML documents.
A RelaxNG schema is itself an XML document; however, RelaxNG also offers a popular,
non-XML compact syntax.
Example 603 Both RelaxNG grammars below validate the document:
document:
<lecture>
  <slide id="foo">
    first slide
  </slide>
  <slide id="bar">
    second one
  </slide>
</lecture>
RelaxNG in XML:
<grammar>
  <start>
    <element name="lecture">
      <oneOrMore>
        <ref name="slide"/>
      </oneOrMore>
    </element>
  </start>
  <define name="slide">
    <element name="slide">
      <attribute name="id">
        <text/>
      </attribute>
      <text/>
    </element>
  </define>
</grammar>
RelaxNG compact:
start = element lecture
  {slide+}
slide = element slide
  {attribute id {text},
   text}
c : Michael Kohlhase 506
16.7 More Web Resources
Wikis
Definition 604 (Wikis) A wiki is a website on which authoring and editing can be
done by anyone at any time using a simple browser.
Example 605 Wikipedia, Wikimedia, Wikibooks, Citizendium, etc. (accuracy concerns)
Wikis allow individuals to edit content and thus facilitate collaboration.
c : Michael Kohlhase 507
Internet Telephony (VoIP)
Definition 606 VoIP (Voice over IP) uses the Internet to make phone calls and videoconferences.
Example 607 Providers include Vonage, Verizon, Skype, etc.
Long-distance calls are either very inexpensive or free.
(Quality, security, and reliability concerns)
c : Michael Kohlhase 508
Social Networks
Definition 608 A social network service is an Internet service that focuses on building
and reflecting social networks or social relations among people, e.g. people who share interests
and/or activities.
A social network service essentially consists of a representation of each user (often a
profile), his/her social links, and a variety of additional services. Most social network
services provide means for users to interact over the Internet, such as e-mail and instant
messaging.
Example 609 MySpace, Facebook, Friendster, Orkut, etc.
c : Michael Kohlhase 509
Really Simple Syndication (RSS)
FireAnt, i-Fetch, RSS Captor, etc.
Built-in Web browser RSS features
c : Michael Kohlhase 510
Instant messaging (IM) and real-time chat (RTC)
Multi-protocol IM clients (AIM)
Web-based IM systems (Forum, chat room)
Podcasting, Blogs
Blogger, Xanga, LiveJournal, etc.
Types: Microblog, vlog, photoblog, sketchblog, linklog, etc.
Blog search engines
Blogs and advertising, implications of ad blocking software
Do bloggers have the same rights as journalists?
c : Michael Kohlhase 511
16.8 The Semantic Web
The Current Web
Resources: identified by URIs, untyped
Links: href, src, . . . ; limited, non-descriptive
User: Exciting world - the semantics of the resource, however, is gleaned from its content
Machine: Very little information available - the significance of the links is only evident from
the context around the anchor.
c : Michael Kohlhase 512
The Semantic Web
Resources: Globally Identified by URIs or Locally scoped (Blank), Extensible, Relational
Links: Identified by URIs, Extensible, Relational
User: Even more exciting world, richer user experience
Machine: More processable information is available (Data Web)
Computers and people: Work, learn and exchange knowledge effectively
c : Michael Kohlhase 513
What is the Information a User sees?
WWW2002
The eleventh international world wide web conference
Sheraton waikiki hotel
Honolulu, hawaii, USA
7-11 may 2002
1 location 5 days learn interact
Registered participants coming from
australia, canada, chile, denmark, france, germany, ghana, hong kong, india,
ireland, italy, japan, malta, new zealand, the netherlands, norway,
singapore, switzerland, the united kingdom, the united states, vietnam, zaire
On the 7th May Honolulu will provide the backdrop of the eleventh
international world wide web conference. This prestigious event ...
Speakers confirmed
Tim Berners-Lee: Tim is the well known inventor of the Web, ...
Ian Foster: Ian is the pioneer of the Grid, the next generation internet ...
c : Michael Kohlhase 514
What the machine sees
JJJ//
T|||_|..|./_/,_,|_||/||||
o|./_||/.|
1//_|o/
//
1|.||,.|.|//
.|,||,|||||/|/,
|,.[.|_|,.||.|,/_
/|_.|,.|.|,|,/.|.|,..|_|.|
O.|./1//_/_,|.|||,//.|||_|.
.|./_/,_,|_||/||||T|./|_|.
o||||/|,
T||||T.|_||/__|.//.|J|
JT/.|J.|/||/.|(,.||.||./.||.
c : Michael Kohlhase 515
Solution: XML markup with meaningful Tags
<title>JJJ//T|||_|..|./_/,_,|_||/||||</title>
<place>o|./J||/.|1//_|o/</place>
<date>//</date>
<participants>1|.||,.|.|//
.|,||,|||||/|/,
|,.[.|_|,.||.|,/_
/|_.|,.|.|,|,/.|.|,..|_|.
|</participants>
<introduction>O.|./1//_/_,|.|||,//.|||_|.
.|./_/,_,|_||/||||</introduction>
<program>o||||/|,
<speaker>T||||T.|_||/__|.//.|J|</speaker>
<speaker>JT/.|J.|/||/.|(,.||.||./.|
|.</speaker></program>
c : Michael Kohlhase 516
What the machine sees of the XML
<..|>JJJ//T|||_|..|./_/,_,|_||/||||</..|>
<||>o|./J||/.|1//_|o/</||>
<,.|>//</,.|>
<.|.>1|.||,.|.|//
.|,||,|||||/|/,
|,.[.|_|,.||.|,/_
/|_.|,.|.|,|,/.|.|,..|_|.
|</.|.>
</./,|./>O.|./1//_/_,|.|||,//.|||_|..|
./_/,_,|_||/||||</./,|./>
</>o||||/|,
<|||>T||||T.|_||/__|.//.|J|</|||>
<|||>JT/.|J.|/||/.|(,.||.||./.|
|.<|||><//>
c : Michael Kohlhase 517
290
Need to add Semantics
External agreement on the meaning of annotations, e.g. Dublin Core:
Agree on the meaning of a set of annotation tags
Problems with this approach: inflexible, only a limited number of things can be expressed
Use ontologies to specify the meaning of annotations:
Ontologies provide a vocabulary of terms
New terms can be formed by combining existing ones
The meaning (semantics) of such terms is formally specified
Can also specify relationships between terms in multiple ontologies
c : Michael Kohlhase 518
Index
λ-notation, 38
abstract
call-by-value
interpreter, 66
computation, 60
data
type, 57
interpreter, 60
procedure, 65
Program, 65
access
random (), 179
accumulator, 179
acyclic, 143
directed (), 143
adder, 157
carry chain, 160
conditional sum, 161
full, 158
half, 157
twos-complement, 169
addition
carry chain, 156
operation, 29
rules, 156
address
decoder, 177
IPv4, 261
IPv6, 261
MAC, 260
admissible, 112, 236
agent
user, 268
algebra
Boolean, 88
algorithm, 19
generate-and-test, 101
search, 210
algorithm, 20
alphabet, 74
ALU, 173, 179-181
anchor
cell, 199
anonymous
variables, 44
answer
substitution, 248
antisymmetric, 36
append
function, 54
application
layer, 262
argument, 60
arithmetic, 110
logic
unit, 173
American Standard Code for Information Interchange, 79
assembler, 184
language, 179, 181
assignment
variable, 87
A* search, 236
asymptotically
bounded, 93
asynchronous, 176
atom, 107, 126
atomic, 126
attribute, 279
node, 279
axiom, 26
Axioms
Peano, 58
balanced
binary
tree, 149
bracketing
structure, 279
fully (), 149
base, 154
case, 27
condition, 25, 27
knowledge, 247
sort, 57
basic
multilingual
plane, 81
operator, 43
type, 43
constructor, 43
bijection, 155
bijective, 39
binary, 76
balanced (), 149
natural
number, 165
tree, 149
bit
carry, 157
most significant, 165
sign, 165
sum, 157
Blaise Pascal, 110
Boolean
algebra, 88
expressions, 147
function, 89
polynomial, 90
sum, 86
bootstrapping
process, 155
borrow
input (), 169
bounded
asymptotically, 93
bracketing
balanced (), 279
branch
closed, 128
open, 128
Browser
web, 267
by description
definition, 34
byte
code, 184, 194
C, 196
calculus
complete, 113
correct, 113
resolution, 133
call-by-value
abstract (), 66
canonical
product, 91
sum, 91
card
punch, 80
cardinality, 39
carry
bit, 157
input, 158
intermediate, 170
carry chain
adder, 160
addition, 156
Cartesian
n-dim (nCartSpace), 33
n-fold (nCartProd), 33
procedure, 49
product, 33
case
base, 27
spfstep, 27
cell
anchor, 199
internal, 199
chain
innite, 68
character, 74
code, 76, 155
encoding, 82
name, 280
properties, 280
structural, 83
child, 144
circuit
combinational, 145
combinatorial, 145, 146
clause, 90, 133
empty, 133
fact, 253
Horn, 253
rule, 253
set, 90
clock, 176
closed, 61
branch, 128
closing
tag, 279
CNF, 91
code
byte, 184, 194
character, 76, 155
Morse, 77
on strings, 77
point, 81, 280
prefix, 77
string, 77, 155
codeword, 76
codomain, 37
combinational
circuit, 145
combinatorial
circuit, 145, 146
command interpreter, 185
compact
syntax, 282
compiler, 184, 185
complete
calculus, 113
complex, 126
complexity
theory, 22
composition, 36
computation, 60
abstract, 60
Computer
Science, 8
computes, 148
concatenation, 74, 75
condition
base, 25, 27
spfstep, 27
step, 25, 28
conditional sum
adder, 161
conjecture, 25
conjunctive
normal
form, 91, 133
constant
name, 83
constructive
proof, 94
constructor, 45
declaration, 57
ground (), 58
term, 62
control, 162
structure, 43
converse, 36
cookie, 272
cookies
third party, 272
corollary, 25
correct
calculus, 113
cost, 148
countably
infinite, 39
counter
program, 179, 180
counterexamples, 108
covers, 92
CPU, 179
crawler
web, 274
Cascading Style Sheets, 271
current
instruction, 180
state, 240
Currying, 48
cycle, 143
cyclic, 143
DAG, 143
data
abstract (), 57
store, 179
declaration
constructor, 57
function, 43
namespace, 279
parameter, 64
type, 43
value, 43
decoder
address, 177
defined
inductively, 45
definiendum, 33
definiens, 33
definition
implicit, 34
simple, 33
definition
by description, 34
depth, 85
derivation, 26, 112
derived
inference
rule, 130, 135
rule, 130
derives, 128
difference
set, 33
digit, 154
digraph, 139
directed
acyclic
graph, 143
edge, 139
graph, 139, 140
disjunctive
normal
form, 91, 133
diverges, 68
DNF, 91, 133, 152
DNS, 262
document
root, 279
XML (), 279
Document Type Denition, 282
domain, 37
Domain Name System, 262
dominates, 98, 239
DTD, 282
edge, 140
directed, 139
undirected, 139
edges, 17
element
empty, 279
node, 279
empty
clause, 133
element, 279
set, 33
string, 74
encoding
character, 82
scheme, 280
end, 143
environment, 193
equality
set, 32
equivalence
relation, 36
equivalence class, 20
equivalent, 88, 89
graph, 141
essential, 102
evaluation
function, 87, 236
exception
handler, 73
exclusive
or, 153
expression, 83
label, 147
cost, 92
expressions
Boolean, 147
extension, 76
fact, 247
clause, 253
falsifiable, 108
Fibonacci
number, 68
sequence, 68
finite, 39
firmware, 181
folding
left (), 49
right (), 50
formal
language, 75
formula
labeled, 126
frame
pointer, 199
fringe, 218
full
n-bit (nbitfulladder), 159
adder, 158
fully
balanced
tree, 149
function
append, 54
Boolean, 89
declaration, 43
evaluation, 87, 236
interpretation, 87
inverse, 39
name, 83
partial, 37
result, 68
selector, 44
sort, 57
space, 37
total, 37
composition, 40
cost, 92
functional
programming
language, 41, 181
gate, 146
generate-and-test, 20
algorithm, 101
goal, 248
state, 212
Gottfried Wilhelm Leibniz, 110
graph, 16
directed, 139, 140
equivalent, 141
isomorphism, 141
labeled, 142
undirected, 139, 140
depth, 144
greedy
search, 234
ground, 61
constructor
term, 58
half
adder, 157
handled, 73
handler
exception, 73
head, 58
heuristic, 234
hexadecimal
numbers, 181
Horn
clause, 253
HyperText Markup Language, 270
HTTP, 268
Hypertext Transfer Protocol, 268
http
request, 268
hypothetical
reasoning, 121
IANA, 261
Internet Assigned Numbers Authority, 261
ICANN, 262
idempotent, 268
idiom
math, 30
image, 40
immediate
subterm, 59
imperative
programming
language, 181
implicant, 98
prime, 98
implicit
denition, 34
implies, 98
in-degree, 139
index
search, 275
indexing, 275
induced, 89
induced by, 65
inductive, 45
inductively
dened, 45
inference, 25, 26
derived (), 130, 135
rule, 133
infinite
chain, 68
countably, 39
infinite precision
integers, 70
initial, 141
node, 141
state, 212
injective, 39
input
borrow
bit, 169
carry, 158
vertex, 146
instantiates, 62
instruction
current, 180
program, 181
integers
infinite precision, 70
interface
network, 260
intermediate
carry, 170
internal
cell, 199
Internet, 255
Internet Protocol, 261
interpretation
function, 87
interpreter
abstract, 60
intersection, 33
intersection over a collection, 33
invariants
loop, 184
inverse
function, 39
IP, 261
IPv4, 261
address, 261
Internet Protocol Version 4, 261
IPv6, 261
address, 261
Internet Protocol Version 6, 261
isomorphism
graph, 141
iterator, 49
Java, 196
jump
table, 189
Karnaugh-Veitch
map, 105
key, 193, 275
knowledge
base, 247
KV-map, 105
label, 142
expression, 147
labeled
formula, 126
graph, 142
λ-notation, 38
Landau
set, 93
language
assembler, 179, 181
formal, 75
layer
application, 262
transport, 262
leaf, 144
leak
memory, 195
left
folding
operator, 49
lemma, 25
length, 143
lexical
order, 76
LIFO, 185
linear
order, 36
list
variable, 89
literal, 90, 126, 129
local
search, 240
logic
arithmetic (), 173
sequential (), 174
logical
program, 253
reasoning, 25
system, 25
loop
invariants, 184
MAC
address, 260
media access control address, 260
machine, 184
register, 179, 180, 184
virtual, 184, 185
management
memory (), 179
map
Karnaugh-Veitch, 105
matcher, 62
matching
pattern, 44
math
idiom, 30
mathematical
vernacular, 30
maxterm, 92
maze, 17
memory
leak, 195
management
unit, 179
random access, 179
minimal
polynomial, 152
minterm, 92
MMU, 179–181
model, 87
monomial, 90, 133
Morse
code, 77
most significant
bit, 165
multilingual
basic, 81
multiplexer, 162
n-tuple, 33
name
character, 280
constant, 83
function, 83
variable, 83
namespace
declaration, 279
natural
binary, 165
unary, 24
n-bit
full
adder, 159
n-dim
Cartesian
space, 33
n-fold
Cartesian
product, 33
negative, 128
network
interface, 260
packet-switched, 258
packets, 258
NIC, 260
network interface controller, 260
node, 139, 140
attribute, 279
element, 279
initial, 141
root, 19
terminal, 141
text, 279
nodes, 17
nonempty
string, 75
normal, 91
conjunctive, 91, 133
disjunctive, 91, 133
number
Fibonacci, 68
positional, 154
numbers
hexadecimal, 181
off
worked, 133
offline
problem
solving, 211
on
relation, 36
on strings
code, 77
one, 27
open
branch, 128
opening
tag, 279
operation
addition, 29
operator, 212
basic, 43
or
exclusive, 153
order
lexical, 76
linear, 36
partial, 36
ordered
pair, 139, 140
out-degree, 139
output
vertex, 146
overflow, 172
packet-switched
network, 258
packets
network, 258
page
web, 267
pair, 33, 139
ordered, 139, 140
parameter
declaration, 64
parent, 144
parse-tree, 145
partial
function, 37
order, 36
Pascal, 196
path, 143
XML, 281
pattern
matching, 44
Peano
Axioms, 58
point
code, 81, 280
pointer
frame, 199
stack, 188
polarity, 165
polynomial
Boolean, 90
minimal, 152
port, 264
positional
number
system, 154
positive
unit-resulting
hyperresolution, 253
postulate, 25
power
set, 33
pre-image, 40
prefix, 75
code, 77
proper, 75
prime
implicant, 98
problem
offline, 211
procedure
abstract, 60, 65
argument
sort, 60
Cartesian, 49
result
sort, 60
rule, 60
static, 196
process
bootstrapping, 155
product, 86
canonical, 91
Cartesian, 33
sort, 57
term, 90
product of
sums, 90
products
sum of, 90
program, 179
abstract, 65
counter, 179, 180
instruction, 181
logical, 253
store, 179
program store, 185
programming
functional, 41, 181
imperative, 181
proof, 25, 26, 112
constructive, 94
resolution, 134
tableau, 128
proper
prefix, 75
proper subset, 32
proper superset, 32
properties
character, 280
proposition, 107
pull-back, 155, 157
punch
card, 80
query, 248, 253
Quine-McCluskey, 152
radix, 154
raised, 73
RAM, 179, 180
random access
memory, 179
reasoning
hypothetical, 121
logical, 25
recipe, 20
recursion
relation, 67
step, 67
recursive, 46, 67
reflexive, 36
refutation
resolution, 133
tableau, 128
register, 179, 180
machine, 179, 180, 184
relation, 36, 140
equivalence, 36
on, 36
recursion, 67
relative
URI, 267
RelaxNG, 282
schema, 282
representation, 16
request
http, 268
resolution
calculus, 133
proof, 134
refutation, 133
resolvent, 100
resource
uniform, 266, 267
web, 266
restriction, 40
result, 60
function, 68
Ridge, 243
right
folding
operator, 50
root, 144
document, 279
node, 19
rosary, 20
router, 259
RS-flipflop, 174
RS-latch, 174
rule, 65, 247
clause, 253
derived, 130
inference, 133
rules
addition, 156
safe, 268
satisable, 108
saturated, 128
schema
RelaxNG, 282
scheme
encoding, 280
Science
Computer, 8
search
algorithm, 210
greedy, 234
index, 275
local, 240
strategy, 218
web, 273
selector
function, 44
semantics, 84
sequence
Fibonacci, 68
sequential
logic
circuit, 174
server
web, 268
set, 139
clause, 90
difference, 33
empty, 33
equality, 32
Landau, 93
power, 33
sign
bit, 165
signature, 64
simple
definition, 33
sink, 141
site
web, 267
size, 34
solution, 212
sort
argument, 60
base, 57
function, 57
product, 57
result, 60
sorts, 57
source, 141
space
function, 37
spanning
tree, 19
stack, 185
pointer, 188
Standard
Unicode, 81
standard
unicode, 280
start, 143
value, 49
state, 212
current, 240
goal, 212
initial, 212
static
procedure, 196
step
case, 27
condition, 25, 27, 28
recursion, 67
store
data, 179
program, 179
strategy
search, 218
String, 46
string, 74, 75
code, 77, 155
empty, 74
nonempty, 75
structural
character, 83
structure
control, 43
sub-monomial, 98
subset, 32
substitution, 62
answer, 248
substring, 75
subterm, 59
immediate, 59
subtracter, 169
successor, 25
sum
bit, 157
Boolean, 86
canonical, 91
term, 90
sum of
products, 90
sums
product of, 90
superset, 32
support, 62
surjective, 39
symmetric, 36
synchronous, 176
syntax, 84
compact, 282
system
logical, 25
table
jump, 189
tableau
proof, 128
refutation, 128
tag
closing, 279
opening, 279
TCP, 262
Transmission Control Protocol, 262
TCP/IP, 259
Internet Protocol Suite, 259
term, 64
constructor, 62
product, 90
sum, 90
terminal, 141
node, 141
terminates, 60, 65, 68
test calculi, 128
text
node, 279
theorem, 25, 26, 112
theory
complexity, 22
third party
cookies, 272
total, 36
function, 37
transitive, 36
transport
layer, 262
tree, 18, 144
binary, 149
spanning, 19
Turing
universal, 210
twelve, 27
two, 27
two's complement, 167
adder, 169
type, 43
basic, 43
declaration, 43
universal, 44
Universal Character Set, 81
UDP, 262
User Datagram Protocol, 262
unary
natural
numbers, 24
underflow, 172
undirected
edge, 139
graph, 139, 140
Unicode
Standard, 81
unicode
standard, 280
uniform
resource
identifier, 266
locator, 267
union, 33
union over a collection, 33
Union-Find, 20
unit-resulting
positive, 253
universal
Turing
machine, 210
type, 44
universe, 87
unsatisable, 108, 133
URI
relative, 267
user
agent, 268
UTF-8, 280
valid, 108
value
declaration, 43
start, 49
variable, 61
assignment, 87
list, 89
name, 83
variables
anonymous, 44
vector, 33
vernacular
mathematical, 30
vertex, 139
depth, 144
input, 146
output, 146
virtual
machine, 184, 185
virtual program counter, 185
visualization, 18
VPC, 185
web
browser, 267
crawler, 274
page, 267
resource, 266
search
engine, 273
server, 268
site, 267
spider, 274
Wilhelm Schickard, 110
word, 179, 184
worked
off, 133
World Wide Web, 255
XML
document
tree, 279
path
language, 281
zero, 27