10 Pointers
10 Pointers
10 Pointers
Pointers
15-122: Principles of Imperative Computation
Frank Pfenning, Rob Simmons
Lecture 9
February 14, 2013
1 Introduction
In this lecture we complete our discussion of types in C0 by discussing
pointers and structs, two great tastes that go great together. We will discuss
using contracts to ensure that pointer accesses are safe, as well as the use
of linked lists to ensure implement the stack and queue interfaces that were
introduced last time. The linked list implementation of stacks and queues
allows us to handle lists of any length.
Relating this to our learning goals, we have
Algorithms and Data Structures: Linked lists are a fundamental data struc-
ture.
Programming: We will see structs and pointers, and the use of recursion in
the definition of structs.
char c ‘\n’ 0 1 2 3 4
string[] A
struct img_header {
pixel[] data;
int width;
int height;
};
Here data, width, and height are not variables, but fields of the struct.
The declaration expresses that every image has an array of data as well as a
width and a height. This description is incomplete, as there are some miss-
ing consistency checks – we would expect the length of data to be equal to
the width times the height, for instance, but we can capture such properties
in a separate data structure invariant.
Structs do not necessarily fit into a machine word because they can have
arbitrarily many components, so they must be allocated on the heap (in
memory, just like arrays). This is true if they happen to be small enough to
fit into a word in order to maintain a uniform and simple language.
% coin structdemo.c0
C0 interpreter (coin) 0.3.2 ’Nickel’
Type ‘#help’ for help or ‘#quit’ to exit.
--> struct img_header IMG;
<stdio>:1.1-1.22:error:type struct img_header not small
[Hint: cannot pass or store structs in variables directly; use
pointers]
We can access the fields of a structs, for reading or writing, through the
notation p->f where p is a pointer to a struct, and f is the name of a field
in that struct. Continuing above, let’s see what the default values are in the
allocated memory.
--> IMG->data;
(default empty int[] with 0 elements)
--> IMG->width;
0 (int)
--> IMG->height;
0 (int)
We can write to the fields of a struct by using the arrow notation on the
left-hand side of an assignment.
char c ‘\n’ 0 1 2 3 4
0
1
2
0xFF00FF00
0xFFFF0000
3 Pointers
As we have seen in the previous section, a pointer is needed to refer to a
struct that has been allocated on the heap. In can also be used more gener-
ally to refer to an element of arbitrary type that has been allocated on the
heap. For example:
In this case we refer to the value using the notation *p, either to read (when
we use it inside an expression) or to write (if we use it on the left-hand side
of an assignment).
So we would be tempted to say that a pointer value is simply an ad-
dress. But this story, which was correct for arrays, is not quite correct for
pointers. There is also a special value NULL. Its main feature is that NULL is
not a valid address, so we cannot dereference it to obtain stored data. For
example:
char c ‘\n’ 0 1 2 3 4
4 Linked Lists
Linked lists are a common alternative to arrays in the implementation of
data structures. Each item in a linked list contains a data element of some
type and a pointer to the next item in the list. It is easy to insert and delete
elements in a linked list, which are not natural operations on arrays, since
arrays have a fixed size. On the other hand access to an element in the
middle of the list is usually O(n), where n is the length of the list.
An item in a linked list consists of a struct containing the data element
and a pointer to another linked list. In C0 we have to commit to the type
of element that is stored in the linked list. We will refer to this data as
having type elem, with the expectation that there will be a type definition
elsewhere telling C0 what elem is supposed to be. Keeping this in mind
ensures that none of the code actually depends on what type is chosen.
These considerations give rise to the following definition:
struct list_node {
elem data;
struct list_node* next;
};
typedef struct list_node list;
struct infinite {
int x;
struct infinite next;
}
front back
struct queue_header {
list* front;
list* back;
};
typedef struct queue_header* queue;
We call this a header because it doesn’t hold any elements of the queue, just
pointers to the linked list that really holds them. The type definition allows
us to use queue as a type that represents a pointer to a queue header. We
define it this way so we can hide the true implementation of queues from
the client and just call it an element of type queue.
When does a struct of this type represent a valid queue? In fact, when-
ever we define a new data type representation we should first think about
the data structure invariants. Making these explicit is important as we
think about and write the pre- and postconditions for functions that im-
plement the interface.
What we need here is if we follow front and then move down the
linked list we eventually arrive at back. We call this a list segment. We
also want both front and back not to be NULL so it conforms to the picture,
with one element already allocated even if the queue is empty.
bool is_queue(queue Q) {
if (Q == NULL) return false;
if (Q->front == NULL) return false;
if (Q->back == NULL) return false;
if (!is_segment(Q->front, Q->back)) return false;
return true;
}
Next, the code for checking whether two pointers delineate a list segment.
When both start and end are NULL, we consider it a valid list segment, even
though this will never come up for queues. It is a common code pattern for
working with linked lists and similar data representation to have a pointer
variable, here called p, that is updated to the next item in the list on each
iteration until we hit the end of the list.
return false. The other situation is if we find end , in which case we return
true since we have a valid list segment. This function may not terminate
if the list contains a cycle. We will address this issue in the next lecture; for
now we assume all lists are acyclic.
To check if the queue is empty we just compare its front and back. If
they are equal, the queue is empty; otherwise it is not. We require that we
are being passed a valid queue. Generally, when working with a data struc-
ture, we should always require and ensure that its invariants are satisfied
in the pre- and post-conditions of the functions that manipulate it. Inside
the function, we will generally temporarily violate the invariants.
bool queue_empty(queue Q)
//@requires is_queue(Q);
{
return Q->front == Q->back;
}
To obtain a new empty queue, we just allocate a list struct and point both
front and back of the new queue to this struct. We do not initialize the list
element because its contents are irrelevant, according to our representation.
It is good practice to always initialize memory if we care about its contents,
even if it happens to be the same as the default value placed there.
queue queue_new()
//@ensures is_queue(\result);
//@ensures queue_empty(\result);
{
queue Q = alloc(struct queue_header);
list* p = alloc(struct list_node);
Q->front = p;
Q->back = p;
return Q;
}
Let’s take one of these lines apart. Why does
queue Q = alloc(struct queue_header);
make sense? According to the definition of alloc, we might expect
struct queue_header* Q = alloc(struct queue_header);
since allocation returns the address of what we allocated. Fortunately, we
defined queue to be a short-hand for struct queue_header* so all is well.
To enqueue something, that is, add a new item to the back of the queue,
we just write the data (here: a string) into the extra element at the back,
create a new back element, and make sure the pointers updated correctly.
You should draw yourself a diagram before you write this kind of code.
Here is a before-and-after diagram for inserting "3" into a list. The new or
updated items are dashed in the second diagram.
data
next
1
2
Q
front
back
data
next
1
2
3
Q
front
back
Another important point to keep in mind as you are writing code that ma-
nipulates pointers is to make sure you perform the operations in the right
order, if necessary saving information in temporary variables.
void enq(queue Q, string s)
//@requires is_queue(Q);
//@ensures is_queue(Q);
{
list* p = alloc(struct list);
Q->back->data = s;
Q->back->next = p;
Q->back = p;
}
data
next
1
2
3
front back
data
next
1
2
3
front back
And in code:
string deq(queue Q)
//@requires is_queue(Q);
//@requires !queue_empty(Q);
//@ensures is_queue(Q);
{
string s = Q->front->data;
Q->front = Q->front->next;
return s;
}
Let’s verify that the our pointer dereferencing operations are safe. We have
Q->front->data
bool is_queue(queue Q) {
if (Q == NULL) return false;
if (Q->front == NULL) return false;
if (Q->back == NULL) return false;
if (!is_segment(Q->front, Q->back)) return false;
return true;
}
We see that Q->front is okay, because by the first test we know that Q != NULL
is the precondition holds. By the second test we see that both Q->front and
Q->back are not null, and we can therefore dereference them.
We also make the assignment Q->front = Q->front->next. Why does
this preserve the invariant? Because we know that the queue is not empty
(second precondition of deq) and therefore Q->front != Q->back. Because
Q->front to Q->back is a valid non-empty segment, Q->front->next can-
not be null.
An interesting point about the dequeue operation is that we do not ex-
plicitly deallocate the first element. If the interface is respected there cannot
be another pointer to the item at the front of the queue, so it becomes un-
reachable: no operation of the remainder of the running programming could
ever refer to it. This means that the garbage collector of the C0 runtime sys-
tem will recycle this list item when it runs short of space.
the data at the bottom of the stack is meaningless and will not be used in
our implementation. A typical stack then has the following form:
data
next
3
2
1
top bo.om
struct list_node {
elem data;
struct list_node* next;
};
typedef struct list_node list;
struct stack_header {
list* top;
list* bottom;
};
typedef struct stack_header* stack;
bool is_stack(stack S) {
if (S == NULL) return false;
if (Q->front == NULL) return false;
if (Q->back == NULL) return false;
if (!is_segment(Q->front, Q->back)) return false;
return true;
}
Popping from a stack requires taking an item from the front of the
linked list, which is much like dequeuing.
elem pop(stack S)
//@requires is_stack(S);
//@requires !stack_empty(S);
//@ensures is_stack(S);
{
elem e = S->top->data;
S->top = S->top->next;
return e;
}
To push an element onto the stack, we create a new list item, set its data
field and then its next field to the current top of the stack – the opposite end
of the linked list from the queue. Finally, we need to update the top field of
the stack to point to the new list item. While this is simple, it is still a good
idea to draw a diagram. We go from
data
next
3
2
1
top bo.om
to
top bo/om
In code:
Exercises
Exercise 1 Consider what would happen if we pop an element from the empty
stack when contracts are not checked in the linked list implementation? When
does an error arise?
Exercise 2 Stacks are usually implemented with just one pointer in the header, to
the top of the stack. Rewrite the implementation in this style, dispensing with the
bottom pointer, terminating the list with NULL instead.