Computer Organization and Design Final After Corrections
SYLLABUS
UNIT-I
Digital Fundamentals: Digital Systems, Binary Numbers, Octal, Hexadecimal Conversions,
Signed Binary Numbers, Complements, Logic Gates, Boolean Algebra, K-maps, Standard
Forms, NAND-NOR Implementation
UNIT-II
Combinational and Sequential Circuits: Combinational Circuits, Adder, Subtractor, ALU
Design, Decoder, Encoder, Multiplexers, Sequential Circuits: Latches, Flip-flops, Registers,
Memories, Up-down Counters
UNIT-III
Processor Fundamentals: Von Neumann Architecture, Processor: Definition, Structure,
Category, Technology, ALU Concept, Stored Programs, Fetch Execute Cycle, Instruction
Formats, Clock Rate Instruction Rate, Pipeline, Current Processors, Multi Core Processors
UNIT-IV
Memory: Physical Memory, Addressing, Virtual Memory, Address Translation, Paging,
Cache, L1, L2, L3 Cache Memories, Cache Mapping, LRU Replacement
UNIT-V
I/O Data Transfer: Data Transfer, Serial and Parallel Data Transfer, Full Duplex–Half Duplex
Interaction, Bus Interface, Programmed I/O, Polling, Interrupt Driven I/O, Hardware
Interrupt Mechanism, Interrupt Vectors, Multi Level of Interrupts, DMA, Buffer Chaining,
Operation Chaining
CONTENTS
Learning Objectives
Learning Outcomes
Overview
1.1 Concept of Digital System
1.1.1 Non-positional Number System
1.1.2 Positional Number System
1.2 Different Types of Number Systems
1.2.1 Decimal Number System
1.2.2 Binary Number System
1.2.3 Octal Number System
1.2.4 Hexadecimal Number System
1.3 Conversions
1.3.1 Decimal to Other Number Systems
1.3.2 Binary System to Other Number Systems
1.3.3 Octal to Other Number Systems
1.3.4 Hexadecimal to Other Number Systems
1.4 Complement
1.4.1. 1’s Complement of Binary Number
1.4.2. 2’s Complement of Binary Number
1.5 Signed Binary Numbers
1.5.1 1’s Complement of Signed Binary Number
1.5.2 2’s Complement of Signed Binary Number
Summary
Keywords
Self-Assessment Questions
Further Readings
LEARNING OBJECTIVES
After studying this chapter, you should be able to:
• Understand the concept of digital systems
• Analyze and identify different number systems
• Describe the number system conversion concept
• Understand the concept of signed binary numbers and complements
LEARNING OUTCOMES
Upon completion of the chapter, students will be able to understand:
• The basics of digital systems in computer organization
• How to explain positional and non-positional number systems
• How to describe different types of conversion techniques in number systems
• The basics of signed binary numbers
• How to solve 1’s complement and 2’s complement problems
OVERVIEW
We are all familiar with the impact of digital watches, modern communication systems, calculators, and PCs in everyday life. Digital systems play such an indispensable part in daily life that we call the present technological era the Digital Age. Digital systems are used in medical treatment, communication, weather monitoring, space guidance, traffic control, the Internet, and numerous other scientific, commercial and industrial ventures.
In this chapter, you will study the concept of digital systems. You will learn about various number systems and their conversion procedures. Finally, you will study the concept of signed binary numbers and complements.
1.1 CONCEPT OF DIGITAL SYSTEM
The term digital system refers to elements such as hardware, software and networks and their
use. There may be many different components that make up one system; for example, a
computer has a central processing unit, a hard disk, keyboard, mouse, screen etc. A peripheral
device is a digital component that can be connected to a digital system, for example, a digital
camera or printer.
A digital system is the interconnection of digital hardware modules that accomplish a particular data processing task. The development of digital systems took place with the advent of integrated circuits (ICs), an offspring of semiconductor technology. The low-cost manufacture of ICs has made the subject of Digital Electronics easy to study. One small IC can perform the task of thousands of transistors, diodes and resistors. Numerous ICs are used to build digital circuits. This is an interesting and rapidly developing field whose principles underlie the working of computers, communication systems, digital appliances and so on.
The fundamental idea is to allow beginners to understand the operation of digital systems and of many other systems based on the principles of digital techniques. Devices working on digital methods are considered digital systems, and the electronic networks used to make them functional are called digital circuits.
Electronic circuits use two sorts of signals: analogue signals (a continuous range of voltages and currents) and digital signals (discrete voltages and currents).
Circuits (electronic networks) using analogue signals are known as linear or analogue circuits. Likewise, the electronic network of an electronic calculator or digital watch that uses digital signals is called a digital circuit.
An analogue device is one that has a signal which varies continuously in time with the input, whereas a digital device operates with a digital signal that varies discontinuously.
Figure 1.1 shows the behaviour of analogue and digital signals and the possibility of
conversion from analogue to digital.
Example 1.1: Figure 1.2 represents a symbol of some devices using analogue and digital
world. This example is set to explain how a real-life problem like movement of a pivot on a
water tank that indicates the level of water can be translated to analogue/digital form.
Figure 1.2: Digital System used to Interpret Float Level in Water Tank
Digital systems work under logic and hence they are called logic circuits, whose building blocks are known as gates. These circuits employ two different representations of the digital signal, known as positive logic systems and negative logic systems, as shown in Figure 1.3. The two discrete signal levels HIGH and LOW are generally represented by the binary digits 1 and 0 respectively; such a digit is referred to as a bit, and a binary number with 8 bits is known as a byte. Since a digital signal can have only one of the two possible levels, 1 and 0, the binary number system can be used for the analysis and design of digital systems. This system of logic was introduced by George Boole in 1854 and the corresponding algebra is known as Boolean algebra. These logic concepts have been adopted for the design of digital circuits.
The number system that we use in day-to-day life is called Decimal Number System. In this
system, one works with 10 different digits (0 through 9), and it is known as the base-ten system. But
digital electronic devices use a ‘strange’ number system called binary. Digital computers and
microprocessor-based systems use other strange number systems called Hexadecimal and Octal.
One who works in electronics must know how to convert numbers from the everyday decimal
system to binary, to Hexadecimal, and to Octal system.
The hexadecimal number system uses 16 symbols: 0 through 9, A, B, C, D, E, and F, and is referred to as the base-sixteen system. The letter ‘A’ stands for decimal 10, ‘B’ for decimal 11, and so on. The octal number system uses the 8 symbols 0 through 7 and is referred to as the base-eight system.
Any decimal number of any magnitude can be expressed by using the system of positional weighting, in which the rightmost digit of the number, called the ‘least significant digit’, is multiplied by 10^0, and the digit to the left of the least significant digit is multiplied by 10^1. Similarly, as we move towards the left of the number, the power increases in steps of 1.
The number system can be classified into two types:
• Positional number system
• Non-Positional number system
1.1.1 Non-positional Number System
In ancient times people used stones and pebbles to count things. This method of counting does not depend on the position of a symbol, and such systems are therefore called non-positional number systems.
1.3 CONVERSIONS
Representations of a number in a different radix are said to be equivalent if they have the same
decimal representation. For example, (0011)8 and (1001)2 are equivalent—both have decimal
value 9. The conversion of a number in base r to decimal is done by expanding the number in a
power series and adding all the terms as shown previously. We now present a general procedure
for the reverse operation of converting a decimal number to a number in base r. If the number
includes a decimal point, it is necessary to separate the number into an integer part and a fraction
part, since each part must be converted differently. The conversion of a decimal integer to a
number in base r is done by dividing the number and all successive quotients by r and
accumulating the remainders. This procedure is best illustrated by example. Now, we will
discuss the set of rules to convert them. These are discussed below:
1.3.1 Decimal to Other Number Systems
You can convert the decimal number systems to other number systems by using methods
given below:
1.3.1.1 Decimal to Binary Number Systems
A decimal number has base 10 and a binary number has base 2. In decimal to binary conversion,
the base of the number also changes, i.e. from base 10 to base 2. Every decimal number has an equivalent binary number. These binary numbers are widely used in computer applications, where they are used for programming and coding purposes. This is because computers understand the
language of binary digits, 0 and 1.
Hence, once we give the input to the computer system in the decimal form, it converts them into
binary digits, performs the required operations and provides the output in decimal form again. Now, let us learn how a decimal number can be represented in binary form. But before learning the steps for conversion, let us first see a table showing the equivalent binary numbers for decimal numbers.
Conversion steps:
1. Divide the number by 2.
2. Get the integer quotient for the next iteration.
3. Get the remainder for the binary digit.
4. Repeat the steps until the quotient is equal to 0.
Example 1.7: Convert (13)10 to binary
13 / 2 = 6, remainder 1
6 / 2 = 3, remainder 0
3 / 2 = 1, remainder 1
1 / 2 = 0, remainder 1
Reading the remainders from last to first: (13)10 = (1101)2
Example 1.8: Convert (174)10 to binary
174 / 2 = 87 r 0; 87 / 2 = 43 r 1; 43 / 2 = 21 r 1; 21 / 2 = 10 r 1; 10 / 2 = 5 r 0; 5 / 2 = 2 r 1; 2 / 2 = 1 r 0; 1 / 2 = 0 r 1
Reading the remainders from last to first: (174)10 = (10101110)2
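The repeated-division procedure can also be expressed as a short program. The following Python sketch is only an illustration (the function name to_binary is ours, not something defined in this text):

def to_binary(n):
    # Convert a non-negative decimal integer to a binary string
    # by repeated division by 2, collecting the remainders.
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))      # remainder becomes the next binary digit
        n //= 2                      # integer quotient is used in the next iteration
    return "".join(reversed(bits))   # remainders are read from last to first

print(to_binary(13))    # prints 1101, as in Example 1.7
print(to_binary(174))   # prints 10101110, as in Example 1.8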
1.3.1.2 Decimal to Octal Number Systems
To convert decimal to octal, we have to learn about both the number system first. A number with
base 8 is the octal number and a number with base 10 is the decimal number. Here we will
convert a decimal number to an equivalent octal number.
Conversion steps:
1. Divide the number by 8.
2. Get the integer quotient for the next iteration.
3. Get the remainder for the octal digit.
4. Repeat the steps until the quotient is equal to 0.
Example 1.9: Convert (7562)10 to octal

Division by 8    Quotient    Remainder    Octal digit    Digit position
7562 / 8         945         2            2              0
945 / 8          118         1            1              1
118 / 8          14          6            6              2
14 / 8           1           6            6              3
1 / 8            0           1            1              4

So (7562)10 = (16612)8
Example 1.10: Convert (35631)10 to octal

Division by 8    Quotient    Remainder    Octal digit    Digit position
35631 / 8        4453        7            7              0
4453 / 8         556         5            5              1
556 / 8          69          4            4              2
69 / 8           8           5            5              3
8 / 8            1           0            0              4
1 / 8            0           1            1              5

So (35631)10 = (105457)8
1.3.1.3 Decimal to Hexadecimal Number Systems
Example 1.11: Convert (7562)10 to hexadecimal

Division by 16    Quotient    Remainder    Hex digit    Digit position
7562 / 16         472         10           A            0
472 / 16          29          8            8            1
29 / 16           1           13           D            2
1 / 16            0           1            1            3

So (7562)10 = (1D8A)16
Example 1.12: Convert (35631)10 to hexadecimal

Division by 16    Quotient    Remainder    Hex digit    Digit position
35631 / 16        2226        15           F            0
2226 / 16         139         2            2            1
139 / 16          8           11           B            2
8 / 16            0           8            8            3

So (35631)10 = (8B2F)16
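The same repeated-division method works for any base. The Python sketch below is an illustration only (the helper name to_base is our own); it reproduces the results of Examples 1.9 to 1.12:

DIGITS = "0123456789ABCDEF"

def to_base(n, r):
    # Convert a non-negative decimal integer n to base r (2 <= r <= 16)
    # by repeated division, collecting the remainders as digits.
    if n == 0:
        return "0"
    out = []
    while n > 0:
        out.append(DIGITS[n % r])    # remainder gives the next digit
        n //= r
    return "".join(reversed(out))

print(to_base(7562, 8))     # 16612
print(to_base(35631, 8))    # 105457
print(to_base(7562, 16))    # 1D8A
print(to_base(35631, 16))   # 8B2F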
For example, consider the binary number 111001. Each bit is weighted by a power of 2:

binary number:   1     1     1     0     0     1
power of 2:      2^5   2^4   2^3   2^2   2^1   2^0

so (111001)2 = 32 + 16 + 8 + 0 + 0 + 1 = (57)10
• Now multiply each digit of the number by 8^(n-1), where the digit is in the nth position from the right end of the number. If the number has a fractional part, multiply each digit in the fractional part by 8^(-m), where the digit is in the mth position after the octal point.
• Add all the terms after multiplication.
• The obtained value is the equivalent decimal number.
Example 1.17: Convert (41)8 to a decimal number
(41)8 = (4 × 8^1) + (1 × 8^0)
= 4 × 8 + 1 × 1
= 32 + 1
= (33)10
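The positional-weighting rule can be checked with a short Python sketch (illustrative only; the function name to_decimal is ours, and Python's built-in int(string, base) performs the same conversion directly):

def to_decimal(digits, r):
    # Expand a base-r numeral given as a string: the digit that is n places
    # from the right end is weighted by r**n (n = 0 for the rightmost digit).
    value = 0
    for position, digit in enumerate(reversed(digits)):
        value += int(digit, 16) * (r ** position)   # int(d, 16) maps '0'-'9', 'A'-'F' to 0-15
    return value

print(to_decimal("41", 8))      # 33, as in Example 1.17
print(to_decimal("1101", 2))    # 13
print(to_decimal("1D8A", 16))   # 7562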
1.3.3.2 Octal to Binary
In order to convert an octal number to binary, we have to follow a few steps. Octal numbers have base 8 and binary numbers have base 2. Replace each octal digit by its equivalent 3-bit binary number.
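As a small illustration (the function name octal_to_binary is our own), the digit-by-digit replacement can be written as:

def octal_to_binary(octal_string):
    # Replace every octal digit by its 3-bit binary equivalent.
    return "".join(format(int(digit), "03b") for digit in octal_string)

print(octal_to_binary("41"))      # 100001
print(octal_to_binary("16612"))   # 001110110001010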
Example 1.27: To find the 2’s complement of a binary number, first find the 1’s complement of that number and then add 1. For example, the 1’s complement of 101100 is 010011, and adding 1 gives its 2’s complement, 010100.
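A minimal Python sketch of both complements, assuming the number is given as a bit string (the function names are our own):

def ones_complement(bits):
    # Invert every bit of a binary string.
    return "".join("1" if b == "0" else "0" for b in bits)

def twos_complement(bits):
    # 2's complement = 1's complement + 1, kept to the same bit width.
    width = len(bits)
    value = int(ones_complement(bits), 2) + 1
    return format(value % (1 << width), "0{}b".format(width))

print(ones_complement("101100"))   # 010011
print(twos_complement("101100"))   # 010100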
Sign-magnitude notation is the simplest and one of the most common methods of representing
positive and negative numbers either side of zero, (0). Thus negative numbers are obtained
simply by changing the sign of the corresponding positive number as each positive or unsigned
number will have a signed opposite, for example, +2 and -2, +10 and -10, etc.
But how do we represent signed binary numbers if all we have is a bunch of ones and zeros? We know that binary digits, or bits, have only two values, either a “1” or a “0”, and conveniently for us a sign also has only two values, being a “+” or a “–“.
Then we can use a single bit to identify the sign of a signed binary number as being positive or negative in value. So to represent a positive binary number (+n) and a negative (-n) binary number, we can use the magnitude bits together with one additional sign bit.
For signed binary numbers the most significant bit (MSB) is used as the sign bit. If the sign bit is
“0”, this means the number is positive in value. If the sign bit is “1”, then the number is negative
in value. The remaining bits in the number are used to represent the magnitude of the binary
number in the usual unsigned binary number format way.
Then we can see that the Sign-and-Magnitude (SM) notation stores positive and negative values
by dividing the “n” total bits into two parts: 1 bit for the sign and n–1 bits for the value which is
a pure binary number. For example, the decimal number 53 can be expressed as the 8-bit signed binary number 00110101: a sign bit of 0 followed by the 7-bit magnitude 0110101.
Positive Signed Binary Numbers
The disadvantage here is that whereas before we had a full range n-bit unsigned binary number,
we now have an (n-1)-bit signed binary number, giving a reduced range of values from:
−(2^(n-1) − 1) to +(2^(n-1) − 1)
So for example: if we have 4 bits to represent a signed binary number, (1-bit for the Sign bit and
3-bits for the Magnitude bits), then the actual range of numbers we can represent in sign-
magnitude notation would be:
−(2^(4-1) − 1) to +(2^(4-1) − 1)
−(2^3 − 1) to +(2^3 − 1)
−7 to +7
Whereas before, the range of an unsigned 4-bit binary number would have been from 0 to 15,
or 0 to F in hexadecimal, we now have a reduced range of -7 to +7. Thus an unsigned binary
number does not have a single sign-bit, and therefore can have a larger binary range as the most
significant bit (MSB) is just an extra bit or digit rather than a used sign bit.
Another disadvantage of the sign-magnitude form is that we can have a positive result for zero, +0 or (0000)2, and a negative result for zero, -0 or (1000)2. Both are valid, but which one is correct?
Signed Binary Numbers Example 1
Convert the following decimal value into a signed binary number using the sign-magnitude format:
(-127)10 as an 8-bit number ⇒ (11111111)2
So we can see that using this method there can be two representations for zero, a positive zero (0000)2 and also a negative zero (1000)2, which can cause big complications for computers and digital systems.
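The sign-magnitude encoding described above can be sketched in Python as follows (an illustration only; the function name sign_magnitude is ours):

def sign_magnitude(value, n_bits):
    # Encode value in n_bits sign-magnitude form:
    # 1 sign bit followed by (n_bits - 1) magnitude bits.
    magnitude = abs(value)
    if magnitude > 2 ** (n_bits - 1) - 1:
        raise ValueError("value out of range for the given number of bits")
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, "0{}b".format(n_bits - 1))

print(sign_magnitude(53, 8))      # 00110101
print(sign_magnitude(-127, 8))    # 11111111
print(sign_magnitude(7, 4), sign_magnitude(-7, 4))   # 0111 1111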
1.5.1 1’s Complement in Signed Binary Number Representation
1’s complement binary numbers are very useful in signed number representation. Positive numbers are simply represented as plain binary numbers; nothing more needs to be done for them. A negative number, however, is represented using its 1’s complement: first write the magnitude as a positive binary number, and then take the 1’s complement of that number.
Example 1.29: Let us assume a 5-bit register. The representation of +5 is 00101, and the representation of -5 is the 1’s complement of 00101, which is 11010.
Subtraction using 1’s complement is performed by taking the 1’s complement of the subtrahend and adding it to the minuend. Then:
• If the result of the above addition has a carry bit 1, drop the carry and add it to the least significant bit (LSB) of the result.
• If there is no carry bit 1, take the 1’s complement of the result, and the answer is negative.
Note that the subtrahend is the number that is to be subtracted from the other number, i.e., the minuend.
Example (Case-1: When there is a carry bit 1): Evaluate 10101 − 00101
According to the above algorithm, take the 1’s complement of the subtrahend 00101, which is 11010, and add it to the minuend: 10101 + 11010 = 1 01111. Since there is a carry bit 1, drop it and add it to the LSB of the remaining result: 01111 + 1 = 10000, which is the answer.
Example (Case-2: When there is no carry bit): Evaluate 01110 − 11110
According to the above algorithm, take the 1’s complement of the subtrahend 11110, which is 00001. Then add: 01110 + 00001 = 01111. Since there is no carry bit 1, take the 1’s complement of this result, which is 10000, and attach a negative sign. The answer is −10000.
Similarly, you can subtract two mixed (with fractional part) binary numbers. Note that you always add the carry bit to the least significant bit (LSB) of the result whenever you get a carry bit 1. The LSB of a fractional binary number is the last (rightmost) bit of the mixed or fractional binary number.
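The 1’s complement subtraction rules above can be sketched in Python (an illustration with our own function names; both operands are assumed to be bit strings of the same width, padded with leading zeros if necessary):

def ones_complement(bits):
    return "".join("1" if b == "0" else "0" for b in bits)

def subtract_ones_complement(minuend, subtrahend):
    # Subtract using 1's complement: add the 1's complement of the subtrahend,
    # then handle the end-around carry, or negate if no carry appears.
    width = len(minuend)
    total = int(minuend, 2) + int(ones_complement(subtrahend), 2)
    if total >= (1 << width):                       # a carry bit 1 was produced
        total = (total & ((1 << width) - 1)) + 1    # drop it and add it to the LSB
        return format(total, "0{}b".format(width))
    # no carry: the answer is negative and is the 1's complement of the sum
    return "-" + ones_complement(format(total, "0{}b".format(width)))

print(subtract_ones_complement("10101", "00101"))   # 10000   (Case 1)
print(subtract_ones_complement("01110", "11110"))   # -10000  (Case 2)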
Additions by 1’s Complement
There are different scenarios for the addition of two binary numbers using 1’s complement. These are explained below.
Case-1: Addition of positive and negative number when positive number has greater
magnitude:
When the positive number has the greater magnitude, simply take the 1’s complement of the negative number, add the two, and then add the end-around carry of the sum to the least significant bit (LSB).
Example 1.30: Add 1110 and -1101.
Take the 1’s complement of 1101, which is 0010, then add it to the given number: 1110 + 0010 = 1 0000. Add the carry bit to the LSB: 0000 + 1 = 0001, which is the answer.
Note that if the register size is bigger, then fill the extra positions with the value of the MSB to preserve the sign magnitude of the inputs and the output.
Case-2: Addition of positive and negative number when negative number has greater
magnitude:
When the negative number has the greater magnitude, take the 1’s complement of the negative number and add it to the given positive number. Since there will not be any end-around carry bit, take the 1’s complement of the result, and this result is negative.
Example 1.31: Add 1010 and -1100 in five-bit registers.
With five-bit registers, these numbers become 01010 and -01100.
Now take the 1’s complement of 01100, which is 10011, and add: 01010 + 10011 = 11101. Then take the 1’s complement of this result, which is 00010, and the answer is negative, i.e., −00010.
easier addition of the two numbers, so the sum is 115 + (2’s complement of 27), which is:
01110011 + 11100101 = (101011000)2
As previously, the 9th overflow bit is disregarded as we are only interested in the first 8 bits, so the result is (01011000)2, or (64 + 16 + 8) = (88)10 in decimal, the same as before.
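The same 8-bit calculation can be checked with a short Python sketch (illustrative only; the helper name to_8bit is ours), using the fact that masking with 0xFF keeps only the first 8 bits:

def to_8bit(value):
    # 8-bit 2's complement bit pattern of value (negative values wrap around).
    return format(value & 0xFF, "08b")

a, b = 115, 27
total = int(to_8bit(a), 2) + int(to_8bit(-b), 2)
print(to_8bit(-b))                                  # 11100101, the 2's complement of 27
print(format(total, "b"))                           # 101011000, nine bits including the carry
print(format(total & 0xFF, "08b"), total & 0xFF)    # 01011000 88, after discarding the 9th bit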
Learning Activity
learning theories in their teaching, and the teachers were keen to explore the use of concrete
materials, especially the base ten game in relation to student learning.
The five project schools represented the diverse range of communities within the independent
sector, including an isolated rural school with high numbers of Indigenous students, small and
large urban schools, and two schools with high numbers of students from a low socio-
economic background. The research took place between March and October of 2001, and
involved approximately 280 primary students.
The research methodology enabled the teacher researchers to focus on improved student
learning through changes in teacher practice and allowed concentration on the practical, day-
to-day realities of the classroom. Under the broad question, ’What are the most effective
teaching methods and management structures that will maximize the learning of the base ten
number system in a whole class setting?’ each teacher developed their own specific research
question. Within the parameters of each individual research question were common elements
of investigation that included:
• Range of teaching and learning strategies that were employed to enhance base ten
learning
• Monitoring and evaluation of these strategies
• Teacher knowledge that was required to effectively help students learn
• Use of concrete materials to assist learning
SUMMARY
• A digital system is the interconnection of digital hardware modules that accomplish a particular data processing task.
• Digital systems work under logic and are therefore called logic circuits, whose building blocks are known as gates.
• Calculating using only 1s and 0s is known as the binary system.
• A number system with a base or radix of 10 is known as the decimal number system.
• A number system with a base or radix of 8 is known as the octal number system. A number system with a base of 16 is known as the hexadecimal number system.
• An n-digit signed binary number consists of two parts, that is, a sign bit and a magnitude.
• Complements are used to simplify the subtraction operation.
KEYWORDS
Digital Systems: Digital systems are designed to store, process, and communicate information
in digital form.
Binary Number System: Binary number is a number expressed in the base-2 numeral system or
binary numeral system, a method of mathematical expression which uses only two symbols:
typically "0" and "1".
Decimal Number System: A number system with base or radix of 10 is known as decimal
number system.
Octal Number System: A number system with base or radix of 8 is known as octal number
system.
Hexadecimal Number System: A number system with base of 16 is known as hexadecimal
number system.
Complements: Complements are used in digital computers to simplify the subtraction
operation.
SELF-ASSESSMENT QUESTIONS
CONTENTS
Learning Objectives
Learning Outcomes
Overview
2.1 Types of Logic Gates
2.1.1 The NOT (Inverter) Gate
2.1.2 The AND Gate
2.1.3 The OR Gate
2.1.4 NAND Gate
2.1.5 The NOR Gate
2.1.6 X-OR Gate(exclusive-OR)
2.1.7 X-NOR Gate
2.1.8 Non-inverter or Buffer
2.1.9 Open Collector and Open Drain
2.2 Tri-state Gates
2.2.2 Tri-state Buffer Switch Equivalent
2.2.3 Active “HIGH” Tri-state Gates
2.2.4 Active “LOW” Tri-state Gate
2.2.5 Multi-bit tri-state Buffers
2.2.6 Tri-state Gate Control
Summary
Self-Assessment Questions
Keywords
Further Reading
LEARNING OBJECTIVES
After studying this chapter, you should be able to:
• Gain knowledge about concept of logic gates
• Understand about the different types of logic gates
• Analyze the logic symbol and truth table for each gate
• Gain knowledge about tri-state gate
LEARNING OUTCOMES
Upon completion of the chapter, students are able to understand:
• The implementation of logic gates
• Analyze the different types of logic gates and their symbol
• Identify the logic symbols and truth table for each gate
• The concept of tri-state gates
OVERVIEW
In the previous chapter, you studied the concept of digital systems, different number systems and their conversions. You also learned the concept of signed binary numbers and complements.
In this chapter, you will study the concept of logic gates. Logic gates are the essential building blocks of digital hardware. As the name suggests, they work by "opening" or "closing" to admit or reject the flow of digital information. A logic gate is an idealized or physical device implementing a Boolean function; that is, it performs a logical operation on one or more logical inputs and produces a single logical output. Gates implement electronically simple logical operations on Boolean variables, that is, variables that can have only one of two states (0/1, low/high, false/true). This chapter explains the various types of logic gates. You will also examine the tri-state logic gate, which is a special gate output.
You can build digital systems by using logic gates, and the different logic gates that you will learn in this chapter are AND, OR, NOT, NAND, NOR, X-OR and X-NOR.
2.1 TYPES OF LOGIC GATES
Logic gates are the basic building blocks of any digital system. It is an electronic circuit having
one or more than one input and only one output. The relationship between the input and the
output is based on certain logic. Based on this, logic gates are named as AND gate, OR gate,
NOT gate etc. The logic state of a terminal can, and usually does, change often, as the circuit
processes data. In most logic gates, the low state is approximately zero volts (0 V), while the
high state is approximately five volts positive (+5 V).
There are mainly seven types of logic gates that are used in expressions. By combining them in
different ways, you will be able to implement all types of digital components. Now, each basic
logic gate and their operation are discussed.
Similar to the AND gate, an OR gate may also have two or more inputs but produces only one output. The OR gate produces an output of logic 1 state if any of its inputs is in logic 1 state, and produces an output of logic 0 state only if all of its inputs are in logic 0 state. The symbol for the OR operation is ‘+’. If the inputs are X and Y, then the output can be represented as Z = X + Y. An OR gate may also be defined as a device whose output is 1 if at least one of its inputs is 1. The OR gate is also called an any-or-all gate. It is also called an inclusive-OR gate because it includes the condition that ‘both the inputs can be present’. The logic symbols and truth table for two-input OR gates are given below.
Example: Simplify the following expressions: (a) AC + ABC   (b) A + AB
Solutions:
(a) AC + ABC = AC (1 + B) = AC, since 1 + B = 1
(b) A + AB = A (1 + B) = A
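Simplifications like these can always be checked by comparing truth tables. The following Python sketch (our own illustration, not part of the text) verifies both results exhaustively:

from itertools import product

def equivalent(f, g, n_vars):
    # Two Boolean functions are equal if they agree on every truth-table row.
    return all(bool(f(*row)) == bool(g(*row)) for row in product((0, 1), repeat=n_vars))

# (a)  AC + ABC simplifies to AC
print(equivalent(lambda a, b, c: (a and c) or (a and b and c),
                 lambda a, b, c: a and c, 3))        # True

# (b)  A + AB simplifies to A
print(equivalent(lambda a, b: a or (a and b),
                 lambda a, b: a, 2))                 # True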
A tri-state buffer (a special gate output) is a useful device that allows us to control when current passes through the device, and when it does not.
A tri-state buffer has two inputs: a data input x and a control input c. The control input acts like a valve. When the control input is active, the output is the input; that is, the buffer behaves just like a normal buffer, and the "valve" is open. When the control input is not active, the output is "Z": the "valve" is closed, and no electrical current flows through. Thus, even if x is 0 or 1, that value does not flow through.
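A tri-state buffer can be modelled with a few lines of Python (an illustrative sketch; the function name and the use of the string "Z" for the high-impedance state are our own conventions):

def tri_state_buffer(x, c, active_high=True):
    # When the control input is active the data input x passes through;
    # otherwise the output is the high-impedance state, shown here as "Z".
    enabled = (c == 1) if active_high else (c == 0)
    return x if enabled else "Z"

for c in (0, 1):
    for x in (0, 1):
        print("c =", c, "x =", x, "-> z =", tri_state_buffer(x, c))
# c = 0 gives z = Z (valve closed); c = 1 gives z = x (valve open)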
When activated into its third state it disables or turns “OFF” its output, producing an open-circuit condition that is neither a logic “HIGH” nor a logic “LOW”, but instead gives an output state of very high impedance, High-Z, or more commonly Hi-Z. This type of device therefore has two logic input states, “0” or “1”, but can produce three different output states, “0”, “1” or “Hi-Z”, which is why it is called a “Tri” or “3-state” device.
Note that this third state is NOT equal to a logic level “0” or “1”, but is a high-impedance state in which the buffer’s output is electrically disconnected from the rest of the circuit. As a result, no
current is drawn from the supply.
There are four different types of Tri-state Buffer, one set whose output is enabled or disabled by
an “Active-HIGH” control signal producing an inverted or non-inverted output, and another set
whose buffer output is controlled by an “Active-LOW” control signal producing an inverted or
non-inverted output as shown below.
2.2.2 Active “HIGH” Tri-state Gates
In this case, when the output is Z, that means it is high impedance, neither 0, nor 1, i.e., no
current.
When c = 1 the valve is open, and z = x. When c = 0 the valve is closed, and z= Z (e.g., high
impedance/no current).
2.2.3 Active “LOW” Tri-state Gate
Some tri-state buffers are active-low. In an active-low tri-state buffer, c = 0 opens the valve, while c = 1 closes it. Here is the condensed truth table for an active-low tri-state buffer.
As you can see, when c = 0 the valve is open, and z = x. When c = 1 the valve is closed,
and z = Z (e.g., high impedance/no current).
2.2.4 Multi-bit Tri-state Buffers
So far, we've talked about a tri-state buffer controlling the output of a single wire.
However, it's more common to deal with many wires.
Example:
In this case, we have 32 wires coming into the tri-state buffer. We have 32 wires coming
out of the tri-state buffer. There's still only 1 control bit.
This can easily be implemented using 32 tri-state buffers taking one bit as input and one bit as
output.
2.2.5 Tri-state Gate Control
The third state (Hi-Z) removes the device's influence from the rest of the circuit. If more
than one device is electrically connected, putting an output into the Hi-Z state is often used to
prevent short circuits, or one device driving high (logical 1) against another device driving low (logical 0). Three-state buffers can also be used to implement efficient multiplexers, especially those with large numbers of inputs. Three-state logic can reduce the number of wires needed to drive a set of LEDs.
The Tri-state Buffer is used in many electronic and microprocessor circuits as they allow
multiple logic devices to be connected to the same wire or bus without damage or loss of data.
For example, suppose we have a data line or data bus with some memory, peripherals, I/O or a
CPU connected to it. Each of these devices is capable of sending or receiving data to and from the others
over this single data bus; if two or more devices try to drive the bus at the same time, the result is what is called contention.
Contention occurs when multiple devices are connected together because some want to drive
their output high and some low. If these devices start to send or receive data at the same time a
short circuit may occur when one device outputs to the bus a logic “1”, the supply voltage, while
another is set at logic level “0” or ground, resulting in a short circuit condition and possibly
damage to the devices as well as loss of data. Digital information is sent over these data buses or
data highways either serially, one bit at a time, or it may be up to eight (or more) wires together
in a parallel form such as in a microprocessor data bus allowing multiple tri-state buffers to be
connected to the same data highway without damage or loss of data as shown.
Learning Activity
buffer is probably related to watching videos on YouTube or live streams from Netflix or
Hulu. In a video stream, a buffer represents the amount of data required to be downloaded
before the video can be played back to the viewer in real time. The buffer stores transmitted
data temporarily as it is going between devices or between a device and an app.
A buffer in a computer environment means that a set amount of data is going to be stored in
order to preload the required data right before it gets used by the CPU. Computers have many
different devices that all operate at varying speeds, and a buffer is needed to act as a sort of
temporary placeholder for everything that is interacting. This is done to keep everything
running efficiently and without issues between all the devices, programs, and processes
running at that time.
Questions:
1. Discuss the need for buffers in a computer network.
2. List the types of buffers.
1. NAND has the ability to perform three basic logic functions such as
AND, OR and NOT.
2. The buffer is a single-input device that has a gain of 1, mirroring the
input at the output. It has value for impedance matching and for isolation of
the input and output.
SUMMARY
• Logic gates perform essential logical functions and are the fundamental building blocks of digital integrated circuits.
• The NOT gate is also called an inverter as it changes the input to its opposite.
• The AND gate takes two or more inputs and produces only one output.
• The OR gate produces an output of logic 1 state if any of its inputs is in logic 1 state, and produces an output of logic 0 state only if all of its inputs are in logic 0 state.
• NAND can perform the three fundamental logic functions AND, OR and NOT.
• NOR means NOT OR. It is a combination of an OR gate and a NOT gate.
• An X-OR gate is a two-input, one-output logic circuit. The X-OR gate assumes logic 1 state when exactly one of its two inputs assumes logic 1 state.
• An X-NOR gate is a combination of an X-OR gate and a NOT gate. The X-NOR gate is also a two-input, one-output circuit.
• The buffer is a single-input device, which has a gain of 1, mirroring the input at the output.
• Open-collector or open-drain outputs do not source current, they can only sink current.
• A tri-state buffer is a useful device that allows us to control when current passes through the
device, and when it doesn't.
KEYWORDS
Logic Gates: They are the basic building blocks of digital circuitry.
NOT Gate: It is also called as an inverter as it changes the input to its opposite.
AND Gate: It takes in two or more inputs and produces only one output
OR Gate: It produces an output of logic 1 state if any of its inputs is in logic 1 state, and produces an output of logic 0 state only if all of its inputs are in logic 0 state.
NAND Gate: It is a universal gate that has the ability to perform three basic logic functions
such as AND, OR and NOT.
NOR Gate: NOR gate is a combination of an OR gate and a NOT gate.
X-OR Gate: It is a two-input, one-output logic circuit that assumes logic 1 state when exactly one of its two inputs assumes logic 1 state.
X-NOR Gate: It is a combination of an X-OR gate and a NOT gate.
Buffer: It is a single-input device that has a gain of 1, mirroring the input at the output.
Tri-state Buffer: It is a useful device that allows us to control when current passes through
the device, and when it doesn't.
SELF-ASSESSMENT QUESTIONS
Short Answer Questions
1. What are logic gates? Discuss.
2. Differentiate between basic and universal gates.
3. What is a buffer?
4. Discuss about X-OR Gate.
5. Define X-NOR Gate.
6. Discuss open drain concept.
7. What is the use of tri-state gate?
8. Discuss the gate fan out example.
CONTENTS
Learning Objectives
Learning Outcomes
Overview
3.1 Concept of Boolean Algebra
3.1.1 Postulates of Boolean Algebra
3.1.2 Laws of Boolean Algebra and their Truth Table
3.1.3 Boolean Arithmetic
3.1.4 Boolean Algebraic Identities
3.1.5 Translating Truth Tables into Boolean Expression
3.1.6 Boolean Rules for Simplification
3.1.7 Boolean Theorems
3.2 DeMorgan’s Theorems
3.2.1 DeMorgan’s First Theorem
3.2.2 DeMorgan’s Second Theorem
3.3 K-maps
3.3.1 Karnaugh Map Format
3.3.2 Looping
3.4 Standard Forms
3.5 NAND NOR Implementation
3.5.1 NAND Implementation
3.5.2 NOR Implementation
Summary
Keywords
Self-Assessment Questions
Further Reading
LEARNING OBJECTIVES
After Studying this chapter, you should be able to:
• Explain the concept of Boolean Algebra
• Define DeMorgan’s Theorems
• Describe the concept of K-maps
• Explain standard forms
• Illustrate NAND and NOR implementation
LEARNING OUTCOMES
Upon completion of the chapter, students are able to understand:
• Gains knowledge about Boolean Arithmetic
• Laws of Boolean Algebra and their truth table
• How to apply DeMorgan’s theorems
• Gains knowledge about functions of K-Maps
• Concept of minterms and maxterms
• Applications of K-map digital circuits and standard forms
OVERVIEW
In the previous chapter, you examined the concept of logic gates and learned about the various kinds of logic gates, for example OR, AND, NOR, NAND, X-OR, X-NOR and the tri-state buffer, which is a special gate output.
In this chapter, you will study the concept of Boolean algebra. Boolean algebra can be characterized by a set of elements, a set of operators and a number of postulates. In certain respects, Boolean algebra resembles ordinary algebra. You will study the representation of Boolean algebra and DeMorgan’s theorems. We will also cover the concept of the K-map and the application of K-maps to digital circuits.
The study of Boolean algebra will help you understand circuits and their working. The knowledge of Boolean algebra will also help you grasp the concepts of combinational and sequential circuits, which you will study in the next chapter.
3.1 CONCEPT OF BOOLEAN ALGEBRA
In 1854 George Boole introduced a systematic treatment of logic and developed for this purpose
an algebraic system known as symbolic logic, or Boolean algebra. Boolean algebra is a branch of
mathematics and it can be used to describe the manipulation and processing of binary
information. The two-valued Boolean algebra has important application in the design of modern
computing systems. This chapter contains a brief introduction to the basics of logic design. It provides minimal coverage of Boolean algebra and this algebra’s relationship to logic gates and basic digital circuits.
Boolean algebra is algebra for the manipulation of objects that can take on only two values,
typically true and false. It is common to interpret the digital value 0 as false and the digital value
1 as true.
Example 3.1: A+A=A, not 2A
Also 1+1=1, not 2 as it is logical expression.
One can visualize this as,
TRUE+TRUE=TRUE
FALSE+FALSE=FALSE
A Boolean algebraic expression is composed of variables, constants and operators. The variables are commonly denoted by letters of the alphabet (say A), and each can take one of two possible values, 1 or 0. The interpretation of 1 may be that the variable is present, the input signal is ON, the statement is TRUE, or the voltage is high. If A is 0, then the variable is absent, the input signal is OFF, the statement is FALSE, or the voltage is low. Similarly, a Boolean constant can have either of the two values, 1 or 0.
Boolean operators are used in Boolean algebra where a mathematical function called Boolean
function is constructed. These operators are the symbols:
• PLUS (+) meaning an OR operation
• DOT (.) meaning an AND operation
• BAR read as COMPLEMENT meaning a NOT operation.
3.1.1 Postulates of Boolean algebra
In this section, let us discuss the Boolean postulates and basic laws that are used in Boolean algebra. These are useful in minimizing Boolean functions.
Boolean Postulates
Consider the binary numbers 0 and 1, a Boolean variable x and its complement x’. Either the Boolean variable or its complement is known as a literal. The four possible logical OR operations among these literals and binary numbers are shown below:
x + 0 = x
x + 1 = 1
x + x = x
x + x’ = 1
Similarly, the four possible logical AND operations among these literals and binary numbers are:
x.1 = x
x.0 = 0
x.x = x
x.x’ = 0
These are the simple Boolean postulates. We can verify these postulates easily, by substituting the Boolean variable with ‘0’ or ‘1’.
In Boolean arithmetic, the four possible sums of two one-bit quantities are:
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 1
The first three sums make perfect sense to anyone familiar with elementary addition. The last sum, though, is quite possibly responsible for more confusion than any other single statement in digital electronics, because it seems to run contrary to the basic principles of mathematics. Well, it does contradict the principles of addition for real numbers, but not for Boolean numbers. Remember that in the world of Boolean algebra, there are only two possible values for any quantity and for any arithmetic operation: 1 or 0. There is no such thing as “2” within the scope of Boolean values. Since the sum “1 + 1” certainly isn’t 0, it must be 1 by process of elimination. It does not matter how many or few terms we add together, either; for example, 1 + 1 + 1 = 1 and 0 + 1 + 1 + 1 = 1.
Binary Addition
Binary addition is much like your normal everyday (decimal) addition, except that it uses only the digits 0 and 1 instead of 0 to 9, so a carry is generated whenever a column sum exceeds 1.
Example3.2: In decimal addition, if you add 8 + 2 you get ten, which you write as 10; in the sum
this gives a digit 0 and a carry of 1. Something similar happens in binary addition when you add
1 and 1; the result is two (as always), but since two is written as 10 in binary, we get, after
summing 1 + 1 in binary, a digit 0 and a carry of 1.
Therefore in binary:
0+0=0
0+1=1
1+0=1
1 + 1 = 10 (which is 0 carry 1)
For example:

      1 0 1 1 0 1          1 0 0 0 1
    +   1 1 0 0 1        + 1 1 1 0 1
    -------------        -----------
    1 0 0 0 1 1 0        1 0 1 1 1 0

That is, 101101 + 11001 = 1000110 and 10001 + 11101 = 101110.
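The column-by-column procedure can be sketched in Python as follows (an illustration only; the function name add_binary is ours):

def add_binary(a, b):
    # Add two binary strings column by column from the right,
    # carrying whenever a column sum reaches two.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry, digits = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        column = int(x) + int(y) + carry
        digits.append(str(column % 2))   # digit written in this column
        carry = column // 2              # carry into the next column
    if carry:
        digits.append("1")
    return "".join(reversed(digits))

print(add_binary("101101", "11001"))   # 1000110
print(add_binary("10001", "11101"))    # 101110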
Binary Subtraction
Subtraction and borrow are the words used very frequently for the binary subtraction. There are
four rules of binary subtraction.
A B Difference Borrow
0 0 0 0
0 1 1 1
1 0 1 0
1 1 0 0
Work the columns right to left subtracting in each column. If you must subtract a one from a
zero, you need to “borrow” from the left, just as in decimal subtraction.
Example 3.4: 1010110 − 101010 = 101100 and 1011011 − 10010 = 1001001

      1 0 1 0 1 1 0        1 0 1 1 0 1 1
    −   1 0 1 0 1 0      −     1 0 0 1 0
    ---------------      ---------------
        1 0 1 1 0 0        1 0 0 1 0 0 1
Binary Multiplication
Binary multiplication is the process of multiplying binary numbers. The process of multiplying
binary numbers is the same as that of arithmetic multiplication with decimal numbers. The only
difference is that binary multiplication involves numbers that are consist of 0s and 1s, whereas,
decimal multiplication involves numbers that comprise digits from 0 to 9.
Example 3.5: Using the binary multiplication rules, multiply (110)2 and (11)2.
The rules for binary multiplication are:
0×0=0
0×1=0
1×0=0
1×1=1
Let us use the above rules to multiply the binary numbers.
Given, multiplicand = (110)2 and multiplier = (11)2. We multiply the two numbers as shown below:

      1 1 0
    ×   1 1
    -------
      1 1 0        (110 × 1)
    1 1 0          (110 × 1, shifted one place left)
    ---------
    1 0 0 1 0

So (110)2 × (11)2 = (10010)2, i.e. 6 × 3 = 18.
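For illustration, the shift-and-add view of binary multiplication can be sketched in Python (the function name multiply_binary is our own):

def multiply_binary(a, b):
    # Shift-and-add multiplication: for every 1 in the multiplier, add a copy
    # of the multiplicand shifted left by that bit's position.
    result = 0
    for position, bit in enumerate(reversed(b)):
        if bit == "1":
            result += int(a, 2) << position    # partial product
    return format(result, "b")

print(multiply_binary("110", "11"))    # 10010  (6 x 3 = 18)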
Binary Division
The binary division operation is similar to the base 10 decimal system, except the base 2. The
division is probably one of the most challenging operations of the basic arithmetic operations.
There are different ways to solve division problems using binary operations.
1÷1 = 1
1÷0 = Meaningless
0÷1 = 0
0÷0 = Meaningless
Example 3.6: Solve 01111100 ÷ 0010
01111100 ÷ 0010
Here the dividend is 01111100, and the divisor is 0010
Remove the zero’s in the Most Significant Bit in both the dividend and divisor, that doesn’t
change the value of the number.
So the dividend becomes 1111100, and the divisor becomes 10. Performing the division, 1111100 ÷ 10 = 111110 (in decimal, 124 ÷ 2 = 62).
Example 3.7: Solve using the long division method: 101101 ÷ 101. The quotient is 1001 (in decimal, 45 ÷ 5 = 9).
Complement
Boolean algebra also supports the complement of a variable, which is the opposite of its value. Capital alphabetical letters are used to denote variables.
Product of Sum
The product of Sum form is a form in which products of different sum terms of inputs are taken.
These are not arithmetic product and sum but they are logical Boolean AND and OR
respectively.
Example 3.9: The input combination 1010 corresponds to the sum term (maxterm) A̅ + B + C̅ + D.
Substituting A = 1, B = 0, C = 1, D = 0, the sum term evaluates to 0:
A̅ + B + C̅ + D = 0 + 0 + 0 + 0 = 0
3.1.6 Boolean Rules for Simplification
Boolean algebra is also used in simplifying logic circuits. By applying specific algebraic rules to
a Boolean equation resulting from a truth table, you will get a much simpler equation.
Boolean Theorems
The Boolean theorems are grouped into the categories below. Table 1 lists the single-variable theorems. Table 2 lists the theorems relating expressions with multiple variables. Table 3 lists two special multivariable theorems, DeMorgan’s theorems, which express how to change an AND expression into an OR expression and vice versa.
Note that AB and A . B are equivalent representations of the AND function. The dot notation
is used below to avoid confusion when combining numbers and variables, i.e., to differentiate the expression x . 0 from the variable x0.
1. x . 0 = 0          5. x + 0 = x
2. x . 1 = x          6. x + 1 = 1
3. x . x = x          7. x + x = x
4. x . x' = 0         8. x + x' = 1
Table 1: Single-Variable Theorems
9. x + y = y + x
11. x + ( y + z ) = ( x + y ) + z = x + y + z
13a. x( y + z ) = xy + xz
14. x + xy = x
15b. x + x'y = x + y
Table 2: Multivariable Theorems
16. ( x + y )' = x' y'          17. ( x y )' = x' + y'
Table 3: DeMorgan’s Theorems
These theorems can be used to simplify expressions as shown in the examples below. In all of
the following examples, we refer to the theorems above by number for simplicity.
Example 3.10:
Simplify the expression z = ABD + ABD' to a sum-of-products (SOP) form.
z = ABD + ABD'
z = AB( D + D' )   by theorem 13a
z = AB( 1 )        by theorem 8
z = AB             by theorem 2
Example 3.11:
Example 3.13:
Determine the output expression of the circuit in Figure 1, a three-input NAND gate with inputs A, B and C, and simplify it to a sum of products.
z = ( A B C )'       by analysis of the circuit
z = A' + B' + C'     by theorem 17
Figure 1: Logic Circuit
Note that in this example we have changed an AND expression (since a NAND gate is an AND gate with an inverter after it) to an OR expression. This conversion leads directly to our ability to implement any logic expression using any universal gate, e.g., NAND or NOR.
3.2 DEMORGAN’S THEOREMS
DeMorgan’s Theorems can be used to find the equivalency of the NAND and NOR gates.
As we have seen previously, Boolean Algebra uses a set of laws and rules to define the operation
of a digital logic circuit with “0’s” and “1’s” being used to represent a digital input or output
condition. Boolean Algebra uses these zeros and ones to create truth tables and mathematical
expressions to define the digital operation of a logic AND, OR and NOT (or inversion)
operations as well as ways of expressing other logical operations such as the XOR (Exclusive-
OR) function.
While George Boole’s set of laws and rules allows us to analyse and simplify a digital circuit, there are two laws within his set that are attributed to Augustus DeMorgan (a nineteenth-century English mathematician) which view the logical NAND and NOR operations as separate NOT AND and NOT OR functions respectively.
But before we look at DeMorgan’s Theory in more detail, let’s remind ourselves of the basic
logical operations where A and B are logic (or Boolean) input binary variables, and whose values
can only be either “0” or “1” producing four possible input combinations, 00, 01, 10, and 11.
OR’ed. That is, replace all the OR operators with AND operators, or all the AND operators with OR operators.
3.2.1 DeMorgan’s First Theorem
DeMorgan’s First theorem proves that when two (or more) input variables are AND’ed and
negated, they are equivalent to the OR of the complements of the individual variables. Thus the
equivalent of the NAND function will be a negative-OR function, proving that A̅ ̅ B
̅ ̅ = A̅ + B̅. We
can show this operation using the following table.
Verifying DeMorgan’s First Theorem using Truth Table
3.2.2 DeMorgan’s Second Theorem
DeMorgan’s Second theorem states that the complement of two (or more) OR’ed input variables is equivalent to the AND of the complements of the individual variables: (A+B)' = A̅ . B̅.
The top logic gate arrangement, (A+B)', can be implemented using a standard NOR gate function with inputs A and B. The lower logic gate arrangement first inverts the two inputs, thus producing A̅ and B̅. These then become the inputs to the AND gate. Therefore the output from the AND gate becomes A̅ . B̅.
Then we can see that a standard AND gate function with inverters (NOT gates) on each of its
inputs produces an equivalent output condition to a standard NOR gate function, and an
individual NOR gate can be represented in this way as the equivalency of a NOR gate is a
negative-AND.
3.2.3 DeMorgan’s Equivalent Gates
We have seen here that by using DeMorgan’s Theorems we can replace all of the AND (.) operators with OR (+) operators and vice versa, and then complement each of the terms or variables in the expression by inverting it, that is, 0’s to 1’s and 1’s to 0’s, before inverting the entire function.
Thus to obtain the DeMorgan equivalent for an AND, NAND, OR or NOR gate, we simply add
inverters (NOT-gates) to all inputs and outputs and change an AND symbol to an OR symbol or
change an OR symbol to an AND symbol as shown in the following table.
Then we have seen in this tutorial about DeMorgan’s Theorem that the complement of two (or more) AND’ed input variables is equivalent to the OR of the complements of these variables, and that the complement of two (or more) OR’ed variables is equivalent to the AND of the complements of the variables, as defined by DeMorgan.
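Both theorems can be verified exhaustively with a few lines of Python (an illustrative check, not part of the original text):

from itertools import product

# Check not(A and B) == (not A) or (not B)  and  not(A or B) == (not A) and (not B)
first = all((not (a and b)) == ((not a) or (not b))
            for a, b in product((False, True), repeat=2))
second = all((not (a or b)) == ((not a) and (not b))
             for a, b in product((False, True), repeat=2))
print(first, second)   # True True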
3.3 K-MAPS
A Karnaugh map (K-map) method is a graphical way of minimizing a Boolean expression based
on the rule of complementation. It works well for expressions with not more than 4
variables. Hardware components of a computer are built of logic circuits to perform various logic
functions. A logic function can be represented by a truth table, a circuit diagram, or a Boolean
expression. The truth table of a function is unique. A function, however, can be represented by
many equivalent Boolean expressions. A Boolean algebraic manipulation or a K-map may
simplify a Boolean expression. The algebraic manipulation method is often difficult because it
lacks specific rules for predicting each succeeding step in the manipulative process. The K- map
method provides a simple, straightforward procedure for minimizing Boolean expressions.
The idea behind a K-map (Karnaugh, 1953) is to draw an expression's truth table as a matrix in
such a way that each row and each column of the matrix puts minterms that differ in the value of
a single variable adjacent to each other. Then, by grouping adjacent cells of the matrix, you can
identify product terms that eliminate all complemented literals, resulting in a minimized version
of the expression.
3.3.1 K-maps Format
cd
ab 00 01 11 10
00
01
11
10
F(a,b,c,d) =
This will save you the trouble of creating a table format on your own. You can simply copy and
paste the template into your own document and fill it out. It’s easy to change the terms from
a,b,c,d to w,x,y,z or whatever you need. Below are some grouping examples.
        cd
ab       00   01   11   10
00        1    0    0    1
01        0    0    0    0
11        1    1    1    1
10        1    0    0    1

F(a,b,c,d) = ab + b’d’
Where the groups don’t overlap, it’s simplest to just use shading. Too bad it’s not always that
simple.
        cd
ab       00   01   11   10
00        1    0    0    1
01        0    0    0    0
11        1    1    1    1
10        1    1    1    1

F(a,b,c,d) = ??
When the groups overlap, you’ll need to use a painting program. Paste the map into it and use
the program’s shape tool to draw a box around the groups. Paint is quick and dirty and gets the
job done, but it won’t let you paste the map directly; you’ll have to take a screen shot and paste
that.
In an unused part of the image, you can draw a rectangle and break it up into corners, like the red
one above. Then you can paste those corners to capture groups going off the edges. Use
Transparent Selection to get just the rectangle parts and paste them without obscuring the map
you’re pasting them over:
F(a,b,c,d) = a + b’d’
As you can see, then all you have to do is select the part of the image you want to put in the
document, Ctrl-C to copy it, and Ctrl-V to paste it into your document.
3.3.2 Looping
Looping out logical adjacencies is a graphical alternative to algebraic calculations.
Unit distance code (Gray code.)
For two bits, the Gray code is:
00 01 11 10
Only one bit changes as you go from left to right. This code preserves logical adjacencies.
The K-map method is to loop out groups of 2n logically adjacent minterms. Each looped out
group corresponds to a product term in a minimal SOP expression.
1. Loop out single 1s (n=0) which have no logical adjacencies.
2. Loop out all pairs of 1s (n=1) which cannot be included in a larger group.
3. Loop out all quads of 1s (n=2) which cannot be included in a larger group.
4. Etc.
Example 3.14:
Moving left to right or up to down in the K-map changes only one digit in the minterm code.
Note the wrap-around at the ends: because of logical adjacency, the top and bottom are joined,
and the left and right are joined.
f = A̅B̅C + A̅BC + A̅BC̅ + ABC̅
  = A̅C(B̅ + B) + A̅B(C + C̅) + BC̅(A̅ + A)    (the minterms A̅BC and A̅BC̅ are each used in two groups)
  = A̅C + A̅B + BC̅
Therefore the minimal SOP representation is f = A̅C + A̅B + BC̅
Example 3.16: Minimize the following boolean function-
F(A, B, C, D) = Σm(0, 1, 2, 5, 7, 8, 9, 10, 13, 15)
Solution:
• Since the given boolean expression has 4 variables, so we draw a 4 x 4 K Map.
• We fill the cells of K Map in accordance with the given boolean function.
• Then, we form the groups in accordance with the above rules.
Then, we have-
Now,
F(A,B,C,D)= (A̅B+AB)(C̅D+CD)+(A̅B̅+A̅B+AB+AB̅)C̅D+(A̅B̅+AB̅)(C̅D̅+CD̅)
= BD+C̅D+B̅D̅
Thus, minimized boolean expression is-
F(A, B, C, D) = BD + C̅D + B̅D̅
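The minimized expression can be checked against the original minterm list with a short Python sketch (our own illustration; f_minimized is an assumed helper name):

from itertools import product

minterms = {0, 1, 2, 5, 7, 8, 9, 10, 13, 15}

def f_minimized(a, b, c, d):
    # F = BD + C'D + B'D' from the K-map grouping above
    # (the variable a does not appear in the minimized form).
    return (b and d) or ((not c) and d) or ((not b) and (not d))

check = all(bool(f_minimized(a, b, c, d)) == ((a * 8 + b * 4 + c * 2 + d) in minterms)
            for a, b, c, d in product((0, 1), repeat=4))
print(check)   # True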
Now,
F(A, B, C, D)= (A̅B̅ + A̅B + AB + AB̅)(C̅D + CD) + (A̅B̅ + AB̅)(C̅D̅ + C̅D)
= D + B̅C̅
Thus, minimized boolean expression is-
F(A, B, C, D) = B̅C̅ + D
Example 3.18:Minimize the following boolean function-
F(A, B, C, D) = Σm(1, 3, 4, 6, 8, 9, 11, 13, 15) + Σd(0, 2, 14)
Solution-
• Since the given boolean expression has 4 variables, so we draw a 4 x 4 K Map.
• We fill the cells of K Map in accordance with the given boolean function.
• Then, we form the groups in accordance with the above rules.
Then, we have
Now,
F(A, B, C, D) = (AB + AB̅)(C̅D + CD) + (A̅B̅ + AB̅)(C̅D + CD) + (A̅B̅ + AB̅)(C̅D̅ + C̅D) + (A̅B̅ + A̅B)(C̅D̅ + CD̅)
= AD + B̅D + B̅C̅ + A̅D̅
Thus, the minimized Boolean expression is F(A, B, C, D) = AD + B̅D + B̅C̅ + A̅D̅.
A product-of-sums is a Boolean expression containing OR terms, called sum terms. Each term may contain any number of literals.
3.5 NAND AND NOR IMPLEMENTATION
Any logic function can be implemented using NAND gates. To achieve this, first the logic
function has to be written in Sum of Products (SOP) form. Once the logic function is converted to SOP, it is very easy to implement using NAND gates. In other words, any logic circuit with
AND gates in first level and OR gates in second level can be converted into a NAND-NAND
gate circuit.
Consider the following SOP expression
F = W.X.Y + X.Y.Z + Y.Z.W
The above expression can be implemented with three AND gates in first stage and one OR gate
in second stage as shown in figure.
If bubbles are introduced at the AND gate outputs and at the OR gate inputs (the corresponding step for NOR gates uses the OR gate outputs and the AND gate inputs), the above circuit becomes as shown in the figure.
Now replace the OR gate with input bubbles by a NAND gate. We now have a circuit which is fully implemented with just NAND gates.
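The equivalence of the AND-OR circuit and the NAND-NAND circuit can be verified exhaustively with a short Python sketch (an illustration only; the function names are ours):

from itertools import product

def nand(*inputs):
    # A NAND gate outputs 0 only when every input is 1.
    return 0 if all(inputs) else 1

def f_and_or(w, x, y, z):
    # Two-level AND-OR form: F = W.X.Y + X.Y.Z + Y.Z.W
    return 1 if (w and x and y) or (x and y and z) or (y and z and w) else 0

def f_nand_nand(w, x, y, z):
    # The same function using only NAND gates: first-level NANDs replace the
    # AND gates and the final NAND replaces the OR gate.
    return nand(nand(w, x, y), nand(x, y, z), nand(y, z, w))

print(all(f_and_or(*v) == f_nand_nand(*v) for v in product((0, 1), repeat=4)))   # True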
Any logic function can be implemented using NOR gates. To achieve this, first the logic function
has to be written in Product of Sum (POS) form. Once it is converted to POS, then it's very easy
to implement using NOR gate. In other words any logic circuit with OR gates in first level and
AND gates in second level can be converted into a NOR-NOR gate circuit.
Consider the following POS expression
F = (X+Y) . (Y+Z)
The above expression can be implemented with two OR gates in the first stage and one AND gate in the second stage as shown in the figure.
If bubbles are introduced at the outputs of the OR gates and the inputs of the AND gate, the above circuit becomes as shown in the figure.
Now replace the AND gate with input bubbles by a NOR gate. We now have a circuit which is fully implemented with just NOR gates.
Learning Activity
individual members of the universal set, but all possible subsets of the universal set.
Properties of sets
A set is a collection of objects, called members or elements. The members of a set can be
physical objects, such as people, stars, or roses, or they can be abstractions such as numbers
or even other sets. A set is referred to as the universal set (usually called I) if it contains all
the elements under consideration. A set S not equal to I is called a proper subset of I if every element of S is contained in I. This is written S ⊂ I and read “S is contained in I.”
If S equals I, then S is called an improper subset of I; that is, I is an improper subset of itself (note that two sets are equal if and only if they both contain exactly the same elements). The special symbol Ø is given to the set with no elements, called the empty set or null set. The null set is a subset of every set.
When dealing with sets there are three important operations. Two of these operations are
binary (that is, they involve combining sets two at a time), and the third involves only one set
at a time. The two binary operations are union and intersection. The third operation is
complementation. The union of two sets S and T is the collection of those members that
belong to either S or T or both.
The intersection of the sets S and T is the collection of those members that belong to both S
and T.
The complement of a subset, S, is that part of I not contained in S, and is written S’.
Properties of Boolean algebra
The properties of Boolean algebra can be summarized in four basic rules.
(1) Both binary operations have the property of commutativity, that is, order doesn’t matter.
S∩T= T∩S, and S∪T = T∪S.
(2) Each binary operation has an identity element associated with it. The universal set is the
identity element for the operation of intersection, and the null set is the identity element for
the operation of union.
S∩I = S, and S ∪Ø = S.
(3) Each operation is distributive over the other.
S ∪ (T ∩ V) = (S ∪ T) ∩ (S ∪ V), and S ∩ (T ∪ V) = (S ∩ T) ∪ (S ∩ V).
This differs from the algebra of real numbers, for which multiplication is distributive over
addition, a(b + c) = ab + ac, but addition is not distributive over multiplication, a + (bc) not
equal (a + b)(a + c).
(4) Each element has associated with it a second element (its complement), such that the union or
intersection of the two results in the identity element of the other operation.
A ∪ A′ = I, and A ∩ A′ = Ø.
This also differs from the algebra of real numbers. Each real number has two others
associated with it, such that its sum with one of them is the identity element for addition, and
its product with the other is the identity element for multiplication. That is, a + (-a) = 0, and
a(1/a) = 1.
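As a small illustration (not part of the original text), Python's built-in set type can be used to spot-check these four properties on a concrete universal set; the sets chosen below are arbitrary examples:

# Spot-check of the four Boolean-algebra properties using Python sets.
I = set(range(10))          # universal set for this example
EMPTY = set()               # the null set
S, T, V = {1, 2, 3}, {3, 4, 5}, {5, 6}

def complement(A):
    # Complement relative to the universal set I
    return I - A

# (1) Commutativity
assert S & T == T & S and S | T == T | S
# (2) Identity elements
assert S & I == S and S | EMPTY == S
# (3) Each operation distributes over the other
assert S | (T & V) == (S | T) & (S | V)
assert S & (T | V) == (S & T) | (S & V)
# (4) Complements
assert S | complement(S) == I and S & complement(S) == EMPTY

print("All four properties hold for this example")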
Questions:
1. Discuss the history of Boolean Algebra.
1. De Morgan's laws are a pair of transformation rules that are both valid
rules of inference.
2. A logic gate is an idealized model of computation or physical electronic
device implementing a Boolean function, a logical operation performed on
one or more binary inputs that produces a single binary output.
SUMMARY
•Boolean algebra is a branch of mathematics that deals with operations on logical values
with binary variables.
• A Boolean expression is an expression used in programming languages that produces a
Boolean value when evaluated. A Boolean value is either true or false.
• A mathematician named DeMorgan developed a pair of important rules regarding group
complementation in Boolean algebra.
• The K-map is a table-like representation, but it gives more information than a truth table. We fill
the grid of the K-map with 0's and 1's and then solve it by making groups.
• We get four Boolean product terms by combining two variables x and y with the logical
AND operation. These Boolean product terms are called minterms or standard product
terms. The minterms are x'y', x'y, xy' and xy.
• Any logic circuit with AND gates in first level and OR gates in second level can be
converted into a NAND-NAND gate circuit.
KEYWORDS
Boolean Algebra- Boolean algebra is a division of mathematics that deals with operations on
logical values and incorporates binary variables.
Identity-A statement true for all possible values of its variable or variables.
Truth Table- A truth table is a mathematical table used in logic specifically in connection with
Boolean algebra, boolean functions, and propositional calculus.
SOP - A way of representing boolean expressions as sum of product terms.
POS- A way of representing boolean expressions as product of sum terms.
Minterm- A minterm is a Boolean expression resulting in 1 for the output of a single cell,
and 0s for all other cells in a Karnaugh map, or truth table. If a minterm has a single 1 and the
remaining cells as 0s, it would appear to cover a minimum area of 1s.
Maxterm- A maxterm is a Boolean expression resulting in a 0 for the output of a single cell
expression, and 1s for all other cells in the Karnaugh map, or truth table. For example, the maxterm
(A+B+C), a single sum term, appears as a single 0 in a map that is otherwise all 1s.
K-Map- The K-map is a systematic way of simplifying Boolean expressions. With the help of
the K-map method, we can find the simplest POS and SOP expression, which is known as the
minimum expression. The K-map provides a cookbook for simplification.
FURTHER READING
1. Mano, M. Morris. 2007. "Digital Design". Englewood Cliffs, N.J.: Prentice-Hall.
2. Karim, Mohammad A., and Xinghao Chen. 2007. "Digital Design: Basic Concepts and Principles".
3. Harris, David. 2012. "Digital Design and Computer Architecture". Elsevier.
4. Mano, Morris. 2012. "Logic and Computer Design Fundamentals". Cram101.
CONTENTS
Learning Objectives
Learning Outcomes
Overview
4.1 Concept of Combinational Circuits
4.2 Adder
4.2.1 Half Adder
4.2.2 Full Adder
4.2.3 n-Bit Array Adder
4.3 Subtractor
4.3.1 Half Subtractor
4.3.2 Full Subtractor
4.3.3 n-Bit Array Subtractor
4.4 ALU Design
4.4.1 Designing a Bit-Slice ALU
4.4.2 Full 8-bit ALU Circuit
4.5 Decoder
4.5.1 ENABLE Inputs
4.6 Encoder
4.6.1 Binary Encoder
4.6.2 Priority Encoder
4.7 Multiplexer
4.7.1 2-to-1 MUX
4.7.2 4-to-1 MUX
4.7.3 4-to-2 MUX
4.8 Demultiplexer
4.8.1 1-to-4 DEMUX
Summary
Keywords
Self-Assessment Questions
Further Reading
LEARNING OBJECTIVES
After studying this chapter, you should be able to:
• Illustrate the concept of combinational circuit
• Gain knowledge about the Adder and Subtractor circuits
• Design and implement various types of decoder and encoder circuits
• Explain ALU Design
• Understand the operation of multiplexers
LEARNING OUTCOMES
Upon completion of the chapter, students are able to understand :
• Gain knowledge about the combinational circuits
• Analyze the block diagram of the adder and subtractor
• The operation of decoder and encoder logic and use them in different circuit application
• The concept of encoder and multiplexer
• How priority encoder works.
OVERVIEW
In the previous chapter, we examined the concept of Boolean algebra. We explained its various
postulates, laws, and theorems, and introduced the K-map, which is used to minimize Boolean
functions. Standard forms and NAND-NOR implementation were also discussed.
In this chapter, you will learn about combinational (combinatorial) circuits, which realize Boolean
functions and deal with digitized signals, usually denoted by 0s and 1s. The behaviour of a
combinational circuit is memoryless. This means that given a stimulus at the input of a
combinational circuit, a response appears at the output after some propagation delay, but the
response is not stored or fed back. In other words, the output depends solely on its most recent
inputs and is independent of the circuit's history. You will examine various combinational circuits
such as the adder, decoder, and encoder, and study the concept of ALU design.
In this chapter, you will use the knowledge gained from the previous chapter to study various
combinational circuits, which are important for the understanding of digital computers and
systems.
4.1 CONCEPT OF COMBINATIONAL CIRCUITS
Combinational logic circuits are memoryless digital logic circuits whose output at any instant of
time depends only on the combination of their inputs. A combinational circuit consists of logic
gates whose outputs are determined directly from the present combination of inputs, without
regard to previous inputs.
In a combinational circuit, we combine different gates such as encoders, decoders, multiplexers,
etc. Combinational logic circuits are made of basic gates (AND, OR, NOT) or universal
gates (NAND, NOR), which are connected or combined together to make more complicated
switching circuits. These logic gates are the elementary units of combinational logic circuits.
Unlike sequential logic circuits, whose outputs depend on both their current inputs and their
previous output state, giving them some form of memory, the outputs of combinational logic
circuits are determined only by the logical function of their present input state, logic "0" or logic
"1", at any given instant in time. The result is that combinational logic circuits have no feedback,
and any changes to the signals applied to their inputs immediately take effect at the output. In
other words, in a combinational logic circuit the output depends at all times on the combination of
its inputs. Consequently, a combinational circuit is memoryless. So if one of its inputs changes
state, from 0 to 1 or from 1 to 0, so too will the resulting output, since combinational logic circuits
have no "memory", "timing" or "feedback loops" in their design.
Example 4.1: An example of a combinational circuit is a decoder, which converts the binary code
present at its input into a number of different output lines, one at a time, producing an equivalent
decimal code at its output. In these circuits, the outputs at any instant of time depend only on the
inputs present at that instant.
Some of the characteristics of combinational circuits are the following.
• They can have m outputs and n inputs.
• The output of a combinational circuit at any time depends only on the levels present at the
input terminals.
• These circuits do not use any memory. The previous state of the inputs does not have any
effect on the present state of the circuit.
• For the design of combinational digital circuits, universal gates or basic gates are used.
Example 4.2: Examples of combinational digital circuits are the half adder, full adder, half
subtractor, full subtractor, code converter, decoder, multiplexer, encoder, ROM, etc.
You can specify combinational circuits through Boolean logic expressions, structural
descriptions, or truth tables. Combinational circuits implemented with fixed logic tend to be more
expensive in terms of hardware cost and design effort. However, they are often denser, faster and
use less power. This is why they are appropriate for high-volume
production or high-speed circuits. Implementations that use programmable logic circuits or
memory devices are economical for rapid prototyping and low-volume production. However,
they may not give the best power consumption, density or performance.
The process of selecting the least expensive implementation among the feasible and equivalent
implementations for the target technology is called optimization. For small combinational
circuits, it may be practical to perform manual optimization based on rewriting or manipulating
logic expressions into one of several equivalent forms. In most practical cases, however,
automatic hardware synthesis tools are used that have built-in optimization capabilities. These
automated optimizations are performed using a combination of heuristic and algorithmic
transformations. Verification means the process of confirming that the implemented circuit
behaves as originally envisioned or specified.
4.2 ADDER
An adder is a digital logic circuit in electronics that is extensively used for the addition of
numbers. In many computers and other types of processors, adders are even used to calculate
addresses and related activities and calculate table indices in the ALU and even utilized in other
parts of the processors. These can be built for many numerical representations like excess-3 or
binary coded decimal. Adders are basically classified into two types: Half Adder and Full Adder.
4.2.1 Half Adder
The half adder circuit has two inputs: A and B, which add two input digits and generates a carry
and a sum. A half adder is a logical circuit that performs an addition operation on two binary
digits. The half adder produces a sum and a carry value which are both binary digits.
Half Adder Truth Table with Carry-Out
B A | Sum Carry
0 0 |  0    0
0 1 |  1    0
1 0 |  1    0
1 1 |  0    1
Figure 4.2: Symbol and Truth Table
From the truth table of the half adder we can see that the SUM (S) output is the result of
the Exclusive-OR gate and the Carry-out (Cout) is the result of the AND gate. Then the Boolean
expressions for a half adder are as follows.
For the SUM bit: S = A XOR B = A ⊕ B
For the CARRY bit: Cout = A AND B = A.B
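As a small illustrative sketch (not part of the original material), these two half adder equations can be written directly in Python and checked against the truth table above:

# Half adder: SUM is the XOR of the inputs, CARRY is the AND of the inputs.
def half_adder(a, b):
    total = a ^ b        # SUM bit (Exclusive-OR)
    carry = a & b        # CARRY-out bit (AND)
    return total, carry

# Reproduce the half adder truth table.
for b in (0, 1):
    for a in (0, 1):
        s, c = half_adder(a, b)
        print(f"B={b} A={a} -> Sum={s} Carry={c}")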
4.2.2 Full Adder
The full adder has three inputs: the two data bits A and B, plus a Carry-in from a previous stage.
A Carry-in is a possible carry from a less significant digit, while a Carry-out represents a carry to
a more significant digit.
In many ways, the full adder can be thought of as two half adders connected together, with the
first half adder passing its carry to the second half adder as shown.
As the full adder circuit above is basically two half adders connected together, the truth table for
the full adder includes an additional column to take into account the Carry-in, CIN input as well
as the summed output, S and the Carry-out, COUT bit.
Truth Table
C-in B A | Sum C-out
  0  0 0 |  0    0
  0  0 1 |  1    0
  0  1 0 |  1    0
  0  1 1 |  0    1
  1  0 0 |  1    0
  1  0 1 |  0    1
  1  1 0 |  0    1
  1  1 1 |  1    1
Figure 4.5: Full Adder Symbol and Truth Table with Carry
A “ripple carry adder” is simply “n“, 1-bit full adders cascaded together with each full adder
representing a single weighted column in a long binary addition. It is called a ripple carry adder
because the carry signals produce a “ripple” effect through the binary adder from right to left,
(LSB to MSB).
For example, suppose we want to “add” together two 4-bit numbers, the two outputs of the first
full adder will provide the first place digit sum (S) of the addition plus a carry-out bit that acts as
the carry-in digit of the next binary adder.
The second binary adder in the chain also produces a summed output (the 2nd bit) plus another
carry-out bit and we can keep adding more full adders to the combination to add larger numbers,
linking the carry bit output from the first full binary adder to the next full adder, and so forth.
Example 4.3: An example of a 4-bit adder is given below.
A 4-bit Ripple Carry Adder
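To make the carry chaining concrete, here is a brief Python sketch (an illustration only; the full_adder helper is defined just for this example) of a 4-bit ripple carry adder in which the carry-out of each stage feeds the carry-in of the next:

# Full adder: Sum = A xor B xor Cin, Cout = (A and B) or (Cin and (A xor B)).
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    # Add two equal-length bit lists (LSB first); the carry ripples upward.
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry      # the final carry is the Carry-out of the MSB stage

# Example: 0110 (6) + 0011 (3) = 1001 (9), bits given LSB first.
print(ripple_carry_add([0, 1, 1, 0], [1, 1, 0, 0]))   # -> ([1, 0, 0, 1], 0)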
One main disadvantage of “cascading” together 1-bit binary adders to add large binary numbers
is that if inputs A and B change, the sum at its output will not be valid until any carry-input has
“rippled” through every full adder in the chain, because the MSB (most significant bit) of the sum
has to wait for any changes from the carry input of the LSB (least significant bit). Consequently,
there will be a finite delay before the output of the adder responds to any change in its inputs,
resulting in an accumulated delay.
When the number of bits being added is not too large, for example 4 or 8 bits, or the summing
speed of the adder is not critical, this delay may not be important. However, when the word size
is larger, for example the 32 or 64 bits used in multi-bit adders, or summation is required at a
very high clock speed, this delay may become prohibitively large, with the addition not being
completed correctly within one clock cycle.
This unwanted delay time is called propagation delay. Another problem, called "overflow",
occurs when an n-bit adder adds two parallel numbers together whose sum is greater than or
equal to 2^n.
One solution is to generate the carry-input signals directly from the A and B inputs rather than
using the ripple arrangement above. This then produces another type of binary adder circuit
called a Carry Look Ahead Binary Adder where the speed of the parallel adder can be greatly
improved using carry-look ahead logic.
The advantage of carry look ahead adders is that the length of time a carry look ahead adder
needs in order to produce the correct SUM is independent of the number of data bits used in the
operation, unlike the cycle time a parallel ripple adder needs to complete the SUM which is a
function of the total number of bits in the addend.
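The idea behind carry look ahead can be illustrated with a short sketch (an illustration only, not a description of any particular IC): each bit position produces a generate signal G = A.B and a propagate signal P = A xor B, and every carry is then computed from these signals and the initial carry-in rather than waiting for a ripple.

# Carry-lookahead carries: C[i+1] = G[i] OR (P[i] AND C[i]), so every carry
# depends only on the generate/propagate signals and the initial carry-in.
def carry_lookahead_add(a_bits, b_bits, cin=0):
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate signals
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate signals
    carries = [cin]
    for i in range(len(a_bits)):
        # In hardware this expression is expanded and computed in parallel for every bit.
        carries.append(g[i] | (p[i] & carries[i]))
    sums = [p[i] ^ carries[i] for i in range(len(a_bits))]
    return sums, carries[-1]

# Same example as before: 6 + 3 = 9.
print(carry_lookahead_add([0, 1, 1, 0], [1, 1, 0, 0]))   # -> ([1, 0, 0, 1], 0)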
Example 4.4: 4-bit full adder circuits with carry look ahead features are available as standard IC
packages in the form of the TTL 4-bit binary adder 74LS83 or the 74LS283 and the CMOS 4008
which can add together two 4-bit binary numbers and generate a SUM and a CARRY output as
shown.
4.3 SUBTRACTOR
The Subtractor is another type of combinational arithmetic circuit that produces an output which
is the subtraction of two binary numbers.
As their name implies, a Binary Subtractor is a decision making circuit that subtracts two binary
numbers from each other, for example, X – Y to find the resulting difference between the two
numbers.
Unlike the Binary Adder which produces a SUM and a CARRY bit when two binary numbers
are added together, the binary subtractor produces a DIFFERENCE, D by using a BORROW
bit, B from the previous column. Then obviously, the operation of subtraction is the opposite to
that of addition.
We learnt from our maths lessons at school that the minus sign, “–” is used for a subtraction
calculation, and when one number is subtracted from another, a borrow is required if the
subtrahend is greater than the minuend. Consider the simple subtraction of the two denary (base
10) numbers below.
We can not directly subtract 8 from 3 in the first column as 8 is greater than 3, so we have to
borrow a 10, the base number, from the next column and add it to the minuend to produce 13
minus 8. This "borrowed" 10 is then returned to the subtrahend of the next column once the
difference is found. Simple school maths: borrow a 10 if needed, find the difference and return
the borrow.
The subtraction of one binary number from another is exactly the same idea as that for
subtracting two decimal numbers but as the binary number system is a Base-2 numbering system
which uses “0” and “1” as its two independent digits, large binary numbers which are to be
subtracted from each other are therefore represented in terms of “0’s” and “1’s”.
4.3.1 Half Subtractor
Half Subtraction can take many forms but the rules for subtraction are the same whichever
process you use. As binary notation only has two digits, subtracting a “0” from a “0” or a “1”
leaves the result unchanged as 0-0 = 0 and 1-0 = 1. Subtracting a “1” from a “1” results in a “0”,
but subtracting a “1” from a “0” requires a borrow. In other words 0 – 1 requires a borrow.
For the simple 1-bit subtraction problem above, if the borrow bit is ignored the result of their
binary subtraction resembles that of an Exclusive-OR gate. To prevent any confusion in this
section between a binary subtractor input labelled B and the resulting borrow-bit output, also
labelled B, we will label the two input bits as X for the minuend and Y for the subtrahend. Then
the resulting truth table for the difference between the two input bits of a single binary subtractor
is given as:
2-input Exclusive-OR Gate
Y X | Q
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
Figure 4.9: Exclusive-OR Gate
As with the Binary Adder, the difference between the two digits is only a “1” when these two
inputs are not equal as given by the Ex-OR expression. However, we need an additional output to
produce the borrow bit when input X = 0 and Y = 1. Unfortunately there are no standard logic
gates that will produce an output for this particular combination of X and Y inputs.
But we know that an AND Gate produces an output “1” when both of its inputs X and Y are “1”
(HIGH) so if we use an inverter or NOT Gate to complement the input X before it is fed to
the AND gate, we can produce the required borrow output when X = 0 and Y = 1 as shown
below.
Then by combining the Exclusive-OR gate with the NOT-AND combination results in a simple
digital binary subtractor circuit known commonly as the Half Subtractor as shown.
A Half Subtractor Circuit
A half subtractor is a logical circuit that performs a subtraction operation on two binary digits.
The half subtractor produces a difference and a borrow bit for the next stage.
From the truth table of the half subtractor we can see that the DIFFERENCE (D) output is the
result of the Exclusive-OR gate and the Borrow-out (Bout) is the result of the NOT-
AND combination. Then the Boolean expression for a half subtractor is as follows.
For the DIFFERENCE bit:
D = X XOR Y = X ⊕ Y
For the BORROW bit
B = X̅ AND Y = X̅.Y
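A short Python sketch of these two half subtractor equations (an illustration only, not part of the original text):

# Half subtractor: DIFFERENCE is X xor Y, BORROW is (not X) and Y.
def half_subtractor(x, y):
    difference = x ^ y          # D = X xor Y
    borrow = (1 - x) & y        # B = (not X) and Y
    return difference, borrow

for y in (0, 1):
    for x in (0, 1):
        d, b = half_subtractor(x, y)
        print(f"X={x} Y={y} -> Diff={d} Borrow={b}")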
If we compare the Boolean expressions of the half subtractor with a half adder, we can see that
the two expressions for the SUM (adder) and DIFFERENCE (subtractor) are exactly the same
and so they should be, because of the Exclusive-OR gate function. The Boolean expression for
the binary subtractor BORROW is also very similar to that for the adder's CARRY. Then all
that is needed to convert a half adder to a half subtractor is the inversion of the minuend input X.
One major disadvantage of the Half Subtractor circuit when used as a binary subtractor, is that
there is no provision for a “Borrow-in” from the previous circuit when subtracting multiple data
bits from each other. Then we need to produce what is called a “full binary subtractor” circuit to
take into account this borrow-in input from a previous circuit.
4.3.2 Full Subtractor
The main difference between the Full Subtractor and the previous Half Subtractor circuit is that a
full subtractor has three inputs. The two single bit data inputs X (minuend) and Y (subtrahend)
the same as before plus an additional Borrow-in (B-in) input to receive the borrow generated by
the subtraction process from a previous stage as shown below.
Then the combinational circuit of a “full subtractor” performs the operation of subtraction on
three binary bits producing outputs for the difference D and borrow B-out. Just like the binary
adder circuit, the full subtractor can also be thought of as two half subtractors connected
together, with the first half subtractor passing its borrow to the second half subtractor as follows.
Full Subtractor Logic Diagram
As the full subtractor circuit above represents two half subtractors cascaded together, the truth
table for the full subtractor has eight different input combinations, since there are three input
variables: the two data bits and the Borrow-in, BIN, input. It also includes the difference output,
D, and the Borrow-out, BOUT, bit.
Truth Table
B-in Y X | Diff. B-out
  0  0 0 |  0     0
  0  0 1 |  1     0
  0  1 0 |  1     1
  0  1 1 |  0     0
  1  0 0 |  1     1
  1  0 1 |  0     0
  1  1 0 |  0     1
  1  1 1 |  1     1
Figure 4.15: Full Subtractor Truth Table
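The behaviour in the table above can be reproduced with a brief Python sketch (an illustration only), built from two half subtractors as described:

# Full subtractor built from two half subtractors plus an OR gate for the borrow.
def half_subtractor(x, y):
    return x ^ y, (1 - x) & y

def full_subtractor(x, y, b_in):
    d1, b1 = half_subtractor(x, y)        # first half subtractor: X - Y
    diff, b2 = half_subtractor(d1, b_in)  # second: subtract the Borrow-in
    return diff, b1 | b2                  # Borrow-out is the OR of both borrows

# Reproduce the full subtractor truth table.
for b_in in (0, 1):
    for y in (0, 1):
        for x in (0, 1):
            d, b_out = full_subtractor(x, y, b_in)
            print(f"B-in={b_in} Y={y} X={x} -> Diff={d} B-out={b_out}")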
4.4 ALU DESIGN
An ALU could, in principle, support a large number of arithmetic and logical operations.
However, not all of these operations are required to be implemented in the ALU. We would like to
implement the ALU with a set of “minimal” operations – those that we cannot do without,
which are extremely common and can be performed very fast, say, in a small number of gate
delays. All other operations will be performed by combining these basic operations.
Therefore, some operations, like multiplication and division, are too complicated to be done
here, and are usually relegated to a separate part of the processor. Others like X-OR, are not
common enough to justify implementing in the most important combinational circuit in the
computer. What we would like is a minimal set that can be combined in several ways to
implement all the operations we require.
Now, if you want to build an ALU which is as simple as possible, you will choose only
the following four operations on two binary numbers A and B:
• (A and B)
• (A or B)
• (A + B) – This is just the sum of A and B
• (A + (not B)) – This is the sum of A and the bit-wise inverse of B
As a result, all the other operations you would like to perform on two numbers can be
implemented by combinations of these four fundamental operations. You will return to this
point after you design the circuit to do these four operations.
4.4.1 Designing a Bit-Slice ALU
Normally, binary information in modern computer systems is denoted in 16, 32, or even
64-bit quantities. However, to make the presentation simpler, we will present most of the
concepts using 8-bit quantities. To build an 8-bit ALU, you are required to design a
combinational circuit that can take two 8-bit numbers, A and B, and return another 8-bit
number C, which is the result of one of the four basic operations.
In addition, it needs two control wires to choose which of the four operations are to be
performed. In the case of addition, you will input the Cin value and output the Cout value
from the Ripple-Carry Adder.
You will again use the divide and conquer method for developing the design. First,
consider the problem of designing, for example, ALU for 1-bit quantities, and then combine
these solutions into an 8-bit ALU (or 32-bit or 64-bit). The ALU design is called a “bit-
slice,” because you can assume taking an 8-bit ALU and slicing it into eight equal parts,
one for each bit position.
The 1-bit ALU design, built from these circuits, is presented in Figure 4.16.
Based on the control lines F0 and F1 given below, the appropriate function of the inputs is chosen:
F0 F1 | Out
0 0 | (A and B)
0 1 | (A or B)
1 0 | (A + B)
1 1 | (A + (not B))
This is a standard use of a multiplexer circuit. A number of input lines with various different
types of information come into the multiplexer, and the address lines choose which input to
route to the output.
4.4.2 Full 8-bit ALU Circuit
As with the Ripple-Carry Adder circuit, to make a wider ALU we simply join the appropriate
carry lines to propagate the carry between the stages of the adder circuits. The first stage
will still have a Carry-In, and the last stage will have a Carry-Out.
You can see the eight-bit ALU in figure 4.17 and standard ALU icon in figure 4.18.
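The following Python sketch (an illustration only; the function-select encoding follows the F0 F1 table above) models one ALU bit-slice and then chains eight of them through their carry lines to form an 8-bit ALU:

# One ALU bit-slice: selects (A and B), (A or B), (A + B) or (A + (not B))
# according to the control lines F0 F1, as in the table above.
def alu_bit_slice(a, b, c_in, f0, f1):
    if (f0, f1) == (0, 0):
        return a & b, 0
    if (f0, f1) == (0, 1):
        return a | b, 0
    if (f0, f1) == (1, 1):
        b = 1 - b                      # use the inverted B input for A + (not B)
    s = a ^ b ^ c_in                   # full-adder sum for the arithmetic operations
    c_out = (a & b) | (c_in & (a ^ b))
    return s, c_out

def alu_8bit(a_bits, b_bits, f0, f1, c_in=0):
    # Eight bit-slices chained through their carry lines (bits LSB first).
    out, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        bit, carry = alu_bit_slice(a, b, carry, f0, f1)
        out.append(bit)
    return out, carry

# Example: add 6 and 3 (bits LSB first) with F0 F1 = 1 0, i.e. (A + B).
print(alu_8bit([0, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0], 1, 0))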
4.5 DECODER
A decoder is a logic circuit that accepts a set of inputs that represents a binary number and
activates only the output that corresponds to that input number. The diagram for a general
decoder is shown in Figure 4.19 with N inputs and M outputs. Since each of the N inputs can be
0 or 1, there are 2^N possible input combinations or codes. For each of these input combinations,
only one of the M outputs will be active (HIGH); all the other outputs are LOW. Many decoders
are designed to produce active-LOW outputs, where only the selected output is LOW while all
others are HIGH. This would be indicated by the presence of small circles on the output lines in
the decoder diagram.
Some decoders do not utilize all of the 2N possible input codes but only certain ones. For
example, a BCD-to-decimal decoder has a 4-bit input code and ten output lines that correspond to
the ten BCD code groups 0000 through 1001. If any of the unused codes are applied to the input
none of the outputs will be activated.
Figure 4.20 shows the circuitry for a decoder with three inputs and 2^3 = 8 outputs. It uses all
AND gates, and so the outputs are active-HIGH. Note that for a given input code, the only output
which is active (HIGH) is the one corresponding to the decimal equivalent of the binary input
code (e.g., output O6 goes HIGH only when CBA = 110₂ = 6₁₀).
This decoder can be referred to in several ways. It can be called a 3-line to 8-line decoder,
because it has three input lines and eight output lines. It could also be called a binary-to-octal
decoder or converter because it takes a 3-bit binary input code and activates the one of the eight
(octal) outputs corresponding to that code. It is also referred to as a 1-of-8 decoder, because only
1 of the 8 outputs is activated at one time.
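A minimal Python sketch (an illustration only) of such a 3-line to 8-line, active-HIGH decoder:

# 3-to-8 decoder: for the input code CBA, exactly one of the eight
# active-HIGH outputs O0..O7 is 1; all others are 0.
def decoder_3to8(c, b, a):
    index = (c << 2) | (b << 1) | a          # decimal equivalent of CBA
    return [1 if i == index else 0 for i in range(8)]

print(decoder_3to8(1, 1, 0))   # CBA = 110 -> only output O6 is HIGH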
4.5.1 ENABLE Inputs
Some decoders have one or more ENABLE inputs that are used to control the operation of the
decoder. For example, refer to the decoder in Figure 4.22 and visualize having a common
ENABLE line connected to a fourth input of each gate.
The small circles on the output lines of the decoder in the figure indicate that its outputs are
active-LOW. Another indication is the labelling of the outputs with an overbar; the overbar
indicates active-LOW outputs.
The input code is applied at A2, A1, and A0, where A2 is the MSB. With three inputs and eight
outputs, this is a 3-to-8 decoder or, equivalently, a 1-of-8 decoder.
Inputs E1, E2, and E3 are separate enable inputs that are combined in an AND gate. In order to
enable the output NAND gates to respond to the input code at A2, A1, A0, this AND gate output
has to be HIGH. This will occur only when E1 = E2 = 0 and E3 = 1, since E1 and E2 are active-LOW
and E3 is active-HIGH.
is active HIGH. If one or more of the enable inputs is in its inactive state, the AND output will be
LOW, which will force all NAND outputs to their inactive HIGH state regardless of the input
code. This operation is summarized in the truth table Figure 4.24. Recall that x represents the
“don’t care” condition.
4.6 ENCODER
An encoder takes its data inputs one at a time and converts them into an equivalent binary
code at its output.
Unlike a multiplexer that selects one individual data input line and then sends that data to a
single output line or switch, Digital Encoder more commonly called a Binary Encoder takes ALL
its data inputs one at a time and then converts them into a single encoded output. So we can say
that a binary encoder, is a multi-input combinational logic circuit that converts the logic level “1”
data at its inputs into an equivalent binary code at its output.
Generally, digital encoders produce outputs of 2-bit, 3-bit or 4-bit codes depending upon the
number of data input lines. An "n-bit" binary encoder has 2^n input lines and n output lines,
with common types that include 4-to-2, 8-to-3 and 16-to-4 line configurations.
The output lines of a digital encoder generate the binary equivalent of the input line whose value
is equal to “1” and are available to encode either a decimal or hexadecimal input pattern to
typically a binary or “B.C.D” (binary coded decimal) output code.
4.6.1 4-to-2 Bit Binary Encoder
One of the main disadvantages of standard digital encoders is that they can generate the wrong
output code when there is more than one input present at logic level "1". For example, if we
make inputs D1 and D2 HIGH at logic "1" at the same time, the resulting output is neither "01"
nor "10" but "11", which is an output binary number that is different from the actual input
present. Also, an output code of all logic "0"s can be generated either when all of its inputs are at
"0" or when input D0 alone is equal to one, making these two cases indistinguishable.
One simple way to overcome this problem is to “Prioritise” the level of each input pin. So if
there is more than one input at logic level “1” at the same time, the actual output code would
only correspond to the input with the highest designated priority. Then this type of digital
encoder is known commonly as a Priority Encoder or P-encoder for short.
4.6.2 Priority Encoder
The Priority Encoder solves the problems mentioned above by allocating a priority level to each
input. The priority encoders output corresponds to the currently active input which has the
highest priority. So when an input with a higher priority is present, all other inputs with a lower
priority will be ignored.
The priority encoder comes in many different forms with an example of an 8-input priority
encoder along with its truth table shown below.
Priority encoders are available in standard IC form and the TTL 74LS148 is an 8-to-3 bit priority
encoder which has eight active LOW (logic “0”) inputs and provides a 3-bit code of the highest
ranked input at its output.
Priority encoders output the highest order input first for example, if input lines “D2“, “D3” and
“D5” are applied simultaneously the output code would be for input “D5” (“101”) as this has the
highest order out of the 3 inputs. Once input “D5” had been removed the next highest output code
would be for input “D3” (“011”), and so on.
The truth table for a 8-to-3 bit priority encoder is given as:
Digital Inputs Binary Output
D7 D6 D5 D4 D3 D2 D1 D0 Q2 Q1 Q0
0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 X 0 0 1
0 0 0 0 0 1 X X 0 1 0
0 0 0 0 1 X X X 0 1 1
0 0 0 1 X X X X 1 0 0
0 0 1 X X X X X 1 0 1
0 1 X X X X X X 1 1 0
1 X X X X X X X 1 1 1
Figure 4.27: Truth Table
Where X equals "don't care", that is, it can be at a logic "0" level or at a logic "1" level.
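The behaviour in this truth table can be sketched in a few lines of Python (an illustration only): the highest-numbered active input determines the 3-bit output.

# 8-to-3 priority encoder: the highest-priority (highest-numbered) input
# at logic 1 determines the binary output Q2 Q1 Q0.
def priority_encoder_8to3(d):
    for i in range(7, -1, -1):       # scan from D7 (highest priority) down to D0
        if d[i]:
            return (i >> 2) & 1, (i >> 1) & 1, i & 1   # Q2, Q1, Q0
    return 0, 0, 0                   # no input active

# Inputs D2, D3 and D5 active simultaneously: D5 wins, so the output is 101.
d = [0, 0, 1, 1, 0, 1, 0, 0]         # list index = input number D0..D7
print(priority_encoder_8to3(d))      # -> (1, 0, 1)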
From this truth table, the Boolean expression for the encoder above with data inputs D0 to D7 and
outputs Q0, Q1, Q2 is given as:
Output Q0
Output Q1
Output Q2
Then the final Boolean expression for the priority encoder including the zero inputs is defined as:
Priority Encoder Output Expression
In practice these zero inputs would be ignored allowing the implementation of the final Boolean
expressions for the outputs of the 8-to-3 priority encoder. We can construct a simple encoder
from the expressions above using individual OR gates as follows.
Digital Encoder using Logic Gates
4.7 MULTIPLEXER
The rotary switch, also called a wafer switch because each layer of the switch is known as a wafer,
is a mechanical device whose input is selected by rotating a shaft. In other words, the rotary switch
is a manual switch that you can use to select individual data or signal lines simply by turning its
inputs "ON" or "OFF". So how can we select each data input automatically using a digital device?
In digital electronics, multiplexers are also known as data selectors because they can "select"
each input line. They are constructed from individual analogue switches encased in a single IC
package, as opposed to "mechanical" type selectors such as conventional switches and relays.
They are used as one method of reducing the number of logic gates required in a circuit design,
or when a single data line or data bus is required to carry two or more different digital signals,
for example through a single 8-channel multiplexer.
Generally, the selection of each input line in a multiplexer is controlled by an additional set of
inputs called control lines; according to the binary condition of these control inputs, either
"HIGH" or "LOW", the appropriate data input is connected directly to the output. Normally, a
multiplexer has 2^n data input lines and n "control" inputs used to select among them.
Note that multiplexers are different in operation to Encoders. Encoders are able to switch an n-bit
input pattern to multiple output lines that represent the binary coded (BCD) output equivalent of
the active input.
We can build a simple 2-line to 1-line (2-to-1) multiplexer from basic logic NAND gates as
shown.
4.7.1 2-to-1 Channel Multiplexer
The input A of this simple 2-1 line multiplexer circuit constructed from standard NAND gates
acts to control which input ( I0 or I1 ) gets passed to the output at Q.
From the truth table above, we can see that when the data select input, A is LOW at logic 0,
input I1 passes its data through the NAND gate multiplexer circuit to the output, while input I0 is
blocked. When the data select A is HIGH at logic 1, the reverse happens and now input I0 passes
data to the output Q while input I1 is blocked.
So by the application of either a logic “0” or a logic “1” at A we can select the appropriate input,
I0 or I1 with the circuit acting a bit like a single pole double throw (SPDT) switch.
As we only have one control line, (A), we can only switch 2^1 = 2 inputs, and in this simple
example the 2-input multiplexer connects one of two 1-bit sources to a common output,
producing a 2-to-1-line multiplexer. We can confirm this with the following Boolean expression,
which for our 2-input multiplexer circuit above can be simplified to:
We can increase the number of data inputs to be selected further simply by following the same
procedure, and larger multiplexer circuits can be implemented using smaller 2-to-1 multiplexers
as their basic building blocks. So for a 4-input multiplexer we would therefore require two data
select lines, as 4 inputs represent 2^2, giving a circuit with four inputs, I0, I1, I2, I3, and two
data select lines A and B as shown.
4.7.2 4-to-1 Channel Multiplexer
The Boolean expression for this 4-to-1 Multiplexer above with inputs A to D and data select
lines a, b is given as:
In this example at any one instant in time only ONE of the four analogue switches is closed,
connecting only one of the input lines A to D to the single output at Q. As to which switch is
closed depends upon the addressing input code on lines “a” and “b“.
So for this example to select input B to the output at Q, the binary input address would need to
be “a” = logic “1” and “b” = logic “0”. Thus we can show the selection of the data through the
multiplexer as a function of the data select bits as shown.
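A small Python sketch of a 4-to-1 multiplexer (an illustration only; the select-bit ordering follows the example above, where a = 1, b = 0 selects input B):

# 4-to-1 multiplexer: the select lines a (LSB) and b (MSB) choose which of the
# four inputs A..D is routed to the single output Q.
def mux_4to1(inputs, a, b):
    index = (b << 1) | a        # a=1, b=0 selects inputs[1], i.e. input B
    return inputs[index]

channels = [0, 1, 0, 1]         # example data on inputs A, B, C, D
print(mux_4to1(channels, a=1, b=0))   # routes input B to the output -> 1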
Multiplexers are not limited to just switching a number of different input lines or channels to one
common single output. There are also types that can switch their inputs to multiple outputs and
have arrangements of 4-to-2, 8-to-3 or even 16-to-4 etc. configurations; an example of a
simple dual-channel 4-input multiplexer (4-to-2) is given below:
4.7.3 4-to-2 Channel Multiplexer
Here in this example the 4 input channels are switched to 2 individual output lines but larger
arrangements are also possible. This simple 4-to-2 configuration could be used for example, to
switch audio signals for stereo pre-amplifiers or mixers.
4.8 DEMULTIPLEXER
The demultiplexer is a combinational logic circuit designed to switch one common input line to
one of several separate output lines.
The data distributor, known more commonly as a Demultiplexer or "Demux" for short, is the
exact opposite of the Multiplexer described in the previous section.
The demultiplexer takes one single input data line and then switches it to any one of a number of
individual output lines one at a time. The demultiplexer converts a serial data signal at the input
to a parallel data at its output lines as shown below.
4.8.1 1-to-4 Channel De-multiplexer
The function of the Demultiplexer is to switch one common data input line to any one of the 4
output data lines A to D in our example above. As with the multiplexer the individual solid state
switches are selected by the binary input address code on the output select pins “a” and “b” as
shown.
Demultiplexer Output Line Selection
As with the previous multiplexer circuit, by adding more address line inputs it is possible to
switch more outputs, giving 1-to-2^n data line outputs.
Some standard demultiplexer ICs also have an additional "enable output" pin which disables or
prevents the input from being passed to the selected output. Also, some have latches built into
their outputs to maintain the output logic level after the address inputs have been changed.
However, in standard decoder type circuits the address input will determine which single data
output will have the same value as the data input with all other data outputs having the value of
logic “0”.
The implementation of the Boolean expression above using individual logic gates would require
the use of six individual gates consisting of AND and NOT gates as shown.
4 Channel Demultiplexer using Logic Gates
Again, as with the previous multiplexer example, we can also use the demultiplexer to digitally
control the gain of an operational amplifier as shown.
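As a quick sketch (an illustration only, following the same select-bit ordering as the multiplexer example), a 1-to-4 demultiplexer routes the single data input to the output selected by the address bits, with all other outputs at logic 0:

# 1-to-4 demultiplexer: the data input appears on the output selected by the
# address bits a (LSB) and b (MSB); all other outputs are logic 0.
def demux_1to4(data, a, b):
    outputs = [0, 0, 0, 0]
    outputs[(b << 1) | a] = data
    return outputs

print(demux_1to4(1, a=0, b=1))   # data routed to output C -> [0, 0, 1, 0]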
Learning Activity
Wavelength Division Multiplexing (WDM)
Time Division Multiplexing (TDM)
TDM is a method of transmitting several signals over a single communication channel by
dividing the time frame into slots, with one slot used for each message signal.
1. A half adder includes two logic gates: an AND gate and an EX-OR gate. A full
adder includes two EX-OR gates, two AND gates and one OR gate. The half adder
has two input bits, A and B.
2. A combinational circuit comprises logic gates whose outputs at any
time are determined directly from the present combination of inputs, without
any regard to previous inputs.
SUMMARY
• A combinational circuit is a circuit in which we combine the various gates.
• A half adder is a logical circuit that performs an addition operation on two binary digits.
• A subtractor accepts two binary numbers as input and subtracts one binary number input
from the other binary number input.
• An encoder is a device used to change a signal (such as a bitstream) or data into a code.
• Active-HIGH output decoders are constructed with AND gates, and active-LOW output
decoders are built with NAND gates.
• Active-LOW output decoders give a LOW output for the selected input combination,
while all remaining outputs are HIGH.
• A multiplexer or MUX is a device that selects one of several analog or digital input
signals and forwards the selected input to a single line.
KEYWORDS
Adder: It is a digital circuit that performs addition of numbers.
Half Adder: It is a logical circuit that performs an addition operation on two binary digits.
Full Adder: It adds three one-bit binary inputs (X, Y, Z) and outputs two one-bit values, Sum
and Carry.
Subtractor: It takes two binary numbers as input and subtracts one binary number input
from the other binary number input.
Half Subtractor: It is a combinational circuit which is used to perform subtraction of two
bits.
Full Subtractor: It subtracts one bit from the other by taking the previous borrow into account
and generates difference and borrow.
Encoder: It is a device used to change a signal (such as a bitstream) or data into a code.
Priority Encoder: It has a priority given to each input.
Decoder: It is a logic circuit that converts an n-bit binary input code into M (up to 2^n) output
lines, such that each output line will be activated for only one of the possible combinations of
inputs.
MUX: It is a device that selects one of several analog or digital input signals and forwards
the selected input into a single line.
SELF-ASSESSMENT QUESTIONS
Short Answer Questions
1. What are the characteristics of combinational circuits?
CONTENTS
Learning Objectives
Learning Outcomes
Overview
5.1 Concept of Sequential Circuits
5.1.1 Synchronous Sequential Circuits
5.1.2 Asynchronous Sequential Circuits
5.2 Latches
5.2.1 SR Latch
5.2.2 D Latch
5.3 Flip-flops
5.3.1 SR Flip-Flop
5.3.2 JK Flip-Flop
5.3.3 D Flip-Flop
5.3.4 T Flip-Flop
5.3.5 Master-Slave Flip-Flop
5.4 Registers
5.4.1 Shift Registers
5.5 Memories
5.5.1 Semiconductor Memories
5.6 Counters
5.6.1 Asynchronous Counter
5.6.2 Synchronous Counters
Summary
Keywords
Self-Assessment Questions
Further Reading
LEARNING OBJECTIVES
After studying this chapter, you should be able to:
• Gain knowledge about the concept of sequential circuits
• Describe the concept of latch circuits
• Illustrate the types of flip-flops
• Identify the different types of registers
• Clarify the concept of the memory unit.
LEARNING OUTCOMES
Upon completion of the chapter, students are able to understand :
• Design the sequential circuit
• The concept of latches and flip flops
• Analyze the different type of flip flops
• Gain knowledge about registers and memories
• Design of counters.
OVERVIEW
In the previous chapter, you studied various combinational circuits. You learned about the
concepts of the adder, subtractor, decoder, encoder and multiplexers. We also covered the
process of designing an ALU circuit.
In this chapter, you will study the concept of sequential circuits. In digital circuit theory,
sequential logic is a type of logic circuit whose output depends not only on the present value of
its input signals but also on the past history of its inputs. Sequential logic can be regarded as
combinational logic with memory. In this chapter, you will learn about the latch, a logic circuit
having two inputs and one output; flip-flops, which are circuits having two stable states; and
registers, which perform the function of storing and moving data. You will also study the
memory unit, which is implemented on a semiconductor-based integrated circuit, and counters,
which cycle through a fixed sequence of states.
Virtually all circuits in practical digital devices are a mixture of combinational and sequential
logic. The ideas covered in this chapter will help you use sequential logic to build finite state
machines, a basic building block of all digital circuitry, as well as memory circuits and other
devices.
5.1 CONCEPT OF SEQUENTIAL CIRCUITS
A sequential circuit consists of a combinational circuit to which memory elements are connected
to form a feedback path. A sequential circuit receives binary information from external inputs.
Digital sequential logic circuits are divided into synchronous and asynchronous types. A
synchronous sequential circuit is defined as a system whose behaviour can be determined from
the knowledge of its signals at discrete instants of time.
In asynchronous circuits, the state of the device can change at any time in response to changing
inputs. The behaviour of an asynchronous sequential circuit depends on the order in which its
input signals change and can be affected at any instant of time.
5.1.1 Synchronous Sequential Logic
Essentially all sequential logic today is clocked or synchronous logic. In a synchronous circuit, an
electronic oscillator called a clock produces a sequence of repetitive pulses called the clock
signal, which is distributed to all the memory elements in the circuit. The basic memory
element in sequential logic is the flip-flop.
The output of each flip-flop changes only when triggered by the clock pulse, so changes to the
logic signals throughout the circuit all begin at the same time, at regular intervals, synchronized by
the clock. The outputs of all the storage elements (flip-flops) in the circuit at any given
time, that is, the binary data they contain, is known as the state of the circuit. The state
of a synchronous circuit changes only on clock pulses. At each cycle, the next state is determined
by the current state and the value of the input signals when the clock pulse occurs.
The main advantage of synchronous logic is its simplicity. The logic gates which
perform the operations on the data require a finite amount of time to respond to changes in
their inputs. This is called propagation delay. The interval between clock pulses must be long
enough so that all the logic gates have time to respond to the
changes and their outputs "settle" to stable logic values before the next clock pulse
occurs. As long as this condition is met (ignoring certain other details), the circuit
is guaranteed to be stable and reliable. This determines the maximum operating speed of a
synchronous circuit.
Let us discuss some of the disadvantages of synchronous logic:
• The maximum possible clock rate is determined by the slowest logic path in the
circuit, otherwise known as the critical path.
• Logic paths that complete their calculations quickly are idle much of the time,
waiting for the next clock pulse. Therefore synchronous logic can be slower than
asynchronous logic. One way to speed up synchronous circuits is to split complex
operations into several simple operations which can be performed in successive clock
cycles. This technique is extensively used in microprocessor design, and helps to
improve the performance of modern processors.
• The clock signal must be distributed to every flip-flop in the circuit. As the clock is
usually a high-frequency signal, this distribution consumes a relatively large
amount of power and dissipates much heat. Even the flip- flops that are doing
nothing consume a small amount of power, thereby generating waste heat in the
chip. In portable devices which have limited battery power, the clock signal goes on
even when the device is not being used, consuming power.
5.2 LATCHES
• An enable signal is provided with the latch. When the enable signal is active, output changes occur
as the input changes. But when the enable signal is not activated, input changes do not affect the
output.
• Flip-flop is the term used for a sequential device that normally samples its inputs and changes its
outputs only at times determined by a clocking signal.
5.2.1 SR Latch
The simplest type of latch is the Set-Reset (SR) latch. It can be constructed from either two NOR gates or
two NAND gates
i) SR latch using NOR gates
The SR latch can also be implemented using NAND gates. The inputs of this latch are S and R.
To understand how this circuit functions, recall that a LOW on any input to a NAND gate forces
its output HIGH.
The operation of the NAND-based SR latch is summarized as follows:
• When S = 0 and R = 0, both gate outputs are forced HIGH, i.e., Qn+1 = Qn+1' = 1, which is
an invalid (indeterminate) condition.
• When S = 0 and R = 1, the latch is set to 1.
• When S = 1 and R = 0, the latch is reset to 0.
• When S = 1 and R = 1, the output, Qn+1, remains in its present state, Qn.
The truth table of the NAND-based SR latch is shown below.
S R Qn | Qn+1 | State
0 0 0  |  X   | Indeterminate
0 0 1  |  X   | Indeterminate
0 1 0  |  1   | Set
0 1 1  |  1   | Set
1 0 0  |  0   | Reset
1 0 1  |  0   | Reset
1 1 0  |  0   | No Change (NC)
1 1 1  |  1   | No Change (NC)
Figure 5.10: Truth table
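A tiny Python sketch (an illustration only) of the cross-coupled NAND behaviour, iterated until the outputs settle, reproduces the Set, Reset and No-change rows of this table:

# Cross-coupled NAND SR latch: Q = NAND(S, Q'), Q' = NAND(R, Q).
def nand(a, b):
    return 0 if (a and b) else 1

def sr_nand_latch(s, r, q, q_bar):
    # Iterate the two gates until the feedback loop settles.
    for _ in range(4):
        q, q_bar = nand(s, q_bar), nand(r, q)
    return q, q_bar

print(sr_nand_latch(0, 1, 0, 1))   # S=0, R=1: latch is set   -> (1, 0)
print(sr_nand_latch(1, 0, 1, 0))   # S=1, R=0: latch is reset -> (0, 1)
print(sr_nand_latch(1, 1, 1, 0))   # S=1, R=1: holds previous Q=1 -> (1, 0)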
When S is HIGH and R is LOW, a HIGH on the EN input sets the latch. When S is LOW and R
is HIGH, a HIGH on the EN input resets the latch.
In the above logic circuit if S = 0 and also R = 0, the condition of Q is totally unpredictable.
• First suppose Q is previously 0.
• Now both inputs of G2 are 0 as R = 0 and Q = 0. So output of G2, i.e. Q’ is at logic 1.
• Now the inputs of G1 are 0 and 1 as S=0 and Q’= 1. So output of G1, i.e. Q is at logic1.
That means Q is changed.
• Now Q is 1. So inputs of G2 are 0 and 1 as R = 0 and Q = 1. So output of G2 i.e. Q’ is at
logic 1. That means Q’ is unchanged.
• Now the inputs of G1 are 0 and 1 as S = 0 and Q' = 1. So the output of G1, i.e. Q, is at logic 1.
That means Q is unchanged.
5.2.2 D Latch
There is one drawback of the SR latch: the next state value cannot be predicted when both the
inputs S and R are active. We can overcome this difficulty with the D latch, which is also called
the Data Latch. The circuit diagram of the D latch is shown in the following figure.
The D latch as shown below has an enable input. When the E input is 1, the Q output follows the
D input. In this situation, the latch is said to be "open" and the path from the input D to the
output Q is "transparent". Thus the circuit is also known as a transparent latch. When E is 0, the
latch is disabled or "closed", and the Q output retains its last value independent of the D input.
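A brief Python sketch (an illustration only) of this transparent behaviour: while the enable E is 1, Q follows D; while E is 0, Q holds its last value.

# Transparent D latch: when E=1 the output follows D; when E=0 it holds.
class DLatch:
    def __init__(self):
        self.q = 0
    def update(self, d, e):
        if e:                 # latch is "open" (transparent)
            self.q = d
        return self.q         # when E=0 the latch is "closed" and Q is held

latch = DLatch()
print(latch.update(d=1, e=1))   # E=1: Q follows D -> 1
print(latch.update(d=0, e=0))   # E=0: D ignored, Q held -> 1
print(latch.update(d=0, e=1))   # E=1 again: Q follows D -> 0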
5.3 FLIP-FLOPS
In electronics, a flip-flop or latch is a circuit that has two stable states and can be used to store
state information – a bistable multivibrator. The circuit can be made to change state
by signals applied to one or more control inputs and will have one or two outputs. It is the basic
storage element in sequential logic. Flip-flops and latches are fundamental building blocks
of digital electronics systems used in computers, communications, and many other types of
systems.
5.3.1 SR Flip-Flop
The SR flip-flop, also known as an SR latch, can be considered one of the most basic
sequential logic circuits possible. This simple flip-flop is basically a one-bit memory bistable
device that has two inputs, one which will “SET” the device (meaning the output = “1”), and is
labelled S and one which will “RESET” the device (meaning the output = “0”), labelled R.
Then the SR description stands for “Set-Reset”. The reset input resets the flip-flop back to its
original state with an output Q that will be either at a logic level “1” or logic “0” depending upon
this set/reset condition.
A basic NAND gate SR flip-flop circuit provides feedback from both of its outputs back to its
opposing inputs and is commonly used in memory circuits to store a single data bit. Then the SR
flip-flop actually has three inputs, Set, Reset and its current output Q relating to its current state
or history. The term “Flip-flop” relates to the actual operation of the device, as it can be
“flipped” into one logic Set state or “flopped” back into the opposing logic Reset state.
The NAND Gate SR Flip-Flop
The simplest way to make any basic single bit set-reset SR flip-flop is to connect together a pair
of cross-coupled 2-input NAND gates as shown, to form a Set-Reset Bistable also known as an
active LOW SR NAND Gate Latch, so that there is feedback from each output to one of the
other NAND gate inputs. This device consists of two inputs, one called the Set, S and the other
called the Reset, R with two corresponding outputs Q and its inverse or complement Q (not-Q)
as shown below.
The Basic SR Flip-flop
It can be seen that when both inputs S = “1” and R = “1” the outputs Q and Q can be at either
logic level “1” or “0”, depending upon the state of the inputs S or R BEFORE this input
condition existed. Therefore the condition of S = R = “1” does not change the state of the
outputs Q and Q.
However, the input state of S = “0” and R = “0” is an undesirable or invalid condition and must
be avoided. The condition of S = R = “0” causes both outputs Q and Q to be HIGH together at
logic level "1" when we would normally want Q̄ to be the inverse of Q. The result is that the flip-
flop loses control of Q and Q̄, and if the two inputs are now switched "HIGH" again after this
condition to logic “1”, the flip-flop becomes unstable and switches to an unknown data state
based upon the unbalance as shown in the following switching diagram.
S-R Flip-flop Switching Diagram
This unbalance can cause one of the outputs to switch faster than the other resulting in the flip-
flop switching to one state or the other which may not be the required state and data corruption
will exist. This unstable condition is generally known as its Meta-stable state.
Then, a simple NAND gate SR flip-flop or NAND gate SR latch can be set by applying a logic
“0”, (LOW) condition to its Set input and reset again by then applying a logic “0” to
its Reset input. The SR flip-flop is said to be in an “invalid” condition (Meta-stable) if both the
set and reset inputs are activated simultaneously.
As we have seen above, the basic NAND gate SR flip-flop requires logic “0” inputs to flip or
change state from Q to Q and vice versa. We can however, change this basic flip-flop circuit to
one that changes state by the application of positive going input signals with the addition of two
extra NAND gates connected as inverters to the S and R inputs as shown.
Positive NAND Gate SR Flip-flop
As well as using NAND gates, it is also possible to construct simple one-bit SR Flip-flops using
two cross-coupled NOR gates connected in the same configuration. The circuit will work in a
similar way to the NAND gate circuit above, except that the inputs are active HIGH and the
invalid condition exists when both its inputs are at logic level “1”, and this is shown below.
The NOR Gate SR Flip-flop
5.3.2 JK Flip-Flop
The JK Flip-flop is similar to the SR Flip-flop but there is no change in state when the J and K
inputs are both LOW.
The basic S-R NAND flip-flop circuit has many advantages and uses in sequential logic circuits
but it suffers from two basic switching problems.
1. the Set = 0 and Reset = 0 condition (S = R = 0) must always be avoided
2. if Set or Reset change state while the enable (EN) input is high the correct latching action may
not occur
Then to overcome these two fundamental design problems with the SR flip-flop design, the JK
flip Flop was developed.
This simple JK flip Flop is the most widely used of all the flip-flop designs and is considered to
be a universal flip-flop circuit. The two inputs labelled “J” and “K” are not shortened abbreviated
letters of other words, such as “S” for Set and “R” for Reset, but are themselves autonomous
letters, often said to have been chosen by its inventor Jack Kilby, to distinguish the flip-flop design from other types.
The sequential operation of the JK flip flop is exactly the same as for the previous SR flip-flop
with the same “Set” and “Reset” inputs. The difference this time is that the “JK flip flop” has no
invalid or forbidden input states of the SR Latch even when S and R are both at logic “1”.
The JK flip flop is basically a gated SR flip-flop with the addition of a clock input circuitry that
prevents the illegal or invalid output condition that can occur when both inputs S and R are equal
to logic level “1”. Due to this additional clocked input, a JK flip-flop has four possible input
combinations, “logic 1”, “logic 0”, “no change” and “toggle”. The symbol for a JK flip flop is
similar to that of an SR Bistable Latch as seen in the previous section, except for the addition of a
clock input.
If the circuit is now "SET", the J input is inhibited by the "0" status of Q̄ (not-Q) through the
lower NAND gate. If the circuit is "RESET", the K input is inhibited by the "0" status
of Q through the upper NAND gate. As Q and Q̄ are always different, we can use them to control
the input. When both inputs J and K are equal to logic “1”, the JK flip flop toggles as shown in
the following truth table.
The Truth Table for the JK Function
J K Qn | Qn+1 | Description
0 0 0  |  0   | No Change
0 0 1  |  1   | No Change
0 1 0  |  0   | Reset
0 1 1  |  0   | Reset
1 0 0  |  1   | Set
1 0 1  |  1   | Set
1 1 0  |  1   | Toggle
1 1 1  |  0   | Toggle
Figure 5.23: Truth Table
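This next-state behaviour can be summarised in a couple of lines of Python (an illustration only):

# JK flip-flop next-state function evaluated on a clock edge:
# J=K=0 hold, J=0 K=1 reset, J=1 K=0 set, J=K=1 toggle.
def jk_next_state(j, k, q):
    return (j & (1 - q)) | ((1 - k) & q)     # Qn+1 = J.Q' + K'.Q

for j in (0, 1):
    for k in (0, 1):
        for q in (0, 1):
            print(f"J={j} K={k} Qn={q} -> Qn+1={jk_next_state(j, k, q)}")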
Then the JK flip-flop is basically an SR flip flop with feedback which enables only one of its two
input terminals, either SET or RESET to be active at any one time under normal switching
thereby eliminating the invalid condition seen previously in the SR flip flop circuit.
However, if both the J and K inputs are HIGH at logic “1” (J = K = 1), when the clock input goes
HIGH, the circuit will “toggle” as its outputs switch and change state complementing each other.
This results in the JK flip-flop acting more like a T-type toggle flip-flop when both terminals are
“HIGH”. However, as the outputs are fed back to the inputs, this can cause the output at Q to
oscillate between SET and RESET continuously after being complemented once.
While this JK flip-flop circuit is an improvement on the clocked SR flip-flop it also suffers from
timing problems called “race” if the output Q changes state before the timing pulse of the clock
input has time to go “OFF”. To avoid this the timing pulse period ( T ) must be kept as short as
possible (high frequency). As this is sometimes not possible with basic JK flip-flops built using
basic NAND or NOR gates, far more advanced master-slave (edge-triggered) flip-flops were
developed which are more stable.
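The behaviour summarised in the truth table above can be captured by the characteristic equation
Qn+1 = J.Qn' + K'.Qn. The short Python sketch below is purely an illustrative simulation of that
equation (not a description of any particular hardware device); it reproduces the table so the set,
reset, no-change and toggle actions can be checked.

def jk_next_state(j: int, k: int, q: int) -> int:
    """Characteristic equation of the JK flip-flop: Q(n+1) = J.Q' + K'.Q."""
    return (j & (q ^ 1)) | ((k ^ 1) & q)

# Reproduce the truth table: J K Qn -> Qn+1
for j in (0, 1):
    for k in (0, 1):
        for q in (0, 1):
            print(j, k, q, "->", jk_next_state(j, k, q))

Running the loop shows that J = K = 1 always complements Q, which is the toggle action described above.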
5.3.3 D Flip-Flops
The D-type flip-flop is a modified Set-Reset flip-flop with the addition of an inverter to prevent
the S and R inputs from being at the same logic level.
One of the main disadvantages of the basic SR NAND Gate Bistable circuit is that the
indeterminate input condition of SET = “0” and RESET = “0” is forbidden.
This state will force both outputs to be at logic “1”, over-riding the feedback latching action and
whichever input goes to logic level “1” first will lose control, while the other input still at logic
“0” controls the resulting state of the latch.
But in order to prevent this from happening an inverter can be connected between the “SET” and
the “RESET” inputs to produce another type of flip flop circuit known as a Data Latch, Delay
flip flop, D-type Bistable, D-type Flip Flop or just simply a D Flip Flop as it is more generally
called.
The D Flip Flop is by far the most important of all the clocked flip-flops. By adding an inverter
(NOT gate) between the Set and Reset inputs, the S and R inputs become complements of each
other, ensuring that S and R can never be at the same logic level at the same time. This allows us
to control the state of the flip-flop using one single D (Data) input.
Then this Data input, labelled “D”, is used in place of the “Set” signal, and the inverter is
used to generate the complementary “Reset” input, thereby making a level-sensitive D-type flip-
flop from a level-sensitive SR latch, as now S = D and R = not D as shown.
D-type Flip-Flop Circuit
We remember that a simple SR flip-flop requires two inputs, one to “SET” the output and one to
“RESET” the output. By connecting an inverter (NOT gate) to the SR flip-flop we can “SET”
and “RESET” the flip-flop using just one input as now the two input signals are complements of
each other. This complement avoids the ambiguity inherent in the SR latch when both inputs are
LOW, since that state is no longer possible.
Thus this single input is called the “DATA” input. If this data input is held HIGH the flip flop
would be “SET” and when it is LOW the flip flop would change and become “RESET”.
However, this would be rather pointless since the output of the flip flop would always change on
every pulse applied to this data input.
To avoid this an additional input called the “CLOCK” or “ENABLE” input is used to isolate the
data input from the flip flop’s latching circuitry after the desired data has been stored. The effect
is that D input condition is only copied to the output Q when the clock input is active. This then
forms the basis of another sequential device called a D Flip Flop.
The “D flip flop” will store and output whatever logic level is applied to its data terminal so long
as the clock input is HIGH. Once the clock input goes LOW the “set” and “reset” inputs of the
flip-flop are both held at logic level “1” so it will not change state and store whatever data was
present on its output before the clock transition occurred. In other words the output is “latched”
at either logic “0” or logic “1”.
Truth Table for the D-type Flip Flop
Clk  D  Qn  Qn+1  Description
0    X  Qn  Qn    Memory (no change)
1    0  0   0     Reset Q » 0
1    1  1   1     Set Q » 1
Figure 5.25: Truth Table
Note: ↓ and ↑ indicate the direction of the clock edge, as it is assumed that D-type flip-flops are
edge triggered.
The Master-Slave D Flip Flop
The basic D-type flip flop can be improved further by adding a second SR flip-flop to its output
that is activated on the complementary clock signal to produce a “Master-Slave D-type flip flop”.
On the leading edge of the clock signal (LOW-to-HIGH) the first stage, the “master” latches the
input condition at D, while the output stage is deactivated.
On the trailing edge of the clock signal (HIGH-to-LOW) the second “slave” stage is now
activated, latching on to the output from the first master circuit. Then the output stage appears to
be triggered on the negative edge of the clock pulse. “Master-Slave D-type flip flops” can be
constructed by the cascading together of two latches with opposite clock phases as shown.
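As an illustration of this master–slave idea, the following Python sketch (a behavioural model
built from an assumed level-sensitive D latch, not a gate-level design) cascades two D latches on
opposite clock phases so that the output only takes on the new data after the clock falls.

class DLatch:
    """Level-sensitive D latch: the output follows D while the enable is HIGH."""
    def __init__(self):
        self.q = 0
    def update(self, d: int, enable: int) -> int:
        if enable:          # transparent while enabled
            self.q = d
        return self.q       # holds the last value while disabled

class MasterSlaveDFF:
    """Two D latches on opposite clock phases give falling-edge behaviour."""
    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()
    def update(self, d: int, clk: int) -> int:
        m = self.master.update(d, clk)        # master transparent when clk = 1
        return self.slave.update(m, 1 - clk)  # slave transparent when clk = 0

ff = MasterSlaveDFF()
for clk, d in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    print("clk =", clk, "d =", d, "q =", ff.update(d, clk))

In the printed sequence the output Q only changes after the clock goes from HIGH to LOW,
mirroring the description of the master and slave stages being enabled on opposite half-cycles.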
5.3.4 T Flip-Flop
Toggle Flip-flops are sequential logic circuits frequently used as single bit bistable storage
elements in counters, memory devices or as frequency dividers in response to a clock pulse.
The Toggle Flip-flop is another type of bistable sequential logic circuit based around the
previous clocked JK flip-flop circuit. The toggle flip-flop can be used as a basic digital element
for storing one bit of information, as a divide-by-two divider or as a counter. Toggle flip-flops
have a single input and one or two complementary outputs, Q and Q', which change state on the
positive edge (rising edge) or negative edge (falling edge) of an input clock signal or pulse.
Toggle flip-flops, TFFs or simply “T-type flip-flops” are not available commercially as a
dedicated TTL or CMOS logic chip, but they can be easily constructed by connecting together
the J and K inputs of a basic JK flip-flop, where the J input behaves like a Set (S) command and
the K input behaves like a Reset (R) command. We remember from our previous tutorial that the
JK flip-flop is a clocked flip-flop whose input condition (HIGH or LOW) and present steady
state together determine its next switching state.
Suppose that initially CLK and input T are both LOW (CLK = T = 0), and that output Q is HIGH
(Q = 1). At the rising edge or falling edge of a CLK pulse, the logic “0” condition present
at T prevents the output at Q from changing state. Thus the output remains unchanged when T =
0.
Now let’s suppose that input T is HIGH (T = 1) and CLK is LOW (CLK = 0). At the rising edge
(assuming a positive transition) of a CLK pulse at time t1, the output at Q changes state and
becomes LOW, making Q' HIGH. The negative transition of the clock pulse from HIGH to
LOW at time t2 has no effect on the output at Q as the flip-flop is settled in one stable state.
At the next rising edge of the clock signal at time t3, the logic “1” at T passes to Q, changing its
state and making output Q HIGH again. The negative transition of the CLK pulse at time t4 from
HIGH to LOW once again has no effect on the output. Thus the Q output of the flip-flop
“toggles” at each positive going edge (for this example) of the CLK pulse.
Then we can define the switching action of the toggle flip-flop in Boolean form as:
Qn+1 = T.Qn' + T'.Qn
where Qn represents the present steady state of the flip-flop and Qn+1 is the next switching state.
You may have noticed that this characteristic equation for the toggle flip-flop produces a HIGH
output for the next state (Qn+1) if the two inputs T and Q are different, and a LOW output if these
inputs are the same.
This behaviour, where Qn+1 is HIGH when either input is HIGH but not when both are, is exactly
the Boolean Algebra expression of an Exclusive-OR function, which is given as:
Qn+1 = T.Q' + T'.Q = T XOR Q = T ⊕ Q
Then we can represent the switching action of a toggle flip-flop using a 2-input Exclusive-OR
(Ex-OR) gate.
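Because Qn+1 = T ⊕ Qn, the toggle action reduces to a one-line function. The sketch below is an
illustrative simulation only: it holds T HIGH and shows the output alternating on every clock
pulse, which is the divide-by-two behaviour mentioned above.

def t_next_state(t: int, q: int) -> int:
    """Toggle flip-flop characteristic equation: Q(n+1) = T XOR Q."""
    return t ^ q

q = 0
for pulse in range(8):                     # eight clock pulses with T held HIGH
    q = t_next_state(1, q)
    print("pulse", pulse + 1, "Q =", q)    # Q alternates 1,0,1,0,... (divide-by-two)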
5.3.5 Master-Slave Flip-flop
The master-slave flip-flop eliminates all the timing problems by using two SR flip-flops
connected together in a series configuration. One flip-flop acts as the “Master” circuit, which
triggers on the leading edge of the clock pulse while the other acts as the “Slave” circuit, which
triggers on the falling edge of the clock pulse. This results in the two sections, the master section
and the slave section being enabled during opposite half-cycles of the clock signal.
Master-Slave JK Flip-flop
The Master-Slave Flip-Flop is basically two gated SR flip-flops connected together in a series
configuration with the slave having an inverted clock pulse. The outputs from Q and Q from the
“Slave” flip-flop are fed back to the inputs of the “Master” with the outputs of the “Master” flip
flop being connected to the two inputs of the “Slave” flip flop. This feedback configuration from
the slave’s output to the master’s input gives the characteristic toggle of the JK flip flop as
shown below.
The input signals J and K are connected to the gated “master” SR flip flop which “locks” the input
condition while the clock (Clk) input is “HIGH” at logic level “1”. As the clock input of the “slave” flip
flop is the inverse (complement) of the “master” clock input, the “slave” SR flip flop does not toggle. The
outputs from the “master” flip flop are only “seen” by the gated “slave” flip flop when the clock input
goes “LOW” to logic level “0”.
When the clock is “LOW”, the outputs from the “master” flip flop are latched and any additional changes
to its inputs are ignored. The gated “slave” flip flop now responds to the state of its inputs passed over by
the “master” section.
Then on the “Low-to-High” transition of the clock pulse the inputs of the “master” flip flop are
fed through to the gated inputs of the “slave” flip flop and on the “High-to-Low” transition the
same inputs are reflected on the output of the “slave” making this type of flip flop edge or pulse-
triggered.
Then, the circuit accepts input data when the clock signal is “HIGH”, and passes the data to the
output on the falling-edge of the clock signal. In other words, the Master-Slave JK Flip flop is a
“Synchronous” device as it only passes data with the timing of the clock signal.
The Master-Slave D Flip Flop Circuit
We can see from above that on the leading edge of the clock pulse the master flip-flop will be
loading data from the data D input, therefore the master is “ON”. With the trailing edge of the
clock pulse the slave flip-flop is loading data, i.e. the slave is “ON”. Then there will always be
one flip-flop “ON” and the other “OFF” but never both the master and slave “ON” at the same
time. Therefore, the output Q acquires the value of D, only when one complete pulse, ie, 0-1-0 is
applied to the clock input.
5.4 Registers
A Register is a collection of flip flops. A flip flop is used to store single bit digital data. For
storing a large number of bits, the storage capacity is increased by grouping more than one flip
flops. If we want to store an n-bit word, we have to use an n-bit register containing n number of
flip flops.
Registers are used while performing different types of operations: the CPU uses these registers
when executing instructions. The inputs fed to the system are first stored in registers, and the
results returned by the system are also stored in registers.
5.4.1 Shift Register
The Shift Register is another type of sequential logic circuit that can be used for the storage or
the transfer of binary data.
This sequential device loads the data present on its inputs and then moves or “shifts” it to its
output once every clock cycle, hence the name Shift Register.
A shift register basically consists of several single bit “D-Type Data Latches”, one for each data
bit, either a logic “0” or a “1”, connected together in a serial type daisy-chain arrangement so
that the output from one data latch becomes the input of the next latch and so on.
Data bits may be fed in or out of a shift register serially, that is one after the other from either the
left or the right direction, or all together at the same time in a parallel configuration.
The number of individual data latches required to make up a single Shift Register device is
usually determined by the number of bits to be stored with the most common being 8-bits (one
byte) wide constructed from eight individual data latches.
Shift Registers are used for data storage or for the movement of data and are therefore commonly
used inside calculators or computers to store data such as two binary numbers before they are
added together, or to convert the data from either a serial to parallel or parallel to serial format.
The individual data latches that make up a single shift register are all driven by a common clock
( Clk ) signal making them synchronous devices.
Shift register IC’s are generally provided with a clear or reset connection so that they can be
“SET” or “RESET” as required. Generally, shift registers operate in one of four different modes
with the basic movement of data through a shift register being:
Serial-in to Parallel-out (SIPO) - the register is loaded with serial data, one bit at a time, with
the stored data being available at the output in parallel form.
Serial-in to Serial-out (SISO) - the data is shifted serially “IN” and “OUT” of the register, one
bit at a time in either a left or right direction under clock control.
Parallel-in to Serial-out (PISO) - the parallel data is loaded into the register simultaneously and
is shifted out of the register serially one bit at a time under clock control.
Parallel-in to Parallel-out (PIPO) - the parallel data is loaded simultaneously into the register,
and transferred together to their respective outputs by the same clock pulse.
The effect of data movement from left to right through a shift register can be presented
graphically as:
Also, the directional movement of the data through a shift register can be either to the left, (left
shifting) to the right, (right shifting) left-in but right-out, (rotation) or both left and right shifting
within the same register thereby making it bidirectional. In this tutorial it is assumed that all the
data shifts to the right, (right shifting).
Serial-in to Parallel-out (SIPO) Shift Register
Let’s assume that all the flip-flops ( FFA to FFD ) have just been RESET ( CLEAR input ) and
that all the outputs QA to QD are at logic level “0” ie, no parallel data output.
If a logic “1” is connected to the DATA input pin of FFA then on the first clock pulse the output
of FFA and therefore the resulting QA will be set HIGH to logic “1” with all the other outputs
still remaining LOW at logic “0”. Assume now that the DATA input pin of FFA has returned
LOW again to logic “0” giving us one data pulse or 0-1-0.
The second clock pulse will change the output of FFA to logic “0” and the output
of FFB and QB HIGH to logic “1” as its input D has the logic “1” level on it from QA. The logic
“1” has now moved or been “shifted” one place along the register to the right as it is now at QB.
When the third clock pulse arrives this logic “1” value moves to the output of FFC ( QC ) and so
on until the arrival of the fifth clock pulse which sets all the outputs QA to QD back again to logic
level “0” because the input to FFA has remained constant at logic level “0”.
The effect of each clock pulse is to shift the data contents of each stage one place to the right,
and this is shown in the following table until the complete data value of 0-0-0-1 is stored in the
register. This data value can now be read directly from the outputs of QA to QD.
Then the data has been converted from a serial data input signal to a parallel data output. The
truth table and following waveforms show the propagation of the logic “1” through the register
Note that after the fourth clock pulse has ended the 4-bits of data ( 0-0-0-1 ) are stored in the
register and will remain there provided clocking of the register has stopped. In practice the input
data to the register may consist of various combinations of logic “1” and “0”. Commonly
available SIPO IC’s include the standard 8-bit 74LS164 or the 74LS594.
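A behavioural model of the 4-bit SIPO register described above can be written in a few lines of
Python (this is a simulation sketch only, not a model of the 74LS164 itself). Each call to clock()
shifts every stage one place to the right, and the parallel outputs QA to QD can be read at any time.

class SIPOShiftRegister:
    """Serial-in, parallel-out shift register; stage 0 is QA, the last stage is QD."""
    def __init__(self, n_bits: int = 4):
        self.stages = [0] * n_bits
    def clock(self, serial_in: int):
        # Shift every stage one place to the right and load the new bit into QA.
        self.stages = [serial_in] + self.stages[:-1]
    def parallel_out(self):
        return list(self.stages)

reg = SIPOShiftRegister()
for bit in [1, 0, 0, 0]:            # a single 0-1-0 data pulse followed by zeros
    reg.clock(bit)
    print(reg.parallel_out())       # the logic "1" ripples from QA towards QD

After the fourth clock pulse the register holds 0-0-0-1, exactly as in the table described above.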
Serial-in to Serial-out (SISO) Shift Register
This shift register is very similar to the SIPO above, except that whereas before the data was read
directly in a parallel form from the outputs QA to QD, this time the data is allowed to flow
straight through the register and out of the other end. Since there is only one output,
the DATA leaves the shift register one bit at a time in a serial pattern, hence the name Serial-in
to Serial-Out Shift Register or SISO.
The SISO shift register is one of the simplest of the four configurations as it has only three
connections, the serial input (SI) which determines what enters the left hand flip-flop, the serial
output (SO) which is taken from the output of the right hand flip-flop and the sequencing clock
signal (Clk). The logic circuit diagram below shows a generalized serial-in serial-out shift
register.
You may wonder what the point of a SISO shift register is if the output data is exactly the same as
the input data. Well this type of Shift Register also acts as a temporary storage device or it can
act as a time delay device for the data, with the amount of time delay being controlled by the
number of stages in the register, 4, 8, 16 etc or by varying the application of the clock pulses.
Commonly available ICs include the 74HC595, an 8-bit serial-in shift register with 3-state
outputs that also provides a serial output for SISO operation.
Parallel-in to Serial-out (PISO) Shift Register
The Parallel-in to Serial-out shift register acts in the opposite way to the serial-in to parallel-out
one above. The data is loaded into the register in a parallel format in which all the data bits enter
their inputs simultaneously, to the parallel input pins PA to PD of the register. The data is then
read out sequentially in the normal shift-right mode from the register at Q representing the data
present at PA to PD.
This data is outputted one bit at a time on each clock cycle in a serial format. It is important to
note that with this type of data register a clock pulse is not required to parallel load the register as
it is already present, but four clock pulses are required to unload the data.
As this type of shift register converts parallel data, such as an 8-bit data word into serial format,
it can be used to multiplex many different input lines into a single serial DATA stream which
can be sent directly to a computer or transmitted over a communications line. Commonly
available IC’s include the 74HC166 8-bit Parallel-in/Serial-out Shift Registers.
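The parallel-in serial-out behaviour can be sketched in the same illustrative way (again a simple
simulation, not a model of the 74HC166): the register is loaded in a single step and then unloaded
one bit per clock pulse.

class PISOShiftRegister:
    """Parallel-in, serial-out shift register."""
    def __init__(self, n_bits: int = 4):
        self.stages = [0] * n_bits
    def parallel_load(self, bits):
        self.stages = list(bits)              # no clock pulse needed to load
    def clock_out(self) -> int:
        out = self.stages[-1]                 # serial output taken from the last stage
        self.stages = [0] + self.stages[:-1]  # remaining bits shift right
        return out

reg = PISOShiftRegister()
reg.parallel_load([1, 0, 1, 1])               # PA..PD loaded simultaneously
print([reg.clock_out() for _ in range(4)])    # four clock pulses unload the data serially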
Parallel-in to Parallel-out (PIPO) Shift Register
The final mode of operation is the Parallel-in to Parallel-out Shift Register. This type of shift
register also acts as a temporary storage device or as a time delay device similar to the SISO
configuration above. The data is presented in a parallel format to the parallel input
pins PA to PD and then transferred together directly to their respective output pins QA to QD by
the same clock pulse. Then one clock pulse loads and unloads the register. This arrangement for
parallel loading and unloading is shown below.
The PIPO shift register is the simplest of the four configurations as it has only three connections,
the parallel input (PI) which determines what enters the flip-flop, the parallel output (PO) and the
sequencing clock signal (Clk).
Similar to the Serial-in to Serial-out shift register, this type of register also acts as a temporary
storage device or as a time delay device, with the amount of time delay being varied by the
frequency of the clock pulses. Also, in this type of register there are no interconnections between
the individual flip-flops since no serial shifting of the data is required.
5.5 MEMORIES
A memory unit is a collection of storage cells together with associated circuits needed to
transfer information in and out of the device.
Sequential logic differs from combinational logic in that the output of the logic device is
dependent not only on the present inputs to the device, but also on past inputs; i.e., the output of
a sequential logic device depends on its present internal state and the present inputs. This
implies that a sequential logic device has some kind of memory of at least part of its “history”
(i.e. its previous inputs).
A simple memory device can be constructed from combinational devices with which we are
already familiar. By a memory device, we mean a device that can remember whether a signal of
logic level 0 or 1 has been connected to one of its inputs, and can make this fact available at an
output. A very simple, but still useful, memory device can be constructed from a single OR
gate whose output is fed back to one of its inputs, as shown in Figure 5.38.
In this memory device, if A and Q are initially at logic 0, then Q remains at logic 0.
However if the single input A ever becomes a logic 1, then the output Q will be logic 1 ever
after, regardless of any further changes in the input at A. In this simple memory, the output is
a function of the state of the memory element only; after the memory is “written” then it
cannot be changed back. However, it can be “read”. Such a device could be used as a simple
read only memory, which could be “programmed” only once. Often a state table or timing
diagram is used to describe the behaviour of a sequential device.
Figure 5.39 shows both a state table and a timing diagram for this simple memory. The state
table shows the state which the device enters after an input (the “next state”), for all possible
states and inputs. For this device, the output is the value stored in the memory.
In particular, it is possible to inadvertently construct devices for which the output is not
determined by the inputs, and for which it is not possible to predict the output.
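Returning to the simple OR-gate memory described above, its write-once behaviour is easy to
model. The tiny Python sketch below is illustrative only: feeding the output back into one input
of the OR gate means that once Q has gone to 1 it can never return to 0.

def or_gate_memory(inputs):
    """Write-once memory: Q = Q OR A, so Q latches at 1 after the first 1 on A."""
    q = 0
    history = []
    for a in inputs:
        q = q | a            # output fed back to one OR input
        history.append(q)
    return history

print(or_gate_memory([0, 0, 1, 0, 0]))   # -> [0, 0, 1, 1, 1]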
Semiconductor Memory
Semiconductor memory is used in any electronics assembly that uses computer processing
technology. Semiconductor memory is the essential electronics component needed for any
computer based PCB assembly.
In addition to this, memory cards have become commonplace items for temporarily storing data -
everything from the portable flash memory cards used for transferring files, to semiconductor
memory cards used in cameras, mobile phones and the like.
The use of semiconductor memory has grown, and the capacity of these memory cards has increased
as the need for larger and larger amounts of storage has grown.
To meet the growing needs for semiconductor memory, there are many types and technologies
that are used. As the demand grows new memory technologies are being introduced and the
existing types and technologies are being further developed.
A variety of different memory technologies are available - each one suited to different
applications. Names such as ROM, RAM, EPROM, EEPROM, Flash memory, DRAM, SRAM,
SDRAM, as well as F-RAM and MRAM are available, and new types are being developed to
enable improved performance.
Terms like DDR3, DDR4, DDR5 and many more are seen and these refer to different types of
SDRAM semiconductor memory.
In addition to this the semiconductor devices are available in many forms - ICs for printed board
assembly, USB memory cards, Compact Flash cards, SD memory cards and even solid state hard
drives. Semiconductor memory is even incorporated into many microprocessor chips as on-board
memory.
Types of Semiconductor Memory
Electronic semiconductor memory technology can be split into two main types or categories,
according to the way in which the memory operates:
Random Access Memory (RAM): It is a type of memory in which all addresses are accessible
in an equal amount of time and can be selected in any order for a read or write operation. The
name reflects the fact that any location can be accessed directly (randomly) rather than only
sequentially; RAM is also known as “Primary Memory”, “Main Memory” or “Internal Memory”.
RAM is a hardware component of the computer which is mounted on the motherboard. All RAMs
have both read and write capability. Because RAMs lose their stored data when the power is
turned off, they are volatile memories.
There are two important memory devices in the RAM family: SRAM and DRAM. The main
difference between them is the duration of the data stored. Static RAM (SRAM) retains its
contents as long as electrical power is applied to the chip. However, if the power is turned off,
even temporarily, its contents will be lost. Dynamic RAM (DRAM), on the other hand, has an
extremely short data lifetime, usually less than a quarter of a second, and so must be refreshed
periodically. This is true even when power is applied continuously.
SRAM: Full form of SRAM is “Static Random Access Memory”
Static RAM retains its contents as long as the power supply is on; because the data is lost when
power is removed, it is a volatile memory. The main use of static RAM is to build cache memory.
It is more expensive, and its power consumption is higher, when compared to dynamic RAM.
Static RAM is faster than dynamic RAM. In SRAM, each bit of data is stored in a flip-flop. Each
memory cell uses four to six transistors, and its circuit does not need to be refreshed continuously.
As long as dc power is applied to a static memory cell, it can retain a 1 or 0 state indefinitely. If
power is removed, the stored data bit is lost. The cell is selected by an active level on the Select
line and a data bit (1 or 0) is written into the cell by placing it on the Data in line. A data bit is
read by taking it off the Data out line.
a) Types of SRAM: There are different types of SRAM, such as:
Non-Volatile SRAM: It is able to retain all information even when the power supply is turned off.
Pseudo SRAM: It uses a self-refresh circuit, but it is slower than true static RAM.
b) Basic SRAM Organization: Basic Static Memory Cell Array
The memory cells in a SRAM are organized in rows and columns. All the cells in a row share
the same Row Select line. Each set of Data in and Data out lines go to each cell in a given
column and are connected to a single data line that serves as both an input and output (Data I/O)
c) Operation: An SRAM cell is built from two inverters that are cross-coupled to form a latch.
This latch is connected to two bit lines through two transistors T1 and T2. Both transistors can
be switched open or closed under control of the word line, and the word line itself is driven by
the address decoder. When the word line is at ground level, both transistors are turned off and
the latch simply retains its state.
i) Read Operation: Both switches T1 and T2 are closed when the word line is activated. If the
cell is in state 1, a high signal appears on bit line b and a low signal on bit line b'; the opposite
is true when the cell is in state 0. In either case b and b' are complements of each other. The
Sense/Write circuits connected at the end of the two bit lines monitor their states and produce
the corresponding output value.
ii) Write Operation: In the write operation, the Sense/Write circuit drives bit line b and its
complement b' to the required values, and the word line is then activated so that the new state
is stored in the latch.
DRAM: Full form of DRAM is “Dynamic Random Access Memory”
DRAM is another type of semiconductor memory, and it is designed specially to store data or
program files which are needed by computer processors for performing their functions.
In DRAM, each bit of data is stored as a charge on a capacitor. This is a very simple way to
save data, because a DRAM cell needs a much smaller area than an SRAM cell storing the same
data, so a DRAM chip can hold far more data than an SRAM chip of the same size. However,
the capacitor charge leaks away and the circuit must be refreshed frequently, so DRAM consumes
more power in refreshing than SRAM does.
a) Types of DRAM:
SDRAM: It stands for “Synchronous Dynamic Random Access Memory”, and it can access any
element of data within roughly 10 to 25 nanoseconds. SDRAM chips are used on DIMMs (dual
in-line memory modules) with 168 contacts. All data is stored with the help of capacitors inside
the integrated circuits, and the module is inserted into its specific slot on the motherboard.
ADRAM: Asynchronous DRAM is the basic form of DRAM. It has separate connections for
power, address inputs and bidirectional data lines. Its timing is not synchronised to a system
clock; instead, the memory controller circuit generates the control signals that govern the timing
of the memory device.
b) Operation: DRAM was invented by Robert Dennard in 1966 at IBM. A DRAM cell uses two
elements as a storage cell: a transistor and a capacitor. The transistor is used to charge or
discharge the capacitor. A fully charged capacitor represents a logic high or “1”; a discharged
capacitor represents a logic low or “0”. All charging and discharging operations are controlled
by the word line and the bit line, as shown in the figure.
i) Write Operation: In this operation, a voltage is applied to the bit line and a signal is applied
to the address (word) line to close the transistor, so that the bit-line value is stored as charge on
the capacitor.
ii) Read Operation: To read the stored information, the transistor is turned on and the charge
stored in the capacitor is sensed on the bit line. Because this reading partially discharges the
capacitor, the cell must be rewritten (refreshed) after it is read so that the information is retained.
Read Only Memory (ROM): A ROM contains permanently or semi-permanently stored data,
which can be read from the memory but either cannot be changed at all or cannot be changed
without specialized equipment. A ROM stores data that are used repeatedly in system
applications, such as tables, conversions, or programmed instructions for system initialization
and operation. ROMs retain stored data when the power is OFF and are therefore nonvolatile
memories.
Figure 5.43: ROM Cells
iii. EPROM (Erasable Programmable ROM)
An EPROM can be reprogrammed if an existing program in the memory array is erased first. An EPROM uses an NMOSFET array
with an isolated-gate structure. The isolated transistor gate has no electrical connections and can
store an electrical charge for indefinite periods of time. The data bits in this type of array are
represented by the presence or absence of a stored gate charge. Erasure of a data bit is a process
that removes the gate charge.
Two basic types of erasable PROMs are the ultraviolet erasable PROM (UV EPROM) and the
electrically erasable PROM (EEPROM).
iv. EEPROM (Electrically Erasable Programmable ROM)
EEPROM is programmed and erased electrically. It can be erased and reprogrammed about ten
thousand times. Both erasing and programming take about 4 to 10 ms (millisecond). In
EEPROM, any location can be selectively erased and programmed. EEPROMs can be erased
one byte at a time, rather than erasing the entire chip. Hence, the process of reprogramming is
flexible but slow.
5.6 COUNTERS
A counter is a sequential circuit that cycles through a prescribed sequence of states upon the
application of input pulses. The state of the counter is stored in Flip-Flops. An n-bit counter
has n Flip-Flops and can cycle through at most 2^n states.
5.6.1 Asynchronous Counter
Asynchronous Counters use flip-flops which are serially connected together so that the input
clock pulse appears to ripple through the counter.
An asynchronous counter with n flip-flops can have up to 2^n counting states, e.g. MOD-16 for a 4-bit
counter (0–15), making it ideal for use in Frequency Division applications. But it is also possible
to use the basic asynchronous counter configuration to construct special counters with counting
states less than their maximum output number. For example, modulo or MOD counters.
This is achieved by forcing the counter to reset itself to zero at a pre-determined value producing
a type of asynchronous counter that has truncated sequences. Then an n-bit counter that counts
up to its maximum modulus ( 2^n ) is called a full sequence counter, and an n-bit counter whose
modulus is less than the maximum possible is called a truncated counter.
But why would we want to create an asynchronous truncated counter whose modulus is not MOD-4,
MOD-8, or some other power of two? The answer is that we can, by
using combinational logic to take advantage of the asynchronous inputs on the flip-flops.
If we take the modulo-16 asynchronous counter and modify it with additional logic gates it can
be made to give a decade (divide-by-10) counter output for use in standard decimal counting and
arithmetic circuits.
Such counters are generally referred to as Decade Counters. A decade counter requires resetting
to zero when the output count reaches the decimal value of 10, ie. when DCBA = 1010 and to do
this we need to feed this condition back to the reset input. A counter with a count sequence from
binary “0000” (BCD = “0”) through to “1001” (BCD = “9”) is generally referred to as a BCD
binary-coded-decimal counter because its ten state sequence is that of a BCD code but binary
decade counters are more common.
Asynchronous Decade Counter
This type of asynchronous counter counts upwards on each trailing edge of the input clock signal
starting from 0000 until it reaches an output 1001 (decimal 9). Both outputs QA and QD are now
equal to logic “1”. On the application of the next clock pulse, the output from the 74LS10
NAND gate changes state from logic “1” to a logic “0” level.
As the output of the NAND gate is connected to the CLEAR ( CLR ) inputs of all the 74LS73 J-
K Flip-flops, this signal causes all of the Q outputs to be reset back to binary 0000 on the count
of 10. As outputs QA and QD are now both equal to logic “0” as the flip-flop’s have just been
reset, the output of the NAND gate returns back to a logic level “1” and the counter restarts again
from 0000. We now have a decade or Modulo-10 up-counter.
Decade Counter Truth Table
Clock Count   QD QC QB QA   Decimal Value
1             0  0  0  0    0
2             0  0  0  1    1
3             0  0  1  0    2
4             0  0  1  1    3
5             0  1  0  0    4
6             0  1  0  1    5
7             0  1  1  0    6
8             0  1  1  1    7
9             1  0  0  0    8
10            1  0  0  1    9
11            Counter resets its outputs back to zero
Figure 5.45: Truth Table
By using the same idea of truncating counter output sequences, the above circuit could easily be
adapted to other counting cycles by simply changing the connections to the inputs of
the NAND gate or by using other logic gate combinations.
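The truncated (decade) counting action can also be modelled in software. The sketch below is a
behavioural illustration of the reset-at-1010 idea only, not a model of the 74LS73/74LS10 circuit
itself: the counter counts 0 to 9, and detection of the value ten forces an immediate clear to 0000.

def decade_counter(clock_pulses: int):
    """MOD-10 counter: counts 0000..1001 and resets when the count reaches 1010."""
    count = 0
    for _ in range(clock_pulses):
        count += 1
        if count == 0b1010:        # feedback logic detects the count of ten and clears all stages
            count = 0
        print(format(count, "04b"), count)

decade_counter(12)                 # counts 0001..1001, wraps to 0000, then continues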
5.6.2 Synchronous Counter
Synchronous Counters are so called because the clock inputs of all the individual flip-flops within
the counter are clocked together at the same time by the same clock signal.
In the previous Asynchronous binary counter tutorial, we saw that the output of one counter
stage is connected directly to the clock input of the next counter stage and so on along the chain.
The result of this is that the Asynchronous counter suffers from what is known as “Propagation
Delay” in which the timing signal is delayed a fraction through each flip-flop.
However, with the Synchronous Counter, the external clock signal is connected to the clock
input of EVERY individual flip-flop within the counter so that all of the flip-flops are clocked
together simultaneously (in parallel) at the same time giving a fixed time relationship. In other
words, changes in the output occur in “synchronisation” with the clock signal.
The result of this synchronisation is that all the individual output bits change state at exactly
the same time in response to the common clock signal, with no ripple effect and therefore no
propagation delay.
It can be seen above, that the external clock pulses (pulses to be counted) are fed directly to each
of the JK Flip-flops in the counter chain and that both the J and K inputs are all tied together in
toggle mode, but only in the first flip-flop, flip-flop FFA (LSB) are they connected HIGH, logic
“1” allowing the flip-flop to toggle on every clock pulse. Then the synchronous counter follows a
predetermined sequence of states in response to the common clock signal, advancing one state
for each pulse.
The J and K inputs of flip-flop FFB are connected directly to the output QA of flip-flop FFA, but
the J and K inputs of flip-flops FFC and FFD are driven from separate AND gates which are also
supplied with signals from the input and output of the previous stage. These
additional AND gates generate the required logic for the JK inputs of the next stage.
If we enable each JK flip-flop to toggle based on whether or not all preceding flip-flop outputs
(Q) are “HIGH” we can obtain the same counting sequence as with the asynchronous circuit but
without the ripple effect, since each flip-flop in this circuit will be clocked at exactly the same
time.
Then as there is no inherent propagation delay in synchronous counters, because all the counter
stages are triggered in parallel at the same time, the maximum operating frequency of this type of
frequency counter is much higher than that for a similar asynchronous counter circuit.
Because this 4-bit synchronous counter counts sequentially on every clock pulse the resulting
outputs count upwards from 0 ( 0000 ) to 15 ( 1111 ). Therefore, this type of counter is also
known as a 4-bit Synchronous Up Counter.
However, we can easily construct a 4-bit Synchronous Down Counter by connecting
the AND gates to the Q' (complement) outputs of the flip-flops as shown to produce a waveform timing diagram
the reverse of the above. Here the counter starts with all of its outputs HIGH ( 1111 ) and it
counts down on the application of each clock pulse to zero, ( 0000 ) before repeating again.
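The synchronous counting rule, in which a bit toggles only when all lower-order bits are HIGH
(for an up count) or all LOW (for a down count), can be sketched as a simple simulation. The
Python below is illustrative only; it evaluates every toggle decision from the old state and then
updates all bits together, mirroring the fact that every flip-flop is clocked at the same instant.

def synchronous_count(n_bits: int = 4, up: bool = True, pulses: int = 16):
    """All bits are updated together on each clock pulse (no ripple)."""
    bits = [0] * n_bits if up else [1] * n_bits     # bits[0] is the LSB (FFA)
    for _ in range(pulses):
        toggles = []
        for i in range(n_bits):
            lower = bits[:i]
            enable = all(b == 1 for b in lower) if up else all(b == 0 for b in lower)
            toggles.append(enable)                  # decide using the old state only
        bits = [b ^ int(t) for b, t in zip(bits, toggles)]
        print("".join(str(b) for b in reversed(bits)))

synchronous_count(up=True, pulses=4)    # 0001, 0010, 0011, 0100 ...
synchronous_count(up=False, pulses=4)   # 1110, 1101, 1100, 1011 ...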
As synchronous counters are formed by connecting flip-flops together, and any number of flip-
flops can be connected or “cascaded” together to form a “divide-by-n” binary counter, the
modulus or “MOD” number still applies as it does for asynchronous counters. So counters that
count from 0 to 2^n − 1 can be built, along with truncated sequences such as a Decade or BCD counter.
All we need to increase the MOD count of an up or down synchronous counter is an additional
flip-flop and AND gate across it.
The additional AND gates detect when the counting sequence reaches “1001” (decimal 9) and
cause flip-flop FF3 to toggle on the next clock pulse. Flip-flop FF0 toggles on every clock
pulse. Thus, the count is reset and starts over again at “0000”, producing a synchronous decade
counter.
We could quite easily re-arrange the additional AND gates in the above counter circuit to
produce other count numbers such as a Mod-12 counter which counts 12 states from “0000” to
“1011” (0 to 11) and then repeats, making them suitable for clocks, etc.
Triggering A Synchronous Counter
Synchronous Counters use edge-triggered flip-flops that change states on either the “positive-
edge” (rising edge) or the “negative-edge” (falling edge) of the clock pulse on the control input
resulting in one single count when the clock input changes state.
Generally, synchronous counters count on the rising-edge which is the low to high transition of
the clock signal and asynchronous ripple counters count on the falling-edge which is the high to
low transition of the clock signal.
It may seem unusual that ripple counters use the falling-edge of the clock cycle to change state,
but this makes it easier to link counters together because the most significant bit (MSB) of one
counter can drive the clock input of the next.
This works because the next bit must change state when the previous bit changes from high to
low – the point at which a carry must occur to the next bit. Synchronous counters usually have a
carry-out and a carry-in pin for linking counters together without introducing any propagation
delays.
Learning Activity
Recently, some authors reserve the term flip-flop exclusively for discussing clocked circuits;
the simple ones are commonly called transparent latches. Using this terminology, a level-
sensitive flip-flop is called a transparent latch, whereas an edge-triggered flip-flop is simply
called a flip-flop. Using either terminology, the term "flip-flop" refers to a device that stores a
single bit of data, but the term "latch" may also refer to a device that stores any number of bits
of data using a single trigger. The terms "edge-triggered", and "level-triggered" may be used
to avoid ambiguity.
When a level-triggered latch is enabled it becomes transparent, but an edge-triggered flip-
flop's output only changes on a single type (positive going or negative going) of clock edge.
• A Shift Register is a series of flip-flops, sharing the same clock, in which the output of
each flip-flop is connected to the data input of the next flip-flop in the chain.
• A memory device is a device which can remember if a signal of logic level 0 or 1 has
been connected to one of its inputs, and can make this fact available at an output.
• Counter is a sequential circuit (Finite state machine) that cycles through a fixed
sequence of states.
• Asynchronous Counters can easily be made from Toggle or D-type flip-flops.
• Synchronous counters are easier to design than asynchronous counters.
• Synchronous counters are sometimes called parallel counters as the clock is fed in
parallel to all flip-flops.
KEYWORDS
Sequential Logic: A type of logic circuit whose output depends not only on the present
value of its input signals but on the past history of its inputs.
Synchronous Circuit: A circuit that has an electronic oscillator, which is called a clock. It
generates a sequence of repetitive pulses called the clock signal that is distributed to all the
memory elements in the circuit.
Asynchronous Sequential Logic: Logic that is not synchronised by a clock signal; the outputs of
the circuit change directly in response to changes in its inputs.
Latch: An electronic logic circuit that has two inputs and one output.
Flip-flop: A circuit that has two stable states and can be used to store state information.
Setup Time: The minimum amount of time the data signal should be held steady before
the clock event.
Hold Time: The minimum amount of time the data signal should be held steady after the
clock event.
Recovery Time: The time between active clock edge and asynchronous signal going
inactive.
Counter: A sequential circuit that cycles through a fixed sequence of states.
Register: Stores and moves or shifts data.
SELF-ASSESSMENT QUESTIONS
Short Answer Questions
1. What is a sequential circuit?
2. What is the difference between a latch and a flip-flop?
3. Does a sequential circuit contain a memory element?
4. What is the difference between an asynchronous counter and a synchronous counter?
5. How many types of shift register counters are there? Name them.
6. Write the characteristic equation of a JK flip-flop.
Long Answer Questions
1. Explain the different types of shift register.
2. Explain the operation of D-FF.
CONTENTS
Learning Objectives
Learning Outcome
Overview
6.1 Von Neumann Architecture
6.2 Processor
6.2.1 Definition
6.2.2 Structure
6.2.3 Category
6.2.4 Technology
6.3 ALU Concept
6.3.1 1-Bit ALU
6.4 Stored Program
6.5 Fetch Execution Cycle
6.6 Instruction Formats
6.6.1 The Opcode Size
6.6.2 Addressing Modes
6.7 Clock Rate and Instruction Rate
6.7.1 Clock Rate
6.7.2 Instruction Rate
6.7.3 Pipeline
6.8 Current Processors
Summary
Keywords
Self-Assessment Questions
Further Reading
LEARNING OBJECTIVES
After studying this chapter, you will be able to:
• Understand the structure of Von Neumann Architecture
• Gain knowledge about concept of processor
• Analyze Arithmetic Logic Unit (ALU)
• Understand the fetch execute cycle
• Know the concept of clock rate and pipelining
LEARNING OUTCOMES
Upon completion of the chapter, students will be able to understand:
• The structure of the Von Neumann Architecture
• The definition, structure and categories of processors
• How to design an ALU and explain floating-point representation
• The concept of the fetch-execute cycle and instruction formats
OVERVIEW
In the previous chapter, we examined the idea of different sequential circuits, such as latches,
flip-flops, registers, memories and counters. We discussed each of these circuits with their various
types.
In this chapter, you will learn the basics of the processor. First, you will study the Von Neumann
architecture, an early and influential computing structure in which a single memory holds both
the data and the instructions that operate on that data, so that what the machine does depends on
the program stored in it. This chapter will also cover the definition and structure of the processor
along with the ALU concept, stored programs, the fetch-execute cycle, and so on.
Fundamentally, the processor is the part of a computer system that executes the instructions of a
computer program. In this way the processor allows you to perform the essential logical,
arithmetic and input/output tasks of the system.
6.1 VON NEUMANN ARCHITECTURE
In 1945, John von Neumann, a mathematician, showed in his studies that a computer could have
a fixed, simple structure and still be able to execute any kind of computation without hardware
modification, provided that the computer is properly programmed with appropriate instructions
which it is able to execute.
Von Neumann’s primary advancement was referred to as “conditional control transfer”, which
allowed a program sequence to be interrupted and then reinitiated at any point; furthermore, this
advancement allowed data to be stored with instructions in the same memory unit.
This was beneficial because, if desired, instructions could be arithmetically modified in the same
way as the data.
The von Neumann architecture describes a design model for a stored program digital computer
that incorporates a single processing unit and a single separate storage structure, which holds
both instructions and data.
The von Neumann architecture refers to one that keeps the data as well as the programmed
instructions in read-write RAM (Random Access Memory).
Characteristics of von Neumann Architecture
As mentioned above, the von Neumann Architecture is based on the fact that the program data
and the instruction data are stored in the same memory unit. This can also be referred to as the
“stored program concept”.
This design is still used in most computers produced nowadays:
carried out.
Memory Unit:
Consists of RAM, which is partitioned into locations, each with an address and contents held in
binary form. RAM (Random Access Memory) is a fast type of memory, unlike hard drives, and it
is directly accessible by the CPU. Having RAM available to the CPU allows it to function a
lot quicker and hence more efficiently.
Registers:
A small block in the CPU that consists of high-speed storage cells which hold data before it is
processed; all logical, arithmetic, and shift operations occur here.
The register set consists of five components, illustrated with a short sketch after this list:
• Program Counter (PC): Holds the address of the next instruction to be executed
• Accumulator (AC): Storage area where logic and arithmetic results are stored
• Memory Address Register (MAR): Holds the address of a location of the data that is to
be read from or written to
• Memory Data Register (MDR): Temporary storage location that stores data that has
been read, or data that still needs to be written
• Current Instruction Register (CIR): Holds the instruction currently being carried
out. The instruction is divided into an operand and an opcode.
Operand: Contains data or the address of the data (on which the operation will be performed)
Opcode: Specifies the type of instruction to be executed
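To make the role of these registers concrete, the following Python sketch steps through a minimal
fetch–decode–execute loop using PC, AC, MAR, MDR and CIR. The three-instruction machine
(LOAD, ADD, HALT), its opcodes and its memory contents are invented purely for illustration;
they do not correspond to any real instruction set.

# Hypothetical 3-instruction machine: opcodes and layout are made up for this example.
LOAD, ADD, HALT = 1, 2, 0

memory = {0: (LOAD, 10), 1: (ADD, 11), 2: (HALT, 0), 10: 7, 11: 5}
pc, ac = 0, 0

while True:
    mar = pc                      # MAR <- address of the next instruction
    mdr = memory[mar]             # MDR <- contents fetched from memory
    cir = mdr                     # CIR holds the current instruction
    pc += 1                       # PC now points at the following instruction
    opcode, operand = cir         # decode into opcode and operand
    if opcode == LOAD:
        ac = memory[operand]      # AC <- data word
    elif opcode == ADD:
        ac += memory[operand]     # AC <- AC + data word
    elif opcode == HALT:
        break

print("Accumulator =", ac)        # 7 + 5 = 12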
Buses:
These are sets of parallel wires which connect two or more components inside the CPU.
Within the CPU there are three types of buses, collectively referred to as the system bus. The
types of buses are: Data bus, Control bus, and Address bus.
Data bus: This bus is referred to as a bi-directional bus, which means “bits” can be carried in
both ways. This bus is used to transport data and instructions between the processor, memory
unit, and the input/output.
Address bus: Transmits the memory address specifying where the relevant data needs to be sent
or retrieved from. (The address bus does not carry the data, only the address)
Control bus: This is also a bi-directional bus used to transmit “control signals”/commands from
the CPU (and status signals from other components) in order to control and coordinate all the
activities within the computer.
Input/Outputs:
Information passed from the user/information received by the user.
Advantages
• The control unit retrieves instruction and data in the same way from one memory
unit. This simplifies the development and design of the control unit.
• The above advantage also means that data from memory and from devices are
accessed in the same way, thereby increasing efficiency.
• Another advantageous characteristic is that programmers have control of memory
organization.
6.2 PROCESSOR
In this section, we will discuss the definition, structure, categories, and technologies related
to processor.
6.2.1 Definition
In computing, a processor or processing unit is a digital circuit which performs operations on
some external data source, usually memory or some other data stream. It typically takes the form
of a microprocessor, which can be implemented on a single metal–oxide–
semiconductor integrated circuit chip. The processor is a chip or a logical circuit that responds
to and processes the basic instructions that drive a particular computer. The main functions of the
processor are fetching, decoding and executing instructions, and writing back the results. The
processor is also called the brain of any system which incorporates it, such as computers, laptops,
smartphones, embedded systems, etc. The ALU (Arithmetic Logic Unit) and CU (Control Unit)
are the two parts of the processor. The Arithmetic Logic Unit performs all mathematical
operations such as additions, multiplications, subtractions, divisions, etc., and the control unit
works like a traffic controller: it manages the ordering of commands and the operation of the
instructions. The processor also communicates with other components, namely input/output
devices and memory/storage devices.
6.2.2 Structure
The processor alone is incapable of successfully performing any tasks. It requires memory (for
program and data storage), support logic, and at least one I/O device (“input/output device”) used
to transfer data between the computer and the outside world. The basic computer system is
shown in Figure 6.2.
6.2.3 Category
Computer processors can be grouped into three main categories. There could be other applicable methods to classify
computer processors.
However, this grouping will help to simplify the choice of a processor, both
for desktop and mobile computers. Computer processors are grouped into the following three
divisions. This grouping doesn’t include processors for Server and Workstation PCs.
1. High-End Processors
Processors in this group are designed for intensive applications, since the programs require high
processing power. What makes them high-end is the advanced microprocessor technology
incorporated in these types of processors.
Other than the normal usage of computer applications, if you are into Statistical analysis,
intensive graphics, creating and editing professional videos, extreme 3D gaming, multitasking
and multi-threading applications, then you should choose a computer installed with a processor
categorized in the high-end group.
Both Intel and AMD have introduced processors categorized as high-end. The Intel Core i-series
processors (i3, i5, i7, i9) and AMD Phenom family processors are among high-end CPUs.
The prices of high-end processors are a bit higher compared with the rest of the CPUs.
2. Mid-End computer processor types
Mid-range processors are meant for moderately intensive tasks. Beyond the standard work you do with
mid-range processors, you can do tasks such as basic 3D Gaming, casual photo editing, home
video creating, and multimedia applications.
Some of the common mid-range Intel processors include Intel Core 2 Quad, Intel Core 2
Extreme, Intel Core 2 Duo, Intel Pentium Dual Core and Intel Core Duo/Solo. AMD Phenom 1
X3, AMD Turion family and AMD Athlon family processors are categorized into mid-range
CPUs.
3. Basic or economy computer processor types
As the name implies, processors in this group are low-performing CPUs with a cheap price. If you
are into non-intensive tasks such as simple gaming, office applications, internet browsing, email
and common graphics, then a processor grouped among the budget CPUs is a suitable choice.
AMD Sempron, AMD Athlon Neo and Intel Atom, Intel Centrino, Centrino Duo and Celeron are
grouped into economy processors.
6.2.4 Technology
In this section, we will discuss technologies related to the processor.
RISC Architecture
Reduced Instruction Set Computer (RISC) is a type of computer architecture which operates on
a small, highly optimised set of instructions, instead of the more specialised sets of instructions,
which can be found in other types of architectures. This architecture means that the computer
microprocessor will have fewer cycles per instruction.
The phrase “Reduced Instruction Set” may be incorrectly interpreted to mean a “reduced number
of instructions”. This is not the case; the term actually means that the amount of work
done by each instruction is reduced in terms of the number of cycles it takes.
Alan Turing’s 1946 design for the Automatic Computing Engine had many
characteristics that resembled RISC architecture; furthermore, many traits of RISC architectures
were seen in designs of the 1960s that embodied the load/store approach.
That being said, the term RISC was first used by David Patterson of the Berkeley RISC
project, who is considered to be a pioneer in RISC processor design. Patterson is currently
the Vice Chair of Board of Directors of the RISC-V Foundation.
A RISC chip doesn’t require many transistors, which makes it less costly to design and to
produce. One of RISC’s main characteristics is that the instruction set contains relatively simple
and basic instructions from which more complex operations can be composed.
RISC processors/architectures are used across a wide range of platforms nowadays, ranging from
tablet computers to smartphones, as well as supercomputers.
CISC Architecture
CISC, which stands for “Complex Instruction Set Computer”, is a computer architecture in which
single instructions can execute several low-level operations (for instance, a load from memory, an
arithmetic operation, and a memory store). CISC processors are also capable of executing multi-
step operations or addressing modes with single instructions.
CISC, like RISC, is a type of microprocessor architecture, in this case one built around
specialised, more complex instructions.
Until recent times, most major manufacturers of microprocessors used CISC-based designs to
develop their products. One reason is that CISC was introduced around the early
1970s, when it was used for simple electronic platforms, such as stereos, calculators and video
games, rather than personal computers; the CISC approach was well suited to these
types of applications.
However, CISC microprocessors eventually found their way into personal computers to meet the
increasing needs of PC users, and CISC manufacturers shifted their focus from
general-purpose designs towards a high-performance computing orientation.
6.3 ALU CONCEPT
Inside a computer there is an Arithmetic Logic Unit (ALU), which is capable of performing
logical operations (e.g. AND, OR, Ex-OR, Invert) in addition to arithmetic operations (e.g.
addition, subtraction). The control unit supplies the data required by the ALU from memory, or
from input devices, and directs the ALU to perform a specific operation based on the instruction
fetched from memory. The ALU is the “calculator” portion of the computer.
An arithmetic logic unit (ALU) is a major component of the central processing unit of a
computer system. It performs all the arithmetic and logic operations that need to be carried out
on instruction words. In some microprocessor architectures, the ALU is divided into an
arithmetic unit (AU) and a logic unit (LU).
An ALU can be designed by engineers to calculate many different operations. As the operations
become more complex, the ALU also becomes more expensive, takes up more space in the CPU
and dissipates more heat. That is why engineers make the ALU powerful enough to ensure that
the CPU is also powerful and fast, but not so complex as to become prohibitive in terms of cost
and other disadvantages.
ALU is also known as an Integer Unit (IU). The arithmetic logic unit is that part of the CPU that
handles all the calculations the CPU may need. Most of these operations are logical in nature.
Depending on how the ALU is designed, it can make the CPU more powerful, but it also
consumes more energy and creates more heat. Therefore, there must be a balance between how
powerful and complex the ALU is and how expensive the whole unit becomes. This is why faster
CPUs are more expensive, consume more power and dissipate more heat.
The different operations carried out by an ALU can be categorized as follows (a small sketch of
each category appears after this list):
Logical operations − These include operations like AND, OR, NOT, XOR, NOR, NAND, etc.
Bit-shifting operations − These shift the positions of the bits by a certain number of places to the
right or left, which corresponds to multiplication or division by powers of two.
Arithmetic operations − These refer to bit addition and subtraction. Although multiplication and
division are sometimes implemented directly, such units are more expensive to build;
multiplication and division can also be done by repeated additions and subtractions respectively.
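As an illustration of the three categories, the Python sketch below applies a few representative
operations to small unsigned integers. It is a hedged example only: the 8-bit mask and the operand
values are arbitrary choices for this illustration, not anything prescribed by the text.

# Illustrative only: a few ALU-style operations on 8-bit unsigned values.
MASK = 0xFF  # assume an 8-bit ALU word for this example

a, b = 0b0110_1100, 0b0101_0101

# Logical operations
print(bin(a & b), bin(a | b), bin(a ^ b), bin(~a & MASK))

# Bit-shifting operations (shift left = multiply by 2, shift right = divide by 2)
print(bin((a << 1) & MASK), bin(a >> 1))

# Arithmetic operations: addition, and multiplication by repeated addition
total = (a + b) & MASK
product = 0
for _ in range(3):          # compute a * 3 by repeated addition
    product = (product + a) & MASK
print(total, product, (a * 3) & MASK)  # the last two values agree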
6.3.1 1-Bit ALU
The logical operations are easiest, because they map directly onto the hardware components. The
1-bit logical unit for AND and OR looks like Figure 6.5. The multiplexor on the right then selects
a AND b or a OR b, depending on whether the value of Operation is 0 or 1. The line that controls
the multiplexor is shown in color to distinguish it from the lines containing data. Notice that we
have renamed the control and output lines of the multiplexor to give them names that reflect the
function of the ALU. The next function to include is addition. An adder must have two inputs for
the operands and a single-bit output for the sum. There must be a second output to pass on the
carry, called CarryOut. Since the CarryOut from the neighbor adder must be included as an
input, we need a third input. This input is called CarryIn. Figure 6.6 shows the inputs and the
outputs of a 1-bit adder. Since we know what addition is supposed to do, we can specify the
outputs of this “black box” based on its inputs, as Figure 6.8 demonstrates. We can express the
output functions CarryOut and Sum as logical equations, and these equations can in turn be
implemented with logic gates. Let’s do CarryOut. Figure 6.9 shows the values of the inputs when
CarryOut is a 1.
We can turn this truth table into a logical equation:
CarryOut = (b . CarryIn) + (a . CarryIn) + (a . b) + (a . b . CarryIn)
If a . b . CarryIn is true, then all of the other three terms must also be true, so we can leave out
this last term corresponding to the fourth line of the table. We can thus simplify the equation to
CarryOut = (b . CarryIn) + (a . CarryIn) + (a . b)
Figure 6.9 shows that the hardware within the adder black box for CarryOut consists of three
AND gates and one OR gate. The three AND gates correspond exactly to the three parenthesized
terms of the formula above for CarryOut, and the OR gate sums the three terms.
The Sum bit is set when exactly one input is 1 or when all three inputs are 1. The Sum results in
a more complex Boolean equation (writing a' for NOT a):
Sum = (a' . b' . CarryIn) + (a' . b . CarryIn') + (a . b' . CarryIn') + (a . b . CarryIn)
The drawing of the logic for the Sum bit in the adder black box is left as an exercise for the
reader.
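The equations above can be checked mechanically. The short Python sketch below (an
illustration only; the function names are ours, not the text's) enumerates all eight input
combinations and confirms that the simplified CarryOut expression and the Sum expression
reproduce ordinary binary addition.

from itertools import product

def carry_out(a, b, cin):
    # Simplified equation: (b . CarryIn) + (a . CarryIn) + (a . b)
    return (b & cin) | (a & cin) | (a & b)

def sum_bit(a, b, cin):
    # Sum is 1 when an odd number of the inputs are 1
    return ((1 - a) & (1 - b) & cin) | ((1 - a) & b & (1 - cin)) | \
           (a & (1 - b) & (1 - cin)) | (a & b & cin)

for a, b, cin in product((0, 1), repeat=3):
    total = a + b + cin                      # reference result, 0..3
    assert carry_out(a, b, cin) == total // 2
    assert sum_bit(a, b, cin) == total % 2
print("1-bit adder equations verified for all 8 input combinations")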
Figure 6.10 shows a 1-bit ALU derived by combining the adder with the earlier components.
Sometimes designers also want the ALU to perform a few more simple operations, such as
generating 0. The easiest way to add an operation is to expand the multiplexor controlled by the
Operation line and, for this example, to connect 0 directly to the new input of that expanded
multiplexor.
Figure 6.10: A 1-bit ALU that performs AND, OR, and addition
6.4 STORED PROGRAMS
Let us discuss some vital aspects of stored-program computer architecture. The first significant
aspect is given below:
• It allows programs to be loaded into the processor from main memory, and stored back, with ease.
• Once a program has been written and debugged, the numbers representing its instructions can
be written out onto a storage device, allowing the program to be loaded back into (main)
memory in the future.
The second aspect of the stored-program abstraction is given below:
• Programs that treat themselves as data can also function as self-modifying programs.
• This aspect is perhaps even more important than the first.
• It allows programs to treat themselves or other programs as data.
• It facilitates the design of compilers, debuggers and other programming tools.
6.5 FETCH EXECUTE CYCLE
The basic operation of a computer is called the ‘fetch-execute’ cycle. The CPU is designed to
understand a set of instructions - the instruction set. It fetches the instructions from the
main memory and executes them. This is done repeatedly from when the computer is booted up
to when it is shut down.
The CPU fetches the instructions one at a time from the main memory into the registers. One
register is the program counter (pc). The pc holds the memory address of the next instruction to
be fetched from main memory.
• The CPU decodes the instruction.
• The CPU executes the instruction.
• Repeat until there are no more instructions.
A single piece of program code might require several instructions. Look at this Python (3.x)
code:
area = length * width
First, the computer needs to load in the value of the variable length into the immediate access
store (registers). Next it needs to load in the value of the variable width. Then it needs to
multiply the two numbers together, and finally it needs to store the result in the variable area.
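To make the cycle concrete, here is a hedged Python sketch of a toy machine that fetches,
decodes and executes a tiny program equivalent to area = length * width. The instruction names
(LOAD, MUL, STORE, HALT) and the register/memory layout are invented for this illustration;
they are not a real instruction set.

# Toy fetch-decode-execute loop (illustrative only).
memory = {"length": 4, "width": 3, "area": 0}

program = [
    ("LOAD", "R1", "length"),    # R1 <- memory["length"]
    ("LOAD", "R2", "width"),     # R2 <- memory["width"]
    ("MUL",  "R3", "R1", "R2"),  # R3 <- R1 * R2
    ("STORE", "R3", "area"),     # memory["area"] <- R3
    ("HALT",),
]

registers, pc = {}, 0
while True:
    instr = program[pc]          # fetch: read the instruction the PC points to
    pc += 1                      # advance the program counter
    op = instr[0]                # decode: examine the opcode field
    if op == "LOAD":
        registers[instr[1]] = memory[instr[2]]
    elif op == "MUL":
        registers[instr[1]] = registers[instr[2]] * registers[instr[3]]
    elif op == "STORE":
        memory[instr[2]] = registers[instr[1]]
    elif op == "HALT":
        break

print(memory["area"])   # prints 12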
6.6 INSTRUCTION FORMATS
One of the first lessons a beginner learns about computers is that in order to use a computer, a
program must be written, translated into machine language, and loaded into memory; that such a
program consists of machine instructions, and that a machine instruction is a number. In this
chapter we look at the details of machine instructions, and in particular the fact that a machine
instruction is rarely a single number. It normally consists of several numbers that are assembled
(placed together) by the assembler, to become the fields of the complete instruction. As a result,
any machine instruction has a certain format. The instruction is divided into fields, each a
number. A machine instruction must contain at least one field, namely the operation code
(opcode), that tells the control unit what the instruction is supposed to do. Most instructions
contain other fields specifying registers, memory addresses, immediate quantities, addressing
modes, and other parameters.
Figure 6.12 illustrates typical instruction formats. They range from the simplest to the very
complex and they demonstrate the following properties of machine instructions:
• Instructions can have different sizes. The size depends on the number and nature of the
individual fields.
• The opcode can have different sizes. The opcode can also be broken up into a number of
separate fields.
• A field containing an address is much larger than a register field. This is because a
register number is small, typically 3–4 bits, while an address is large (typically 25–35
bits).
• Fields containing immediate operands (numbers used as data) can have different sizes.
Experience shows that most data items used by programs are small. Thus, a well-
designed instruction set should allow for both small and large data items, resulting in
short instructions whenever possible.
The last point, about short instructions, is important and leads us to the next topic of discussion,
namely the properties of a good instruction set. When a new computer is designed, one of the
first features that has to be determined and fully specified is the instruction set. First, the
instruction set has to be designed as a whole, then the designers decide what instructions should
be included in the set. The general form of the instruction set depends on features such as the
intended applications of the computer, the word size, address size, register set, memory
organization, and bus organization. Instruction sets vary greatly, but a good design for an
instruction set is based on a small number of principles that are independent of the particular
computer in question. These principles demand that an instruction set should satisfy the
following requirements:
1. Short instructions.
2. An instruction size that is both compatible with the word size, and variable.
1. Short instructions: Why should machine instructions be short? Not because short instructions
occupy less memory. Memory isn’t expensive and modern computers support large memories.
Also, not because short instructions execute faster. The execution time of an instruction has
nothing to do with its size. A “register divide” instruction, to divide the contents of two floating-
point registers, is short, since it only has to specify the registers involved. It takes, however, a
long time to execute.
Exercise 6.1: Find an example of a long instruction with fast execution.
The reason why short instructions are preferable is that it takes less time to fetch them from
memory. An instruction
that is longer than one memory word has to be stored in at least two words. It therefore takes two
memory cycles to fetch it from memory. Cutting the size of an instruction such that it fits in one
word, cuts down the fetch time. Even though the fetch time is short (it is measured in
nanoseconds), many instructions are located inside loops and have to be executed (and therefore
fetched) many times. Also, since memory is slower than the processor, speeding up the
instruction fetch can speed up the entire computer.
2. Instruction size: The instruction size should also be compatible with the computer’s word
size. The best design results in instruction sizes of N, 2N and 3N where N is the word size.
Instruction sizes such as 1.2N or 0.7N do not make any sense, since each memory read brings in
exactly one word. In a computer with long words, several instructions can be packed in one word
and, as a result, instruction sizes of N/2, N/3, and N/4 also make sense. The two requirements
above are satisfied by the use of variable-size opcodes and addressing modes. These two
important concepts are discussed next.
6.6.1 The Opcode Size
If the instruction is to be short, individual fields in the instruction should be short. In this section
we look at the opcode field. Obviously, the opcode cannot be too short or there would not be
enough codes for all the instructions. An opcode size of 6–8 bits, allowing for 64–256
instructions, is common. Most modern computers, however, use variable size opcodes, for two
good reasons. One reason has to do with the instruction size in relation to the word size, and the
other has to do with future extensions of the instruction set.
The first reason is easy to understand. We want our instructions to be short, but some instructions
have to contain more information than others and are naturally longer. If the opcode size varies,
longer instructions can be assigned short opcodes. Other instructions, with short operands, can be
assigned longer opcodes. This way the instruction size can be fine-tuned to fit in precisely N or
2N bits.
The second advantage of variable-size opcodes has to do with extensions to an original
instruction set. When a computer becomes successful and sells well, the manufacturer may
decide to come up with a new, extended, and upward compatible version of the original
computer. The 68000, 80x86, and Pentium families are familiar examples.
Upward compatibility means that any program running on the original computer should also run
on the new one. The instruction set of the new computer must therefore be an extension of the
original instruction set. With variable-size opcodes, such an extension is easy.
When an instruction is fetched, the hardware does not know what its opcode is. It has to decode
the instruction first. In a computer with variable-size opcodes, when an instruction is fetched, the
control unit does not even know how long the opcode is. It has to start by identifying the opcode
size, following which it can decode the instruction. Thus, with variable-size opcodes, the control
unit has to work harder. Three methods to implement variable-size opcodes are described here.
Prefix codes. We start by simply choosing several opcodes with the sizes that we need. For
example, if we need a 3-bit opcode in order to adjust the size of instruction A to one word, a 6-
bit opcode to adjust the size of instruction B to one word, and other opcodes with sizes 2, 4, and
7 bits, we can select the set of opcodes 000, 101010, 11, 0001, and 1110001. A little thinking,
however, shows that our opcodes cannot be decoded uniquely. Imagine an instruction with
opcode 0001 being fetched in Step 1 of the fetch-execute cycle and decoded in Step 3. The
control unit examines the first few bits of the instruction and finds them to be 0001... . It does not
know whether the opcode is the four bits 0001 or the three bits 000 (with the 1 being the start of
the next field). The problem exists because we selected the bit pattern 000 as both a code and the
start of another code. The solution is to avoid such a situation. We have to design our opcodes
following the simple rule:
Once a bit pattern has been assigned as an opcode, no other opcode should start with that pattern
The five opcodes above can now be redesigned as, for example, 000, 101010, 11, 100, and
0100100. We say that these opcodes satisfy the prefix property, or that they are prefix codes.
It is easy to see, by checking each of
the five codes, that they can be uniquely decoded. When the control unit finds, for example, the
pattern 000 it knows that this must be the first code, since no other codes start with 000. When it
finds 10... , it knows that this must be either code 2 or code 4. It checks the next bit to find out
which code it is.
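The prefix rule is easy to check programmatically. The hedged Python sketch below tests
whether any code in a set is a prefix of another; the opcode sets are the ones used in the text,
while the function itself is our own illustration.

def is_prefix_free(codes):
    """Return True if no code in the set is a prefix of another code."""
    for c1 in codes:
        for c2 in codes:
            if c1 != c2 and c2.startswith(c1):
                return False
    return True

bad  = ["000", "101010", "11", "0001", "1110001"]   # 000 is a prefix of 0001
good = ["000", "101010", "11", "100", "0100100"]

print(is_prefix_free(bad))    # False: 0001 cannot be decoded uniquely
print(is_prefix_free(good))   # True: every opcode can be decoded uniquely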
Fixed prefix. The first 2–4 bits of the instruction tell what the opcode size is (and even what the
instruction format is). Figure 6.13 shows four instruction formats, denoted by 0–3 and identified
by the first two opcode bits. When the control unit fetches an instruction that starts with 00, it
knows that the next three bits are the opcode, so there can be eight instructions with a 5-bit
opcode (including the identification bits). Similarly, when the control unit fetches an instruction
that starts with 01, it knows that the opcode consists of the next two bits plus three more bits
elsewhere in the instruction, so there can be 32 instructions with a 7-bit opcode (including the
two identification bits).
Exercise 6.2: How many instructions can there be with the other two opcode sizes?
With fixed-size opcodes, the opcode size for a set of 120 instructions would be seven bits. The
advantage of this method is that several opcode sizes are available and the opcode size can easily
be determined. The disadvantages are: (1) the instruction set cannot be extended; there is no
room for new opcodes and (2) because of the prefix, the average opcode is longer than in the
case of fixed-size opcodes.
6.6.2 Addressing Modes
This is an important feature of computers. We start with the known fact that many instructions
have to include addresses; the instructions should be short, but addresses tend to be long.
Addressing modes are a solution to this problem. Using addressing modes, a short instruction
may specify a long address.
The idea is that the instruction no longer includes the full address (also called the effective
address, or EA), but instead contains a number, normally called a displacement, that’s closely
related to the address. Another field, an addressing mode, is also added to the instruction. The
addressing mode is a code that tells the control unit how to obtain the effective address from the
displacement. It is more accurate to say that the addressing mode is a rule of calculation, or a
function, that uses the displacement as its main argument, and other hardware components—such
as the PC, the registers, and memory locations—as secondary arguments, and produces the EA
as a result.
The notation EA = f_M(Disp, PC, Regs, Mem), where the subscript M indicates the mode, is
convenient. For different modes there are different functions, but the idea is the same.
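As a concrete illustration, the Python sketch below defines a few common modes (direct,
register indirect, indexed and PC-relative) as small functions of the displacement and the machine
state. The mode names and the register/PC values are generic textbook conventions used only for
this example, not something fixed by this text.

# Hedged sketch: effective-address calculation EA = f_M(Disp, PC, Regs, Mem).
regs = {1: 0x2000, 2: 0x0010}
pc   = 0x0050

def ea_direct(disp):             # EA is the displacement itself (a full or partial address)
    return disp

def ea_register_indirect(disp):  # displacement names a register that holds the EA
    return regs[disp]

def ea_indexed(disp, reg=1):     # EA = contents of an index register + displacement
    return regs[reg] + disp

def ea_pc_relative(disp):        # EA = PC + displacement (useful for short branches)
    return pc + disp

for f, d in [(ea_direct, 0x0100), (ea_register_indirect, 1),
             (ea_indexed, 0x0008), (ea_pc_relative, -4)]:
    print(f.__name__, hex(f(d)))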
Before any individual modes are discussed, it may be useful to look at some of the numbers
involved when computers, memories, and instructions are designed.
Up until the mid 1970s, memories were expensive, and computers supported small memories. A
typical memory size in a second generation computer was 16K–32K words (24–48 bits/word),
and in an early third generation computer, 32K–64K words (48–60 bits/word). Today, with much
lower hardware prices, modern computers can access much larger memories. Most of the early
microprocessors could address 64K bytes, and most of today’s microprocessors can address
between 32M words and a few tera words (usually 8-bit words, bytes). As a result, those
computers must handle long addresses. Since 1M (1 mega) is defined as 1024 K = 2^10 × 2^10 =
2^20, an address in a 1M memory is 20 bits long. In a 32M memory, an address is 25 bits long.
The Motorola 68000 microprocessor [Kane 81] and the Intel 80386 [Intel 85] generate 32-bit
addresses, and can thus physically address 4G (giga) bytes. (Their virtual address space, though,
is 64 terabytes, i.e., 2^46 bytes.) Computers on the drawing boards right now could require much longer
addresses.
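The relationship between memory size and address width is simply a base-2 logarithm, as the
small illustrative Python check below shows.

import math

# Number of address bits needed for a memory of the given size (in addressable units).
for size in [2**20, 32 * 2**20, 4 * 2**30]:   # 1M, 32M, 4G
    bits = int(math.log2(size))
    print(f"{size:>12} locations -> {bits}-bit addresses")
# 1M -> 20 bits, 32M -> 25 bits, 4G -> 32 bits, matching the text.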
6.7 CLOCK RATE AND INSTRUCTION RATE
Processor circuits are controlled by a timing signal called the clock. The clock defines regular
time intervals called clock cycles. To execute a machine instruction, the processor divides the
action to be performed into a sequence of basic steps such that each step can be completed in one
clock cycle. The length P of one clock cycle is an important parameter that affects processor
performance. Processors used in today's personal computers and workstations have clock rates
that range from a few hundred million to over a billion cycles per second.
We now focus our attention on the processor time component of the total elapsed time. Let T
be the processor time required to execute a program that has been prepared in some high-level
language. The compiler generates a machine language object program that corresponds to the
source program. Assume that complete execution of the program requires the execution of N
machine language instructions. The number N is the actual number of instruction executions
and is not necessarily equal to the number of machine language instructions in the object
program: some instructions may be executed more than once, as is the case for instructions
inside a program loop, while others may not be executed at all, depending on the input data used.
Suppose that the average number of basic steps needed to execute one machine language
instruction is S, where each basic step is completed in one clock cycle. If the clock rate is R
cycles per second, the program execution time is given by
T = (N × S) / R
This is often referred to as the basic performance equation.
We must emphasize that N, S and R are not independent parameters; changing one may affect
another. Introducing a new feature in the design of a processor will lead to improved
performance only if the overall result is to reduce the value of T.
With pipelining (discussed below), individual instructions still require several clock cycles to
complete, but, for the purpose of computing T, the effective value of S is 1.
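A quick numerical illustration of the basic performance equation, using made-up values chosen
only for this example:

# T = N * S / R, with illustrative (not real) numbers.
N = 50_000_000      # dynamic instruction count
S = 3               # average clock cycles per instruction
R = 1_000_000_000   # clock rate: 1 GHz

T = N * S / R
print(f"Execution time T = {T:.3f} seconds")   # 0.150 s

# Halving S (e.g., through a better pipeline) halves T for the same N and R.
print(f"With S = 1.5: T = {N * 1.5 / R:.3f} seconds")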
A higher degree of concurrency can be achieved if multiple instruction pipelines are
implemented in the processor. This means that multiple functional units are used, creating parallel
paths through which different instructions can be executed in parallel. With such an arrangement,
it becomes possible to start the execution of several instructions in every clock cycle. This mode
of operation is called superscalar execution.
If it can be sustained for a long time during program execution, the effective value of S can be
reduced to less than one. But the parallel execution must preserve the logical correctness of
programs, that is, the results produced must be the same as those produced by serial execution of
the program instructions. Nowadays many processors are designed in this manner.
6.7.1 Clock Rate
There are two possibilities for increasing the clock rate R:
1. Improving the IC technology makes logic circuits faster, which reduces the time needed to
complete a basic step. This allows the clock period P to be reduced and the clock rate R to be increased.
2. Reducing the amount of processing done in one basic step also makes it possible to reduce the
clock period P. However, if the actions that have to be performed by an instruction remain the
same, the number of basic steps needed may increase.
Increases in the value of R that are entirely caused by improvements in IC technology affect all
aspects of the processor's operation equally, with the exception of the time it takes to access the
main memory. In the presence of a cache, the percentage of accesses to main memory is small.
Hence much of the performance gain expected from the use of faster technology can be realized.
6.7.2 Instruction Rate
Simple instructions require a small number of basic steps to execute. Complex instructions
involve a large number of steps. For a processor that has only simple instructions, a large number
of instructions may be needed to perform a given programming task. This could lead to a large
value of N and a small value of S. On the other hand, if individual instructions perform more
complex operations, fewer instructions will be needed, leading to a lower value of N and a
larger value of S. It is not obvious which choice is better.
Complex instructions combined with pipelining (effective value of S close to 1) would achieve the
best performance. However, it is much easier to implement efficient pipelining in processors
with simple instruction sets.
RISC and CISC are two instruction set design philosophies for computers. The instruction set, or
instruction set architecture, is the part of the computer that provides the commands that guide the
computer in processing and manipulating data. An instruction set consists of instructions,
addressing modes, native data types, registers, interrupt and exception handling, and memory
architecture. An instruction set can be emulated in software by using an interpreter or built into
the hardware of the processor. The instruction set architecture can be considered the boundary
between software and hardware. Microcontrollers and microprocessors can be classified based
on the RISC and CISC instruction set architectures.
6.7.3 Pipeline
We assume that instructions are executed one after the other. Hence the value of S is the total
number of basic steps, or clock cycles, required to execute one instruction. A substantial
improvement in performance can be achieved by overlapping the execution of successive
instructions using a technique called pipelining.
Consider the instruction Add R1, R2, R3
This adds the contents of R1 and R2 and places the sum into R3.
The contents of R1 and R2 are first transferred to the inputs of the ALU. After the addition
operation is performed, the sum is transferred to R3. The processor can read the next instruction
from memory while the addition operation is being performed. If that instruction also uses the
ALU, its operands can be transferred to the ALU inputs at the same time that the result of the Add
instruction is being transferred to R3. In the ideal case, if all instructions are overlapped to the
maximum degree possible, execution proceeds at the rate of one instruction completed in each
clock cycle.
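The following hedged Python sketch compares total cycle counts with and without overlapping,
for an idealised pipeline in which every instruction takes the same number of stages and there are
no hazards (an assumption made only for this illustration):

def cycles_sequential(n_instructions, stages):
    # Each instruction runs all its stages before the next one starts.
    return n_instructions * stages

def cycles_pipelined(n_instructions, stages):
    # Ideal pipeline: after the first instruction fills the pipe,
    # one instruction completes every cycle.
    return stages + (n_instructions - 1)

n, s = 100, 4
print(cycles_sequential(n, s))  # 400 cycles
print(cycles_pipelined(n, s))   # 103 cycles, approaching 1 cycle per instruction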
6.8 CURRENT PROCESSORS
Different kinds of processors have flooded the market. As you know, the two major
manufacturers of computer microprocessors are Intel and Advanced Micro Devices (AMD).
When it comes to speed, quality and efficiency, their products lead the market. Both companies
make single-core and multi-core processors. Intel's desktop CPUs include Core, Pentium and
Celeron, whereas AMD's desktop processors include Athlon, Phenom and Sempron. Intel also
makes Pentium M, Celeron M and Core mobile processors for notebooks, whereas AMD makes
mobile versions of its Athlon, Turion and Sempron processors.
Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor
company based in Santa Clara, California, that develops computer processors and related
technologies for business and consumer markets. While it initially manufactured its own
processors, the company later outsourced its manufacturing, a practice known as going fabless,
after Global Foundries was spun off in 2009. AMD's main products
include microprocessors, motherboard chipsets, embedded processors and graphics
processors for servers, workstations, personal computers and embedded system applications.
Both manufacturers have launched their latest processors in the market; some of them are discussed below:
Sandy Bridge-E:
Sandy Bridge is the codename for Intel's 32 nm microarchitecture used in the second generation
of the Intel Core processors (Core i7, i5, i3). The Sandy Bridge microarchitecture is the
successor to Nehalem and Westmere microarchitecture. Intel demonstrated a Sandy Bridge
processor in 2009, and released first products based on the architecture in January 2011 under
the Core brand.
Sandy Bridge is manufactured in the 32 nm process and has a soldered contact with the die and
IHS (Integrated Heat Spreader), while Intel's subsequent generation Ivy Bridge uses a 22 nm die
shrink and a TIM (Thermal Interface Material) between the die and the IHS.
The main specifications of the i7-3970X processor are:
Haswell:
Haswell processors will have 4 CPU cores with Hyper-Threading support, 8 MB L3 cache on
the i7-4900 series, and 6 MB cache on the i7-4800 series. The i7 processors will be clocked
from 2.7 GHz to 3 GHz, reaching as high as 3.7 - 3.9 GHz when the Turbo Boost feature is active.
The i7-4930MX is supposed to be unlocked, and could be overclocked to achieve higher
performance. All processors will support vPro technology, VT-d (I/O virtualization) and
Trusted Execution. The CPUs will integrate HD 4600 graphics with 400 MHz base frequency.
The graphics unit will have Turbo technology enabled, that will increase GPU frequency up to
1.35 GHz on the unlocked i7-4930MX, and up to 1.3 GHz on the i7-4800MQ and i7-4900MQ
parts.
Number of Cores: 4
Number of Threads: 8
Clock Speed: 3.5 GHz
Process Size: 22 nm
TDP: 84 W
Memory Bandwidth: 25.6 GB/s
Piledriver:
The CPU runs at a 4.0 GHz base frequency with a Turbo Core speed of 4.2 GHz. The CPU
includes 16 MB of cache (8 MB L3 / 8 MB L2), supports DDR3-1866 MHz+ and is compatible
with the AM3+ socket. The TDP of this part is set at 125 W.
With the architecture improvements in Piledriver, we could witness a 10-15% multitasking
performance improvement with the FX-8350 over the FX-8150.
Number of Cores: 8
Number of Threads: 8
Clock Speed: 4.0 GHz
Process Size: 32 nm
TDP: 125 W
Memory Bandwidth: 19 GB/s
Richland:
Richland core is manufactured on 32nm technology, and it combines 1 or 2 Piledriver modules
(2 or 4 CPU cores), and Radeon HD 8000 series graphics. Compared to its predecessor, Richland
core doesn't have any new CPU features, however the integrated GPU is upgraded to Radeon HD
8000 series graphics. Both the CPU and GPU portions of the chip offer better performance,
mainly due to higher clock speeds. AMD also added support for DDR3-2133 memory to
A10-Series APUs. Desktop Richland APUs are available in a socket FM2-compatible package. Mainstream
mobile processors work in existing socket FS1 (FS1r2) motherboards, whereas mobile ULV
processors are produced in a BGA package.
Number of Cores: 4
Number of Threads: 4
Clock Speed: 4.1 GHz
Process Size: 32 nm
TDP: 100 W
Memory Bandwidth:
Steamroller:
AMD Steamroller Family 15h is a microarchitecture developed by AMD for AMD APUs, which
succeeded Piledriver in the beginning of 2014 as the third-generation Bulldozer-based
microarchitecture. Steamroller APUs continue to use two-core modules as their predecessors,
while aiming at achieving greater levels of parallelism.
Learning Activity
AMD Processor
Advanced Micro Devices, Inc. (AMD) is a global company that specializes in manufacturing
semiconductor devices used in computer processing. The company also produces flash
memories, graphics processors, motherboard chip sets, and a variety of components used in
consumer electronics goods. The company is a major supplier of microprocessors (computer
chips). AMD is based in Sunnyvale, California.
AMD was founded in 1969 by Walter Jeremiah (Jerry) Sanders, a former executive at
Fairchild Semiconductor Corporation, and seven others. The company released its first
product in 1970 and went public two years later. In the mid-1970s the company began
producing computer chips. Starting out as a second-source manufacturer of computer chips,
the company placed a great emphasis on quality and grew steadily. In 1982 the company
began supplying second-source chips for the Intel Corporation, which made the
microprocessor used in IBM personal computers (PCs). The agreement with Intel ended in
1986. In 1991 AMD released the Am386 microprocessor family, a reverse-engineered chip
that was compatible with Intel’s next-generation 32-bit 386 microprocessor. There ensued a
long legal battle that was finally decided in a 1994 U.S. Supreme Court ruling in AMD’s
favour. That same year, Compaq Computer Corporation contracted with AMD to produce
Intel-compatible chips for their computers. In 1996 AMD acquired a microprocessor
company known as NexGen and began branching out from the Intel-compatible chip market.
In 2000 AMD introduced the Athlon processor, which was designed to run the Microsoft
Corporation’s Windows operating system. With the release of the Athlon processor, AMD
became the first company to produce a 1-GHz (gigahertz) microprocessor, which marked
AMD as a serious competitor in the chip market. In 2003 the company released the Opteron
chip, another product that showcased the company’s ability to produce high-end chips. In
2006 AMD absorbed ATI Technologies, a manufacturer of video graphics cards for use in
PCs. In 2008 AMD announced plans to split the company in two—with one part designing
microprocessors and the other manufacturing them. This announcement followed news that
the Advanced Technology Investment Company and the Mubadala Development Company,
both based in Abu Dhabi, would acquire a controlling interest in AMD, pending approval by
shareholders and the U.S. and German governments. In 2009, following a series of complaints
lodged by AMD, the European Commission fined rival Intel a record €1.06 billion (£948
million; $1.45 billion) for engaging in anticompetitive practices that violated the European
Union’s antitrust laws. These practices allegedly involved financially compensating and
providing rebates to manufacturers and retailers who favoured its computer chips over those
of AMD, as well as paying manufacturers to cancel or postpone the launching of products
utilizing AMD’s chips. In 2014 the company was restructured into two parts: computing and
graphics, which made processors for personal computers, and enterprise, embedded, and
semi-custom, which made more-specialized products.
On a scale of 1-10, AMD processors come in at 5-10. They are cheaper than Intel processors in a
similar range, and are efficient compared to the current-generation Core series. AMD APUs are
also a good option for their good iGPU performance and CPU performance comparable to the
Core i series. Laptops powered by Ryzen processors often clock lower and less aggressively than
Intel-powered laptops, but they often run cooler and longer on battery; thus, for laptops, when
higher iGPU performance and longer battery life are preferred, Ryzen-powered laptops can be
used. When building a new desktop PC, however, older FX-series CPUs and A-series APUs
should be avoided because of their higher power consumption and heat output.
For desktop and mobile use, if you only want to do casual gaming and everyday tasks, then a
Ryzen APU is the way to go. For heavier tasks like video editing, 3D modelling, etc., Ryzen 7 or
9 CPUs or Threadripper should be preferred.
For Ryzen desktop CPUs and APUs on the AM4 platform, the motherboard chipset should be
checked for support, otherwise the PC may not boot; this can be easily solved with motherboards
that support USB BIOS flashing for newer processors.
Example – AMD Ryzen, AMD Threadripper, AMD FX-Series, AMD EPYC, AMD Opteron,
AMD Athlon 64
Question:
1. Describe briefly the AMD processor.
SUMMARY
• The von Neumann architecture—also known as the von Neumann model or Princeton
architecture—is a computer architecture based on a 1945 description by John von
Neumann and others in the First Draft of a Report on the EDVAC.
• A central processing unit, also called a central processor, main processor or just
processor, is the electronic circuitry that executes instructions comprising a computer
program.
• An arithmetic-logic unit (ALU) is the part of a computer processor (CPU) that carries out
arithmetic and logic operations on the operands in computer instruction words.
• Stored-program computer, a computer that stores instructions in its memory to enable it
to perform a variety of tasks in sequence or intermittently.
CONTENTS
Learning Objectives
Learning Outcomes
Overview
7.1 Microprocessors
7.1.1 Moore’s Law
7.2 The Need for Multicore
7.2.1 Multicore Basics
7.3 Multicore Implementations
7.3.1 Intel and AMD Dual-Core Processors
7.3.2 Tilera TILE64
7.4 Multicore Challenges
7.4.1 Power and Temperature
7.4.2 Cache Coherence
7.5 Open Issues
7.5.1 Improved Memory System
7.5.2 System Bus and Interconnection Networks
7.5.3 Parallel Programming
7.5.4 Homogeneous vs Heterogeneous Cores
7.5.5 False Sharing
Summary
Keywords
Self-Assessment Questions
Further Reading
LEARNING OBJECTIVES
After studying this chapter, you will be able to:
• Understand the concept of microprocessors
• Identify the need for multicore
• Analyze the multicore implementations
• Describe some challenges faced in multicore processing
• Identify the issues related to multicore processors.
LEARNING OUTCOMES
Upon completion of the chapter, students will be able to understand:
• Recall the Moore’s law
• Illustrate the concept of multicore
• Design of Intel and AMD dual-core processors
• The open issues related to design system bus and interconnection networks
OVERVIEW
In the previous chapter, we considered the von Neumann design. We learned about the
fundamentals of the processor, such as processor structure, the ALU concept, stored programs
and the fetch-execute cycle. We also examined instruction formats, clock rate and instruction
rate, along with the concept of pipelining.
In this chapter, you will study the concept of multicore processors. Multicore is the term
typically used to describe two or more CPUs working at the same time on the same chip. Also
known as multicore technology, it is a type of architecture in which a single physical processor
contains the core logic of two or more processors. A dual-core set-up is somewhat comparable
to having multiple separate processors installed in the same computer, but because the two
processors are physically attached to the same socket, the connection between them is faster.
This chapter covers the concept of microprocessors, multicore implementations and challenges,
along with the issues related to multicore processors. Multicore processing is a growing industry
trend, as single-core processors rapidly reach the physical limits of possible complexity and
speed. The majority of new systems are multicore. Systems with a very large number of
processor cores (tens or hundreds) are sometimes described as many-core or massively multicore
systems.
7.1 MICROPROCESSORS
A microprocessor is the controlling unit of a microcomputer, fabricated on a small chip, capable
of performing ALU (arithmetic logic unit) operations and communicating with the other devices
connected to it.
A microprocessor consists of an ALU, a register array, and a control unit. The ALU performs
arithmetic and logical operations on the data received from memory or an input device. The
register array consists of registers identified by letters like B, C, D, E, H, L and the accumulator.
The control unit controls the flow of data and instructions within the computer.
7.1.1 Moore's Law
Moore's Law is the observation that the number of transistors on a microchip doubles about
every two years, though the cost of computers is halved. Moore's Law states that we can expect
the speed and capability of our computers to increase every couple of years, and we will pay less
for them. Another tenet of Moore's Law asserts that this growth is exponential.
Moore’s law, prediction made by American engineer Gordon Moore in 1965 that the number
of transistors per silicon chip doubles every year.
For a special issue of the journal Electronics, Moore was asked to predict developments over the
next decade. Observing that the total number of components in these circuits had roughly
doubled each year, he blithely extrapolated this annual doubling to the next decade, estimating
that microcircuits of 1975 would contain an astounding 65,000 components per chip. In 1975, as
the rate of growth began to slow, Moore revised his time frame to two years. His revised law was
a bit pessimistic; over roughly 50 years from 1961, the number of transistors doubled
approximately every 18 months. Subsequently, magazines regularly referred to Moore’s law as
though it were inexorable—a technological law with the assurance of Newton’s laws of motion.
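The doubling rule is simple exponential arithmetic. The illustrative Python snippet below
extrapolates a transistor count under an assumed doubling period; the starting count and the
period are example values chosen for the illustration, not historical data.

# Illustrative compound-growth calculation for a Moore's-Law-style doubling.
start_count = 2_300          # example starting transistor count
doubling_period_years = 2    # assumed doubling period

for years in (2, 10, 20):
    count = start_count * 2 ** (years / doubling_period_years)
    print(f"after {years:>2} years: about {count:,.0f} transistors")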
A multicore architecture places multiple execution cores in one chip. Though manufacturer
designs vary from one another, multicore architectures must adhere to certain common aspects.
The fundamental configuration of such a microprocessor is shown in Figure 7.3.
Figure 7.5: (a) Shared Memory Model, (b) Distributed Memory Model
Multicore processing is now commonplace because it offers advantages in the following
seven areas:
1. Energy Efficiency. By using multicore processors, architects can decrease the number of
embedded computers. They overcome increased heat generation due to Moore's Law (i.e.,
smaller circuits increase electrical resistance, which creates more heat), which in turn
decreases the need for cooling. The use of multicore processing reduces power
consumption (less energy wasted as heat), which increases battery life.
2. True Concurrency. By allocating applications to different cores, multicore processing
increases the intrinsic support for actual (as opposed to virtual) parallel processing within
individual software applications across multiple applications.
3. Performance. Multicore processing can increase performance by running multiple
applications concurrently. The decreased distance between cores on an integrated chip
enables shorter resource access latency and higher cache speeds when compared to using
separate processors or computers. However, the size of the performance increase depends
on the number of cores, the level of real concurrency in the actual software, and the use
of shared resources.
4. Isolation. Multicore processors may improve (but do not guarantee) spatial and temporal
isolation (segregation) compared to single-core architectures. Software running on one
core is less likely to affect software on another core than if both are executing on the
same single core. This decoupling is due to both spatial isolation (of data in core-specific
caches) and temporal isolation, because threads on one core are not delayed by threads on
another core. Multicore processing may also improve robustness by localizing the impact
of defects to a single core. This increased isolation is particularly important in the
independent execution of mixed-criticality applications (mission-critical, safety critical,
and security-critical).
5. Reliability and Robustness. Allocating software to multiple cores increases reliability
and robustness (i.e., fault and failure tolerance) by limiting fault and/or failure
propagation from software on one core to software on another. The allocation of software
to multiple cores also supports failure tolerance by supporting failover from one core to
another (and subsequent recovery).
6. Obsolescence Avoidance. The use of multicore processors enables architects to avoid
technological obsolescence and improve maintainability. Chip manufacturers are
applying the latest technical advances to their multicore chips. As the number of cores
continues to increase, it becomes increasingly hard to obtain single-core chips.
7. Hardware Costs. By using multicore processors, architects can produce systems with
fewer computers and processors.
7.3 MULTICORE IMPLEMENTATIONS
As happens with any technology, multicore architectures from various manufacturers differ
tremendously. Along with variations in communication and memory configuration, another
difference is the number of cores the microprocessor has. In some multicore architectures,
distinct cores have distinct functions, and the design is therefore heterogeneous.
7.3.1 Intel and AMD Dual-Core Processors
Intel and AMD are the mainstream manufacturers of microprocessors, and each produces many
different flavours of multicore processors.
Example: The Pentium D is used in desktops, Core 2 Duo is used in both laptop and
desktop environments, and the Xeon processor is used in servers. AMD also generates
several different flavours.
In Figure 7.6, you can see the block diagrams for the Core 2 Duo and Athlon 64 X2,
respectively. Both architectures are homogeneous dual-core processors. The Core 2 Duo
uses a shared memory model with private L1 caches and a shared L2 cache which
“provides a peak transfer rate of 96 GB/sec.”
If an L1 cache miss occurs, both the L2 cache and the second core's L1 cache are searched in
parallel before a request is sent to main memory. In contrast, the Athlon uses a distributed
memory model with discrete L2 caches. These L2 caches share a system request interface,
removing the need for a bus.
The system request interface also links the cores with an on-chip memory controller and an
interconnect called HyperTransport. HyperTransport effectively decreases the number of buses
needed in a system, reducing bottlenecks and increasing bandwidth. The Core 2 Duo instead
uses a bus interface. The Core 2 Duo also has explicit thermal and power control units on-chip.
There is no definitive performance advantage of a bus versus an interconnect; the Core 2 Duo
and Athlon 64 X2 achieve comparable performance, each using a different communication
protocol.
7.3.2 Tilera TILE64
Tilera has created a multicore chip with 64 homogeneous cores arranged in a grid, depicted in
Figure 7.8. An application written to take advantage of these extra cores will run far faster than
if it were run on a single core. Imagine having a project to finish, but rather than having to work
on it alone you have 64 people to work on it for you. Every core has its own L1 and L2 cache,
for a total of 5 MB on-chip, and a switch that connects the core into the mesh network instead
of a bus or interconnect.
7.4 MULTICORE CHALLENGES
7.4.1 Power and Temperature
Placing two or more cores on a single chip increases power consumption and heat generation.
To account for this, each of the designs above runs its cores at a lower frequency to reduce
power consumption.
To combat unnecessary power consumption, some designs also incorporate a power control unit
that can shut down idle cores or limit the amount of power they draw. By powering off unused
cores and using clock gating, the amount of leakage in the chip is reduced. To reduce the heat
produced by multiple cores on a single chip, the chip is architected so that hot spots do not grow
too large and the heat is spread across the chip.
As seen in Figure 7.8, most of the heat in the CELL processor is dissipated in the Power
Processing Element, and the rest is spread across the Synergistic Processing Elements. The
CELL processor follows a common trend of building temperature monitoring into the system,
with one linear sensor and ten internal digital sensors.
7.4.2 Cache Coherence
Cache coherence is the uniformity of shared resource data that ends up stored in multiple local
caches. When clients in a system maintain caches of a common memory resource, problems may
arise with incoherent data, which is particularly the case with CPUs in a multiprocessing system.
For illustration, consider two clients that each hold a cached copy of a particular memory block
from a previous read. Suppose one client updates that memory block; the other client could be
left with an invalid copy in its cache without any notification of the change. Cache coherence is
intended to manage such conflicts by maintaining
a coherent view of the data values in multiple caches.
Example: Imagine a dual-core processor where each core has brought a block of memory into its
private cache. One core writes a value to a particular location; when the second core attempts to
read that value from its cache, it will not have the updated copy unless its cache entry is
invalidated and a cache miss occurs. This cache miss forces the second core's cache entry to be
updated. If this coherence policy were not in place, garbage data would be read and invalid
results would be produced, possibly crashing the program or the entire computer.
Generally, there are two schemes for cache coherence: a snooping protocol and a directory-based
protocol. The snooping protocol only works with a bus-based system, and uses a number of
states to decide whether or not it needs to update cache entries and whether it has control over
writing to the block. The directory-based protocol can be used on an arbitrary network and is,
hence, scalable to many processors or cores, as opposed to snooping, which is not scalable.
In such a scheme, a directory is used that holds information about which memory locations are
being shared in multiple caches and which are used exclusively by one core's cache. The
directory knows when a block needs to be updated or invalidated.
7.5 OPEN ISSUES
In this section, we will discuss some issues related to multicore processors.
7.5.1 Improved Memory System
With several cores on a single chip there is a tremendous need for improved memory. 32-bit
processors, such as the Pentium 4, can address up to 4 GB of main memory. With cores now
using 64-bit addresses, the amount of addressable memory is almost unlimited. An improved
memory system is imperative; more main memory and larger caches are needed for multithreaded
multiprocessors.
7.5.2 System Bus and Interconnection Networks
Additional memory will be useless if the amount of time required to serve memory requests does
not improve as well. Redesigning the interconnection network between cores is a major focus of
chip manufacturers. A faster network means lower latency in inter-core communication as well
as in memory transactions.
Intel is developing its QuickPath interconnect, a 20-bit wide bus operating between 4.8 and
6.4 GHz; AMD's HyperTransport 3.0 is a 32-bit wide bus that operates at 5.2 GHz. A different
kind of interconnect is seen in the TILE64's iMesh, which consists of five networks used to
satisfy I/O and off-chip memory communication.
7.5.3 Parallel Programming
To make use of multicore, you really need to use multiple threads. If you know how to do it, it is
not bad. However, the first time you do it there are plenty of ways to trip yourself up, and the
bugs you introduce with multithreading are much harder to find. Since the number of cores in a
processor is set to double at regular intervals, it follows that the software running on these cores
must take this into account. Finally, programmers need to learn how to write parallel programs
that can be split up and run concurrently on multiple cores, rather than trying to exploit
single-core hardware to increase the parallelism of sequential programs. A minimal sketch of
such a parallel program appears below.
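The hedged Python sketch below splits a simple computation across worker processes using the
standard multiprocessing module; the work function and the input sizes are arbitrary choices made
only for this illustration.

# Minimal data-parallel sketch: divide work across cores (illustrative only).
from multiprocessing import Pool

def count_primes_below(n):
    # Deliberately naive CPU-bound work so the parallel split is visible.
    return sum(all(k % d for d in range(2, int(k ** 0.5) + 1)) for k in range(2, n))

if __name__ == "__main__":
    bounds = [10_000, 20_000, 30_000, 40_000]    # four independent pieces of work
    with Pool() as pool:                          # one worker process per core by default
        results = pool.map(count_primes_below, bounds)
    print(dict(zip(bounds, results)))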
7.5.4 Homogeneous vs Heterogeneous Cores
Architects have debated whether the cores in a multicore environment should be heterogeneous
or homogeneous, and there is no absolute answer yet. Homogeneous cores are all precisely the
same: equal frequencies, cache sizes, functions, etc. In contrast, each core in a heterogeneous
system may have a different memory model, function, frequency, etc.
There is an obvious trade-off between processor complexity and customization. Every design
has used homogeneous cores except the CELL processor, which has one Power Processing
Element and eight Synergistic Processing Elements. Homogeneous cores are easier to produce,
since the same instruction set is used across all cores and every core contains identical hardware.
However, are they the most effective use of multicore technology?
Each core in a heterogeneous environment could have a specific function and run its own
instruction set. Building on the CELL example, a heterogeneous model might have a large
centralized core built for generic processing and running an OS, a core for graphics, a
cryptographic core, a communications core, an enhanced arithmetic core, an audio core, and
so on. This model is more complicated, but may have power, efficiency, and thermal benefits
that outweigh its complexity.
With major manufacturers on both sides of this issue, the debate will continue for years to come;
it will be interesting to see which side comes out on top.
7.5.5 False Sharing
The smallest unit of memory that two processors interchange is a cache line or cache sector. Two
separate caches can share a cache line when they both need to read it, but if the line is written in
one cache, and read in another, it must be shipped between caches, even if the locations of
interest are disjoint.
Like two people writing in different parts of a log book, the writes are independent, but unless
the book can be ripped apart, the writers must pass the book back and forth. In the same way,
two hardware threads writing to different locations contend for a cache sector to the point where
it becomes a ping-pong game.
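False sharing is usually avoided by padding or aligning per-thread data so that it does not share a
cache line. The hedged Python sketch below only simulates the effect: it counts how often a
modelled cache line would have to move between two cores under an alternating write pattern.
The line size and access pattern are invented for the illustration.

# Simulated cache-line ping-pong caused by false sharing (illustrative only).
LINE_SIZE = 64          # assume 64-byte cache lines

def line_transfers(addr_a, addr_b, writes_per_thread=1000):
    """Count how often a line must move between cores when two threads
    alternately write to their own, distinct memory locations."""
    transfers, owner = 0, None
    for i in range(2 * writes_per_thread):
        addr = addr_a if i % 2 == 0 else addr_b   # thread 0 and thread 1 alternate
        writer = i % 2
        line = addr // LINE_SIZE
        # The line moves whenever a different core last wrote the same line.
        if owner is not None and owner[0] == line and owner[1] != writer:
            transfers += 1
        owner = (line, writer)
    return transfers

print(line_transfers(0, 8))    # counters in the same 64-byte line: ~1999 transfers
print(line_transfers(0, 64))   # counters padded onto separate lines: 0 transfers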
Learning Activity
Kung Fu Panda 2
In 1965, Intel pioneer Gordon Moore predicted the steady increases in processing
performance that we continue to see today. That performance has been a boon to
DreamWorks Animation, which has applied the increasing compute power to tell more
visually rich stories. This is most apparent from sequel to sequel. In fact, DreamWorks
Animation, in deference to Moore’s Law, coined “Shrek’s Law”: Every sequel will need
double the render power of the film before it. The original Shrek movie
required five million rendering hours; Shrek 2* required 10 million; and Shrek the Third
required 20 million. In 2010, the final Shrek feature film, Shrek Forever After*, consumed
over 45 million rendering hours.
Continuing that trend, DreamWorks Animation’s latest feature, Kung Fu Panda 2, used more
than 55 million rendering hours—more than double the original Kung Fu Panda in 2008.
DreamWorks Animation produces all its CG animated feature films in stereoscopic 3D. Each
shot in the film is conceived in 3D from the beginning of the filmmaking process, allowing
the use of depth to further the storytelling and immerse audiences into the characters’
environment.
Add in rising audience expectations and higher-definition devices, and it’s no wonder each
new film places increasingly heavier demands on server, desktop, and storage infrastructure
throughout its development.
“We use technology to enable the creative ambition of our artists and filmmakers,” says Ed
Leonard, DreamWorks Animation chief technology officer.
Intel Xeon Processors in the Data Centre and on the Desktop
To meet its rigorous infrastructure requirements, DreamWorks Animation provisions its data
centres with servers and storage systems powered by the latest Intel Xeon technologies. The
company was quick to adopt the Intel Xeon processor 5500 series and its energy-efficient
performance, as well as the Intel Xeon processor 5600 series, which uses Intel’s industry-
leading 32nm process technology to provide additional improvements in performance,
memory bandwidth, and energy consumption. Along with server processors, DreamWorks
Animation uses the latest Intel Xeon processors in the desktop workstations its artists use to
create their visions and bring them to virtual life. The company uses Intel® compilers and
tools to performance-tune its rendering and simulation applications, and Intel Xeon processor-
based storage solutions to keep pace with swelling data volumes.
“With Kung Fu Panda 2 we hit computational load peaks nearly 50 per cent higher than any
previous production, and our upcoming films will be higher still,” says Leonard. “Intel’s
contributions to performance and throughput are critical for us in both the software and
hardware arenas.”
Teamwork Drives Advances
In addition to relying on Intel technologies, DreamWorks Animation has forged a strategic
alliance with Intel to achieve DreamWorks Animation’s goals and accelerate the evolution of
high-quality 3D entertainment. Intel engineers work closely with the studio to support its 3D
leadership and advance the company toward its vision of highly interactive toolsets.
The collaboration, along with the end-to-end Intel technologies, pays off for DreamWorks
Animation as it continues its drive to lead the digital entertainment industry and delight
audiences with riveting, high-quality entertainment experiences. “The work we’re doing on
Intel’s scalable multicore platforms allows us to deliver the highest-end computer graphics
software to our artists,” says Lincoln Wallen, head of research and development at
DreamWorks Animation. “Our partnership with Intel provides a unique and incredibly
valuable collaboration between their technology experts and ours.”
Questions:
1. What is the purpose of DreamWorks Animation?
2. Discuss the role of DreamWorks Animation in delivering 3D animated film in 2011: Kung
Fu Panda 2*.
SUMMARY
• A multicore processor is an integrated circuit (IC) to which two or more processors have
been attached for enhanced performance, reduced power consumption, and more efficient
simultaneous processing of multiple tasks.
• Recently, multicore technology has become mainstream, with Intel and Advanced Micro
Devices (AMD) introducing many commercially available multicore chips.
• According to Moore's Law, the number of transistors on a chip roughly doubles about every
two years.
• Intel and AMD are the mainstream manufacturers of microprocessors.
• Multicore processors face various challenges, such as power and temperature management,
cache coherence, and so on.
• Power and temperature management are two concerns that can increase dramatically with the
addition of multiple cores.
• Cache coherence is a concern in a multicore environment because of distributed L1 and L2
caches.
• There are several open issues related to multicore processors, such as improved memory
systems, parallel programming, and homogeneous versus heterogeneous cores.
KEYWORD
Multicore Processor: A multi-core processor is a computer processor on a single integrated
circuit with two or more separate processing units, called cores, each of which reads and
executes program instructions.
Moore’s Law: The principle that the speed and capability of computers can be expected to
double every two years, as a result of increases in the number of transistors a microchip can
contain.
Microprocessor: A microprocessor is a computer processor where the data processing logic and
control is included on a single integrated circuit, or a small number of integrated circuits.
Level 1 Cache: It is a memory which stores the data frequently used by the processor.
Level 2 Cache: It is a memory which is used to store recently accessed information.
Level 3 Cache: It is a specialized memory that works hand-in-hand with L1 and L2 cache
to improve computer performance.
SELF-ASSESSMENT QUESTIONS
Short Answer Questions
1. What is Moore's Law?
2. Discuss the significance of multicore processors.
3. Compare and contrast homogeneous and heterogeneous cores.
4. List the schemes used for cache coherence.
5. How do multicore processors overcome the shortcomings of single-core processors? Discuss briefly.
Long Answer Questions
1. Explain the basic architecture of a microprocessor in detail.
2. Describe the concept of homogeneous dual-core processors.
3. Describe the various challenges faced by multicore processors.
4. Explain why cache coherence is considered a primary concern in a multicore environment.
5. Explain the issues associated with multicore processors.
FURTHER READING
1. Hamacher Carl. 2002. “Computer Organization”, 5th Edition, Tata McGraw Hill.
2. Hennessy L. John. 2012. “Computer Architecture: A Quantitative Approach”, Elsevier.
3. Patterson A. David. 2011. “Computer Organization and Design”, Elsevier.
4. Tanenbaum S. Andrew. 2012. “Structured Computer Organization”, Prentice Hall PTR.
CONTENTS
Learning Objectives
Learning Outcomes
Overview
8.1 Physical Memory
8.1.1 Types of RAM
8.2 Addressing
8.2.1 Types of Addressing
8.3 Virtual Memory
8.3.1 Need for Virtual Memory
8.3.2 Virtual Memory Working
8.4 Paging
8.4.1 Page Tables
8.5 Address Translation
8.5.1 Huge Page Tables
8.5.2 Inverted Page Tables
Summary
Keywords
Self-Assessment Questions
Further Reading
LEARNING OBJECTIVES
After studying this chapter, you will be able to:
• Gain knowledge about physical memory
• Illustrate the various types of addressing modes
• Understand the concept of virtual memory
• Analyze the concept of paging
• Describe the translation of addresses from virtual to physical
LEARNING OUTCOMES
Upon completion of the chapter, students are able to understand:
• The basic knowledge of physical memory
• Recall addressing and its types
• How the virtual memory works
• Gain basic knowledge about paging and address translation
OVERVIEW
In the previous chapter, we examined the concept of multicore processors. We discussed the need for multicore, multicore implementations, and the challenges and issues associated with multicore processors.
In this chapter, you will examine the memory system, which is at the core of a computer system. Memory is defined as the storage area in which programs are kept while they are running and which contains the data required by those running programs. This chapter covers the concepts of physical memory, addressing modes, and virtual memory. You will also understand the concepts of paging and address translation.
The fundamental feature of memory is that it allows very fast access to instructions as well as data, regardless of where they are located within it. Accordingly, memory is where the computer stores instructions and data.
8.1 PHYSICAL MEMORY
Memory latency is traditionally quoted using two measures: access time and cycle time. Access time is the time between when a read is requested and when the desired word arrives; cycle time is the minimum time between requests to memory. One reason that cycle time is greater than access time is that the memory needs the address lines to be stable between accesses.
Physical memory refers to the actual RAM of the system, which usually takes the form of cards
(DIMMs) attached to the motherboard. Also called primary memory, it is the only storage type directly accessible to the CPU, and it holds the instructions of the programs to execute. Physical
memory is linearly addressable; memory addresses increase in a linear fashion and each byte is
directly addressable.
8.1.1 Types of RAM
The two widely used forms of modern RAM are static RAM (SRAM) and dynamic
RAM (DRAM). In SRAM, a bit of data is stored using the state of a six-transistor memory cell,
typically using six MOSFETs (metal-oxide-semiconductor field-effect transistors). This form of
RAM is more expensive to produce, but is generally faster and requires less dynamic power than
DRAM. In modern computers, SRAM is often used as cache memory for the CPU. DRAM
stores a bit of data using a transistor and capacitor pair (typically a MOSFET and MOS
capacitor, respectively), which together comprise a DRAM cell. The capacitor holds a high or
low charge (1 or 0, respectively), and the transistor acts as a switch that lets the control circuitry
on the chip read the capacitor's state of charge or change it. As this form of memory is less
expensive to produce than static RAM, it is the predominant form of computer memory used in
modern computers.
SRAM
Static RAM retains stored information as long as power is supplied; when the power supply is turned off the data is lost, which is why this memory is described as volatile. The main objective of using static RAM
is to build cache memory. It is more expensive than dynamic RAM, and it is faster. In SRAM, each bit of data is stored in a flip-flop. A flip-flop uses four to six transistors to make a memory cell, and its circuit does not need continuous refreshing.
As long as dc power is applied to a static memory cell, it can retain a 1 or 0 state indefinitely. If
power is removed, the stored data bit is lost. The cell is selected by an active level on the Select
line and a data bit (1 or 0) is written into the cell by placing it on the Data in line.
DRAM
DRAM is another type of semiconductor memory, designed specifically to store the data or program files needed by computer processors to perform their functions.
In DRAM, a single capacitor is used to store each bit of data. This is a very simple way to store data: a DRAM cell needs a smaller area than an SRAM cell to store the same data, so DRAM can store far more data than SRAM of the same chip area. However, the capacitor charge leaks away, so the circuit must be refreshed frequently, which adds to its power consumption compared to SRAM.
8.2 ADDRESSING
To perform any operation, the corresponding instruction has to be given to the microprocessor. In each instruction, the programmer has to specify three things:
• Operation to be performed.
• Address of source of data.
• Address of destination of result.
In computing, a memory address is a reference to a specific memory location used at various
levels by software and hardware. Memory addresses are fixed-length sequences
of digits conventionally displayed and manipulated as unsigned integers. Such numerical semantics is based on features of the CPU (such as the instruction pointer and incremental address registers), as well as on the use of memory as an array, as supported by various programming languages.
Types of Memory Addressing
• Physical Addressing
• Logical Addressing
Physical Addressing
A digital computer's main memory consists of many memory locations. Each memory location
has a physical address which is a code. The CPU (or other device) can use the code to access the
corresponding memory location. Generally only system software, i.e. the BIOS, operating
systems, and some specialized utility programs (e.g., memory testers), address physical memory
using machine code operands or processor registers, instructing the CPU to direct a hardware
device, called the memory controller, to use the memory bus or system bus, or
separate control, address and data busses, to execute the program's commands. The memory
controllers' bus consists of a number of parallel lines, each represented by a binary digit (bit).
The width of the bus, and thus the number of addressable storage units, and the number of bits in
each unit, varies among computers.
Logical Addressing
A computer program uses memory addresses to execute machine code, and to store and
retrieve data. In early computers logical and physical addresses corresponded, but since the
introduction of virtual memory most application programs do not have knowledge of physical
addresses. Rather, they address logical addresses, or virtual addresses, using the
computer's memory management unit and operating system memory mapping.
8.2.1 Types of Addressing modes
Implied addressing mode
In this mode the operands are specified implicitly in the definition of the instruction. For
example the ‘complement accumulator’ instruction is an implied mode instruction because the
operand in the accumulator register is implied in the definition of the instruction itself. All
register reference instructions that use an accumulator are implied mode instructions. Zero
address instructions in a stack organized computer are implied mode instructions since the
operands are implied to be on the top of the stack.
Example : CMA
Immediate addressing mode
In this mode the operand is specified in the instruction itself. In other words, an immediate mode instruction has an operand field rather than an address field. The operand field contains the actual
operand to be used in conjunction with the operation specified in the instruction. Immediate
mode instructions are useful for initializing registers to a constant value.
Example: ADD 5
Direct addressing mode
In this mode the effective address is equal to the address part of the instruction. The operand
resides in memory and its address is given directly by the address field of instruction. In a branch
type instruction the address field specifies the actual branch address.
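To make the difference between these modes concrete, the following minimal Python sketch shows how the effective operand would be obtained under immediate and direct addressing; the instruction fields, memory contents and accumulator value are illustrative assumptions, not a real instruction set.

```python
# Minimal sketch: obtaining the operand under two addressing modes
# (the memory contents and values below are illustrative only).
memory = {100: 25, 101: 7}     # hypothetical main-memory contents
accumulator = 10

def fetch_operand(mode, field):
    if mode == "immediate":
        return field            # the instruction field is the operand (e.g. ADD 5)
    if mode == "direct":
        return memory[field]    # the field is the address of the operand
    raise ValueError("unsupported addressing mode")

print(accumulator + fetch_operand("immediate", 5))   # ADD 5      -> 15
print(accumulator + fetch_operand("direct", 100))    # ADD [100]  -> 35
```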
The figure shows a typical organization that implements virtual memory. A special hardware unit, called the Memory Management Unit (MMU), translates virtual addresses into physical addresses. When the desired data (or instructions) are in the main memory, these data are fetched as described in our presentation of the cache mechanism. If the data are not in the main memory, the MMU causes the operating system to bring the data into the memory from the disk. The DMA scheme is used to perform the data transfer between the disk and the main memory.
The process of translating a virtual address into physical address is known as address translation.
It can be done with the help of MMU. A simple method for translating virtual addresses into
physical addresses is to assume that all programs and data are composed of fixed length units
called pages, each of which consists of a block of words that occupy contiguous locations in the
main memory. Pages commonly range from 2K to 16K bytes in length. They constitute the basic
unit of information that is moved between the main memory and the disk whenever the
translation mechanism determines that a move is required.
Pages should not be too small, because the access time of a magnetic disk is much longer
(several milliseconds) than the access time of the main memory. The reason for this is that it
takes a considerable amount of time to locate the data on the disk, but once located, the data can
be transferred at a rate of several megabytes per second. On the other hand, if pages are too large
it is possible that a substantial portion of a page may not be used, yet this unnecessary data will
occupy valuable space in the main memory.
The cache bridges the speed gap between the processor and the main memory and is
implemented in hardware. The virtual-memory mechanism bridges the size and speed gaps
between the main memory and secondary storage and is usually implemented in part by software
techniques. Conceptually, cache techniques and virtual-memory techniques are very similar.
They differ mainly in the details of their implementation.
A virtual-memory address translation method is based on the concept of fixed-length pages. Each
virtual address generated by the processor, whether it is for an instruction fetch or an operand
fetch/store operation, is interpreted as a virtual page number (high-order bits) followed by an
offset (low-order bits) that specifies the location of a particular byte (or word) within a page.
Information about the main memory location of each page is kept in a page table. This
information includes the main memory address where the page is stored and the current status of
the page.
An area in the main memory that can hold one page is called a page frame. The starting address
of the page table is kept in a page table base register. By adding the virtual page number to the
contents of this register, the address of the corresponding entry in the page table is obtained. The
contents of this location give the starting address of the page if that page currently resides in the
main memory. Each entry in the page table also includes some control bits that describe the
status of the page while it is in the main memory. One bit indicates the validity of the page, that
is, whether the page is actually loaded in the main memory.
This bit allows the operating system to invalidate the page without actually removing it. Another
bit indicates whether the page has been modified during its residency in the memory. As in cache
memories, this information is needed to determine whether the page should be written back to
the disk before it is removed from the main memory to make room for another page. Other
control bits indicate various restrictions that may be imposed on accessing the page. For
example, a program may be given full read and write permission, or it may be restricted to read
accesses only.
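As a concrete illustration of the translation just described, the short Python sketch below splits a virtual address into a virtual page number and an offset, consults a page table, and forms the physical address; the page size, table contents and valid bits are invented example values, not taken from any particular system.

```python
# Minimal sketch of virtual-to-physical address translation with a page table.
# Page size, page-table contents and addresses are illustrative assumptions.
PAGE_SIZE = 4096                      # 4 KB pages -> 12-bit offset
OFFSET_BITS = 12

# virtual page number -> (valid bit, page frame number)
page_table = {0: (1, 5), 1: (1, 9), 2: (0, None)}

def translate(virtual_address):
    vpn = virtual_address >> OFFSET_BITS           # high-order bits
    offset = virtual_address & (PAGE_SIZE - 1)     # low-order bits
    valid, frame = page_table.get(vpn, (0, None))
    if not valid:
        raise RuntimeError(f"page fault on virtual page {vpn}")
    return (frame << OFFSET_BITS) | offset         # frame start + offset

print(hex(translate(0x0123)))   # virtual page 0 -> frame 5 -> 0x5123
print(hex(translate(0x1456)))   # virtual page 1 -> frame 9 -> 0x9456
```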
8.4 PAGING
In a multiprogramming system memory is divided into a number of fixed-size or variable-sized
partitions or regions that are allocated to running processes. For example, a process that needs m words of memory may run in a partition of n words, where n ≥ m. The variable size partition
scheme may result in a situation where available memory is not contiguous, but fragmented into
many scattered blocks. We distinguish between internal fragmentation and external
fragmentation. The difference (n – m) is called internal fragmentation, memory that is internal to
a partition but is not being used. If a partition is unused and available, but too small to be used by
any waiting process, then it is accounted for external fragmentation. These memory fragments
cannot be used. In order to solve this problem, we can either compact the memory making large
free memory blocks, or implement paging scheme which allows a program’s memory to be non-
contiguous, thus permitting a program to be allocated to physical memory. Physical memory is
divided into fixed size blocks called frames. Logical memory is also divided into blocks of the
same, fixed size called pages. When a program is to be executed, its pages are loaded into any
available memory frames from the disk. The disk is also divided into fixed sized blocks that are
the same size as the memory frames. A very important aspect of paging is the clear separation
between the user’s view of memory and the actual physical memory.
Normally, a user believes that memory is one contiguous space containing only his/her program.
In fact, the logical memory is scattered through the physical memory that also contains other
programs. Thus, the user can work correctly with his/her own view of memory because of the
address translation or address mapping. The address mapping, which is controlled by the
operating system and transparent to users, translates logical memory addresses into physical
addresses. Because the operating system is managing the memory, it must be sure about the
nature of physical memory, for example: which frames are available, which are allocated; how
many total frames there are, and so on. All these parameters are kept in a data structure called
frame table that has one entry for each physical frame of memory indicating whether it is free or
allocated, and if allocated, to which page of which process. As there is much less physical
memory than virtual memory the operating system must be careful that it does not use the
physical memory inefficiently. One way to save physical memory is to load only virtual pages
that are currently being used by the executing program. For example, a database program may be
run to query a database. In this case not the entire database needs to be loaded into memory, just
those data records that are being examined.
Also, if the database query is a search query then it is not necessary to load the code from the
database that deals with adding new records. This technique of only loading virtual pages into
memory as they are accessed is known as demand paging. When a process attempts to access a
virtual address that is not currently in memory the CPU cannot find a page table entry for the
virtual page referenced. For example, in Figure 1, there is no entry in Process X’s page table for
virtual PFN 2 and so if Process X attempts to read from an address within virtual PFN 2 the CPU
cannot translate the address into a physical one. At this point the CPU cannot cope and needs the
operating system to fix things up. It notifies the operating system that a page fault has occurred
and the operating system makes the process wait whilst it fixes things up. The CPU must bring
the appropriate page into memory from the image on disk. Disk access takes a long time, and so
the process must wait quite a while until the page has been fetched. If there are other processes
that could run then the operating system will select one of them to run. The fetched page is
written into a free physical page frame and an entry for the virtual PFN is added to the processes
page table. The process is then restarted at the point where the memory fault occurred. This time
the virtual memory access is made, the CPU can make the address translation and so the process
continues to run. This is known as demand paging and occurs when the system is busy but also
when an image is first loaded into memory. This mechanism means that a process can execute an
image that only partially resides in physical memory at any one time.
The valid/invalid bit of the page table entry for a page, which is swapped in, is set as valid.
Otherwise it is set as invalid, which will have no effect as long as the program never attempts to
access this page. If all and only those pages actually needed are swapped in, the process will
execute exactly as if all pages were brought in.
If the process tries to access a page, which was not swapped in, i.e., the valid/invalid bit of this
page table, entry is set to invalid, then a page fault trap will occur. Instead of showing the
“invalid address error” as usually, it indicates the operating system’s failure to bring a valid part
of the program into memory at the right time in order to minimize swapping overhead.
In order to continue the execution of process, the operating system schedules a disk read
operation to bring the desired page into a newly allocated frame. After that, the corresponding
page table entry will be modified to indicate that the page is now in memory. Because the state
(program counter, registers etc.) of the interrupted process was saved when the page fault trap
occurred, the interrupted process can be restarted at the same place and state. As shown, it is
possible to execute programs even though parts of it are not (yet) in memory.
In the extreme case, a process without pages in memory could be executed. Page fault trap would
occur with the first instruction. After this page was brought into memory, the process would
continue to execute. In this way, page fault trap would occur further until every page that is
needed was in memory. This kind of paging is called pure demand paging. Pure demand paging
says that “never bring a page into memory until it is required”.
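The behaviour described above can be sketched in a few lines of Python: pages are brought into a frame only when a reference faults. The reference string, number of frames and the simple FIFO victim choice are assumptions made for the example, not something prescribed by the text.

```python
# Toy demand-paging model: a page is loaded only when a reference faults.
# Reference string, frame count and FIFO replacement are illustrative choices.
from collections import deque

NUM_FRAMES = 3
frames = deque()            # pages currently resident, in load order
page_faults = 0

def reference(page):
    global page_faults
    if page in frames:              # page already in memory: no fault
        return
    page_faults += 1                # page fault trap
    if len(frames) == NUM_FRAMES:   # no free frame: evict the oldest page
        frames.popleft()
    frames.append(page)             # "read" the page from disk into a frame

for p in [0, 1, 2, 0, 3, 0, 4]:     # pure demand paging: the first reference faults
    reference(p)
print("page faults:", page_faults)  # -> 6
```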
Learning Activity
idle, transparent memory sharing enables VM to pack guest machine images efficiently into a
single hardware platform, especially for machines running the same OS, the same OS version,
and the same applications. However, when guest machines are active, the benefits of
transparent memory sharing are evidently greatly reduced, as will soon be apparent.
VMware uses a background thread that scans guest machine pages continuously, looking for
duplicates. This process is illustrated in Figure 5. Candidates for memory sharing are found
by calculating a hash value from the contents of the page and looking for a collision in a Hash
Table that is built from the known hash values from other current pages. If a collision is
found, then the candidate for sharing is compared to the base page byte by byte. If the
contents of the candidate and the base pages match, then VMware points the PTE of the copy
to the same page of machine memory backing the base page.
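The scanning and matching logic described in this paragraph can be sketched in a few lines of Python; the page contents, the choice of hash function and the data structures are illustrative assumptions, not VMware's actual implementation.

```python
# Illustrative sketch of content-based page sharing: hash each page, use the
# hash to find a candidate, then confirm the match byte by byte.
import hashlib

pages = {                      # page number -> page contents (toy 8-byte "pages")
    0: b"\x00" * 8,
    1: b"\x12\x34" * 4,
    2: b"\x00" * 8,            # identical to page 0 -> candidate for sharing
}

hash_table = {}                # content hash -> base page number
shared_with = {}               # copy page -> base page its PTE now points to

for pno, content in pages.items():
    h = hashlib.sha1(content).hexdigest()
    base = hash_table.get(h)
    if base is not None and pages[base] == content:   # byte-by-byte confirmation
        shared_with[pno] = base                        # share with the base page
    else:
        hash_table[h] = pno      # first page seen with this content hash

print(shared_with)               # -> {2: 0}
```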
Memory sharing is provisional. VMware uses a Copy on Write mechanism whenever a
shared page is modified and can no longer be shared. This is accomplished by flagging the
shared page PTE as Read Only. Then, when an instruction executes that attempts to store data
in the page, the hardware generates an addressing exception. VMware handles the exception
by creating a duplicate, and re-executing the Store instruction that failed against the duplicate.
Transparent memory sharing has great potential benefits, but there is some overhead
necessary to support the feature. One source of overhead is the processing by the background
thread. There are tuning parameters to control the rate at which these background memory
scans run, but, unfortunately, there are no associated performance counters reported that
would help the system administrator to adjust these parameters. The other source of overhead
results from the Copy on Write mechanism, which entails the handling of additional hardware
interrupts associated with the soft page faults. There is no metric that provides the rate these
additional soft page faults occur either.
Figure. Transparent memory sharing uses a background thread to scan memory pages,
compute a hash code value from their contents, and compare to other Hash codes that
have already been computed. In the case of a collision, the contents of the page that is a
candidate for sharing are compared byte by byte to the collided page. If the pages
contain identical content, VMware points both pages to same physical memory location.
In the case study, transparent memory sharing is initially extremely effective – when the guest
machines are largely idle. Figure 6 renders the Memory Shared performance counter from
each of the guest machines as a stacked area chart. At 9 AM, when the guest machines are
still idle, almost all the 8 GB granted to three of the machines (ESXAS12C, ESXAS12D, and
ESXAS12E) is being shared by pointing those pages to the machine memory pages that are
assigned to the 4th guest machine (ESXAS12B). Together, these three guest machines have
about 22 GB of shared memory, which allows VMware to pack 4 x 8-GB OS images into a
machine footprint of about 10-12 GB.
However, once the benchmark programs start to execute, the amount of shared memory
dwindles to near zero. This is an interesting result. With this workload of identically
configured virtual machines, even when the benchmark programs are active, there should still
be significant opportunities to share identical code pages. But VMware is apparently unable
to capitalize much on this opportunity once the guest machines become active. A likely
explanation for the diminished returns from memory sharing is simply that the virtual
memory management performed by each of the active guest Windows machines leads to the
contents of too many virtual memory pages changing too frequently, something which simply
overwhelms the copy detection sharing mechanism.
Questions:
1. Discuss the concept of benchmark program.
2. Discuss the transparent memory sharing and compare the performance with other codes.
SUMMARY
• Physical memory is the fundamental memory to which the CPU has direct access.
• Physical memory basically comprises RAM (Random Access Memory), which is the most common form of main memory.
• Static Random Access Memory (SRAM) is a type of RAM used in various electronic applications, including toys, vehicles, digital devices and PCs.
• Dynamic random access memory (DRAM) is a type of random access memory that stores each bit of data in a cell made of a capacitor and a transistor, and therefore has to be refreshed periodically.
CONTENTS
Learning Objectives
Learning Outcomes
Overview
9.1 Cache
9.1.1 L1, L2 and L3 Cache Memories
9.2 Cache Mapping
9.2.1 Direct Mapping
9.2.2 Fully Associative Mapping
9.2.3 Set-associative Mapping
9.3 LRU Replacement
9.3.1 LRU Caching
9.3.2 Block/Line Replacement in Associative Caches
9.3.3 Page Replacement in Virtual Memory
Summary
Keywords
Self-Assessment Questions
Further Reading
LEARNING OBJECTIVES
After studying this chapter, you will be able to:
• Define the term cache
• Recognize the significance of cache memory
• Describe L1, L2, L3 cache memories
• Illustrate the concept of cache mapping
• Describe the concept of LRU replacement
LEARNING OUTCOMES
Upon completion of the chapter, students are able to understand :
• Identify L1, L2 and L3 cache memories
Levels of memory:
Level 1 or Register
It is the type of memory in which data is stored and accepted immediately by the CPU.
The most commonly used registers are the accumulator, the program counter, the address register, etc.
Level 2 or Cache memory
It is the fastest memory after registers, with a short access time; data is temporarily stored in it for faster access.
Level 3 or Main Memory
It is the memory on which the computer currently works. It is small in size, and once the power is off the data no longer stays in this memory.
Level 4 or Secondary Memory
It is external memory which is not as fast as main memory, but data stays permanently in this memory.
Cache Performance:
When the processor needs to read or write a location in main memory, it first checks for a
corresponding entry in the cache.
• If the processor finds that the memory location is in the cache, a cache hit has occurred and
data is read from cache
• If the processor does not find the memory location in the cache, a cache miss has occurred.
For a cache miss, the cache allocates a new entry and copies in data from main memory,
then the request is fulfilled from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called Hit
ratio.
Hit ratio = hit / (hit + miss) = no. of hits/total accesses
We can improve Cache performance using higher cache block size, higher associativity, reduce
miss rate, reduce miss penalty, and reduce the time to hit in the cache.
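A one-line check of the hit-ratio formula above, with invented access counts:

```python
# Hit ratio = hits / (hits + misses); the counts below are illustrative only.
hits, misses = 950, 50
print(f"hit ratio = {hits / (hits + misses):.2%}")   # -> hit ratio = 95.00%
```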
9.1.1 L1, L2 and L3 Cache Memories
L1 Cache:
Level 1 caching is alternatively referred to as L1 cache, internal cache, primary cache or system
cache. A primary cache is always located on the processor chip. This cache is small and its
access time is comparable to that of processor registers.
Example: Internal caches like Intel 80486 microprocessor (which contain an 8K memory cache)
and the Pentium (which has a 16K cache) are often known as Level 1 (L1) caches.
L2 Cache:
Secondary cache is placed between the primary cache and the rest of the memory. It is referred to as the level 2 (L2) cache. Often, the Level 2 cache is also housed on the processor chip.
L2 cache was first shipped with the Intel Pentium and Pentium Pro computers and has been included with every processor since, except for early Celeron processors. L2 cache is short for level 2 caching. L2 is also commonly described as secondary cache or external cache. Most contemporary PCs also come with external cache memory, known as Level 2 (L2) cache. Unlike Level 1 cache, L2 cache was located on the motherboard in older
computers, although with newer processors it is found on the processor chip.
When the L2 cache is located on the processor, and a cache is also present on the motherboard, that motherboard cache is more appropriately called L3 cache. The L2 cache on the processor chip uses the same die as the CPU, but it is still not part of the core of the CPU.
This cache is not as fast as the L1 cache, but it is only somewhat slower since it is still placed on the same processor chip, and it is still faster than the computer's main memory. The L2 cache is the second place the computer looks when executing instructions.
L3 Cache:
A memory bank built onto the motherboard or within the CPU module. The L3 cache feeds the
L2 cache, and its memory is typically slower than the L2 memory, but faster than main memory.
The L3 cache feeds the L2 cache, which feeds the L1 cache, which feeds the processor.
9.2 CACHE MAPPING
There are three different types of mapping used for the purpose of cache memory which are as
follows: Direct mapping, Associative mapping, and Set-Associative mapping. These are
explained below.
9.2.1 Direct Mapping
The simplest technique, known as direct mapping, maps each block of main memory into only one possible cache line. In direct mapping, each memory block is assigned to a specific line in the cache. If a line is already occupied by a memory block when a new block needs to be loaded, the old block is discarded. An address is split into two parts: an index field and a tag field. The cache stores the tag field, whereas the rest of the address identifies the location within main memory. Direct mapping's performance is directly proportional to the hit ratio.
i = j modulo m
where
i=cache line number
j= main memory block number
m=number of lines in the cache
For purposes of cache access, each main memory address can be viewed as consisting of three
fields. The least significant w bits identify a unique word or byte within a block of main
memory. In most contemporary machines, the address is at the byte level. The remaining s bits specify one of the 2^s blocks of main memory. The cache logic interprets these s bits as a tag of s - r bits (most significant portion) and a line field of r bits. This latter field identifies one of the m = 2^r lines of the cache.
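The field layout just described can be demonstrated with a tiny Python sketch; the widths of the word, line and tag fields below are arbitrary example values rather than those of any particular machine.

```python
# Split a main-memory address into tag, line and word fields for a
# direct-mapped cache. The field widths are illustrative example values.
w = 2          # bits identifying the byte within a block (block size 4 bytes)
r = 4          # bits selecting one of m = 2**r = 16 cache lines
# the block number occupies s = 10 bits, so the tag is s - r = 6 bits

def split(address):
    word = address & ((1 << w) - 1)
    block = address >> w              # the s-bit main-memory block number j
    line = block % (1 << r)           # i = j modulo m
    tag = block >> r                  # the remaining s - r high-order bits
    return tag, line, word

print(split(0b110101101011))          # 12-bit address -> (53, 10, 3)
```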
m = v * k
i = j mod v
where
i = cache set number
j = main memory block number
v = number of sets
m = number of lines in the cache
k = number of lines in each set
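A quick check of the set-mapping formula with made-up numbers:

```python
# i = j mod v : the set a main-memory block maps to (illustrative values only).
v, k = 8, 2                  # 8 sets with 2 lines each -> m = v * k = 16 lines
for j in (0, 5, 13, 21):
    print(f"block {j} -> set {j % v}")   # blocks 5, 13 and 21 all map to set 5
```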
Specification:
LRUs are designed to specifications to assure that they can be interchanged, especially if they are
from different manufacturers. Usually a class of LRUs will have coordinated environmental
specifications (i.e. temperature, condensation, etc.).
However, each particular LRU will also have detailed specifications describing its function, tray
size, tray connectors, attachment points, weight ranges, etc. It is common for LRU trays to have
standardized connections for rapid mounting, cooling air, power, and grounding.
The mounting hardware is often manually removable standard-screw-detent quick-release
fittings. Front-mounted electrical connectors are often jacks for ring-locked cannon plugs that
can be removed and replaced (R&R) without tools. Specifications also define the
supporting tools necessary to remove and replace the unit.
Many require no tools, or a standard-sized Frearson screwdriver. Frearson is specified for some
vehicles and many marine systems because Frearson screws keep their mating screwdriver from
camming out, and the same screwdriver can be used on many sizes of screws. Most LRUs also
have handles, and specific requirements for their bulk and weight. LRUs typically need to be
"transportable" and fit through a door or hatchway.
There are also requirements for flammability, unwanted radio emissions, resistance to damage
from fungus, static electricity, heat, pressure, humidity, condensation drips, vibration, radiation,
and other environmental measurements.
LRUs may be designed to ARINC 700-series standards. The form factor of LRUs complies with the ARINC standards ARINC 404 and ARINC 600.
LRUs are also defined by manufacturers like Airbus and Boeing and by various military
organizations. In the military, electronic LRUs are typically designed to interface according to
data bus standards such as MIL-STD-1553. On the International Space Station, LRUs are
referred to as Orbit Replaceable Units.
LRU Page Replacement Policy
In the Least Recently Used (LRU) page replacement policy, the page that is used least
recently will be replaced.
Implementation:
• Add a register to every page frame that contains the last time the page in that frame was accessed.
• Use a "logical clock" that advances by 1 tick each time a memory reference is made.
• Each time a page is referenced, update its register (see the sketch below).
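A minimal Python sketch of this register-plus-logical-clock scheme (the number of frames and the reference string are assumptions chosen only to illustrate the policy):

```python
# LRU page replacement: each resident page keeps a "register" with the time of
# its last access, and a logical clock ticks on every reference (illustrative).
NUM_FRAMES = 3
last_used = {}        # page currently in a frame -> logical time of last use
clock = 0
faults = 0

def reference(page):
    global clock, faults
    clock += 1                        # the logical clock advances by 1 tick
    if page not in last_used:
        faults += 1                   # page fault
        if len(last_used) == NUM_FRAMES:
            victim = min(last_used, key=last_used.get)   # least recently used
            del last_used[victim]
    last_used[page] = clock           # update this page's register

for p in [7, 0, 1, 2, 0, 3, 0, 4, 2, 3]:
    reference(p)
print("page faults:", faults)         # -> 8
```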
The following figure shows the behavior of the program in paging using the LRU page
replacement policy:
Learning Activity
Building Blocks
In the previous chapters, we have broadly seen the kind of components required for
designing a computer. In this chapter, we address how to design central processing unit of a
simple computer that interacts with a small memory module. Before we proceed further let
us define a problem that our computer is supposed to solve. This is not the only problem it
can handle. Depending on how we program it, we will be able to solve different arithmetic
and logic problems and that is shown towards the end of this chapter through examples. The
purpose of defining a problem is to choose specific hardware components that will serve as
building block of our simple computer.
The problem is: add 10 numbers stored in consecutive locations of memory. Subtract from this total a number stored in the 11th location of memory. Multiply this result by 2 and store it
in the 12th location of memory. All the numbers brought from memory lie between 0 and 9.
Since, the problem says the numbers or data to be fetched from memory and we also know
that programs, i.e. binary coded instructions are also stored in memory, let us divide the
memory used in our computer in two parts. One part stores the program or series of
instructions the computer executes sequentially and this is known as program memory. The
other part houses data that program uses and is also used for storing result. This is called data
memory. From the given problem we find, we need 12 memory locations for data storage.
We expect our computer won’t need more than 20 instructions to complete the given task
hence, a memory with 32 locations (integer power of 2) can be selected for our computer.
Now we try to decide how many bits of information we store in each address location.
Usually, bits in memory locations are stored in multiples of 8, called bytes. Let's see if our job can be done with 8 bits. Each memory location stores data between 0 and 9 on which the program operates, and thus requires only 4 bits. The final result can be at most 10 × 9 × 2 = 180, which requires 8 bits for representation. So the data memory can be 8 bits wide, with which we can represent decimal numbers up to 2^8 - 1 = 255.
Let’s now see the requirement of program memory. There, in each location, certain number
of bits is allocated that defines the instruction to be executed. This is called operation code or
in short, opcode. The rest of the bits can be used for referring the memory location from
which data is to be brought or stored, if required by the instruction. Since 32 memory locations require log2(32) = 5 bits for memory referencing, we'll have 8 - 5 = 3 bits for opcode specification, giving 2^3 = 8 different opcodes (Figure).
We’ll see that 8 instructions are sufficient for the given kind of task in our limited ability
computer. Hence, one important hardware component of our computer gets decided. The
memory to be used is of size 32×8.
The above mode of addressing memory for data is called direct addressing. If the location mentioned in the instruction contains the address from which the actual data is to be brought, it is called indirect addressing. If, after the opcode, the actual data is given in place of an address, it is called immediate addressing. If an instruction requires 2 bytes to be fetched from program memory, it is called a 2-byte instruction. Obviously, in a 2-byte instruction the number of opcodes or the memory-addressing capability can be greater than in a single-byte instruction.
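The instruction-format arithmetic worked out above can be checked with a few lines of Python; the variable names are invented for the example.

```python
# Check the opcode/address split for the 32 x 8 program memory described above.
import math

memory_locations = 32
instruction_width = 8                               # bits per memory location

address_bits = int(math.log2(memory_locations))     # 5 bits to reference 32 words
opcode_bits = instruction_width - address_bits      # 8 - 5 = 3 bits
print(address_bits, opcode_bits, 2 ** opcode_bits)  # -> 5 3 8 (eight opcodes)
```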
Questions:
1. For a more complex computer design, 75 different instructions are required. What size of memory is required?
1. Caches are divided into blocks, which may be of various sizes. The
number of blocks in a cache is usually a power of 2. For now we’ll say
that each block contains one byte.
2. Associative mapping scheme attempts to improve cache utilization, but
at the expense of speed.
SUMMARY
• Cache is a high speed access area that can be either a reserved section of main memory or
a storage device.
• Cache memory lies between the main memory and the CPU and acts as an intermediate storage between them.
• Level 1 caching is alternatively referred to as L1 cache, primary cache, internal cache or
system cache.
• L2 Cache is short for Level 2 caching. L2 is also commonly referred to as secondary
cache or external cache.
• L3 Cache is Cache found on the motherboard instead of the processor on earlier
computers.
• Cache mapping is the method by which the contents of main memory are brought into the cache and referenced by the CPU.
KEYWORD
Cache: A cache is a hardware or software component that stores data so that future requests for
that data can be served faster; the data stored in a cache might be the result of an earlier
computation or a copy of data stored elsewhere.
Cache Memory: Cache memory is a special very high-speed memory. It is used to speed up and synchronize with the high-speed CPU.
LRU: The LRU caching scheme is to remove the least recently used frame when the cache is full
and a new page is referenced which is not there in cache.
Direct Mapped Cache: A cache where the cache location for a given address is determined from
the middle address bits.
Fully Associative Mapping: Fully associative mapping is a cache mapping technique that allows a block of main memory to be mapped to any freely available cache line.
Set Associative Cache: Set-associative cache is a specific type of cache memory that occurs in
RAM and processors.
SELF ASSESSMENT QUESTIONS
Short Answer Questions
1. Define cache.
2. List the types of caches.
3. Illustrate the levels of memory in cache.
4. Write a few performance measures of cache.
CONTENTS
Learning Objectives
Learning Outcomes
Overview
10.1 Concept of Data Transfer
10.1.1 Serial Data Transfer
10.1.2 Parallel Transfer
10.1.3 Serial-to-Parallel or Parallel-to-Serial Conversion
10.2 Full-Duplex–Half Duplex Interaction
10.2.1 Full-duplex
10.2.2 Half-duplex
10.3 Bus Interface
10.3.1 Types of Bus
10.3.2 Interrupt
Summary
Keywords
Self-Assessment Questions
Further Reading
LEARNING OBJECTIVES
After studying this chapter, you will be able to:
• Gain knowledge of data transfer
• Illustrate the serial data transfer and parallel data transfer
• Describe in detail about synchronous and asynchronous transmission
• Illustrate the concept of full duplex-half duplex interaction
• Describe the concept of bus interface
LEARNING OUTCOMES
Upon completion of the chapter, students are able to understand :
• Gain an idea about data transfer methods in a computer
• Parallel transfer
The addition of start and stop bits increases the number of data bits, so more bandwidth is consumed in asynchronous transmission. There is idle time between the transmissions of different data bytes; this idle time is also known as the gap, and it can be of varying length. This mechanism is called asynchronous because at the byte level the sender and receiver need not be synchronized, but within each byte the receiver must be synchronized with the incoming bit stream.
In the absence of start & stop bits, bit synchronization is established between sender & receiver
by ‘timing’ the transmission of each bit.
Since the various bytes are placed on the link without any gap, it is the responsibility of the receiver to separate the bit stream into bytes so as to reconstruct the original information. In order to receive the data error-free, the receiver and sender operate at the same clock frequency.
Application of Synchronous transmission
• Synchronous transmission is used for high speed communication between computers.
Advantage of Synchronous transmission
• This method is faster as compared to asynchronous as there are no extra bits (start bit &
stop bit) and also there is no gap between the individual data bytes.
Disadvantages of Synchronous transmission
• It is costly compared to the asynchronous method. It requires local buffer storage at the two ends of the line to assemble blocks, and it also requires accurately synchronized clocks at both ends. This leads to an increase in cost.
• The sender and receiver have to operate at the same clock frequency. This requires proper
synchronization which makes the system complicated.
10.1.2 Parallel Transmission
Within a computing or communication device, the distances between different subunits are short. Thus, it is normal practice to transfer data between subunits using a separate wire to carry
each bit of data. There are multiple wires connecting each sub-unit and data is exchanged using
a parallel transfer mode. This mode of operation results in minimal delays in transferring each
word.
In parallel transmission, all the bits of data are transmitted simultaneously on separate
communication lines. In order to transmit n bits, n wires or lines are used. Thus each bit has its
own line.
All n bits of one group are transmitted with each clock pulse from one device to
another i.e. multiple bits are sent with each clock pulse. Parallel transmission is used for short
distance communication.
As shown in the fig, eight separate wires are used to transmit 8 bit data from sender to receiver.
2. This method is slower as compared to parallel transmission as bits are transmitted serially one
after the other.
10.2 TRANSMISSION MODES
Transmission mode refers to the way data is transferred between two devices. It is also known as communication mode. Buses and networks are designed to allow communication to occur between individual devices that are interconnected. There are three types of transmission modes:
1. Simplex Mode
In Simplex mode, the communication is unidirectional, as on a one-way street. Only one of the
two devices on a link can transmit, the other can only receive. The simplex mode can use the
entire capacity of the channel to send data in one direction.
Example: Keyboard and traditional monitors. The keyboard can only introduce input, the
monitor can only give the output.
Half-Duplex Mode
In half-duplex mode, each station can both transmit and receive, but not at the same time. When
one device is sending, the other can only receive, and vice versa. The half-duplex mode is used
in cases where there is no need for communication in both direction at the same time. The entire
capacity of the channel can be utilized for each direction.
Example: A walkie-talkie, in which one message is sent at a time but messages are sent in both directions.
Channel capacity=Bandwidth * Propagation Delay
Full-Duplex Mode
In full-duplex mode, both stations can transmit and receive simultaneously. In full duplex
mode, signals going in one direction share the capacity of the link with signals going in other
direction, this sharing can occur in two ways:
• Either the link must contain two physically separate transmission paths, one for sending and
other for receiving.
• Or the capacity is divided between signals travelling in both directions.
Full-duplex mode is used when communication in both directions is required all the time. The capacity of the channel, however, must be divided between the two directions.
Example: Telephone Network in which there is communication between two persons by a
telephone line, through which both can talk and listen at the same time.
Channel Capacity=2* Bandwidth*propagation Delay
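A tiny Python illustration of the two capacity expressions given above; the bandwidth and propagation-delay values are invented for the example.

```python
# Channel capacity as given above: bandwidth * propagation delay,
# doubled for full-duplex operation. The numbers are illustrative only.
bandwidth_bps = 1_000_000          # 1 Mbps link
propagation_delay_s = 0.02         # 20 ms one-way propagation delay

half_duplex = bandwidth_bps * propagation_delay_s
full_duplex = 2 * bandwidth_bps * propagation_delay_s
print(half_duplex, full_duplex)    # -> 20000.0 40000.0 (bits in flight)
```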
Before discussing the types of buses, we will first describe one of the most important aspects of buses. Any bus typically consists of about 50 to 100 separate lines. On any bus, the lines may generally be classified into three functional groups, as depicted in the figure below:
Data Lines:
Data lines provide a path for moving data between system modules. They are bidirectional, which means the data lines are used to transfer data in both directions. As an example, the CPU can read data on these lines from memory as well as send data out on these lines to a memory location or to a port. In any bus, the number of data lines is 8, 16, 32 or more, depending on the size of the bus. These lines are collectively called the data bus.
Address Lines:
Address lines are collectively called the address bus. In any bus, the number of address lines is usually 16, 20, 24, or more, depending on the type and architecture of the bus. On these lines, the CPU sends out the address of the memory location or I/O port that is to be written to or read from. In short, it is an internal channel from the CPU to memory across which the address of the data (not the data itself) is transmitted. Here the communication is one way: the address is sent from the CPU to memory and I/O ports, but memory and I/O ports do not send addresses to the CPU on these lines, and hence these lines are unidirectional.
Control Lines:
Control lines are collectively called the control bus. Control lines are used to transmit and receive control signals between the microprocessor and the various devices attached to it. In other words, control lines are used by the CPU for communicating with other devices within the computer. As an example, the CPU sends signals on the control bus to enable the outputs of addressed memory devices and port devices. Typical control line signals are:
-Memory Read
-Memory Write
-I/O Read
-I/O Write
-Bus Request
-Bus Grant, etc.
Operation of Bus
The Operation of Bus is as follows:
If one module wishes to send data to another, it must do two things:
1. Obtain the use of the bus.
2. Transfer the data over the bus.
If one module wishes to request data from another module, it must obtain the use of the bus and transfer a request to the other module over the appropriate control and
address lines. It must then wait for that second module to send the data.
10.3.1 Types of Bus
There are a variety of buses, but here we will describe only those that are widely used.
System Bus:
A bus that connects the major computer components (processor, memory, I/O) is called a system bus. It is the single bus that connects all these components of a computer system, and it is the only bus in which data lines, address lines and control lines are all present. It is also known as the "front side" bus. It is faster than a peripheral bus (PCI, ISA, etc.) but slower than the backside bus.
Peripheral Bus (I/O Bus /External Bus):
A peripheral bus is also known as an "I/O bus". It is a data pathway that connects peripheral devices to the CPU. In other words, in computing, a peripheral bus is a computer bus designed to support computer peripherals like printers and hard drives. The PCI and USB buses are commonly used peripheral buses and are found in many PCs today. We will now discuss both of them in brief:
PCI(Peripheral Component Interconnect): PCI Bus connects the CPU and expansion
boards such as modem cards, network cards and sound cards. These expansion boards are
normally plugged into expansion slots on the motherboard. That's why PCI bus is also known as
expansion bus or external Bus.
USB(Universal Serial Bus): Universal Serial Bus is used to attach USB devices like Pen Drive,
etc to CPU.
Local Bus:
Local buses are the traditional I/O (peripheral) buses such as the ISA, MCA, or EISA buses. We will now discuss each of them in brief, one by one.
ISA (Industry Standard Architecture) Bus: The ISA bus permits bus mastering, i.e., it enables a peripheral connected directly to the bus to communicate directly with other peripherals without going through the processor. One consequence of bus mastering is Direct Memory Access. Until the end of the 1990s almost all PCs were equipped with an ISA bus, but it was progressively replaced by the PCI bus, which offers better performance.
MCA(Micro Channel Architecture): It is an improved proprietary bus designed by IBM in
1987 to be used in their PS/2 lines of computers.
EISA (Extended Industry Standard Architecture): The EISA bus uses connectors that are the same size as the ISA connectors but with 4 rows of contacts instead of 2, for 32-bit addressing.
High Speed Bus:
A high-speed bus is specifically designed to support high-capacity I/O devices. A high-speed bus
brings high-demand devices into closer integration with the processor. This bus supports connections to high-speed LANs, such as Fast Ethernet at 100 Mbps, video and graphics workstations, FireWire, etc.
10.3.2 Interrupts
Interrupt is a signal emitted by hardware or software when a process or an event needs
immediate attention. It alerts the processor to a high-priority process requiring interruption of the
current working process. In I/O devices, one of the bus control lines is dedicated for this purpose and is called the interrupt request line; the routine the processor executes in response to an interrupt is called the Interrupt Service Routine (ISR).
When a device raises an interrupt at, let's say, instruction i, the processor first completes the execution of instruction i. Then it loads the Program Counter (PC) with the address of the first instruction of the ISR. Before loading the Program Counter with this address, the address of the interrupted instruction is moved to a temporary location. Therefore, after handling the interrupt the processor can continue with instruction i+1.
While the processor is handling the interrupts, it must inform the device that its request has been
recognized so that it stops sending the interrupt request signal. Also, saving the registers so that
the interrupted process can be restored in the future, increases the delay between the time an
interrupt is received and the start of the execution of the ISR. This is called Interrupt Latency.
Hardware Interrupts:
In a hardware interrupt, all the devices are connected to the Interrupt Request Line. A single
request line is used for all the n devices. To request an interrupt, a device closes its associated
switch. When a device requests an interrupt, the value of INTR is the logical OR of the requests
from individual devices.
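A one-line model of the shared request line described above (the request values are made up):

```python
# INTR is the logical OR of the individual device interrupt requests (toy model).
device_requests = [False, True, False]   # three devices sharing one request line
INTR = any(device_requests)
print(INTR)                              # -> True: at least one device needs service
```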
The sequence of events involved in handling an IRQ:
• Devices raise an IRQ.
• The processor interrupts the program currently being executed.
• The device is informed that its request has been recognized and the device deactivates the
request signal.
• The requested action is performed.
• An interrupt is enabled and the interrupted program is resumed.
Handling Multiple Devices:
When more than one device raises an interrupt request signal, then additional information is
needed to decide which device to be considered first. The following methods are used to decide
which device to select: Polling, Vectored Interrupts, and Interrupt Nesting. These are explained
as following below.
Polling: In polling, the first device encountered with the IRQ bit set is the device that is to be
serviced first. The appropriate ISR is called to service that device. It is easy to implement, but a lot of time is wasted by interrogating the IRQ bit of all the devices.
Learning Activity
SUMMARY
• A data transfer operation communicates signals or data bits between two machines.
• Serial transfer means that the data is moved along a single line one bit at a time.
• Parallel transfer means that each bit of data is carried on its own line and that all bits move simultaneously, as they did in the parallel register.
• Asynchronous transmission sends only one character at a time, where a character is either a letter of the alphabet, a number, or a control character.
• Synchronous transmission does not use start and stop bits.
• Serial-to-parallel conversion or parallel-to-serial conversion describes the way in which data is stored in a storage device and the way in which that data is taken out of the storage device.
KEYWORD
Data transfer: Data transfer refers to the secure exchange of large files between systems or
organizations.
Serial Transfer: Serial Transfer means that the data is moved along a single line one bit at a
time.
Parallel Transfer: Parallel Transfer means that each bit of data is moved on its own line and
that all bits transfer simultaneously as they did in the parallel register.
Asynchronous transmission: It sends only one character at a time where a character is either
a letter of the alphabet or number or control character.
Synchronous transmission: It is a method in which bit stream is combined into longer
frames that may contain multiple bytes.
Full-duplex: In full duplex communication, data can travel in both directions
simultaneously.
Half-duplex: A half-duplex channel can send and receive, but not at the same time.
Bus: A bus is a high-speed internal connection. Buses are used to send control signals and data
between the processor and other components.
Bit Synchronization: Bit-synchronous operation is a type of digital communication in which
the data circuit-terminating equipment (DCE), data terminal equipment (DTE), and transmitting
circuits are all operated in bit synchronism with a clock signal.
SELF ASSESSMENT QUESTIONS
Short Answer Questions
1. Write the concept of data transmission.
2. Compare the different types of transmission modes
3. Define bus interface
4. List out the types of bus interface
5. Recall interrupt.
6. What is meant by hardware interrupt?
Long Answer Questions
1. Describe in brief about data transmission.
2. Explain bus interface and its types.
3. Discuss in detail about the transmission mode with example.
4. Write concept of interrupt in brief.
FURTHER READING
1. Mano, M. Morris, and M. Morris Mano. 2007. “Digital design”. Englewood Cliffs, N.J.:
Prentice-Hall.
2. Godse A.D. 2008. “Computer Organisation and Architecture”, Technical Publications.
3. Harris David. 2012. “Digital Design and Computer Architecture”, Elsevier.
4. Ghoshal Subrata. 2011. “Computer Architecture and Organization”, Pearson Education.
5. Gill S. Naseeb. 2011. “Digital Design and Computer Organisation”, Firewall Media.
CONTENTS
Learning Objectives
Learning Outcomes
Overview
11.1 Programmed Input/Output
11.1.1 Polling
11.2 Interrupt Driven I/O
11.3 Hardware Interrupt Mechanism
11.3.1 Hardware Interrupt Common Terms
11.4 Multi Level of Interrupts
11.5 Direct Memory Access (DMA)
11.5.1 DMA and Interrupt Breakpoints
11.5.2 DMA Controlled Operation
11.5.3 Buffer Chaining
11.5.4 Operation Chaining
Summary
Keywords
Self-Assessment Questions
Further Reading
LEARNING OBJECTIVES
After studying this chapter, you will be able to:
• Concept of programmed Input/Output
• Identify interrupt driven I/O
• How the hardware interrupt mechanism works
• Concept of multi-level of interrupts and direct memory access
LEARNING OUTCOMES
Upon completion of the chapter, students will be able to:
• Recall the programmed input/output strategy
• Understand the idea of interrupt driven I/O
The I/O commands that the processor issues to the I/O module can be of four types.
1. Control: This I/O command activates the I/O module addressed by the processor and
directs it to the task it has to perform. This command can be customized depending upon
the type of peripherals.
2. Test: This I/O command tests the status of the I/O module and its peripherals to ensure
that the addressed peripheral is powered on and available for the task. This command also
tests whether the most recent I/O operation has completed successfully or if any error has
occurred.
3. Read: This I/O command lets the I/O module extract the data from the corresponding
peripheral and store it in its internal buffer. Further, the I/O module can place this data
over the data bus on the processor’s demand.
4. Write: This I/O command lets the I/O module accept the data over the data bus and
transmit it to the corresponding peripheral.
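These four command types can be pictured with a small register-level sketch. The following C fragment is only an illustration: the register addresses, bit positions and the issue_io_command() helper are hypothetical, not taken from any particular I/O module.

```c
/* Minimal sketch of the four I/O command types described above.
 * The register layout and bit positions are hypothetical. */
#include <stdint.h>

typedef enum { IO_CONTROL, IO_TEST, IO_READ, IO_WRITE } io_command_t;

/* Hypothetical memory-mapped command/status registers of one I/O module. */
#define IO_MODULE_BASE  0x40001000u
#define IO_CMD_REG    (*(volatile uint32_t *)(IO_MODULE_BASE + 0x0))
#define IO_STATUS_REG (*(volatile uint32_t *)(IO_MODULE_BASE + 0x4))
#define STATUS_READY  0x1u   /* peripheral powered on and available */
#define STATUS_ERROR  0x2u   /* most recent operation failed        */

/* Issue a command to the module; returns 0 on success, -1 on error. */
static int issue_io_command(io_command_t cmd)
{
    IO_CMD_REG = (uint32_t)cmd;               /* Control / Test / Read / Write */
    if (cmd == IO_TEST) {
        if (!(IO_STATUS_REG & STATUS_READY))  /* peripheral not available */
            return -1;
        if (IO_STATUS_REG & STATUS_ERROR)     /* last operation had an error */
            return -1;
    }
    return 0;
}
```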
11.1.1 I/O Instructions
• The I/O instruction encountered by the processor is fetched from main memory, and to
execute this I/O instruction the processor issues the corresponding I/O command to the
I/O device. The I/O instruction can thereby be mapped directly onto an I/O command;
usually there is a simple one-to-one relationship between I/O instructions and I/O
commands.
• The I/O instruction can also be customized depending on the peripheral addressed. As
discussed above, the external device or peripheral must recognize that the processor is
addressing it and has issued it an I/O command.
• So, when the processor, main memory, and I/O module share a common bus, addressing
can be achieved in two ways: memory-mapped I/O and isolated I/O.
• With memory-mapped I/O, the processor accesses memory and I/O using a single
address space. Here the processor uses the same address, data, and control buses, so the
same set of machine instructions addresses both memory and I/O.
• With isolated I/O, the address space for memory is isolated from the address space for
I/O. Though the processor uses the same data and address lines for memory and I/O
devices, it uses separate control lines for memory and I/O.
• Memory-mapped I/O offers a larger repertoire of I/O instructions than isolated I/O,
because any memory-reference instruction can address a device register.
• So, with programmed I/O, for every I/O transfer or I/O operation a program is written
to implement the task. The other two methods of I/O transfer, i.e. interrupt-driven I/O
and DMA, involve the use of interrupts. The two addressing schemes are sketched below.
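The difference between the two addressing schemes can be sketched in C. The addresses below and the io_port_write() helper are hypothetical; the point is only that memory-mapped I/O uses ordinary load/store instructions while isolated I/O needs dedicated port instructions (such as IN/OUT on x86).

```c
/* Sketch contrasting memory-mapped and isolated I/O. Addresses and the
 * io_port_write() helper are hypothetical placeholders. */
#include <stdint.h>

/* Memory-mapped I/O: the device register lives in the ordinary address
 * space, so a normal store instruction writes it. */
#define UART_DATA (*(volatile uint8_t *)0x4000C000u)

static void mmio_send(uint8_t byte)
{
    UART_DATA = byte;            /* same machine instructions as a memory write */
}

/* Isolated I/O: a separate I/O address space reached only through
 * dedicated I/O instructions, wrapped here in a hypothetical helper. */
extern void io_port_write(uint16_t port, uint8_t value);

static void isolated_send(uint8_t byte)
{
    io_port_write(0x3F8, byte);  /* 0x3F8 is the classic PC COM1 data port */
}
```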
11.1.2 Polling
Polling is the process in which the computer or controlling device actively and repeatedly
checks an external device for its readiness or state, often at the level of low-level hardware.
Example: When a printer is connected via a parallel port, the computer waits until the printer has
received the next character. These processes can be as minute as only reading one bit. This is
sometimes used synonymously with 'busy-wait' polling. In this situation, when an I/O operation
is required, the computer does nothing other than check the status of the I/O device until it is
ready, at which point the device is accessed. In other words, the computer waits until the device
is ready. Polling also refers to the situation where a device is repeatedly checked for readiness,
and if it is not, the computer returns to a different task. Although not as wasteful of CPU cycles
as busy waiting, this is generally not as efficient as the alternative to polling, interrupt-
driven I/O.
Algorithm
Polling can be described in the following steps:
Host actions:
• The host repeatedly reads the busy bit of the controller until it becomes clear (with a
value of 0).
• When clear, the host writes the command into the command register. If the host is
sending output, it sets the write bit and writes a byte into the data-out register. If the host
is receiving input, it reads the controller-written data from the data-in register, and sets
the read bit to 0 as the next command.
• The host sets the command-ready bit to 1.
Controller actions:
• When the controller notices that the command-ready bit is set, it sets the busy bit to 1.
• The controller reads the command register. If the write bit inside is set, it reads from the
data-out register and performs the necessary I/O operations on the device. If the read bit
is set, data from the device is loaded into the data-in register for the host to read.
• Once the operations are over, the controller clears the command-ready bit, clears the error
bit to show the operation was successful, and clears the busy bit.
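The host side of this handshake can be summarised in a short busy-wait routine. The register layout and bit positions below are hypothetical placeholders; only the sequence of steps follows the algorithm above.

```c
/* Host-side busy-wait polling, following the steps above.
 * Register addresses and bit positions are hypothetical. */
#include <stdint.h>

#define CTRL_BASE     0x40002000u
#define STATUS_REG   (*(volatile uint32_t *)(CTRL_BASE + 0x0))
#define COMMAND_REG  (*(volatile uint32_t *)(CTRL_BASE + 0x4))
#define DATA_OUT_REG (*(volatile uint32_t *)(CTRL_BASE + 0x8))

#define BUSY_BIT          0x1u
#define COMMAND_READY_BIT 0x2u
#define WRITE_BIT         0x4u

static void polled_write_byte(uint8_t byte)
{
    /* Step 1: spin until the controller clears the busy bit. */
    while (STATUS_REG & BUSY_BIT)
        ;                               /* busy-wait: wastes CPU cycles */

    /* Step 2: place the data and the command (write bit set). */
    DATA_OUT_REG = byte;
    COMMAND_REG  = WRITE_BIT;

    /* Step 3: tell the controller a command is ready. */
    STATUS_REG |= COMMAND_READY_BIT;
}
```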
11.2 INTERRUPT DRIVEN I/O
Interrupt-driven I/O is an alternative scheme for dealing with I/O. Interrupt I/O is a way of
controlling input/output activity whereby a peripheral or terminal that needs to make or receive a
data transfer sends a signal, causing a program interrupt to be set. At a time appropriate to the
priority level of the I/O interrupt relative to the total interrupt system, the processor enters
an interrupt service routine. The function of the routine will depend upon the system of interrupt
levels and priorities that is implemented in the processor. The interrupt technique requires more
complex hardware and software, but makes far more efficient use of the computer’s time and
capacities. Figure 11.1 shows simple interrupt processing.
For input, the device interrupts the CPU when new data has arrived and is ready to be retrieved
by the system processor. The actual actions to perform depend on whether the device uses I/O
ports or memory mapping.
For output, the device delivers an interrupt either when it is ready to accept new data or to
acknowledge a successful data transfer. Memory-mapped and DMA-capable devices usually
generate interrupts to tell the system they are done with the buffer.
• How does the processor decide which module to process when multiple interrupts have
occurred?
There are four main ways to address these problems:
• Multiple Interrupt Lines
• Software Poll
• Daisy Chain (Hardware Poll, Vectored)
• Bus Arbitration (Vectored)
These are the four general categories of techniques commonly used for interrupt-driven
I/O.
Multiple Interrupt Lines
As the name suggests, we provide multiple interrupt lines between the processor and the I/O
modules. This allows multiple modules to be handled at the same time. However, it is not
practical to assign many bus lines and processor pins to interrupt lines, so in practice there
might be more than one I/O module attached to a single line, which defeats the purpose of
this technique. The three techniques below are therefore usually used together with this one to
rectify its problems.
Software Poll
Whenever an interrupt is detected by the processor, it branches to an interrupt service routine
which polls each and every I/O module to determine the exact interrupting module. The
processor raises a poll, which could be in the form of a separate command line; the address
of the I/O module being interrogated by the poll is placed on the address lines, and the
module responds positively if it is the one that set the interrupt. Alternatively, each I/O
module has an addressable status register, which the processor can read to determine the
interrupting module. After that, the processor branches to the specific device-service routine.
The downside of this technique is that it is time consuming. A sketch of such a poll loop is
given below.
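A minimal sketch of such a poll loop follows, assuming each I/O module exposes an addressable status register at a known (here hypothetical) address and that a device_service_routine() exists for each module.

```c
/* Software-poll interrupt service routine: read each module's status
 * register in turn until the interrupting module is found. All names
 * and addresses are hypothetical. */
#include <stdint.h>

#define NUM_MODULES 4
#define IRQ_PENDING 0x1u

/* Hypothetical status-register addresses, one per I/O module. */
static volatile uint32_t *const status_reg[NUM_MODULES] = {
    (volatile uint32_t *)0x40010000u, (volatile uint32_t *)0x40011000u,
    (volatile uint32_t *)0x40012000u, (volatile uint32_t *)0x40013000u,
};

extern void device_service_routine(int module);   /* assumed per-device handler */

void software_poll_isr(void)
{
    /* Time consuming: in the worst case every module is examined. */
    for (int m = 0; m < NUM_MODULES; m++) {
        if (*status_reg[m] & IRQ_PENDING) {
            device_service_routine(m);
            return;
        }
    }
}
```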
Daisy Chain (Hardware Poll, Vectored)
This is actually a hardware poll. The interrupt acknowledge line is daisy chained through all the
modules. Whenever there is an interrupt, the processor sends out an interrupt acknowledge which
propagates through the series of I/O modules until it reaches a requesting module. That module
responds by placing a word on the data lines. The word is known as the vector and can be either
the address of the module or some other unique identifier. The processor then uses this vector to
branch directly to the specific device-service routine for that module. This technique is also
known as vectored interrupt; it removes the need for a general interrupt-service routine that
polls every module to find the source of the interrupt.
Bus Arbitration (Vectored)
The last method which also utilizes vectored interrupts is bus arbitration. Here an I/O module
must first gain control of the bus before it can raise the interrupt request, so only one module
can raise the request at a time. When the processor detects the interrupt it sends an acknowledge
signal, and the requesting module then places its vector on the data lines.
However, when multiple interrupts occur at the same time, priorities must be assigned.
Examples of Interrupt Structures
Intel 82C59A
11.3.1 Hardware Interrupt Common Terms
Terms you might hear associated with hardware interrupts are ISR, interrupt mask, non-maskable
interrupt, asynchronous event, interrupt vector and context switching.
ISR
An ISR is simply another program function. It is no different from any other function except that
it performs context switching (saving and restoring processor registers) and, at the end of the
function, re-enables interrupts.
After the ISR finishes, program execution returns to the original program, which continues
exactly from where it was interrupted. The original program will have no idea that this has
happened.
Hardware Interrupt Vector
This is a fixed address that contains the location of your ISR – for a PIC micro it is usually
address 4. In other micros there may be an interrupt vector for each interrupt source – you have
to write an interrupt routine for each one that you want to use. For PIC micros you have just one
interrupt vector, and you have to detect which interrupt was triggered by examining the interrupt
flag register(s).
You program the interrupt vector with the address of your interrupt routine. Whenever the
interrupt is triggered (and if the interrupt is unmasked), program operation jumps to the location
of your interrupt routine. A sketch of such a single-vector ISR is given below.
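A single-vector ISR of this kind might be organised as in the following sketch. The flag-register address and flag bits are hypothetical and stand in for the interrupt flag register(s) of a real microcontroller.

```c
/* Single-entry ISR that discovers which source fired by examining an
 * interrupt flag register. Register address and flag bits are
 * hypothetical, not taken from a specific datasheet. */
#include <stdint.h>

#define INT_FLAG_REG (*(volatile uint8_t *)0x0Bu)  /* assumed flag register */
#define TIMER_FLAG   0x04u
#define EXT_FLAG     0x02u

extern void handle_timer(void);     /* assumed per-source handlers */
extern void handle_external(void);

void interrupt_service_routine(void)   /* placed at the single interrupt vector */
{
    if (INT_FLAG_REG & TIMER_FLAG) {
        INT_FLAG_REG &= (uint8_t)~TIMER_FLAG;   /* clear the flag first */
        handle_timer();
    }
    if (INT_FLAG_REG & EXT_FLAG) {
        INT_FLAG_REG &= (uint8_t)~EXT_FLAG;
        handle_external();
    }
    /* On return, execution resumes exactly where it was interrupted. */
}
```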
NMI
The NMI is exactly the same as a normal interrupt except that you cannot control whether or not
it is active – it is always active. It is more commonly found on older/larger processors as a
dedicated input pin. In that case it is most likely fed by a brown-out-detector (BOD)
circuit, i.e. a PSU voltage-dip detector, for detecting a condition that should never be
ignored!
You don't need it in a microcontroller as you can achieve exactly the same functionality using a
Interrupt Mask
The interrupt mask has a set of bits identical to those in the interrupt register. Setting any bit (or
unmasking) lets the corresponding signal source generate an interrupt - causing the processor to
11.4 Multi Level of Interrupts
In one scheme for handling multiple levels of interrupts, no other interrupts are allowed while
the current interrupt is being handled, at least until the process is passed to a user-level interrupt
handling procedure, as will be described.
11.5 Direct Memory Access (DMA)
Explanation:
The CPU initializes the DMA controller by sending the following information through the data
bus; a register-level sketch follows the list.
• The starting address of the memory block where the data is available (for a read) or where
data is to be stored (for a write).
• The word count, which is the number of words in the memory block to be read or
written.
• Control bits to define the mode of transfer, such as read or write.
• A control bit to begin the DMA transfer.
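The initialisation sequence can be sketched as a few register writes. The register map and bit values below are hypothetical; they only mirror the four items listed above.

```c
/* Sketch of DMA initialisation: the CPU writes the starting address,
 * the word count and the transfer mode, then sets a start bit.
 * Register layout is hypothetical. */
#include <stdint.h>

#define DMA_BASE       0x40003000u
#define DMA_ADDR_REG  (*(volatile uint32_t *)(DMA_BASE + 0x0))
#define DMA_COUNT_REG (*(volatile uint32_t *)(DMA_BASE + 0x4))
#define DMA_CTRL_REG  (*(volatile uint32_t *)(DMA_BASE + 0x8))

#define DMA_MODE_READ  0x0u    /* memory -> device */
#define DMA_MODE_WRITE 0x1u    /* device -> memory */
#define DMA_START      0x2u

static void dma_start_transfer(uint32_t mem_addr, uint32_t word_count,
                               uint32_t mode)
{
    DMA_ADDR_REG  = mem_addr;            /* starting address of the block */
    DMA_COUNT_REG = word_count;          /* number of words to move */
    DMA_CTRL_REG  = mode | DMA_START;    /* read/write mode, then begin */
    /* The CPU is now free; the controller interrupts it when the
       word count reaches zero (terminal count). */
}
```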
11.5.1 DMA and Interrupt Breakpoints
A DMA interface transfers a complete block of data one word at a time directly to or from
memory without going through the processor. When the transfer is complete, the DMA interface
transmits an interrupt signal to the processor, so processor involvement in DMA can be restricted
to the beginning and end of the transfer, as shown in the figure above. The question, however, is
when DMA should take control of the bus. For this we recall how the processor executes an
instruction. The figure below shows the five cycles of an instruction execution, the five points at
which a DMA request can be responded to, and the single point at which an interrupt request can
be responded to. Note that an interrupt request is acknowledged only at one point of an
instruction cycle, namely the interrupt cycle.
The DMA controller then loads its own channel parameters from memory. Generally, the more
sophisticated the DMA controller, the less servicing the CPU has to perform. A DMA controller
has one or more status registers that are read by the CPU to determine the state of each DMA
channel. The status register typically indicates whether a DMA request is asserted on a channel
and whether a channel has reached terminal count (TC). Reading the status register often clears
the terminal-count information in the register, which leads to problems when multiple programs
are trying to use different DMA channels.
11.5.3 Buffer Chaining
With interrupts, the processor is occasionally interrupted from executing the main
program to store incoming data in a buffer for later retrieval and processing. With
DMA, a dedicated data transfer device reads incoming data from a device and stores that data
in a system memory buffer for future retrieval by the processor. DMA controllers need
reprogramming when a DMA channel reaches TC. Hence, DMA controllers need some CPU
time, but far less than is needed for the CPU to service device I/O interrupts. When a DMA
channel reaches TC, the processor may need to reprogram the controller for further DMA
transfers. Some DMA controllers interrupt the processor whenever a channel ends. DMA
controllers also possess mechanisms for automatically reprogramming a DMA channel
when the DMA transfer sequence finishes.
These mechanisms include auto-initialisation and buffer chaining. Buffer chaining is
valuable for transferring blocks of data into non-contiguous buffer regions or for handling
double-buffered data acquisition. With buffer chaining, a channel interrupts the CPU and is
programmed with the next address and count parameters while DMA transfers are still
being conducted on the current buffer. Some DMA controllers minimise CPU intervention
further by providing a chain address register that points to a chain control table in
memory; the DMA controller then loads its own channel parameters from that table.
In buffer chaining mode, multiple buffers can be filled or emptied with minimum CPU
intervention and with no delay between buffers. Ring buffer mode is an improvement on
auto-initialisation mode: a stop register can be programmed with an address that is
compared against the present address register, and if a match is detected the DMA
controller stops transfers until the stop register is reprogrammed. Ring buffer mode can be
used with auto-initialisation to manage a circular buffer and stop data from being
overwritten unintentionally. A sketch of a chain control table is given below.
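A chain control table of the kind mentioned above might be laid out as in the following sketch. The structure layout and the chain-address register are hypothetical, but they show how the controller can reload the next buffer's address and count without CPU help.

```c
/* Sketch of a chain control table for buffer chaining. The controller
 * is assumed to read the next (address, count) pair from this table
 * when a channel reaches terminal count; the layout is hypothetical. */
#include <stdint.h>

struct chain_entry {
    uint32_t buffer_addr;   /* where the next buffer lives in memory */
    uint32_t word_count;    /* its length in words */
};

#define NUM_BUFFERS 4
static struct chain_entry chain_table[NUM_BUFFERS];

#define DMA_CHAIN_ADDR_REG (*(volatile uint32_t *)0x4000300Cu)  /* assumed register */

static void program_buffer_chain(uint32_t base, uint32_t buf_words)
{
    /* Describe four buffers; here they are contiguous, but they could
       equally be scattered, which is the point of chaining. */
    for (int i = 0; i < NUM_BUFFERS; i++) {
        chain_table[i].buffer_addr = base + (uint32_t)i * buf_words * 4u;
        chain_table[i].word_count  = buf_words;
    }
    /* Point the controller at the table; it reloads each entry itself. */
    DMA_CHAIN_ADDR_REG = (uint32_t)(uintptr_t)chain_table;
}
```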
11.5.4 Operation Chaining
DMA operations enable the transfer of data from a source to a destination without needing
a host processor to regulate the data transfer itself.
Example: Examples of source include a memory, input/output (I/O) device, etc.
Examples of destination include another memory, I/O device, etc. Transfers may be between
operations (“dynamic DMA chaining”). Linked lists of control blocks (also referred to
as “chains” or “control-block chains”) that permit such appending of extra control
blocks are regarded as “dynamic”. In contrast, chains that are fully described before the start of
the corresponding DMA operation and cannot be appended to before that operation completes
are regarded as “static”. A sketch of such a chain is given below.
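A dynamic control-block chain can be pictured as a simple linked list of transfer descriptors, as in the sketch below. The field names are hypothetical; the essential property is that new blocks can be appended while earlier ones are still being processed, which a static chain would not allow.

```c
/* Sketch of a dynamic control-block chain: each block describes one
 * DMA operation and points to the next. Layout and field names are
 * hypothetical. */
#include <stddef.h>
#include <stdint.h>

struct dma_control_block {
    uint32_t src;                       /* source address */
    uint32_t dst;                       /* destination address */
    uint32_t length;                    /* bytes to transfer */
    struct dma_control_block *next;     /* NULL terminates the chain */
};

/* Append a block to a chain that may already be executing
 * ("dynamic DMA chaining"); a static chain would forbid this once
 * the operation has started. */
static void chain_append(struct dma_control_block *head,
                         struct dma_control_block *blk)
{
    struct dma_control_block *cur = head;
    while (cur->next != NULL)
        cur = cur->next;                /* walk to the current tail */
    blk->next = NULL;
    cur->next = blk;                    /* link in the new operation */
}
```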
Learning Activity
Interrupt in 8051
Interrupts are events that temporarily suspend the main program, pass control to an external
source and execute its task; control then returns to the main program where it left off.
The 8051 has five interrupt sources, i.e. INT0, TF0, INT1, TF1 and RI/TI. Each interrupt can be
enabled or disabled by setting bits of the IE register, and the whole interrupt system can be
disabled by clearing the EA bit of the same register.
IE (Interrupt Enable) Register
This register is responsible for enabling and disabling interrupts. The EA bit is set to 1 to enable
interrupts and cleared to 0 to disable them. Its bit sequence and the meaning of each bit are
shown in the following figure, and a small C sketch of setting these bits is given below.
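As a rough illustration, the IE bits can be manipulated from C as in the sketch below. It uses the conventional 8051 bit layout (EA = bit 7, ES = bit 4, ET1 = bit 3, EX1 = bit 2, ET0 = bit 1, EX0 = bit 0) and a hypothetical register macro instead of compiler-specific SFR syntax.

```c
/* Sketch of enabling 8051 interrupts through the IE register.
 * The pointer-style IE_REG macro is a stand-in for the usual
 * compiler-specific SFR declaration. */
#include <stdint.h>

#define IE_REG (*(volatile uint8_t *)0xA8u)   /* IE resides at SFR address A8H */

#define IE_EA  0x80u   /* global enable: 1 = interrupts allowed   */
#define IE_ET0 0x02u   /* enable timer 0 overflow interrupt (TF0) */
#define IE_EX0 0x01u   /* enable external interrupt 0 (INT0)      */

static void enable_timer0_and_int0(void)
{
    IE_REG |= IE_ET0 | IE_EX0;   /* unmask the two sources */
    IE_REG |= IE_EA;             /* then open the global gate */
}

static void disable_all_interrupts(void)
{
    IE_REG &= (uint8_t)~IE_EA;   /* clearing EA masks everything */
}
```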
Questions
• Discuss the interrupt mechanism of the 8051.
• Discuss the significance of the bits in the IP register.
SUMMARY
• In the programmed input/output method, the CPU controls the entire I/O transaction by
executing a sequence of I/O instructions included in a program.
• Polling is a form of foreground data acquisition in which the processor is dedicated to
acquiring the incoming data, often by waiting in a loop.
• The term interrupt is used loosely for any exceptional event that causes a temporary
transfer of control of the CPU from one program to another in response to the event
causing the interrupt.
• Interrupts can be generated by various sources internal or external to the CPU.
• DMA is the transfer of data between a device and memory, or from one device to another,
without using the CPU.
• The three primary data transfer mechanisms for computer-based data acquisition are
polling (also known as programmed I/O), interrupts, and direct memory access (DMA).
KEYWORDS
PIO: Programmed input/output (PIO) is a method of transferring data between the CPU and
a peripheral, such as a network adapter or an ATA storage device.
Polling: While waiting, the CPU must repeatedly check the status of the I/O module; this
process is known as polling.
Interrupt: An interrupt is a response by the processor to an event that needs attention
from the software.
Daisy Chaining: A daisy chain is a wiring scheme in which multiple devices are wired together
in sequence or in a ring, similar to a garland of daisy flowers.
Interrupt vector: An interrupt vector is the memory location of an interrupt handler, which
prioritizes interrupts and saves them in a queue if more than one interrupt is waiting to be
handled.
Direct Memory Access: Direct memory access is a feature of computer systems that allows
certain hardware subsystems to access main system memory independently of the central
processing unit.
SELF ASSESSMENT QUESTIONS
Short Answer Questions
1. List the different types of I/O commands.
2. Define polling.
3. Write about hardware interrupt.
4. What do you mean by direct memory access?
5. Discuss the benefits of using interrupts
Long Answer Questions
1. Describe the modes of data transfers with examples.
2. Explain the key issues in interrupt-driven input/output and also discuss the solutions
to these problems.
3. Each hardware interrupt source has an associated interrupt flag. Elaborate.
4. Illustrate a method for handling an interrupt when multiple levels of interrupts are
supported.
5. Discuss in detail about the operation of multilevel of interrupt.
6. Describe the data transfer modes of Direct Memory Access (DMA).
FURTHER READING
1. Hamacher, Carl. 2002. “Computer Organization”, 5th Edition, Tata McGraw Hill.
PART-C (1*15=15)
(Each answer should be written in a minimum of 6 pages with a minimum of 25 lines per page)
16. a. Discuss the role of DreamWorks Animation in delivering the 3D animated film
Kung Fu Panda 2 in 2011. (15)
(OR)
16. b. Represent each of the AND, OR, NOT, and XOR gates using only NOR
gates. (15)
REFERENCES:
1. Mano, M. Morris. 2007. “Digital Design”. Englewood Cliffs, N.J.: Prentice-Hall.
2. Karim, Mohammad A., and Xinghao Chen. 2007. “Digital Design: Basic Concepts and Principles”.
3. Gothmann, William H. 1977. “Digital Electronics: An Introduction to Theory and Practice”.
4. Mano, M. Morris. 2007. “Computer System Architecture”, 3rd Edition.
5. Sinha, P. K. 2004. “Computer Fundamentals”.
6. Comer, Douglas E. 2012. “Essentials of Computer Architecture”, Sixth Edition, Pearson.
7. Godse, A. D. 2008. “Computer Organisation and Architecture”, Technical Publications.
8. Harris, David. 2012. “Digital Design and Computer Architecture”, Elsevier.
9. Mano, Morris. 2012. “Logic and Computer Design Fundamentals”, Cram101.
10. Hamacher, Carl. 2002. “Computer Organization”, 5th Edition, Tata McGraw Hill.
11. Hennessy, John L. 2012. “Computer Architecture: A Quantitative Approach”, Elsevier.
12. Patterson, David A. 2011. “Computer Organization and Design”, Elsevier.
13. Tanenbaum, Andrew S. 2012. “Structured Computer Organization”, Prentice Hall PTR.
14. Wang, Paul Shuangbao. 2012. “Computer Architecture and Security”, John Wiley & Sons.
15. Ghoshal, Subrata. 2011. “Computer Architecture and Organization”, Pearson Education.
16. Gill, Naseeb S. 2011. “Digital Design and Computer Organisation”, Firewall Media.