Bcs304module5slides 241029150507 B6e5bfba

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 41

SIR M VISVESVARAYA INSTITUTE OF TECHNOLOGY

Krishnadevarayanagar, Hunasamaranahalli, Yelahanka, International


Airport Road, Bengaluru - 562 157

Data Structures and Applications


BCS304
Syllabus
Module – 1
INTRODUCTION TO DATA STRUCTURES: Data Structures, Classifications (Primitive & Non-Primitive),
Data structure Operations, Review of pointers and dynamic Memory Allocation,
ARRAYS and STRUCTURES: Arrays, Dynamic Allocated Arrays, Structures and Unions, Polynomials,
Sparse Matrices, representation of Multidimensional Arrays, Strings
STACKS: Stacks, Stacks Using Dynamic Arrays, Evaluation and conversion of Expressions
Text Book: Chapter-1:1.2 Chapter-2: 2.1 to 2.7 Chapter-3: 3.1,3.2,3.6
Reference Book 1: 1.1 to 1.4 8 Hours

Module – 2
QUEUES: Queues, Circular Queues, Using Dynamic Arrays, Multiple Stacks and queues.
LINKED LISTS : Singly Linked, Lists and Chains, Representing Chains in C, Linked Stacks and
Queues, Polynomials
Text Book: Chapter-3: 3.3, 3.4, 3.7 Chapter-4: 4.1 to 4.4

Department of CSE 2
Syllabus
Module – 3
LINKED LISTS : Additional List Operations, Sparse Matrices, Doubly Linked List.
TREES: Introduction, Binary Trees, Binary Tree Traversals, Threaded Binary Trees.
Text Book: Chapter-4: 4.5,4.7,4.8 Chapter-5: 5.1 to 5.3, 5.5 8 Hours

Module – 4
TREES(Cont..): Binary Search trees, Selection Trees, Forests, Representation of Disjoint sets,
Counting Binary Trees,
GRAPHS: The Graph Abstract Data Types, Elementary Graph Operations
Text Book: Chapter-5: 5.7 to 5.11 Chapter-6: 6.1, 6.2

Module – 5
HASHING: Introduction, Static Hashing, Dynamic Hashing
PRIORITY QUEUES: Single and double ended Priority Queues, Leftist Trees
INTRODUCTION TO EFFICIENT BINARY SEARCH TREES: Optimal Binary Search Trees
Text Book: Chapter 8: 8.1 to 8.3 Chapter 9: 9.1, 9.2 Chapter 10: 10.1
Department of CSE 3
Textbook and References
Textbooks:
1. Ellis Horowitz and Sartaj Sahni, Fundamentals of Data Structures in C, 2nd Ed, Universities Press,
2014.
Reference Books:
1. Seymour Lipschutz, Data Structures Schaum's Outlines, Revised 1st Ed, McGraw Hill,2014.
2. Gilberg & Forouzan, Data Structures: A Pseudo-code approach with C, 2nd Cengage
Ed, Learning,2014.
3. Reema Thareja, Data Structures using C, 3rd Ed, Oxford press, 2012.
4. Jean-Paul Tremblay & Paul G. Sorenson, An Introduction to Data Structures with Applications, 2nd
Ed, McGraw Hill, 2013
5. A M Tenenbaum, Data Structures using C, PHI, 1989
6. Robert Kruse, Data Structures and Program Design in C, 2nd Ed, PHI, 1996.

Department of CSE 4
COURSE OBJECTIVE
After studying this course, students will be able
to: Course Learning Objective
1. To explain fundamentals of data structures and
their applications.
2. To illustrate representation of Different data structures such as Stack, Queues,
LinkedLists, Trees and Graphs.

3. To Design and Develop Solutions to problems using Linear Data Structures.

4. To discuss applications of Nonlinear Data Structures in problem solving.

5. To introduce advanced Data structure concepts such as Hashing and Optimal


BinarySearch Trees

Department of CSE 5
COURSE OUTCOMES
After studying this course, students will be able
to: Course Outcomes
1. Explain different data structures and their
applications.

2. Apply Arrays, Stacks and Queue data structures to


3. Use the
solve theconcept of linked list in problem solving.
given problems.
4. Develop solutions using trees and graphs to model the real-world problem.

5. Explain the advanced Data Structures concepts such as Hashing Techniques and
Optimal Binary Search Trees.

Department of CSE 6
Module – 5

HASHING: Introduction, Static Hashing, Dynamic Hashing


PRIORITY QUEUES: Single and double ended Priority Queues, Leftist Trees
INTRODUCTION TO EFFICIENT BINARY SEARCH TREES: Optimal Binary
Search Trees
Text Book: Chapter 8: 8.1 to 8.3 Chapter 9: 9.1, 9.2 Chapter 10: 10.1
8 Hours

Department of CSE 7
Additional Operations on Linked List
1. Inserting an element at a given position
2. Inserting an element after a given position
3. Inserting an element before a given position
4. Inserting an element before Key Element
5. Inserting an element after the key element
6. Ordered Inserting into a list
7. Deleting a key element
8. Deleting an element at a given position
9. Reverse a linked list
10. Concatenation of two linked list
11. Count the number of nodes in a list
Department of CSE 8
Hashing
The Symbol Table Abstract Data Type
Static Hashing
Hash Tables
Hashing Functions
Mid-square
Division
Folding
Digit Analysis
Overflow Handling
Linear Open Addressing, Quadratic probing,
Rehashing Chaining
Department of CSE 9
The Symbol Table ADT

• Many example of dictionaries are found in many applications, Ex.


spelling checker
• In computer science, we generally use the term symbol table rather
than dictionary, when referring to the ADT.
• We define the symbol table as a set of name-attribute pairs.

• Example: In a symbol table for a compiler


• the name is an identifier
• the attributes might include an initial value
• a list of lines that use the identifier.
Department of CSE 10
The Symbol Table ADT

• Operations on symbol table:


• Determine if a particular name is in the table
• Retrieve/modify the attributes of that name
• Insert/delete a name and its attributes
• Implementations
• Binary search tree: the complexity is O(n)
• Some other binary trees (chapter 10): O(log n).
• Hashing
• A technique for search, insert, and delete operations that
has very good expected performance.
Department of CSE 11
The Symbol Table ADT

Department of CSE 12
Search Techniques
• Hashing methods
• Relies on a formula called the hash function.
• Types of hashing
• Static hashing
• In• stDatyicnahmasichihnags,hwineg store the identifiers in a fixed size
table called a
hash table
• Arithmetic function, f
• To determine the address of an identifier, x, in the table
• f(x) gives the hash, or home address, of x in the table

• Hash table, ht
Hash Tables

Department of CSE 14
Hash Tables

• The identifier density of a hash table


is the ratio n/T
• n is the number of identifiers in the
table
• T is possible identifiers
• The loading density or loading
factor
of a hash table is α = n/(sb)
• s is the number of slots
• b is the number of buckets Department of CSE 15
Hash Tables

• Two identifiers, i 1 and


2
i are synonyms with respect to f if1 f(i ) =2
f(i• ) We enter distinct synonyms into the same bucket as long as the bucket has
slots available
• An overflow occurs when we hash a new identifier into a full bucket
• A collision occurs when we hash twonon-identical identifiers into the same
bucket.
• When the bucket size is 1, collisions and overflows occur simultaneously.
• Example 8.1: Hash table
• b = 26 buckets and s = 2 slots. Distinct identifiers n = 10
• The loading factor, is 10/52 = 0.19.
• Associate the letters, a-z, with the numbers , 0-25, respectively Department of CSE 16
Hash Tables

• The time required to enter, delete, or search for identifiers does not depend on
the number of identifiers n in use; Hash function requirements:
• Easy to compute and produces few collisions.
• Unfortunately, since the ration b/T is usually small, we cannot avoid collisions
altogether.
=> Overload handling mechanisms are needed
C library functions (f(x)):
acos(0), define(3), float(5), exp(4), char(2),
atan(0), ceil(2), floor(5), clock(2), ctime(2)

overflow: clock, ctime

Department of CSE 17
Hashing Functions

• A hash function, f, transforms an identifier, x, into a bucket address in the hash


table.
• We want a hash function that is easy to compute and that minimizes the
number of collisions.
• Hashing functions should be unbiased.
• That is, if we randomly choose an identifier, x, from the identifier space, the
probability that f(x) = i is 1/b for all buckets i.
• We call a hash function that satisfies unbiased property a
uniform hash function.
Mid-square, Division, Folding, Digit Analysis

Department of CSE 18
Hashing Functions - Mid-square
• Mid-square fm (x)=middle(x2):

• Frequently used in symbol table applications.


• We compute fm by squaring the identifier and then using an appropriate
number of bits from the middle of the square to obtain the bucket
address.
• The number of bits used to obtain the bucket address depends on the table
size. If we use r bits, the range of the value is 2r.
• Since the middle bits of the square usually depend upon all the characters in
an identifier, there is high probability that different identifiers will
produce different hash addresses.
Department of CSE 19
Hashing Functions - Division
• Division fD(x) = x % M :
• Using the modulus (%) operator.
• We divide the identifier x by some number M and use the remainder as
the hash address for x.
• This gives bucket addresses that range from 0 to M - 1, where M = that
table size.
• The choice of M is critical.
• If M is divisible by 2, then odd keys to odd buckets and even keys
to even buckets. (biased!!)

Department of CSE 20
Hashing Functions - Division (Cont..)
The choice of M is critical (cont’d)
• When many identifiers are permutations of each other, a biased use of the table
results.
• Example: X=x1x2 and Y=x2x1
Internal binary representation: x1 --> C(x1) and x2 --> C(x2)
Each character is represented by six bits
X: C(x1) * 26 + C(x 2), Y: C(x 2) * 26 + C(x 1)
(fD(X) - fD(Y)) % p (where p is a prime number)
= (C(x1) * 26 % p + C(x 2) % p - C(x 2) * 26 % p - C(x 1) % p ) % p
p = 3, 26=64
(64 % 3 * C(x1) % 3 + C(x2) % 3 - 64 % 3 * C(x2) % 3 - C(x1) % 3) % 3
= C(x1) % 3 + C(x2) % 3 - C(x2) % 3 - C(x1) % 3 = 0 % 3
The same behavior can be expected when p = 7
• A good choice for M would be : M a prime number such that M does not divide rk±a
for small k and a.
Department of CSE 21
Hashing Functions - Folding
Folding
• Partition identifier x into several parts
• All parts except for the last one have the same length
• Add the parts together to obtain the hash address
• Two possibilities (divide x into several parts)
• Shift folding:
Shift all parts except for the last one, so that the least significant bit
of each part lines up with corresponding bit of the last part.
x1=123, x2=203, x3=241, x4=112, x5=20, address=699
• Folding at the boundaries:
reverses every other partition before adding
x1=123, x2=302, x3=241, x4=211, x5=20, address=897
Department of CSE 22
Hashing Functions - Folding
Folding example:
123 20
P1 P2 P3 P4 P5
shift folding 203
123
203
241
241
1
1
112 2
2
0
folding at the 123 203 241 112 20
boundaries 699
MSD ---> LSD
LSD <--- MSD
Department of CSE 23
Hashing Functions - Digit Analysis
Digit Analysis
• Used with static files
• A static files is one in which all the identifiers are known in advance.
• Using this method,
• First, transform the identifiers into numbers using some radix, r.
• Second, examine the digits of each identifier, deleting those digits that have
the most skewed distribution.
• We continue deleting digits until the number of remaining digits is small
enough to give an address in the range of the hash table.

Department of CSE 24
Hashing Functions - Digit Analysis
Digital Analysis example:
All the identifiers are known in advance, M=1~999
X1:d11 …
d12 d1n
X2:d21 …
d22 d2n

Xm:dm1
dm2
Select
… 3 digits from n
dm
Criterion: Delete the digits having the most skewed distributions
n

• The one most suitable for general purpose applications is the division method
with a divisor, M, such that M has no prime factors less than 20.

Department of CSE 25
Overflow Handling - Linear probing
Linear open addressing (Linear probing)
• Compute f(x) for identifier x
• Examine the buckets:
ht[(f(x)+j)%TABLE_SIZE], 0 ≤ j ≤ TABLE_SIZE
• The bucket contains x.
• The bucket contains the empty string (insert to it)
• The bucket contains a nonempty string other than x
(examine the next bucket) (circular rotation)
• Return to the home bucket ht[f(x)],
if the table is full we report an error condition and
exit

Department of CSE 26
Overflow Handling - Linear probing

Department of CSE 27
Overflow Handling - Linear probing
Additive transformation and Division
Hash table with linear probing (13 buckets, 1 slot/bucket)

Department of CSE 28
Overflow Handling - Problem of Linear Probing
Problem of Linear Probing
Enter: cfaacd
• Identifiers tend to cluster together felltott
• Adjacent cluster tend to coalesce ehoxe
• Increase the search time oiomo
aipafs
• Example: suppose we enter the C built-in ollitsr
functions into a 26-bucket hash table in order. nree
The hash function uses the first character in

Enter seeqacuhenfucnec:tion name


acos, atoi, char, define, exp,
ceil, cos, float, atol, floor, ctime

# of key
comparisons=35/11=3.18
Hash table with linear probing (26 buckets, 1 slot/bucket)
Department of CSE 29
Overflow Handling
• Alternative techniques to improve open addressing
approach:
• Quadratic probing
• rehashing
• random probing
• Rehashing
• Try f1, f2, …, fm in sequence if collision occurs
• disadvantage
• comparison of identifiers with different hash values
• use chain to resolve collisions
Department of CSE 30
Overflow Handling - Quadratic Probing
Quadratic Probing Prime j Prime j
• Linear probing searches buckets (f(x)+i)%b 3 0 43 10
• Quadratic probing uses a quadratic function of i as the increment 7 1 59 14
11 2 127 31
• Examine buckets f(x),
19 4 251 62
(f(x)+i2)%b, (f(x)-i2)%b, for 1<=i<=(b-1)/2
23 5 503 125
• When b is a prime number of the form 4j+3, j is an integer, the 31 7 1019 254
Chaqiunaidnrgatic search examines every bucket in the table
• Linear probing and its variations perform poorly because inserting an identifier requires
the comparison of identifiers with different hash values.
• In this approach we maintained a list of synonyms for each bucket.
• To insert a new element
• Compute the hash address f (x)
• Examine the identifiers in the list for f(x).
• Since we would not know the sizes of the lists in advance, we should maintain them as
lined chains Department of CSE 31
Overflow Handling - Linear probing

Department of CSE 32
Overflow Handling - Hash Chaining
Results of Hash Chaining
acos, atoi, char, define, exp,
ceil, cos, float, atol, floor,
ctime

f (x)=first character of x

# of key comparisons=21/11=1.91
Department of CSE 33
Overflow Handling - Comparison
Comparison:
• In Figure 8.7, The values in each column give the average number of
bucket accesses made in searching eight different table with
33,575, 24,050, 4909, 3072, 2241, 930, 762, and 500 identifiers each.
• Chaining performs better than linear open addressing.
• We can see that division is generally superior

Department of CSE 34
Dynamic Hashing-using directories

• Dynamic hashing using directories


• Analysis of directory dynamic hashing
• simulation
• Directoryless dynamic hashing

Department of CSE 35
Dynamic Hashing Using Directories

Department of CSE 36
Dynamic Hashing Using Directories

Department of CSE 37
Dynamic Hashing Using Directories

Department of CSE 38
Directoryless Dynamic Hashing

Department of CSE 39
Directoryless Dynamic Hashing

Department of CSE 40
Thank You

Department of CSE 41

You might also like