Lecture 09
MINING (SE-409)
When to parallelize?
Useful for operations that access significant amounts of data (whether sequentially or in parallel).
Word of caution: parallelism can reduce system performance on over-utilized systems or on systems with small I/O bandwidth.
Scalability – Size is NOT everything
[Figure: scalability depends on more than data size: index usage (hash based, B-tree, multiple, bitmapped), number of concurrent users (the I/O bottleneck), and query complexity (simple table retrieval, moderate-complexity join, tendency analysis, clustering).]
Scalability: Speed-Up & Scale-Up
• Speed-Up: more resources mean proportionally less time for a given amount of data.
• Scale-Up: if resources are increased in proportion to the increase in data size, execution time stays constant.
[Figure: transactions/sec vs. degree of parallelism (ideal vs. real) illustrating speed-up, and secs/transaction vs. degree of parallelism (ideal vs. real) illustrating scale-up.]
Quantifying Speed-up

Speedup = Ts / Tm
  Ts: time on a serial processor
  Tm: time on multiple processors

Example: Task-1, Task-2 and Task-3 take 18 time units when run serially, but only 6 time units when run in parallel.
Speedup = 18 / 6 = 3, i.e. 300%
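A minimal Python sketch of this calculation (the `speedup` helper and the task durations mirror the slide's example and are only illustrative):

```python
def speedup(serial_time, parallel_time):
    """Speedup = Ts / Tm: time on a serial processor over time on multiple processors."""
    return serial_time / parallel_time

# Three tasks totalling 18 time units serially, 6 time units when run in parallel.
ts, tm = 18, 6
print(f"Speedup = {ts}/{tm} = {speedup(ts, tm):.0%}")  # 300%
```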
Speed-Up & Amdahl's Law
Reveals the maximum expected speedup of a parallel algorithm, given the proportion f of the task that must be computed sequentially. With N processors it gives the speedup S as

S = 1 / (f + (1 - f) / N)
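A short Python sketch of Amdahl's law as stated above (the function and variable names are my own):

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Amdahl's law: S = 1 / (f + (1 - f) / N)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# Even with many processors, a 10% serial fraction caps the speedup near 10x.
for n in (2, 4, 16, 1024):
    print(n, round(amdahl_speedup(0.10, n), 2))
```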
Symmetrical Multi-Processing (SMP)
• A number of independent I/O channels and processors, all sharing access to a single large memory space.
[Figure: several CPUs and I/O units connected to one shared main memory.]
Distributed Memory Machines (MPP)
• Composed of a number of self-contained, self-controlled nodes connected through a network interface.
• Each node contains its own CPU/processor, memory and I/O.

Software Architecture
Shared-disk RDBMS Architecture
• Multiple database nodes share the storage: logically all memories behave as one unit.
• Every processor is capable of modifying memory through this central node, so two or more processors can modify the same data at the same time. Issue?
• How to resolve: locking (hardware + software), as sketched below.
Advantages
• High level of fault tolerance, but if data is present at multiple places a coherence issue can arise.
• Redistribution is expensive.
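A minimal Python sketch of the locking idea, with a single shared in-memory "page" standing in for shared storage (the names are illustrative only, not part of any RDBMS API):

```python
import threading

shared_page = {"balance": 0}   # stands in for a shared-disk page
page_lock = threading.Lock()   # the (software) lock that serializes writers

def update_page(amount, times):
    for _ in range(times):
        with page_lock:                       # only one writer at a time
            shared_page["balance"] += amount  # read-modify-write stays consistent

workers = [threading.Thread(target=update_page, args=(1, 100_000)) for _ in range(2)]
for w in workers: w.start()
for w in workers: w.join()
print(shared_page["balance"])  # always 200000 because the lock prevents lost updates
```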
Data Parallelism: Concept
• Parallel execution of a single data manipulation task
across multiple partitions of data.
Data Parallelism: Example (all partitions work in parallel)

SELECT COUNT(*) FROM Emp WHERE age > 50 AND sal > 10000;

[Figure: the Emp table is split into Partition-1 .. Partition-k; Query Server-1 .. Query Server-k each count the rows of their own partition (e.g. 62, 440, ..., 1,123) and send the partial counts to the Query Coordinator, which adds them up.]
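A small Python sketch of the same pattern, assuming the Emp rows are already split into in-memory partitions (the sample data and helper names are made up for illustration):

```python
from multiprocessing import Pool

# Each partition is a list of (age, sal) tuples in this toy example.
partitions = [
    [(55, 12000), (30, 9000), (61, 15000)],   # Partition-1
    [(40, 20000), (52, 11000)],               # Partition-2
    [(70, 8000), (66, 30000), (58, 10500)],   # Partition-k
]

def count_partition(rows):
    """Query-server role: count qualifying rows in one partition."""
    return sum(1 for age, sal in rows if age > 50 and sal > 10000)

if __name__ == "__main__":
    with Pool(processes=len(partitions)) as pool:
        partial_counts = pool.map(count_partition, partitions)  # servers run in parallel
    print(partial_counts, "->", sum(partial_counts))            # coordinator sums: [2, 1, 2] -> 5
```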
Pipelining: Time Chart

[Figure: one task split into 3 pipeline stages of T/3 each; successive tasks overlap, so once the pipeline is full a task completes every T/3.]

(Ideal) time for pipelined execution of one task using an M-stage pipeline = T
(Ideal) time for pipelined execution of N tasks using an M-stage pipeline = T + (N-1)(T/M)

Speed-up (S) = (N x T) / (T + (N-1)(T/M)) = (N x M) / (M + N - 1)
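A short Python check of these formulas (the function and variable names are mine):

```python
def pipeline_time(n_tasks, n_stages, task_time):
    """Ideal pipelined time: the first task takes T, each later task adds T/M."""
    return task_time + (n_tasks - 1) * (task_time / n_stages)

def pipeline_speedup(n_tasks, n_stages):
    """S = (N * T) / pipelined time = (N * M) / (M + N - 1)."""
    return (n_tasks * n_stages) / (n_stages + n_tasks - 1)

T, M = 9.0, 3
for N in (1, 3, 10, 100):
    print(N, pipeline_time(N, M, T), round(pipeline_speedup(N, M), 2))
# As N grows, the speed-up approaches M (here 3).
```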
Pipelining: Speed-Up Example
Example: Bottling soft drinks in a factory
Pipelining: Input vs Speed-Up
[Figure: speed-up S plotted against the number of inputs N (1 to 19) for a 3-stage pipeline; S rises from 1 and approaches 3 as N grows.]
Quiz 1
Marks: 10 + 10    Time: 30 min