Lecture4نظم
Lecture4نظم
Lecture4نظم
Performance Issues
1
Pipelining Example
Laundry Example: Three Stages
ﺛﻼث ﻣﺮاﺣﻞ:ﻣﺜﺎل اﻟﻐﺴﯿﻞ
cycle 1A total of k cycles are required to complete the execution of the first instruction, and
the remaining n-1 instructions require n-1 cycles. This equation is easily verified from the
Figure mentioned above. The ninth instruction completes at time 4:
، ﻹﻛﻤﺎل ﺗﻨﻔﯿﺬ اﻟﺘﻌﻠﯿﻤﺎت اﻷوﻟﻰk ﯾﻠﺰم إﺟﻤﺎﻟﻲ دورات1A دورة
ﯾﻤﻜﻦ اﻟﺘﺤﻘﻖ ﻣﻦ ھﺬه.n-1 اﻟﻤﺘﺒﻘﯿﺔ دوراتn-1 وﺗﺘﻄﻠﺐ ﺗﻌﻠﯿﻤﺎت
اﻟﺘﻌﻠﯿﻤﺎت اﻟﺘﺎﺳﻌﺔ.اﻟﻤﻌﺎدﻟﺔ ﺑﺴﮭﻮﻟﺔ ﻣﻦ اﻟﺸﻜﻞ اﻟﻤﺬﻛﻮر أﻋﻼه
:4 ﺗﻜﺘﻤﻞ ﻓﻲ اﻟﺴﺎﻋﺔ
Pipeline Performance
Now consider a processor with equivalent functions but no pipeline, and assume that the
instruction cycle time is kτ The speedup factor for the instruction pipeline compared to
execution without the pipeline is defined as:
واﻓﺘﺮض أن وﻗﺖ، اﻵن ﺿﻊ ﻓﻲ اﻋﺘﺒﺎرك ﻣﻌﺎﻟﺠًﺎ ﻟﮫ وظﺎﺋﻒ ﻣﻜﺎﻓﺌﺔ وﻟﻜﻦ ﻻ ﯾﻮﺟﺪ ﺧﻂ أﻧﺎﺑﯿﺐ
ﻋﺎﻣﻞ اﻟﺘﺴﺮﯾﻊ ﻟﺨﻂ أﻧﺎﺑﯿﺐ اﻟﺘﻌﻠﯿﻤﺎت ﻣﻘﺎرﻧﺔ ﺑﺎﻟﺘﻨﻔﯿﺬ ﺑﺪون ﺧﻂ اﻷﻧﺎﺑﯿﺐkτ دورة اﻟﺘﻌﻠﯿﻤﺎت ھﻮ
:ﻣﺤﺪد ﻋﻠﻰ اﻟﻨﺤﻮ اﻟﺘﺎﻟﻲ
Pipeline Performance Cont’d
Let ti = time delay in stage Si
Clock cycle t = max(ti) is the maximum stage delay
Clock frequency f = 1/t = 1/max(ti)
A pipeline can process n tasks in k + n – 1 cycles
k cycles are needed to complete the first task
n – 1 cycles are needed to complete the remaining n – 1 tasks
6 staged pipelining:
ﺿﻊ ﻓﻲ اﻋﺘﺒﺎرك آﻟﺔ ﻏﯿﺮ ﻣﺒﻄﻨﺔ ﺑﺎﻷﻧﺎﺑﯿﺐ ذات 6ﻣﺮاﺣﻞ ﺗﻨﻔﯿﺬ ﺑﺄطﻮال 50ﻧﺎﻧﻮﺛﺎﻧﯿﺔ ،و
50ﻧﺎﻧﻮ ﺛﺎﻧﯿﺔ ،و 60ﻧﺎﻧﻮ ﺛﺎﻧﯿﺔ ،و 60ﻧﺎﻧﻮ ﺛﺎﻧﯿﺔ ،و 50ﻧﺎﻧﻮ ﺛﺎﻧﯿﺔ ،و 50ﻧﺎﻧﻮ ﺛﺎﻧﯿﺔ.
Solution
Remember that in the pipelined implementation, the length of the pipe stages must all
be the same, i.e., the speed of the slowest stage plus overhead. With 5ns overhead it
comes to:
ﺗﺬﻛﺮ أﻧﮫ ﻓﻲ ﺗﻨﻔﯿﺬ اﻷﻧﺎﺑﯿﺐ
ﯾﺠﺐ أن ﯾﻜﻮن طﻮل،
، ًﻣﺮاﺣﻞ اﻷﻧﺎﺑﯿﺐﻣﺘﻤﺎﺛﻼ
أي ﺳﺮﻋﺔ أﺑﻄﺄ ﻣﺮﺣﻠﺔ
ﻣﻊ.ﺑﺎﻹﺿﺎﻓﺔ إﻟﻰ اﻟﺤﻤﻞ
، اﻟﻨﻔﻘﺎت اﻟﻌﺎﻣﺔ5ns
:ﯾﺘﻌﻠﻖ اﻷﻣﺮ ﺑﻤﺎ ﯾﻠﻲ
اﻟﺴﺮﻋﺔ ھﻲ ﻧﺴﺒﺔ ﻣﺘﻮﺳﻂ وﻗﺖ اﻟﺘﻌﻠﯿﻤﺎت ﺑﺪون ﺧﻄﻮط اﻷﻧﺎﺑﯿﺐ إﻟﻰ ﻣﺘﻮﺳﻂ وﻗﺖ اﻟﺘﻌﻠﯿﻤﺎت ﺑﺎﺳﺘﺨﺪام
.ﺧﻄﻮط اﻷﻧﺎﺑﯿﺐ
Solution:
Speedup is the ratio of the average instruction time without pipelining to the
average instruction time with pipelining.
k: stages in pipeline
t: time to do each stage (actually slowest stage
constrains performance)
n: number of instructions to be processed
ﺛﻮانٍ ﻟﺘﺸﻐﯿﻞ ﺑﺮﻧﺎﻣﺞ ﺑﺎﻟﺘﺘﺎﺑﻊ10 إذا اﺳﺘﻐﺮق اﻷﻣﺮ، ﻋﻠﻰ ﺳﺒﯿﻞ اﻟﻤﺜﺎل
ﻓﺈن اﻟﺘﺴﺮﯾﻊ ھﻮ، P ، وﺛﺎﻧﯿﺘﯿﻦ ﻟﺘﺸﻐﯿﻠﮫ ﺑﺎﻟﺘﻮازي ﻋﻠﻰ ﻋﺪد ﻣﻦ اﻟﻤﻌﺎﻟﺠﺎت
. ﻣﺮات5 = 10/2
Speedup Formula
Speedup
Efficiency
Processors
Parallel Processing and Amdahl’s Law
When considering system performance, computer system designers look for ways to improve
performance by improvement in technology or change in design. Examples include the use of parallel
processors, the use of a memory cache hierarchy, and speedup in memory access time and I/O
transfer rate due to technology improvements.
In all of these cases, it is important to note that a speedup in one aspect of the
technology or design does not result in a corresponding improvement in
performance. This limitation is succinctly expressed by Amdahl’s law.
.ﯾﻤﻜﻦ ﻟﻌﺪد ﺻﻐﯿﺮ ﻣﻦ اﻟﻌﻤﻠﯿﺎت اﻟﻤﺘﺴﻠﺴﻠﺔ أن ﺗﺤﺪ ﺑﺸﻜﻞ ﻓﻌﺎل ﻣﻦ ﺗﺴﺮﯾﻊ اﻟﺨﻮارزﻣﯿﺔ اﻟﻤﺘﻮازﯾﺔ
Amdahl’s Law: A small number of sequential operations can effectively limit
the speedup of a parallel algorithm. ﺧﻼل،اﻟﻤﺘﻮازﯾﺔﻣﻦ
ﯾﺒﺤﺚ ﻣﺼﻤﻤﻮ أﻧﻈﻤﺔ اﻟﻜﻤﺒﯿﻮﺗﺮ ﻋﻦ طﺮق ﻟﺘﺤﺴﯿﻦ اﻷداء، ﻋﻨﺪ اﻟﺘﻔﻜﯿﺮ ﻓﻲ أداء اﻟﻨﻈﺎم
ﺗﺘﻀﻤﻦ اﻷﻣﺜﻠﺔ اﺳﺘﺨﺪام اﻟﻤﻌﺎﻟﺠﺎت.ﺗﺤﺴﯿﻦ اﻟﺘﻜﻨﻮﻟﻮﺟﯿﺎ أو اﻟﺘﻐﯿﯿﺮ ﻓﻲ اﻟﺘﺼﻤﯿﻢ
واﻟﺘﺴﺮﯾﻊ ﻓﻲ وﻗﺖ اﻟﻮﺻﻮل إﻟﻰ اﻟﺬاﻛﺮة، واﺳﺘﺨﺪام اﻟﺘﺴﻠﺴﻞ اﻟﮭﺮﻣﻲ ﻟﺬاﻛﺮة اﻟﺘﺨﺰﯾﻦ اﻟﻤﺆﻗﺖ
اﻹﺧﺮاج/ واﻹدﺧﺎل
.ﻣﻌﺪل اﻟﻨﻘﻞ ﺑﺴﺒﺐ اﻟﺘﺤﺴﯿﻨﺎت اﻟﺘﻜﻨﻮﻟﻮﺟﯿﺔ
Let f be the fraction of operations in a computation that must be performed sequentially, where 0 <
f < 1. Then the maximum speedup S achievable by a parallel computer with p processors
performing the computation is S < 1 / [f + (1 - f) / p]. For example, if 10% of the computation must
be performed sequentially, then the maximum speedup achievable is 10, no matter how many
processors a parallel computer has.
ﺛﻢ أﻗﺼﻰ ﺳﺮﻋﺔ ﯾﻤﻜﻦ ﺗﺤﻘﯿﻘﮭﺎ ﺑﻮاﺳﻄﺔ.f <1> 0 ﺣﯿﺚ ﯾﻜﻮن، ﺟﺰء ﻣﻦ اﻟﻌﻤﻠﯿﺎت اﻟﺤﺴﺎﺑﯿﺔ اﻟﺘﻲ ﯾﺠﺐ إﺟﺮاؤھﺎ ﺑﺎﻟﺘﺘﺎﺑﻊf ﻟﻨﻔﺘﺮض أن
ﻣﻦ٪10 إذا ﻛﺎن ﯾﺠﺐ إﺟﺮاء، ﻋﻠﻰ ﺳﺒﯿﻞ اﻟﻤﺜﺎل.[ ع/ ( و- 1) + f] / S <1 ﺗﻘﻮم ﺑﺎﻟﺤﺴﺎب ھﻮp ﻛﻤﺒﯿﻮﺗﺮ ﻣﻮازٍ ﻣﻊ ﻣﻌﺎﻟﺠﺎت
ﺑﻐﺾ اﻟﻨﻈﺮ ﻋﻦ ﻋﺪد اﻟﻤﻌﺎﻟﺠﺎت اﻟﺘﻲ ﯾﻤﺘﻠﻜﮭﺎ اﻟﻜﻤﺒﯿﻮﺗﺮ، 10 ﻓﺈن أﻗﺼﻰ ﺳﺮﻋﺔ ﯾﻤﻜﻦ ﺗﺤﻘﯿﻘﮭﺎ ھﻲ، اﻟﻌﻤﻠﯿﺎت اﻟﺤﺴﺎﺑﯿﺔ ﺑﺎﻟﺘﺘﺎﺑﻊ
.اﻟﻤﻮازي
Maximum Speedup in Amdahl’s Law
Amdahl’s law states that the maximum speedup possible in
parallelizing an algorithm is limited by the sequential
ﯾﻨﺺ ﻗﺎﻧﻮن أﻣﺪال ﻋﻠﻰ أن أﻗﺼﻰ ﺳﺮﻋﺔ ﻣﻤﻜﻨﺔ ﻓﻲ ﻣﻮازاة
portion of the code. .اﻟﺨﻮارزﻣﯿﺔ ﻣﺤﺪودة ﺑﺎﻟﺠﺰء اﻟﻤﺘﺴﻠﺴﻞ ﻣﻦ اﻟﻜﻮد
2. As N approaches infinity, speedup is bound by 1/(1-f), so that there are diminishing returns
for using more processors.
Parallel Processing and Amdahl’s Law
Amdahl’s law illustrates the problems facing industry in the development of multicore
machines with an ever-growing number of cores: The software that runs on such machines
must be adapted to a highly parallel execution environment to exploit the power of parallel
processing. ﯾﻮﺿﺢ ﻗﺎﻧﻮن أﻣﺪال اﻟﻤﺸﻜﻼت اﻟﺘﻲ ﺗﻮاﺟﮫ اﻟﺼﻨﺎﻋﺔ ﻓﻲ ﺗﻄﻮﯾﺮ آﻻت ﻣﺘﻌﺪدة اﻟﻨﻮاة ﻣﻊ ﻋﺪد ﻣﺘﺰاﯾﺪ
ﯾﺠﺐ ﺗﻜﯿﯿﻒ اﻟﺒﺮﻧﺎﻣﺞ اﻟﺬي ﯾﻌﻤﻞ ﻋﻠﻰ ھﺬه اﻵﻻت ﻣﻊ ﺑﯿﺌﺔ ﺗﻨﻔﯿﺬ ﻣﺘﻮازﯾﺔ ﻟﻠﻐﺎﯾﺔ ﻻﺳﺘﻐﻼل:ﻣﻦ اﻟﻨﻮى
.ﻗﻮة اﻟﻤﻌﺎﻟﺠﺔ اﻟﻤﺘﻮازﯾﺔ
Amdahl’s law can be generalized to evaluate any design or technical improvement in a
computer system. Consider any enhancement to a feature of a system that results in a
speedup. The speedup can be expressed as
ﺿﻊ ﻓﻲ اﻋﺘﺒﺎرك أي ﺗﺤﺴﯿﻦ ﻋﻠﻰ.ﯾﻤﻜﻦ ﺗﻌﻤﯿﻢ ﻗﺎﻧﻮن أﻣﺪال ﻟﺘﻘﯿﯿﻢ أي ﺗﺼﻤﯿﻢ أو ﺗﺤﺴﯿﻦ ﺗﻘﻨﻲ ﻓﻲ ﻧﻈﺎم اﻟﻜﻤﺒﯿﻮﺗﺮ
ﯾﻤﻜﻦ اﻟﺘﻌﺒﯿﺮ ﻋﻦ اﻟﺘﺴﺮﯾﻊ ﻛـ.ﻣﯿﺰة ﻓﻲ ﻧﻈﺎم ﺗﺆدي إﻟﻰ زﯾﺎدة اﻟﺴﺮﻋﺔ
Suppose that a feature of the system is used during execution a fraction of the time f,
before enhancement, and that the speedup of that feature after enhancement is SUf . Then
the overall speedup of the system is:
، ﻗﺒﻞ اﻟﺘﺤﺴﯿﻦ، f اﻓﺘﺮض أن إﺣﺪى ﻣﯿﺰات اﻟﻨﻈﺎم ﺗُﺴﺘﺨﺪم أﺛﻨﺎء اﻟﺘﻨﻔﯿﺬ ﻟﺠﺰء ﺑﺴﯿﻂ ﻣﻦ اﻟﻮﻗﺖ
: ﺛﻢ اﻟﺴﺮﻋﺔ اﻹﺟﻤﺎﻟﯿﺔ ﻟﻠﻨﻈﺎم ھﻲ.SU f وأن ﺗﺴﺮﯾﻊ ھﺬه اﻟﻤﯿﺰة ﺑﻌﺪ اﻟﺘﺤﺴﯿﻦ ھﻮ
Parallel Processing and Amdahl’s Law
Parallel Processing and Amdahl’s Law
Parallel Processing and Amdahl’s Law
Parallel Processing and Amdahl’s Law
Parallel Processing and Amdahl’s Law
Parallel Processing and Amdahl’s Law
Parallel Processing and Amdahl’s Law
(Summary)
Amdahl’s Law
Examples
Questions
Parallel Processing and Amdahl’s Law
Example, suppose that a task makes extensive use of floating-point operations,
with 40% of the time is consumed by floating-point operations. With a new
hardware design, the floating-point module is speeded up by a factor of K. Then
the overall speedup is:
Solution
Parallel Processing and Amdahl’s Law
Problems and solutions
EFFECTIVENESS OF PARALLEL PROCESSING
There are certain measures to compare the effectiveness of various parallel algorithms or
architectures for solving desired problems. The following definitions and notations are
applicable [Lee80]:
The significance of each measure is self-evident from its name and defining equation given above. It is not difficult to
establish the following relationships between these parameters.
EFFECTIVENESS OF PARALLEL PROCESSING Cont’d
EFFECTIVENESS OF PARALLEL PROCESSING Cont’d
Example
Example: Finding the sum of 16 numbers can be represented by the binary-tree
computation graph of Figure mentioned bellow with T(1) = W(1) = 15. Assume
unit-time additions and ignore all else.
Essentially, the 8 processors perform all of the additions at the same tree level in
each time unit, beginning with the leaf nodes and ending at the root. The relatively
low efficiency is the result of limited parallelism near the root of the tree.
Thank You
46