Week 5 - Workload Characterization


Workload Characterization
Overview
• Workload characterization
• Terminology
• Techniques
• Components
• Parameter Selection
• Workload Characterization Techniques
• Averaging, Single Parameter Histograms, Multi-parameter Histograms
• Principal Component Analysis, Markov Models, Clustering

Workload Characterization
• The performance of a system is determined by
• Its characteristics
• The composition of the load being processed
• A quantitative description of the workload is therefore a fundamental part of
performance evaluation studies.
• Workload characterization is the methodology for producing such a description.

Terminology
• Assume system provides services
• User = Entity that makes the service request
• Workload components – entities that make service requests
• Applications: mail, editing, programming ..
• Sites: workload at different organizations
• User Sessions: complete user sessions from login to logout
• Workload parameters or Workload features:
• Measured quantities, service requests, or resource demands.
• For example: transaction types, instructions, packet sizes, source-destination
pairs of packets, and page reference patterns.
Components
• The workload component should be at the interface of the System Under
Test (SUT)
• Each component should represent as homogeneous a group as
possible
• Combining very different users into a site workload may not be meaningful.
• The domain of control affects the choice of component:
• Example: mail system designers are more interested in characterizing a typical
mail session than a typical user session.
Parameter Selection
• Better to pick parameters that depend upon workload and not upon
system
• Example: response time of email not good
• Depends upon system
• Example: email size is good
• Depends upon workload
• Several characteristics that are of interest
• Arrival time, duration, quantity of resources demanded
• Example: network packet size
• Have significant impact (exclude if little impact)
• Example: type of Ethernet card
Parameter Selection (2)
• Do not use parameters that depend upon the system
• Example: elapsed time, CPU time
• Characteristics of service requests
• Arrival Time
• Type of request or the resource demanded
• Duration of the request
• Quantity of the resource demanded
• Exclude those parameters that have little impact
Workload Characterization Techniques
• Averaging
• Specifying dispersion
• Single-parameter histograms
• Multi-parameter histograms
• Principal-component analysis
• Markov models
• Clustering
Averaging
• Characterize workload with average
• Example
• Average number of network hops
• Average number of clusters
• The arithmetic mean may be inappropriate
• Example: the average number of hops may be a fraction
• Example: the data may be skewed
• Specify the median or mode instead
Averaging (2)

• Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

• Standard deviation: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

• Coefficient of Variation: $\text{COV} = s / \bar{x}$

• Others
• Variance, min-max range, 10- and 90-percentiles
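A minimal Python sketch (NumPy assumed; the CPU-time sample is hypothetical) computing the statistics above:

```python
import numpy as np

cpu_times = np.array([2.1, 2.4, 1.9, 2.2, 2.0, 2.3])  # hypothetical sample

mean = cpu_times.mean()
std = cpu_times.std(ddof=1)     # sample standard deviation
cov = std / mean                # coefficient of variation

p10, p90 = np.percentile(cpu_times, [10, 90])
print(f"mean={mean:.3f} s={std:.3f} COV={cov:.3f} 10-90%: {p10:.3f}-{p90:.3f}")
```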
Case Study

[Case-study figure/table not preserved in the export]

• Reasonable variation
Single-Parameter Histograms
• Show the relative frequency of parameter values
• Divide the value range into buckets
• Bucket frequencies can be used to generate workloads
• Given n buckets, m parameters, and k components: n × m × k values
• Use only when the variance is high
• Ignores correlation among parameters.

[Figure: histogram of frequency vs. CPU time]

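A minimal Python sketch (NumPy assumed; the measured data is simulated here) of using a single-parameter histogram to generate a synthetic workload:

```python
import numpy as np

rng = np.random.default_rng(0)
observed = rng.exponential(scale=5.0, size=1000)  # stand-in for measured CPU times

counts, edges = np.histogram(observed, bins=10)   # n = 10 buckets
probs = counts / counts.sum()                     # relative frequencies

# Sample a bucket according to its relative frequency, then a value within it.
buckets = rng.choice(len(probs), size=500, p=probs)
synthetic = rng.uniform(edges[buckets], edges[buckets + 1])
```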
Multi-Parameter Histograms
• If parameters are correlated, characterize them with a
multi-parameter (joint) histogram
• Stored as an n-dimensional matrix; joint histograms are
difficult to plot for more than two parameters
• Requires much more data than single-parameter
histograms, so rarely used

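A minimal sketch (NumPy assumed, simulated data) of a two-parameter histogram that keeps the CPU/I-O correlation a pair of single-parameter histograms would lose:

```python
import numpy as np

rng = np.random.default_rng(1)
cpu = rng.exponential(5.0, size=1000)
io = 0.5 * cpu + rng.exponential(1.0, size=1000)  # I/O correlated with CPU

hist, cpu_edges, io_edges = np.histogram2d(cpu, io, bins=(10, 10))
joint = hist / hist.sum()  # joint relative frequencies, a 10x10 matrix
```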
Principal-Component Analysis (PCA)
• Goal is to reduce number of factors
• Transforms a number of (possibly) correlated variables into a (smaller)
number of uncorrelated variables called principal components
Principal-Component Analysis (2)
• Key Idea
• Use a weighted sum of parameters to classify the components.
• Let $x_{ij}$ denote the ith parameter of the jth component, and form the weighted sum

  $y_j = \sum_{i} w_i x_{ij}$

• Principal component analysis assigns the weights $w_i$ such that the $y_j$ provide
the maximum discrimination among the components.
• The quantity $y_j$ is called the principal factor
• The factors are ordered
• First factor explains the highest percentage of the variance.
Principal-Component Analysis (3)
• Example: Sending and Receiving packets
• 1) Compute the mean and standard deviation of each parameter
• 2) Normalize all points to zero mean and unit variance
• Subtract the mean, divide by the standard deviation
• 3) Compute the correlation (essentially, a line through the data that
minimizes the distance from the line)
• Principal component 1: rotate this line to be the x-axis

Example
• 4) Compute the values of the principal components
• 5) Compute the sums of squares; these give the percentage of
variation explained
• Factor 1: 32.565 / (32.565 + 1.435) ≈ 96%
• Factor 2: 1.435 / (32.565 + 1.435) ≈ 4%

Example (continued)
• Use y1 to classify the load as low, medium, or high
• Not much is gained from y2

Finding Principal Factors
• Find the correlation matrix
• Find the eigenvalues of the matrix and sort them in order of
decreasing magnitude
• Find the corresponding eigenvectors
• These give the required loadings
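A minimal Python sketch of this recipe (NumPy assumed; the sent/received packet data is hypothetical): normalize, form the correlation matrix, and use its sorted eigenvectors as the loadings:

```python
import numpy as np

rng = np.random.default_rng(2)
sent = rng.poisson(20, size=100)              # hypothetical packets sent
received = sent + rng.poisson(3, size=100)    # correlated: packets received
x = np.column_stack([sent, received]).astype(float)

# 1) Normalize each parameter to zero mean and unit variance.
xn = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

# 2) Correlation matrix; 3) eigenvalues sorted by decreasing magnitude.
corr = np.corrcoef(xn, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4) The eigenvectors are the loadings; project to get the principal factors.
y = xn @ eigvecs
explained = eigvals / eigvals.sum()  # fraction of variance per factor
```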
Markov Models
• Sometimes it is important to know not just the number of
requests of each type but also their order
• If the next request depends upon the previous request,
a Markov model can be used
• Actually more general: applies whenever the next state depends upon
the current state
• Example: a process transitioning between CPU, disk, and terminal:

[Figure: state-transition diagram between CPU, disk, and terminal]
Markov Models (2)
• Can be used for application transitions
• Ex: users run editors, compilers, linkers
• ⇒ a Markov model characterizes the probability of running type j
after type i
• Can be used for page reference locality
• Ex: the probability of referencing page (or procedure) i after
page (or procedure) j
• Note: relative frequencies alone do not determine the order of
requests
• Several different Markov models may have the same relative
frequencies

Example
• A computer network shows large packets (20%) and small packets (80%);
three generation patterns give these frequencies:
• 1) ssssbssssbssssb…  2) ssssssssbbssssssssbb…

• 3) Or generate a random number between 0 and 1: if less than 0.8, small, else large
• With pattern 3, the next packet does not depend upon the current one

• If performance is affected by order, the transition probabilities must be
measured to build a Markov model


Example (2)
• Markov property: the next request depends only on the last request
• Described by a transition matrix

• Transition matrices can also be used for application transitions.

• And to specify page-reference locality:
• P(reference module i | referenced module j)
Transition Probability
• Given the same relative frequency of requests of different types, it is
possible to realize the frequency with several different transition
matrices.
• If order is important, measure the transition probabilities directly on the real
system.
• Example: Two packet sizes: Small (80%), Large (20%)
• An average of four small packets is followed by one
big packet
• ssssbssssbssssb…
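• Counting transitions in this pattern gives the transition matrix below (a
worked reading of the example: each s is followed by s three times out of
four, and each b is always followed by s):

  P(s|s) = 3/4   P(b|s) = 1/4
  P(s|b) = 1     P(b|b) = 0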
Transition Probability (2)
• Example: Eight small packets followed by two big packets
• ssssssssbbssssssssbbssssssss

• Alternatively, ignore the order and generate a random number x:

• x < 0.8 ⇒ generate a small packet;
• otherwise generate a large packet.
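A minimal Python sketch (NumPy assumed) that estimates a transition matrix from an observed packet sequence and then generates a synthetic sequence that preserves the ordering:

```python
import numpy as np

seq = list("ssssssssbb" * 50)        # observed pattern from the slide
states = {"s": 0, "b": 1}

# Count transitions and normalize rows into probabilities.
counts = np.zeros((2, 2))
for cur, nxt in zip(seq, seq[1:]):
    counts[states[cur], states[nxt]] += 1
P = counts / counts.sum(axis=1, keepdims=True)
# P[0] ~ [7/8, 1/8] (after a small packet), P[1] ~ [1/2, 1/2] (after a big one)

# Generate a synthetic sequence from the estimated Markov model.
rng = np.random.default_rng(3)
state, out = 0, []
for _ in range(20):
    out.append("sb"[state])
    state = rng.choice(2, p=P[state])
print("".join(out))
```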
Clustering
• There may be a large number of components
• Cluster them such that components within a cluster are similar to each
other
• Then one member can be studied to represent its
class
• Example: 30 jobs measured by CPU time and disk I/O, grouped into five clusters

[Figure: scatter plot of disk I/O vs. CPU time showing five clusters]
Clustering (2)
1. Take sample
2. Select parameters
3. Transform, if necessary
4. Remove outliers
5. Scale observations
6. Select distance metric
7. Perform clustering
8. Interpret
9. Change parameters and repeat steps 3-7
10. Select representative components
Clustering: Sampling
• Usually too many components to do clustering analysis
• That’s why we are doing clustering in the first place!
• Select small subset
• If careful, will show similar behavior to the rest
• May choose randomly
• However, if interested in a specific aspect, may choose to
cluster only those components
• Ex: if interested in disk performance, only do clustering analysis on components
with high I/O

Clustering: Parameter Selection
• Many components have a large number of parameters
(resource demands)
• Some important, some not
• Remove the ones that do not matter
• Two key criteria: impact on performance and variance
• If a parameter has no impact, omit it. Ex: lines of output
• If it has little variance, omit it. Ex: processes created
• Method: redo the clustering with one less parameter. Count the
fraction of components that change cluster membership. If not many
change, remove the parameter.

Clustering: Transformation
• If distribution is skewed, may want to transform the measure of the
parameter
• Example: one study measured CPU time
• Two programs taking 1 and 2 seconds are as different as two programs taking
10 and 20 milliseconds
• ⇒ Take the ratio of CPU times, not the difference
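A one-line check of this idea (NumPy assumed): after a log transform, equal ratios become equal differences:

```python
import numpy as np

cpu = np.array([0.010, 0.020, 1.0, 2.0])  # seconds
log_cpu = np.log(cpu)
print(log_cpu[1] - log_cpu[0], log_cpu[3] - log_cpu[2])  # both ~0.693 (log 2)
```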

Clustering: Outliers
• Data points with extreme parameter values
• Can significantly affect max or min (or mean or variance)
• For normalization (scaling, next) their inclusion/exclusion may
significantly affect outcome
• Only exclude outliers if they do not consume a significant portion of resources
• Example: a few flows with extremely high RTT - exclude
• Example: extremely long (heavy-tailed) flows consume significant resources,
so they cannot be excluded

Clustering: Data Scaling (1 of 3)
• Final results depend upon relative ranges
• Typically scale so relative ranges equal
• Different ways of doing this
• Normalize to Zero Mean and Unit Variance
• With mean $\bar{x}_k$ and standard deviation $s_k$ of the kth parameter:

  $x'_{ik} = \frac{x_{ik} - \bar{x}_k}{s_k}$
Clustering: Data Scaling (2 of 3)
• Weights
• Assign based on relative importance
• Range Normalization
• Change from $[x_{min,k}, x_{max,k}]$ to [0, 1]:

  $x'_{ik} = \frac{x_{ik} - x_{min,k}}{x_{max,k} - x_{min,k}}$

• Example: x = {1, 6, 5, 11} normalizes to {0, 0.5, 0.4, 1}

• But sensitive to outliers (say the 11 above were 101)
Clustering: Data Scaling (3 of 3)
• Percentile Normalization
• Scale so 95% of the values fall between 0 and 1, e.g. using the
2.5- and 97.5-percentiles:

  $x'_{ik} = \frac{x_{ik} - x_{2.5,k}}{x_{97.5,k} - x_{2.5,k}}$

• Less sensitive to outliers
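A minimal sketch (NumPy assumed) of these scalings applied to the example parameter x = {1, 6, 5, 11}:

```python
import numpy as np

x = np.array([1.0, 6.0, 5.0, 11.0])

z = (x - x.mean()) / x.std(ddof=1)       # zero mean, unit variance
r = (x - x.min()) / (x.max() - x.min())  # range normalization -> [0, 0.5, 0.4, 1]

lo, hi = np.percentile(x, [2.5, 97.5])   # percentile normalization
p = (x - lo) / (hi - lo)                 # ~95% of values fall in [0, 1]
```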

Clustering: Distance Metric (1 of 2)
• Map each component to n-dimensional space and see which are close to
each other
• Euclidean Distance between two components
{x_i1, x_i2, …, x_in} and {x_j1, x_j2, …, x_jn}:

  $d(i,j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}$

• Weighted Euclidean Distance, with weights $a_k$ assigned to the n parameters:

  $d(i,j) = \sqrt{\sum_{k=1}^{n} a_k (x_{ik} - x_{jk})^2}$

• Used if values are not scaled or differ significantly in importance

Clustering: Distance Metric (2 of 2)
• Chi-Square Distance
• Used in distribution fitting
• Values must be normalized, or the relative sizes of the parameters
will dominate the chi-square distance measure

• Overall, Euclidean Distance is the most commonly used
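A minimal sketch (NumPy assumed; the vectors and weights are illustrative) of the Euclidean and weighted Euclidean distances between two components:

```python
import numpy as np

xi = np.array([2.0, 4.0])  # e.g. CPU time and disk I/O of component i
xj = np.array([5.0, 2.0])
a = np.array([1.0, 0.5])   # per-parameter weights

d_euclid = np.sqrt(np.sum((xi - xj) ** 2))
d_weighted = np.sqrt(np.sum(a * (xi - xj) ** 2))
```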

Clustering Techniques
• Partition components into groups such that members of a group are as similar
as possible and different groups are as dissimilar as possible
• Minimize intra-group variance or
• Maximize inter-group variance
• Two classes:
• Non-Hierarchical – start with k clusters, move components
around until intra-group variance is minimized
• Hierarchical
• Start with 1 cluster, divide until k
• Start with n clusters, combine until k
• Example: minimum spanning tree

Clustering Techniques: Minimum Spanning Tree

Minimum Spanning Tree Example
• Workload with 5 components (programs) and 2 parameters
(CPU time, disk I/O)
• Measure CPU and I/O for each of the 5 programs

Minimum Spanning Tree Example (2)

Step 1) Consider 5 clusters, with the ith cluster containing only the ith program
Step 2) The centroids are a = {2, 4}, b = {3, 5}, c = {1, 6}, d = {4, 3}, and e = {5, 2}

[Figure: scatter plot of disk I/O vs. CPU time with points a-e]
Minimum Spanning Tree Example (3)
Step 3) Compute the pairwise Euclidean distances between the cluster centroids
Step 4) Merge the minimum-distance pairs: a-b and d-e (each at distance √2)

[Figure: scatter plot of disk I/O vs. CPU time with points a-e]
Minimum Spanning Tree Example (4)
• The centroid of AB is {(2+3)/2, (4+5)/2} = {2.5, 4.5}; similarly, DE = {4.5, 2.5}
• Merge the minimum-distance pair: AB and c (distance ≈ 2.12)

[Figure: scatter plot with the AB and DE centroids marked ×]
Minimum Spanning Tree Example(5)
• The centroid of ABC is {(2+3+1)/3, (4+5+6)/3} = {2, 5}
• Find the minimum distance, merge ABC with DE (distance ≈ 3.54),
and stop: all components are now in one cluster
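A minimal sketch (SciPy assumed) reproducing this merge sequence; mapping the slides' nearest-centroid merging onto SciPy's "centroid" linkage is my assumption:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

points = np.array([[2, 4], [3, 5], [1, 6], [4, 3], [5, 2]])  # a, b, c, d, e

merges = linkage(points, method="centroid")
print(merges)  # rows: (cluster i, cluster j, merge distance, new cluster size)

# Cutting the tree at distance 3 yields the two clusters {a, b, c} and {d, e}.
labels = fcluster(merges, t=3, criterion="distance")
print(labels)  # e.g. [1 1 1 2 2]
```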
Representing Clustering
• The spanning tree is drawn as a dendrogram
• Each branch is a cluster; the height shows the distance at which clusters merge
• Clusters can be obtained for any allowable distance
• Ex: cutting at distance 3 gives {a, b, c} and {d, e}

[Figure: dendrogram over components a, b, c, d, e with merge heights]
Interpreting Clusters
• Clusters with small populations may be discarded
• If they use few resources
• If a cluster with 1 component uses 50% of the resources, it
cannot be discarded!
• Name clusters, often by their resource demands
• Ex: "CPU-bound" or "I/O-bound"
• Select one or more components from each cluster as a test
workload
• The number selected can be made proportional to cluster size,
total resource demands, or another criterion

Problems with Clustering
• Goal: Minimize variance.
• The results of clustering are highly variable. No rules for:
• Selection of parameters
• Distance measure
• Scaling
• Labeling each cluster by functionality is difficult.
• Requires many repetitions of the analysis.
Cluster interpretation
• Assign all measured components to the clusters.
• Clusters with very small populations and small total resource
demands can be discarded.
• Interpret clusters in functional terms, e.g., a business application, or
label clusters by their resource demands, for example, CPU-bound or
I/O-bound
• Select one or more representative components from each cluster for
use as test workload.
Summary
• Workload Characterization = Models of workloads
• Averaging, Single parameter histogram, multi-parameter histograms
• Principal component analysis (PCA) consists of finding parameter
combinations that explain the most variation
• Clustering: divide workloads into groups that can each be represented by a
single benchmark
• Problems with clustering
Any Questions?
