Six Sigma Manual
CONTENTS :
• Basic Statistics
• Measure of Spread
BASIC STATISTICS
Statistics is the science of drawing information from numerical values. Such numerical values are referred to as data. It deals with:
1) Collection of data
2) Summarization of data
3) Analysis of data, and
4) Drawing valid inferences from data
Data : Data is a collection of any number of related observations, e.g. grades on student exams, measures of athletic performance, the number of sales in a unit each month, or the ages of people living in a town. A collection of data is called a data set, and a single observation is known as a data point.
Statistic : A statistic is a characteristic of a sample.
Sampling : A sample is selected, evaluated and studied in an effort to gain information about the larger population from which the sample was drawn. The advantages of sampling are:
1) Cost : Samples can be studied at much lower cost.
2) Time : Samples can be evaluated more quickly than populations.
3) Accuracy : The larger the data set, the more opportunities there are for errors to occur. A sample provides a data set small enough to monitor carefully.
4) Feasibility : In some research situations the population of interest is not available for study, for example in the case of destructive testing.
5) Scope of Information : When evaluating a smaller group, it is sometimes possible to gather more extensive information on each unit evaluated.
Measure of Central Tendency
Most sets of numerical data show a distinct tendency to group or cluster about a certain central point. This property exhibited by data is known as central tendency. The most common measures of central tendency are:
1) Mean : Also known as the arithmetic mean, the mean is typically what is meant by the word average. The mean of a variable is (the sum of all values) / (the number of values). Despite its popularity, the mean may not be an appropriate measure of central tendency; this is generally the case for populations having outliers or extreme values.
2) Median : The median is the 50th percentile of a distribution. To find the median of a set of values, first order them in ascending or descending order, then find the observation in the middle. The median of 5, 2, 7, 9 and 4 is 5. (Note that if there is an even number of values, one takes the average of the middle two; the median of 4, 6, 8 and 10 is 7.) The median is more appropriate than the mean in populations with large outliers.
3) Mode : The mode is the most common value in the distribution; it is the value occurring the maximum number of times (i.e. having the highest frequency).
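As a quick illustration, all three measures can be computed in a few lines of Python; a minimal sketch (the data set here is made up for the example):

from statistics import mean, median, mode

data = [5, 2, 7, 9, 4, 5]
print(mean(data))    # arithmetic mean: sum of values / number of values ~ 5.33
print(median(data))  # middle value after sorting (average of middle two for even counts) = 5.0
print(mode(data))    # most frequent value = 5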
Measure of Spread
Spread refers to the statistical fluctuation displayed by the variables in a given data set. The most common measures of spread are:
1) Range : Range is defined as the maximum value minus the minimum value.
2) Variance : Variance is a measure of how spread out a distribution is. The formula for the population variance is σ² = Σ(xᵢ − μ)² / N, where μ is the population mean and N the number of values.
3) Standard Deviation : The standard deviation also indicates the spread in the population. It is the positive square root of the variance.
Example : Calculate the standard deviation and variance for the given data ; 1,2,3,4,5
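A worked answer in Python, using the population formulas reconstructed above (dividing by N):

data = [1, 2, 3, 4, 5]
n = len(data)
mu = sum(data) / n                               # mean = 3.0
variance = sum((x - mu) ** 2 for x in data) / n  # population variance = 2.0
std_dev = variance ** 0.5                        # population standard deviation ~ 1.414
print(mu, variance, std_dev)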
Definition of Six Sigma
The goal of Six Sigma is to increase profits by eliminating the variability, defects and waste that undermine customer loyalty. It is a disciplined, data-driven approach and methodology for eliminating defects. Six Sigma can be understood at three levels:
1) Metric : The statistical representation of Six Sigma describes quantitatively how a process is performing. To achieve Six Sigma, a process must not produce more than 3.4 defects per million opportunities (DPMO), or 3.4 parts per million (PPM).
2) Methodology :
DMAIC : DMAIC refers to a data-driven quality strategy for improving processes and is an integral part of a company's Six Sigma initiative. It stands for Define, Measure, Analyze, Improve, Control. It is a highly structured approach that includes both a set of tools and a road map, or sequence, for applying those tools.
DMADV : The development of a new product or service is essentially a problem-solving process. Here we need to see what problems we may face because of the existing design, and how effective the existing design is from the customer's point of view.
DMADV stands for Define , Measure , Analyze , Design and Validate .
DMADV approach is applied in R&D Six Sigma and TQ Six Sigma Projects .
3) Philosophy : Six Sigma tools and methods concentrate on reducing variability in the process.
Variation : A process can be defined as a series of operations performed to bring about a result; a process is thus made up of smaller sub-processes. The result can be the delivery of a service or the manufacture of a product. Variation is the sum total of all the minute changes that occur every time a process is performed. Variation is always present at some level; if something appears to be constant, we usually have just not looked at it with fine enough resolution. In delivering services or manufacturing products, variation is our enemy. Consistency, and thus minimal variation, leads to improved quality, reduced costs, higher profits and higher customer satisfaction.
In the above example, a three sigma and a six sigma process have been shown. It is clear that the acceptable area under the six sigma process is larger than under the three sigma process. Hence it can be inferred that:
• The higher the Z value, the fewer the rejections.
• The Z value of a process can be increased by lowering sigma (variation), which is the ultimate aim of the Six Sigma methodology.
Comparison between a 3σ-level company and a 6σ-level company

A 3σ-level company:
• Spends 15-25% of sales dollars on costs of failure
• Produces 66,807 defects per million opportunities
• Relies on inspection to find defects
• Believes high quality is expensive
• Does not have a disciplined approach to gathering and analyzing data
• Benchmarks itself against its competition

A 6σ-level company:
• Spends 5% of sales dollars on costs of failure
• Produces 3.4 defects per million opportunities
• Relies on capable processes that don't produce defects
• Knows that the high-quality producer is the low-cost producer
• Uses Measure, Analyze, Improve, Control
• Benchmarks itself against the best in the world
Different Approaches
DMAIC : The DMAIC methodology is used when a product or process exists at the company but is not meeting customer specifications or is not performing adequately. In the case of Transactional Quality projects, it is applied to services.
DMADV : The DMADV methodology is used when a product or process does not yet exist and needs to be developed. It is used for design and development.
DEFINE
CONTENTS :
• Introduction to define
Define Basics
• Brainstorming
• Pareto Chart
• Logic Tree
• 5-Why Analysis
• Process Mapping
Define Tools
• RTY
• Yield
• QFD
• FMEA
Introduction to Define
Define is the first stage of any Six Sigma project. It aims at identifying what is important for problem solving and hence gives us the opportunity to find defects.
Brainstorming
Brainstorming is the name given to a situation in which a group of people meet to generate new ideas around a specific area of interest. Using rules which remove inhibitions, people are able to think more freely, move into new areas of thought, and so create numerous new ideas and solutions.
The bottom line of brainstorming is that no new idea should be criticized. All ideas are noted down, and when the brainstorming session is over, these ideas are evaluated.
Rules of Brainstorming :
1) Postpone and withhold your judgment of ideas
2) Encourage wild and exaggerated ideas
3) Quantity counts at the initial stage
4) Build on the ideas put forward by others
5) Every person and every idea has equal worth
Different types of Brainstorming :
1) Free wheel : The spontaneous flow of ideas from all team members.
2) Round Robin : Team members take turns suggesting ideas.
3) Card Method : Team members write ideas on cards, with no discussion.
Brainstorming is always carried out in the presence of process experts. A process expert is a person who is close to the process and well versed in it. A cross-functional approach is always preferred in a brainstorming session, since inputs can be obtained from several functions and then assimilated into a new idea or concept.
Brainstorming is a very useful exercise before theme selection as well as during problem solving; it explores the problem very effectively.
The use of brainstorming will be seen as we discuss the various tools of the Define stage.
Pareto Chart
The theory behind the Pareto chart originated in 1897 with the Italian economist Vilfredo Pareto. It says that 80% of the effects are caused by 20% of the causes. In fact, many defect distributions follow a simple pattern, with a relatively small number of issues accounting for an overwhelming share of the defects. The Pareto chart shows the relative frequency of defects in rank order, and thus provides a prioritization tool around which process improvement activities can be organized.
[Pareto chart of defects: bars show the count for each defect category in descending order, with a cumulative-percent line on a second axis.]

Rank:     1     2     3     4     5    6    7    8    Other
Count:    180   120   100   85    60   28   23   10   26
Percent:  28.5  19.0  15.8  13.4  9.5  4.4  3.6  1.6  4.1
Cum %:    28.5  47.5  63.3  76.7  86.2 90.7 94.3 95.9 100.0
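A chart like this one can be rebuilt from the count data; a minimal matplotlib sketch (the defect names are hypothetical placeholders, since only the counts are given in the table above):

import matplotlib.pyplot as plt

counts = [180, 120, 100, 85, 60, 28, 23, 10, 26]          # last bar is "Other"
labels = [f"Defect {i+1}" for i in range(8)] + ["Other"]   # placeholder labels
total = sum(counts)
cum_pct = [sum(counts[:i+1]) / total * 100 for i in range(len(counts))]

fig, ax1 = plt.subplots()
ax1.bar(labels, counts)           # frequency bars in rank order
ax1.set_ylabel("Count")
ax2 = ax1.twinx()
ax2.plot(labels, cum_pct, "o-")   # cumulative-percent line
ax2.set_ylabel("Cumulative %")
ax2.set_ylim(0, 100)
plt.show()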
Pareto Chart
[Minitab screenshots: in the Pareto Chart dialog, put the defect names in the labels field and the counts in the frequencies field.]
Pareto Chart
However, this Pareto chart is constructed from only one dimension: defect frequency. Other constraints, such as cost and reliability, must also be considered.
Logic Tree
The logic tree is used to examine an issue in detail and identify the root causes. It breaks a problem down into manageable groups based on MECE (Mutually Exclusive and Collectively Exhaustive). Causes of the problem are brainstormed and placed under the 4Ms in the case of a manufacturing project, or under the 5Ps in the case of a Transactional Quality project.

Mfg. Project (4M):  Cause → Man, Machine, Method, Material
Transactional Quality Project (5P):  Cause → People, Price, Product, Promotion, Place
Logic Tree
[Diagram: the Problem splits (MECE) into Cause A and Cause B; Cause A splits (MECE) into A1 and A2, and Cause B into B1 and B2.]
Logic Tree
Mutually Exclusive : Two events that have no outcomes in common are known as mutually exclusive events. These are events that cannot occur at the same time. For example, when a pair of dice is rolled, the event of rolling a 9 and the event of rolling a double (3,3 or 4,4) have no outcome in common; these two events are mutually exclusive.
Collectively Exhaustive : This is the second aspect of MECE. It means that all the issue points have been covered and nothing has been left out. For example, when flipping a coin, getting a head and getting a tail are collectively exhaustive events.
5-Why Analysis
This tool is based on the KEI (Knowledge, Experience and Intuition) approach. It is used to study the symptoms and understand the true root cause of a problem. It is said that only by asking "why" five times in succession can you delve into a problem deeply enough to understand the ultimate root cause. By the time you get to the fourth and fifth why, you are likely to be approaching the root cause of the problem. Here is a brief illustration of this concept:
Case Study on 5-Why Analysis
There is so much work in process inventory, yet we never seem to have the right parts.
Why?
The enameling process is unpredictable, and the press room does not respond quickly enough.
Why?
It takes them too long to make a changeover between parts, so the lot sizes are too big and often
the wrong parts.
Why?
Many of the stamping dies make several different parts, and must be reconfigured in the tool room
between runs,which takes as long as eight hours.
Why?
The original project management team had cost overruns on the building site work, so they skimped on the number of dies: they traded away dedicated dies and small lot sizes in exchange for high work in process (which was not measured by their project budget).
Why?
Root Cause : Company management did not understand lean manufacturing, and did not set appropriate project targets when the plant was launched.
Process Mapping
Process mapping is used to clarify improvement opportunities through an understanding of the defined process from start to finish. It is a visual representation of the work flow. A good process map should:
1) Allow people unfamiliar with the process to understand the interaction of causes during the work flow.
2) Contain additional information relating to the Six Sigma project, i.e. information per critical step about input and output variables, time, cost, DPU, value, etc.
• It provides the outline or big picture that helps understand the operation flow and standardize the design.
• It can be used as the fundamental source for analyzing the present situation.
• It can be established as a common point for process improvement.
• It helps to clarify the bottleneck.
• It identifies unnecessary or excess process steps.
• It identifies loss areas due to incorrect process sequence.
Process Mapping
SIPOC :
SIPOC stands for Supplier, Input, Process, Output and Customer. You obtain input from your suppliers, add value through your process, and provide an output that meets or exceeds your customer's requirements. A SIPOC diagram is a tool used by a team to identify all relevant elements of a process improvement project before the work begins.
RTY
The Six Sigma methodology says: "DO IT RIGHT THE FIRST TIME".
In one of his convocations, Bill Smith (an engineer at Motorola and founder of the Six Sigma methodology) emphasized that a product which has been reworked will have a shorter life than a product which was manufactured right the first time. The same fact applies equally well to service industries.
This turned out to be a breakthrough idea and resulted in a major paradigm shift; companies came to accept Six Sigma as an improved version of TQM (Total Quality Management).
Following this principle, Bob Galvin (CEO of Motorola) advocated the use of DPO (Defects Per Opportunity) rather than DPU (Defects Per Unit) for service-related problems. These terms will be discussed in detail in the coming sessions.
RTY is based on the same concept. It says that reworks are hidden losses (also referred to as the hidden factory). These reworks have a huge negative impact on product life as well as customer satisfaction. RTY is one of the ways through which we can calculate the extent of the hidden losses that are taking place. A higher value of RTY indicates a better process. It is one of the major criteria through which we can select the bottleneck process on a production line.
RTY
RTY stands for Rolled Throughput Yield. It represents the probability of getting the product right the first time. It is the product of the Yft of all stages in the case of a line, or of all sub-processes in the case of a process.
Yft stands for First Time Yield and is the probability of getting the product right the first time at a particular stage or sub-process.

Example: 100 units enter Stage 1, which passes 98 units to Stage 2 (2 of them reworked); Stage 2 passes 98 units to Stage 3 (5 reworked); Stage 3 delivers 93 units (4 reworked). Each stage's Yft is (units out − units reworked) / units in:
Yft1 = 96/100, Yft2 = 93/98, Yft3 = 89/98
RTY for the entire process = Yft1 × Yft2 × Yft3 = (96/100) × (93/98) × (89/98) = 0.8273 = 82.73%
• Yft is calculated for an individual stage.
• RTY is calculated for the entire process.
• The higher the RTY, the better the process.
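The same calculation in a short Python sketch (first-time yield per stage, then the rolled throughput across stages):

# (units in, units out, reworked) per stage, from the example above
stages = [(100, 98, 2), (98, 98, 5), (98, 93, 4)]

rty = 1.0
for units_in, units_out, rework in stages:
    yft = (units_out - rework) / units_in  # right-first-time fraction for this stage
    rty *= yft                             # RTY is the product of the stage yields

print(f"RTY = {rty:.2%}")                  # 82.74%, matching the ~82.73% above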
RTY
Processes in Series
Two processes are said to be in series when the outcomes of both processes are required.

Example: Process 1 (80%) → Process 2 → Process 3 (65%), where Process 2 is fed by Process 2a (75%) and Process 2b (85%).
In this case, if Process 2 requires contributions from both Process 2a and Process 2b, then 2a and 2b are said to be in series: Process 2 cannot take place without both of them.
The Yft of Process 2 is given by Yft2 = Yft2a × Yft2b = 0.75 × 0.85 = 0.6375
Hence the process now becomes: Process 1 → Process 2 → Process 3
RTY for the process = Yft1 × Yft2 × Yft3 = 0.80 × 0.6375 × 0.65 = 0.3315, or 33.15%
RTY
Processes in Parallel
Two processes are said to be in parallel when the outcome of even a single process is sufficient.

In the same layout as before (Process 1 at 80%, Process 3 at 65%, Process 2 fed by 2a at 75% and 2b at 85%), if input from either 2a or 2b is sufficient for Process 2 to take place, then 2a and 2b are in parallel: Process 2 can take place with either sub-process.
Yft for Process 2 = sqrt(Yft2a × Yft2b) = sqrt(0.75 × 0.85) = 0.7984
Hence the process now becomes: Process 1 → Process 2 → Process 3
RTY for the process = Yft1 × Yft2 × Yft3 = 0.80 × 0.7984 × 0.65 = 0.4151, or 41.51%
In general, Yft for n processes in parallel = (Yft1 × Yft2 × Yft3 × … × Yftn)^(1/n)
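A sketch of both combination rules, following the conventions used here (series: product of yields; parallel: geometric mean of yields):

from math import prod

def yft_series(yields):
    """Series: every sub-process must succeed, so the yields multiply."""
    return prod(yields)

def yft_parallel(yields):
    """Parallel (as defined in this manual): geometric mean of the sub-process yields."""
    return prod(yields) ** (1 / len(yields))

print(yft_series([0.75, 0.85]))    # 0.6375
print(yft_parallel([0.75, 0.85]))  # ~0.7984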
RTY
[Figure: process map of a production line showing sub-process yields for Door Assembly, Case Forming and Shipping (values such as 99.0%, 99.6% and 81.0%), used in the calculations below.]
RTY
Door Assembly : In this case it is clear that all the sub-processes are required for door foaming to take place, hence all the processes are in series.
Yft (Door Assembly) = 0.99 × 0.997 × 0.934 × 0.973 = 0.8969
Case Forming : In this case all the sub-processes are required for case forming to take place, hence all the processes are in series.
Yft (Case Forming) = 0.996 × 0.992 × 0.917 × 0.81 = 0.7338
RTY
Normalized yield (Yna) is the geometric mean of all the Yft of a given process:
Yna = (RTY)^(1/n), where n is the total number of stages in a given line or the total number of sub-processes in a given process.
Example: Calculate the normalized yield (Yna) for the process shown below.
[Figure: a three-stage process, Process 1 → Process 2 → Process 3, with the stage yields given in the figure.]
RTY
Consider the following case: two production lines have the same value of RTY but a different number of processes. These two lines can be compared by comparing their Yna.
Since the Yna for Line 2 is higher, it is a better line than Line 1. The same concept applies to the service industry, where we have processes and sub-processes.
RTY
In this case, 30% of the payments are made by freight payment SDS and 70% of the payments are made by freight payment direct.
Y(freight) = P(SDS freight) × Y(SDS freight) + P(direct freight) × Y(direct freight) = 0.3 × 0.81 + 0.7 × 0.9 = 0.243 + 0.630 = 0.873
Along the same lines, calculating Yft for shipping:
Y(shipping) = P(ship SDS) × Y(ship SDS) + P(ship direct) × Y(ship direct) = 0.30 × 0.909 + 0.70 × 0.969 = 0.2727 + 0.6783 = 0.951
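When a step splits into alternative routes, the step yield is the probability-weighted average of the route yields; a small sketch:

def weighted_yield(routes):
    """routes: list of (probability, yield) pairs; probabilities should sum to 1."""
    return sum(p * y for p, y in routes)

print(weighted_yield([(0.3, 0.81), (0.7, 0.90)]))    # freight payment -> 0.873
print(weighted_yield([(0.3, 0.909), (0.7, 0.969)]))  # shipping        -> 0.951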
RTY
Process 1 : Yft1 = Yft(pricing) × Yft(order build) × Yft(availability) × Yft(contact) = 0.98 × 1.00 × 0.88 × 0.93 = 0.8020
Process 3 : Yft3 = Yft(order marriage) × Yft(scheduling) × Yft(warehouse loading) × Yft(shipping) = 0.88 × 0.88 × 0.955 × 0.951 = 0.7033
Process 5 : Yft5 = Yft(customer billing) × Yft(cash application) × Yft(freight payment) = 0.98 × 0.73 × 0.873 = 0.6245
RTY = Yft1 × Yft2 × Yft3 × Yft4 × Yft5 = 0.80 × 0.73 × 0.70 × 0.87 × 0.62 = 0.2205 = 22.05%
YIELD
The Yield Calculation using the Poisson formula
The Poisson equation is used to describe processes whose outcome is a discrete random variable taking integer (whole) values such as 0, 1, 2, 3 and so on.
• When DPU is given, the yield can be estimated as Y = e^(−DPU). Equivalently, in terms of opportunities, Y = P(no defect)^Opportunities = (1 − DPO)^Opportunities.
• Once we have the yield, we can calculate Zlt and Zst:
Zlt = NORMSINV(Yield)
Zst = NORMSINV(Yield) + 1.5
(NORMSINV is the inverse standard normal function in Excel.)
• With respect to DPO: Zlt = NORMSINV[(1 − DPO)^Opportunities] and Zst = NORMSINV[(1 − DPO)^Opportunities] + 1.5. Note: it is always advisable to calculate the Z value with respect to DPO, since the number of opportunities is then taken into consideration.
• We can also calculate Zst and Zlt using a Z table.
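A sketch of these conversions in Python, using scipy's norm.ppf as the NORMSINV equivalent; the DPU and DPO values are taken from the invoice example that follows, and the 1.5σ shift is assumed as above:

from math import exp
from scipy.stats import norm

dpu, dpo, opportunities = 0.34, 0.017, 20

y_poisson = exp(-dpu)               # yield estimate from DPU: e^(-DPU) ~ 0.712
y_dpo = (1 - dpo) ** opportunities  # yield from DPO: (1-DPO)^O ~ 0.710

z_lt = norm.ppf(y_dpo)              # long-term Z = NORMSINV(yield) ~ 0.55
z_st = z_lt + 1.5                   # short-term Z adds the 1.5 sigma shift ~ 2.05
print(y_poisson, y_dpo, z_lt, z_st)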
YIELD
Example : 100 invoices were processed in a financial firm. After processing, 34 defects were observed. If an invoice can fail in 20 ways, calculate the DPU, DPO and DPMO for this process.
Defects Per Unit (DPU) = Total Defects / Total Units = 34/100 = 0.34
Defects Per Opportunity (DPO) = Total Defects / (Total Units × Opportunities per Unit) = 34/(100 × 20) = 0.017
DPMO (Defects Per Million Opportunities) = DPO × 1 Million = 0.017 × 1,000,000 = 17,000
Process Yield & Sigma (Example)
If there are 34 defects out of 750 units (10 opportunities per unit), let us calculate the DPU, DPO, Yield, DPMO and sigma value.
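Following the definitions above, a sketch of the computation (assuming the same 1.5σ short-term shift):

from scipy.stats import norm

defects, units, opportunities = 34, 750, 10

dpu = defects / units                    # ~0.0453
dpo = defects / (units * opportunities)  # ~0.00453
dpmo = dpo * 1_000_000                   # ~4,533
yield_ = 1 - dpo                         # ~99.55% per opportunity
z_st = norm.ppf(yield_) + 1.5            # ~2.61 + 1.5 = ~4.11 sigma
print(dpu, dpo, dpmo, yield_, z_st)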
QFD
The bottom line of QFD is to study what the customer wants. Customer demand is correlated with the engineer's voice, which in turn is correlated with measurable technical specifications. CTQs and CTPs are selected from these measurable technical specifications.
During Step 1 of a QFD analysis, customer feedback is collected. Important guidelines for collecting customer feedback are:
1) It should be very brief.
2) It should specify what is required in a product or a process.
3) It should not provide any type of countermeasure or plan for improvement.
It should be ensured that feedback is taken from all types of customers.
In Step 1, all the customer requirements are correlated with the engineer's voice on a scale of 1, 3, 9, where 9 means strong correlation, 3 means moderate correlation and 1 means very weak correlation. In case of no correlation, no marks are given. An importance rating is also given to each voice of the customer, based on how critical it is to the customer.
QFD
[House-of-quality matrix: rows VOC1 to VOC8 with customer importance ratings 5, 5, 4, 4, 4, 3, 3 and 2; columns Engg.V1 to Engg.V7; each cell holds a 1/3/9 correlation score.]
The resulting importance-weighted ratings are: Engg.V1 = 54, Engg.V2 = 30, Engg.V3 = 41, Engg.V4 = 16, Engg.V5 = 46, Engg.V6 = 37, Engg.V7 = 138.
VOC stands for Voice of Customer; Engg.V stands for engineer's voice.
QFD
[Second-stage matrix: rows Engg.V1 to Engg.V7, carrying their ratings (54, 30, 41, 16, 46, 37, 138) from the previous step; columns are the measurable technical specifications, again scored 1/3/9.]
Engg.V stands for engineer's voice; Tech. Sp stands for Technical Specification.
FMEA
• FMEA is a systematic tool for identifying the effects or consequences of a potential failure of a product or a process.
Types of FMEA : the two most common types are Design FMEA (DFMEA), applied to products, and Process FMEA (PFMEA), applied to processes.
FMEA
FMEA characteristics
It is team-based work, involving a leader and process/product experts.
FMEA Terminology
Severity : a rating corresponding to the seriousness of an effect of a potential failure mode.
• The severity rating is on a scale of 1 to 10. If the rating is 1, the failure would not be noticeable to the customer and would not affect his process or product.
• If the rating is 10, the failure is dangerously high and could injure the customer.
FMEA
Occurrence : a rating corresponding to the rate at which a first-level cause and its resultant failure mode will occur over the design life of the product, before any additional controls are applied. The occurrence rating is on a scale of 1 to 10.
• If the rating is 1, the occurrence of failure is very rare (one occurrence in five years, or fewer than two occurrences per billion events).
• If the rating is 10, failure is almost inevitable (more than one occurrence per day, or a probability of more than 3 occurrences in 10 events).
FMEA
Detection : a rating corresponding to the likelihood that the current controls will detect the failure before it leaves the production facility.
• If the rating is 1, detection is obvious: automatic inspection is in place and there is a near-100% chance of detecting the failure before it leaves the production facility.
• If the rating is 10, there is absolutely no system to detect the failure.
RPN = Risk Priority Number
RPN = Severity × Occurrence × Detection
There is no absolute rule for what counts as a high RPN; rather, FMEA results are viewed on a relative scale (i.e. the highest RPN is addressed first).
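A minimal sketch of RPN ranking (the failure modes and ratings below are made up purely for illustration):

# (failure mode, severity, occurrence, detection) - hypothetical entries
modes = [
    ("seal leak",       7, 4, 6),
    ("wrong label",     3, 6, 2),
    ("cracked housing", 9, 2, 5),
]

ranked = sorted(modes, key=lambda m: m[1] * m[2] * m[3], reverse=True)
for name, s, o, d in ranked:
    print(f"{name}: RPN = {s * o * d}")  # highest RPN is addressed first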
MEASURE
CONTENTS :
• Introduction to Measure
• Types of Sampling
• Gage R&R
• Process Capability
Introduction to Measure
• It estimates the present status of a process : Before starting any project, it is imperative for us to know where the process stands right now. Although a part of this exercise is covered in the Define stage, this aspect is treated completely in the Measure stage through several indices such as Z value, Zbench, Cp and Cpk. These indices will be discussed later in this section.
• It validates the measurement system : The measurement system includes the operator as well as the instrument. In the Measure stage, we have to check whether the measurement system is appropriate for a particular process. Conclusions regarding any process can only be made after the appropriateness of the measurement system has been confirmed. The tool used for this purpose is Gage R&R.
Variation
Variation : Variation is the feature by which something varies or deviates from the desired state. Although the center of the data is a good measure for estimating the nature of a process, it cannot show how widely the individual values are spread.
[Figure: cross-section of a pond whose depth varies (h1, h2, h3, h4) around the average depth Hav.]
Variation
A person once went to a pond to bathe. He was told that the average depth of the pond is 1.2 m. After learning this, he decided to dive into the pond, since this depth was well below the acceptable level. His decision was based solely on the average depth of the pond; he failed to take into account the variation in the depth of the pond (as shown in the picture on the previous slide).
The same is true of the processes we encounter in our daily life. In such processes, too, the mean is not a complete descriptor of the process; variation is a very important index to take into account.
It has been observed that customer satisfaction is directly correlated with the extent to which a process is consistent. This observation is particularly applicable to the service sector. The success of Six Sigma in the banking sector is a result of this philosophy: such sectors target indices like cycle time and transaction efficiency to maintain consistency in the process.
Variation
Types of Variation : There are two types of variation that affect our processes.
Common Cause Variation : Also known as uncontrollable variation or white noise, this kind of variation is inherent to the process. It cannot be controlled under the given technology; it can only be reduced by improving the technology. Whatever improvements we make within the existing process, some common cause variation will always remain.
Assignable Cause Variation : Also known as controllable variation or black noise, this kind of variation is caused by factors external to the process, i.e. by 4M changes. It can be identified and eliminated.
Probability Distribution
Consider two tosses of a fair coin. The possible outcomes, the number of tails in each, and their probabilities are:

Outcome   Number of tails   Probability
(T, T)    2                 0.5 × 0.5 = 0.25
(T, H)    1                 0.5 × 0.5 = 0.25
(H, H)    0                 0.5 × 0.5 = 0.25
(H, T)    1                 0.5 × 0.5 = 0.25
Probability Distribution
Probability distribution for the possible number of tails from two tosses of a fair coin (Table A):

Number of tails   Tosses          Probability of this outcome, P(T)
0                 (H,H)           .25
1                 (T,H), (H,T)    .50
2                 (T,T)           .25
Probability Distribution
[Bar chart of the distribution: probability 0.25, 0.50 and 0.25 for 0, 1 and 2 tails respectively.]
It should be kept in mind that Table A does not represent the actual outcome of an experiment; rather, it is a theoretical outcome.
Probability Distribution
Consider a process with specification 45 ± 2. If data is collected for this particular process, the graph of the probability distribution representing the actual readings will look like the one below.
[Probability distribution of the readings: a curve over the specification range 43 to 47, with the actual mean offset from the targeted mean of 45.]
It should be noticed that in real probability distribution curves, the actual average generally differs from the targeted average.
Normal Distribution
The normal probability distribution has the following properties:
1) Positive and negative deviations from the mean are equally likely.
2) The mean of the normally distributed population lies at the center of its normal curve.
3) The mean, median and mode have the same value and lie at the center of the curve.
4) The two tails of the normal probability distribution extend indefinitely and never touch the horizontal axis.
Central Limit Theorem
The central limit theorem states that as the sample size increases, the sampling distribution of the mean approaches normality. Statisticians therefore use the normal distribution as an approximation to the sampling distribution of the mean.
[Figure: sampling distributions of the mean for sample sizes n = 8 and n = 30.]
Central Limit Theorem
From the graph shown on the previous slide, it is evident that for a sample size of 30 the difference between the standard deviations of the sample and the population is very small. Although this difference reduces further as the sample size increases, the additional reduction is negligible; hence a sample size of 30 is normally used for analysis.
Moreover, much of this reduction has already taken place by a sample size of 8. Hence, when it is not possible to sample 30 pieces (for example, in destructive testing), a sample size of 8 is considered for analysis. The minimum sample size required for any kind of analysis is 8.
Sampling
Sampling is the process of selecting units (e.g. people, organizations) from a population of interest so that, by studying the sample, we may fairly generalize our results back to the population from which the units were chosen. It is not possible for us to study each and every unit of the population in order to make interpretations: doing so is not only time consuming and uneconomical, it is also error prone.
Types of Sampling:
1) Random Sampling : In a random or probability sample, all items in the population have a known, equal chance of being selected.
2) Cluster Sampling : In cluster sampling, we divide the population into groups and then collect a random sample of these clusters. We assume that these individual clusters are representative of the population as a whole. For cluster sampling to be successful, a cluster has to be a miniature of the population; in such cases the variation between clusters is less than the variation within a cluster.
Sampling
3) Stratified Sampling : In stratified sampling, the population is divided into relatively homogeneous groups called strata; each stratum should be as homogeneous as possible. From each stratum, a number of elements proportional to that stratum's share of the population is drawn. The strata should be mutually exclusive (every element in the population must be assigned to only one stratum) and collectively exhaustive (no population element can be excluded). Random sampling is then applied within each stratum. This often improves the representativeness of the sample.
Gage R&R Basics
Measurement data are only as good as the measurement system that produces them: the more error there is in the measurements, the more error there will be in the decisions based on those measurements. The purpose of Measurement System Analysis is to qualify a measurement system for use by quantifying its accuracy, precision and stability.
Measurement System Analysis is a critical first step that should precede any data-based decision making, including Statistical Process Control, Correlation and Regression Analysis, and Design of Experiments. The measurement system includes both the operator and the instrument, and the total variation in the measurement system is the result of variation caused by the operator as well as the instrument. Hence it can be written as:
σ²(measurement system) = σ²(instrument, repeatability) + σ²(operator, reproducibility)
Gage R&R Basics
Stability refers to the capacity of a measurement system to produce the same values over time
when measuring the same sample. As with statistical process control charts, stability means the
absence of "Special Cause Variation", leaving only "Common Cause Variation" (random variation).
Bias, also referred to as Accuracy, is a measure of the distance between the average value of
the measurements and the "True" or "Actual" value of the sample or part.
Linearity : Linearity is a measure of the consistency of Bias over the range of the measurement
device. For example, if a bathroom scale is under by 1.0 pound when measuring a 150 pound
person, but is off by 5.0 pounds when measuring a 200 pound person, the scale Bias is non-linear
in the sense that the degree of Bias changes over the range of use.
Gage R&R Basics
Repeatability assesses whether the same appraiser can measure the same part/sample multiple times with the same measurement device and get the same value.
Reproducibility assesses whether different appraisers can measure the same part/sample with the same measurement device and get the same value.
Precision : the property by virtue of which the same readings are obtained when the same part is measured repeatedly.
[Figure: four targets around a true value, illustrating accurate & precise; accurate but not precise; precise but not accurate; and neither accurate nor precise.]
Gage R&R Basics
Requirements :
The resolution, or discrimination of the measurement device must be small relative to the smaller
of either the specification tolerance or the process spread (variation). As a rule of thumb, the
measurement system should have resolution of at least 1/10th the smaller of either the
specification tolerance or the process spread. If the resolution is not fine enough, process
variability will not be recognized by the measurement system, thus blunting its effectiveness.
Gage R&R Basics
1. Determine the number of appraisers, the number of sample parts, and the number of repeat readings. Larger numbers of parts and repeat readings give results with a higher confidence level, but the numbers should be balanced against the time, cost and disruption involved.
2. Use appraisers who normally perform the measurement and who are familiar with the equipment and procedure.
3. Make sure there is a set, documented measurement procedure that is followed by all appraisers.
4. Select the sample parts to represent the entire process spread. This is a critical point: if the process spread is not fully represented, the degree of measurement error may be overstated.
Gage R&R Basics
5. If applicable, mark the exact measurement location on each part to minimize the impact of within-part variation.
6. Parts should be numbered, and the measurements should be taken in random order so that the appraisers do not know the number assigned to each part or any previous measurement value for that part. A third party should record the measurements, the appraiser, the trial number and the part number.
Gage R&R
Stability Assessment :
1. Select a part from the middle of the process spread and determine its reference value relative to a traceable standard. If a traceable standard is not available, measure the part ten times in a controlled environment and average the values to determine the Reference Value. This part/sample is designated the Master Sample.
2. Over at least twenty periods (days/weeks), measure the master sample 3 to 5 times, keeping the number of repeats fixed. Take readings throughout the period to capture the natural environmental variation.
3. Plot the data on an Xbar & R chart; then subtract the Reference Value from Xbar to yield the Bias:
Bias = Xbar − Reference Value
Gage R&R
Analyze the result; if the bias is relatively high, the following can be the reasons behind it:
1) Appraisers not following the measurement procedure.
2) An error in measuring the reference value.
3) Instability in the measurement: if the SPC chart shows a trend, the measurement device could be wearing or its calibration could be drifting.
Gage R&R
In Minitab, the study is run from Stat / Quality Tools / Gage Study. Click Options and enter the tolerance value; the study multiplier can be changed from 6 to 5.15.
Gage R&R
Two-Way ANOVA Table With Interaction

Source            DF    SS     MS       F        P
Parts              9   81.6   9.06667  204.000  0.000
Operator           1    0.1   0.10000    2.250  0.168
Parts * Operator   9    0.4   0.04444    0.889  0.552
Repeatability     20    1.0   0.05000
Total             39   83.1

A significant Parts * Operator interaction (p-value < 0.25) would indicate that an operator has a problem measuring some of the parts, and the Gage R&R would not be acceptable. Here p = 0.552, so the interaction is not significant; it is dropped and the ANOVA is recomputed without it:

Source         DF    SS     MS       F        P
Parts           9   81.6   9.06667  187.810  0.000
Operator        1    0.1   0.10000    2.071  0.161
Repeatability  29    1.4   0.04828
Total          39   83.1
Gage R&R
Total Gage R&R = variation because of the instrument (repeatability) + variation because of the operator (reproducibility)

% Study Tolerance = (5.15 × σ(Gage R&R) / Total Tolerance) × 100

% Study Variation indicates the contribution of measurement-system variation to the total variation. For a measurement system to be efficient, % Study Variation should be very small.
Gage R&R
% Study Tolerance indicates the ability of the measuring system to perform within the given tolerance range. It might happen that the variation because of Gage R&R is small in magnitude while % Study Tolerance is still high; in such a case it can be inferred that the measuring system is not capable enough to take measurements within the given tolerance range.

[Xbar chart by operator: X = 2.307, control limits 2.235 to 2.380.] Small operator variation means narrow control limits: measurement variation (operator, measuring system) is smaller than the part variation, so the chart mainly reads the variation between the parts.

[R chart by operator: R = 0.03833, UCL = 0.1252, LCL = 0.] Here the favorable condition is when most of the measuring points are in control; repetition of the same measured value indicates that the measuring system is accurate.
Gage R&R for Discrete Data
In the case of attribute data, the testing criterion is pass (G) or fail (NG). This is similar to the long-term Gage R&R for continuous data.
1) Readings are taken by two appraisers.
2) A minimum of 20 samples is measured.
3) Each part is checked twice by each appraiser.

S.No.   Appraiser A (1, 2)   Appraiser B (1, 2)
1       G   G                G   G
2       G   G                G   G
3       NG  G                G   G
4       NG  NG               NG  NG
5       G   G                G   G
6       G   G                G   G
7       NG  NG               NG  NG
8       NG  NG               G   G
9       G   G                G   G
10      G   G                G   G
11      G   G                G   G
12      G   G                G   G
13      G   NG               G   G
14      G   G                G   G
15      G   G                G   G
16      G   G                G   G
17      G   G                G   G
18      G   G                G   G
19      G   G                G   G
20      G   G                G   G

Gage R&R for any part is acceptable when all four observations are the same; the gage is acceptable overall only if the error is not more than 10%.
Here there are three parts (3, 8 and 13) on which all four readings are not the same, so
unacceptable cases = (3/20) × 100 = 15%
Since the % error in this case is more than 10%, the measurement system is not valid.
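A sketch of the agreement check (each part passes only if all four readings, two per appraiser, agree):

# each row: (appraiser A trial 1, A trial 2, B trial 1, B trial 2)
readings = [
    ("G", "G", "G", "G"), ("G", "G", "G", "G"), ("NG", "G", "G", "G"),
    ("NG", "NG", "NG", "NG"), ("G", "G", "G", "G"), ("G", "G", "G", "G"),
    ("NG", "NG", "NG", "NG"), ("NG", "NG", "G", "G"), ("G", "G", "G", "G"),
    ("G", "G", "G", "G"), ("G", "G", "G", "G"), ("G", "G", "G", "G"),
    ("G", "NG", "G", "G"), ("G", "G", "G", "G"), ("G", "G", "G", "G"),
    ("G", "G", "G", "G"), ("G", "G", "G", "G"), ("G", "G", "G", "G"),
    ("G", "G", "G", "G"), ("G", "G", "G", "G"),
]

disagreements = sum(1 for r in readings if len(set(r)) > 1)
error_pct = disagreements / len(readings) * 100
print(f"{disagreements} disagreements -> {error_pct:.0f}% error")  # 3 -> 15%, above the 10% limit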
Capability Analysis
Capability analysis is a set of calculations used to assess whether a system is statistically able to meet a set of
specifications or requirements. To complete the calculations, a set of data is required, usually generated by a
control chart; however, data can be collected specifically for this purpose.
While collecting data for capability analysis, rational sub grouping should be ensured. Sampling should be done in
such a way that all the components of the population are covered.
Specifications or requirements are the numerical values within which the system is expected to operate, that is,
the minimum and maximum acceptable values. Occasionally there is only one limit, a maximum or minimum.
Customers, engineers, or managers usually set specifications. Specifications are numerical requirements, goals,
aims, or standards. It is important to remember that specifications are not the same as control limits.
Control limits come from control charts and are based on the data.
All methods of capability analysis require that the data is statistically stable, with no special causes of variation
present. To assess whether the data is statistically stable, a control chart should be completed. If special causes
exist, data from the system will be changing. If capability analysis is performed, it will show approximately what
happened in the past, but cannot be used to predict capability in the future.
It will provide only a snapshot of the process at best. If, however, a system is stable, capability analysis shows not only the ability of the system in the past but also, if the system remains stable, predicts its performance in the future.
Capability analysis is summarized in indices; these indices show a system’s ability to meet its numerical
requirements. They can be monitored and reported over time to show how a system is changing. Various
capability indices are presented in this section; however, the main indices used are Cp and Cpk. The indices
are easy to interpret; for example, a Cpk of more than one indicates that the system is producing within the
specifications or requirements. If the Cpk is less than one, the system is producing data outside the
specifications or requirements. This section contains detailed explanations of various capability indices and
their interpretation. Capability analysis is an excellent tool to demonstrate the extent of an improvement made
to a process. It can summarize a great deal of information simply, showing the capability of a process, the
extent of improvement needed, and later the extent of the improvement achieved.
Cp (Process Capability) :
The capability index is defined as: Cp = (allowable range)/(6σ) = (USL − LSL)/(6σ)
The capability index shows how well a process is able to meet specifications: the higher the value of the index, the more capable the process. Note that Cp accounts for the spread of the process only; it does not cover the shift of the process. To account for shift, the process capability index (Cpk) is used.
Process Capability
Process capability refers to the ability of a process to produce a defect-free product or service in a controlled production or service environment. Various indicators are used; some address overall performance and some address potential capability.
Process Capability Indices (Cp and Cpk)
A standardized measure of the short-term performance of a process is Cp. The long-term process performance index, Cpk, has a similar ratio to that of Cp, except that it considers the shift of the mean relative to the specification limits:
Cpk = min{ (USL − x̄)/3σ, (x̄ − LSL)/3σ }
Zshort = 3 × Cp
Zlong = 3 × Cpk
Statistical Concept
Process capability index formulas:

Balanced (two-sided) specification:   Cp = (USL − LSL) / 6σ
One-sided specification (upper):      Cp = (USL − x̄) / 3σ
One-sided specification (lower):      Cp = (x̄ − LSL) / 3σ

With a shift of the process mean x̄ from the specification midpoint M:
Cpk = (1 − k) Cp, where k = |M − x̄| / (T/2) and T = USL − LSL is the total tolerance.

In the case of a one-sided specification, regardless of the specification limit, Cp equals Cpk.
Consider the following process: the spread of the process is very small, but the process is shifted away from the target. [Figure: a narrow distribution offset from the center of the specification.] Such a process has a high Cp but a low Cpk.
Cpk = (1 − k) Cp
k = |M − x̄| / (T/2), where M is the midpoint of the specification and T is the total tolerance.
Cp tells us about the spread of the process; Cpk tells us about both the shift and the spread of the process.
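A sketch of the arithmetic, using the mean-based form of Cpk (equivalent to (1 − k)Cp for a two-sided specification); the numbers reproduce the Minitab within-capability results shown below:

def cp_cpk(mean, sigma, lsl, usl):
    cp = (usl - lsl) / (6 * sigma)                   # spread only
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)  # spread plus shift
    return cp, cpk

# spec 9 to 15, mean 12.7, within-subgroup sigma 2.77649 (values from the output below)
print(cp_cpk(12.7, 2.77649, 9, 15))  # -> (0.36, 0.28)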
Capability analysis in Minitab:
Step 1: Stack the given data.
Step 2: Select the subgroups that are to be stacked.
Step 3: Conduct a normality test on the given data.
Probability Plot of Data (Normal)
[Normal probability plot: Mean = 12.7, StDev = 3.164, N = 30, AD = 0.369, P-Value = 0.404.]
If the p-value is > 0.05, the data is normal; here p = 0.404, so the data is normal.
Case 1: For Normal Data
Step 1: Stack the given data.
Step 2: In the capability-analysis dialog, enter the subgroup size, the lower spec and the upper spec.
Process Capability of Data
[Capability histogram with LSL = 9 and USL = 15, showing the within and overall fitted curves.]

Process data: LSL = 9.00000, Target = *, USL = 15.00000, Sample Mean = 12.70000, Sample N = 30, StDev(Within) = 2.77649, StDev(Overall) = 3.19130
Potential (Within) Capability: Cp = 0.36, CPL = 0.44, CPU = 0.28, Cpk = 0.28, CCpk = 0.36
Overall Capability: Pp = 0.31, PPL = 0.39, PPU = 0.24, Ppk = 0.24, Cpm = *

StDev(Within) represents short-term variation; StDev(Overall) represents long-term variation. Cp represents process capability, Cpk the process capability index, and Pp process performance.
For rational subgrouping to be present, the curves for short-term and long-term variation should be distinct; if the curves overlap, it indicates poor rational subgrouping.
Process Capability of Data
[The same capability plot, with expected performance:]

              Observed      Exp. Within   Exp. Overall
PPM < LSL     100000.00     91328.61      123146.19
PPM > USL     166666.67     203726.50     235544.19
PPM Total     266666.67     295055.11     358690.38

Cpu = (USL − x̄) / (3 × StDev(Within))
Cpl = (x̄ − LSL) / (3 × StDev(Within))
Cpk = (1 − k) Cp
For the overall (Pp) indices, StDev(Within) is replaced by StDev(Overall).
Case 2: For Non-Normal Data
Step 1: If the data is non-normal, a Box-Cox transformation needs to be done.
[Box-Cox plot of Data: StDev versus lambda, with the estimated lambda and its lower and upper 95% confidence limits.]
Step 2: In the capability-analysis dialog, select the Box-Cox transformation.
Binomial Distribution of Data
If the data is not measurable (not continuous), the capability test is done using the binomial method.
Go to Stat / Quality Tools / Capability Analysis / Binomial.

Trial    Defects1
1900     23
3456     567
2345     678
2345     234

Enter the defect counts in "Defectives" and the numbers of trials in "Use sizes in", then press OK.
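The summary statistics can be checked by hand; a sketch computing the pooled defective rate, PPM and process Z (norm.ppf as the Z lookup):

from scipy.stats import norm

trials = [1900, 3456, 2345, 2345]
defects = [23, 567, 678, 234]

p = sum(defects) / sum(trials)  # pooled proportion defective ~ 0.1495
ppm = p * 1_000_000             # ~ 149,512 PPM defective
process_z = norm.ppf(1 - p)     # ~ 1.0385, matching the Minitab summary below
print(f"%defective={p:.2%}, PPM={ppm:.0f}, Z={process_z:.4f}")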
Results
Binomial Process Capability Analysis of Defects1
[P chart of the proportion defective: P̄ = 0.1495, UCL = 0.1716, LCL = 0.1274, with out-of-control points; companion plots of the defective rate by sample size and the cumulative %defective. Tests performed with unequal sample sizes.]

Summary statistics (using 95.0% confidence):
%Defective: 14.95 (Lower CI: 14.26, Upper CI: 15.66)
PPM Defective: 149,512 (Lower CI: 142,592, Upper CI: 156,637)
Process Z: 1.0385 (Lower CI: 1.0084, Upper CI: 1.0687)
Target: 0.00
Z-Value
The Z value indicates the number of standard deviations lying between the target and the USL, or between the target and the LSL. For a Six Sigma process with only one specification limit (upper or lower), this results in six process standard deviations between the mean of the process and the customer's specification limit.
The Z value gives an idea of the quality of the process: the higher the Z value, the better the process.
Zst (Z Short Term) : Zst is based on data taken over a short term. Zst reflects only the technology of the process; it does not take into account the shift in the process caused by process variation (4M factors). In other words, Zst takes into account only the white noise.
Z-Value
Zlt takes into account both white noise and black noise. It is based on data taken over an extended period of time, and reflects the technology as well as the process control. Process control involves the shift in the process that is caused by process variation (4M factors).
Examples of process shift (4M factors): seasonal changes, variation because of external factors, change in the skill of a worker over a period of time, etc. These changes can be attributed to factors such as monitoring/measurement, methods (including criteria and the various documents used along the process) and the work environment.
Z-Value
Estimating the Z value from the Z table: calculate the Z value for 1000 PPM.
Step 1: Convert PPM to a proportion: 1000 PPM = 0.001.
Step 2: Locate 0.001 in the body of the Z table.
Step 3: Read off the row and column. The value lies in the 3.0 row and the 0.09 column, hence Z = 3.0 + 0.09 = 3.09.
Z bench
Definition of Z.BENCH
[Figure: a distribution with defect areas in both tails: ZLSL = 1.34 (about 9% of the area below the LSL) and ZUSL = 1.22 (about 10% above the USL), about 19% in total, giving ZBENCH = 0.88.]
The Z value corresponding to the total defect area outside the LSL and USL is called Zbench.
Z bench
Steps involved in calculating Zbench:
• PUSL is the probability of a defect relative to the USL: ZUSL = (USL − µ)/σ; look up the corresponding tail area in the Z table.
• PLSL is the probability of a defect relative to the LSL: ZLSL = (µ − LSL)/σ; look up the corresponding tail area in the Z table.
• Add the two defect areas, then look up the Z value corresponding to the total area; that value is Zbench.
Z bench
Q4. Calculate Zbench for the process below, with LSL = 10, USL = 50, µ = 25 and σ = 5.
[Figure: a distribution over the measurement (time) axis, marked at 10, 15, 20, 25, 30, 35 and 50.]
ZUSL = (USL − µ)/σ = (50 − 25)/5 = 5; the corresponding tail area from the Z table is ≈ 2.9 × 10⁻⁷.
ZLSL = (µ − LSL)/σ = (25 − 10)/5 = 3; the corresponding tail area is ≈ 1.35 × 10⁻³.
The total defect area is ≈ 0.00135, and the Z value corresponding to 0.00135 is 3.
Hence Zbench ≈ 3.
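The same lookup with scipy instead of a printed Z table (norm.sf gives the upper-tail area, norm.isf the inverse):

from scipy.stats import norm

mu, sigma, lsl, usl = 25, 5, 10, 50

p_usl = norm.sf((usl - mu) / sigma)  # tail beyond the USL (Z = 5) ~ 2.9e-7
p_lsl = norm.sf((mu - lsl) / sigma)  # tail beyond the LSL (Z = 3) ~ 1.35e-3
z_bench = norm.isf(p_usl + p_lsl)    # Z for the combined defect area ~ 3.0
print(z_bench)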
ANALYZE
CONTENTS :
• Introduction to Analyze
• Introduction to hypothesis testing
ANALYZE
Analyze comes after the Define and Measure stages in the Six Sigma methodology.
Define aims at theme selection and justification, along with finding the possible root causes. Measure aims at finding the present status of the process and establishing whether the existing measurement system is appropriate.
Analyze selects the vital few factors out of the possible factors; the possible factors are the probable causes identified in the earlier stages.
The first step of Analyze is the fishbone diagram, or cause-and-effect diagram, which involves listing all the probable factors and classifying them under suitable categories.
Fish Bone Diagram
The cause & effect diagram is the brainchild of Kaoru Ishikawa, who pioneered quality
management processes in the Kawasaki shipyards, and in the process became one of the
founding fathers of modern management. The cause and effect diagram is used to explore
all the potential or real causes (or inputs) that result in a single effect (or output). Causes are
arranged according to their level of importance or detail, resulting in a depiction of
relationships and hierarchy of events. This can help you search for root causes, identify
areas where there may be problems, and compare the relative importance of different
causes.
Causes in a cause & effect diagram are frequently arranged into four major categories. While
these categories can be anything, you will often see:
1) Man, Method, Material and Machinery (recommended for manufacturing)
2) Equipment, Policies, Procedures and People (recommended for administration
and service).
These guidelines can be helpful but should not be used if they limit the diagram or are
inappropriate. The categories you use should suit your needs.
Fish Bone Diagram
The C&E diagram is also known as the fishbone diagram because it is drawn to resemble the skeleton of a fish, with the main causal categories drawn as "bones" attached to the spine.
[Diagram: Man, Machine, Method and Material branches, each carrying causes (Cause 1, Cause 2, ...), all pointing into the Problem or Effect box.]
What is a hypothesis?
• A hypothesis is an assumption we make about a population parameter. When we collect data, we compare the hypothesized value with the actual value: if the difference is small, our assumption is probably right; if the difference is large, the assumption is probably not correct.
Example :
• Suppose a manager says his employee's performance level is 90%. How can we test the validity of this hypothesis?
• We have to collect a sample. If the sample indicates the performance is 95%, we can directly accept the manager's statement; if our sample statistic says the performance is 46%, we can directly reject it. These accept/reject decisions can be made using common sense.
• Now suppose our sample says the performance level is 88%, which is very close to the manager's statement; here we are not absolutely certain whether to accept or reject it. We therefore have to learn to deal with uncertainty in our decision making.
• We cannot accept or reject a hypothesis about a population parameter simply by intuition; instead, we need to learn how to decide on the basis of sample information whether to accept or reject it.
Null Hypothesis
• In hypothesis testing we must state the assumed or hypothesized value of the population parameter before we begin sampling. The assumption we wish to test is called the null hypothesis, symbolized H0.
• For example, in a medical application, in order to test the effectiveness of a new drug, the hypothesis tested (the null hypothesis) was that it had no effect, i.e. that there is no difference between treated and untreated subjects.
• When we use a hypothesized value of the population mean in a problem, we represent it symbolically as µH0 (the null-hypothesized value of the population mean).
• If our sample results fail to support the null hypothesis (if the results are not as per our assumption), we must conclude that something else is true.
• Whenever we reject the null hypothesis, the conclusion we accept instead is called the alternate hypothesis, symbolized HA.
Interpreting the Significance Level
• The purpose of hypothesis testing is to make a judgment about the difference between the sample statistic and the hypothesized population parameter.
• The next step after stating the null and alternate hypotheses is to find out which criterion to use for deciding whether to accept or reject the null hypothesis. The term which allows us to decide is the significance level.
What is the significance level?
[Figure: a normal curve centered at µH0, with 0.95 of the area in the middle and 0.025 in each tail.]
The tails are the regions where there is a significant difference between the statistic and the hypothesized parameter (reject the null hypothesis there).
• As the total area under a distribution curve is 1, in this example a total of 5% of the area (0.025 on each side), marked in black, lies out in the tails of the curve.
• From the Z table we can determine that 95% of all the area under the curve is included in an interval extending 1.96σ either side of the hypothesized mean. Within this 95% of the area there is no significant difference between the observed value of the sample statistic and the hypothesized value of the population parameter; in the remaining 5% (colored black) a significant difference exists.
• If the sample statistic falls within the 0.95 of the area under the curve, we accept the null hypothesis; in the black part (0.025 on each side, 5% of the area in total) we reject the null hypothesis.
Selecting a significance level
• There is no universal standard for selecting a significance level; in some instances 5% is used and in some 1%. It is possible to test a hypothesis at any level of significance.
• But it is very important to remember the effect of the significance level on the decision to reject or accept the null hypothesis: the higher the significance level, the higher the probability of rejecting a null hypothesis when it is true.
Type I and Type II errors
1. Producer's risk
2. Consumer's risk
• Producer's Risk : In hypothesis testing, this is rejecting a null hypothesis when it is true. It amounts to rejecting a good production lot even though there is no real evidence that the lot is defective, which leads to high rework. This type of error, borne by the producer, is called a Type I error and is symbolized α (alpha).
• Consumer's Risk : This is accepting a null hypothesis when it is false. It amounts to accepting a production lot even though there is evidence that the lot is defective: the producer takes a chance by releasing the lot to market, calculating that warranty and repair at the consumer's end is relatively inexpensive compared with reworking the entire lot. This type of error, whose risk falls on the consumer, is called a Type II error and is symbolized β (beta).
Representation of Producer's and Consumer's Risk

                  H0 is true                    Ha is true
Accept H0         Correct decision              Type 2 error (β): consumer's risk
Reject H0         Type 1 error (α):             Correct decision
(accept Ha)       producer's risk

β is the probability of accepting H0 even though it is false; it is usually set at 10%. α is the probability of rejecting H0 even though it is true; it is usually set at 5%.
• When the acceptance region is very small, there is only a small chance of accepting the null hypothesis even when it is actually true. To deal with this, in practical and professional situations the decision is made by estimating the losses, costs or penalties attached to both types of error.
• Suppose making a Type I error involves disassembling an entire engine at the factory, but making a Type II error involves relatively inexpensive warranty repairs by the dealers; then the manufacturer is more likely to prefer the Type II error and will set a lower significance level in its testing.
Two-tailed and one-tailed tests of hypotheses
In a two-tailed test there are two regions in which to reject the null hypothesis: when the sample mean is significantly higher than, or significantly lower than, the hypothesized population mean. A two-tailed test is appropriate when the null hypothesis is µ = µH0 and the alternate hypothesis is µ ≠ µH0.
[Figure: a normal curve centered at µH0 with rejection regions in both tails.]
Example: The manufacturer of a bulb wants to produce bulbs with a mean lifetime of µ = µH0 = 1000 hrs. If the life of the bulbs is less than 1000 hrs he will lose customers; if it is more than that, his manufacturing cost will go up. So he does not want to deviate significantly from 1000 hrs in either direction; thus he should use a two-tailed test, i.e. he will reject the null hypothesis if the mean life of the bulbs in the sample is either too far above 1000 hrs or too far below 1000 hrs.
[Figure: the same curve centered at µH0, with rejection regions in both tails.]
• There is also a situation where the wholesaler who buys these bulbs will reject a lot when the mean lifetime of the bulbs is less than 1000 hrs, but will not reject it if the bulbs measure more than 1000 hrs, since he need not pay more for the extra hours. So he will use a one-tailed test: H0: µ = 1000 hrs and Ha: µ < 1000 hrs. This is also called a left-tailed test or lower-tailed test.

Example : Measurements were made on nine metal pieces. The distribution of measurements has historically been close to normal with σ = 0.2. Because you know σ, and you wish to test whether the population mean is 5 and to obtain a 95% confidence interval for the mean, you use the Z-procedure.
One-Sample Z: Values
[Output: boxplot of Values with H0 and the 95% Z-confidence interval for the mean, computed with StDev = 0.2.]
The p-value is 0.002, which is less than 0.05; hence reject the null hypothesis. The hypothesized value falls outside the 95% confidence interval for the population mean.
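The Z-procedure is easy to reproduce by hand; a sketch using the nine Values listed on the next slide (it gives the same p-value of 0.002):

from math import sqrt
from scipy.stats import norm

values = [4.9, 5.1, 4.6, 5.0, 5.1, 4.7, 4.4, 4.7, 4.6]
mu0, sigma, n = 5.0, 0.2, 9           # hypothesized mean, known sigma

xbar = sum(values) / n                # ~4.789
z = (xbar - mu0) / (sigma / sqrt(n))  # ~-3.17
p = 2 * norm.sf(abs(z))               # two-sided p ~ 0.002 -> reject H0
half = 1.96 * sigma / sqrt(n)
print(z, p, (xbar - half, xbar + half))  # the 95% CI excludes 5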
Use 1-Sample t to compute a confidence interval and perform a hypothesis test of the mean when the population standard deviation, σ, is unknown.
Values: 4.9, 5.1, 4.6, 5, 5.1, 4.7, 4.4, 4.7, 4.6
138
3. Enter 95 in Confidence level.
4. Click OK.
139
One-Sample T: Values
Test of mu = 5 vs not = 5
[Boxplot of Values, with Ho and the 90% t-confidence interval for the mean, marking the sample mean X-bar against Ho.]
140
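For comparison, a minimal sketch of the same 1-Sample t test in Python; scipy's ttest_1samp computes the t statistic and the two-sided p-value directly from the data when σ is unknown.

```python
# Minimal sketch of a one-sample t test (unknown sigma) on the same "Values".
from scipy import stats

values = [4.9, 5.1, 4.6, 5.0, 5.1, 4.7, 4.4, 4.7, 4.6]
t_stat, p_value = stats.ttest_1samp(values, popmean=5.0)  # Ho: mu = 5
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")             # compare p with 0.05
```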
How to use the t table
Enter the table using the sample size (degrees of freedom) and the confidence level (normally taken as 95%). In our example the sample size is 30.
141
Two-sample t test: used when 2 sets of samples are available and we have no historic data.
A study was performed in order to evaluate the effectiveness of two devices for improving the efficiency of gas home-heating systems. The energy consumption data are stacked, with the BTU.In values in one column and the device (Damper) in another. Now you want to compare the effectiveness of these two devices by determining whether or not there is any evidence that the difference between the devices is different from zero.
1. First step: a normality test is to be done on the data.
BTU.In values for Damper 1: 7.87, 9.43, 7.16, 8.67, 12.31, 9.84, 16.9, 10.04, 12.62, 7.62, 11.12, 13.43, 9.07, 6.94, 10.28, 9.37, 7.93, 13.96, 6.8, 4, 8.58, 8, 5.98, 15.24, 8.54, 11.09, 11.7, 12.71, 6.78, 9.82, 12.91, 10.35, 9.6, 9.58, 9.83, 9.52, 18.26, 10.64, 6.62, 5.2
BTU.In values for Damper 2: 12.28, 7.23, 2.97, 8.81, 9.27, 11.29, 8.29, 9.96, 10.3, 16.06, 14.24, 11.43, 10.28, 13.6, 5.94, 10.36, 6.85, 6.72, 10.21, 8.61
142
Results of normality test
Probability Plot of Data (Normal): Mean 10.04, StDev 2.868, N 90, AD 0.272, P-Value 0.663.
[Normal probability plot: percent vs. Data, 0 to 20.]
Since the p-value 0.663 is greater than 0.05, the data can be treated as normal.
143
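A minimal sketch of an Anderson-Darling normality check in Python. Note that scipy's anderson() reports the AD statistic against critical values rather than a p-value as Minitab does; the list below holds only the first ten BTU.In values, for brevity.

```python
# Minimal sketch of an Anderson-Darling normality test.
from scipy import stats

data = [7.87, 9.43, 7.16, 8.67, 12.31, 9.84, 16.9, 10.04, 12.62, 7.62]
result = stats.anderson(data, dist='norm')
print(f"AD statistic = {result.statistic:.3f}")
for cv, sl in zip(result.critical_values, result.significance_level):
    # If the statistic exceeds the critical value, reject normality at that level.
    print(f"  {sl}% level: critical value = {cv:.3f}")
```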
2. The next step is the HOV (homogeneity of variance) test, where we will check for a difference in the variance between the two damper groups.
Path: Stat > Basic Statistics > 2 Variances
144
Results of equal variance test
Test for Equal Variances for Data (F-Test): Test Statistic 1.19, P-Value 0.558.
[Interval plot of the equal-variance test for Data by Damper.]
Since the p-value 0.558 is greater than 0.05, the variances of the two groups can be treated as equal.
145
The next step is the 2-sample t test.
146
Two-Sample T-Test and CI: Data, Damper
[Session output (Damper, N, Mean, StDev, SE Mean for each group) and the Boxplot of Data by Damper are not reproduced here.]
Degrees of freedom: in our example it is 80.
148
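A minimal sketch of the same comparison in Python, using the damper values reproduced in the table above and assuming equal variances (supported by the F-test p-value of 0.558 earlier).

```python
# Minimal sketch of a two-sample t test on the damper data.
from scipy import stats

damper1 = [7.87, 9.43, 7.16, 8.67, 12.31, 9.84, 16.9, 10.04, 12.62, 7.62,
           11.12, 13.43, 9.07, 6.94, 10.28, 9.37, 7.93, 13.96, 6.8, 4.0,
           8.58, 8.0, 5.98, 15.24, 8.54, 11.09, 11.7, 12.71, 6.78, 9.82,
           12.91, 10.35, 9.6, 9.58, 9.83, 9.52, 18.26, 10.64, 6.62, 5.2]
damper2 = [12.28, 7.23, 2.97, 8.81, 9.27, 11.29, 8.29, 9.96, 10.3, 16.06,
           14.24, 11.43, 10.28, 13.6, 5.94, 10.36, 6.85, 6.72, 10.21, 8.61]

t_stat, p_value = stats.ttest_ind(damper1, damper2, equal_var=True)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # p > 0.05 means no evidence of a difference
```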
ANOVA :Analysis of variance
•This test is used to analyze the differences among more than two samples.
149
Steps of ANOVA test
1) Conduct normality test
2) Conduct HOV (equal variances) test
3) Conduct the ANOVA test
151
Example: A vendor has 3 different plating lines in his factory. He has produced the same metal part with the same nominal thickness on all lines, but he wants to validate whether the plating thickness from each plating line is the same or different.
153
Results of normality test
Probability Plot of Thickness (Normal): Mean 20.53, StDev 0.3975, N 30, AD 1.070, P-Value 0.007.
[Normal probability plot: percent vs. Thickness, 19.5 to 22.0.]
(Note: here the p-value 0.007 is less than 0.05, which suggests the thickness data deviate from normality.)
2. The next step is the HOV test, where we will check for a difference in the variance of the 3 plating lines.
Path: Stat > Basic Statistics > 2 Variances
155
Results: Test for Equal Variances: Thickness versus Plating line
156
How to perform ANOVA in Minitab?
Minitab path: Stat > ANOVA > One-Way…
157
One-way ANOVA: Thickness versus Plating line
Minitab gives us the following output: (1) the session table and (2) a box plot (not reproduced here).

Source         DF     SS     MS     F      P
Plating line    2  0.834  0.417  3.00  0.066
Error          27  3.749  0.139
Total          29  4.583

S = 0.3726   R-Sq = 18.20%   R-Sq(adj) = 12.14%

                          Individual 95% CIs For Mean Based on Pooled StDev
Level   N    Mean  StDev  --------+---------+---------+---------+-
A      10  20.300  0.216  (---------*---------)
B      10  20.600  0.346           (---------*---------)
C      10  20.690  0.500             (---------*--------)
                          --------+---------+---------+---------+-
                             20.25     20.50     20.75     21.00
In this example, since the p-value 0.066 is greater than 0.05, we fail to reject the null hypothesis: the samples are the same.
158
Degrees of freedom (DF)
For the factor: DF = number of levels − 1; with 3 plating lines, 3 − 1 = 2.
For error: DF = (10 − 1) + (10 − 1) + (10 − 1) = 27, as in our example.
159
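A minimal sketch of a one-way ANOVA in Python. The individual thickness readings are not reproduced in this manual, so the three lists below are hypothetical stand-ins for plating lines A, B and C.

```python
# Minimal sketch of one-way ANOVA across three groups (hypothetical data).
from scipy import stats

line_a = [20.1, 20.3, 20.5, 20.2, 20.4, 20.3, 20.6, 20.1, 20.3, 20.2]
line_b = [20.5, 20.7, 20.4, 20.6, 20.8, 20.5, 20.6, 20.7, 20.5, 20.7]
line_c = [20.6, 20.9, 20.4, 20.7, 21.0, 20.5, 20.8, 20.6, 20.7, 20.7]

f_stat, p_value = stats.f_oneway(line_a, line_b, line_c)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
# If p < 0.05, at least one plating line's mean thickness differs from the others.
```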
For discrete data we have 3 tests in the analysis:
• 1-Proportion test
• 2-Proportion test
• Chi-Square test
160
1-Proportion test: used when the historic proportion (the hypothesized value) is known.
Example: Model Bk500EI yari was running in cell2 for the past 1 year with a DPU of 8%. Later the same model was moved to cell1, where 400 units were produced and 210 units passed. Calculate whether the proportion defective in cell1 differs from the historic 8%.
161
How to perform the 1-Proportion test in Minitab?
Minitab path: Stat > Basic Statistics > One proportion…
162
Results
Exact
Sample    X    N  Sample p                95% CI  P-Value
1       190  400  0.475000  (0.425155, 0.525217)    0.000
Since the p-value 0.000 is less than 0.05, reject Ho: the defect proportion in cell1 differs significantly from the historic 8%.
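A minimal sketch of the same test in Python using an exact binomial test; 400 − 210 = 190 failed units are tested against the historic 8% DPU.

```python
# Minimal sketch of a 1-proportion (exact binomial) test.
from scipy.stats import binomtest

result = binomtest(k=190, n=400, p=0.08)      # Ho: proportion defective = 0.08
print(f"sample p = {190 / 400:.3f}, p-value = {result.pvalue:.3f}")
```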
Example (2-Proportion test): A purchasing manager needs to choose between two brands of photocopy machines. After comparing many brands in terms of price, copy quality, warranty, and features, he has narrowed the choice to two: Brand X and Brand Y. He decides that the determining factor will be the reliability of the brands, as defined by the proportion requiring service within one year of purchase. Because the corporation already uses both of these brands, he was able to obtain information on the service history of 50 randomly selected machines of each brand. Records indicate that six Brand X machines and eight Brand Y machines needed service.
165
Results
Test and CI for Two Proportions
Sample    X    N  Sample p
1        44   50  0.880000
2        42   50  0.840000
(Here X counts the machines that did not need service: 50 − 6 = 44 for Brand X and 50 − 8 = 42 for Brand Y.)
167
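A minimal sketch of a 2-proportion comparison in Python, computed by hand with the pooled-proportion z statistic (one common form of this test; it is not claimed to match Minitab's default settings exactly).

```python
# Minimal sketch of a two-proportion z test with a pooled proportion.
import math
from scipy.stats import norm

x1, n1 = 44, 50        # Brand X machines that did NOT need service
x2, n2 = 42, 50        # Brand Y machines that did NOT need service

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * norm.sf(abs(z))
print(f"z = {z:.2f}, p = {p_value:.3f}")  # a large p-value means no significant difference
```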
Hypothesis Testing (Discrete Data)
Chi-Square Example 1: Product defects
During 3 months, the types of refrigerator defects were classified according to production shift, and we investigate whether defect type and shift are dependent or independent. If a defect type is characteristic of a particular shift, an improvement activity can be developed by investigating the types of defects on each shift further.
A total of n = 309 refrigerator defects were recorded, and each defect was classified into one of the 4 categories (A, B, C, and D) listed below. At the same time, each refrigerator was identified according to the production shift on which it was manufactured. Our objective is to test the null hypothesis (Ho) that the type of defect is independent of shift against the alternative hypothesis (Ha) that the defects are dependent on the shift.
Defect types: A = Dents, B = Sealed system leaks, C = Switch failure, D = Missing parts

Shift    A    B    C    D
1       15   21   45   13
2       26   31   34    5
3       33   17   49   20
168
Minitab path: Stat > Tables > Chi-Square Test (Table in Worksheet)
169
Hypothesis Testing (Discrete Data)
Expected value = (Row Total × Column Total) / Grand Total

Session confirmation from Minitab: Chi-Square Test
Expected counts are printed below observed counts.

           A      B      C      D   Total
1         15     21     45     13      94
       22.51  20.99  38.94  11.56
2         26     31     34      5      96
       22.99  21.44  39.77  11.81
3         33     17     49     20     119
       28.50  26.57  49.29  14.63
Total     74     69    128     38     309

Expected value of defect type A on shift 1: E = (94 × 74) / 309 = 22.51
Each cell contributes (O − E)² / E to the chi-square statistic; for that cell, (15 − 22.51)² / 22.51 = 2.506.

Chi-Sq = 2.506 + 0.000 + 0.944 + 0.179 +
         0.394 + 4.266 + 0.836 + 3.923 +
         0.711 + 3.449 + 0.002 + 1.967 = 19.178
DF = (r − 1)(c − 1) = 6, P-Value = 0.004

Since the P-Value < 0.05: reject Ho, accept Ha. The higher individual χ² values show where the dependence lies; they answer the question: where does the dependent relationship exist between the defect type and the shift?
170
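A minimal sketch reproducing this chi-square test of independence in Python; scipy's chi2_contingency returns the statistic, the p-value, the degrees of freedom and the full expected-count table.

```python
# Minimal sketch of a chi-square test of independence on the shift-by-defect table.
from scipy.stats import chi2_contingency

observed = [[15, 21, 45, 13],   # shift 1
            [26, 31, 34, 5],    # shift 2
            [33, 17, 49, 20]]   # shift 3

chi2, p, dof, expected = chi2_contingency(observed)
print(f"Chi-Sq = {chi2:.3f}, DF = {dof}, p = {p:.3f}")
print(f"expected count, shift 1 / defect A = {expected[0][0]:.2f}")  # (94*74)/309 = 22.51
```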
Hypothesis Testing (Discrete Data): Chi-Square (χ²) Distribution
Critical values of the χ² distribution. df = degrees of freedom; for a contingency table, df = (R − 1)(C − 1).

df    0.250   0.100   0.050   0.025   0.010   0.005   0.001
1     1.323   2.706   3.841   5.024   6.635   7.879  10.828
2     2.773   4.605   5.991   7.378   9.210  10.597  13.816
3     4.108   6.251   7.815   9.348  11.345  12.838  16.266
4     5.385   7.779   9.488  11.143  13.277  14.860  18.467
5     6.626   9.236  11.070  12.832  15.086  16.650  20.515
6     7.841  10.645  12.592  14.449  16.812  18.548  22.458
7     9.037  12.017  14.067  16.013  18.475  20.278  24.322
8 10.219 13.362 15.507 17.535 20.090 21.955 26.125
9 11.389 14.684 16.919 19.023 21.666 23.589 27.877
10 12.549 15.987 18.307 20.483 23.209 25.188 29.588
11 13.701 17.275 19.675 21.920 24.725 26.757 31.264
12 14.845 18.549 21.026 23.337 26.217 28.300 32.909
13 15.984 19.812 22.362 24.736 27.688 29.819 34.528
14 17.117 21.064 23.685 26.119 29.141 31.319 36.123
15 18.245 22.307 24.996 27.488 30.578 32.801 37.697
16 19.369 23.541 26.296 28.845 32.000 34.267 39.252
17 20.489 24.769 27.587 30.191 33.409 35.718 40.790
18 21.605 25.989 28.869 31.526 34.805 37.156 42.312
19 22.718 27.204 30.144 32.852 36.191 38.582 43.820
20 23.828 28.412 31.410 34.170 37.566 39.997 45.315
21 24.935 29.615 32.671 35.479 38.932 41.401 46.797
22 26.036 30.813 33.924 36.781 40.289 42.796 48.268
23 27.141 32.007 35.172 38.076 41.638 44.181 49.728
24 28.241 33.196 36.415 39.364 42.980 45.558 51.179
25 29.339 34.382 37.652 40.646 44.314 46.928 52.620
26 30.434 35.563 38.885 41.923 45.642 48.290 54.052
27 31.518 36.741 40.113 43.194 46.963 49.645 55.476
28 32.620 37.916 41.337 44.461 48.278 50.993 56.892
29 33.711 39.087 42.557 45.722 49.588 52.336 58.302
30 34.800 40.256 43.773 46.979 50.892 53.672 59.703
40 45.616 51.806 55.758 59.342 63.691 66.766 73.402
50 56.334 63.167 67.505 71.420 76.154 79.490 86.661
60 66.981 74.397 79.082 83.298 88.379 91.953 99.607
70 77.577 85.527 90.531 95.023 100.425 104.215 112.317
80 88.130 96.578 101.879 106.629 112.329 116.321 124.839
90 98.650 107.565 113.145 118.136 124.116 128.299 137.208
100 109.141 118.498 124.342 129.561 135.807 140.169 149.449
171
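A minimal sketch showing how the table values can be reproduced in Python: chi2.ppf(1 − α, df) gives the critical value for an upper-tail probability α.

```python
# Minimal sketch: reproduce chi-square critical values from the table.
from scipy.stats import chi2

for df in (1, 2, 6):
    crit = chi2.ppf(1 - 0.05, df)   # alpha = 0.05 column
    print(f"df = {df}: critical value = {crit:.3f}")
# df = 6 gives 12.592, the row used in the refrigerator-defect example.
```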
Degrees of freedom
• Degrees of freedom is the number of values we can choose freely.
Ex: Assume we are dealing with 2 sample values a and b whose mean is 18:
(a + b) / 2 = 18
• In this case a and b can take any values whose sum is 36, because 36 / 2 = 18. If a is 10, then (10 + b) / 2 = 18, so b must be 26.
• Hence if we have 2 elements in a sample and we know the mean, we are free to specify only one element, as the other value is then fixed to give the specified mean.
172
See one more example:
(a + b + c + d + e + f + g) / 7 = 16
In this case the degrees of freedom, or the number of variables we can freely specify, is 7 − 1 = 6. That means we are free to give values to 6 variables and are no longer free to specify the seventh, as it is determined automatically.
173
Estimation
Estimation: Everyone makes estimates. When we are ready to cross the road, we estimate based on the speed of the approaching car, the distance between us and the car, and our own walking speed; after weighing all these we take a quick decision to cross the road. There are two kinds of estimation:
• 1. Point estimation
• 2. Interval estimation
• Point estimation: a single number used to estimate an unknown population parameter.
• Example: The manager states that by the end of Q2 2005 we should have 15 green belts in the department.
• A point estimate is often insufficient, because it is simply either right or wrong; the chance of rejecting the null hypothesis is high most of the time, as we have to stick to one value only.
An interval estimate indicates the estimation error in two ways: by the extent of its range, and by the probability of the true population value falling within that range.
174
Example: If a manager says that by the end of 2005 we will have 20 to 25 green belts in our department, he might have calculated that based on the training given each quarter to a minimum of 50 employees.
Any sample statistic that is used to estimate a population parameter is called an estimator. The sample mean x̄ can be an estimator for the population mean µ, and we can also use the sample range as an estimator of the population range. For example, to estimate the turnover of the employees of a furniture company, our estimator could be the mean turnover for a period of 1 month.
Qualities of a good estimator:
1. Unbiasedness: an estimator is unbiased if, on average, it neither overestimates nor underestimates the population parameter.
2. Efficiency: efficiency in estimation is measured using the standard error; the smaller the standard error, the more efficient the estimator.
3. Consistency: a statistic is consistent if, as the sample size increases, the estimated value comes closer to the value of the population parameter. That means estimation will be more accurate for larger samples.
4. Sufficiency: an estimator is sufficient when the estimation uses all the information about the parameter that the sample contains.
176
Confidence Interval: a major part of estimation, which indicates how confident we are that the interval estimate will include the population parameter. A higher probability means higher confidence.
• In estimation we commonly use 90%, 95% and 99% confidence, but we are free to apply any confidence level.
Example: When we prepare an income report of some community at the 90% confidence level and our statement is that the mean population income lies between $8,000 and $24,000, this range is our 90% confidence interval.
• But normally we express our confidence interval in standard errors rather than in numerical values.
• That is, x̄ ± 1.64 s(x̄), where s(x̄) is the standard error of the mean (for 90% confidence).
177
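A minimal sketch of an interval estimate in Python, x̄ ± z·σ/√n; the income figures are hypothetical and serve only to illustrate the 90% interval with z ≈ 1.64.

```python
# Minimal sketch of a confidence interval from a sample mean (hypothetical figures).
import math
from scipy.stats import norm

xbar, sigma, n = 16000, 9000, 100       # hypothetical sample mean, SD, sample size
conf = 0.90
z = norm.ppf(1 - (1 - conf) / 2)        # about 1.645 for 90% confidence
half = z * sigma / math.sqrt(n)         # z times the standard error of the mean
print(f"{conf:.0%} CI: ({xbar - half:.0f}, {xbar + half:.0f})")
```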
Relationship between confidence level and confidence interval: We might think that we should use a high confidence level (99%) in our estimation, since high confidence seems to mean high accuracy. But in practice, high confidence levels produce large confidence intervals, which are not precise. The example below gives an idea of the trade-off between confidence level and confidence interval.
• As the customer sets tighter targets, the confidence level goes down.
Customer question                              Store manager reaction                                 Implied confidence level   Confidence interval
Will I get washing m/c within 1 year?          I am absolutely sure about that                        99%                        1 year
Will I get washing m/c within 1 month?         Yes, I am almost sure it can be delivered in 1 month   95%                        1 month
Will I get washing m/c within 1 week?          I am pretty sure                                       80%                        1 week
Will I get washing m/c by tomorrow?            I am not that sure about it, but we will try our best  40%                        1 day
Will my washing m/c get home before I reach?   There is very little chance                            1%                         1 hr
178
Regression
• Used to mathematically express the relationship between the factor X and the response Y
179
E.g., to check which equation is suitable for the relationship between Age and Height.
181
Minitab will give you the following output
Regression Analysis: Height (In Inches) versus Age (In years)
Analysis of Variance
Source DF SS MS F P
Regression 1 2.6443 2.6443 39.08 0.000
Residual Error 18 1.2180 0.0677
Total 19 3.8624
182
Linear regression equation
Select Linear
183
Regression Analysis: Height (In Inches) versus Age (In years)
S = 0.260133   R-Sq = 68.5%   R-Sq(adj) = 66.7%

Source         DF       SS       MS      F      P
Regression      1  2.64433  2.64433  39.08  0.000

[Fitted Line Plot: Height (In Inches) versus Age (In years), ages 10 to 50.]
184
Quadratic regression equation
Select Quadratic
185
(For the equation to be OK, R-Sq and R-Sq(adj) should be more than 64%.)
Polynomial Regression Analysis: Height (In Inches) versus Age (In years)
The regression equation is
Height (In Inches) = 2.757 + 0.1402 Age (In years) - 0.001701 Age (In years)**2

Analysis of Variance
Source        DF       SS      F      P
Linear         1  2.64433  39.08  0.000
Quadratic      1  0.52528  12.89  0.002
Total         19  3.86238

[Fitted Line Plot: Height (In Inches) versus Age (In years), ages 10 to 50.]
186
Cubic regression equation
Select Cubic
187
Polynomial Regression Analysis: Height (In Inches) versus Age (In years)
S = 0.205991   R-Sq = 82.4%   R-Sq(adj) = 79.1%
(For the equation to be OK, R-Sq and R-Sq(adj) should be more than 64%.)

Analysis of Variance
Source        DF       SS      F      P
Linear         1  2.64433  39.08  0.000
Quadratic      1  0.52528  12.89  0.002
Cubic          1  0.01384   0.33  0.576

[Fitted Line Plot: Height (In Inches) versus Age (In years), ages 10 to 50.]
188
Note: after fitting all 3 equations (linear, quadratic, cubic), the equation giving the highest R-Sq and R-Sq(adj) values (more than 64%) is the one to be used in our experiment. If R-Sq and R-Sq(adj) are less than 64%, that equation is not valid.
189
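A minimal sketch comparing linear, quadratic and cubic fits in Python with numpy.polyfit and R-Sq. The age/height pairs are hypothetical, since the manual's raw data are not reproduced here.

```python
# Minimal sketch: fit polynomials of degree 1-3 and compare R-Sq (hypothetical data).
import numpy as np

age = np.array([10, 15, 18, 22, 25, 28, 32, 35, 40, 45, 50])
height = np.array([4.2, 4.8, 5.1, 5.4, 5.5, 5.6, 5.7, 5.7, 5.8, 5.8, 5.9])

for degree, name in [(1, "Linear"), (2, "Quadratic"), (3, "Cubic")]:
    coeffs = np.polyfit(age, height, degree)
    pred = np.polyval(coeffs, age)
    ss_res = np.sum((height - pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((height - height.mean()) ** 2)  # total sum of squares
    print(f"{name}: R-Sq = {1 - ss_res / ss_tot:.1%}")
```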
Design of Experiments
• Factorial DOE
•Response Optimizer
190
DOE Steps
1) Decide the number of factors to study and their levels
2) Create the factorial design table and run the experiments
3) Conduct the Design of Experiments analysis and find out the optimum level of each factor
Step 1
Let's assume in our example the following are the factors and their levels (as used in the plots later in this section): Injection speed (8, 12), Injection pressure (5, 30) and Hold on time (1.5, 3.0), with Fan strength as the response.
191
Step 2 - How to create the Factorial Design table?
Minitab path: Stat > DOE > Factorial > Create Factorial Design…
192
How to Create Factorial Design table ?
193
195
Minitab gives you the Factorial design table in the worksheet
196
Do experiments with these set values & update the table
197
Minitab path: Stat > DOE > Factorial > Analyze Factorial Design…
Select the response column.
198
Minitab path: Stat > DOE > Factorial > Factorial Plots…
199
Check the three plot options (Main Effects, Interaction, Cube), then click Setup for each of them.
200
Follow the same procedure for all 3 “ Setup” pop-ups shown above
Minitab gives the following outputs:
1) Main Effects Plot (data means) for Fan strength
2) Interaction Plot (data means) for Fan strength
3) Cube Plot (data means) for Fan strength
[The plots show Fan strength against the factors Inj.speed (8, 12), Inj.pressure (5, 30) and Hold on time (1.5, 3.0).]
201
How to infer the Main effects plot ?
[Main Effects Plot (data means) for Fan strength: one panel per factor (Inj.speed 8 vs 12, Inj.pressure 5 vs 30, Hold on time 1.5 vs 3.0), y-axis Mean of Fan strength, roughly 6.00 to 6.60.]
Criteria: the factor with the steepest slope has the greatest effect on the response.
How to infer the Interaction plot?
[Interaction Plot (data means) for Fan strength: panels pair Inj.speed (8, 12), Inj.pressure (5, 30) and Hold on time (1.5, 3.0); non-parallel lines indicate an interaction between the paired factors.]
In our example, Injection speed & Hold on time have a strong interaction with each other, which can affect fan strength; Injection speed & Injection pressure have a moderate interaction; and Injection pressure & Hold on time have no interaction.
203
How to infer the Cube plot ?
[Cube Plot (data means) for Fan strength.]
Criteria: select the best optimum corner value as per your target and find out the levels at which the factors are set for that point. Those are the optimum settings.
In our example, let's assume we want the maximum fan strength. Then the best option is 6.7, so the optimum parameters for achieving this strength are the factor settings at that corner of the cube.
204
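A minimal sketch of how a main effect is computed from a two-level factorial table: the average response at a factor's high level minus the average at its low level. The eight runs below are hypothetical fan-strength results at the factor levels used in this example.

```python
# Minimal sketch: main effects from a 2^3 full-factorial table (hypothetical runs).
runs = [  # (inj_speed, inj_pressure, hold_time, fan_strength)
    (8, 5, 1.5, 6.0), (12, 5, 1.5, 6.4), (8, 30, 1.5, 6.1), (12, 30, 1.5, 6.5),
    (8, 5, 3.0, 6.2), (12, 5, 3.0, 6.6), (8, 30, 3.0, 6.3), (12, 30, 3.0, 6.7),
]

def main_effect(factor_index, low, high):
    lows = [r[3] for r in runs if r[factor_index] == low]
    highs = [r[3] for r in runs if r[factor_index] == high]
    return sum(highs) / len(highs) - sum(lows) / len(lows)

print("Inj.speed effect:   ", main_effect(0, 8, 12))
print("Inj.pressure effect:", main_effect(1, 5, 30))
print("Hold on time effect:", main_effect(2, 1.5, 3.0))
```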
DOE
Response Optimizer
205
Let's assume:
1) You are not getting the optimum response at either level of a factor, but you want to study the response keeping the factor in between the extreme levels. HOW DO YOU PROCEED?
2) You want to study two responses at various settings of the factors. For e.g., you want to increase the fan strength and at the same time reduce the manufacturing cost. HOW DO YOU STUDY THE OPTIMUM SETTINGS FOR ACHIEVING THESE TWO RESPONSES?
RESPONSE OPTIMISER
206
Response optimiser Minitab path: Stat > DOE > Factorial > Response optimiser…
207
208
209
In the dialog, enter your minimum expected value (i.e. 7) and the target value.
211
Minitab gives the following output
212
Minitab gives the following output
Move the red lines and you will see the change in the predicted value; move all three lines to get the Y value you require.
213