
Statistics for Electronics Technicians

Statistics is a branch of mathematics that is indispensable for many technical workers. Electronics technicians often
measure things such as voltage and resistance. It is important that they realize that such measurements are affected by
unavoidable random errors. Experienced technicians have an intuitive feel for how much error is normal and what is
abnormal. Some basic information about statistics can help beginners develop this insight more rapidly. This
document is offered as an enhancement for those instructors and students using the McGraw-Hill BSEE series.
McGraw-Hill grants permission to instructors using the BSEE series to copy this document and to use it in their
instruction as they see fit; provided that the copyright notice is maintained.
The seventh editions of the BSEE series have introduced the concepts of sampling, error and quality standards. For
example, the Experiments Manuals contain appendix material on Lab Notebook Policies including references to ISO
9000 and ANSI Q9000. Also, experiments are suggested that involve sampling and error. One such example is Lab 22 in Schuler's Experiments Manual.
Another issue is that students who have completed an electronics technology program might be in a position to apply
for employment as entry level Quality Technicians. If so, the information in this document should enhance their
chances. Also, students and workers now commonly use circuit simulators with statistical features such as Monte
Carlo analysis.
The science of statistics is sometimes referred to as sampling theory, probability theory, or odds. Whatever it is called,
there is no other way for workers to fully understand process variations, component tolerance and SPC (statistical
process control). Statistical tools are used by manufacturing personnel to draw inductive inferences from experimental
data. The applications of statistics in manufacturing plants are twofold: (1) Statistical inference is a powerful aid
when manufacturing processes are being evaluated, selected, and implemented; and (2) It is one of the primary
methods used to control ongoing processes so as to ensure consistently high production quality.
Knowledge of basic statistics will help technicians making measurements to properly deal with error; that is, make the
right decisions as to when errors are acceptable (normal) and when they are not.
BASIC TERMS
POPULATION: A set of observations or measurements. Example: the output voltage measurements of all
the regulated power supplies of one particular design made by one manufacturer.

SAMPLE: A collection of observations drawn from the population. Example: the output voltage
measurements of those regulated power supplies manufactured on May 26, 1999.

RANDOM SAMPLE: A measurement taken or a sample drawn in such a way that all observations of the
population have an equal chance of being chosen. Random samples are essential if experiments are to
be valid (unbiased).

PARAMETER: Some characteristic computed from a population. The average resistance of all the 100 Ω
resistors manufactured constitutes a parameter. Parameters are usually not known. The science of
statistics allows the estimation of parameters by making careful observations (measurements) of samples.

STATISTIC: A characteristic computed from a sample. The average of 200 power supply output voltage
measurements would constitute a statistic. How accurate the statistic is likely to be can be calculated
using statistical methods. Thus, one can draw inferences about the population from a sample --- this is
called statistical inference.

STATISTICAL INFERENCE: Drawing a conclusion about a population by examining only a sample of that
population. For example, after determining the MTBF (mean time between failures) for a sample of power
supplies, a manufacturer can calculate the expected MTBF for all power supplies of this design.

We sample when it is impossible or impractical to measure the population. For example, manufacturers of light bulbs
would like to advertise the average life time for their products. They obviously do not want to use up every light bulb
they have to determine exactly how long each and every one would last. So they sample the population and make a
statistical inference about the population. A larger sample will generally yield more accurate statistics. The
question of how large the sample has to be to achieve a given level of accuracy can be computed using statistical
methods.
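
A minimal sketch of such a sample-size calculation, assuming the common formula n = (zσ/E)² for estimating a mean to within a margin of error E (the σ, E, and confidence values below are made up for illustration):

```python
# Sketch: how many observations are needed to estimate a mean to within
# a margin of error E, using n = (z * sigma / E)^2. The sigma, E, and
# confidence values are illustrative assumptions, not from the text.
from math import ceil
from scipy.stats import norm

sigma = 0.5        # assumed standard deviation of the measurement (volts)
E = 0.1            # desired margin of error (volts)
confidence = 0.95

z = norm.ppf(1 - (1 - confidence) / 2)   # two-sided critical value, about 1.96
n = ceil((z * sigma / E) ** 2)
print(f"roughly n = {n} observations needed")   # about 97 here
```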
AVERAGE: A measure of central tendency. There are three distinct ways to measure central tendency:
(1) mean, (2) median, and (3) mode.

MEAN (X̄): The arithmetic average of a set of observations. It is computed by:

X̄ = ΣX / n

where X̄ is the mean, ΣX is the sum of all observations, and n is the number of observations.
The mean of the sample shown below is 100.222.
98, 105, 100, 93, 92, 100, 110, 100, 104
MEDIAN: The 50% point of a sample. When n observations are arranged in ascending order
according to magnitude, the median is the middle observation. If n is even, it is the average
of the two middle observations. The median of the sample below is 101.
92, 93, 98, 100, 101, 102, 104, 105, 105

MODE: The value of observation that occurs most often. For example, the mode of the sample
shown below is 100.
98, 105, 100, 93, 92, 100, 110, 100, 104

VARIANCE (s²): A measure of variation among a set of observations or measurements. If one sample is 95,
100, 105, 108 and a second sample is 75, 95, 115, 123, then the variance of the first sample
is much less than the variance of the second sample (although both samples have the same
mean). Variance is calculated by:

s² = Σ(X − X̄)² / (n − 1)

where s² is the variance, Σ means take the sum of, X is an observation (a measurement),
X̄ is the mean of all the observations, and n is the number of observations.

Thus, you must subtract the mean from each observation, square each difference, sum all the
squared differences, and divide by n − 1. For the sample 95, 100, 105, 108, s² = 32.7 and for
the sample 75, 95, 115, 123, s² = 462.67.
STANDARD DEVIATION (s): The square root of the variance. When s² = 32.7, s = 5.72.
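
These definitions are easy to check numerically. Here is a short Python sketch using the samples quoted above (the statistics module's variance() and stdev() divide by n − 1, matching the formulas given here):

```python
# Check the mean, median, mode, variance, and standard deviation examples above.
import statistics as st

sample = [98, 105, 100, 93, 92, 100, 110, 100, 104]
print(st.mean(sample))     # 100.222...
print(st.median(sample))   # 100 (the middle value once the list is sorted)
print(st.mode(sample))     # 100 (occurs three times)

a = [95, 100, 105, 108]
b = [75, 95, 115, 123]
print(st.variance(a), st.stdev(a))   # about 32.7 and 5.72
print(st.variance(b))                # about 462.67
```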

DISTRIBUTION: An X versus Y plot of frequency versus magnitude. There are four common distributions
used in statistical inference: (1) normal distribution, (2) T distribution, (3) chi square
distribution, and (4) F distribution.

NORMAL DISTRIBUTION: Often called the bell curve because of its shape. Also called the Gaussian
distribution. It is symmetric with respect to a line drawn perpendicular to the horizontal
axis at the mean. This very important distribution is illustrated below:

[Figure: bell curve with frequency on the vertical axis and magnitude on the horizontal axis.
The intervals mean ± 1σ, mean ± 2σ, and mean ± 3σ enclose about 68%, 96%, and 99.7% of the
observations, respectively.]
Frequency is how many times an observation occurs. Magnitude is how large an observation is. For example, a curve
of IQ scores approximates the bell curve. The mean IQ score, 100, occurs most often. Therefore its frequency is

highest. IQs higher than 100 (those to the right of the mean) occur less often. IQs lower than 100 (those to the left
of the mean) also occur less often than does 100.
Manufacturing processes are subject to random errors due to factors such as temperature or humidity. It is not possible
to absolutely control any factor. For this reason, component parameters often follow a normal distribution. A batch of
100 Ω resistors might have an average value near 100 Ω and a range of values from 95 Ω to 105 Ω. In a normal
distribution, values near 100 Ω will occur much more often than values near 95 Ω or 105 Ω.
With a normal distribution, the range from the mean to one standard deviation above the mean will take in 34% of the
measurements, and 68% of the measurements will be found in the interval from the mean − 1σ to the mean + 1σ. About
96% of the measurements will be found from two standard deviations below the mean to two standard deviations
above the mean. Four standard deviations (mean − 2σ to mean + 2σ) take in all but 4% of the available observations. Motorola named their
award-winning TQM (total quality management) program Six Sigma. Six standard deviations, ranging from three
below the mean to three above the mean, is a range that includes 99.7% of a total population. This implies a very low
reject rate (thus a very high success rate).
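
These percentages can be demonstrated by simulation. The sketch below draws a large sample from a normal distribution (the 100 Ω mean and 2 Ω standard deviation are illustrative assumptions) and counts how many values fall within one, two, and three standard deviations of the mean:

```python
# Expect roughly 68%, 95-96%, and 99.7% within 1, 2, and 3 standard deviations.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=100.0, scale=2.0, size=100_000)   # simulated resistor values
mean, sd = x.mean(), x.std()

for k in (1, 2, 3):
    frac = np.mean(np.abs(x - mean) <= k * sd)
    print(f"within {k} sigma: {frac:.1%}")
```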

Suppose we buy three batches of 100 Ω resistors, each batch from a different manufacturer, measure all of them,
and plot their distributions as shown in the graph below:

[Figure: three overlapping distributions of measured resistance (frequency versus resistance), centered near 100 Ω.]
The red batch of resistors has the greatest variance and standard deviation. The blue batch has less variance than the
red batch, and the green batch has the smallest variance but has a mean value over 100 Ω. Generally, by spending
more money, it is possible to buy parts with less variance and with mean values closer to the nominal value. The trick
is to buy parts that are just good enough to get the job done so as not to waste money.
Electronics Workbench (version 5.0) has a feature called Monte Carlo analysis. This can be used to determine the
impact of component variations on a production run. The analysis can use either Gaussian distributions for parts
variations or a uniform distribution. In a uniform distribution, the parts values are evenly distributed over the
tolerance range rather than being peaked at the nominal value. Consider the graph shown on the next page:

[Figure: frequency versus resistance from the lower tolerance limit (LTL) to the upper tolerance limit (UTL),
centered on 100 Ω, showing a uniform distribution (red) and a parts-selection distribution (blue).]
The red curve shows a uniform distribution from the lower tolerance limit (LTL) to upper tolerance limit (UTL).
When this method is chosen, there is an equal probability of any resistor value within the tolerance range being used in
the Monte Carlo analysis. The blue curve is an example of what can happen when suppliers use parts selection. Each
resistor coming off the production line is measured and those very close to the nominal value are sorted out as
premium parts. What remains is sold at a lower price. Although selection processes are not always used, it's worth
knowing that the Gaussian model is not always realistic. Designers and quality managers often work closely with
suppliers so the correct analyses can be used.
Let's take a look at an example of Monte Carlo analysis. The circuit shown below is easy to analyze, and at first glance
it might seem that the performance of the two outputs should be the same. Statistically, they are different.

The voltage at Out 1 will show a lot more production variation than that from Out 2. Look at the Monte Carlo results
for Out 1 for a simulated production run of 100 which is shown on the next page:

You should notice that several voltage values fall very close to the worst case limits (the blue lines). These worst case
values are easily verified by using the voltage divider equation with values of 95 Ω and 105 Ω (the tolerance limits of the
resistors) and 5 volts for the supply (+Vcc). The voltage at Out 2 will show a lot less variation than that from Out 1.
Look at the Monte Carlo results for Out 2 for a simulated production run of 100 which is shown below:

Why does the second output show less variance when both outputs are based on 5% resistors? The answer is that the
second output benefits from error cancellation. Here is a simple example. Suppose you have a choice of specifying a
single 100 , 5 % resistor for an application or a series combination of two, 50 , 5 % resistors. The series
combination will show less variation in a production run. This is because there are four ways the series resistors can
combine: (high + high, high + low, low + high, and low + low). Two out of the four possibilities produce error
cancellation! Designers would usually find it less expensive to specify one resistor with better tolerance than
specifying two resistors. However, technicians should know about how errors tend to cancel in some cases.
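
Since the circuit figure is not reproduced here, the following sketch assumes a plain two-resistor divider with 100 Ω, 5% parts and a 5 V supply for Out 1, and it also compares a single 100 Ω, 5% resistor against two 50 Ω, 5% resistors in series to show the cancellation effect:

```python
# Monte Carlo sketch: divider spread versus worst case, and error cancellation
# for series resistors. Circuit values are assumptions, not from the original figure.
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
VCC = 5.0

def parts(nominal, tol=0.05, size=N):
    """Part values drawn uniformly across the tolerance range."""
    return rng.uniform(nominal * (1 - tol), nominal * (1 + tol), size)

# Out 1: simple divider, Out = Vcc * R2 / (R1 + R2)
r1, r2 = parts(100.0), parts(100.0)
out1 = VCC * r2 / (r1 + r2)
print(f"Out 1 spread: {out1.min():.3f} .. {out1.max():.3f} V")
print(f"worst case:   {VCC*95/(95+105):.3f} .. {VCC*105/(105+95):.3f} V")

# One 100-ohm 5% resistor versus two 50-ohm 5% resistors in series
single = parts(100.0)
series = parts(50.0) + parts(50.0)
print(f"std of single part: {single.std():.2f} ohms")
print(f"std of series pair: {series.std():.2f} ohms")   # noticeably smaller
```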
OTHER DISTRIBUTIONS
T DISTRIBUTION: Also called Student's T distribution. This is a very useful frequency distribution that can be
used to test the difference between two means (T test) when the population variance is
unknown. In other words, it allows the experimenter to apply the sample variance, which
can be easily computed from collected data. It is not a single curve, but a family of curves.
The number of degrees of freedom uniquely identifies a particular T curve. T curves look
very much like normal curves and, in fact, when the number of degrees of freedom
approaches infinity (very large sample) the T curve is shaped like a normal curve, with a
mean equal to 0 and a variance equal to 1.
The degrees of freedom are n − 1.

T TEST EXAMPLE: The hypothesis is that the population mean is 5. Assumptions are that the sample is random
and drawn from a normal population. A 5% level of significance is chosen. The sample
size is 5 observations, so the degrees of freedom are 4. A T table shows that the critical
regions are where T is less than −2.776 or greater than 2.776. T is now computed by:

T = (X̄ − μ) / √(s²/n)

where μ is the hypothesized mean of the population (5 in this example).

Assuming the calculated value of T is −2.83, the hypothesis is rejected since T falls inside the
left critical region.
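
A sketch of this kind of one-sample T test, using scipy and a made-up five-observation sample (the measurements below are not from the text):

```python
# One-sample T test of the hypothesis that the population mean is 5.
from scipy import stats

sample = [4.6, 4.9, 5.3, 4.8, 5.1]   # made-up measurements
t, p = stats.ttest_1samp(sample, popmean=5.0)
print(f"T = {t:.3f}, p = {p:.4f}")

# With 4 degrees of freedom and a 5% significance level, reject when
# |T| > 2.776 (equivalently, when p < 0.05).
if abs(t) > 2.776:
    print("reject the hypothesis that the population mean is 5")
else:
    print("do not reject the hypothesis")
```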
TYPE I ERROR: Rejecting a true hypothesis. The probability of committing this error is called the level of
significance. In the above example, there is a 5% chance that we have rejected a true hypothesis.

TYPE II ERROR: Accepting a false hypothesis.

NOTE: With a fixed sample size, decreasing the chance of a type I error will increase the
probability of a type II error, and vice versa. The only way to decrease the probability of
both types of error is to increase the sample size.

F DISTRIBUTION: A family of frequency distributions relating to the ratio of two independently distributed
sample variances. Two separate degrees of freedom will identify the specific F curve.

F TEST: Testing the hypothesis that two population variances are equal. Two random samples are
drawn. The variances are computed. The degrees of freedom for each sample are
computed. The variance ratio is found with:

F = s₁² / s₂²

with n₁ and n₂ degrees of freedom.
The critical value of F is obtained from an F table. If F is close to 1, then the variances are
nearly equal. If the 5% significance level is used, the critical value might be where F
exceeds 7.3879. If the calculated value of F is 14, then the hypothesis must be rejected.
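
The same test in Python, with two made-up samples. Putting the larger variance on top and using a two-sided 5% level is one common convention; the critical value depends on the two samples' degrees of freedom:

```python
# Variance-ratio (F) test of the hypothesis that two population variances are equal.
import statistics as st
from scipy import stats

a = [95, 100, 105, 108, 99, 102, 101, 97]    # made-up sample 1
b = [75, 95, 115, 123, 88, 110, 92, 104]     # made-up sample 2

s1sq, s2sq = st.variance(a), st.variance(b)
if s1sq >= s2sq:
    F, dfn, dfd = s1sq / s2sq, len(a) - 1, len(b) - 1
else:
    F, dfn, dfd = s2sq / s1sq, len(b) - 1, len(a) - 1

crit = stats.f.ppf(0.975, dfn, dfd)          # two-sided test at the 5% level
print(f"F = {F:.2f}, critical value = {crit:.2f}")
if F > crit:
    print("reject the hypothesis that the population variances are equal")
```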
ANALYSIS OF VARIANCE: May be regarded as the logical extension of the T test. In fact, in testing the
hypothesis that two population means are equal, either the two-tailed T test or the one-tailed
F test (analysis of variance) may be used, and the two tests always give the same result: T² = F.

EXAMPLE ANOVA APPLICATION: Suppose a company has three designs for making regulator circuits and wants to know if
the three produce equally stable outputs. A random sample of 20 regulators is obtained for
each design. The regulators are processed in a standard environmental test chamber and the
output is measured. The ANOVA (analysis of variance) method can be used to test the
hypothesis that the three designs are of equal stability:

Sources of variation    Degrees of freedom
among designs           2  (3 processes - 1)
within designs          57 (3 x 20 - 3)
total variation         59 (3 x 20 - 1)
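
A one-way ANOVA along these lines can be run with scipy. The regulator output voltages below are made up, and only six units per design are shown to keep the example short:

```python
# One-way ANOVA: do three regulator designs give equally stable outputs?
from scipy import stats

design_a = [5.01, 4.98, 5.02, 5.00, 4.99, 5.03]   # made-up output voltages
design_b = [4.97, 5.05, 5.01, 4.96, 5.04, 5.02]
design_c = [5.10, 5.08, 5.12, 5.07, 5.11, 5.09]

F, p = stats.f_oneway(design_a, design_b, design_c)
print(f"F = {F:.2f}, p = {p:.4f}")
if p < 0.05:
    print("reject the hypothesis that the three designs perform equally")
```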

CHI SQUARE (χ²) DISTRIBUTION: If all possible samples of size n are drawn from a normal population with a mean equal to
μ and a variance equal to σ², and for each sample the sum of the squared standardized deviations,
Σ((X − μ)/σ)², is computed, the frequency distribution of this sum follows the chi square
distribution with n degrees of freedom. (Chi square is pronounced "kie square".)
The chi square distribution can be used to test the hypothesis that the population variance is equal to a given value.
The chi square test is often used to determine if a process is normally distributed. This test is often applied before SPC is
established and can be part of a process capability experiment; such experiments generally require sample sizes over 100.
Computers are used since the calculations are laborious. Manufacturers require their products to have dimensional
and/or parametric stability. The permissible variation in a production sample is a specification standard. Samples are
taken periodically and the hypothesis that the tolerances are not varying too widely is tested. Control charts are
commonly applied to continuously sample production runs (more on this later).
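
A rough sketch of such a chi-square check for normality (the bin edges and simulated data below are illustrative assumptions; real process capability work would use the actual measurements):

```python
# Chi-square goodness-of-fit check that a process output is roughly normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(100.0, 2.0, size=200)       # stand-in for 200 measurements

mu, sigma = data.mean(), data.std(ddof=1)
edges = np.array([-np.inf, 96, 98, 100, 102, 104, np.inf])   # resistance bins

observed, _ = np.histogram(data, bins=edges)
expected = len(data) * np.diff(stats.norm.cdf(edges, mu, sigma))

# ddof=2 because the mean and standard deviation were estimated from the data
chi2, p = stats.chisquare(observed, expected, ddof=2)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")   # a small p suggests non-normality
```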
CORRELATION COEFFICIENT: The degree of relationship between two variables. Example: IQs and test scores. The
symbol is r, and it ranges from −1.0 to +1.0. If r = 0, then there is no relationship.
If r = −1, then there is a perfect inverse relationship. If r = +1, there is a perfect direct
relationship.
Please Note: One of the distinct pitfalls with correlation (and statistics in general) is the promotion of false inferences.
Suppose, for example, that an r value of +0.7 is obtained between temperature and the number of cold beverages sold
at a stadium. A ridiculous conclusion: selling more cold beverages makes the temperature go up! Although this
conclusion is indeed ridiculous, it serves to make a point: we don't always understand the relationship between
variables. Sometimes the best initial conclusion is that the variables appear to be related and that additional
investigation is called for. This is especially critical when people are biased and expect to find certain relationships.
The scientific method dictates that carefully collected data are to be taken at face value and that all reasonable
possibilities must be considered (even when they fly in the face of favored theories).
A factorial experiment is one in which two or more factors are studied simultaneously. It is a number of one factor
experiments superimposed on one another. A simple experiment or series of experiments will not yield information
regarding possible interactive effects of two or more factors, while a factorial experiment will. Also, it is more efficient
than running a series of separate experiments. Finally, a factorial experiment provides a wider inductive basis for a
conclusion. Example: a study to determine the effects of study time & IQ on achievement.
A contingency table is a method of compiling data to determine if different factors are dependent or independent.
Example: a table could be constructed showing the efficiency of solar cells according to five brackets or categories.
Six rows could be used with each representing a production period such as Monday through Saturday. This is a 5 x 6
contingency table. A test of independence may be administered to determine if cell efficiency is related to the day of
production. The degrees of freedom for such a table are given by D.F. = (k-1)(r-1), or 20 for this example. The degrees
of freedom must be known in order to locate the critical value of chi square in a table of standard chi square values.
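
A sketch of this kind of independence test using scipy; the solar-cell counts below are made up purely to show the mechanics (6 production days by 5 efficiency brackets):

```python
# Chi-square test of independence on a 6 x 5 contingency table.
import numpy as np
from scipy import stats

counts = np.array([          # rows: Monday..Saturday, columns: efficiency brackets
    [12, 30, 41, 25,  9],
    [10, 28, 45, 22, 11],
    [ 9, 33, 40, 26,  8],
    [14, 27, 42, 24, 10],
    [11, 31, 38, 28,  9],
    [ 8, 29, 44, 23, 12],
])

chi2, p, dof, expected = stats.chi2_contingency(counts)
print(f"chi-square = {chi2:.2f}, degrees of freedom = {dof}, p = {p:.3f}")
# dof = (5 - 1) * (6 - 1) = 20, matching the formula in the text
```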
Analysis of covariance and factorial experiments are very closely related. In fact, the analysis of the factorial
experiment may be regarded as the curvilinear analysis of covariance. Like the factorial experiment, it enables the
researcher to study several factors in a single experiment.
Non-parametric statistics or distribution free statistics are used when distributions are not normal and/or exact shapes
are not known. They involve an entire host of methods such as transforming a population into a binomial population,
the sign test, etc. The important thing to remember is that non-parametric statistics are used to deal with populations
which are far from normal.
CORRELATION
1. The most widely used measure of correlation is the Pearson product-moment correlation coefficient. This measure is used
when the variables are quantitative.

2. The square of the correlation coefficient is the ratio of explained variance to total variance. In other words, if r = .80 then r² = .64 and 64
percent of the variance of the one variable is predictable from the variance of the other variable. A correlation of .10
represents only a 1% association; .50 represents a 25% association, and even a correlation as high as .90 means the
unexplained variance is 19%.

3. Ordinal or rank-order data can be used to calculate correlations. Example: a group of five similar products may be ranked 1,
2, 3, 4, 5 according to their general appeal and 3, 1, 2, 5, 4 according to their relative cost. The data are comprised of paired
integers extending from 1 to n. The calculation procedure is based on Spearman's coefficient of rank correlation.

4. Biserial correlation is a measure of the relationship between a continuous variable and a dichotomous variable. Dichotomous
variables have only two values, such as the two states in a binary digital computer or male/female. Continuous variables can
take any possible value, such as height or weight. A Pass-Fail QA (quality assurance) test is another example of a
dichotomous variable. Point biserial correlation is a product-moment correlation. Biserial correlation differs from
point-biserial in the assumption that the variable underlying the dichotomy is continuous and normal.

5. Multiple correlation: three or more variables. This can be used when it is necessary to combine a number of variables to
provide the best possible estimate of a criterion. Example: predict resistor accuracy from a particular materials supplier, one
production machine, and one production shift.
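
A short sketch of the first and third measures above: Pearson's r for quantitative data and Spearman's rank correlation for the five-product ranking example (the temperature and sales figures are made up):

```python
# Pearson correlation for quantitative data, Spearman correlation for ranks.
from scipy import stats

temperature = [18, 21, 24, 27, 30, 33]            # made-up quantitative data
drinks_sold = [120, 135, 160, 170, 200, 210]
r, _ = stats.pearsonr(temperature, drinks_sold)
print(f"Pearson r = {r:.2f}, r^2 = {r*r:.2f}")    # r^2 = proportion of variance explained

appeal_rank = [1, 2, 3, 4, 5]                     # ranking example from the text
cost_rank   = [3, 1, 2, 5, 4]
rho, _ = stats.spearmanr(appeal_rank, cost_rank)
print(f"Spearman rho = {rho:.2f}")                # 0.60 for these ranks
```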

AN INTRODUCTION TO SPC (STATISTICAL PROCESS CONTROL)


It is simply a fact of life that when everything in a manufacturing process is under control, there is still going to be variation.
Suppose bipolar junction transistors are being manufactured. What conditions will affect the current gain (β) of the transistors
produced at any given time? Some possibilities include temperature, purity of the silicon, purity of the dopants, time allocated to
each step, cleanliness of the site, accuracy of the positioning equipment, and so on. Even in the best of worlds, it would not be
possible to perfectly control even one of these factors. The natural variability in a process is called common cause. It is common
to the manufacturing system and cannot be removed by short-term conventional methods such as operator adjustments. When a
process is in control, the output is varying and the variations are acceptable. If the common cause variations are not acceptable,
then a major process design change or improvement is in order.
Obviously, things do go wrong from time to time. These occurrences are called local faults. When a local fault occurs, the
output may be thrown outside of the bell curve. Often, local faults are identifiable and repairable by employees.
Thus, there are two kinds of variability: (1) Common cause due to many subtle sources. These variations manifest themselves as
randomly and evenly distributed scatter about the average value. These variations cannot be removed by typical means by
employees using short-term actions. (2) Local faults due to malfunctions, abnormal fluctuations, or faulty material. Local faults
can often be corrected by the production workers. Why is an understanding of this important? Because, a misunderstanding will
often lead to action that will increase variability and thus reduce product quality. When workers do not understand that random
fluctuations are a normal part of the process, they might try to improve things by turning it up a little when a process
measurement falls a bit on the low side. Then, the next random fluctuation drives the output in the other direction and another
adjustment is made. What is happening now is the probability curve is being shifted all over the place as shown below:

[Figure: What happens with overcontrol. The probability curve is shifted back and forth around the nominal value.]


The probability curve should remain centered on the nominal value. Too much operator control is a bad situation!
This leads to confusion and can even be demoralizing. Have you ever heard the statement, "The harder I try, the
worse things seem to get"? This is what can happen in situations where there is pressure to do better but there is no
knowledge of statistics and normal errors.
In many processing plants, data are used to assure quality and to call attention to problems. Sometimes, marketing and
management impose limits on these data ... these are often in the form of specifications. A sad situation arises in those
cases where the specs are not based on the natural process capabilities. In these cases, there will be times when the
process is in control but out of specifications and, surprisingly to some workers, should be left alone! Of course, the
product would have to be sorted and some of it rejected. Scrap makes managers unhappy. However, messing with the
process in such a case is guaranteed to increase the scrap rate!
So what does one do when a process is in statistical control but the scrap rate is too high? Either the specifications
have to be relaxed or the process has to be changed or improved in some major way. This takes time and is often
expensive. This is beyond the capabilities of the process operators. For example, better materials and/or better
manufacturing equipment are often needed to make significant improvements.
CONTROL CHARTS

[Figure: control chart of process output versus time, with the mean line and the upper (UCL) and lower (LCL) control limits marked.]

Process operators (or quality technicians) might be required to log data in the form of a control chart, or this might be
automated with computers and software. The upper and lower points of the bell-shaped curve are called control limits.
Production data can be plotted as a time chart or as a control chart as shown above. The mean value is the desired or
nominal value. If 100 Ω resistors are being manufactured, the mean value should be 100 Ω, the UCL (upper
control limit) might be 103 Ω, and the LCL (lower control limit) 97 Ω. The above process shows random variations
due to common cause. The process is in control. If there is a local fault, the process goes out of control and one of the
following might occur:
1. The last plotted point is outside of the UCL or the LCL (called an outlier).
2. There is a run of consecutive points above or below the mean line (5 to 7 is a typical decision point).
3. There are two-out-of-three consecutive points in the outer third zones of the chart.
4. There is an obvious trend or shift.

Obviously, the rules for when to react vary. If the process is automated, a computer might automatically send an alert
or abort a process when it measures as being out of control. Remember that overcontrol is a mistake. For example,
the first reading above is below the mean value. If an operator reacts and adjusts the process upward, the entire
control curve will then be shifted upward. This would not be a proper action.
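
A minimal sketch of how the first two of these checks might be automated (the rules and thresholds follow the list above; the logged readings and limits are made up):

```python
# Flag a process as possibly out of control: a point beyond a control limit,
# or a run of 7 consecutive points on one side of the mean.
def out_of_control(readings, mean, ucl, lcl, run_length=7):
    alerts = []
    if readings and not (lcl <= readings[-1] <= ucl):
        alerts.append("last point outside a control limit")
    run = readings[-run_length:]
    if len(run) == run_length and (all(x > mean for x in run)
                                   or all(x < mean for x in run)):
        alerts.append(f"run of {run_length} points on one side of the mean")
    return alerts

log = [100.4, 99.1, 100.8, 101.2, 100.9, 101.5, 100.7, 101.1, 100.6]   # ohms
print(out_of_control(log, mean=100.0, ucl=103.0, lcl=97.0))
```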
A company does not benefit from making scrap. Quality technicians and workers use statistical tools to minimize or
eliminate scrap. Electronics technicians, not involved in production, also use statistical tools such as Monte Carlo
analysis and they also must properly deal with measurement errors on a day to day basis.
