Unit 4 & Unit 5

Unit 4
Nonparametric Statistics
Hypothesis Testing Procedures
Many More Tests Exist!
Parametric Test Procedures

1. Involve Population Parameters (Mean)
2. Have Stringent Assumptions (Normality)
3. Examples: Z Test, t Test
Nonparametric Test Procedures
1. Do Not Involve Population Parameters
   (Example: Probability Distributions, Independence)
2. Data Measured on Any Scale (Ratio or Interval, Ordinal or Nominal)
3. Example: Run Test
Advantages of Nonparametric Tests
1. Used With All Scales
2. Easier to Compute
3. Make Fewer Assumptions
4. Need Not Involve Population Parameters
5. Results May Be as Exact as Parametric Procedures
Disadvantages of Nonparametric Tests
1. May Waste Information: a parametric model is more efficient if the data permit
2. Difficult to Compute by Hand for Large Samples
3. Tables Not Widely Available
Nonparametric Tests
1. One Sample and Two Sample Tests
   Appropriate for Nominal Data
   1. McNemar Test
   Appropriate for Ordinal Data
   2. Run Test
   3. Mann-Whitney Test (U)
   4. Kolmogorov-Smirnov Test (D)
   5. Sign Test
   6. Wilcoxon Matched Pairs Test
2. K-Sample Tests (K ≥ 3)
   1. Kruskal-Wallis Test (H)
McNemar Test
Appropriate for nominal data.
Relates to measurements taken before and after a treatment on one and the same sample.
PROCEDURE
1. Set up the null hypothesis that there is no change in people's attitudes as a result of the treatment.
2. Classify the data into the following 4 categories:
   I. Those who give a positive response before the treatment and a negative response after the treatment (denote as A).
   II. Those who give a negative response before the treatment and a positive response after the treatment (denote as B).
   III. Those who give a positive response at both times.
   IV. Those who give a negative response at both times.
3. Calculate the test statistic, which is a transformation of χ² (shown here with the usual continuity correction):
$$\chi^2 = \frac{(|A - B| - 1)^2}{A + B}$$
4. Compare the calculated χ² with the table value for 1 degree of freedom at the given level of significance. If the calculated value is less than the critical value, accept H0; otherwise reject it.
Example 1:
A researcher attempts to determine if a drug has an effect on a particular disease. Counts of individuals are given in the table, with the diagnosis (disease: present or absent) before treatment given in the rows, and the diagnosis after treatment in the columns. The test requires the same subjects to be included in the before-and-after measurements. Test whether there is a significant effect of the treatment.

                   After: present   After: absent   Row total
Before: present         101              121           222
Before: absent           59               33            92
Column total            160              154           314
[Table: Critical Values of the Chi-square Distribution]
Example 2: In the BBC program The Doha Debates, 1000 people were surveyed regarding their opinion about capital punishment: 705 were in favor and 295 against. They then listened to a debate about the subject and the survey was repeated. After the debate, 663 voted in favor and 337 against; 73 had changed their mind from against to in favor and 115 from in favor to against. Did the debate affect people's opinions?
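
The two McNemar examples can be checked numerically. The sketch below assumes the continuity-corrected statistic χ² = (|A − B| − 1)² / (A + B), where A and B are the two counts of subjects whose response changed, and the 5% table value for 1 degree of freedom (3.841). The function and variable names are illustrative only.

```python
def mcnemar_chi2(a: int, b: int) -> float:
    """Continuity-corrected McNemar statistic from the two 'changed' counts."""
    if a + b == 0:
        raise ValueError("no subjects changed their response")
    return (abs(a - b) - 1) ** 2 / (a + b)


CHI2_CRIT_1DF_5PCT = 3.841  # chi-square table value, 1 d.o.f., 5% level

# Example 1: present -> absent = 121, absent -> present = 59
chi2_drug = mcnemar_chi2(121, 59)
# Example 2: in favor -> against = 115, against -> in favor = 73
chi2_debate = mcnemar_chi2(115, 73)

for name, value in [("drug", chi2_drug), ("debate", chi2_debate)]:
    decision = "reject H0" if value > CHI2_CRIT_1DF_5PCT else "accept H0"
    print(f"{name}: chi2 = {value:.3f} -> {decision}")
```

Only the subjects who changed category enter the statistic; the counts that stayed the same at both times do not affect it.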
Run Test
The Run Test is used to determine whether the observations in a given series can be regarded as random.
For example, a queue at a bus stop:
MFMFMFMFMFMFMFMFMFMF
Another sequence:
MMMMMMMFFFFFFF
The data are converted into runs:
M/F/M/F/M/F/M/F/M/F/M/F/M/F/M/F/M/F/M/F   (20 runs)
MMMMMMM / FFFFFFF   (2 runs)
Example 1:
Marks of 15 students are
55, 52, 43, 49, 36, 61, 44, 47, 67, 78, 63, 57, 41, 28 and 50
Taking deviations from 50 (a = 50 or above, b = below 50):
aa bbb a bb aaaa bb a
PROCEDURE (Small Sample < 20)
n1 = Number of occurrences of type one (here 8 a)
n2 = Number of occurrences of type two (here 7 b)
r = Total number of runs (here 7)
If the observed number of runs (r) lies between the two critical values from the table, accept the null hypothesis; otherwise reject it.
PROCEDURE (Large Sample > 20)
If either n1 or n2 is greater than 20, the sample is said to be large and the sampling distribution of the number of runs (r) is approximated by the normal distribution. The mean (μ_r) and standard error (σ_r) of the r statistic, and the resulting z value, are
$$\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$
$$\sigma_r = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$
$$z = \frac{r - \mu_r}{\sigma_r}$$
n1 = Number of occurrences of type one
n2 = Number of occurrences of type two
r = Total number of runs
The critical values for a two-sided test at the 5% level of significance are
z (lower) = -1.96
z (upper) = +1.96
If the calculated z lies between the two critical values, accept the null hypothesis; otherwise reject it.
[Figure: normal curve with a rejection region of α/2 = 0.025 in each tail, beyond -z.025 = -1.96 and +z.025 = +1.96]
Example 2:
On a commuter train, the conductor wishes to see whether the passengers enter the train at random. He observes the first 30 people, with the following sequence of males (M) and females (F):
FFFFMMFFFFMFMMFFFFFFMMFFFFFMMM
Example 3:
OOOUOOUOUUOOUUOOOOUUOUUOOO
UUUOOOOUUOOUUUOUUOOUUUUUOO
OUOUUOOOUOOOOUUUOUUOOOUOOU
UOUOOUUUOUUOOOOUUUOOO
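
The run-counting step and the large-sample z value can be sketched in a few lines. The code below assumes the standard formulas μ_r = 2·n1·n2/(n1 + n2) + 1 and σ_r = sqrt(2·n1·n2·(2·n1·n2 − n1 − n2) / ((n1 + n2)²·(n1 + n2 − 1))); the function name `runs_test` is illustrative. It is applied to the commuter-train sequence of Example 2, where the normal approximation is used for illustration only.

```python
import math
from itertools import groupby


def runs_test(sequence: str):
    """Return (n1, n2, r, z) for a sequence made of exactly two symbols."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n1 = sequence.count(symbols[0])
    n2 = sequence.count(symbols[1])
    # Each maximal block of identical neighbouring symbols is one run.
    r = sum(1 for _ in groupby(sequence))
    mean_r = 2 * n1 * n2 / (n1 + n2) + 1
    sd_r = math.sqrt(
        2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
        / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    )
    return n1, n2, r, (r - mean_r) / sd_r


train = "FFFFMMFFFFMFMMFFFFFFMMFFFFFMMM"
n_f, n_m, runs, z = runs_test(train)
decision = "accept H0" if -1.96 < z < 1.96 else "reject H0"
print(f"F = {n_f}, M = {n_m}, runs = {runs}, z = {z:.2f} -> {decision}")
```

The same function reproduces the counts of Example 1 when given the coded string `"aabbbabbaaaabba"` (8 a, 7 b, 7 runs).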
Mann Whitney U Test
The Mann Whitney U test can be used to compare two samples from two populations if the following assumptions are satisfied:
1. The two samples are independent and random.
2. The value measured is a continuous variable.
3. The measurement scale used is at least ordinal.
Procedure
1. Set up the null hypothesis that the 2 samples have been drawn from the same population and there is no difference in their scores.
2. Rank all the scores of the 2 samples together in ascending or descending order.
3. Sum up separately the ranks assigned to the scores of each sample as R1 and R2.
4. Calculate U, i.e.
$$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1, \qquad U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2$$
   Tip: U1 + U2 = n1 n2
5. Take the smaller of U1 and U2 and compare it with the table value. If the calculated value is less than or equal to the table value, reject H0.
Example 1: The following are the scores of 6 randomly selected students in the Midterm and Final Exams:
Midterm: 55, 57, 72, 90, 57, 74
Final: 80, 76, 63, 58, 56, 37
Test the null hypothesis that the distribution of scores on the two occasions is the same. Use the Mann-Whitney test.
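If n1 or n2 > 20, then calculate
$$\mu_U = \frac{n_1 n_2}{2}, \qquad \sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
$$z = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$
If the absolute value of the calculated z is less than the table value, accept the null hypothesis.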
Example 2:
A survey is conducted to test the difference between 2 alternative methods of teaching. A sample of 20 students is selected at random. Two groups of 10 students each, of equal ability, are formed and taught by the different methods. A standardized test is then given to both groups and the following marks (out of 100) are scored by the 10 students in each group:
Group A: 40, 45, 48, 46, 52, 58, 72, 85, 67, 73
Group B: 42, 68, 45, 64, 85, 78, 87, 62, 84, 90
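
The ranking and U calculation can be sketched as follows. This is a minimal illustration that assumes the rank-sum form U1 = n1·n2 + n1(n1 + 1)/2 − R1 (and likewise for U2), ascending ranks, and average ranks for ties; the helper names are hypothetical. It also returns the large-sample z value, although both examples here are small samples that would normally be compared with the U table.

```python
import math


def average_ranks(values):
    """Rank values in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney(sample1, sample2):
    """Return (U1, U2, z), where z is based on the smaller of U1 and U2."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    u = min(u1, u2)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, u2, z


midterm = [55, 57, 72, 90, 57, 74]
final = [80, 76, 63, 58, 56, 37]
u_mid, u_fin, z_exam = mann_whitney(midterm, final)

group_a = [40, 45, 48, 46, 52, 58, 72, 85, 67, 73]
group_b = [42, 68, 45, 64, 85, 78, 87, 62, 84, 90]
u_a, u_b, z_groups = mann_whitney(group_a, group_b)

print(f"Example 1: U = {min(u_mid, u_fin)}")
print(f"Example 2: U = {min(u_a, u_b)}")
```

The check U1 + U2 = n1·n2 is a quick way to catch ranking mistakes when the calculation is done by hand.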
Kolmogorov-Smirnov Test D
Like the Chi-Square test, this test is used to find out whether an empirical distribution agrees with an assumed theoretical one, or whether two samples may reasonably be regarded as coming from the same population.
Procedure:
1. Calculate the cumulative frequencies for each class for both the observed and the theoretical categories.
2. Convert the cumulative frequency of each class into a proportion for both categories.
3. Compute the difference between the observed and theoretical proportions, ignoring the plus or minus sign.
4. Compare the maximum difference (Dn) with the critical value in the table at the desired level of significance.
If the calculated value is less than the table value, accept H0; if it is greater, reject H0.
Example 1:
A manufacturer of readymade garments conducts a market survey of 100 prospective customers to learn their choice among brands A, B, C and D. The results show:
A = 20, B = 30, C = 18, D = 32
Find out if the customers have any distinct brand preference.
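
The brand-preference example can be worked through as a sketch. It assumes that "no distinct preference" means an equal theoretical share of 25 customers per brand, and it uses the common large-sample 5% critical value 1.36/√n as an assumption in place of a table lookup. The names below are illustrative.

```python
from itertools import accumulate


def ks_statistic(observed, expected):
    """Largest absolute gap between observed and expected cumulative proportions."""
    n_obs, n_exp = sum(observed), sum(expected)
    cum_obs = [c / n_obs for c in accumulate(observed)]
    cum_exp = [c / n_exp for c in accumulate(expected)]
    return max(abs(o - e) for o, e in zip(cum_obs, cum_exp))


observed = [20, 30, 18, 32]  # brands A, B, C, D
expected = [25, 25, 25, 25]  # assumed null hypothesis: equal shares

d_n = ks_statistic(observed, expected)
critical = 1.36 / sum(observed) ** 0.5  # assumed large-sample 5% value

decision = "accept H0" if d_n < critical else "reject H0"
print(f"D = {d_n:.2f}, critical = {critical:.3f} -> {decision}")
```

The cumulative observed proportions are 0.20, 0.50, 0.68 and 1.00 against 0.25, 0.50, 0.75 and 1.00, so the largest gap occurs at brand C.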
Example 2:
A sample of 26 male patients and another sample of 25 female patients suffering from respiratory T.B. is randomly selected. The following table gives the frequency distributions of these samples according to age:

Age (years)    Male       Female
0 - 5
5 - 15
15 - 25
25 - 35
35 - 45
45 - 55
55 - 65
Above 65
Total          N1 = 26    N2 = 25

Use the K-S test to test the hypothesis that the age distribution of males and females is the same.
Sign Test
This test is used in the study of both single and paired samples.
Procedure:
1. The differences between the sample items and the hypothesized mean/median are computed and expressed as plus and minus signs. If a zero difference is found, it is ignored and the sample size is correspondingly reduced.
2. The number of times the less frequent sign (plus or minus) occurs among the differences is counted as the number of successes and denoted as r.
3. Next, with N as the number of trials and p = q = 1/2, the cumulative probability of r or fewer successes is either calculated or read from the table.
4. In the case of a one-tail test, the computed probability is compared with the prefixed significance level (0.01 or 0.05). If the calculated probability is less than the significance level, the null hypothesis is rejected. For a two-tail test the calculated probability is doubled and then compared.
Example 1: On 15 occasions Mr. X had to wait for the bus to his office for the following periods (in minutes): 4, 8, 2, 7, 7, 5, 8, 6, 1, 9, 6, 6, 5, 9 and 5. Use the Sign Test at the 5% level of significance to verify the bus company's claim that Mr. X's average waiting period was 5 minutes.
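
The waiting-time example can be followed step by step in code. This sketch assumes the binomial model with p = q = 1/2 described in the procedure and computes the cumulative probability of r or fewer occurrences of the less frequent sign; the function name is illustrative.

```python
from math import comb


def sign_test(values, hypothesized):
    """Return (N, r, one-tail probability) for a one-sample sign test."""
    plus = sum(v > hypothesized for v in values)
    minus = sum(v < hypothesized for v in values)
    n = plus + minus  # zero differences are dropped
    r = min(plus, minus)  # count of the less frequent sign
    p_one_tail = sum(comb(n, k) for k in range(r + 1)) / 2 ** n
    return n, r, p_one_tail


waits = [4, 8, 2, 7, 7, 5, 8, 6, 1, 9, 6, 6, 5, 9, 5]
n, r, p_one = sign_test(waits, 5)
p_two = min(1.0, 2 * p_one)  # the claim "average = 5" is two-sided

decision = "reject H0" if p_two < 0.05 else "accept H0"
print(f"N = {n}, r = {r}, two-tail p = {p_two:.3f} -> {decision}")
```

Three of the fifteen waits equal 5 minutes and are dropped, leaving 9 plus signs and 3 minus signs.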
Example 2: The following table gives yields in quintals (with differences) for two varieties of apples, A and B, each pair of trees being planted close together under similar conditions of soil, moisture, etc. The separate pairs are, however, scattered over various localities.
A
13
16
11
12
11
11
11
10
13
13
10
13
12
14
12
15
15
12
11
19
14
12
10
Wilcoxon Matched Pair Test
Also known as the Wilcoxon Signed Rank Test (T). Unlike the Sign Test, which is based only on the number of plus or minus signs, this test uses both the signs and the magnitudes of the differences.
Process:
1. Find the differences between the sample items and the hypothesized mean or median.
2. Assign ranks to all differences (except zero differences) in increasing order, disregarding signs. If 2 or more differences have the same value, give them the average rank.
3. Attach to each rank the sign of its difference.
4. Total the positive and negative ranks separately. The smaller of the 2 rank sums is our test statistic T.
5. Compare the calculated value with the table value. If the calculated T is less than or equal to the table value of T, reject the null hypothesis.
For large samples:
$$\mu_T = \frac{n(n+1)}{4}, \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
$$z = \frac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$
Example 1: Ten workers were given on-the-job training with a view to shortening their assembly time for a certain mechanism. The results of the time (in minutes) and motion studies before and after the training programme are given below:

Worker    1    2    3    4    5    6    7    8    9   10
Before   61   62   55   62   59   74   62   57   64   62
After    59   63   52   54   59   70   67   65   59   71

Is there evidence that the training programme has shortened the average assembly time?
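
The signed-rank calculation for the ten workers can be sketched as below. The code assumes differences taken as before minus after, drops zero differences, and gives tied absolute differences their average rank; the function name is illustrative.

```python
def wilcoxon_signed_rank(before, after):
    """Return (n, sum of positive ranks, sum of negative ranks, T)."""
    diffs = [b - a for b, a in zip(before, after) if b != a]  # drop zeros
    abs_sorted = sorted(abs(d) for d in diffs)
    # Average rank for each distinct absolute difference.
    rank_of = {}
    for value in set(abs_sorted):
        first = abs_sorted.index(value) + 1  # first 1-based position
        count = abs_sorted.count(value)
        rank_of[value] = first + (count - 1) / 2
    positive = sum(rank_of[abs(d)] for d in diffs if d > 0)
    negative = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return len(diffs), positive, negative, min(positive, negative)


before = [61, 62, 55, 62, 59, 74, 62, 57, 64, 62]
after = [59, 63, 52, 54, 59, 70, 67, 65, 59, 71]

n, pos_sum, neg_sum, t_stat = wilcoxon_signed_rank(before, after)
print(f"n = {n}, positive ranks = {pos_sum}, negative ranks = {neg_sum}, T = {t_stat}")
```

As a check, the two rank sums must add up to n(n + 1)/2, which is 45 for the 9 non-zero differences here.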
Kruskal-Wallis Test (H Test)
This test can be used (in place of the one-way Analysis of Variance) to test the null hypothesis that K independent random samples come from identical populations and have identical means.
Procedure:
1. All the items of the K samples should be pooled together and ranked from low to high or high to low. If there are ties, the usual mid-rank method can be followed.
2. The ranks of each sample should be separately summed up.
3. Calculate H:
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{K} \frac{R_i^2}{n_i} - 3(N+1)$$
   where
   n1, n2, ..., nk are the numbers of items in each of the K samples,
   N = n1 + n2 + ... + nk,
   R1, R2, ..., Rk are the sums of the ranks given to the observations in each sample.
4. H is approximately distributed as χ². The calculated value should be compared with the table value of χ² with (K - 1) degrees of freedom at the desired level of significance. If the calculated value is less than the table value, accept H0.
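
The H calculation can be sketched as below, assuming the standard form H = 12/(N(N + 1)) · Σ(R_i²/n_i) − 3(N + 1) and mid-ranks for ties. The three small groups are made-up illustrative values, not data from these notes, and the function name is hypothetical.

```python
def kruskal_wallis_h(samples):
    """Kruskal-Wallis H for a list of samples, using mid-ranks for ties."""
    pooled = sorted(x for sample in samples for x in sample)

    def mid_rank(x):
        first = pooled.index(x) + 1  # first 1-based position of x
        return first + (pooled.count(x) - 1) / 2

    n_total = len(pooled)
    sum_term = sum(
        sum(mid_rank(x) for x in sample) ** 2 / len(sample)
        for sample in samples
    )
    return 12 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)


# Hypothetical data: three completely separated groups of three items each.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
print(f"H = {h:.2f} with {len(groups) - 1} degrees of freedom")
```

For these values the rank sums are 6, 15 and 24, so H = (12/90)(279) − 30 = 7.2.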
Example 1: School children taking coaching in three private schools A, B and C secure the following scores out of a hundred.

No. of
Children
A School
B School
C School
33
32
55
38
15
68
39
87
27
48
32
38
58
22
46
70
63
52
61
56
76
41
57
45
44
10
49
Test the hypothesis that the students studying in the 3 private schools have identical distributions of scores at significance level α = 0.01.
Unit 5
Multiple Regression
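A multiple regression equation involving two independent variables x1 and x2 and a dependent variable y is represented as:
$$y = a + b_1 x_1 + b_2 x_2$$
However, in the calculation of regression coefficients we commonly take x1 as the dependent variable and x2 and x3 as the independent variables:
$$x_1 = a + b_1 x_2 + b_2 x_3$$
To determine the values of a, b1 and b2 we use three normal equations:
$$\sum x_1 = N a + b_1 \sum x_2 + b_2 \sum x_3$$
$$\sum x_1 x_2 = a \sum x_2 + b_1 \sum x_2^2 + b_2 \sum x_2 x_3$$
$$\sum x_1 x_3 = a \sum x_3 + b_1 \sum x_2 x_3 + b_2 \sum x_3^2$$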
Example 1: A sample survey of 5 families was taken and figures were obtained with respect to their annual savings x1 (Rs in 100s), annual income x2 (Rs in 1000s) and family size x3. The data are summarized in the table below:
Family
Annual
Savings x1
Annual
Income x2
Family Size
x3
10
16
13
10
21
10
13
(a) Find the least-squares regression equation of x1 on x2 and x3.
(b) Estimate the annual savings of a family whose size is 4 and annual income is Rs 16000.
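
Solving for a, b1 and b2 amounts to solving three linear equations in three unknowns. The sketch below assumes the usual normal equations for the regression of x1 on x2 and x3 (Σx1 = N·a + b1·Σx2 + b2·Σx3, and the two equations obtained by multiplying through by x2 and x3). The data are made-up values constructed so that x1 = 1 + 2·x2 + 3·x3 exactly; they are not the survey figures, and the function names are illustrative.

```python
def solve_3x3(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting for a 3x3 system."""
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_x1_on_x2_x3(x1, x2, x3):
    """Build and solve the three normal equations; returns (a, b1, b2)."""
    n = len(x1)
    s22 = sum(u * u for u in x2)
    s23 = sum(u * v for u, v in zip(x2, x3))
    s33 = sum(v * v for v in x3)
    matrix = [
        [n, sum(x2), sum(x3)],
        [sum(x2), s22, s23],
        [sum(x3), s23, s33],
    ]
    rhs = [
        sum(x1),
        sum(w * u for w, u in zip(x1, x2)),
        sum(w * v for w, v in zip(x1, x3)),
    ]
    return tuple(solve_3x3(matrix, rhs))


# Hypothetical data, chosen so that x1 = 1 + 2*x2 + 3*x3 holds exactly.
x2 = [1, 2, 3, 4, 5]
x3 = [2, 1, 4, 3, 5]
x1 = [9, 8, 19, 18, 26]

a, b1, b2 = fit_x1_on_x2_x3(x1, x2, x3)
estimate = a + b1 * 16 + b2 * 4  # prediction at x2 = 16, x3 = 4
print(f"x1 = {a:.2f} + {b1:.2f} x2 + {b2:.2f} x3, estimate = {estimate:.2f}")
```

With real survey data the same two steps apply: fill in the sums, solve the system, then substitute the given income and family size into the fitted equation.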
Factor Analysis
Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization.
Factor analysis is an interdependence technique in that an entire set of interdependent relationships is examined without making the distinction between dependent and independent variables.
Factor analysis is used in the following circumstances:
1. To identify underlying dimensions, or factors, that explain the correlations among a set of variables.
2. To identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis).
3. To identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis.
Cluster Analysis
Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters. Cluster analysis is also called classification analysis, or numerical taxonomy.
Both cluster analysis and discriminant analysis are concerned with classification. However, discriminant analysis requires prior knowledge of the cluster or group membership for each object or case included, to develop the classification rule. In contrast, in cluster analysis there is no a priori information about the group or cluster membership for any of the objects. Groups or clusters are suggested by the data, not defined a priori.
Cluster analysis is used for a variety of purposes:
1. Segmenting the market
2. Understanding buyer behavior
3. Identifying new product opportunities
4. Selecting test markets
5. Reducing data
Discriminant Analysis
Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature.
The objectives of discriminant analysis are as follows:
1. Development of discriminant functions, or linear combinations of the predictor or independent variables, which will best discriminate between the categories of the criterion or dependent variable (groups).
2. Examination of whether significant differences exist among the groups, in terms of the predictor variables.
3. Determination of which predictor variables contribute to most of the intergroup differences.
4. Classification of cases to one of the groups based on the values of the predictor variables.
5. Evaluation of the accuracy of classification.
Conjoint Analysis
Conjoint analysis attempts to determine the relative importance consumers attach to salient attributes and the utilities they attach to the levels of attributes.
The respondents are presented with stimuli that consist of combinations of attribute levels and asked to evaluate these stimuli in terms of their desirability.
Conjoint procedures attempt to assign values to the levels of each attribute, so that the resulting values or utilities attached to the stimuli match, as closely as possible, the input evaluations provided by the respondents.
Conjoint analysis is used for:
1. Determining the relative importance of attributes in the consumer choice process.
2. Estimating the market share of brands that differ in attribute levels.
3. Determining the composition of the most preferred brand.
4. Segmenting the market based on similarity of preferences for attribute levels.
MultiDimensional Scaling (MDS)
Multidimensional scaling (MDS) is a class of procedures for representing perceptions and preferences of respondents spatially by means of a visual display.
Perceived or psychological relationships among stimuli are represented as geometric relationships among points in a multidimensional space.
These geometric representations are often called spatial maps. The axes of the spatial map are assumed to denote the psychological bases or underlying dimensions respondents use to form perceptions and preferences for stimuli.
MDS is used to identify:
1. The number and nature of the dimensions consumers use to perceive different brands in the marketplace.
2. The positioning of current brands on these dimensions.
3. The positioning of consumers' ideal brand on these dimensions.
MDS is mainly used for:
1. Image measurement
2. Market segmentation
3. New product development
4. Assessing advertising effectiveness
5. Pricing analysis
