Unit 4 & Unit 5


Unit 4

Nonparametric Statistics

Hypothesis Testing Procedures

Many More Tests Exist!

Parametric Test Procedures
1. Involve Population Parameters (Mean)
2. Have Stringent Assumptions (Normality)
3. Examples: Z Test, t Test

Nonparametric Test Procedures
1. Do Not Involve Population Parameters (Example: Probability Distributions, Independence)
2. Data Measured on Any Scale (Ratio or Interval, Ordinal or Nominal)
3. Example: Run Test

Advantages of Nonparametric Tests
1. Used With All Scales
2. Easier to Compute
3. Make Fewer Assumptions
4. Need Not Involve Population Parameters
5. Results May Be as Exact as Parametric Procedures

Disadvantages of Nonparametric Tests
1. May Waste Information: a parametric model is more efficient if the data permit
2. Difficult to Compute by Hand for Large Samples
3. Tables Not Widely Available

Nonparametric Tests

One Sample and Two Sample Tests
Appropriate for Nominal Data
1. McNemar Test
Appropriate for Ordinal Data
2. Run Test
3. Mann-Whitney Test (U)
4. Kolmogorov-Smirnov Test (D)
5. Sign Test
6. Wilcoxon Matched Pairs Test

K-Sample Tests (K ≥ 3)
1. Kruskal-Wallis Test (H)

McNemar Test
Appropriate for Nominal Data
Relates to measurements taken before and after a treatment on one and the same sample
PROCEDURE
1. Set up the null hypothesis that there is no change in people's attitudes as a result of the treatment.
2. Classify the data into the following 4 categories:
   I. Those who give a positive response before the treatment and a negative response after the treatment (denote the count as A)
   II. Those who give a negative response before the treatment and a positive response after the treatment (denote the count as B)
   III. Those who give a positive response at both times
   IV. Those who give a negative response at both times
3. Calculate the test statistic, which is only a transformation of $\chi^2$:

   $$\chi^2 = \frac{(|A - B| - 1)^2}{A + B}$$

   where the $-1$ is a correction for continuity. Only the two categories that changed their response (A and B) enter the statistic.
4. Compare the calculated $\chi^2$ with the table value for 1 d.o.f. at the given level of significance. If the calculated value is less than the critical value, accept H0; otherwise reject it.

McNemar Test
Example 1:
A researcher attempts to determine if a drug has an effect on a particular disease. Counts of individuals are given in the table, with the diagnosis (disease: present or absent) before treatment given in the rows, and the diagnosis after treatment in the columns. The test requires the same subjects to be included in the before-and-after measurements. Test whether there is a significant effect of the treatment.

                   After: present   After: absent   Row total
Before: present    101              121             222
Before: absent     59               33              92
Column total       160              154             314

Critical Values of
Chi-square Distribution

Example 2: In the BBC program The Doha Debates, 1000 people were surveyed regarding their opinion about capital punishment. 705 were in favor and 295 against. They then listened to a debate about the subject and the survey was repeated. After they listened to the debate, 663 voted in favor and 337 against. 73 changed their mind from against to in favor and 115 changed their mind from in favor to against. Did the debate affect people's opinions?
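Both McNemar examples can be checked with a few lines of code. The sketch below assumes the continuity-corrected statistic (|A - B| - 1)^2 / (A + B), where A and B are the two counts of subjects who changed response, and compares it with 3.841, the standard table value for 1 d.o.f. at the 5% level. The function names are illustrative, not taken from any library.

```python
import math

def mcnemar_chi2(a, b):
    """Continuity-corrected McNemar statistic from the two 'changed' counts."""
    return (abs(a - b) - 1) ** 2 / (a + b)

def chi2_upper_tail_1dof(x):
    """P(chi-square with 1 d.o.f. > x), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2))

CRITICAL_5PCT_1DOF = 3.841  # standard chi-square table value, 1 d.o.f., 5% level

# Example 1: present -> absent = 121, absent -> present = 59
chi2_drug = mcnemar_chi2(121, 59)
# Example 2: in favor -> against = 115, against -> in favor = 73
chi2_debate = mcnemar_chi2(115, 73)

for label, chi2 in [("drug", chi2_drug), ("debate", chi2_debate)]:
    decision = "reject H0" if chi2 > CRITICAL_5PCT_1DOF else "accept H0"
    p_value = chi2_upper_tail_1dof(chi2)
    print(f"{label}: chi2 = {chi2:.3f}, p = {p_value:.5f} -> {decision}")
```

Under these assumptions both statistics (about 20.67 and 8.94) exceed 3.841, so H0 of no change would be rejected in both examples at the 5% level.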

Run Test
Run Test is used to determine whether the observations in a given series can be regarded as random.
For example: Queue at a bus-stop
MFMFMFMFMFMFMFMFMFMF
Another sequence:
MMMMMMMFFFFFFF
The data are converted into runs (a run is an unbroken succession of identical symbols):
M/F/M/F/M/F/M/F/M/F/M/F/M/F/M/F/M/F/M/F (20 runs)
MMMMMMM / FFFFFFF (2 runs)

Run Test
Example 1:
Marks of 15 students are
55, 52, 43, 49, 36, 61, 44, 47, 67, 78, 63, 57, 41, 28 and 50
Taking deviations from 50 (a = mark of 50 or above, b = mark below 50):
aa bbb a bb aaaa bb a
PROCEDURE (Small Sample < 20)
n1 = Number of occurrences of type one (here 8 a)
n2 = Number of occurrences of type two (here 7 b)
r = Total number of runs (here 7)
Read the lower and upper critical values of r for the given n1 and n2 from the run-test table. If the observed number of runs (r) lies between the two critical values, accept the null hypothesis of randomness; otherwise reject it.
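The conversion of the marks into symbols and runs can be sketched as follows. The code assumes that a mark equal to 50 is labelled "a", which is what the sequence shown in the example implies; the critical values still have to be read from a run-test table.

```python
from itertools import groupby

marks = [55, 52, 43, 49, 36, 61, 44, 47, 67, 78, 63, 57, 41, 28, 50]

# Assumption: marks of 50 or above are type "a", marks below 50 are type "b".
symbols = ["a" if mark >= 50 else "b" for mark in marks]

# groupby() groups consecutive equal symbols, so each group is one run.
runs = ["".join(group) for _, group in groupby(symbols)]

n1 = symbols.count("a")  # occurrences of type one
n2 = symbols.count("b")  # occurrences of type two
r = len(runs)            # total number of runs

print(" ".join(runs))
print(f"n1 = {n1}, n2 = {n2}, r = {r}")
```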

Run Test
PROCEDURE (Large Sample > 20)
If either n1 or n2 is greater than 20, the sample is said to be large and the sampling distribution of the number of runs (r) is approximated by the normal distribution. The mean ($\mu_r$) and standard error ($\sigma_r$) of the r statistic are

$$\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$

$$\sigma_r = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$

$$z = \frac{r - \mu_r}{\sigma_r}$$

n1 = Number of occurrences of type one
n2 = Number of occurrences of type two
r = Total number of runs
The critical values for a two-sided test at the 5% level of significance are
rc (lower) = $\mu_r - 1.96\,\sigma_r$
rc (upper) = $\mu_r + 1.96\,\sigma_r$
If the observed number of runs (r) lies between the two critical values (equivalently, if z lies between -1.96 and +1.96), accept the null hypothesis; otherwise reject it.

[Figure: standard normal curve with two rejection regions of α/2 = 0.025 each, below $-z_{.025} = -1.96$ and above $z_{.025} = +1.96$]

Example 2:
On a commuter train, the conductor wishes to see whether the passengers enter the train at random. He observes the first 30 people, with the following sequence of males (M) and females (F):
FFFFMMFFFFMFMMFFFFFFMMFFFFFMMM
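The large-sample procedure can be sketched on this sequence. Here n1 = 20 sits on the boundary of the "greater than 20" rule, so the normal approximation is used purely for illustration. The function name `runs_test_z` is hypothetical.

```python
import math
from itertools import groupby

def runs_test_z(sequence):
    """Return n1, n2, r, mean, standard error and z for a two-symbol sequence."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("the run test needs exactly two kinds of symbol")
    n1 = sequence.count(symbols[0])
    n2 = sequence.count(symbols[1])
    r = sum(1 for _ in groupby(sequence))  # each group of equal symbols is a run
    mean_r = 2 * n1 * n2 / (n1 + n2) + 1
    var_r = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / (
        (n1 + n2) ** 2 * (n1 + n2 - 1)
    )
    sd_r = math.sqrt(var_r)
    return n1, n2, r, mean_r, sd_r, (r - mean_r) / sd_r

passengers = "FFFFMMFFFFMFMMFFFFFFMMFFFFFMMM"
n1, n2, r, mean_r, sd_r, z = runs_test_z(passengers)

print(f"F = {n1}, M = {n2}, runs = {r}")
print(f"mean = {mean_r:.3f}, s.e. = {sd_r:.3f}, z = {z:.3f}")
print("accept H0" if -1.96 < z < 1.96 else "reject H0")
```

With 20 females, 10 males and 10 runs, z comes out near -1.82, which lies inside ±1.96, so under this approximation the hypothesis of random boarding would not be rejected at the 5% level.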

Example 3:

OOOUOOUOUUOOUUOOOOUUOUUOOO
UUUOOOOUUOOUUUOUUOOUUUUUOO
OUOUUOOOUOOOOUUUOUUOOOUOOU
UOUOOUUUOUUOOOOUUUOOO

Mann Whitney U Test


The Mann-Whitney U test can be used to compare two samples from two populations if the following assumptions are satisfied:
The two samples are independent and random.
The value measured is a continuous variable.
The measurement scale used is at least ordinal.

Procedure
1. Set up the null hypothesis that the 2 samples have been drawn from the same population and there is no difference in their scores.
2. Rank all the scores of the 2 samples together in ascending or descending order, giving tied scores the average of the ranks they occupy.
3. Sum up separately the ranks assigned to the scores of each sample as (R1) and (R2).
4. Calculate U, i.e.

   $$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1, \qquad U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2$$

   Tip: U1 + U2 = n1 n2
5. Take the smaller of U1 and U2 and compare it with the table value. If the calculated value is less than or equal to the table value, reject H0.

Example 1: Following are the scores of 6 randomly selected students at the Midterm and Final Exams:
Mid Term: 55, 57, 72, 90, 57, 74
Final: 80, 76, 63, 58, 56, 37
Test the null hypothesis that the distribution of scores on the two occasions is the same. Use the Mann-Whitney Test.
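The ranking and the two U values for this example can be sketched as follows. The code assumes the form U = n1 n2 + n(n + 1)/2 - R for each sample, with tied scores given their average rank. The helper names are illustrative. The final comparison with the tabulated critical value is left to the U table.

```python
def midranks(values):
    """Ascending ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the last position holding the same value
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def mann_whitney_u(sample1, sample2):
    """Return rank sums R1, R2 and the statistics U1, U2."""
    n1, n2 = len(sample1), len(sample2)
    ranks = midranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, u1, u2

midterm = [55, 57, 72, 90, 57, 74]
final = [80, 76, 63, 58, 56, 37]

r1, r2, u1, u2 = mann_whitney_u(midterm, final)
u = min(u1, u2)
print(f"R1 = {r1}, R2 = {r2}, U1 = {u1}, U2 = {u2}, U = {u}")
```

The check U1 + U2 = n1 n2 (here 17 + 19 = 36) is a quick way to catch ranking mistakes.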

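If n1 or n2 > 20, then calculate

$$\mu_U = \frac{n_1 n_2}{2}, \qquad \sigma_U = \sqrt{\frac{(n_1)(n_2)(n_1 + n_2 + 1)}{12}}$$

$$z = \frac{U - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{(n_1)(n_2)(n_1 + n_2 + 1)}{12}}}$$

If the absolute value of the calculated z is less than the table value, accept the Null Hypothesis.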

Example 2:
A survey is conducted to test the difference between 2 alternative methods of teaching. A sample of 20 students is selected at random. Two groups of 10 students each, of equal ability, are formed and taught by different methods. A standardized test is then given to both groups and the following marks (out of 100) are scored by the 10 students in each group:
Group A: 40, 45, 48, 46, 52, 58, 72, 85, 67, 73
Group B: 42, 68, 45, 64, 85, 78, 87, 62, 84, 90
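Neither group here exceeds 20, so the U table is the proper reference. The normal approximation can still be illustrated on these marks as a sketch. The rank sums below (R_A = 86, R_B = 124) were worked by hand from the pooled ascending ranking with average ranks for the ties at 45 and 85; treat them as an assumption of this example. The function name is hypothetical.

```python
import math

def mann_whitney_z(u, n1, n2):
    """Normal approximation: z = (U - n1*n2/2) / sqrt(n1*n2*(n1+n2+1)/12)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u

n_a = n_b = 10
rank_sum_a, rank_sum_b = 86.0, 124.0  # hand-worked rank sums (assumption)

# Sanity check: the rank sums must add up to N(N + 1)/2 for N = 20 scores.
total = n_a + n_b
assert rank_sum_a + rank_sum_b == total * (total + 1) / 2

u_a = n_a * n_b + n_a * (n_a + 1) / 2 - rank_sum_a
u_b = n_a * n_b + n_b * (n_b + 1) / 2 - rank_sum_b
u = min(u_a, u_b)
z = mann_whitney_z(u, n_a, n_b)

print(f"U_A = {u_a}, U_B = {u_b}, U = {u}, z = {z:.3f}")
print("accept H0" if abs(z) < 1.96 else "reject H0")
```

Under these assumptions |z| is about 1.44, below 1.96, so the approximation would not reject the hypothesis of no difference between the teaching methods at the 5% level.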

Kolmogorov-Smirnov Test D
Like the Chi-Square test, this test is used to find out whether an empirical distribution agrees with an assumed theoretical one, or whether two samples may reasonably be regarded as coming from the same population.
Procedure:
1. Calculate the cumulative frequencies for each class in respect of both the observed and the theoretical categories.
2. Convert the cumulative frequency of each class into a proportion in respect of both categories.
3. Compute the difference between the observed and theoretical proportions, ignoring the plus or minus sign.
4. Compare the maximum difference (Dn) with the critical value in the table at the desired level of significance.
If the calculated value is less than the table value, accept H0; if it is greater, reject H0.

Example 1:
A manufacturer of readymade garments conducts a market survey to know the choice of brands A, B, C and D among 100 prospective customers. The results show:
A = 20, B = 30, C = 18, D = 32
Find out if the customers have any distinct brand preference.
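The four steps of the procedure can be sketched on the brand data. The code assumes that H0 means no preference, so each brand is expected to be chosen equally often, and it uses the large-sample approximation 1.36/√n for the 5% critical value rather than an exact table entry.

```python
import math
from itertools import accumulate

observed = {"A": 20, "B": 30, "C": 18, "D": 32}
n = sum(observed.values())

# Assumption: under H0 every brand is equally likely to be chosen.
expected = {brand: n / len(observed) for brand in observed}

# Steps 1 and 2: cumulative frequencies converted to proportions.
observed_cum = [c / n for c in accumulate(observed.values())]
expected_cum = [c / n for c in accumulate(expected.values())]

# Step 3: absolute differences between the cumulative proportions.
differences = [abs(o - e) for o, e in zip(observed_cum, expected_cum)]

# Step 4: the largest difference is the test statistic D.
d_n = max(differences)
d_critical = 1.36 / math.sqrt(n)  # large-sample 5% approximation (assumption)

print(f"D = {d_n:.2f}, critical value = {d_critical:.3f}")
print("accept H0" if d_n < d_critical else "reject H0")
```

Here D = 0.07 is below 0.136, so on this sketch the data would not show a distinct brand preference at the 5% level.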

Example 2:
A sample of 26 male patients and another of 25 female patients suffering from respiratory T.B. are randomly selected. The following table gives the frequency distributions of these samples according to age:

Age (years)    Male        Female
0-5
5-15
15-25
25-35
35-45
45-55
55-65
Above 65
Total          N1 = 26     N2 = 25

Use the K-S test to test the hypothesis that the age distribution of males and females is the same.

Sign Test
This test is used in the study of both single and paired samples.
Procedure:
1. Compute the differences between the sample items and the hypothesized mean/median (or between the paired observations) and express them as plus and minus signs. If a zero difference is found, it is ignored and the sample size is correspondingly reduced.
2. The number of times the less frequent sign (plus or minus) occurs among the differences is counted as the number of successes and denoted as r.
3. Next, with N as the number of trials and p = q = 1/2, the cumulative binomial probability of r or fewer successes is either calculated or read from a table.
4. In the case of a one-tail test, the computed probability is compared with the prefixed significance level (0.01 or 0.05). If the calculated probability is less than the significance level, the Null Hypothesis is rejected. For a two-tail test, the calculated probability is doubled and then compared.

Example 1: On 15 occasions Mr. X had to wait for the bus to his office for different periods (in minutes): 4, 8, 2, 7, 7, 5, 8, 6, 1, 9, 6, 6, 5, 9 and 5. Use the Sign Test at the 5% level of significance to verify the bus company's claim that Mr. X's average waiting period was 5 minutes.
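The sign-test steps for this example can be sketched as below. The code assumes a two-tail test of the claim (waiting time equal to 5 minutes) and computes the binomial probability directly instead of reading it from a table.

```python
from math import comb

waits = [4, 8, 2, 7, 7, 5, 8, 6, 1, 9, 6, 6, 5, 9, 5]
claimed = 5

plus = sum(1 for w in waits if w > claimed)   # waits longer than claimed
minus = sum(1 for w in waits if w < claimed)  # waits shorter than claimed
n = plus + minus      # zero differences are dropped from the sample
r = min(plus, minus)  # count of the less frequent sign

# P(X <= r) for X ~ Binomial(n, 1/2)
p_one_tail = sum(comb(n, k) for k in range(r + 1)) / 2 ** n
p_two_tail = min(1.0, 2 * p_one_tail)

print(f"plus = {plus}, minus = {minus}, N = {n}, r = {r}")
print(f"one-tail p = {p_one_tail:.4f}, two-tail p = {p_two_tail:.4f}")
print("reject H0" if p_two_tail < 0.05 else "accept H0")
```

With 9 plus signs, 3 minus signs and 3 zero differences, the two-tail probability is about 0.146, which is above 0.05, so the claim would not be rejected on this sketch.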

Example 2: The following table gives yields in quintals (with differences) for two varieties of apples, A and B, each pair of trees being planted near together under similar conditions of soil, moisture etc. The separate pairs are, however, scattered over various localities.
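[Table: yields of varieties A and B and their differences for each pair of trees; only part of the entries survive: 13, 16, 11, 12, 11, 11, 11, 10, 13, 13, 10, 13, 12, 14, 12, 15, 15, 12, 11, 19, 14, 12, 10]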

Wilcoxon Matched Pair Test


Also known as the Wilcoxon Signed Rank Test (T). Unlike the Sign Test, which is based only on the number of plus or minus signs, this test utilizes both the signs and the magnitudes of the differences.
Process:
1. Find the differences between the sample items and the hypothesized mean or median.
2. Assign ranks to all differences (except zero differences) in increasing order, disregarding signs. If 2 or more differences have the same value, give them the average rank.
3. Assign to each rank the sign of its difference.
4. Total the positive and negative ranks separately. The smaller of the 2 rank sums is our test statistic T.
5. Compare the calculated value with the table value. If the calculated T is equal to or less than the table value of T, reject the Null Hypothesis. For large samples, T is approximately normally distributed with

$$\mu_T = \frac{n(n + 1)}{4}, \qquad \sigma_T = \sqrt{\frac{n(n + 1)(2n + 1)}{24}}$$

$$z = \frac{T - \dfrac{n(n + 1)}{4}}{\sqrt{\dfrac{n(n + 1)(2n + 1)}{24}}}$$

Example 1: Ten workers were given on-the-job training with a view to shortening their assembly time for a certain mechanism. The results of the time (in minutes) and motion studies before and after the training programme are given below:

Worker    1    2    3    4    5    6    7    8    9    10
Before    61   62   55   62   59   74   62   57   64   62
After     59   63   52   54   59   70   67   65   59   71

Is there evidence that the training programme has shortened the average assembly time?
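The signed-rank computation for this example can be sketched as follows, taking each difference as Before minus After. The helper `average_rank` is illustrative. The resulting T still has to be compared with the tabulated critical value of T for the reduced sample size.

```python
before = [61, 62, 55, 62, 59, 74, 62, 57, 64, 62]
after = [59, 63, 52, 54, 59, 70, 67, 65, 59, 71]

# Differences (Before - After), with zero differences dropped.
diffs = [b - a for b, a in zip(before, after) if b != a]
n = len(diffs)

# Rank the absolute differences; ties share the average of their ranks.
abs_sorted = sorted(abs(d) for d in diffs)

def average_rank(value):
    first = abs_sorted.index(value) + 1  # 1-based rank of first occurrence
    count = abs_sorted.count(value)
    return first + (count - 1) / 2

positive_sum = sum(average_rank(abs(d)) for d in diffs if d > 0)
negative_sum = sum(average_rank(abs(d)) for d in diffs if d < 0)
t_statistic = min(positive_sum, negative_sum)

print(f"n = {n}, T+ = {positive_sum}, T- = {negative_sum}, T = {t_statistic}")
```

The two rank sums should add up to n(n + 1)/2, here 22 + 23 = 45 for n = 9, which is a convenient check on the ranking.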

Kruskal-Wallis Test (H Test)

This test can be used (in place of the one-way analysis of variance) to test the null hypothesis that K independent random samples come from identical populations and have identical means.
Procedure:
1. All the items of the K samples should be pooled together and ranked from low to high or high to low. If there are ties, the usual mid-rank method can be followed.
2. The ranks of each sample should be separately summed up.
3. Calculate H:

   $$H = \frac{12}{N(N + 1)} \sum_{i=1}^{K} \frac{R_i^2}{n_i} - 3(N + 1)$$

   where
   n1, n2, ..., nk are the numbers of items in each of the K samples,
   N = n1 + n2 + ... + nk,
   R1, R2, ..., Rk are the sums of the ranks given to the observations in each sample.
4. H is approximately distributed as $\chi^2$. The calculated value should be compared with the table value of $\chi^2$ with (K - 1) degrees of freedom at the desired level of significance.
If the calculated value is less than the table value, accept H0.
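The calculation of H can be sketched as follows, assuming the usual form H = 12 / (N(N + 1)) · Σ R_i²/n_i − 3(N + 1). The three samples are hypothetical toy data chosen so that the arithmetic is easy to follow, not data from these slides, and the function name is illustrative.

```python
def kruskal_wallis_h(samples):
    """Return the rank sums R_i and the H statistic for K samples."""
    pooled = sorted(x for sample in samples for x in sample)
    total = len(pooled)

    def mid_rank(value):
        # 1-based rank of the first occurrence plus half the tie span
        return pooled.index(value) + 1 + (pooled.count(value) - 1) / 2

    rank_sums = [sum(mid_rank(x) for x in sample) for sample in samples]
    h = 12 / (total * (total + 1)) * sum(
        r ** 2 / len(sample) for r, sample in zip(rank_sums, samples)
    ) - 3 * (total + 1)
    return rank_sums, h

# Hypothetical data: three fully separated samples of three items each.
toy_samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rank_sums, h = kruskal_wallis_h(toy_samples)

CRITICAL_5PCT_2DOF = 5.991  # standard chi-square table value, K - 1 = 2 d.o.f.
print(f"rank sums = {rank_sums}, H = {h:.2f}")
print("reject H0" if h > CRITICAL_5PCT_2DOF else "accept H0")
```

With samples this small the chi-square approximation is only rough; the toy data are meant to show the mechanics of the ranking and the formula.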

Example 1: School children taking coaching in three private schools A, B and C secure the following scores out of hundred.

No. of Children    A School    B School    C School
1                  33          32          55
2                  38          15          68
3                  39          87          27
4                  48          32          38
5                  58          22          46
6                  70          63          52
7                  61          56          76
8                  41          57          45
9                  44
10                 49

Test the hypothesis that the students studying in the 3 private schools have identical distributions of scores at significance level α = 0.01

Unit 5

Multiple Regression
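A multiple regression equation involving two independent variables x1 and x2 and a dependent variable y is represented as:

$$y = a + b_1 x_1 + b_2 x_2$$

However, in the calculation of regression coefficients we commonly denote the dependent variable by x1 and the independent variables by x2 and x3, so that the equation becomes

$$x_1 = a + b_1 x_2 + b_2 x_3$$

To determine the values of a, b1 and b2 we use three normal equations:

$$\sum x_1 = n a + b_1 \sum x_2 + b_2 \sum x_3$$

$$\sum x_1 x_2 = a \sum x_2 + b_1 \sum x_2^2 + b_2 \sum x_2 x_3$$

$$\sum x_1 x_3 = a \sum x_3 + b_1 \sum x_2 x_3 + b_2 \sum x_3^2$$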
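Solving the three normal equations for x1 on x2 and x3 can be sketched in plain Python with Cramer's rule. The data below are hypothetical, generated from x1 = 1 + 2·x2 + 3·x3 so that the recovered coefficients are easy to check; they are not the data of any example in these slides, and the function names are illustrative.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )

def fit_two_predictors(x1, x2, x3):
    """Solve the normal equations for x1 = a + b1*x2 + b2*x3."""
    n = len(x1)
    s2, s3 = sum(x2), sum(x3)
    s22 = sum(v * v for v in x2)
    s33 = sum(v * v for v in x3)
    s23 = sum(p * q for p, q in zip(x2, x3))
    # Left-hand coefficients and right-hand sums of the three normal equations.
    lhs = [[n, s2, s3], [s2, s22, s23], [s3, s23, s33]]
    rhs = [
        sum(x1),
        sum(p * q for p, q in zip(x1, x2)),
        sum(p * q for p, q in zip(x1, x3)),
    ]
    d = det3(lhs)
    solution = []
    for col in range(3):
        replaced = [row[:] for row in lhs]
        for row in range(3):
            replaced[row][col] = rhs[row]  # Cramer's rule: swap in the RHS
        solution.append(det3(replaced) / d)
    return tuple(solution)

# Hypothetical data built from x1 = 1 + 2*x2 + 3*x3.
x2 = [1, 2, 3, 4, 5]
x3 = [2, 1, 4, 3, 5]
x1 = [1 + 2 * p + 3 * q for p, q in zip(x2, x3)]

a, b1, b2 = fit_two_predictors(x1, x2, x3)
print(f"x1 = {a:.2f} + {b1:.2f} x2 + {b2:.2f} x3")
print("prediction at x2 = 6, x3 = 2:", a + b1 * 6 + b2 * 2)
```

Once a, b1 and b2 are known, an estimate for new values of x2 and x3 is obtained by substituting them into the fitted equation, as in the last line.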

Example 1: A sample survey of 5 families was taken and figures were obtained with respect to their annual savings x1 (Rs in 100s), annual income x2 (Rs in 1000s) and family size x3. The data are summarized in the table below:

[Table: Family, Annual Savings x1, Annual Income x2, Family Size x3 for the 5 families; only part of the entries survive: 10, 16, 13, 10, 21, 10, 13]

(a) Find the least-squares regression equation of x1 on x2 and x3.
(b) Estimate the annual savings of a family whose size is 4 and annual income is Rs 16000.

Factor Analysis

Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization.
Factor analysis is an interdependence technique in that an entire set of interdependent relationships is examined without making the distinction between dependent and independent variables.

Factor analysis is used in the following circumstances:
1. To identify underlying dimensions, or factors, that explain the correlations among a set of variables.
2. To identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis).
3. To identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis.

Cluster Analysis
Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters. Cluster analysis is also called classification analysis, or numerical taxonomy.

Both cluster analysis and discriminant analysis are concerned with classification. However, discriminant analysis requires prior knowledge of the cluster or group membership for each object or case included, to develop the classification rule. In contrast, in cluster analysis there is no a priori information about the group or cluster membership for any of the objects. Groups or clusters are suggested by the data, not defined a priori.

Cluster analysis is used for a variety of purposes:
1. Segmenting the market
2. Understanding buyer behavior
3. Identifying new product opportunities
4. Selecting test markets
5. Reducing data

Discriminant Analysis
Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature.

The objectives of discriminant analysis are as follows:
1. Development of discriminant functions, or linear combinations of the predictor or independent variables, which will best discriminate between the categories of the criterion or dependent variable (groups).
2. Examination of whether significant differences exist among the groups, in terms of the predictor variables.
3. Determination of which predictor variables contribute to most of the intergroup differences.
4. Classification of cases to one of the groups based on the values of the predictor variables.
5. Evaluation of the accuracy of classification.

Conjoint Analysis

Conjoint analysis attempts to determine the relative importance consumers attach to salient attributes and the utilities they attach to the levels of attributes.

The respondents are presented with stimuli that consist of combinations of attribute levels and asked to evaluate these stimuli in terms of their desirability.

Conjoint procedures attempt to assign values to the levels of each attribute, so that the resulting values or utilities attached to the stimuli match, as closely as possible, the input evaluations provided by the respondents.

Conjoint analysis is used for:
1. Determining the relative importance of attributes in the consumer choice process.
2. Estimating the market share of brands that differ in attribute levels.
3. Determining the composition of the most preferred brand.
4. Segmenting the market based on similarity of preferences for attribute levels.

MultiDimensional Scaling (MDS)


Multidimensional scaling (MDS) is a class of procedures for representing perceptions and preferences of respondents spatially by means of a visual display.
Perceived or psychological relationships among stimuli are represented as geometric relationships among points in a multidimensional space.
These geometric representations are often called spatial maps. The axes of the spatial map are assumed to denote the psychological bases or underlying dimensions respondents use to form perceptions and preferences for stimuli.

MDS is used to identify:
1. The number and nature of dimensions consumers use to perceive different brands in the marketplace.
2. The positioning of current brands on these dimensions.
3. The positioning of consumers' ideal brand on these dimensions.

MDS is mainly used for:
1. Image measurement
2. Market segmentation
3. New product development
4. Assessing advertising effectiveness
5. Pricing analysis
