Nonparametric tests are useful methods when your data do not meet the assumptions required by parametric tests. Unlike parametric tests, nonparametric tests make far fewer assumptions about the underlying distribution, which is why they are sometimes called distribution-free tests.
Hypothesis Testing for Statistical Inference
Inferences about a population are made on the basis of results obtained from a sample drawn from that population. We want to talk about the larger population from which the subjects are drawn, not the particular subjects.

What Do We Test?
The effect or difference we are interested in, such as: a difference in means, a difference in proportions, an odds ratio (OR), or a correlation coefficient. Some examples: the effect of an advertisement or sales-promotion program; acceptance of a product across genders; acceptance of a product across regions.

Elements of a hypothesis test
Null hypothesis - a statement regarding the value(s) of unknown parameter(s). In our applications it typically implies no association between the explanatory and response variables (it will always contain an equality).
Alternative hypothesis - a statement contradictory to the null hypothesis (it will always contain an inequality).
Example hypotheses:
H0: μ1 = μ2
HA: μ1 ≠ μ2 (two-sided test)
HA: μ1 > μ2 (one-sided test)
Test statistic - a quantity based on the sample data and the null hypothesis, used to decide between the null and alternative hypotheses.
Rejection region - the values of the test statistic for which we reject the null in favour of the alternative hypothesis.
Why Use Nonparametric Tests?
Parametric hypothesis tests are used to draw inferences about one or more unknown parameters (e.g., the population mean or variance).
Parametric tests depend on population parameters for inference, and their assumptions are often unrealistic:
a specific probability distribution (usually the normal distribution),
interval or ratio data,
homogeneity of variance,
a large sample size.
McGraw-Hill/Irwin 2007 The McGraw-Hill Companies, Inc. All rights reserved.

Example questionnaire (illustrating the different measurement scales):
1. Name: ____________
2. Age: _________ years
3. Gender: Male ( ) Female ( )
4. Income: ______________________
5. Educational qualification: ( ) B.Com./BBA/BCA ( ) B.Sc. ( ) B.A. ( ) B.E. ( ) M.B.A. ( ) M.H.R.D. ( ) M.A. ( ) M.Sc.
6. With how many persons do you usually watch a movie? ________
7. Where do you like to watch movies? ( ) At home ( ) At multiplexes ( ) Theatres
8. How many movies do you usually watch in a month? ________
9. Rate each movie type based on your liking (1 = like most, 5 = dislike most):
Comedy 1 2 3 4 5
Thriller 1 2 3 4 5
Love story 1 2 3 4 5
Theme based 1 2 3 4 5
English fiction 1 2 3 4 5
Nonparametric tests do not rely on the data following any particular distribution; they usually focus on the sign or rank of the data rather than the exact numerical value. They do not specify the shape of the parent population, can often be used with smaller samples, and can be used for ordinal and nominal data.

Nonparametric Methods
There is at least one nonparametric test equivalent to each common parametric test. These tests fall into several categories:
1. Tests of differences between two groups (independent samples)
2. Tests of differences between more than two groups (independent samples)
3. Tests of differences between variables (dependent samples)
4. Tests of relationships between variables

Differences between independent groups: with two samples, compare the mean value of some variable of interest.
Parametric test: t-test for independent samples
Nonparametric equivalents: Mann-Whitney U test; Wald-Wolfowitz runs test; Kolmogorov-Smirnov two-sample test

Mann-Whitney Test
The Mann-Whitney test is a nonparametric test that compares two populations. It does not assume normality. It is a test for the equality of medians, assuming that the populations differ only in centrality and have equal variances. The hypotheses are:
H0: M1 = M2 (no difference in medians)
H1: M1 ≠ M2 (medians differ)

Performing the Test
Step 1: Sort the combined samples.
Step 2: Assign a rank to each value (in the example below, rank 1 goes to the largest value). If values are tied, the average of the ranks is assigned to each.
Step 3: Sum the ranks for each column (e.g., T1, T2).
Step 4: Check that the sum of the ranks, T1 + T2, equals n(n + 1)/2, where n = n1 + n2.

First, combine the samples and assign a rank to each observation. When a tie occurs, each observation is assigned the average of the ranks. For example, for the heights of 7 males and 5 females:

Rank  Height (cm)  Gender  |  Rank  Height (cm)  Gender
1     193          M       |  7     175          F
2     188          M       |  8     173          F
3     185          M       |  9     170          M
4     183          M       |  10    168          F
5     180          M       |  11    165          F
6     178          M       |  12    163          F

Next, arrange the data by groups and sum the ranks to obtain the Tj's. Remember, ΣTj = n(n + 1)/2.

Heights of males (cm)  Ranks    |  Heights of females (cm)  Ranks
193                    1        |  175                      7
188                    2        |  173                      8
185                    3        |  168                      10
183                    4        |  165                      11
180                    5        |  163                      12
178                    6        |
170                    9        |
n1 = 7                 T1 = 30  |  n2 = 5                   T2 = 48

Step 5: Calculate the mean rank sums T̄1 = T1/n1 and T̄2 = T2/n2.
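The ranking steps above can be sketched in Python using the height data from the example. This is a minimal illustration; the tie-handling helper is written here from scratch, not taken from any library:

```python
# Heights from the example: 7 males and 5 females (cm).
males = [193, 188, 185, 183, 180, 178, 170]
females = [175, 173, 168, 165, 163]
pooled = males + females
n = len(pooled)

def midrank(v):
    # Rank 1 goes to the largest value, matching the example table;
    # tied values receive the average of the ranks they span.
    greater = sum(x > v for x in pooled)
    equal = sum(x == v for x in pooled)
    return greater + (equal + 1) / 2

T1 = sum(midrank(v) for v in males)    # rank sum for the male group
T2 = sum(midrank(v) for v in females)  # rank sum for the female group

print(T1, T2)                       # 30.0 48.0, as in the example tables
print(T1 + T2 == n * (n + 1) / 2)   # the Step 4 check: True
```
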
Step 6: For large samples (n1 ≥ 10 and n2 ≥ 10), use a z test.
Step 7: For a given α, reject H0 if z < −zα/2 or z > +zα/2 (two-sided test).

The Kruskal-Wallis (K-W) test compares c independent medians, assuming the populations differ only in centrality. The K-W test is a generalization of the Mann-Whitney test and is analogous to a one-factor ANOVA (completely randomized model). Groups can be of different sizes if each group has 5 or more observations. Populations must be of similar shape, but normality is not a requirement.
Kruskal-Wallis Test for Independent Samples
Performing the Test
First, combine the samples and assign a rank to each observation in each group; when a tie occurs, each observation is assigned the average of the ranks. Next, arrange the data by groups and sum the ranks to obtain the Tj's. Remember, ΣTj = n(n + 1)/2.
The hypotheses to be tested are:
H0: All c population medians are the same
H1: Not all the population medians are the same

For a completely randomized design with c groups, the test statistic is

H = [12 / (n(n + 1))] Σ (Tj² / nj) − 3(n + 1)

where n = n1 + n2 + … + nc
nj = number of observations in group j
Tj = sum of ranks for group j

The H test statistic follows a chi-square distribution with ν = c − 1 degrees of freedom. This is a right-tailed test, so reject H0 if H > χ²α or if the p-value < α.
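The H statistic can be computed directly from the formula above. A minimal Python sketch, using three small hypothetical groups (not data from these notes):

```python
# Hypothetical data: three independent groups of equal size.
groups = [[27, 31, 42], [18, 25, 29], [15, 22, 33]]
pooled = [v for g in groups for v in g]
n = len(pooled)

def midrank(v):
    # Rank 1 goes to the smallest value; ties share the average rank.
    less = sum(x < v for x in pooled)
    equal = sum(x == v for x in pooled)
    return less + (equal + 1) / 2

# Rank sums T_j per group, then H from the formula above.
T = [sum(midrank(v) for v in g) for g in groups]
H = 12 / (n * (n + 1)) * sum(t * t / len(g) for t, g in zip(T, groups)) - 3 * (n + 1)

print(T)  # [21.0, 12.0, 12.0]; they sum to n(n + 1)/2 = 45
print(H)  # approximately 2.4; compare with chi-square, df = c - 1 = 2
```

With ν = 2 and α = 0.05, the critical value is 5.991, so for these made-up numbers H ≈ 2.4 would not reject H0.
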
Differences between dependent groups
Compare two variables measured in the same sample.
If more than two variables are measured in the same sample, a different set of tests applies.

Parametric: t-test for dependent samples
Nonparametric: Sign test; Wilcoxon's matched-pairs test

Parametric: Repeated-measures ANOVA
Nonparametric: Friedman's two-way analysis of variance; Cochran's Q

Wilcoxon Signed-Rank Test
The Wilcoxon signed-rank test compares a single sample median with a benchmark, using only the ranks of the data instead of the original observations. It can also be used to compare paired observations. Its advantages are freedom from the normality assumption, robustness to outliers, and applicability to ordinal data. The population should be roughly symmetric.

To compare the sample median (M) with a benchmark median (M0), the hypotheses are:
H0: M = M0
H1: M ≠ M0
When evaluating the difference between paired observations, use the median difference (Md) and zero as the benchmark.

Performing the Test
Step 1: Calculate the difference between the paired observations.
Step 2: Rank the differences from smallest to largest by absolute value.
Step 3: Add the ranks of the positive differences to obtain the rank sum W.

For small samples, a special table is required to obtain critical values. For large samples (n > 20), the test statistic is approximately normal.
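The signed-rank steps can be sketched in Python. The paired data below are hypothetical (e.g., a score measured before and after some treatment), not from these notes:

```python
# Hypothetical paired observations.
before = [125, 130, 118, 140, 136]
after = [120, 128, 121, 135, 130]

# Step 1: paired differences (zero differences would be dropped).
d = [b - a for b, a in zip(before, after) if b != a]
absd = [abs(x) for x in d]

def midrank(v):
    # Rank 1 goes to the smallest absolute difference; ties share the average rank.
    less = sum(x < v for x in absd)
    equal = sum(x == v for x in absd)
    return less + (equal + 1) / 2

# Steps 2-3: W is the sum of the ranks attached to the positive differences.
W = sum(midrank(abs(x)) for x in d if x > 0)
print(W)  # 13.0 here; W plus the negative-rank sum equals n(n + 1)/2 = 15
```
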
Use Excel or Appendix C to get a p-value. Reject H0 if the p-value < α.

Friedman Test for Related Samples
The Friedman test determines whether c treatments have the same central tendency (median) when there is a second factor with r levels and the populations are assumed to be the same except for centrality. This test is analogous to a two-factor ANOVA without replication (randomized block design) with one observation per cell. The groups must be of the same size. Treatments should be randomly assigned within blocks. Data should be at least interval scale.

In addition to the c treatment levels that define the columns, the Friedman test also specifies r block-factor levels to define each row of the observation matrix. The hypotheses to be tested are:
H0: All c populations have the same median
H1: Not all the populations have the same median

Unlike the Kruskal-Wallis test, the Friedman ranks are computed within each block rather than within a pooled sample. First, assign a rank to each observation within each row (e.g., within each trial):
When a tie occurs, each observation is assigned the average of the ranks.

Next, compute the test statistic:

F = [12 / (r c(c + 1))] Σ Tj² − 3r(c + 1)

where r = the number of blocks (rows)
c = the number of treatments (columns)
Tj = the sum of ranks for treatment j
The Friedman test statistic F follows a chi-square distribution with ν = c − 1 degrees of freedom. Reject H0 if F > χ²α or if the p-value < α.
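The within-block ranking and the F statistic can be sketched in Python. The 4 × 3 observation matrix below is hypothetical, and the simple form of the formula is used (no tie correction):

```python
# Hypothetical data: r = 4 blocks (rows) by c = 3 treatments (columns).
data = [[31, 27, 24],
        [31, 28, 31],
        [45, 29, 46],
        [21, 18, 48]]
r, c = len(data), len(data[0])

def row_ranks(row):
    # Rank within the block: 1 = smallest; ties share the average rank.
    return [sum(x < v for x in row) + (sum(x == v for x in row) + 1) / 2
            for v in row]

ranked = [row_ranks(row) for row in data]
T = [sum(ranked[i][j] for i in range(r)) for j in range(c)]  # column rank sums

# F statistic from the formula above.
F = 12 / (r * c * (c + 1)) * sum(t * t for t in T) - 3 * r * (c + 1)
print(T)  # [9.5, 5.0, 9.5]; the rank sums total r*c(c + 1)/2 = 24
print(F)  # 3.375; compare with chi-square, df = c - 1 = 2
```
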
Chi-Square Test for Independence

Contingency Tables
A contingency table is a cross-tabulation of n paired observations into categories. Each cell shows the count of observations that fall into the category defined by its row (r) and column (c) heading.

Steps in Testing the Hypotheses Using the Chi-Square Test
Step 1: State the Hypotheses
H0: Variable A is independent of variable B
H1: Variable A is not independent of variable B
Step 2: State the Decision Rule
For a given α, look up the right-tail critical value (χ²R) from the χ² table, with ν = (r − 1)(c − 1) degrees of freedom. Reject H0 if the calculated χ² exceeds χ²R or, equivalently, if the p-value < α.

Step 3: Construct the Contingency Table
Cross-tabulate the n paired observations so that each cell shows the count of observations that fall in the cell defined by its row (r) and column (c) heading.

Step 4: Calculate the Expected Frequencies
ejk = Rj Ck / n
where Rj is the total of row j, Ck is the total of column k, and n is the overall sample size.
Step 5: Calculate the Test Statistic
The chi-square test statistic is
χ² = Σj Σk (fjk − ejk)² / ejk
where fjk is the observed frequency and ejk is the expected frequency in cell (j, k).
Step 6: Make the Decision
Reject H0 if the test statistic exceeds χ²R, or if the p-value < α.

Caution
The chi-square test is unreliable if the expected frequencies are too small. A common rule of thumb (Cochran's Rule) requires that ejk ≥ 5 for all cells. If some expected frequencies fall below this, try combining adjacent rows or columns to enlarge them.
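Steps 3 through 6 can be sketched end to end in Python. The 2 × 2 counts below are hypothetical (say, gender by product acceptance), not data from these notes:

```python
# Hypothetical 2x2 contingency table of observed counts.
obs = [[30, 20],
       [20, 30]]
rows = [sum(row) for row in obs]           # row totals R_j
cols = [sum(col) for col in zip(*obs)]     # column totals C_k
n = sum(rows)

# Step 4: expected frequencies e_jk = R_j * C_k / n.
exp = [[rows[j] * cols[k] / n for k in range(len(cols))]
       for j in range(len(rows))]

# Step 5: chi-square statistic, summing (f_jk - e_jk)^2 / e_jk over all cells.
chi2 = sum((obs[j][k] - exp[j][k]) ** 2 / exp[j][k]
           for j in range(len(rows)) for k in range(len(cols)))

df = (len(rows) - 1) * (len(cols) - 1)
print(chi2, df)  # 4.0 1 -> exceeds the 0.05 critical value 3.841, so reject H0
```

Note that every expected frequency here is 25, so Cochran's Rule is comfortably satisfied.
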