Hypothesis


Sampling Theory

Population or Universe
Population or Universe refers to the aggregate of statistical information on a particular character of all the members covered by an investigation or enquiry. For example, the marks obtained by students in the PG examination of Kerala University constitute a population.

Population Size
Population size refers to the total number of members of the population, usually denoted by N. For example, if 10,000 students wrote the PG examination of Kerala University, the population size of their marks is 10,000.

Finite Population
A population is said to be finite when the number of its members can be expressed as a definite quantity.

Infinite Population
A population is said to be infinite when the number of its members cannot be expressed as a definite quantity.

Existent / Real Population
A population is said to be existent when all its members really exist. For example, the population of taxable incomes of all persons in India is an existent population, because every member of it really exists.

Hypothetical Population
A population is said to be hypothetical when its members do not all really exist; it is built up by repeating an event any number of times. The population of points obtained in all possible throws of a die is an example.

Sampling
Sampling is the statistical method of obtaining representative data or observations from a group (lot, batch, population, or universe).

Main Objectives of Sampling
To obtain the maximum information about the population with the minimum effort, and to draw inferences about the behaviour of the population.

Methods of Sampling (a short coded sketch of two of these methods follows the list)
- Deliberate, purposive or judgment sampling
- Block or cluster sampling
- Area sampling
- Quota sampling
- Random or probability sampling (e.g. lottery system, random number tables)
- Systematic sampling
- Stratified sampling
- Multi-stage sampling
- Sequential sampling
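The mechanics of simple random and systematic selection can be illustrated with a minimal sketch using Python's standard library; the population values and sample size here are invented for illustration, not taken from the notes.

import random

# Hypothetical population: marks of 10,000 students (illustrative values only).
population = [random.randint(0, 100) for _ in range(10_000)]

# Simple random sampling: every member has an equal chance of selection.
random_sample = random.sample(population, k=50)

# Systematic sampling: pick every k-th member after a random start.
k = len(population) // 50
start = random.randrange(k)
systematic_sample = population[start::k]

print(len(random_sample), len(systematic_sample))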

Sample
A sample is that part of the aggregate statistical information (i.e. the population) which is actually selected in the course of an investigation or enquiry to ascertain the characteristics of the population.

Sample Size
Sample size refers to the number of members of the population included in the sample, usually denoted by n.

Parameter
A parameter is a statistical measure based on each and every item of the universe/population, for example the population mean, the population S.D., or the proportion of defectives in the whole lot. Parameters describe the characteristics of the population.

Statistic
A statistic is a statistical measure based on the items/observations of a sample, for example the sample mean, the sample S.D., or the proportion of defectives observed in the sample. Since the value of a statistic varies from sample to sample, it has sampling fluctuation, a sampling distribution, and a standard error (a small illustration of this fluctuation follows the lists below).

Sampling Distribution
A sample statistic is a random variable, and every random variable has a probability distribution. The probability distribution of a sample statistic is called the sampling distribution of that statistic. For example, the sample mean is a statistic, and the distribution of the sample mean is a sampling distribution.

Commonly used sampling distributions
- Normal distribution
- Chi-square distribution
- t distribution
- F distribution

Uses of the sampling distribution of z (normal distribution)
- To test a given population mean.
- To test the significance of the difference between two population means.
- To test a given population proportion.
- To test the difference between two population proportions.
- To test a given population S.D.
- To test the difference between two population S.D.s.

Uses of the chi-square distribution
- To test a given population variance when the sample is small.
- To test the goodness of fit between observed and expected frequencies.
- To test the independence of two attributes.
- To test the homogeneity of data.

Uses of the t distribution
- To test a given population mean when the sample is small.
- To test whether two small samples have the same mean.
- To test whether there is a difference in the observations of two dependent samples.
- To test the significance of a population correlation coefficient.
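The distinction between a parameter (fixed, based on the whole population) and a statistic (varying from sample to sample) can be seen in a short sketch; the population values below are invented for illustration.

import random
import statistics

# Hypothetical population of 1,000 marks (illustrative values only).
population = [random.gauss(60, 12) for _ in range(1000)]

# Parameter: based on every item of the population; a fixed number.
population_mean = statistics.mean(population)

# Statistic: based on a sample; its value fluctuates between samples.
for _ in range(3):
    sample = random.sample(population, k=30)
    print("sample mean:", round(statistics.mean(sample), 2))

print("population mean (parameter):", round(population_mean, 2))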

Uses of the F distribution
- To test the equality of variances of two populations when the samples are small.
- To test the equality of means of three or more populations.

Central Limit Theorem
Let x₁, x₂, ..., xₙ be n independent random variables, all having the same distribution with the same mean, say μ, and the same standard deviation, say σ. Then their mean, x̄ = (x₁ + x₂ + ... + xₙ)/n, follows a normal distribution with mean μ and standard deviation σ/√n when n is large. The central limit theorem is considered one of the most remarkable theorems in the entire theory of statistics; it is called "central" because of its central position in probability distributions and statistical inference. The conditions for the central limit theorem are (a small simulation follows at the end of this section):
- The variables must be independent.
- All variables should have a common mean and a common S.D.
- All variables should have the same distribution.
- n is very large.

Standard Error
The standard deviation of the sampling distribution of a statistic is called the standard error (SE) of that statistic. For example, the standard error of the mean is the standard deviation of the sampling distribution of the mean; likewise, the standard error of a proportion is the standard deviation of the sampling distribution of the proportion obtained from all possible samples of the same size drawn from the same population. In other words, the SE of a given statistic is the standard deviation of all possible values of that statistic in repeated samples of a fixed size from a given population.

Uses of Standard Error
SE plays a very important role in large sample theory and forms the basis of testing of hypotheses: it is used for testing a given hypothesis, and it gives an idea about the reliability of a sample.

Difference between Standard Deviation and Standard Error
SD is a measure of dispersion of statistical data; SE is a measure of dispersion of a sampling distribution. SD can be used to find the SE; SE can be used for estimation and testing of hypotheses.

Statistical Inference
The primary objective of a sample study is to draw inferences or conclusions about the population by examining only a part of it. Inferences so drawn are called statistical inferences. Statistical inference is therefore the process by which we draw conclusions about a population based on a sample drawn from that population. Its main branches are:
- Testing of hypotheses
- Estimation
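The central limit theorem and the standard error of the mean can be demonstrated with a short simulation using the die-throw population defined earlier; this is an illustrative sketch (the sample size and number of repetitions are arbitrary choices, not from the notes).

import random
import statistics

# Population: points obtained in throws of a fair die
# (a hypothetical population, as defined above).
def sample_mean(n):
    return statistics.mean(random.randint(1, 6) for _ in range(n))

n = 50                                   # sample size (large, n > 30)
means = [sample_mean(n) for _ in range(10_000)]

mu = 3.5                                 # population mean of a fair die
sigma = statistics.pstdev(range(1, 7))   # population S.D. ≈ 1.708

# By the CLT the sample means are roughly normal with mean μ and
# S.D. σ/√n, i.e. the standard error of the mean.
print("population mean μ:", mu)
print("mean of sample means:", round(statistics.mean(means), 3))
print("SE (simulated):", round(statistics.stdev(means), 4))
print("SE (theory, σ/√n):", round(sigma / n ** 0.5, 4))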

Hypothesis
A hypothesis generally means a mere assumption or supposition to be proved or disproved. But for a researcher, a hypothesis is a formal question that he intends to resolve. A hypothesis is a tentative proposition formulated for empirical testing, a tentative answer to a research question; it is tentative because its validity remains to be tested. A researcher conducting research has to start somewhere, and this point of start is called a hypothesis. A hypothesis is an assumption about a population parameter: in any test of hypothesis we begin with some assumption about the population from which the sample is drawn. This assumption may be about the form of the population or about its parameters, and it should be logically drawn.

Test of Hypothesis (Hypothesis Testing, Test of Significance)
A statistical test of hypothesis is a procedure under which a statistical hypothesis is laid down and is accepted or rejected on the basis of a random sample drawn from the population. The tests conducted to accept or reject a hypothesis are known as statistical tests of hypothesis. Commonly used statistical tests are the Z-test, t-test, χ² test and F-test.

Procedure for Testing a Hypothesis (a coded sketch of these steps appears at the end of this section)
1. Set up a null hypothesis (H0) and an alternative hypothesis (H1) appropriate to the test to be conducted.
2. Decide the appropriate test criterion (Z-test, t-test, χ² test, F-test, etc.).
3. Specify the desired level of significance, usually 5% or 1%. In the absence of any specific instruction, it should normally be 5%.
4. Calculate the value of the test statistic using the appropriate formula.
5. Obtain the table value of the test statistic corresponding to the level of significance and the degrees of freedom.
6. Decide whether to accept or reject the null hypothesis: when the computed value of the test statistic is numerically less than its table value, it falls in the acceptance region; otherwise it falls in the rejection region.

Large and Small Samples
The number of sampling units selected from the population is called the size of the sample. If the sample is too small it may not represent the universe; if it is very large, it may require more time and money for investigation. Hence the sample size should be neither too small nor too large, but optimum: an optimum size ensures efficiency, representativeness, reliability and flexibility. When the sample size is more than 30 the sample is known as a large sample; otherwise it is a small sample.
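A minimal sketch of the six-step procedure, using a one-sample Z-test for a population mean; the sample figures are invented for illustration, and scipy is assumed to be available (it is not mentioned in the notes).

from scipy import stats

# Step 1: H0: μ = 65 versus H1: μ ≠ 65 (two-tailed).
mu0 = 65

# Illustrative sample figures (not from the notes).
n, x_bar, sigma = 100, 67.0, 10.0   # large sample, population S.D. known

# Steps 2 and 4: Z-test statistic.
z = (x_bar - mu0) / (sigma / n ** 0.5)

# Steps 3 and 5: 5% significance level, two-tailed table value.
alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)   # ≈ 1.96

# Step 6: decision.
if abs(z) < z_crit:
    print(f"|Z| = {abs(z):.2f} < {z_crit:.2f}: accept H0")
else:
    print(f"|Z| = {abs(z):.2f} >= {z_crit:.2f}: reject H0")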

Basic Concepts Concerning Testing of Hypothesis

1. Null Hypothesis and Alternative Hypothesis
A null hypothesis is a statistical hypothesis which is stated for the purpose of possible acceptance; it is the original hypothesis. Any hypothesis other than the null hypothesis is called an alternative hypothesis, so when the null hypothesis is rejected we accept the alternative hypothesis. The null hypothesis is denoted by H0 and the alternative hypothesis by H1. For example, when we want to test whether the population mean is 65:
Null hypothesis (the population mean is 65):  H0: μ = 65
Alternative hypotheses:  H1: μ ≠ 65 (two-tailed);  H1: μ > 65 or H1: μ < 65 (one-tailed)

2. Level of Significance
The level of significance is the maximum probability of rejecting the null hypothesis when it is true. The confidence with which a null hypothesis is accepted or rejected depends on this significance level. When the level of significance is 5%, it means that in the long run the statistician is rejecting a true null hypothesis 5 times out of every 100. The level of significance is denoted by α.

3. Test Statistic
The decision to accept or reject the null hypothesis is made on the basis of a statistic computed from the sample; such a statistic is called a test statistic, and it follows a sampling distribution. The commonly used test statistics are Z, t, F, χ², etc.

4. Critical Region
In a test procedure we calculate a test statistic on which we base our decision. The range of variation of this statistic is divided into two regions, the acceptance region and the rejection region, separated by the critical value. If the computed value of the test statistic falls in the rejection region we reject the null hypothesis. The rejection region is known as the critical region; it corresponds to a predetermined level of significance α, while the acceptance region corresponds to 1 − α.

5. Critical Value (Table Value)
The value of the test statistic which separates the critical region from the acceptance region is called the critical value. It depends on the level of significance and the alternative hypothesis. When the Z-test is applied, the critical values at the 5% level of significance are:

Level of Significance | Two-Tailed | One-Tailed
5%                    | 1.96       | 1.645
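These table values can be recovered directly from the standard normal distribution; a short check (scipy is an assumption, as before).

from scipy import stats

alpha = 0.05
print(stats.norm.ppf(1 - alpha / 2))  # two-tailed critical value ≈ 1.96
print(stats.norm.ppf(1 - alpha))      # one-tailed critical value ≈ 1.645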

Type I and Type II Errors
In any test of hypothesis the decision is to accept or reject a null hypothesis, based on the information supplied by the sample data. The four possibilities are: accepting a null hypothesis when it is true; rejecting a null hypothesis when it is false; rejecting a null hypothesis when it is true (Type I error); and accepting a null hypothesis when it is false (Type II error).

Decision   | H0 true          | H0 false
Accept H0  | Correct decision | Type II error
Reject H0  | Type I error     | Correct decision
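A quick simulation can show that the long-run Type I error rate matches the significance level; this is an illustrative sketch with invented figures, repeatedly testing a true H0 at the 5% level.

import random
import statistics

z_crit = 1.96            # 5% two-tailed critical value of Z
mu0, sigma, n = 65, 10, 50
trials, rejections = 20_000, 0

for _ in range(trials):
    # H0 is true: the sample really comes from a population with mean mu0.
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (statistics.mean(sample) - mu0) / (sigma / n ** 0.5)
    if abs(z) > z_crit:
        rejections += 1   # Type I error: rejecting a true H0

print("Type I error rate ≈", rejections / trials)   # ≈ 0.05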

Two-Tailed and One-Tailed Tests
A two-tailed test is one in which we reject the null hypothesis if the computed value of the test statistic is significantly greater than or less than the critical value, so the critical region is represented by both tails. If we are testing a hypothesis at the 5% level of significance, the size of the acceptance region is 0.95 and the size of the rejection region is 0.05 on both sides together; if the computed value of the test statistic falls in either the left or the right tail, the hypothesis is rejected. Suppose we are interested in testing the null hypothesis that the average height of people is 156 cm. The rejection region lies on both sides, since the null hypothesis is rejected if the average height in the sample is much more than 156 or much less than 156. In a one-tailed test, the rejection region is located in only one tail, which may be either left or right depending on the alternative hypothesis, so the critical region is represented by only one tail. If the level of significance is 0.05, then in a one-tailed test the entire rejection region of size 0.05 falls in the left tail only or in the right tail only.

Power of a Test
The probability of rejecting a null hypothesis when the alternative hypothesis is true is called the power of the test (a coded sketch appears at the end of this section):
Power of a test = 1 − β, where β is the probability of a Type II error.

Degrees of Freedom (d.f.)
Degrees of freedom is the number of independent observations, obtained by subtracting the number of constraints from the total number of observations.

Parametric and Non-Parametric Tests
In certain test procedures, assumptions are made about the population distribution or its parameters; for example, in the t-test we assume that the samples are drawn from a population following the normal distribution. When such assumptions are made, the test is known as a parametric test. There are situations when it is not possible to make any assumption about the distribution of the population from which the samples are drawn; in such situations we follow test procedures known as non-parametric tests. The chi-square test is an example of a non-parametric test.
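The power of a two-tailed Z-test can be computed directly; a sketch under invented assumptions (the true mean of 67 and the other figures are for illustration only).

from scipy import stats

mu0, mu_true, sigma, n = 65, 67, 10, 50   # illustrative figures
alpha = 0.05
se = sigma / n ** 0.5
z_crit = stats.norm.ppf(1 - alpha / 2)

# β: probability the sample mean falls in the acceptance region
# (mu0 ± z_crit·SE) when the true mean is actually mu_true.
lo, hi = mu0 - z_crit * se, mu0 + z_crit * se
beta = stats.norm.cdf(hi, mu_true, se) - stats.norm.cdf(lo, mu_true, se)

print("Type II error β ≈", round(beta, 3))
print("Power = 1 − β ≈", round(1 - beta, 3))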

Chi-Square (χ²) Test
The statistical test in which the test statistic follows a χ² distribution is called the χ² test. The χ² test is a statistical test which tests the significance of the difference between observed frequencies and the corresponding theoretical frequencies of a distribution, without any assumption about the distribution of the population. It is one of the simplest and most widely used non-parametric tests in statistical work. The test was developed by Prof. Karl Pearson in 1900.

Characteristics of the χ² test
- It is a non-parametric test.
- It is a distribution-free test, which can be used with any type of population distribution.
- It is easy to calculate.
- It analyses the difference between a set of observed frequencies and the corresponding set of expected frequencies.

Conditions to be satisfied before the χ² test can be applied
- The observations recorded and used are collected on a random basis.
- All the items in the sample must be independent.
- No group should contain very few items, say fewer than 10. Where frequencies are less than 10, regrouping is done by combining the frequencies of adjoining groups so that the new frequencies become greater than 10. Some statisticians take this number as 5, but 10 is regarded as better by most statisticians.
- The overall number of items must be reasonably large, normally at least 50.

Uses (Applications) of the χ² test
- Test of goodness of fit: the χ² test can be used to ascertain how well theoretical distributions fit the data, i.e. whether there is goodness of fit between the observed and expected frequencies.
- Test of independence of attributes: with the help of the χ² test we can find out whether two attributes are associated or not.
- Test of homogeneity: tests of independence are concerned with whether one attribute is independent of another, while tests of homogeneity are concerned with whether different samples come from the same population.
- Test of a given population variance: the χ² test can be used to test whether a given population variance is acceptable on the basis of a sample drawn from that population.

Contingency Table
A contingency table is a frequency table in which a sample from the population is classified according to two attributes, each divided into two or more classes. When there are only two divisions for each attribute, it is known as a 2x2 contingency table. For example, consider the two attributes Smoking and Drinking; a 2x2 contingency table for these attributes is:

             | Smokers | Non-Smokers
Drinkers     | 40      | 30
Non-Drinkers | 45      | 24
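The independence of these two attributes can be tested in code; a sketch using scipy's chi-square test of independence on the table above (scipy is an assumption, not part of the notes).

from scipy import stats

# Observed cell frequencies from the 2x2 table above.
observed = [[40, 30],
            [45, 24]]

chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(f"χ² = {chi2:.3f}, d.f. = {dof}, p-value = {p:.3f}")
# If p < 0.05, reject H0 that smoking and drinking are independent.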

The frequencies appearing in the table are known as cell frequencies. The independence of the two attributes can be tested by the χ² test.

Yates' Correction
For calculating the χ² quantity we estimate theoretical frequencies. If any theoretical frequency is less than 5, we ordinarily use the pooling method: the frequency which is less than 5 is added to an adjacent frequency. But take the case of a 2x2 contingency table, where the degrees of freedom are (2 − 1)(2 − 1) = 1. If we applied the pooling method here, the degrees of freedom would become (1 − 1)(2 − 1) = 0, which is meaningless. So in such cases we apply a correction known as Yates' correction for continuity. For a 2x2 table with cell frequencies a, b, c, d and total N:

χ² = N( |ad − bc| − N/2 )² / [ (a + b)(c + d)(a + c)(b + d) ]
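A direct translation of this formula into code, reusing the cell values from the smoking/drinking table above as an illustration.

from scipy import stats

def yates_chi_square(a, b, c, d):
    """Yates-corrected χ² for a 2x2 contingency table."""
    n = a + b + c + d
    num = n * (abs(a * d - b * c) - n / 2) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# Cells from the table above: a=40, b=30, c=45, d=24.
chi2 = yates_chi_square(40, 30, 45, 24)
print(round(chi2, 4))
# Compare with the 5% table value of χ² at 1 d.f.
print(round(stats.chi2.ppf(0.95, 1), 3))   # 3.841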

Additive Property of χ²
Several values of χ² can be added together, and if their degrees of freedom are also added, this sum gives the degrees of freedom of the total value of χ². Thus, if a number of χ² values have been obtained from a number of samples of similar data, then because of the additive nature of χ² we can combine the various values simply by adding them. Such addition gives one value of χ² which helps in forming a better idea about the significance of the problem under study.

F-Test (Variance Ratio Test)
The test based on a test statistic which follows the F distribution is called the F-test. Its sampling distribution is used to test the equality of variances of two populations when the samples are small, and to test the equality of means of three or more populations. Very often we wish to test whether there is a significant difference between the variances of two populations, based on small samples drawn from those populations. In such situations we draw samples from the two populations and find their variances. Let n₁ and n₂ be the sizes of the samples and s₁² and s₂² their variances. Then

F = [ n₁s₁² / (n₁ − 1) ] / [ n₂s₂² / (n₂ − 1) ]

is called the F-ratio. It follows the F distribution, and the test based on this statistic is known as the F-test. In the F-ratio we always take the larger of the two estimates in the numerator and the smaller in the denominator. The null hypothesis is: there is no difference between the variances of the two populations from which the two samples are drawn. A small coded sketch follows.
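A minimal sketch of the variance-ratio test; the two small samples are invented for illustration, and scipy supplies the table value.

from scipy import stats

# Illustrative small samples (not from the notes).
sample1 = [12, 15, 11, 14, 16, 13, 15, 12]
sample2 = [10, 18, 9, 17, 20, 8, 16, 11]
n1, n2 = len(sample1), len(sample2)

def variance_estimate(xs):
    """Unbiased estimate of population variance: n·s²/(n − 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

v1, v2 = variance_estimate(sample1), variance_estimate(sample2)

# Larger estimate in the numerator, smaller in the denominator.
if v1 >= v2:
    F, df = v1 / v2, (n1 - 1, n2 - 1)
else:
    F, df = v2 / v1, (n2 - 1, n1 - 1)

F_crit = stats.f.ppf(0.95, *df)   # 5% table value
print(f"F = {F:.3f}, table value = {F_crit:.3f}")
# Reject H0 (equal variances) if F exceeds the table value.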

Analysis of Variance (ANOVA)
An agronomist may like to know whether the yield per acre will be the same if four different varieties of wheat are sown in identical plots. A dairy farm may like to test whether there are significant differences in the quality and quantity of milk obtained from different classes of cattle. A business manager may like to find out whether there is any difference in the average sales of four salesmen, and so on. In all these situations the analysis of variance technique can be employed. The technique is used to test whether the means of several samples differ significantly; it therefore tests whether the given samples are drawn from populations with the same mean, i.e. whether all the samples belong to the same population. Analysis of variance was formerly used mainly in agricultural research, but the technique now finds useful application in both the natural and the social sciences.

Definition
Analysis of variance may be defined as a technique which analyses the variances of two or more comparable series (or samples) to determine the significance of the differences in their arithmetic means, and to determine whether the samples under study are drawn from the same population or not, with the help of the statistical technique called the F-test. The null hypothesis of the test is: the population means are equal, or all the samples belong to the same population having the same variance.

One-Way Classification
In one-way classification, observations are classified into groups on the basis of a single criterion. For example, suppose we want to study the yield of a crop with respect to the effect of a single variable, say fertilizer. Here we apply different kinds of fertilizer to different paddy fields and try to find out the difference in the effect of these fertilizers on yield.

Two-Way Classification
In two-way classification, observations are classified into groups on the basis of two criteria. For example, suppose we want to study the yield of a crop with respect to the effect of two variables, say fertilizer and seed. Here we apply different kinds of fertilizer and different kinds of seed to different paddy fields and try to find out the difference in the effect of these different fertilizers and seeds on the yield.

Types of Variances
In one-way classification:
- Variance between the samples.
- Variance within the samples.
- Total variance for all observations together.
In two-way classification:
- Variance between the samples due to the column variable.
- Variance between the samples due to the row variable.
- Variance within the samples (residual).
- Total variance for all observations together.

Procedure for One-Way ANOVA
1. Assume that the means of the samples are equal (null hypothesis).
2. Compute the mean square between the samples (MSC) and the mean square within the samples (MSE). For computing MSC and MSE, the following calculations are made:
   a) T = the sum of all observations.
   b) SST = (sum of the squares of all observations) − T²/N.
   c) SSC = (Σx₁)²/n₁ + (Σx₂)²/n₂ + (Σx₃)²/n₃ + ... − T²/N, where Σx₁, Σx₂, Σx₃, ... are the column totals and n₁, n₂, n₃, ... the column sizes.
   d) SSE = SST − SSC.
   e) MSC = SSC/(c − 1), where c is the number of columns.
   f) MSE = SSE/(N − c).
3. Calculate F = MSC/MSE.
4. Obtain the table value of F for (c − 1, N − c) degrees of freedom and compare. A worked sketch in code follows the table below.

One-Way ANOVA Table

Source of variation | Sum of Squares | Degrees of freedom | Mean Square
Between samples     | SSC            | c − 1              | MSC
Within samples      | SSE            | N − c              | MSE
Total               | SST            | N − 1              |
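A worked sketch of the one-way procedure; the three fertilizer columns below are invented figures, and scipy supplies the table value of F.

from scipy import stats

# Columns: yields under three hypothetical fertilizer treatments.
columns = [
    [20, 21, 23, 16, 20],   # fertilizer A
    [18, 20, 17, 15, 25],   # fertilizer B
    [25, 28, 22, 28, 32],   # fertilizer C
]

all_obs = [x for col in columns for x in col]
N, c = len(all_obs), len(columns)

T = sum(all_obs)                                            # step a
SST = sum(x * x for x in all_obs) - T ** 2 / N              # step b
SSC = sum(sum(col) ** 2 / len(col) for col in columns) - T ** 2 / N
SSE = SST - SSC                                             # step d

MSC = SSC / (c - 1)
MSE = SSE / (N - c)
F = MSC / MSE

F_crit = stats.f.ppf(0.95, c - 1, N - c)   # 5% table value
print(f"F = {F:.3f}, table value = {F_crit:.3f}")
# Reject H0 (equal means) if F exceeds the table value.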

Procedure for Two-Way ANOVA
1. Assume that the means of all columns are equal, and that the means of all rows are equal (null hypotheses).
2. Compute T = the sum of all observations.
3. Find SST = (sum of the squares of all observations) − T²/N.
4. Find SSC = (Σx₁)²/n₁ + (Σx₂)²/n₂ + (Σx₃)²/n₃ + ... − T²/N, where Σx₁, Σx₂, Σx₃, ... are the column totals and n₁, n₂, n₃, ... the column sizes.
5. Find SSR in the same way from the row totals and row sizes.
6. SSE = SST − SSC − SSR.
7. Find MSC = SSC/(c − 1), MSR = SSR/(r − 1), and MSE = SSE/[(c − 1)(r − 1)].
8. Find Fc = MSC/MSE and Fr = MSR/MSE.
9. Degrees of freedom for Fc = [c − 1, (c − 1)(r − 1)]; degrees of freedom for Fr = [r − 1, (c − 1)(r − 1)].
A worked sketch in code follows the table below.

Two-Way ANOVA Table

Source of variation | Sum of Squares | Degrees of freedom | Mean Square | F-ratio
Between columns     | SSC            | c − 1              | MSC         | Fc = MSC/MSE
Between rows        | SSR            | r − 1              | MSR         | Fr = MSR/MSE
Residual            | SSE            | (c − 1)(r − 1)     | MSE         |
Total               | SST            | N − 1              |             |
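A worked sketch of the two-way procedure; the rows (seed varieties) and columns (fertilizer treatments) hold invented figures, and scipy supplies the table values.

from scipy import stats

# Rows: hypothetical seed varieties; columns: fertilizer treatments.
table = [
    [6, 5, 3, 8],   # seed variety 1
    [7, 5, 4, 9],   # seed variety 2
    [3, 3, 2, 5],   # seed variety 3
]

r, c = len(table), len(table[0])
N = r * c
all_obs = [x for row in table for x in row]

T = sum(all_obs)
SST = sum(x * x for x in all_obs) - T ** 2 / N
SSR = sum(sum(row) ** 2 / c for row in table) - T ** 2 / N
col_totals = [sum(row[j] for row in table) for j in range(c)]
SSC = sum(t ** 2 / r for t in col_totals) - T ** 2 / N
SSE = SST - SSC - SSR

MSC, MSR = SSC / (c - 1), SSR / (r - 1)
MSE = SSE / ((c - 1) * (r - 1))
Fc, Fr = MSC / MSE, MSR / MSE

df_e = (c - 1) * (r - 1)
print(f"Fc = {Fc:.2f} vs table {stats.f.ppf(0.95, c - 1, df_e):.2f}")
print(f"Fr = {Fr:.2f} vs table {stats.f.ppf(0.95, r - 1, df_e):.2f}")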
