TESTING OF HYPOTHESIS
Science rests on two main pillars: a body of knowledge and a method of inquiry. The body of knowledge comprises the various laws, theories, hypotheses, and so on, while the method of inquiry comprises the various mechanisms by which new theories and facts are added to that body of knowledge.
Hypotheses and theories are largely responsible for moving knowledge from the unknown to the known. Hypotheses play a critical role in assertion because they describe certain facts and explain the relationships between those facts; as a result, they are of great help in investigative work. Once the problem to be answered in the research has been established, the researcher forms various tentative or possible solutions to it; these proposed answers or solutions are referred to as hypotheses. A critical and essential point to keep in mind is that these propositions are not yet verified. A hypothesis, then, is an interpretation of certain facts that is only a possible solution or tentative answer to a problem and is completely or partly unverified. Once it is established, it ceases to be a hypothesis and becomes a theory or a principle. The word hypothesis comes from the Greek hypo (under) and tithenai (to place); together the words point to the placement of the hypothesis under the evidence, which acts as its foundation.
According to George A. Lundberg, a hypothesis can be defined as a tentative generalization, the validity of which remains to be tested. In this elementary stage the hypothesis may be a mere hunch, guess, or piece of imaginative speculation that becomes the basis for an action or an investigation. A vital point to keep in mind is that hypotheses are not theories; a hypothesis has some linkage to theory but is not as elaborated as a theory is, though it can be said that hypotheses are derived from theories. A researcher uses hypothesis testing to support beliefs about comparisons (between variables or groups). Basically, it is how we empirically test our research hypotheses for "accuracy." We NEVER prove beyond the shadow of a doubt that a comparison is true. Rather, we conclude that, based on the collected data and assumptions, the probability of the comparison being true is very high (roughly 95 to 99% sure). In all hypothesis testing, the hypothesis being tested is a hypothesis about equality. The researcher thinks the equality hypothesis is NOT true, and by showing how the data do not fit it, the equality hypothesis can be rejected. We call this equality hypothesis the null hypothesis, and its symbol is H0. The null hypothesis is a statement comparing two statistics (usually two means). The hypothesis the researcher actually believes, that a difference exists, is the alternative hypothesis, and its symbol is Ha or H1. The alternative hypothesis is a statement comparing two statistics or groups, suggesting that there is a difference.
Hypotheses should be stated in advance. The hypothesis must be stated in writing at the proposal stage. This keeps the research effort focused on the primary objective and creates a stronger basis for interpreting the study's results than a hypothesis that emerges only after inspecting the data. The habit of post hoc hypothesis testing (common among researchers) amounts to using third-degree methods on the data (data dredging) to yield at least something significant, and it leads to overrating the occasional chance associations in the study.
SOURCES OF HYPOTHESIS
Observations made in routine activities.
Theories based on the scientific approach.
Analogies.
Knowledge obtained from functional executives.
Results of the research and development department.
Experience of the investigator.
A one-tailed hypothesis has the statistical advantage of permitting a smaller sample size than a two-tailed hypothesis. Unfortunately, one-tailed hypotheses are not always appropriate; in fact, some investigators believe that they should never be used. However, they are appropriate when only one direction for the association is important or biologically meaningful. An example is the one-sided hypothesis that a drug has a greater frequency of side effects than a placebo; the possibility that the drug has fewer side effects than the placebo is not worth testing. Whatever strategy is used, it should be stated in advance; otherwise, the study lacks statistical rigor. Dredging the data after collection and deciding post hoc to switch to one-tailed hypothesis testing in order to reduce the sample size and P value indicates a lack of scientific integrity.
(In what follows, k represents the number of populations to be compared.) Difficulty in meeting assumptions: the tests used in hypothesis testing, viz., t-tests and ANOVA, rest on some fundamental assumptions that must be met for the tests to work properly and yield good results. The main assumptions for the t-test and ANOVA are listed below. The primary assumptions underlying a t-test are: 1. The samples are drawn randomly from a population in which the data are normally distributed.
2. It is assumed that s1² and s2² both estimate a common population variance, σ². This assumption is called the homogeneity of variances.
3. In the case of a two-sample t-test, the measurements in sample 1 are independent of those in sample 2. Like the t-test, analysis of variance is based on a model that requires certain assumptions. The assumptions of ANOVA are that: 1. Each group is obtained randomly, with each observation independent of all other observations and the groups independent of each other. 2. The samples represent populations in which the data are normally distributed.
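Where statistical software is available, these assumptions can be checked before running the test. The sketch below uses SciPy with made-up illustration data; the Shapiro-Wilk and Levene tests are one common choice for checking normality and homogeneity of variances, not a procedure prescribed by the text.

```python
# Sketch: checking t-test assumptions with SciPy (illustrative data only).
from scipy import stats

sample1 = [4.1, 3.8, 5.0, 4.6, 4.9, 3.7, 4.4, 4.2]
sample2 = [5.2, 4.8, 5.9, 5.5, 4.7, 5.1, 5.6, 5.0]

# Assumption: normality of each sample (Shapiro-Wilk test).
for i, s in enumerate((sample1, sample2), start=1):
    w, p = stats.shapiro(s)
    print(f"Sample {i}: Shapiro-Wilk p = {p:.3f} (p > .05 is consistent with normality)")

# Assumption: homogeneity of variances (Levene's test).
_, p_var = stats.levene(sample1, sample2)
print(f"Levene's test p = {p_var:.3f} (p > .05 is consistent with equal variances)")

# If the assumptions look reasonable, run the two-sample t-test.
t, p = stats.ttest_ind(sample1, sample2)
print(f"t = {t:.3f}, p = {p:.3f}")
```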
Keep in mind that the only reason we are testing the null hypothesis is because we think it is wrong. We state what we think is wrong about the null hypothesis in an alternative hypothesis. For the children watching TV example, we may have reason to believe that children watch more than (>) or less than (<) 3 hours of TV per week. When we are uncertain of the direction, we can state that the value in the null hypothesis is not equal to (≠) 3 hours. In a courtroom, since the defendant is assumed to be innocent (this is the null hypothesis, so to speak), the burden is on the prosecutor to conduct a trial and show evidence that the defendant is not innocent. In a similar way, we assume the null hypothesis is true, placing the burden on the researcher to conduct a study and show evidence that the null hypothesis is unlikely to be true. Regardless, we always make a decision about the null hypothesis (that it is likely or unlikely to be true). An alternative hypothesis (H1) is a statement that directly contradicts a null hypothesis by stating that the actual value of a population parameter is less than, greater than, or not equal to the value stated in the null hypothesis. The alternative hypothesis states what we think is wrong about the null hypothesis, which is needed for Step 2. Step 2: Set the criteria for a decision. To set the criteria for a decision, we state the level of significance for a test. This is similar to the criterion that jurors use in a criminal trial. Jurors decide whether the evidence presented shows guilt beyond a reasonable doubt (this is the criterion). Likewise, in hypothesis testing, we collect data to show that the null hypothesis is not true, based on the likelihood of selecting a sample mean from a population (the likelihood is the criterion). The likelihood or level of significance is typically set at 5% in behavioral research studies. When the probability of obtaining a sample mean is less than 5% if the null hypothesis were true, we conclude that the sample we selected is too unlikely and so we reject the null hypothesis.
Level of significance, or significance level, refers to a criterion of judgment upon which a decision is made regarding the value stated in a null hypothesis. The criterion is based on the probability of obtaining a statistic measured in a sample if the value stated in the null hypothesis were true. In behavioral science, the criterion or level of significance is typically set at 5%. When the probability of obtaining a sample mean is less than 5% if the null hypothesis were true, then we reject the value stated in the null hypothesis. The alternative hypothesis establishes where to place the level of significance. Remember that we know that the sample mean will equal the population mean on average if the null hypothesis is true. All other possible values of the sample mean are normally distributed (central limit theorem). The empirical rule tells us that at least 95% of all sample means fall within about 2 standard deviations (SD) of the population mean, meaning that there is less than a 5% probability of obtaining a sample mean that is beyond 2 SD from the population mean. For the children watching TV example, we can look for the probability of obtaining a sample mean beyond 2 SD in the upper tail (greater than 3), the lower tail (less than 3), or both tails (not equal to 3). Figure 8.2 shows that the alternative hypothesis is used to determine which tail or tails to place the level of significance for a hypothesis test. Step 3: Compute the test statistic. Suppose we measure a sample mean equal to 4 hours per week that children watch TV. To make a decision, we need to evaluate how likely this sample outcome is, if the population mean stated by the null hypothesis (3 hours per week) is true. We use a test statistic to determine this likelihood. Specifically, a test statistic tells us how far, or how many standard deviations, a sample mean is from the population mean. The larger the value of the test statistic, the further the distance, or number of standard deviations, a sample mean is from the population mean stated in the null hypothesis. The value of the test statistic is used to make a decision in Step 4.
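As a sketch of Step 3, the test statistic for the TV example can be computed directly. The text gives only the hypothesized mean (3 hours) and the observed sample mean (4 hours); the population standard deviation and sample size below are assumed purely for illustration.

```python
import math

mu = 3.0     # population mean under the null hypothesis (hours of TV per week)
x_bar = 4.0  # observed sample mean (from the example)
sigma = 2.0  # assumed population standard deviation (not given in the text)
n = 36       # assumed sample size (not given in the text)

# Test statistic: how many standard errors the sample mean lies from mu.
se = sigma / math.sqrt(n)
z = (x_bar - mu) / se
print(f"SE = {se:.3f}, z = {z:.2f}")  # z = 3.00 with these assumed values
```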
The test statistic is a mathematical formula that allows researchers to determine the likelihood of obtaining sample outcomes if the null hypothesis were true. The value of the test statistic is used to make a decision regarding the null hypothesis. Step 4: Make a decision. We use the value of the test statistic to make a decision about the null hypothesis. The decision is based on the probability of obtaining a sample mean, given that the value stated in the null hypothesis is true. If the probability of obtaining a sample mean is less than 5% when the null hypothesis is true, then the decision is to reject the null hypothesis. If the probability of obtaining a sample mean is greater than 5% when the null hypothesis is true, then the decision is to retain the null hypothesis. In sum, there are two decisions a researcher can make: 1. Reject the null hypothesis. The sample mean is associated with a low probability of occurrence when the null hypothesis is true. 2. Retain the null hypothesis. The sample mean is associated with a high probability of occurrence when the null hypothesis is true. The probability of obtaining a sample mean, given that the value stated in the null hypothesis is true, is stated by the p value. The p value is a probability: it varies between 0 and 1 and can never be negative. In Step 2, we stated the criterion, or probability of obtaining a sample mean, at which point we will decide to reject the value stated in the null hypothesis, which is typically set at 5% in behavioral research. To make a decision, we compare the p value to the criterion we set in Step 2. A p value is the probability of obtaining a sample outcome, given that the value stated in the null hypothesis is true. P values are the actual probabilities calculated from a statistical test, and they are compared against alpha to determine whether or not to reject the null hypothesis. The p value for obtaining a sample outcome is compared to the level of significance.
Example (alpha = 0.05):
calculated p value = 0.008: reject the null hypothesis
calculated p value = 0.110: do not reject the null hypothesis
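The decision rule itself reduces to a single comparison. A minimal sketch applying it to the two p values above:

```python
def decide(p_value, alpha=0.05):
    """Reject H0 when the p value does not exceed the significance level."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

print(decide(0.008))  # reject H0
print(decide(0.110))  # do not reject H0
```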
Significance, or statistical significance, describes a decision made concerning a value stated in the null hypothesis. When the null hypothesis is rejected, we reach significance; when the null hypothesis is retained, we fail to reach significance. When the p value is less than 5% (p < .05), we reject the null hypothesis; we will refer to p < .05 as the criterion for deciding to reject the null hypothesis, although note that when p = .05, the decision is also to reject the null hypothesis. When the p value is greater than 5% (p > .05), we retain the null hypothesis.
TYPE I ERROR- In a hypothesis test, a type I error occurs when the null hypothesis, H0, is rejected when it is in fact true. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug; i.e. H0: there is no difference between the two drugs on average. A type I error would occur if we concluded that the two drugs produced different effects when in fact there was no difference between them. A type I error is often considered to be more serious, and therefore more important to avoid, than a type II error. The hypothesis test procedure is therefore adjusted so that there is a guaranteed 'low' probability of rejecting the null hypothesis wrongly; this probability is never 0. The probability of a type I error can be precisely computed as P(type I error) = significance level = α. The exact probability of a type II error is generally unknown. If we do not reject the null hypothesis, it may still be false (a type II error), as the sample may not be big enough to identify the falseness of the null hypothesis (especially if the truth is very close to the hypothesis). For any given set of data, type I and type II errors are inversely related; the smaller the risk of one, the higher the risk of the other. A type I error can also be referred to as an error of the first kind.
Table 2. Types of error

              Reject H0                  Accept H0
H0 true       Type I error (α)           Correct decision (1 - α)
H0 false      Correct decision (1 - β)   Type II error (β)
TYPE II ERROR- In a hypothesis test, a type II error occurs when the null hypothesis, H0, is not rejected when it is in fact false. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug; i.e. H0: there is no difference between the two drugs on average.
A type II error would occur if it was concluded that the two drugs produced the same effect, i.e. that there is no difference between the two drugs on average, when in fact they produced different ones. A type II error is frequently due to sample sizes being too small. The probability of a type II error is generally unknown, but it is symbolized by β and written P(type II error) = β. A type II error can also be referred to as an error of the second kind. In hypothesis testing we decide whether to retain or reject the null hypothesis, and because we are observing a sample and not an entire population, it is possible that the conclusion may be wrong. There are four decision alternatives regarding the truth and falsity of the decision we make about a null hypothesis: 1. The decision to retain the null hypothesis could be correct. 2. The decision to retain the null hypothesis could be incorrect. 3. The decision to reject the null hypothesis could be correct. 4. The decision to reject the null hypothesis could be incorrect. The consequences of these different types of error are very different. For example, if one tests for the significant presence of a pollutant, incorrectly deciding that a site is polluted (a Type I error) will cause a waste of resources and energy cleaning up a site that does not need it. On the other hand, failure to detect the presence of pollution (a Type II error) can lead to environmental deterioration or health problems in the nearby community. The analysis plan includes decision rules for rejecting the null hypothesis. In practice, statisticians describe these decision rules in two ways: with reference to a P-value or with reference to a region of acceptance.
P-value. The strength of the evidence against a null hypothesis is measured by the P-value. Suppose the test statistic is equal to S. The P-value is the probability of observing a test statistic as extreme as S, assuming the null hypothesis is true. If the P-value is less than the significance level, we reject the null hypothesis.
Region of acceptance. The region of acceptance is a range of values. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the chance of making a Type I error is equal to the significance level. The set of values outside the region of acceptance is called the region of rejection. If the test statistic falls within the region of rejection, the null hypothesis is rejected; in such cases, we say that the hypothesis has been rejected at the α level of significance. These approaches are equivalent. Some statistics texts use the P-value approach; others use the region of acceptance approach. In subsequent lessons, this tutorial will present examples that illustrate each approach.
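The equivalence of the two approaches can be verified with a short sketch. For a two-tailed z-test at the 0.05 significance level (the observed statistic below is illustrative), the test statistic falls in the region of rejection exactly when its P-value falls below the significance level:

```python
from scipy import stats

alpha = 0.05
z_obs = 2.10  # illustrative observed test statistic

# Region-of-acceptance approach: retain H0 if |z| <= critical value.
z_crit = stats.norm.ppf(1 - alpha / 2)          # about 1.96
in_acceptance_region = abs(z_obs) <= z_crit

# P-value approach: reject H0 if the two-tailed P-value < alpha.
p_value = 2 * (1 - stats.norm.cdf(abs(z_obs)))  # about 0.036
reject_by_p = p_value < alpha

print(f"critical value = {z_crit:.2f}, p = {p_value:.3f}")
print(f"region approach rejects: {not in_acceptance_region}")
print(f"P-value approach rejects: {reject_by_p}")  # the two decisions always match
```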
Type II error, or beta (β) error, is the probability of retaining a null hypothesis that is actually false.
The decision to reject a false null hypothesis is called power because it is the decision we aim for. Remember that we are only testing the null hypothesis because we think it is wrong. Deciding to reject a false null hypothesis, then, is power, inasmuch as we learn the most about populations when we accurately reject false notions of truth. This decision is the most published result in behavioral research. Power in hypothesis testing is the probability of rejecting a false null hypothesis; specifically, it is the probability that a randomly selected sample will show that the null hypothesis is false when the null hypothesis is indeed false.
Null Hypothesis (Treatment A = Treatment B)

                          Decision Based on Inferential Statistical Test
POPULATION                Accept H0 (No difference)        Reject H0 (Difference)
True (No difference)      Correct decision                 Type I error (alpha (α) error)
False (Difference)        Type II error (beta (β) error)   Correct decision: Power (1 - β)
FACTORS THAT AFFECT POWER
The power of a hypothesis test is affected by three factors.
Sample size (n). Other things being equal, the greater the sample size, the greater the power of the test.
Significance level (α). The higher the significance level, the higher the power of the test. If you increase the significance level, you reduce the region of acceptance; as a result, you are more likely to reject the null hypothesis. This means you are less likely to accept the null hypothesis when it is false, i.e., less likely to make a Type II error. Hence, the power of the test is increased.
The "true" value of the parameter being tested. The greater the difference between the "true" value of a parameter and the value specified in the null hypothesis, the greater the power of the test. That is, the greater the effect size, the greater the power of the test.
One advantage of knowing effect size, d, is that its value can be used to determine the power of detecting an effect in hypothesis testing. The likelihood of detecting an effect, called power, is critical in behavioral research because it lets the researcher know the probability that a randomly selected sample will lead to a decision to reject the null hypothesis, if the null hypothesis is false. In this section, we describe how effect size and sample size are related to power.
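As a sketch of this relationship, the power of a one-tailed, one-sample z-test can be approximated from d, n, and alpha as 1 - Φ(z_crit - d·√n). This normal-approximation formula and the numbers below are illustrative, not taken from the text:

```python
from math import sqrt
from scipy.stats import norm

def power_one_sample_z(d, n, alpha=0.05):
    """Approximate power of a one-tailed, one-sample z-test for effect size d."""
    z_crit = norm.ppf(1 - alpha)  # critical value under H0
    return 1 - norm.cdf(z_crit - d * sqrt(n))

# Power grows with sample size (and, likewise, with effect size).
for n in (10, 25, 50, 100):
    print(f"n = {n:3d}: power = {power_one_sample_z(d=0.5, n=n):.2f}")
```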
Now, what if you're researching a drug, and you want to know whether the drug is effective, but the drug has nasty side effects. Would that affect the alpha error you choose? The alpha error tells you what your chances are of concluding the drug is effective when it's really not; it tells you what your chances of a false positive are. If the drug has really nasty side effects, would you want to increase or decrease your alpha? Would you want to set it higher than 5% or lower than 5%? You'd want to lower your alpha to under 5%. You want to reduce the chance of falsely concluding this is a good drug when it's not, because it has nasty side effects. So the 5% is the convention, but if you have good reasons for increasing or decreasing it, by all means do so. On the other hand, if you are starting a new program of research and the drug has no harmful side effects, and you want to reduce the chances of missing an important effect, especially since at this point your procedures may be relatively unrefined, then you may want to increase your alpha level to, say, .10. That is, your experiment is designed in a way that gives you a 10% chance of a false positive. It doesn't matter if you misapply this drug; it's not going to hurt anybody. So, on the one hand, you may want a .01 or even a .001 alpha level if the drug has nasty side effects; or you may want a .10 alpha level if you are doing a pilot study. What if this drug is for a horrific disease, a crippling or life-threatening disease? Well, you don't want to do an experiment that causes you to miss a good drug. Assume it is a very devastating disease and you've got a chance to do something about it; you want to make sure that if the drug works you don't miss it. You could try to reduce the beta error to .1 instead of .2, that is, increase your power from .8 to .9 or maybe .95, whatever it takes to do that kind of thing.
Of course, increasing alpha increases power, so that is one of your alternatives. These numbers are important for you to interpret, not the statistician, because only you know the medical aspects of the treatments you're using, how devastating the disease is, or how painful the side effects might be. You are the person who knows these issues. The statistician then builds an experiment to guarantee that your alpha level is what you want it to be and that your beta error, or your power, is what you want it to be. You'll be interacting with the statistician on these kinds of issues, but you'll have to use your medical knowledge to decide these sorts of things. The figure below shows what alpha, beta, and power look like in a graph and illustrates some of the relationships between them. Remember, alpha and beta represent the probabilities of Type I and Type II errors, respectively. Figure 3.3: Alpha and Beta Errors
Since power is 1-beta, the area to the right of 107.5 under the right curve represents your power in this experiment. It should be apparent from the graph that alpha, beta and power are closely related. In particular, you can see that reducing alpha is equivalent to moving the vertical line between the two sample means to the right. When you do this, alpha decreases, power (1 - beta) decreases,
and beta increases. On the other hand, moving that same vertical line to the left increases alpha, increases power, and decreases beta. To put it another way, increases in alpha increase power, and decreases in alpha decrease power.
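The same relationship can be checked numerically. In the sketch below, the effect size and sample size are held fixed at assumed illustrative values while alpha varies; as alpha increases, beta falls and power rises:

```python
from math import sqrt
from scipy.stats import norm

d, n = 0.5, 25  # assumed effect size and sample size (illustrative)
for alpha in (0.01, 0.05, 0.10):
    z_crit = norm.ppf(1 - alpha)
    beta = norm.cdf(z_crit - d * sqrt(n))  # P(Type II error)
    print(f"alpha = {alpha:.2f}: beta = {beta:.2f}, power = {1 - beta:.2f}")
```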
This approach to testing a population mean is appropriate when the following conditions are met: the sampling method is simple random sampling, and the sample is drawn from a normal or near-normal population.
Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply.
The population distribution is normal.
The sampling distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.
The sampling distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. State the Hypotheses Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.
The table below shows three sets of hypotheses. Each makes a statement about how the population mean μ is related to a specified value M. (In the table, the symbol ≠ means "not equal to".)
Set   Null hypothesis   Alternative hypothesis   Number of tails
1     μ = M             μ ≠ M                    2
2     μ ≥ M             μ < M                    1
3     μ ≤ M             μ > M                    1
The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis. Formulate an Analysis Plan The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.
Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
Test method. Use the one-sample t-test to determine whether the hypothesized mean differs significantly from the observed sample mean.
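The number of tails determines how the P-value is computed from the test statistic. A minimal sketch for the three sets of hypotheses in the table above, using an illustrative t value and degrees of freedom:

```python
from scipy.stats import t

t_obs, df = 2.0, 20  # illustrative test statistic and degrees of freedom

p_two = 2 * t.sf(abs(t_obs), df)  # Set 1 (Ha: mu != M): both tails
p_upper = t.sf(t_obs, df)         # Set 3 (Ha: mu > M): upper tail only
p_lower = t.cdf(t_obs, df)        # Set 2 (Ha: mu < M): lower tail only
print(f"two-tailed p = {p_two:.3f}")
print(f"upper-tail p = {p_upper:.3f}, lower-tail p = {p_lower:.3f}")
```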
Analyze Sample Data Using sample data, conduct a one-sample t-test. This involves finding the standard error, degrees of freedom, test statistic, and the P-value associated with the test statistic.
Standard error. Compute the standard error (SE) of the sampling distribution: SE = s * sqrt[ (1/n) * (1 - n/N) * (N / (N - 1)) ], where s is the standard deviation of the sample, N is the population size, and n is the sample size. When the population size is much larger (at least 10 times larger) than the sample size, the standard error can be approximated by SE = s / sqrt(n).
Degrees of freedom. The degrees of freedom (DF) are equal to the sample size (n) minus one. Thus, DF = n - 1.
Test statistic. The test statistic is a t-score (t) defined by the following equation: t = (x̄ - μ) / SE, where x̄ is the sample mean, μ is the hypothesized population mean in the null hypothesis, and SE is the standard error.
P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a t-score, use the t Distribution Calculator to assess the probability associated with the t-score, given the degrees of freedom computed above. (See the sample problems at the end of this lesson for examples of how this is done.)
Interpret Results If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
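The four analysis steps can be carried out end to end. The sketch below uses made-up sample data and the simple SE = s / sqrt(n) form, which is appropriate when the population is much larger than the sample:

```python
import math
from scipy.stats import t as t_dist

data = [2.5, 3.8, 4.1, 2.9, 3.3, 4.6, 3.0, 3.7, 4.2, 3.5]  # illustrative sample
mu0 = 3.0    # hypothesized population mean (H0: mu = 3)
alpha = 0.05

n = len(data)
x_bar = sum(data) / n
s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))  # sample SD

se = s / math.sqrt(n)               # standard error (population assumed >> sample)
df = n - 1                          # degrees of freedom
t_stat = (x_bar - mu0) / se         # test statistic
p = 2 * t_dist.sf(abs(t_stat), df)  # two-tailed P-value

print(f"mean = {x_bar:.2f}, SE = {se:.3f}, t = {t_stat:.2f}, p = {p:.3f}")
print("reject H0" if p < alpha else "retain H0")
```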
You may conclude that the null hypothesis is false when, in fact, it is true. This is called a Type I error. If the null hypothesis is, in fact, true, the probability of committing a Type I error is determined by (and equal to) the alpha level, usually .05. That means 5%, or 1 in 20, of true null hypotheses end up being rejected by hypothesis tests!
That's the nature of the beast. There is nothing you can do about it (other than lower the alpha level, which has other unfortunate consequences).
You may conclude that the null hypothesis is true when, in fact, it is false. That is, you may claim not to see an effect that is really there. This is called a Type II error. If the null hypothesis is, in fact, false, then the probability of committing a Type II error is called beta. The ability of a hypothesis test to find an effect that is really there is called the power of the test and is equal to 1-beta. If you decrease the alpha level of a test in order to avoid making a Type I error, you will generally increase beta and, therefore, decrease the power of the test to find an effect that really is there. You can't have it both ways. Type I and Type II errors are generally traded off.
The most important thing you can do to increase the power of a test is to increase the sample size. Small sample sizes generally mean small power. Sample size DOES NOT affect the Type I error rate. You are NOT more likely to make a Type I error because of a small sample size.
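A quick Monte Carlo sketch makes both claims concrete: the Type I error rate stays near alpha regardless of n, while power rises with n. All parameters here (the true means, SD, and number of trials) are assumed for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, trials = 0.05, 2000

for n in (10, 100):
    # Under H0 (true mean = 0): the fraction of rejections estimates the Type I rate.
    type1 = np.mean([stats.ttest_1samp(rng.normal(0, 1, n), 0).pvalue < alpha
                     for _ in range(trials)])
    # Under H1 (true mean = 0.5): the fraction of rejections estimates power.
    power = np.mean([stats.ttest_1samp(rng.normal(0.5, 1, n), 0).pvalue < alpha
                     for _ in range(trials)])
    print(f"n = {n:3d}: Type I rate = {type1:.3f} (near alpha), power = {power:.3f}")
```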
If you conduct more than one hypothesis test at alpha = .05, the overall (or "family-wise") Type I error rate obeys the simple laws of probability. The more tests you conduct, the more likely you are to commit a Type I error on at least one of them. If you do 20 tests and find only 1 significant difference, that one is very likely a Type I error.
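The inflation follows from those laws: with k independent tests each run at level alpha, the chance of at least one false positive is 1 - (1 - alpha)^k. A sketch, with the standard Bonferroni adjustment shown as one common remedy (the adjustment itself is not mentioned in the text):

```python
alpha, k = 0.05, 20

# Probability of at least one Type I error across k independent tests.
fwer = 1 - (1 - alpha) ** k
print(f"family-wise error rate for {k} tests: {fwer:.2f}")  # about 0.64

# Bonferroni correction: test each comparison at alpha/k to keep the
# family-wise rate at or below alpha.
print(f"Bonferroni per-test alpha: {alpha / k:.4f}")
```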