TESTING HYPOTHESIS

CHAPTER 1: INTRODUCTION

Science rests on two main components: a body of knowledge and a method of inquiry. The body of knowledge comprises the various laws, theories, hypotheses, and so on, while the method of inquiry comprises the various mechanisms through which new theories and findings are added to that body of knowledge.

Hypotheses and theories are generally responsible for moving knowledge from the unknown to the known. Hypotheses play a critical role in assertion because they describe certain facts and explain the relationships between those facts; as a result, they are of great help in investigative work. Once the problem to be answered in the research process has been identified, the researcher formulates tentative or possible solutions to it; these proposed answers are called hypotheses. A critical point to keep in mind is that these propositions are not yet verified. A hypothesis, then, can be described as an interpretation of certain facts that offers a possible or tentative answer to a problem and is completely or partly unverified. Once it is established, it ceases to be a hypothesis and becomes a theory or a principle. The word hypothesis comes from the Greek hypo (under) and tithenai (to place); together, these words point to the placement of the hypothesis under the evidence, which acts as its foundation.

According to George A. Lundberg, a hypothesis can be defined as a tentative generalization, the validity of which remains to be tested. In this elementary stage, the hypothesis may be a mere hunch, guess, or imaginative idea that becomes the basis for an action or an investigation. A vital point to keep in mind is that hypotheses are not theories; a hypothesis is linked to theory but is not as elaborated as a theory, although it can be said that a hypothesis is derived from theory. A researcher uses hypothesis testing to support beliefs about comparisons between variables or groups. Essentially, it is how we empirically test our research hypotheses for accuracy. We never prove beyond the shadow of a doubt that a comparison is true. Rather, we conclude that, based on the collected data and assumptions, the probability of the comparison being true is very high (i.e., around 95% to 99%). In all hypothesis testing, the hypothesis being tested is a hypothesis about equality. The researcher thinks the equality hypothesis is not true, and by showing how the data do not fit it, the equality hypothesis can be rejected. We call this equality hypothesis the null hypothesis, and its symbol is H0; it is a statement asserting equality between two statistics (usually two means). The difference hypothesis is the alternative hypothesis, and its symbol is Ha or H1; it is a statement comparing two statistics or groups, suggesting that there is a difference.

CHAPTER 2: CHARACTERISTICS OF A GOOD HYPOTHESIS


A good hypothesis must be based on a good research question. It should be simple, specific, and stated in advance.

Hypothesis should be simple. A simple hypothesis contains one predictor and one outcome variable, e.g., a positive family history of schizophrenia increases the risk of developing the condition in first-degree relatives. Here the single predictor variable is a positive family history of schizophrenia, and the outcome variable is schizophrenia. A complex hypothesis contains more than one predictor variable or more than one outcome variable, e.g., a positive family history and stressful life events are associated with an increased incidence of Alzheimer's disease. Here there are two predictor variables (positive family history and stressful life events) and one outcome variable (Alzheimer's disease). A complex hypothesis like this cannot easily be tested with a single statistical test and should always be separated into two or more simple hypotheses.

Hypothesis should be specific. A specific hypothesis leaves no ambiguity about the subjects and variables, or about how the test of statistical significance will be applied. It uses concise operational definitions that summarize the nature and source of the subjects and the approach to measuring the variables (e.g., a history of medication with tranquilizers, as measured by review of medical store records and physicians' prescriptions in the past year, is more common in patients who attempted suicide than in controls hospitalized for other conditions). This is a long-winded sentence, but it explicitly states the nature of the predictor and outcome variables, how they will be measured, and the research hypothesis. Often these details are included in the study proposal rather than stated in the research hypothesis itself; however, they should be clear in the mind of the investigator while conceptualizing the study.

Hypothesis should be stated in advance. The hypothesis must be stated in writing at the proposal stage. This helps keep the research effort focused on the primary objective and creates a stronger basis for interpreting the study's results than a hypothesis that emerges only after inspecting the data. The habit of post hoc hypothesis testing (common among researchers) is nothing but applying third-degree methods to the data (data dredging) to yield at least something significant. This leads to overrating the occasional chance associations in the study.

CHAPTER 3 : ROLE AND FUNCTIONS OF THE HYPOTHESIS


A hypothesis:
- Helps in the testing of theories.
- Serves as a platform for investigative activities.
- Provides guidance to the research work or study.
- Sometimes suggests theories.
- Helps in identifying the data that are needed.
- Explains social phenomena.
- Develops the theory.
- Acts as a bridge between the theory and the investigation.
- Provides a relationship between phenomena that leads to empirical testing of the relationship.
- Helps in choosing the most suitable technique of analysis.
- Helps in determining the most suitable type of research.
- Provides knowledge about the required sources of data.
- Keeps the research focused under its direction.
- Is very helpful in carrying out an enquiry into a certain activity.
- Helps in reaching conclusions, if it is correctly drawn.

SOURCES OF HYPOTHESIS
- Observations made in routine activities.
- Theories based on the scientific approach.
- Analogies.
- Knowledge obtained from functional executives.
- Results of the research and development department.
- Experience of the investigator.

CHAPTER 4: WHY USE HYPOTHESES IN SOCIAL SCIENCE RESEARCH?


In examining phenomena of the social world, there are any number of relationships that we could examine to learn more about their workings. However, it is possible that some of the relationships we observe are due to chance rather than to a real relationship between two variables. For instance, consider a hypothetical experiment designed to evaluate whether enhancing hospital patients' "sense of control" influences their health. In this experiment, conducted in McGregor Hospital, ten people in the chronic care ward were sampled and given "enhanced control" over their schedule and living conditions: they could specify when they would have their meals, during which hours they could receive visitors, and which programs they could watch on television. To compare the benefits of this enhanced control, an additional ten patients of the chronic care ward were chosen, though their routines were not altered. After six weeks, the health of all subjects was measured, and it was found that the mean level of health (on a 10-point scale, with higher numbers indicating better health) was 6 for the enhanced-control group and 4 for the non-enhanced group. Perhaps the first question that should be asked is: "Can we be sure that the enhanced sense of control is responsible for the difference between the groups, rather than chance?" It might be that, simply by chance, the people chosen for the enhanced-control group were somewhat healthier before the experiment than those assigned to the other group; or the difference might be due only to chance rather than to some benefit of control over living conditions. What is needed is a way to evaluate the likelihood that relationships, such as those in the hospital study described above, occurred by chance. The establishing and testing of hypotheses is such a method.

CHAPTER 5: TYPES OF HYPOTHESIS


For the purpose of testing statistical significance, hypotheses are classified by the way they describe the expected difference between the study groups.

Null and alternative hypotheses. The null hypothesis states that there is no association between the predictor and outcome variables in the population (e.g., there is no difference between the tranquilizer habits of patients who attempted suicide and those of age- and sex-matched control patients hospitalized for other diagnoses). The null hypothesis is the formal basis for testing statistical significance. By starting with the proposition that there is no association, statistical tests can estimate the probability that an observed association is due to chance. The proposition that there is an association (that patients who attempted suicide will report different tranquilizer habits from those of the controls) is called the alternative hypothesis. The alternative hypothesis cannot be tested directly; it is accepted by exclusion if the test of statistical significance rejects the null hypothesis.

One- and two-tailed alternative hypotheses. A one-tailed (or one-sided) hypothesis specifies the direction of the association between the predictor and outcome variables. The prediction that patients who attempted suicide will have a higher rate of tranquilizer use than control patients is a one-tailed hypothesis. A two-tailed hypothesis states only that an association exists; it does not specify the direction. The prediction that patients who attempted suicide will have a different rate of tranquilizer use (either higher or lower) than control patients is a two-tailed hypothesis. (The word "tails" refers to the tail ends of the statistical distribution, such as the familiar bell-shaped normal curve, that is used to test a hypothesis. One tail represents a positive effect or association; the other, a negative effect.) A one-tailed hypothesis has the statistical advantage of permitting a smaller sample size than a two-tailed hypothesis. Unfortunately, one-tailed hypotheses are not always appropriate; in fact, some investigators believe they should never be used. They are appropriate, however, when only one direction for the association is important or biologically meaningful. An example is the one-sided hypothesis that a drug has a greater frequency of side effects than a placebo; the possibility that the drug has fewer side effects than the placebo is not worth testing. Whatever strategy is used, it should be stated in advance; otherwise, it lacks statistical rigor. Dredging the data after they have been collected and deciding post hoc to switch to one-tailed hypothesis testing to reduce the sample size and the P value are indicative of a lack of scientific integrity.
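The one- versus two-tailed distinction can be made concrete numerically. The sketch below (Python, standard library only) converts a z statistic into one-tailed and two-tailed p-values via the normal CDF; the z value 1.96 is a hypothetical example, not taken from the text.

```python
import math

def normal_cdf(z):
    # Standard normal cumulative distribution via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_values(z):
    """Return (one_tailed, two_tailed) p-values for a z statistic."""
    one_tailed = 1 - normal_cdf(abs(z))  # area beyond z in one tail
    two_tailed = 2 * one_tailed          # area beyond |z| in both tails
    return one_tailed, two_tailed

one_p, two_p = p_values(1.96)
print(round(one_p, 3), round(two_p, 3))  # 0.025 0.05
```

Because the two-tailed p-value is twice the one-tailed value, a one-tailed test reaches significance with a smaller observed effect, which is why it permits a smaller sample size.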

PROBLEMS FACED DURING HYPOTHESIS FORMULATION


Formulating a hypothesis is not at all an easy process and is beset by a number of difficulties. According to Goode and Hatt, the difficulties faced during the formulation of a hypothesis generally include a lack of knowledge of the scientific method involved, as it sometimes proves impossible to gather complete information about a particular scientific approach. Another major difficulty is the lack of a clear theoretical background; with an unclear and indefinite theoretical background, one cannot easily arrive at a conclusion. With time, however, answers to such problems become available, and these difficulties can be removed by having complete and accurate information about the concepts of the subjects involved. The hypothesis should also not be overly long and should be timely in nature.

CHAPTER 6: LIMITATIONS IN ENVIRONMENTAL SAMPLING


Although hypothesis tests are a very useful tool in general, they are sometimes not appropriate in the environmental field. The following cases illustrate some of the limitations of this type of test.

Multiple comparisons. z and t tests are very useful when comparing two population means. However, they are not appropriate for comparing several population means at the same time. For each test there is always some probability of committing an error, so if we conduct several such tests, the overall error probability can exceed the acceptable range, and we could not feel very confident about the final conclusion. Table 8 shows the resulting overall error probability if multiple t tests are conducted.
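The growth of the overall error probability can be sketched directly. Assuming independent tests each run at alpha = 0.05, the probability of at least one false rejection across k tests is 1 - (1 - alpha)^k:

```python
def familywise_error(alpha, k):
    # Probability of at least one Type I error across k independent tests,
    # each run at significance level alpha.
    return 1 - (1 - alpha) ** k

for k in (1, 3, 5, 10):
    print(k, round(familywise_error(0.05, k), 3))
# 3 tests inflate the overall error to about 0.143; 10 tests to about 0.401.
```

Even a handful of comparisons at the 5% level pushes the overall error well past the nominal 0.05, which is why multiple t tests are replaced by a single ANOVA in this situation.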

Assume that each k value represents the number of populations to be compared.

Difficulty in meeting assumptions. The tests used in hypothesis testing, viz., t-tests and ANOVA, have some fundamental assumptions that must be met for the test to work properly and yield good results. The main assumptions for the t-test and ANOVA are listed below.

The primary assumptions underlying a t-test are:
1. The samples are drawn randomly from a population in which the data are normally distributed.
2. In the case of a two-sample t-test, the two population variances are equal (σ1² = σ2²). It is therefore assumed that s1² and s2² both estimate a common population variance; this assumption is called the homogeneity of variances.
3. In the case of a two-sample t-test, the measurements in sample 1 are independent of those in sample 2.

Like the t-test, analysis of variance is based on a model that requires certain assumptions. The assumptions of ANOVA are that:
1. Each group is obtained randomly, with each observation independent of all other observations and the groups independent of each other.
2. The samples represent populations in which the data are normally distributed.
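As an illustration of the homogeneity-of-variances assumption, the sketch below pools s1² and s2² into a single estimate of the common population variance before forming the two-sample t statistic (standard library only; the data are made up for illustration):

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t statistic under the homogeneity-of-variances assumption."""
    n1, n2 = len(x), len(y)
    # s1^2 and s2^2 both estimate the common population variance,
    # so they are pooled, weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

t = pooled_t([1, 2, 3], [2, 3, 4])
print(round(t, 3))  # -1.225
```

If the homogeneity assumption fails, pooling is no longer justified and a variant such as Welch's t-test is typically used instead.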


CHAPTER 7: FOUR STEPS TO HYPOTHESIS TESTING


The goal of hypothesis testing is to determine whether a claim about a population parameter, such as the mean, is likely to be true. In this section, we describe the four steps of hypothesis testing that were briefly introduced in Section 8.1:
Step 1: State the hypotheses.
Step 2: Set the criteria for a decision.
Step 3: Compute the test statistic.
Step 4: Make a decision.

Step 1: State the hypotheses. We begin by stating the value of a population mean in a null hypothesis, which we presume is true. For the children-watching-TV example, we state the null hypothesis that children in the United States watch an average of 3 hours of TV per week. This is a starting point so that we can decide whether it is likely to be true, similar to the presumption of innocence in a courtroom. When a defendant is on trial, the jury starts by assuming that the defendant is innocent; the basis of the decision is to determine whether this assumption is true. Likewise, in hypothesis testing, we start by assuming that the hypothesis or claim we are testing is true. This is stated in the null hypothesis. The null hypothesis (H0), stated as the null, is a statement about a population parameter, such as the population mean, that is assumed to be true. The null hypothesis is a starting point: we will test whether the value stated in it is likely to be true.


Keep in mind that the only reason we are testing the null hypothesis is because we think it is wrong. We state what we think is wrong about the null hypothesis in an alternative hypothesis. For the children-watching-TV example, we may have reason to believe that children watch more than (>) or less than (<) 3 hours of TV per week. When we are uncertain of the direction, we can state that the value in the null hypothesis is not equal to (≠) 3 hours. In a courtroom, since the defendant is assumed to be innocent (this is the null hypothesis, so to speak), the burden is on the prosecutor to conduct a trial and show evidence that the defendant is not innocent. In a similar way, we assume the null hypothesis is true, placing the burden on the researcher to conduct a study and show evidence that the null hypothesis is unlikely to be true. Regardless, we always make a decision about the null hypothesis (that it is likely or unlikely to be true). An alternative hypothesis (H1) is a statement that directly contradicts a null hypothesis by stating that the actual value of a population parameter is less than, greater than, or not equal to the value stated in the null hypothesis. The alternative hypothesis states what we think is wrong about the null hypothesis, which is needed for Step 2.

Step 2: Set the criteria for a decision. To set the criteria for a decision, we state the level of significance for a test. This is similar to the criterion that jurors use in a criminal trial. Jurors decide whether the evidence presented shows guilt beyond a reasonable doubt (this is the criterion). Likewise, in hypothesis testing, we collect data to show that the null hypothesis is not true, based on the likelihood of selecting a sample mean from a population (the likelihood is the criterion). The likelihood, or level of significance, is typically set at 5% in behavioral research studies. When the probability of obtaining a sample mean is less than 5% if the null hypothesis were true, we conclude that the sample we selected is too unlikely, and so we reject the null hypothesis.


The level of significance, or significance level, refers to a criterion of judgment upon which a decision is made regarding the value stated in a null hypothesis. The criterion is based on the probability of obtaining a statistic measured in a sample if the value stated in the null hypothesis were true. In behavioral science, the criterion or level of significance is typically set at 5%: when the probability of obtaining a sample mean is less than 5% if the null hypothesis were true, we reject the value stated in the null hypothesis. The alternative hypothesis establishes where to place the level of significance. Remember that the sample mean will, on average, equal the population mean if the null hypothesis is true, and that all other possible values of the sample mean are normally distributed (by the central limit theorem). The empirical rule tells us that approximately 95% of all sample means fall within about 2 standard deviations (SD) of the population mean, meaning that there is less than a 5% probability of obtaining a sample mean that is beyond 2 SD from the population mean. For the children-watching-TV example, we can look for the probability of obtaining a sample mean beyond 2 SD in the upper tail (greater than 3), the lower tail (less than 3), or both tails (not equal to 3). Figure 8.2 shows that the alternative hypothesis is used to determine in which tail or tails to place the level of significance for a hypothesis test.

Step 3: Compute the test statistic. Suppose we measure a sample mean equal to 4 hours of TV watched per week. To make a decision, we need to evaluate how likely this sample outcome is if the population mean stated by the null hypothesis (3 hours per week) is true. We use a test statistic to determine this likelihood. Specifically, a test statistic tells us how far, in standard deviations, a sample mean is from the population mean. The larger the value of the test statistic, the farther the sample mean is from the population mean stated in the null hypothesis. The value of the test statistic is used to make a decision in Step 4.


The test statistic is a mathematical formula that allows researchers to determine the likelihood of obtaining sample outcomes if the null hypothesis were true. The value of the test statistic is used to make a decision regarding the null hypothesis.

Step 4: Make a decision. We use the value of the test statistic to make a decision about the null hypothesis. The decision is based on the probability of obtaining a sample mean, given that the value stated in the null hypothesis is true. If the probability of obtaining the sample mean is less than 5% when the null hypothesis is true, the decision is to reject the null hypothesis; if the probability is greater than 5%, the decision is to retain the null hypothesis. In sum, there are two decisions a researcher can make:
1. Reject the null hypothesis. The sample mean is associated with a low probability of occurrence when the null hypothesis is true.
2. Retain the null hypothesis. The sample mean is associated with a high probability of occurrence when the null hypothesis is true.
The probability of obtaining a sample mean, given that the value stated in the null hypothesis is true, is stated by the p value. The p value is a probability: it varies between 0 and 1 and can never be negative. In Step 2, we stated the criterion, or the probability of obtaining a sample mean at which we will decide to reject the value stated in the null hypothesis, which is typically set at 5% in behavioral research. To make a decision, we compare the p value to the criterion set in Step 2. A p value is the probability of obtaining a sample outcome, given that the value stated in the null hypothesis is true. P-values are the actual probabilities calculated from a statistical test and are compared against alpha to determine whether or not to reject the null hypothesis.


Example (alpha = 0.05):
- Calculated p-value = 0.008: reject the null hypothesis.
- Calculated p-value = 0.110: do not reject the null hypothesis.

Significance, or statistical significance, describes a decision made concerning the value stated in the null hypothesis. When the null hypothesis is rejected, we reach significance; when it is retained, we fail to reach significance. When the p value is less than 5% (p < .05), we reject the null hypothesis. We will refer to p < .05 as the criterion for deciding to reject the null hypothesis, although note that when p = .05, the decision is also to reject the null hypothesis. When the p value is greater than 5% (p > .05), we retain the null hypothesis and fail to reach significance.
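The decision rule above reduces to a one-line comparison. A minimal sketch (the cutoff follows the text's convention that p = .05 also leads to rejection):

```python
def decide(p_value, alpha=0.05):
    # Reject H0 when p <= alpha (per the text, p = .05 also rejects);
    # otherwise retain H0 and fail to reach significance.
    return "reject H0" if p_value <= alpha else "retain H0"

print(decide(0.008))  # reject H0
print(decide(0.110))  # retain H0
```

The two calls reproduce the worked example above: p = 0.008 reaches significance, p = 0.110 does not.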


CHAPTER 8: HYPOTHESIS TESTING AND SAMPLING DISTRIBUTIONS


The logic of hypothesis testing is rooted in an understanding of the sampling distribution of the mean. Earlier, we showed three characteristics of the mean, two of which are particularly relevant in this section:
1. The sample mean is an unbiased estimator of the population mean. On average, a randomly selected sample will have a mean equal to that of the population. In hypothesis testing, we begin by stating the null hypothesis. We expect that, if the null hypothesis is true, a random sample selected from a given population will have a sample mean equal to the value stated in the null hypothesis.
2. Regardless of the distribution in the population, the sampling distribution of the sample mean is normally distributed. Hence, the probabilities of all other possible sample means we could select are normally distributed. Using this distribution, we can state an alternative hypothesis to locate the probability of obtaining sample means with less than a 5% chance of being selected if the value stated in the null hypothesis is true.
To locate the probability of obtaining a sample mean in a sampling distribution, we must know (1) the population mean and (2) the standard error of the mean. Each value is entered into the test statistic formula computed in Step 3, allowing us to make a decision in Step 4.
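For the TV example, those two quantities combine into a z test statistic. The sketch below assumes a hypothetical population SD of 2 hours and a sample of 64 children; those numbers are illustrative and do not come from the text.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """How many standard errors the sample mean lies from the H0 mean."""
    standard_error = sigma / math.sqrt(n)  # SD of the sampling distribution
    return (sample_mean - mu0) / standard_error

# H0 mean 3 hours, observed mean 4; assumed sigma = 2, n = 64.
print(z_statistic(4, 3, 2, 64))  # 4.0
```

A sample mean 4 standard errors above the hypothesized mean lies far beyond the 2-SE cutoff, so under these assumed values the null hypothesis would be rejected.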

MAKING A DECISION: TYPES OF ERROR


TYPE I ERROR. In a hypothesis test, a Type I error occurs when the null hypothesis is rejected when it is in fact true; that is, H0 is wrongly rejected. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug, i.e., H0: there is no difference between the two drugs on average. A Type I error would occur if we concluded that the two drugs produced different effects when in fact there was no difference between them. A Type I error is often considered more serious, and therefore more important to avoid, than a Type II error. The hypothesis test procedure is therefore adjusted so that there is a guaranteed low probability of wrongly rejecting the null hypothesis; this probability is never 0. The probability of a Type I error can be precisely computed as P(Type I error) = significance level = α. The exact probability of a Type II error is generally unknown. If we do not reject the null hypothesis, it may still be false (a Type II error), as the sample may not be big enough to identify the falseness of the null hypothesis (especially if the truth is very close to the hypothesis). For any given set of data, Type I and Type II errors are inversely related: the smaller the risk of one, the higher the risk of the other. A Type I error can also be referred to as an error of the first kind.

Table 2. Types of error

Decision        H0 true                        H0 false
Reject H0       Type I error (α)               Correct decision (1 - β)
Accept H0       Correct decision (1 - α)       Type II error (β)

TYPE II ERROR. In a hypothesis test, a Type II error occurs when the null hypothesis, H0, is not rejected when it is in fact false. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug, i.e., H0: there is no difference between the two drugs on average. A Type II error would occur if it was concluded that the two drugs produced the same effect, i.e., that there is no difference between the two drugs on average, when in fact they produced different effects. A Type II error is frequently due to sample sizes being too small. The probability of a Type II error is generally unknown, but it is symbolized by β and written P(Type II error) = β. A Type II error

can also be referred to as an error of the second kind.

In a hypothesis test, we decide whether to retain or reject the null hypothesis. Because we are observing a sample and not an entire population, it is possible that a conclusion may be wrong. There are four decision alternatives regarding the truth and falsity of the decision we make about a null hypothesis:
1. The decision to retain the null hypothesis could be correct.
2. The decision to retain the null hypothesis could be incorrect.
3. The decision to reject the null hypothesis could be correct.
4. The decision to reject the null hypothesis could be incorrect.
The consequences of these different types of error are very different. For example, if one tests for the significant presence of a pollutant, incorrectly deciding that a site is polluted (a Type I error) will waste resources and energy cleaning up a site that does not need it. On the other hand, failing to detect the presence of pollution (a Type II error) can lead to environmental deterioration or health problems in the nearby community. The analysis plan includes decision rules for rejecting the null hypothesis. In practice, statisticians describe these decision rules in two ways: with reference to a P-value or with reference to a region of acceptance.

P-value. The strength of evidence against a null hypothesis is measured by the P-value. Suppose the test statistic is equal to S. The P-value is the probability of observing a test statistic as extreme as S, assuming the null hypothesis is true. If the P-value is less than the significance level, we reject the null hypothesis.

Region of acceptance. The region of acceptance is a range of values. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the chance of making a Type I error is equal to the significance level. The set of values outside the region of acceptance is called the region of rejection. If the test statistic falls within the region of rejection, the null hypothesis is rejected; in such cases, we say that the hypothesis has been rejected at the α level of significance. These two approaches are equivalent. Some statistics texts use the P-value approach; others use the region-of-acceptance approach. In subsequent lessons, this tutorial will present examples that illustrate each approach.
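A region-of-acceptance sketch for a two-tailed z-test follows. The null mean of 100, σ = 15, and n = 36 are hypothetical values chosen for illustration; 1.96 is the familiar two-tailed critical value for α = .05.

```python
import math

def acceptance_region(mu0, sigma, n, z_crit=1.96):
    """Range of sample means for which H0 is not rejected (two-tailed)."""
    se = sigma / math.sqrt(n)
    # By construction, the chance of a Type I error equals the
    # significance level implied by z_crit.
    return (mu0 - z_crit * se, mu0 + z_crit * se)

low, high = acceptance_region(mu0=100, sigma=15, n=36)
print(round(low, 2), round(high, 2))  # 95.1 104.9
# A sample mean of 90 falls outside this range, in the region of rejection.
print(low <= 90 <= high)  # False
```

Checking whether the observed statistic lands inside this interval gives the same decision as comparing the P-value to the significance level.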

DECISION: RETAIN THE NULL HYPOTHESIS


When we decide to retain the null hypothesis, we can be correct or incorrect. The correct decision is to retain a true null hypothesis. This decision is called a null result, or null finding. It is usually an uninteresting decision because we retain what we already assumed: that the value stated in the null hypothesis is correct. For this reason, null results alone are rarely published in behavioral research. The incorrect decision is to retain a false null hypothesis. This decision is an example of a Type II error, or β error. With each test we make, there is always some probability that the decision is a Type II error. In this decision, we decide to retain previous notions of truth that are in fact false. While it is an error, we still did nothing; we retained the null hypothesis, and we can always go back and conduct more studies.


A Type II error, or beta (β) error, is the probability of retaining a null hypothesis that is actually false.

DECISION: REJECT THE NULL HYPOTHESIS


When we decide to reject the null hypothesis, we can be correct or incorrect. The incorrect decision is to reject a true null hypothesis. This decision is an example of a Type I error. With each test we make, there is always some probability that our decision is a Type I error. A researcher who makes this error decides to reject previous notions of truth that are in fact true. Making this type of error is analogous to finding an innocent person guilty. To minimize this error, we assume a defendant is innocent when beginning a trial; similarly, to minimize making a Type I error, we assume the null hypothesis is true when beginning a hypothesis test. A Type I error is the probability of rejecting a null hypothesis that is actually true, and researchers directly control the probability of committing this type of error. Since we assume the null hypothesis is true, we control for Type I error by stating a level of significance. The level we set, called the alpha (α) level, is the largest probability of committing a Type I error that we will allow and still decide to reject the null hypothesis. This criterion is usually set at .05 (α = .05), and we compare the alpha level to the p value. When the probability of a Type I error is less than 5% (p < .05), we decide to reject the null hypothesis; otherwise, we retain the null hypothesis.

The correct decision is to reject a false null hypothesis. There is always some probability that we decide the null hypothesis is false when it is indeed false. This decision is called the power of the decision-making process, because it is the decision we aim for. Remember that we are only testing the null hypothesis because we think it is wrong. Deciding to reject a false null hypothesis, then, is the power, inasmuch as we learn the most about populations when we accurately reject false notions of truth. This decision is the most published result in behavioral research. The power in hypothesis testing is the probability of rejecting a false null hypothesis; specifically, it is the probability that a randomly selected sample will show that the null hypothesis is false when the null hypothesis is indeed false.

The possible outcomes are summarized below for the null hypothesis (Treatment A = Treatment B):

Decision based on                 Population: H0 true      Population: H0 false
inferential statistical test      (no difference)          (difference)

Accept H0 (no difference)         Correct decision         Type II error (beta (β) error)
Reject H0 (difference)            Type I error (alpha (α) error)    Correct decision: Power (1 - β)
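The four cells of the decision table can be illustrated with a small simulation. The sketch below is illustrative only: the sample size, standard deviation, and means are assumed values, not taken from the text. When the null hypothesis is true, a two-tailed z-test at alpha = .05 should reject in roughly 5% of samples (the Type I error rate); when it is false, the rejection rate estimates the test's power (1 - β).

```python
import random
from math import sqrt

random.seed(1)
n, sigma, mu0, z_crit = 30, 15.0, 100.0, 1.96
trials = 10_000

def rejection_rate(true_mean):
    """Fraction of simulated samples in which |z| > 1.96, i.e. H0: mu = 100 is rejected."""
    rejections = 0
    for _ in range(trials):
        xbar = sum(random.gauss(true_mean, sigma) for _ in range(n)) / n
        z = (xbar - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

type1_rate = rejection_rate(100.0)   # H0 true: rate is close to alpha = .05
power_est = rejection_rate(110.0)    # H0 false: rate is well above .05
print(type1_rate, power_est)
```

Changing the true mean passed to `rejection_rate` moves the simulation between the two columns of the table above.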


CHAPTER 9: EFFECT SIZE, POWER, AND SAMPLE SIZE


EFFECT SIZE

To compute the power of a test, one offers an alternative view about the "true" value of the population parameter, assuming that the null hypothesis is false. The effect size is the difference between the true value and the value specified in the null hypothesis:

Effect size = True value - Hypothesized value

For example, suppose the null hypothesis states that a population mean is equal to 100. A researcher might ask: What is the probability of rejecting the null hypothesis if the true population mean is equal to 90? In this example, the effect size would be 90 - 100, which equals -10.

POWER

The probability of not committing a Type II error is called the power of a hypothesis test. The power of a statistical hypothesis test measures the test's ability to reject the null hypothesis when it is actually false, that is, to make a correct decision. It is calculated by subtracting the probability of a Type II error from 1, usually expressed as:

Power = 1 - P(Type II error) = 1 - β

The maximum power a test can have is 1; the minimum is 0. Ideally, we want a test to have high power, close to 1.
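The example above (H0: mu = 100, true mean 90) can be carried through to a power figure. The sketch below assumes values the text does not give: a known population standard deviation of 20, a sample of 25, and a two-tailed z-test at alpha = .05.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

mu0, mu_true = 100, 90
sigma, n = 20, 25
z_crit = 1.96                      # two-tailed critical z for alpha = .05

effect = mu_true - mu0             # effect size = true value - hypothesized value = -10
se = sigma / sqrt(n)               # standard error of the mean = 4

# Power of the two-tailed z-test = P(reject H0 when the true mean is 90)
power = phi(-z_crit - effect / se) + phi(-z_crit + effect / se)
beta = 1 - power                   # probability of a Type II error
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

With these assumed numbers the test has power of roughly 0.7, so about a 30% chance of a Type II error; larger samples or larger effects push the power toward 1.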


FACTORS THAT AFFECT POWER

The power of a hypothesis test is affected by three factors.

Sample size (n). Other things being equal, the greater the sample size, the greater the power of the test.

Significance level (α). The higher the significance level, the higher the power of the test. If you increase the significance level, you reduce the region of acceptance. As a result, you are more likely to reject the null hypothesis. This means you are less likely to accept the null hypothesis when it is false; i.e., less likely to make a Type II error. Hence, the power of the test is increased.

The "true" value of the parameter being tested. The greater the difference between the "true" value of a parameter and the value specified in the null hypothesis, the greater the power of the test. That is, the greater the effect size, the greater the power of the test.

One advantage of knowing effect size, d, is that its value can be used to determine the power of detecting an effect in hypothesis testing. The likelihood of detecting an effect, called power, is critical in behavioral research because it lets the researcher know the probability that a randomly selected sample will lead to a decision to reject the null hypothesis, if the null hypothesis is false. In this section, we describe how effect size and sample size are related to power.
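The relationship between sample size and power described above can be shown numerically. This is a sketch under assumed conditions (a two-tailed z-test at alpha = .05 with a fixed effect size d = 0.5); the numbers are illustrative, not from the text.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def z_test_power(d, n, z_crit=1.96):
    """Approximate power of a two-tailed z-test, where d = (mu_true - mu0) / sigma."""
    shift = d * sqrt(n)
    return phi(shift - z_crit) + phi(-shift - z_crit)

for n in (10, 25, 50, 100):
    print(f"n = {n:3d}: power = {z_test_power(0.5, n):.3f}")
```

Power rises steadily with n at a fixed effect size, which is why sample size planning and power analysis are done together.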


CHAPTER 10: FACTORS AFFECTING THE CHOICE OF ALPHA AND POWER

Now, what if you're researching a drug, you want to know whether the drug is effective, and the drug has nasty side effects. Would that affect the alpha error you choose? The alpha error tells you what your chances are of concluding the drug is effective when it's really not; it tells you what your chances of a false positive are. If the drug has really nasty side effects, would you want to increase or decrease your alpha? Would you want to set it higher than 5% or lower than 5%? You'd want to lower your alpha to under 5%. You want to reduce the chance of falsely concluding this is a good drug when it's not, because it has nasty side effects. So 5% is the convention, but if you have good reasons for increasing or decreasing it, by all means do so.

On the other hand, if you are starting a new program of research, the drug has no harmful side effects, and you want to reduce the chances of missing an important effect, especially since at this point your procedures may be relatively unrefined, then you may want to increase your alpha level to, say, .10. That is, your experiment is designed in a way that you have a 10% chance of a false positive. It doesn't matter if you misapply this drug; it's not going to hurt anybody. So, on the one hand, you may want a .01 or even a .001 alpha level if the drug has nasty side effects. Or you may want a .10 alpha level if you are doing a pilot study.

What if this drug is for a horrific disease, a crippling or life-threatening disease? Well, you don't want to do an experiment that causes you to miss a good drug. Assume it is a very devastating disease. You've got a chance to do something about it, so you want to make sure that if the drug works you don't miss it. You could try to reduce the beta error to .1 instead of .2, that is, increase your power from .8 to .9, or maybe .95, whatever it takes. Of course, increasing alpha increases power, so that is one of your alternatives.

These numbers are important for you to interpret, not the statistician, because only you know the medical aspects of the treatments you're using, how devastating the disease is, or how painful the side effects might be. You are the person who knows these issues. The statistician then builds an experiment to guarantee that your alpha level is what you want it to be and that your beta error, or your power, is what you want it to be. You'll be interacting with the statistician on these kinds of issues, but you'll have to use your medical knowledge to decide these sorts of things.

The figure below shows what alpha, beta, and power look like in a graph and illustrates some of the relationships between them. Remember, alpha and beta represent the probabilities of Type I and Type II errors, respectively.

Figure 3.3: Alpha and Beta Errors

Since power is 1 - beta, the area to the right of 107.5 under the right curve represents your power in this experiment. It should be apparent from the graph that alpha, beta, and power are closely related. In particular, you can see that reducing alpha is equivalent to moving the vertical line between the two sample means to the right. When you do this, alpha decreases, power (1 - beta) decreases, and beta increases. On the other hand, moving that same vertical line to the left increases alpha, increases power, and decreases beta. To put it another way, increases in alpha increase power, and decreases in alpha decrease power.
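The alpha-power relationship just described can also be seen numerically. The sketch below assumes a one-sided z-test with effect size d = 0.5 and n = 25; these are illustrative numbers, not values from the text.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# One-sided critical z values for common alpha levels (standard z-table values)
z_crit = {0.01: 2.326, 0.05: 1.645, 0.10: 1.282}

d, n = 0.5, 25
# One-sided power = P(Z > z_crit) when Z is shifted by d * sqrt(n)
powers = {a: phi(d * sqrt(n) - z) for a, z in z_crit.items()}
for a in (0.01, 0.05, 0.10):
    print(f"alpha = {a:.2f} -> power = {powers[a]:.3f}")
```

Raising alpha from .01 to .10 raises the power substantially, which is exactly the tradeoff the graph illustrates: a laxer criterion rejects more often, whether the null hypothesis is true or false.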

CHAPTER 11: HYPOTHESIS TEST FOR A MEAN


This chapter explains how to conduct a hypothesis test of a mean, when the following conditions are met:

The sampling method is simple random sampling. The sample is drawn from a normal or near-normal population.

Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply.

The population distribution is normal.

The sampling distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.

The sampling distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.

The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.


The table below shows three sets of hypotheses. Each makes a statement about how the population mean μ is related to a specified value M. (In the table, the symbol ≠ means "not equal to".)

Set    Null hypothesis    Alternative hypothesis    Number of tails
1      μ = M              μ ≠ M                     2
2      μ > M              μ < M                     1
3      μ < M              μ > M                     1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.

Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.

Test method. Use the one-sample t-test to determine whether the hypothesized mean differs significantly from the observed sample mean.

Analyze Sample Data

Using sample data, conduct a one-sample t-test. This involves finding the standard error, degrees of freedom, test statistic, and the P-value associated with the test statistic.

Standard error. Compute the standard error (SE) of the sampling distribution.

SE = s * sqrt( (1/n) * (1 - n/N) * (N / (N - 1)) )

where s is the standard deviation of the sample, N is the population size, and n is the sample size. When the population size is much larger (at least 10 times larger) than the sample size, the standard error can be approximated by:

SE = s / sqrt( n )

Degrees of freedom. The degrees of freedom (DF) are equal to the sample size (n) minus one. Thus, DF = n - 1.

Test statistic. The test statistic is a t-score (t) defined by the following equation.

t = (x̄ - μ) / SE

where x̄ is the sample mean, μ is the hypothesized population mean in the null hypothesis, and SE is the standard error.

P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a t-score, use the t Distribution Calculator to assess the probability associated with the t-score, given the degrees of freedom computed above.
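The two standard-error formulas listed above can be checked with a small numeric sketch. The values s = 10, n = 50, N = 300 are assumed for illustration; they are not from the text.

```python
from math import sqrt

s, n, N = 10.0, 50, 300

# Exact SE with the finite-population correction
se_exact = s * sqrt((1 / n) * (1 - n / N) * (N / (N - 1)))

# Simple approximation, appropriate when N is at least 10 times n
se_approx = s / sqrt(n)

print(f"exact SE = {se_exact:.4f}, approximate SE = {se_approx:.4f}")
```

Here N is only six times n, so the finite-population correction noticeably shrinks the standard error relative to the simple s / sqrt(n) approximation.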

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
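The four steps can be put together in a worked sketch. The data below are made up for illustration; we test H0: μ = 100 against Ha: μ ≠ 100 at alpha = .05, comparing the t statistic against the standard table value for df = 9 rather than computing an exact P-value (the decision is equivalent).

```python
from math import sqrt
from statistics import mean, stdev

sample = [96, 104, 110, 101, 99, 112, 108, 95, 103, 107]
mu0 = 100
t_crit = 2.262                       # two-tailed critical value, alpha = .05, df = 9 (t table)

n = len(sample)
se = stdev(sample) / sqrt(n)         # SE = s / sqrt(n), large-population approximation
t_stat = (mean(sample) - mu0) / se   # t = (x-bar - mu) / SE
df = n - 1                           # degrees of freedom = n - 1

print(f"t = {t_stat:.3f}, df = {df}")
if abs(t_stat) > t_crit:
    print("Reject H0: the sample mean differs significantly from 100.")
else:
    print("Retain H0: no significant difference detected at alpha = .05.")
```

With this particular sample the t statistic falls short of the critical value, so the null hypothesis is retained even though the sample mean (103.5) is above 100: an observed difference is not automatically a significant one.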


CHAPTER 12: CONCLUSION


Hypothesis testing is the sheet anchor of empirical research and of the rapidly emerging practice of evidence-based medicine. However, empirical research and, ipso facto, hypothesis testing have their limits. The empirical approach to research cannot eliminate uncertainty completely; at best, it can quantify uncertainty. This uncertainty can be of two types: Type I error (falsely rejecting a null hypothesis) and Type II error (falsely accepting a null hypothesis). The acceptable magnitudes of Type I and Type II errors are set in advance and are important for sample size calculations. Another important point to remember is that we cannot prove or disprove anything by hypothesis testing and statistical tests. We can only knock down or reject the null hypothesis and, by default, accept the alternative hypothesis. If we fail to reject the null hypothesis, we accept it by default.


CHAPTER 13: RECOMMENDATIONS


Here are some things you should remember about hypothesis testing:

Hypotheses are never about the samples. We can see what's true in the samples. Hypotheses are always about the population or "general case." On the other hand, when we do a hypothesis test, reject the null hypothesis, and conclude that means are "significantly different," THAT is a statement about the samples.

Data are not significant. Results are not significant. The analysis is not significant. DIFFERENCES are significant. (Sometimes we also say that effects are significant, meaning that the effect indicates a significant difference.) A significant difference means that the observed difference in some statistic between two (or more) samples is PROBABLY NOT DUE TO RANDOM CHANCE. It does not mean that this difference was caused by the independent variable, and it most certainly does not "prove" that the groups are different or that the hypothesis is correct. A significant difference (or effect) "supports" or "confirms" the experimental hypothesis.

Never, EVER say that the hypothesis has been proven as the result of a single hypothesis test. Remember, the hypothesis is about the population, and we have not seen the population. We've only seen a small piece of it: the sample. What's true or false in the sample does not prove anything about the population.

A hypothesis test may lead you to the wrong conclusion in two ways, as follows:

You may conclude that the null hypothesis is false when, in fact, it is true. This is called a Type I error. If the null hypothesis is, in fact, true, the probability of committing a Type I error is determined by (and equal to) the alpha level, usually .05. That means 5%, or 1 in 20, of true null hypotheses end up being rejected by hypothesis tests!

That's the nature of the beast. There is nothing you can do about it (other than lower the alpha level, which has other unfortunate consequences).

You may conclude that the null hypothesis is true when, in fact, it is false. That is, you may claim not to see an effect that is really there. This is called a Type II error. If the null hypothesis is, in fact, false, then the probability of committing a Type II error is called beta. The ability of a hypothesis test to find an effect that is really there is called the power of the test and is equal to 1-beta. If you decrease the alpha level of a test in order to avoid making a Type I error, you will generally increase beta and, therefore, decrease the power of the test to find an effect that really is there. You can't have it both ways. Type I and Type II errors are generally traded off.

The most important thing you can do to increase the power of a test is to increase the sample size. Small sample sizes generally mean small power. Sample size DOES NOT affect the Type I error rate. You are NOT more likely to make a Type I error because of a small sample size.

If you conduct more than one hypothesis test at alpha=.05, the overall (or "family wise") Type I error rate obeys the simple laws of probability. The more tests you conduct, the more likely you are to commit a Type I error on at least one of them. If you do 20 tests and find only 1 significant difference, that one is very likely a Type I error.
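The familywise error rate mentioned above follows directly from the laws of probability: with k independent tests, each at alpha = .05, the chance of at least one Type I error is 1 - (1 - alpha)^k. A short sketch:

```python
alpha = 0.05
for k in (1, 5, 20):
    # P(at least one false positive) = 1 - P(no false positives in k tests)
    familywise = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests -> P(at least one Type I error) = {familywise:.3f}")
```

At 20 tests the familywise rate is already around 64%, which is why a single significant result out of many tests deserves suspicion.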


CHAPTER 14: BIBLIOGRAPHY


The websites or links referred to are as follows:

http://www.zdnet.com/testing/stories/main/0,10475,2636557,00.html
http://www.gazhoo.com/doc/201105281211242104/Project+Report+on+hypothesis+testing
www.scribd.com
www.slideshare.net
www.eprimers.org
http://nrega.nic.in/circular/So testing hypothesisI.pdf

