ISP Handout
The purpose of inferential statistics is to draw a conclusion (an inference) about conditions that
exist in a population (complete set of observations) from the study of a sample (a subset) drawn
from the population. The population includes all observations (e.g., students’ scores, people’s incomes, etc.) in which the researcher is interested. Usually, the populations we are interested in are either
too big to study in their entirety, or are inaccessible to the researcher. The practical consequence is
that we seldom if ever measure entire populations. Instead, we draw a sample, a carefully chosen
subset of the population, and use that sample to infer something about the characteristics of the
population. Thus, inferential statistics allows us to use sample data to reach valid conclusions that extend beyond the dataset we have analyzed.
Typically, the research process begins with a question about a population parameter. However, data
is obtained from a subset of the population and a sample statistic is computed from the data to make
an inference about the population parameter. A parameter is a descriptive index of the population. It may be a mean, a median, a standard deviation, a correlation coefficient, or any of a number of other statistical measures that define the population. A statistic is a descriptive index of the sample. Any of these same measures, when calculated from a sample of data that we have collected, is a statistic. Parameters are the real entities of interest, and the corresponding statistics are guesses at
reality. Thus, the aim of statistical inference is to infer or form a conclusion about the characteristics
of the population (parameter) from what we know about the characteristics of the sample (statistic).
Inferences are basically educated guesses, based on a probability model, for making better-than-chance predictions.
[Fig. 1: The steps used in statistical inference]
Inferential statistics is often used to compare the differences between the treatment groups and
make generalizations about the larger population of subjects. For example, suppose an investigator
wishes to test if a drug has an effect on the speed of learning. It is impossible to administer the drug
to everyone in the population, so she selects two samples at random and administers the drug to one
of the groups and a placebo to the other group. She gives both groups a learning task and finds that
the average learning scores of the two groups differ. Using inferential procedures, if she finds that
the obtained difference cannot be accounted for by chance variation, she will infer that the drug
improves the speed of learning for the population at large.
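As a concrete illustration of this drug-versus-placebo comparison, the sketch below runs an independent-samples t-test in Python with SciPy. The learning scores are invented for the example, and the t-test stands in for whatever inferential procedure the investigator chooses; this is a minimal sketch, not a prescribed analysis.

```python
# A minimal sketch of the drug vs. placebo comparison described above.
# The scores are hypothetical; scipy.stats.ttest_ind performs an
# independent-samples t-test on the two groups.
from scipy import stats

drug_scores    = [78, 85, 90, 74, 88, 81, 92, 79]   # hypothetical learning scores
placebo_scores = [70, 75, 72, 68, 80, 71, 77, 69]

t_stat, p_value = stats.ttest_ind(drug_scores, placebo_scores)

# If the observed difference is unlikely to be due to chance (small p-value),
# we infer that the drug affects the speed of learning in the population.
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```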
Inferential procedures are also used when exit polls are conducted to predict the outcome of an
election. It is not practical to ask every single citizen how they intend to vote. Instead, a relatively
small number of citizens are asked about their voting preferences, and inferences are drawn about
the outcome of the election from their responses.
Hypothesis Testing
Hypothesis testing is a means of statistical inference in which a specific value of the population
parameter is hypothesized by the researcher in advance, and is tested by using sample statistics. The
final statement concerns a derived score (such as t or F), on the basis of which a decision is made about retaining or rejecting the hypothesized value. Thus, hypothesis testing is used to determine whether the population parameter differs from some hypothesized value. Statistical tests used for hypothesis testing include the t-test, Analysis of Variance (ANOVA), and the chi-square test.
Estimation
In estimation, unlike hypothesis testing, no value is specified in advance. The question is, “What is
the value of the population parameter?” For some kinds of research questions, such as those
ascertaining what percentage of voters prefer a particular candidate, hypothesis testing is useless; no
specific hypothesis presents itself. On the other hand, estimation procedures are exactly suited to
such problems. The researcher uses these procedures to directly estimate the true value of the
population parameter from a sample statistic.
Estimation may be point or interval. Point estimates are single values, calculated from the sample, that estimate the parameter. For example, what percentage of voters will vote for candidate X?
What is the mean CUET score of applicants for admission to Delhi University? If it is impractical
to find the percentage in the entire population (as in the first example) or the mean of the entire
population (as in the second example), we can make an estimate of the population characteristic
from a random sample.
Interval estimates are estimations of a range of values within which the parameter is expected to
fall. In the question about candidate X, a point estimate might state that 49% of the population of
voters favor him or her. If we made an interval estimate, the outcome might state that we are 95%
confident that no less than 46% and no more than 52% of the voters (in the population) favor him or
her.
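The interval in the candidate X example can be computed with the normal approximation for a proportion. The sketch below assumes a hypothetical sample of 1,000 voters, 49% of whom favor the candidate; the figures are illustrative only.

```python
# A minimal sketch: 95% confidence interval for a proportion using the
# normal approximation. The sample figures are hypothetical.
import math
from scipy import stats

n = 1000        # hypothetical sample size
p_hat = 0.49    # point estimate: 49% of the sample favor candidate X

z = stats.norm.ppf(0.975)                 # critical value for 95% confidence
se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion

lower, upper = p_hat - z * se, p_hat + z * se
print(f"Point estimate: {p_hat:.0%}")
print(f"95% interval estimate: {lower:.1%} to {upper:.1%}")
```

With these figures the interval works out to roughly 46% to 52%, matching the example above.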
Parametric and Nonparametric Statistics
The statistical techniques for data analysis and inference in behavioural sciences can be categorized
as parametric and nonparametric statistical tests.
Parametric Tests
Parametric statistical tests specify certain conditions about the nature of the populations from which
the research sample or the observations are drawn. These procedures assume that the underlying
population distributions approximately resemble the normal distribution, and have roughly equal
variances (homogeneity of variance). Moderate departures from normality and even substantial
departures from homogeneity of variance may not seriously invalidate these tests when sample size
is moderate to large. However, problems can arise when the variables are distributed in a manner
that differs substantially from normality, and when sample size is small. In such cases, parametric
tests are of dubious value or not applicable at all.
Further, proper interpretation of parametric tests based on the normal distribution assumes that the
samples are randomly drawn from a population consisting of observations on an interval or ratio
measurement scale. But many real-world variables can be measured only in terms of an ordinal or a
nominal scale. By using a parametric test in such cases where the measurement is weaker than that
of an interval scale, the researcher would "add information" and thereby create distortions that are
quite damaging.
Nonparametric Tests
Nonparametric tests are a class of statistical procedures that rely on no or few assumptions about
the nature of the population distributions from which the data were drawn, and can be applied to
measurement weaker than that of an interval scale. These procedures are referred to as
nonparametric statistics because the statistical hypotheses that are tested are not about parameters
such as the mean. Instead, they utilize some simple aspects of sample data, such as the signs of
measurements, order relationships or category frequencies, and state hypotheses that are less
specific.
They are also called distribution-free, or assumption-free, methods because they make less restrictive assumptions about the shape of the population distributions from which the samples were drawn. Certain assumptions are associated with most nonparametric tests, namely, that the observations are independent and that the variable under study has underlying continuity; but these assumptions are fewer, weaker and less stringent than those associated with parametric tests. In fact,
nonparametric tests are used when assumptions of parametric tests, such as normal distribution and
homogeneity of variance, are doubtful.
Moreover, nonparametric tests may be applied appropriately to data measured in an ordinal scale, or
a nominal scale. Many of these tests are used to analyze data that can be expressed only in ranks
and even data that can be categorized only as plus or minus (better or worse, more or less). For
example, in studying anxiety, we may be able to state that subject ‘A’ is more anxious than subject
‘B’ without knowing at all how much more anxious ‘A’ is. Such data cannot be treated by parametric methods unless precarious and perhaps unrealistic assumptions are made about the underlying measurement scale. Further, nonparametric methods are available to treat data which are simply classificatory or categorical, i.e., measured on a nominal scale. There are no parametric
techniques that apply to such data.
In addition, these measures are typically quite easy to use and require substantially less computation
than their parametric counterparts. However, nonparametric techniques do not have the power of parametric tests, i.e., they are less able to detect a true difference when one is present. In fact, a nonparametric test will require a slightly larger sample size to have the same power as the corresponding parametric test. Therefore, nonparametric tests should not be used when parametric tests are applicable, i.e., when the assumptions for the latter are met.
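To make the contrast concrete, the sketch below (with invented data) runs a parametric independent-samples t-test alongside a common nonparametric counterpart, the Mann-Whitney U test, which uses only the rank order of the scores. The table that follows then summarizes the differences between the two families of tests.

```python
# A minimal sketch contrasting a parametric test with a nonparametric
# counterpart on the same (hypothetical) two-group data.
from scipy import stats

group_a = [12, 15, 14, 10, 18, 13, 16]
group_b = [9, 11, 8, 13, 10, 7, 12]

# Parametric: assumes roughly normal populations with equal variances.
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Nonparametric: uses only the rank order of the observations, so it
# requires no normality assumption and tolerates ordinal measurement.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t-test:        t = {t_stat:.2f}, p = {t_p:.3f}")
print(f"Mann-Whitney:  U = {u_stat:.1f}, p = {u_p:.3f}")
```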
| Parametric Tests | Nonparametric Tests |
|---|---|
| Used to test statistical hypotheses about specific population parameters, such as the mean. | Test statistical hypotheses that do not involve population parameters. They utilize simple aspects of sample data, such as the signs of measurements, order relationships or category frequencies, and state hypotheses that are less specific. |
| Have stringent assumptions about the underlying population distribution, such as normality and homogeneity of variance. | Have no stringent distribution assumptions; can be used with non-normal distributions and groups with unequal variances. |
| More power-efficient when all of the assumptions of the parametric statistical model are met in the data. | Less power-efficient, i.e., less able to detect a true difference when one is present. |
| Output is easy to interpret, and more conclusions can be drawn from the results; however, it can be challenging to understand their workings. | Typically much easier to compute, but more difficult to interpret. Many of the tests use rankings of the values in the data rather than the actual data, and knowing the difference in mean ranks between two groups does not really help our intuitive understanding of the data. |
| A large sample size is required. Theoretically, the sample size should be more than 25 to 30 so that the central limit theorem can come into effect. | Can be used when the sample size is very small (e.g., samples of individuals with a rare psychopathology). This usefulness with small samples can also help a researcher collecting pilot study data. |
| Have been systematized, and different tests are simply variations on a central theme. | Are not systematic, i.e., not based on common themes, and the computational formulae are quite different for different tests. Also, the tables necessary to implement nonparametric tests are scattered widely and appear in different formats. |
On the whole, we can say that both parametric and nonparametric tests are important and useful
statistical techniques for data analysis and inference in behavioural sciences. The choice of a
particular test for use in making a decision depends on two criteria:
● The applicability or validity of the test (which includes the level of measurement and other
assumptions of the test)
● The power and efficiency of the test
When data are measured on at least an interval scale and all the assumptions appropriate to the parametric statistical model are satisfied, parametric tests are more powerful and applicable. On the other hand, when the measurement is weaker than that of an interval scale and the assumptions concerning the population are doubtful, nonparametric tests are more appropriate.
| Purpose | Parametric Test | Nonparametric Test |
|---|---|---|
| Compare two dependent groups | Dependent-samples t-test | Sign test (two-sample); Wilcoxon signed-rank test |
| Compare three or more dependent groups | One-way ANOVA (repeated measures) | Friedman’s rank test for correlated samples |
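As a sketch of the nonparametric pairings in the table above, the snippet below applies the Wilcoxon signed-rank test to two dependent sets of scores and Friedman's rank test to three repeated measurements. The before/after and time-point scores are hypothetical; SciPy provides both tests.

```python
# A minimal sketch of the nonparametric tests for dependent groups
# listed above, using hypothetical repeated-measures data.
from scipy import stats

before = [22, 25, 17, 24, 16, 29, 20, 23]
after  = [19, 22, 16, 22, 17, 24, 18, 20]

# Two dependent groups: Wilcoxon signed-rank test on the paired scores.
w_stat, w_p = stats.wilcoxon(before, after)
print(f"Wilcoxon: W = {w_stat:.1f}, p = {w_p:.3f}")

# Three or more dependent groups: Friedman's rank test for correlated samples.
time1 = [10, 12, 14, 11, 13, 15]
time2 = [12, 14, 15, 13, 14, 17]
time3 = [14, 15, 18, 15, 16, 19]
f_stat, f_p = stats.friedmanchisquare(time1, time2, time3)
print(f"Friedman: chi2 = {f_stat:.2f}, p = {f_p:.3f}")
```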
| One-Tailed (Directional) Test | Two-Tailed (Non-directional) Test |
|---|---|
| The alternative hypothesis states that the population parameter differs from the value stated in H₀ in one particular direction. | The alternative hypothesis states that the population parameter may be less than or greater than the value stated in H₀. |
| Our interest is in discovering whether or not there is a difference in a particular direction. | Our interest is in discovering whether or not there is a difference, regardless of the direction of the difference. |
| The critical region (region of rejection) is located in only one tail of the distribution. | The critical region (region of rejection) is divided between both tails of the distribution. |
| It does not allow for any chance of discovering that reality is just the opposite of what the alternative hypothesis states. | It allows us to discover a difference in either direction. |
| Other things being equal, the probability of rejecting a false null hypothesis is greater for a one-tailed test (but only when the direction specified by Hₐ is correct). | Since the region of rejection is divided across the two tails of the distribution, the probability of rejecting a false null hypothesis is lower than that for a one-tailed test. |
| It is appropriate when there is no practical difference in meaning between retaining the null hypothesis and concluding that a difference exists in a direction opposite to that stated in the directional hypothesis. | It is useful for exploratory work when one is not sure of the direction of the relationship between variables. |
| For example, if the physical fitness of school children is tested, and a special training program must be instituted if performance is substandard, we may conduct a one-tailed test. | For example, if the performance of a group were compared to a known standard, we would be interested in discovering whether the group is superior or inferior, and hence conduct a two-tailed test. |
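In practice, the direction of the test is often passed directly to the statistical routine. The sketch below uses hypothetical data and the `alternative` parameter accepted by SciPy's t-test functions (available in SciPy 1.6 and later).

```python
# A minimal sketch of directional vs. non-directional tests with SciPy.
# The `alternative` parameter requires SciPy 1.6+.
from scipy import stats

treatment = [34, 38, 31, 36, 39, 33, 37]
control   = [30, 32, 29, 33, 28, 31, 30]

# Two-tailed: is there a difference in either direction?
_, p_two = stats.ttest_ind(treatment, control, alternative="two-sided")

# One-tailed: is the treatment mean greater than the control mean?
_, p_one = stats.ttest_ind(treatment, control, alternative="greater")

# When the true effect lies in the predicted direction, the one-tailed
# p-value is half the two-tailed p-value, so the test has more power.
print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```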
Hypothesis testing is a vital process in inferential statistics where the goal is to use sample data to
draw conclusions about an entire population. In hypothesis testing, we use significance levels and
p-values to determine whether the test results are statistically significant.
Alpha (α), or the level of significance, is the probability of rejecting H₀ when H₀ is true. We reject H₀ if X̅ falls in the region of rejection based on that level of significance. The level of significance is chosen by the researcher in advance of conducting the test, by considering the risk they are willing to take of committing such an error when the sample is drawn. For instance, a significance level of 0.05 signifies a 5% risk of rejecting a true H₀, i.e., of deciding that an effect exists when it does not.
However, many investigators do not report their selected level of significance; they report p-values. A p-value is the probability, when H₀ is true, of observing a sample mean as deviant as, or more deviant (in the direction specified in Hₐ) than, the obtained value of X̅. It is not established in advance and is not a statement of risk; it simply describes the rarity of the sample outcome if H₀ is true.
Investigators seldom report their sample p-values as exact figures. Instead, they commonly report them relative to the landmarks of .05 and .01, and sometimes .001. If the p-value is less than or equal to the significance level, the result is significant (i.e., H₀ is rejected) and is commonly reported as falling below the landmark. If the p-value is higher than the significance level, the result is non-significant (i.e., H₀ is retained) and is reported as falling above the landmark. For example, an obtained p-value of .023 is reported as p < .05, a p-value of .004 as p < .01, and a p-value of .38 as p > .05.
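The decision rule amounts to a simple comparison, as in the sketch below, where the alpha of .05 and the p-value are illustrative assumptions.

```python
# A minimal sketch of the decision rule: compare the obtained p-value
# with the significance level chosen in advance. Values are illustrative.
alpha = 0.05      # level of significance chosen before the test
p_value = 0.023   # hypothetical p-value from some test statistic

if p_value <= alpha:
    print(f"p = {p_value} <= {alpha}: reject H0 (report as p < .05)")
else:
    print(f"p = {p_value} > {alpha}: retain H0 (report as p > .05)")
```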
Unit II
The t-test for independent samples is used when the samples are drawn so that the selection of elements in one sample is in no way influenced by the selection of elements in the other, and vice versa. The assumptions associated with the t-test for independent samples are as follows:
1. Each sample is drawn at random from its respective population. In the real world, it is
usually impossible for investigators to work with samples that are randomly drawn from the
population of interest. Most work with convenient samples. For example, biomedical
researchers rely on volunteers for subjects. Faculty members who do research with humans
often use students who are currently enrolled in courses they teach.
In the case of testing hypotheses about the difference between two means, the random
assignment of available subjects is generally used as a substitute for random sampling. For
most situations in which groups are formed by random assignment, application of the
random sampling model will lead to the same statistical conclusion as would result from
application of the proper model.
2. Samples are drawn with replacement. However, it is common practice to sample without replacement. The consequent error in inference is quite small as long as the sample size is a small fraction of the population size.
3. The sampling distribution of (X̅ − Y̅) follows the normal curve. Strictly speaking, the sampling distribution will be normal only when the distribution of the population of scores
is also normal. However, according to the Central Limit Theorem, the sampling distribution
of the mean tends towards normality even when the population of scores is not normal. The
strength of this tendency towards normality is pronounced unless sample size is quite small.
For example, a moderate degree of skewness can be tolerated if the sample size is, say 25, or
more. When we have a serious question about the normality of the parent populations, a
non-parametric test will be more appropriate.
4. Homogeneity of variance - This assumption states that the variances of scores in the two treatment groups are homogeneous, i.e., the variances of the two groups are equal. At first, this
assumption seems formidable. In practical application, however, there is help from several
quarters:
(i) Violation of the assumption causes less disturbance when samples are large than when they are small. Thus, a moderate departure from homogeneity of variance will have little effect when the two samples consist of 25 or more observations.
(ii) Heterogeneity of variance ordinarily becomes a problem when sample sizes differ considerably. Thus, the problem created by heterogeneity of variance can be minimized when the two samples are of equal (or nearly equal) size.
On the whole, to combat non-homogeneity of variance, the best bet is to select samples of equal (or approximately equal) size; and the larger, the better.
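One practical check on the homogeneity assumption, sketched below with invented data, is Levene's test; when the variances look unequal, Welch's version of the t-test (SciPy's `equal_var=False`) is a common fallback. This is a minimal sketch, not a required procedure.

```python
# A minimal sketch: checking homogeneity of variance before an
# independent-samples t-test, using hypothetical data.
from scipy import stats

group1 = [15, 18, 14, 17, 16, 19, 15, 18]
group2 = [10, 25, 8, 27, 12, 23, 9, 26]   # visibly more spread out

# Levene's test: H0 is that the two population variances are equal.
lev_stat, lev_p = stats.levene(group1, group2)
print(f"Levene: W = {lev_stat:.2f}, p = {lev_p:.3f}")

if lev_p <= 0.05:
    # Variances look unequal: use Welch's t-test (no pooled variance).
    t_stat, p = stats.ttest_ind(group1, group2, equal_var=False)
else:
    t_stat, p = stats.ttest_ind(group1, group2)
print(f"t = {t_stat:.2f}, p = {p:.3f}")
```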
Unit III
The ratio of between-groups to within-groups variance is distributed as F if the assumptions underlying ANOVA are satisfied. If the assumptions are not sufficiently approximated, the sampling distribution of mean-square ratios may differ from the F-distribution, and conclusions based on the F-test may not be valid. These assumptions are as follows:
1. Normality - This assumption states that scores comprising the treatment groups should be
sampled from normally distributed populations. Moderate departure from the normal
distribution does not unduly disturb the outcome of the test because the Central Limit
Theorem states that the sampling distribution of the mean tends towards normality even
when the population of scores is not normal. This is especially true when the sample size
increases.
Further, with highly skewed distributions, ANOVA results in less accurate probability
statements. When samples are quite small, and there is serious question about the
assumption of normality, one possible alternative is the Kruskal-Wallis non-parametric
one-way ANOVA.
2. Homogeneity of Variance - This assumption states that the variances of scores in each of the 'k' treatment groups are homogeneous, i.e., the variances of the individual groups are equal.
At first, this assumption seems formidable. In practical application, however, there is help
from several quarters:
(i) Violation of the assumption causes less disturbance when samples are large than when they are small. Thus, a moderate departure from homogeneity of variance will have little effect when the samples for each treatment condition consist of 25 or more observations.
3. Independence of Samples - This is an essential assumption and it states that the selection
of elements comprising any particular sample is independent of the selection of elements in
any other sample i.e. when observations are made, any single score in any particular
treatment condition is independent of all other scores.
Generally, positive correlations between the different sets of scores tend to inflate the Type I error rate, while negative correlations tend to deflate it. Therefore, the appropriate procedure in ANOVA is to obtain only one observation from each subject and to assign subjects at random to the different treatment conditions.
So, in the real world, most researchers use random assignment of available members of the
population to the different treatment conditions. If the subjects are randomly assigned to the
treatment conditions, it is contended that the different treatment groups are all random
samples from the population. On the whole, random sampling is more of a premise than a
requirement.
4. Sampling with Replacement - This assumption states that samples for the different treatment groups are drawn from the population at random, using a sampling-with-replacement plan. However, it is common practice to sample without replacement. The consequent error
in inference is quite small as long as the sample size is a small fraction of the population
size.
In sum, it can be said that moderate departures from normality and homogeneity of variance do not seriously affect the appropriateness of the F-test, and resistance to such disturbances increases with sample size. Problems arise mainly when heterogeneous variances are accompanied by unequal sample sizes, so unequal samples should be avoided whenever possible. In fact, samples of equal size make computation a little simpler, minimize the effect of failing to satisfy the condition of homogeneity of variance, and, for a given total, minimize the probability of committing a Type II error. The assumption of independence of scores is essential; in practice, it is achieved by random assignment of available subjects to the different treatment conditions. In the case of repeated measures on the same subjects, or when matched subjects have been assigned to treatment groups, this assumption does not hold and a different procedure is used for such designs.
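As a closing sketch, the snippet below runs a one-way ANOVA on three hypothetical treatment groups, computing the F-ratio of between-groups to within-groups variance, with the Kruskal-Wallis test mentioned earlier shown as the nonparametric fallback when normality is in serious doubt. The data are invented for illustration.

```python
# A minimal sketch: one-way ANOVA on three hypothetical treatment groups,
# with the Kruskal-Wallis test as the nonparametric alternative.
from scipy import stats

treatment1 = [23, 25, 21, 27, 24, 26]
treatment2 = [30, 28, 33, 29, 31, 32]
treatment3 = [20, 22, 19, 24, 21, 23]

# Parametric: F = between-groups variance / within-groups variance.
f_stat, f_p = stats.f_oneway(treatment1, treatment2, treatment3)
print(f"ANOVA:          F = {f_stat:.2f}, p = {f_p:.4f}")

# Nonparametric alternative when the normality assumption is doubtful.
h_stat, h_p = stats.kruskal(treatment1, treatment2, treatment3)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {h_p:.4f}")
```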