Introducing Inferential Statistics


IAN REGGY B. PARING
TARGETS OF THE SESSION
1. Explain the difference between descriptive and inferential statistics.
2. Define the central limit theorem and explain why it is important to the
world of inferential statistics.
3. List the steps in completing a test of statistical significance.
4. Explain and run a test for determining the normality of a data set.
5. Discuss the basic types of statistical tests and how they are used.
FRAMEWORK OF THE SESSION
Discussion, Sample Calculation, Workshop

Say Hello to Inferential Statistics
- Descriptive statistics provide basic measures (such as the mean and standard deviation) of a distribution of scores.
- Inferential statistics allow inferences about a larger population to be drawn from a sample.
How does inference work?
1. Representative samples from two groups are selected.
2. Participants are selected.
3. The means for each group are compared.
4. Researchers decide whether the measured difference or relationship between the groups is the result of chance or reflects a true difference.
5. A conclusion is drawn regarding the result/s.
The Central Limit Theorem (CLT)
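- The means of samples drawn from a population will be approximately normally distributed when the samples are large enough, even if the population itself is not normally distributed.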
- In many situations, when independent random variables
are summed up, their properly normalized sum tends
toward a normal distribution even if the original variables
themselves are not normally distributed.
The Idea of Statistical Significance
- Because sampling is imperfect, samples may not perfectly match the population; and
- because hypotheses cannot be tested directly on the whole population,
◦ inference is always subject to error.
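Statistical Significance
- The level of significance (α) is the degree of risk you are willing to take of rejecting the null hypothesis when it is actually true (a Type I error).
- Failing to reject the null hypothesis when it is actually false is a Type II error.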
Making a Decision
How a Test of Significance Works
- Each type of hypothesis is tested with a particular
statistic.
- Each statistic is characterized by a unique distribution of
values that are used to evaluate the sample data.
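To see what α means in practice, the sketch below (illustrative only; Python with NumPy and SciPy is assumed, and the population mean 75, SD 10, group size 20, and 5,000 simulated studies are arbitrary choices) repeats a two-sample t-test many times on groups drawn from the same population, so the null hypothesis is true every time. Each rejection is therefore a Type I error, and the share of rejections should land close to α = 0.05. It also shows that the statistic is judged against its own reference distribution: rejecting when p < α is the same as rejecting when |t| exceeds the critical value of the t distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
alpha = 0.05          # chosen level of significance
n_per_group = 20      # illustrative group size
experiments = 5000    # number of simulated studies

# H0 is true by construction: both groups come from a population with mean 75, SD 10
group1 = rng.normal(loc=75, scale=10, size=(experiments, n_per_group))
group2 = rng.normal(loc=75, scale=10, size=(experiments, n_per_group))

# One independent-samples t-test per simulated study (one row = one study)
result = stats.ttest_ind(group1, group2, axis=1)

# Every rejection here is a Type I error, so the rate should be close to alpha
type_i_rate = np.mean(result.pvalue < alpha)
print(f"Observed Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")

# Same decision via the t distribution with 2n - 2 degrees of freedom
t_critical = stats.t.ppf(1 - alpha / 2, df=2 * n_per_group - 2)
beyond_critical = np.mean(np.abs(result.statistic) > t_critical)
print(f"Share of |t| beyond the critical value {t_critical:.3f}: {beyond_critical:.3f}")
```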
Using a Statistical Test
1. State the null hypothesis, e.g., H0: μ1 = μ2.
2. Establish the level of significance (α = 0.05 or 0.01).
3. Select the appropriate test statistic.
4. Determine the critical value needed to reject the null hypothesis, which depends on:
◦ the level of significance and the degrees of freedom.
5. Compare the obtained value with the critical value.
CONVENTIONAL (critical-value) approach:
◦ If the obtained value (in absolute value, for a two-tailed test) is greater than the critical value, reject the null hypothesis.
◦ If it is smaller than the critical value, fail to reject (retain) the null hypothesis.
p-value approach:
◦ If p-value < α, reject the null hypothesis.
◦ If p-value > α, fail to reject (retain) the null hypothesis.
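The two decision rules in step 5 always reach the same conclusion. The sketch below assumes a two-tailed t-test and uses SciPy's t distribution (`scipy.stats.t`); the helper name `decide` and the example t values are hypothetical. It computes the critical value from α and the degrees of freedom (step 4), then applies the conventional rule and the p-value rule to the same obtained value.

```python
from scipy import stats


def decide(t_obtained: float, df: int, alpha: float = 0.05) -> dict:
    """Apply both decision rules of a two-tailed t-test (hypothetical helper)."""
    # Step 4: critical value that leaves alpha/2 in each tail of the t distribution
    t_critical = float(stats.t.ppf(1 - alpha / 2, df))
    # Two-tailed p-value: chance of a |t| at least this large if H0 is true
    p_value = float(2 * stats.t.sf(abs(t_obtained), df))
    return {
        "t_critical": t_critical,
        "p_value": p_value,
        # Conventional rule: reject H0 if |obtained| exceeds the critical value
        "reject_by_critical_value": abs(t_obtained) > t_critical,
        # p-value rule: reject H0 if p < alpha
        "reject_by_p_value": p_value < alpha,
    }


print(decide(t_obtained=2.5, df=28))  # beyond the critical value (about 2.048)
print(decide(t_obtained=1.2, df=28))  # inside the critical value
```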
How do we select the appropriate test statistic?
Let’s RECALL IMPORTANT PRINCIPLES
1. Data types
Data variables can be:
- Scale: measurements, numerical or count data
- Categorical: appear as categories, e.g., tick boxes on questionnaires

WWW.STATSTUTOR.AC.UK
Data types
Variables
- Scale
◦ Continuous: measurements; takes any value (Interval and Ratio)
◦ Discrete: whole number values
- Categorical
◦ Ordinal: obvious order
◦ Nominal: no meaningful order
Questionnaire for NAT Maths Pupils
What data types relate to the following questions?
➢ Q1: What is your favourite subject? (Maths, English, Science, Art, French)
➢ Q2: Gender: Male / Female
➢ Q3: I consider myself to be good at mathematics: (Strongly Disagree, Disagree, Not Sure, Agree, Strongly Agree)
➢ Q4: Score in a recent mock NAT maths exam: (score between 0% and 100%)
Questionnaire for NAT Maths Pupils: Answers
➢ Q1 (favourite subject): Nominal
➢ Q2 (gender): Binary/Nominal
➢ Q3 (agreement scale): Ordinal, or sometimes treated as Interval
➢ Q4 (exam score, 0% to 100%): Scale
2. Populations and samples
Taking a sample from a population.
Sample data ‘represents’ the whole population.
3. Assessing Normality & Identifying Outliers
Discussion
How could you tell that the data follow a Normal Distribution?
Assessing Normality
Charts can be used to informally assess whether data are normally distributed or skewed.
Statistical tests: Shapiro-Wilk Test, D’Agostino K² Test, or Anderson-Darling Test
Assessing Normality
There are 3 ways you can check the normality of the data using Excel:
1. Compare the mean and median;
2. Check the value of skewness; and
3. Analyse normality using a box-and-whisker plot (checking for the presence of outliers).
For approximately normal data, the mean and median are close, the skewness is near 0, and the box plot is roughly symmetric with no outliers.
Activity 1
Test for Normality
WORKSHOP 1

Using the data set below, determine whether the data set is normally distributed. Suppose that the data set is composed of the different test scores of your students.

78, 81, 92, 85, 67, 73, 89, 95, 79, 83, 90, 72, 86, 69, 91, 84, 76,
88, 80, 77, 94, 71, 82, 96, 70, 75, 93, 68, 87, 74
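Outside Excel, the same three informal checks plus a Shapiro-Wilk test can be sketched in Python with NumPy and SciPy (an assumption; the session itself uses Excel). The 1.5 × IQR fences are the usual box-and-whisker convention for flagging outliers, and `bias=False` makes `scipy.stats.skew` use the same adjusted formula as Excel's SKEW function.

```python
import numpy as np
from scipy import stats

# Activity 1 test scores (from the workshop slide)
scores = np.array([78, 81, 92, 85, 67, 73, 89, 95, 79, 83, 90, 72, 86, 69, 91,
                   84, 76, 88, 80, 77, 94, 71, 82, 96, 70, 75, 93, 68, 87, 74])

# 1. Compare the mean and the median: close values suggest symmetry
mean, median = scores.mean(), np.median(scores)

# 2. Skewness: values near 0 suggest a symmetric distribution
skewness = stats.skew(scores, bias=False)

# 3. Box-and-whisker check: flag points beyond 1.5 * IQR from the quartiles
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

# Formal test: Shapiro-Wilk (H0: the data come from a normal distribution)
w_stat, p_value = stats.shapiro(scores)

print(f"mean = {mean:.2f}, median = {median:.2f}, skewness = {skewness:.3f}")
print(f"fences = ({lower_fence:.2f}, {upper_fence:.2f}), outliers = {outliers.tolist()}")
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")
print("Retain H0: no evidence against normality" if p_value > 0.05
      else "Reject H0: the data depart from normality")
```

For these particular scores the mean and median are both 81.5, the skewness is 0, and no value lies outside the fences, because every whole number from 67 to 96 appears exactly once. That also means the data are flat (uniform) rather than bell-shaped, which the three informal checks cannot detect; a histogram or a formal test such as Shapiro-Wilk adds that information.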
4. Choosing summary statistics
Which average and measure of spread?
- Scale, normally distributed: Mean (Standard deviation)
- Scale, skewed data: Median (middle 50%, i.e., interquartile range)
- Categorical, ordinal: Median (middle 50%)
- Categorical, nominal: Mode
How do we select the appropriate test statistic?
Guide for Choosing Test Statistics
Comparing Means
T-test for Independent Samples (2 groups/Data)
T-test for Two Paired Samples (1 group/2 Data)
One-Way ANOVA or ANOVA: Single Factor (3 or more
groups)
Two-Way ANOVA Without replication (3 or more groups)
Two-Way ANOVA With Replication (3 or more groups)
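The guide can also be read as a small decision procedure. The function below is a hypothetical helper, not part of the session's materials: it only encodes the questions and the nonparametric alternatives listed in this session, and its `factors`, `replication`, and `predict_y` parameters are assumptions added to tell the designs apart. Situations the session does not cover return a placeholder string instead of a guess.

```python
def choose_test(goal: str, normal: bool, same_participants: bool = False,
                groups: int = 2, factors: int = 1, replication: bool = False,
                predict_y: bool = False) -> str:
    """Suggest a test by following this session's guide (hypothetical helper).

    goal: "difference" or "relationship"
    normal: True if the data are (approximately) normally distributed
    factors, replication, predict_y: assumptions added for this sketch
    """
    if goal == "relationship":
        if not normal:
            return "not covered in this session"
        # Pearson r for two variables; regression when predicting y from x
        return "Linear Regression" if predict_y else "Pearson-r Moment Correlation"
    if goal != "difference":
        raise ValueError("goal must be 'difference' or 'relationship'")
    if factors == 2:
        if not normal:
            return "Friedman Two-way ANOVA by Ranks Test"
        return ("Two-Way ANOVA With Replication" if replication
                else "Two-Way ANOVA Without Replication")
    if same_participants:
        # Same participants measured twice: a paired design
        if groups == 2:
            return "T-test for Two Paired Samples" if normal else "Wilcoxon Signed-Rank Test"
        return "not covered in this session"
    if groups == 2:
        return "T-test for Two Independent Samples" if normal else "Mann-Whitney U Test"
    if groups >= 3:
        return "One-Way ANOVA" if normal else "Kruskal-Wallis H Test"
    raise ValueError("at least two groups are needed for a comparison")


print(choose_test("difference", normal=True, groups=2))
print(choose_test("difference", normal=True, same_participants=True))
print(choose_test("difference", normal=False, groups=4))
print(choose_test("relationship", normal=True, predict_y=True))
```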
T-test for Independent Samples
(2 groups/Data)
Nature of Research Objective/s and Data:
1. Are you examining a significant difference or a relationship? → I’m looking for a Sig. Diff.
2. Is the data normally distributed? → Yes
3. Are the same participants being tested more than once? → No.
4. How many Groups? → 2 groups.
Then, you are looking for….

T-test for Two Independent Samples


To conduct a valid T-test for Two
Independent Samples:
• Data values must be independent. Measurements for one observation do not affect measurements for any other observation.
• Data in each group must be obtained via a random sample from the population.
• Data in each group are normally distributed.
• Data values are continuous (i.e., interval or ratio level).
• Determine whether the variances of the two independent groups are equal or unequal, since this decides which version of the t-test to use. (Rule of thumb: divide the larger variance by the smaller variance; if the result is smaller than 4, we can safely say that the 2 variances are equal. Alternatively, run an F-test.)
• No outliers.
Nonparametric alternative: Mann-Whitney U Test


Activity 2 Using Excel
T-test for Two Independent Samples
WORKSHOP 2

Using the data sets below, check whether each data set is normally distributed, then determine whether the mean test scores of the two sections are significantly different. Suppose that each data set is composed of the different test scores of your students.
Section A Test Scores: 82, 74, 68, 91, 65, 77, 88, 70, 79, 84, 92,
61, 73, 80, 87
Section B Test Scores: 89, 76, 70, 95, 69, 81, 90, 73, 83, 88, 94,
64, 77, 85, 92
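A minimal cross-check of Activity 2 outside Excel, assuming Python with NumPy and SciPy. It checks normality per section, applies the variance-ratio rule of thumb and a two-tailed F-test (computed from `scipy.stats.f` and intended to mirror Excel's F.TEST), then runs the independent-samples t-test and, for comparison, the Mann-Whitney U test. Treating the variances as equal only when both checks agree is a choice made for this sketch. `equal_var=True` corresponds to Excel's "t-Test: Two-Sample Assuming Equal Variances" tool, and `equal_var=False` to the "Unequal Variances" (Welch) tool.

```python
import numpy as np
from scipy import stats

# Activity 2 test scores (from the workshop slide)
section_a = np.array([82, 74, 68, 91, 65, 77, 88, 70, 79, 84, 92, 61, 73, 80, 87])
section_b = np.array([89, 76, 70, 95, 69, 81, 90, 73, 83, 88, 94, 64, 77, 85, 92])

# Normality of each group (Shapiro-Wilk; H0: the data are normal)
for name, data in (("Section A", section_a), ("Section B", section_b)):
    _, p = stats.shapiro(data)
    print(f"{name}: mean = {data.mean():.2f}, Shapiro-Wilk p = {p:.3f}")

# Equal-variance check 1: rule of thumb (larger / smaller variance < 4)
var_a, var_b = section_a.var(ddof=1), section_b.var(ddof=1)
ratio = max(var_a, var_b) / min(var_a, var_b)

# Equal-variance check 2: two-tailed F-test (H0: the variances are equal)
f_stat = var_a / var_b
df_a, df_b = len(section_a) - 1, len(section_b) - 1
f_p = 2 * min(stats.f.cdf(f_stat, df_a, df_b), stats.f.sf(f_stat, df_a, df_b))
equal_var = bool(ratio < 4 and f_p > 0.05)
print(f"variance ratio = {ratio:.2f}, F-test p = {f_p:.3f}, equal variances: {equal_var}")

# Independent-samples t-test; equal_var=False would give Welch's version
t_result = stats.ttest_ind(section_a, section_b, equal_var=equal_var)
print(f"t = {t_result.statistic:.3f}, p = {t_result.pvalue:.3f}")

# Nonparametric alternative if normality or the no-outlier assumption fails
u_result = stats.mannwhitneyu(section_a, section_b, alternative="two-sided")
print(f"Mann-Whitney U = {u_result.statistic:.1f}, p = {u_result.pvalue:.3f}")
```

For these scores the section means are about 78.07 and 81.73 and the variance ratio is about 1.07, so the equal-variance version of the t-test is used.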
T-test for Two Paired Samples (1
group/2 Data)
Nature of Research Objective/s and Data:
1. Are you examining a significant difference or a relationship? → I’m looking for a Sig. Diff.
2. Is the data normally distributed? → Yes
3. Are the same participants being tested more than once? → Yes.
4. How many Groups? → 1 group but 2 data sets.
Then, you are looking for….

T-test for Two Paired Samples


To conduct a valid T-test for Two
Paired Samples:
1. Dependent variable that is continuous (i.e., interval or ratio level)
◦ Note: The paired measurements must be recorded in two separate variables.
2. Related samples/groups (i.e., dependent observations)
◦ The subjects in each sample, or group, are the same. This means that the subjects in the first group are also in the second group.
3. Random sample of data from the population
4. Normal distribution (approximately) of the differences between the paired values
5. No outlier/s.
Nonparametric alternative: Wilcoxon Signed-Rank Test
Activity 3 Using Excel
T-test for Two Paired Samples
WORKSHOP 3

Using the data sets below, determine whether the two data sets are statistically different. Suppose these are the pre-test and post-test scores of one of your classes before and after introducing a strategy for improving reading comprehension.
Pre-test scores: 67, 72, 55, 80, 63, 78, 81, 68, 59, 73, 76, 62, 70, 85, 69
Post-test scores: 76, 80, 63, 92, 74, 88, 89, 77, 68, 81, 84, 72, 78, 94, 78
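A minimal cross-check of Activity 3, assuming Python with NumPy and SciPy. Because the same students were tested twice, normality is checked on the differences (post minus pre), as assumption 4 requires; `scipy.stats.ttest_rel` then runs the paired t-test and `scipy.stats.wilcoxon` the nonparametric alternative.

```python
import numpy as np
from scipy import stats

# Activity 3 scores for the same 15 students (from the workshop slide)
pre = np.array([67, 72, 55, 80, 63, 78, 81, 68, 59, 73, 76, 62, 70, 85, 69])
post = np.array([76, 80, 63, 92, 74, 88, 89, 77, 68, 81, 84, 72, 78, 94, 78])

# The paired t-test's normality assumption concerns the differences
differences = post - pre
_, shapiro_p = stats.shapiro(differences)

# Paired t-test (H0: the mean difference is 0)
t_result = stats.ttest_rel(post, pre)

# Nonparametric alternative: Wilcoxon signed-rank test on the same pairs
w_result = stats.wilcoxon(post, pre)

print(f"mean gain = {differences.mean():.2f} points")
print(f"Shapiro-Wilk on differences: p = {shapiro_p:.3f}")
print(f"paired t = {t_result.statistic:.2f}, p = {t_result.pvalue:.2e}")
print(f"Wilcoxon statistic = {w_result.statistic:.1f}, p = {w_result.pvalue:.2e}")
```

Every student's score rose, by 8 to 12 points (mean gain 136/15 ≈ 9.07), so both tests reject the null hypothesis by a wide margin here.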
One-Way ANOVA or ANOVA:
Single Factor
Nature of Research Objective/s and Data:
1. Are you examining a significant difference or a relationship? → I’m looking for a Sig. Diff.
2. Is the data normally distributed? → Yes
3. How many Groups? → More than 2 groups.
Then, you are looking for….

One-Way ANOVA or ANOVA: Single Factor


To conduct a valid One-Way ANOVA
or ANOVA: Single Factor:
1. Your dependent variable should be measured at the continuous level (i.e., interval or ratio variables).
2. Your independent variable should consist of three or more categorical, independent groups.
3. Every case should have values on both the dependent and the independent variable.
4. Independent samples/groups (i.e., independence of observations)
◦ There is no relationship between the subjects in each sample. This means that: (1) subjects in one group cannot also be in another group, (2) no subject in one group can influence subjects in another group, and (3) no group can influence another group.
5. Random sample of data from the population
6. Your dependent variable should be approximately normally distributed in each group.
7. Homogeneity of variances (i.e., variances approximately equal across groups), which can be checked with Bartlett's Test for Equal Variances. If the variances are unequal, use the Brown-Forsythe or Welch statistic instead.
8. No outlier/s.
Nonparametric alternative: Kruskal-Wallis H Test
One-Way ANOVA or ANOVA: Single Factor:
After conducting a valid One-Way ANOVA (ANOVA: Single Factor) that shows a significant difference between the groups, you will need to run Tukey's Honestly Significant Difference (Tukey HSD) Post Hoc Test to determine which groups are significantly different from each other.
Activity 4 Using Excel
One-Way ANOVA or ANOVA: Single
Factor
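The sketch below shows the One-Way ANOVA workflow in Python with SciPy (an assumption; the activity uses Excel's ANOVA: Single Factor tool): Bartlett's test for equal variances, the ANOVA itself, Tukey HSD only if the ANOVA is significant, and Kruskal-Wallis as the nonparametric alternative. Activity 4 lists no data, so, purely to keep the sketch self-contained, it reuses three score lists from the earlier workshops as if they were three independent sections. `scipy.stats.tukey_hsd` requires SciPy 1.8 or later.

```python
import numpy as np
from scipy import stats

# Illustration only: three score lists from this session's workshops,
# treated here as if they were three independent class sections
group1 = np.array([78, 81, 92, 85, 67, 73, 89, 95, 79, 83, 90, 72, 86, 69, 91,
                   84, 76, 88, 80, 77, 94, 71, 82, 96, 70, 75, 93, 68, 87, 74])
group2 = np.array([82, 74, 68, 91, 65, 77, 88, 70, 79, 84, 92, 61, 73, 80, 87])
group3 = np.array([89, 76, 70, 95, 69, 81, 90, 73, 83, 88, 94, 64, 77, 85, 92])

# Homogeneity of variances (H0: all group variances are equal)
bartlett = stats.bartlett(group1, group2, group3)
print(f"Bartlett p = {bartlett.pvalue:.3f}")

# One-way ANOVA (H0: all group means are equal)
anova = stats.f_oneway(group1, group2, group3)
print(f"ANOVA F = {anova.statistic:.3f}, p = {anova.pvalue:.3f}")

# Post hoc comparisons only make sense after a significant ANOVA
if anova.pvalue < 0.05:
    print(stats.tukey_hsd(group1, group2, group3))
else:
    print("ANOVA not significant: no post hoc test needed")

# Nonparametric alternative
kruskal = stats.kruskal(group1, group2, group3)
print(f"Kruskal-Wallis H = {kruskal.statistic:.3f}, p = {kruskal.pvalue:.3f}")
```

With these lists the group means are 81.5, about 78.07, and about 81.73, and the F ratio is well below the 5% critical value, so the Tukey step is skipped, which is exactly when a post hoc test should not be run.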
Two-Way ANOVA without
Replication
Nature of Research Objective/s and Data:
1. Are you examining a significant difference or a relationship? → I’m looking for a Sig. Diff.
2. Is the data normally distributed? → Yes
3. How many Groups? → More than 2 groups.
Then, you are looking for….

Two-Way ANOVA Without Replication


To conduct a valid Two-Way ANOVA
Without Replication
1.Your dependent variable should be measured at the continuous level (i.e., they
are interval or ratio variables).
2.Your two independent variables should each consist of two or more categorical, independent groups.
3. Every case should have values on both the dependent and the independent variables.

4.Independent samples/groups (i.e., independence of observations) There is no relationship between the subjects in
each sample. This means that: (1) subjects in the first group cannot also be in the second group, (2) no subject in either
group can influence subjects in the other group, and (3) no group can influence the other group

5.Random sample of data from the population

6.Your dependent variable should be approximately normally distributed for each combination of the
groups of the two independent variables.
7.Homogeneity of variances (i.e., variances approximately equal across groups) - Bartlett's Test For Equal
Variance

8.No outlier/s.
Nonparametric alternative: Friedman Two-way Analysis of Variance (ANOVA) by Ranks Test
Activity 5 Using Excel
Two-Way ANOVA Without Replication
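Excel's Anova: Two-Factor Without Replication tool partitions the variation of a rows × columns table that has one value per cell. The sketch below reproduces that partition by hand with NumPy and takes p-values from SciPy's F distribution; the function name is hypothetical. Activity 5 gives no data, so, for illustration only, it reuses the Activity 3 scores as a 15 (students) × 2 (test occasion) table, even though the session's guide expects more than two groups. With only two columns, the column F equals the square of the paired t statistic, which gives a handy sanity check.

```python
import numpy as np
from scipy import stats


def two_way_anova_no_replication(table: np.ndarray) -> dict:
    """Two-way ANOVA with one observation per cell (hypothetical helper).

    Rows and columns are the two factors; the residual mean square
    (row x column interaction) serves as the error term.
    """
    r, c = table.shape
    grand = table.mean()
    ss_rows = c * ((table.mean(axis=1) - grand) ** 2).sum()
    ss_cols = r * ((table.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((table - grand) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols
    df_rows, df_cols, df_error = r - 1, c - 1, (r - 1) * (c - 1)
    ms_error = ss_error / df_error
    f_rows = (ss_rows / df_rows) / ms_error
    f_cols = (ss_cols / df_cols) / ms_error
    return {
        "F_rows": f_rows, "p_rows": stats.f.sf(f_rows, df_rows, df_error),
        "F_cols": f_cols, "p_cols": stats.f.sf(f_cols, df_cols, df_error),
    }


# Illustration only: Activity 3 scores as 15 students (rows) x 2 occasions (columns)
pre = [67, 72, 55, 80, 63, 78, 81, 68, 59, 73, 76, 62, 70, 85, 69]
post = [76, 80, 63, 92, 74, 88, 89, 77, 68, 81, 84, 72, 78, 94, 78]
table = np.column_stack([pre, post]).astype(float)

result = two_way_anova_no_replication(table)
for key, value in result.items():
    print(f"{key} = {value:.4g}")
```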
Two-Way ANOVA With
Replication
Nature of Research Objective/s and Data:
1. Are you examining a significant difference or a relationship? → I’m looking for a Sig. Diff.
2. Is the data normally distributed? → Yes
3. How many Groups? → More than 2 groups.
Then, you are looking for….

Two-Way ANOVA With Replication


To conduct a valid Two-Way ANOVA
With Replication
1.Your dependent variable should be measured at the continuous level (i.e., they
are interval or ratio variables).
2.Your two independent variables should each consist of two or more categorical, independent groups.
3. Every case should have values on both the dependent and the independent variables.

4.Independent samples/groups (i.e., independence of observations) - There is no relationship between the subjects
in each sample. This means that: (1) subjects in the first group cannot also be in the second group, (2) no subject in
either group can influence subjects in the other group, and (3) no group can influence the other group

5.Random sample of data from the population

6.Your dependent variable should be approximately normally distributed for each combination of the
groups of the two independent variables.
7.Homogeneity of variances (i.e., variances approximately equal across groups) - Bartlett's Test For Equal
Variance

8.No outlier/s.
Nonparametric alternative: Friedman Two-way Analysis of Variance (ANOVA) by Ranks Test
Activity 6 Using Excel
Two-Way ANOVA With Replication
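For a balanced design (the same number of replicates in every cell), the sums of squares behind Excel's Anova: Two-Factor With Replication tool can be computed directly. The sketch below is a minimal version in Python with NumPy and SciPy; the function name, the factors (3 teaching strategies × 2 sections, 5 students per cell), and the randomly generated scores with a built-in strategy effect are all hypothetical. Unbalanced designs need a different approach (for example, a regression-based ANOVA) and are not handled here.

```python
import numpy as np
from scipy import stats


def two_way_anova_with_replication(data: np.ndarray) -> dict:
    """Balanced two-way ANOVA; data has shape (levels_A, levels_B, replicates)."""
    a, b, n = data.shape
    grand = data.mean()
    cell_means = data.mean(axis=2)        # shape (a, b)
    a_means = data.mean(axis=(1, 2))      # factor A level means
    b_means = data.mean(axis=(0, 2))      # factor B level means

    ss_a = b * n * ((a_means - grand) ** 2).sum()
    ss_b = a * n * ((b_means - grand) ** 2).sum()
    ss_cells = n * ((cell_means - grand) ** 2).sum()
    ss_ab = ss_cells - ss_a - ss_b                        # interaction
    ss_within = ((data - cell_means[:, :, None]) ** 2).sum()
    ss_total = ((data - grand) ** 2).sum()

    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "within": a * b * (n - 1)}
    ms_within = ss_within / df["within"]
    out = {"ss_total": ss_total, "ss_within": ss_within}
    for name, ss in (("A", ss_a), ("B", ss_b), ("AxB", ss_ab)):
        f_value = (ss / df[name]) / ms_within
        out[name] = {"SS": ss, "F": f_value,
                     "p": stats.f.sf(f_value, df[name], df["within"])}
    return out


# Hypothetical data: 3 strategies (A) x 2 sections (B) x 5 students per cell
rng = np.random.default_rng(seed=7)
scores = rng.normal(loc=75, scale=8, size=(3, 2, 5))
scores[1] += 5  # build in a strategy effect for illustration

result = two_way_anova_with_replication(scores)
for name in ("A", "B", "AxB"):
    row = result[name]
    print(f"{name}: SS = {row['SS']:.2f}, F = {row['F']:.3f}, p = {row['p']:.4f}")
```

The interaction row (AxB) is what replication adds: it tests whether the effect of one factor depends on the level of the other.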
Looking for a Relationship
Pearson-r Moment Correlation
Linear Regression (Simple)
Pearson-r Moment Correlation
Nature of Research Objective/s and Data:
1. Are you examining a significant difference or a relationship? → I’m looking for a Sig. Relationship
2. Is the data normally distributed? → Yes
3. Are you dealing with two variables? → Yes.
Then, you are looking for….

Pearson-r Moment Correlation


To conduct a valid Pearson-r
Moment Correlation
1. Level of Measurement: The two variables should be measured at
the interval or ratio level.
2. Linear Relationship: There should exist a linear relationship
between the two variables.
3. Normality: Both variables should be roughly normally distributed.
4. Related Pairs: Each observation in the dataset should have a pair
of values.
5. No Outliers: There should be no extreme outliers in the dataset.
How to Interpret r
Activity 7 Using Excel
Pearson-r Moment Correlation
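The sketch below computes r from its definition and with `scipy.stats.pearsonr`, which also returns the p-value for the null hypothesis of zero correlation; Excel's CORREL function returns the same r. Python with NumPy and SciPy is an assumption, and, because the activity lists no data, the Activity 3 pre-test and post-test scores are reused as an illustrative pair of variables.

```python
import numpy as np
from scipy import stats

# Illustration only: Activity 3 pre-test (x) and post-test (y) scores as paired values
x = np.array([67, 72, 55, 80, 63, 78, 81, 68, 59, 73, 76, 62, 70, 85, 69])
y = np.array([76, 80, 63, 92, 74, 88, 89, 77, 68, 81, 84, 72, 78, 94, 78])

# r from its definition: sum of cross-products of deviations divided by
# the square root of the product of the two sums of squared deviations
dx, dy = x - x.mean(), y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# SciPy also returns the p-value for H0: population correlation = 0
r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f} (manual {r_manual:.3f}), p = {p_value:.2e}")
```

Here r is very close to 1 because every student gained a similar number of points between the two tests.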
Linear Regression
Nature of Research Objective/s and Data:
1. Are you examining a significant difference or a relationship? → I’m looking for a Sig. Relationship
2. Is the data normally distributed? → Yes
3. Are you dealing with one dependent variable (y) and one or more independent variables (x)? → Yes.
Then, you are looking for….

Linear Regression (Simple or Multiple)


To conduct a valid Linear
Regression
1. Linear relationship: There exists a linear relationship between
the independent variable, x, and the dependent variable, y.
2. Independence: The residuals are independent. In particular,
there is no correlation between consecutive residuals in time
series data.
3. Homoscedasticity: The residuals have constant variance at
every level of x.
4. Normality: The residuals of the model are normally distributed.
Activity 8 Using Excel
Linear Regression
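A minimal sketch of a simple linear regression with `scipy.stats.linregress`, assuming Python with NumPy and SciPy and reusing the Activity 3 scores for illustration (pre-test as x, post-test as y). After the fit it checks two of the listed assumptions on the residuals: normality with Shapiro-Wilk, and independence with the Durbin-Watson statistic, computed here by hand. Linearity and homoscedasticity are usually judged from a scatter plot of y against x and a plot of the residuals against x. Multiple regression needs a different tool (for example, Excel's Regression tool in the Analysis ToolPak) and is not covered by this sketch.

```python
import numpy as np
from scipy import stats

# Illustration only: predict post-test scores (y) from pre-test scores (x),
# reusing the Activity 3 data
x = np.array([67, 72, 55, 80, 63, 78, 81, 68, 59, 73, 76, 62, 70, 85, 69])
y = np.array([76, 80, 63, 92, 74, 88, 89, 77, 68, 81, 84, 72, 78, 94, 78])

# Ordinary least-squares fit of y = intercept + slope * x
fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)
print(f"y = {fit.intercept:.3f} + {fit.slope:.3f}x, "
      f"R^2 = {fit.rvalue ** 2:.3f}, p = {fit.pvalue:.2e}")

# Normality assumption: Shapiro-Wilk on the residuals (H0: residuals are normal)
_, shapiro_p = stats.shapiro(residuals)
print(f"Shapiro-Wilk on residuals: p = {shapiro_p:.3f}")

# Independence assumption: Durbin-Watson statistic on consecutive residuals;
# values near 2 suggest no first-order autocorrelation (mainly relevant
# when the observations are ordered in time)
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print(f"Durbin-Watson = {durbin_watson:.2f}")
```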
WORKSHOP
