Univariate and Bivariate Analysis

T-Test
• A t-test is used to determine whether there is a statistically significant difference between the means of two groups. The groups could be:

• completely independent groups (e.g., the difference in sleep hours between smokers and non-smokers)

• related groups, such as pretest/post-test groups in an experiment or survey (e.g., a performance test before and after a tutorial).


Independent samples t-test
• This is used when we wish to compare the means of two groups and the groups are completely independent of each other.

• For example, researchers may want to know whether diet A or diet B helps people lose more weight. 100 people are randomly assigned to diet A and another 100 people to diet B. After three months, researchers record the total weight loss for each person. To determine whether the mean weight loss differs significantly between the two groups, researchers can conduct an independent samples t-test.
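To make the comparison of two independent means concrete, here is a minimal sketch in plain Python. It assumes the classic pooled-variance (equal variances) form of the test, which the slides do not specify; the function name `independent_t` and the weight-loss numbers are hypothetical, made up for illustration only.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Student's t statistic with pooled variance (assumes equal variances)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled sample variance: each group weighted by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    # Standard error of the difference between the two means
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(group_a) - mean(group_b)) / std_err
    return t_stat, df

# Hypothetical weight loss (kg) after three months, five people per diet
diet_a = [4.0, 5.0, 6.0, 5.0, 5.0]
diet_b = [3.0, 4.0, 3.0, 2.0, 3.0]

t_stat, df = independent_t(diet_a, diet_b)
print(f"t = {t_stat:.3f}, df = {df}")
```

The t statistic and degrees of freedom would then be compared against a t-distribution (a table or a statistics package) to obtain the p-value; that step is left out of the sketch.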
Hypothesis
• H0: The mean scores of the control and experimental groups do not differ from each other on the DV.

• H1: The mean scores of the control and experimental groups differ significantly from each other on the DV.
Paired Sample T-Test

• This is used when we wish to compare the means of a single group before and after an event or treatment.

• For example, suppose 20 students in a class take a test, then study a certain guide, then retake the test. To compare the scores on the first and second test, we use a paired t-test because each student's first test score can be paired with their second test score.
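The pairing idea can be sketched in a few lines of plain Python: the test works on the per-student differences rather than on the two sets of scores separately. The function name `paired_t` and the scores are hypothetical (five students instead of 20, to keep the arithmetic visible).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic computed from the per-subject differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Mean difference divided by its standard error
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1

# Hypothetical scores for five students, before and after studying the guide
first_test = [60, 70, 65, 80, 75]
second_test = [65, 74, 71, 86, 79]

t_stat, df = paired_t(first_test, second_test)
print(f"t = {t_stat:.3f}, df = {df}")
```

As with the independent samples case, the p-value would come from comparing this statistic with a t-distribution with n - 1 degrees of freedom.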
Hypothesis
• H0: There is no difference between the pre-treatment and post-treatment mean scores of the DV in the sample.

• H1: There is a significant difference between the pre-treatment and post-treatment mean scores of the DV in the sample.
Comparison: parametric vs non-parametric
Parametric: metric DV (ratio/interval)     Non-parametric: non-metric DV (non-normal, nominal/ordinal)
• Independent samples t-test               • Mann-Whitney U test
• Paired samples t-test                    • Wilcoxon signed-rank test
• One-way ANOVA                            • Kruskal-Wallis test
Bivariate Tests

Analysis involves two variables (a dependent and an independent variable).
ANOVA: extension of the independent samples t-test

• An ANOVA (analysis of variance) is used to determine whether there is a statistically significant difference between the means of three or more groups when the groups can be split on one factor.

• The most commonly used ANOVA tests in practice are the one-way ANOVA in univariate tests and the two-way ANOVA in bivariate tests.


ANOVA
• Same as the independent samples t-test, except that the IV has more than two categories.

• Are engineers and lawyers significantly different in their level of income (DV)? Use an independent samples t-test.

• Are engineers, doctors and lawyers significantly different in their level of income? Use an ANOVA.
Example: One-way ANOVA

• You randomly split a class of 90 students into three groups of 30.

[Diagram: 90 students split into three groups of 30]
• Each group uses a different studying technique for one month to prepare for an exam. At the end of the month, all of the students take the same exam. You want to know whether the studying technique (the factor) has an impact on exam scores, so you conduct a one-way ANOVA to determine whether there is a statistically significant difference between the mean scores of the three groups.
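The F statistic behind a one-way ANOVA compares the variation between group means with the variation within groups. The sketch below shows that calculation in plain Python; the function name `one_way_anova_f` and the exam scores are hypothetical (three students per technique rather than 30).

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA: between-group vs within-group variation."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical exam scores, one list per studying technique
technique_1 = [70, 72, 74]
technique_2 = [75, 77, 79]
technique_3 = [80, 82, 84]

f_stat, df_between, df_within = one_way_anova_f([technique_1, technique_2, technique_3])
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F relative to the F-distribution with these degrees of freedom suggests that at least one group mean differs; it does not say which one.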
When to use a one-way ANOVA

• Use a one-way ANOVA when you have collected data about one categorical
independent variable and one quantitative dependent variable. The
independent variable should have at least three levels (i.e. at least three
different groups or categories).

• ANOVA tells you whether the mean of a dependent variable changes across three or more levels of the independent variable. For example:

• Your independent variable (IV) is brand of soda, and you collect data on Coke, Pepsi, Sprite, and Fanta (4 categories) to find out if there is a difference in the price per 100 ml (DV).
Hypothesis
• H0: There is no significant difference in the mean DV (scores) across the categories of the independent variable (study method).

• H1: There is a significant difference in the mean DV for at least one category of the independent variable.

Assumptions of ANOVA
• The assumptions of the ANOVA test are the same as the general assumptions for any parametric test:
1. Independence of observations: The observations should be independent of each other.
2. Normally distributed response variable: The values of the dependent variable follow a normal distribution.
3. Homogeneity of variance: The population variance of the DV is the same across all categories of the IV (i.e., in every group).
Two Way ANOVA
• A two-way ANOVA is an extension of the one-way ANOVA. It is a statistical test used to determine the effect of two nominal predictor variables (IVs) on a continuous DV.

• Example: variation in income (DV) depends on profession (IV1) and on gender (IV2) as well.

• The primary purpose of a two-way ANOVA is to test the joint influence (interaction) of two factors (IVs) on a single DV.
When to use a two-way ANOVA

• You can use a two-way ANOVA when you have collected data on a
quantitative dependent variable at multiple levels of two categorical
independent variables.

• Example: the mean stress level of employees may depend on their gender (nominal variable) and their organizational level (upper, middle, lower).

• Gender → stress
• Org level → stress
• Gender * org level → stress
Questions asked by Two Way ANOVA
• Does age affect memory?
• Does health affect memory?
These two effects are called main effects. Each refers to the effect of a single factor at a time on the dependent variable (memory).

• Do age and health interact to predict memory?

An interaction occurs when the effect of one factor depends on the other factor. An example of an interaction would be if the effect of age on memory depended on whether your health was below average, average or above average.

For instance, old people in above-average health might remember just as well as young people in above-average health.
Hypothesis
A two-way ANOVA with interaction tests three null hypotheses at the same time:

• There is no difference in group means at any level of the first independent variable.
• There is no difference in group means at any level of the second independent variable.
• The effect of one independent variable does not depend on the level of the other independent variable (i.e., no interaction effect).

The alternative hypotheses are the opposite:

• There is a difference in group means for at least one level of the first independent variable.
• There is a difference in group means for at least one level of the second independent variable.
• The effect of one independent variable depends on the level of the other independent variable (i.e., an interaction effect).
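The three hypotheses correspond to three F statistics: one per main effect and one for the interaction. The sketch below shows how they could be computed in plain Python for a balanced design (every cell has the same number of observations, which is an assumption of this sketch). The function name `two_way_anova`, the two-level organizational factor, and the stress scores are all hypothetical.

```python
from statistics import mean

def two_way_anova(cells):
    """F statistics for a balanced two-way ANOVA.

    cells maps (level_a, level_b) -> list of observations;
    every cell is assumed to hold the same number of observations.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # observations per cell
    grand = mean([x for obs in cells.values() for x in obs])

    # Marginal means for each level of factor A and factor B
    mean_a = {a: mean([x for b in levels_b for x in cells[(a, b)]]) for a in levels_a}
    mean_b = {b: mean([x for a in levels_a for x in cells[(a, b)]]) for b in levels_b}

    ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_cells = n * sum((mean(obs) - grand) ** 2 for obs in cells.values())
    ss_ab = ss_cells - ss_a - ss_b  # what the main effects leave unexplained
    ss_error = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)

    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    df_error = len(levels_a) * len(levels_b) * (n - 1)
    ms_error = ss_error / df_error
    return {
        "A": (ss_a / df_a) / ms_error,
        "B": (ss_b / df_b) / ms_error,
        "A*B": (ss_ab / (df_a * df_b)) / ms_error,
    }

# Hypothetical stress scores: factor A = gender, factor B = organizational level
stress = {
    ("F", "lower"): [4, 6],
    ("F", "upper"): [6, 8],
    ("M", "lower"): [5, 7],
    ("M", "upper"): [11, 13],
}
f_stats = two_way_anova(stress)
print(f_stats)
```

Each F statistic would be compared with an F-distribution using its own numerator degrees of freedom and the shared error degrees of freedom. Unbalanced designs need a different treatment of the sums of squares and are outside this sketch.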
Comparing one-way and two-way ANOVA

One-way ANOVA
• One categorical IV with more than 2 categories
• One DV, which is continuous
• Example: The average amount spent on dining out is the same among low, middle and high income groups.

Two-way factorial ANOVA
• Two categorical IVs with any number of categories
• One DV, which is continuous
• Example: The average amount spent on dining out is the same among low, middle and high income groups, depending upon their age.
Correlation
• A measure of association (not causation) between two variables measured at the ratio or interval level.

• A correlation between two variables X and Y means that when the value of one variable changes in one direction, the value of the other tends to change either in the same direction (positive correlation) or in the opposite direction (negative correlation).

• Increase in summer temperature → increase in ice cream sales (positive correlation)

• Increase in performance → decrease in absenteeism (negative correlation)
Correlation coefficient
• The correlation coefficient, r, is a summary measure that describes the extent of the statistical relationship between two interval or ratio level variables. The correlation coefficient is always between -1 and +1. When r is close to 0, there is little relationship between the variables; the farther r is from 0, in either the positive or negative direction, the stronger the relationship between the two variables.

• Rough guide to the size of r (ignoring its sign):

• 0 to 0.5: weak

• 0.5 to 0.7: moderate

• 0.7 to 1: strong
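As an illustration of how r is obtained and then read against the weak/moderate/strong guide, here is a minimal sketch in plain Python computing the Pearson correlation coefficient. The function names `pearson_r` and `strength` and the data are hypothetical. Because the ranges in the guide share their endpoints, placing exactly 0.5 in "moderate" and exactly 0.7 in "strong" is an assumption of this sketch.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    # Sum of products of deviations (co-variation of x and y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def strength(r):
    """Label the size of r using the rough guide; the sign is ignored."""
    size = abs(r)
    if size < 0.5:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"

# Hypothetical data: daily temperature (°C) and ice cream sales (units)
temperature = [20, 25, 30, 35]
ice_cream_sales = [100, 150, 210, 240]

r = pearson_r(temperature, ice_cream_sales)
print(f"r = {r:.3f} ({strength(r)})")
```

A negative r of the same size, as in the performance and absenteeism example, would receive the same strength label; only the direction of the relationship differs.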
