Chapter 4 and 5: Bivariate Analysis of Variables
Thus, a contingency table is a double-entry table in which each cell shows the
number of cases or individuals that have a given level of one of the characteristics
analyzed together with a given level of the other characteristic.
Marginal distributions
When analyzing a two-dimensional distribution, one can focus the study on the
behavior of one variable, regardless of how the other behaves. We would then
calculate the marginal distributions:
Defining:

$n_{i\cdot} = \sum_{j=1}^{J} n_{ij}$ and $n_{\cdot j} = \sum_{i=1}^{I} n_{ij}$ are the marginal absolute frequencies of the variables A and B, respectively.

$f_{i\cdot} = \sum_{j=1}^{J} \frac{n_{ij}}{N}$ and $f_{\cdot j} = \sum_{i=1}^{I} \frac{n_{ij}}{N}$ are the marginal relative frequencies of the variables A and B, respectively.
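As a minimal sketch (invented frequencies, using NumPy), these marginal frequencies can be computed directly from the matrix of joint absolute frequencies $n_{ij}$:

```python
import numpy as np

# Hypothetical table of joint absolute frequencies n_ij
# (rows = levels of A, columns = levels of B); values are invented.
n = np.array([[20, 5, 3],
              [10, 8, 4]])

N = n.sum()           # total number of observations
n_i = n.sum(axis=1)   # marginal absolute frequencies of A: n_i. = sum_j n_ij
n_j = n.sum(axis=0)   # marginal absolute frequencies of B: n_.j = sum_i n_ij
f_i = n_i / N         # marginal relative frequencies of A
f_j = n_j / N         # marginal relative frequencies of B

print(n_i, n_j)   # [28 22] [30 13 7]
print(f_i, f_j)   # [0.56 0.44] [0.6 0.26 0.14]
```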
Using these marginal distributions we can construct the following
contingency table:
a) Marginal distributions
d) Column profiles
Example: Balearic Islands as a second home. In order to see the evolution and
structure of tourist expenditure, the Balearic Government conducted an annual
survey on tourist expenditure in the Balearic Islands. Among the information
published for 1990 is the desire of tourists to select the Balearics as a possible
second residence. It is considered that this desire may be a function of the zone
where the tourist stays, i.e. the answers to the question "Would you choose the
Balearic Islands as a second home?" have been cross-tabulated with the place of
stay. The possible answers to the question are: (i) no, (ii) yes, in the coming
years, (iii) yes, when I retire, (iv) does not know. The places of stay were
classified into the following areas: (1) Palma, (2) Costa de Ponent, (3) Costa de
Tramuntana, (4) Bay of Pollença, (5) Badia d'Alcudia, (6) Costa Llevant, (7) Platja
de Palma-El Arenal, (8) Minorca, (9) Eivissa-Formentera.
Contingency table
Row profile / Column profile (conditional distributions): the distribution of one
variable when the other meets a specific condition. For example, the frequencies
of X when Y takes a specific value:

    x_i        n_i  (frequency when Y = specific value)
    x_1        n_1
    x_2        n_2
    …          …
    x_{n-1}    n_{n-1}
    x_n        n_n
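A minimal sketch of how row and column profiles can be obtained from a table of joint frequencies (the matrix below is again invented for illustration):

```python
import numpy as np

# Hypothetical joint absolute frequencies n_ij (invented values).
n = np.array([[20, 5, 3],
              [10, 8, 4]])

# Row profiles: distribution of the column variable conditional on each row,
# i.e. n_ij / n_i. (each row sums to 1).
row_profiles = n / n.sum(axis=1, keepdims=True)

# Column profiles: distribution of the row variable conditional on each column,
# i.e. n_ij / n_.j (each column sums to 1).
col_profiles = n / n.sum(axis=0, keepdims=True)

print(row_profiles)
print(col_profiles)
```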
Joint relative frequencies n_ij / N:

Levels of X    Levels of Y                              n_i. / N
1              0.1692   0.0615   0.0231   0.0077        0.2615
2              0.0769   0.0385   0.0154   0.0154        0.1462
3              0.0923   0.0615   0.0077   0.0154        0.1769
4              0.0615   0.0308   0.0000   0.0077        0.1000
5              0.0308   0.0077   0.0000   0.0000        0.0385
…              …        …        …        …             …
n_.j / N       0.6615   0.2385   0.0538   0.0462        1
Independence holds if, for every pair (i, j):

$\frac{n_{ij}}{N} = \frac{n_{i\cdot}}{N} \cdot \frac{n_{\cdot j}}{N}$
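A quick numerical check of this condition, cell by cell, might look like the following sketch (invented frequencies):

```python
import numpy as np

# Hypothetical joint absolute frequencies n_ij (invented values).
n = np.array([[30, 10],
              [15,  5]])
N = n.sum()

joint = n / N                                              # n_ij / N
expected = np.outer(n.sum(axis=1), n.sum(axis=0)) / N**2   # (n_i./N) * (n_.j/N)

# Independence (in the strict, population sense) requires equality cell by cell.
print(np.allclose(joint, expected))   # True for this particular table
```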
• If we have a population, we can use the definition in the strict sense: if the
equality is not fulfilled, there is dependence. Keep in mind, however, that the
dependence can be very weak, even irrelevant. (This analysis is covered in this
chapter.)
• If we have a sample, we work with a hypothesis of independence. That is, the
equality may not be fulfilled, and yet we may still be unable to reject the
hypothesis of independence. (This analysis is introduced in Chapter 7.)
Coefficients of association
Note: C can only get close to 1 if the contingency table is very large. You must
compare C with its maximum limit to interpret the degree (i.e., strength) of the association.
(The closer C is to this maximum, the stronger the association between the variables.)
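Assuming C refers to Pearson's coefficient of contingency, $C = \sqrt{\chi^2/(\chi^2 + N)}$, whose usual upper bound for a table with q = min(number of rows, number of columns) categories is $C_{max} = \sqrt{(q-1)/q}$, a sketch of this comparison could be (invented frequencies):

```python
import numpy as np

# Hypothetical contingency table (invented values).
n = np.array([[30, 10],
              [10, 30]])
N = n.sum()

# Chi-square statistic from observed vs. expected-under-independence frequencies.
expected = np.outer(n.sum(axis=1), n.sum(axis=0)) / N
chi2 = ((n - expected) ** 2 / expected).sum()

# Coefficient of contingency and its usual upper bound (see assumptions above).
C = np.sqrt(chi2 / (chi2 + N))
q = min(n.shape)
C_max = np.sqrt((q - 1) / q)

print(C, C_max, C / C_max)   # compare C with its maximum to judge the strength
```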
Important:
3) Even if we study two ordinal variables, neither of these measures shows the
sign of the relation.
(Note: it does not make sense to talk about the sign of the association if at
least one of the variables is nominal!)
Bivariate analysis for quantitative variables.
Concept of dependence or linear association
We say that a relation of exact linear dependence between X and Y exists when
there are constants a and b such that $Y = a + bX$.
(Figure: quadratic relation)
Concept of linear dependence
$r_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{S_{xy}}{S_x S_y}$
The measure is invariant to linear transformations (change of origin and scale) of the
variables. (Exception: If the transformation(s) include(s) a change in the sign of one (but
not both!) of the variables. In that case r changes sign, while the magnitude remains the
same).
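The sketch below illustrates these statements numerically with invented data: r is unchanged by a positive linear transformation of either variable and only flips sign when the sign of one (but not both) of the variables is reversed.

```python
import numpy as np

# Invented data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def r(a, b):
    # Sample correlation coefficient r = S_ab / (S_a * S_b).
    return np.cov(a, b, bias=True)[0, 1] / (a.std() * b.std())

print(r(x, y))                       # original correlation
print(r(10 + 2 * x, -3 + 0.5 * y))   # unchanged by positive linear transformations
print(r(-x, y))                      # same magnitude, opposite sign
```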
Properties:
• It is a dimensionless coefficient: $-1 \le r \le 1$.
• If there is a positive linear relation, r > 0 and close to 1.
• If there is a negative linear relation, r < 0 and close to -1.
• The closer r is to -1 or 1, the stronger the degree of association between the variables.
• If there is no linear relation, r approaches 0 (IMPORTANT, see the note below!).
• If X and Y are independent, $S_{xy} = 0$ and accordingly r = 0.
Important:
If two variables are independent, their covariance is zero. The converse cannot be assured: if two variables have zero
covariance, this does not mean that they are independent. There is no linear relation between them, but they may still be dependent (in a non-linear way).
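A classic illustration of this caveat: with X symmetric around 0 and Y = X², the covariance (and hence r) is exactly zero although Y is fully determined by X. The sketch below uses invented values.

```python
import numpy as np

# X symmetric around 0, Y a deterministic (non-linear) function of X.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

s_xy = np.cov(x, y, bias=True)[0, 1]
print(s_xy)                        # 0.0: no *linear* relation ...
print(np.corrcoef(x, y)[0, 1])     # ... so r = 0, yet Y depends on X exactly
```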
Correlation Matrix (R)
If we have k variables, we can calculate the correlation coefficient for each pair
of variables and present them in a correlation matrix:
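A sketch of its general form, where $r_{pq}$ denotes the correlation between the p-th and q-th variables (so the diagonal entries are 1 and $r_{pq} = r_{qp}$):

$$
R = \begin{pmatrix}
1 & r_{12} & \cdots & r_{1k} \\
r_{21} & 1 & \cdots & r_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
r_{k1} & r_{k2} & \cdots & 1
\end{pmatrix}
$$

In NumPy, for example, such a matrix can be obtained with np.corrcoef(data, rowvar=False) for a data matrix whose columns are the k variables.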
Properties: