Lecture
Sample Exercise:
Plot the following points in the rectangular coordinate system.
1. (-3, 2) 6. (3, 3)
Bivariate Data
Data in statistics is sometimes classified according to how many variables are in a particular study. When you
conduct a study that looks at a single variable, that study involves univariate data. For example, you might study a
group of students to find out their average grade.
Bivariate data arises when you study two variables at once. The variables are compared to find the
relationship between them. For example, age might be one variable and weight another.
Another example is comparing daily temperature with ice cream sales.
Using correlation analysis, we can examine the relationship between the variables in a bivariate data set. Many
business, marketing, and social science questions can be studied using bivariate data sets. For instance,
is there a link between child obesity and family income? This is where correlation analysis is helpful.
Correlation analysis is a method of statistical evaluation used to study the strength of a relationship
between two numerically measured, continuous variables (e.g., height and weight). This type of analysis
is useful when a researcher wants to establish whether there are possible connections between variables.
A scatterplot, or scatter diagram, is a type of mathematical diagram that uses Cartesian coordinates to display
values for two variables in a set of data. The independent variable is plotted along the horizontal axis (x) and the
dependent variable is plotted along the vertical axis (y). A scatterplot provides a visual representation of the
correlation, or relationship, between the two variables. It shows the direction and strength of the relationship
between the variables.
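To make the axis convention concrete, here is a minimal sketch that draws a rough scatterplot out of text characters. The function name text_scatter and the temperature and sales figures are made up for illustration; a spreadsheet or charting tool would normally draw the real plot.

```python
def text_scatter(xs, ys, width=20, height=8):
    """Return a rough character-grid scatterplot of paired (x, y) values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when every value on an axis is the same.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # x (independent variable) sets the column along the horizontal axis.
        col = round((x - x_min) / x_span * (width - 1))
        # y (dependent variable) sets the row; row 0 is the top of the plot.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = "*"
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


# Hypothetical data: daily temperature (x) and ice cream sales (y).
temperature = [20, 22, 25, 27, 30, 32, 35]
sales = [110, 135, 160, 170, 210, 230, 260]
print(text_scatter(temperature, sales))
```

With data like this, the stars climb from the lower left to the upper right, which is the upward trend of a positive relationship.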
All correlations have two properties: direction and strength.
Positive correlation: Both variables move in the same direction. In other words, as one variable
increases, the other variable also increases. As one variable decreases, the other variable also
decreases. An upward trend in points indicates a positive correlation.
Negative correlation: The variables move in opposite directions. As one variable increases, the other
variable decreases. As one variable decreases, the other variable increases. A downward trend in points
indicates a negative correlation.
Examples: academic performance vs. number of hours spent watching TV; stress vs. job performance
Zero or no correlation: It means that there is no apparent relationship between the two variables.
Examples: shoe size vs. salary; socio-economic status vs. grades
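The three directions can be checked numerically. The sketch below is one possible approach, not a standard routine: it looks at the sign of the summed products of deviations from the means, which is positive when the two variables tend to move together and negative when they move in opposite directions. The function name correlation_direction and the sample values are assumptions made for illustration.

```python
def correlation_direction(xs, ys):
    """Classify the direction of a linear relationship as positive, negative, or zero."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Each product is positive when x and y are both above or both below their means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_deviation > 0:
        return "positive"
    if co_deviation < 0:
        return "negative"
    return "zero"


# Hypothetical data: hours of TV per day and exam scores for five students.
tv_hours = [1, 2, 3, 4, 5]
exam_scores = [92, 88, 80, 75, 70]
print(correlation_direction(tv_hours, exam_scores))
```

Real data almost never produces a sum of exactly zero, so in practice "no correlation" means a value close to zero rather than equal to it.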
NORHAN A. SARIP 1
The strength of a correlation is determined by the magnitude of its numerical value. It may be perfect, very high,
moderately high, moderately low, very low, or zero.
The diagram above shows some examples of scatter plots and correlations.
You can also create a scatterplot from your own data using Excel. Here are the steps you need:
The most common coefficient of correlation is known as the Pearson product-moment correlation coefficient,
or Pearson’s r. It is a measure of the linear correlation (dependence) between two variables X and Y, giving a
value between +1 and −1. It was developed by Karl Pearson from a related idea introduced by Francis Galton
in the 1880s.
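For reference, the usual textbook form of Pearson's r for n paired observations is shown below. The symbols x_i, y_i, and the means x-bar and y-bar are notation introduced here for illustration; they do not appear elsewhere in this handout.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator measures how the two variables vary together, and the denominator rescales that quantity so the result always falls between −1 and +1.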
When studying the relationship between two variables, it is a good idea to compute the Pearson correlation
coefficient to determine how strong that relationship is. If the coefficient is negative, the variables are
negatively correlated: as one value increases, the other decreases. If the coefficient is positive, the
variables are positively correlated: both values increase or decrease together.
If r = ±1: perfect correlation
Most spreadsheet editors, such as Excel, Google Sheets, and OpenOffice, can compute correlations for you.
The illustration below shows an example:
In Excel, click on an empty cell where you want the correlation coefficient to appear. Then enter
the following formula.
=PEARSON(array1, array2)
Simply replace ‘array1’ with the range of cells containing the first variable and replace ‘array2’ with the
range of cells containing the second variable.
For the example above, the Pearson correlation coefficient (r) is 0.76.
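Outside a spreadsheet, the same calculation can be sketched in a few lines of Python. This is a minimal illustration, not Excel's actual implementation: the function name pearson and its parameters array1 and array2 simply mirror the spreadsheet formula, and the hours and scores below are made-up values, not the data from the illustration.

```python
import math


def pearson(array1, array2):
    """Return the Pearson correlation coefficient r for two equal-length lists."""
    if len(array1) != len(array2) or len(array1) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean1 = sum(array1) / len(array1)
    mean2 = sum(array2) / len(array2)
    # Deviations of each value from its own mean.
    dev1 = [a - mean1 for a in array1]
    dev2 = [b - mean2 for b in array2]
    numerator = sum(d1 * d2 for d1, d2 in zip(dev1, dev2))
    denominator = math.sqrt(sum(d * d for d in dev1)) * math.sqrt(sum(d * d for d in dev2))
    if denominator == 0:
        # r is undefined when either variable never changes.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical data: hours studied and quiz scores for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 61, 70, 74, 85]
print(round(pearson(hours, scores), 2))
```

A result near +1 or −1 indicates a strong linear relationship, while a result near 0 indicates little or no linear relationship.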