MBA Unit 3 Notes

Properties of the Coefficient of Correlation
If the relationship between two variables X and Y is to be ascertained, the following formula is used:

r = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² × Σ(Y − Ȳ)²]
• The value of the coefficient of correlation (r) always lies between −1 and +1:
r = +1: perfect positive correlation
r = −1: perfect negative correlation
r = 0: no correlation
• The coefficient of correlation is independent of origin and scale. Independence of origin means that subtracting any constant from the given values of X and Y leaves the value of "r" unchanged. Independence of scale means that multiplying or dividing the values of X and Y by any positive constant has no effect on the value of "r".
• The coefficient of correlation is the geometric mean of the two regression coefficients. Symbolically:
r = ±√(bYX × bXY)
where bYX and bXY are the regression coefficients of Y on X and of X on Y, respectively.
• The coefficient of correlation is zero when the variables X and Y are independent. However, the converse is not true: r = 0 does not by itself imply that the variables are independent.
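These properties can be checked numerically. Below is a minimal sketch in Python (assuming numpy is available, and using invented illustrative data) that computes r from the deviation formula above and verifies the range, origin-and-scale, and geometric-mean properties:

import numpy as np

# Illustrative (invented) data.
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([12.0, 19.0, 33.0, 38.0, 52.0])

def pearson_r(a, b):
    # r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)), the formula given above
    da, db = a - a.mean(), b - b.mean()
    return (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

r = pearson_r(x, y)
assert -1.0 <= r <= 1.0                     # r always lies between -1 and +1

# Independence of origin and scale: shift by a constant and multiply by a
# positive constant -- r is unchanged.
assert np.isclose(r, pearson_r((x - 5.0) / 2.0, (y - 100.0) * 3.0))

# r is the geometric mean of the two regression coefficients bYX and bXY.
dx, dy = x - x.mean(), y - y.mean()
byx = (dx * dy).sum() / (dx ** 2).sum()     # regression coefficient of Y on X
bxy = (dx * dy).sum() / (dy ** 2).sum()     # regression coefficient of X on Y
assert np.isclose(abs(r), np.sqrt(byx * bxy))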
Spearman's Rank Correlation Coefficient

The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables; while Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not). If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.
Intuitively, the Spearman correlation between two variables will be high when observations
have a similar (or identical for a correlation of 1) rank (i.e. relative position label of the
observations within the variable: 1st, 2nd, 3rd, etc.) between the two variables, and low when
observations have a dissimilar (or fully opposed for a correlation of −1) rank between the two
variables.
ρ = 1 − 6Σd² / [n(n² − 1)]
where:
d = difference between the ranks of each pair of observations
n = number of observations
Assumptions
The assumptions of the Spearman correlation are that data must be at least ordinal and the
scores on one variable must be monotonically related to the other variable.
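A short sketch (assuming Python with numpy and scipy, and invented untied scores) illustrating that Spearman's rho is the Pearson correlation of the ranks, and that with no repeated values it matches the shortcut formula above:

import numpy as np
from scipy import stats

# Invented ordinal-style scores with no tied values.
x = np.array([35, 23, 47, 17, 10, 43, 9, 6, 28])
y = np.array([30, 33, 45, 23, 8, 49, 12, 4, 31])

# Spearman's rho equals the Pearson correlation of the rank values.
rho, _ = stats.spearmanr(x, y)
r_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))
assert np.isclose(rho, r_ranks)

# With no repeated values the shortcut formula gives the same result:
# rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = difference in ranks.
d = stats.rankdata(x) - stats.rankdata(y)
n = len(x)
assert np.isclose(rho, 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1)))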
Karl Pearson's Coefficient of Correlation

Karl Pearson's coefficient of correlation is a widely used mathematical method in which a numerical expression is used to calculate the degree and direction of the relationship between linearly related variables. Its properties bear repeating: the coefficient cannot take a value less than −1 or more than +1 (symbolically, −1 ≤ r ≤ +1, or |r| ≤ 1); subtracting any constant from all the values of X and Y does not affect it; multiplying or dividing all the values of X and Y by any positive constant does not affect it; and it is zero when X and Y are independent, although the converse is not true.
Assumptions

1. The relationship between the variables is linear, which means that when the two variables are plotted, the points tend to fall along a straight line.
2. There are a large number of independent causes that affect the variables under study so as to form a normal distribution. Variables such as price, demand and supply, for example, are affected by many such factors, so their distributions tend to be normal.
3. The observations are independent of each other.
Note: The coefficient of correlation measures not only the magnitude of correlation but also its direction. For example, r = −0.67 shows a negative correlation because the sign is "−", with a magnitude of 0.67.
Regression

Regression is a statistical measurement used in finance, investing and other disciplines that
attempts to determine the strength of the relationship between one dependent variable (usually
denoted by Y) and a series of other changing variables (known as independent variables).
Regression helps investment and financial managers to value assets and understand the
relationships between variables, such as commodity prices and the stocks of businesses dealing
in those commodities.
Regression Explained
The two basic types of regression are linear regression and multiple linear regression, although there are non-linear regression methods for more complicated data and analysis. Linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y, while multiple regression uses two or more independent variables to predict the outcome.
Regression can help finance and investment professionals as well as professionals in other
businesses. Regression can also help predict sales for a company based on weather, previous
sales, GDP growth or other types of conditions. The capital asset pricing model (CAPM) is an
often-used regression model in finance for pricing assets and discovering costs of capital.
• Linear regression: Y = a + bX + u
• Multiple regression: Y = a + b1X1 + b2X2 + b3X3 + … + btXt + u
Where:
Y = the dependent variable (the variable being predicted).
X = the independent variable(s) used to predict Y.
a = the intercept.
b = the slope.
u = the regression residual (error term).
Regression takes a group of random variables, thought to be predicting Y, and tries to find a
mathematical relationship between them. This relationship is typically in the form of a straight
line (linear regression) that best approximates all the individual data points. In multiple
regression, the separate variables are differentiated by using numbers with subscripts.
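As a sketch of how these equations are estimated in practice (assuming Python with numpy, and invented data), both models can be fitted by ordinary least squares:

import numpy as np

# Invented observations: a dependent variable y and two predictors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1, 12.9])

# Linear regression, Y = a + bX + u: a column of ones estimates the intercept a.
X = np.column_stack([np.ones_like(x1), x1])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b = coef
print(f"Y = {a:.2f} + {b:.2f}X")

# Multiple regression, Y = a + b1X1 + b2X2 + u: one column per predictor.
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef
print(f"Y = {a:.2f} + {b1:.2f}X1 + {b2:.2f}X2")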
ASSUMPTIONS IN REGRESSION

The standard assumptions of the linear regression model are that the relationship between the dependent and independent variables is linear, that the observations (and hence the error terms) are independent of one another, that the error terms have a mean of zero and a constant variance (homoscedasticity), and that the error terms are approximately normally distributed.
REGRESSION LINE

Definition: The Regression Line is the line that best fits the data, such that the overall distance from the line to the points (variable values) plotted on a graph is the smallest. In other words, a line used to minimize the squared deviations of predictions is called the regression line.
There are as many regression lines as there are variables. Suppose we take two variables, say X and Y; then there will be two regression lines:
• Regression line of Y on X: This gives the most probable values of Y from the given values of
X.
• Regression line of X on Y: This gives the most probable values of X from the given values of
Y.
The algebraic expressions of these regression lines are called Regression Equations. There will be two regression equations for the two regression lines.
The correlation between the variables depends on the distance between these two regression lines: the nearer the regression lines are to each other, the higher the degree of correlation; the farther apart they are, the lower the degree of correlation.

The correlation is said to be either perfect positive or perfect negative when the two regression lines coincide, i.e. only one line exists. If the variables are independent, the correlation is zero, and the lines of regression are at right angles, i.e. parallel to the X-axis and Y-axis.
Note: The regression lines cut each other at the point of the averages of X and Y. This means that if a perpendicular is dropped from the point of intersection to the X-axis, it meets the axis at the mean value of X; similarly, a horizontal line drawn from that point to the Y-axis meets it at the mean value of Y.
Thus, all these properties should be kept in mind while solving for the regression coefficients.
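A quick numerical check of this note (a sketch assuming Python with numpy and invented data): solving the two regression equations simultaneously recovers the point of means.

import numpy as np

# Invented paired observations.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([5.0, 7.0, 6.0, 10.0, 12.0])

dx, dy = x - x.mean(), y - y.mean()
byx = (dx * dy).sum() / (dx ** 2).sum()   # regression coefficient of Y on X
bxy = (dx * dy).sum() / (dy ** 2).sum()   # regression coefficient of X on Y

# Line of Y on X:  y = y_bar + byx*(x - x_bar)  ->  -byx*x + y = y_bar - byx*x_bar
# Line of X on Y:  x = x_bar + bxy*(y - y_bar)  ->  x - bxy*y = x_bar - bxy*y_bar
A = np.array([[-byx, 1.0], [1.0, -bxy]])
c = np.array([y.mean() - byx * x.mean(), x.mean() - bxy * y.mean()])
ix, iy = np.linalg.solve(A, c)

# The intersection point is exactly (mean of X, mean of Y).
assert np.isclose(ix, x.mean()) and np.isclose(iy, y.mean())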
Correlation Analysis
Correlation is a measure of association between two variables. The variables are not designated
as dependent or independent. The two most popular correlation coefficients are: Spearman’s
correlation coefficient rho and Pearson’s product-moment correlation coefficient.
When calculating a correlation coefficient for ordinal data, select Spearman’s technique. For
interval or ratio-type data, use Pearson’s technique.
The value of a correlation coefficient can vary from minus one to plus one. A minus one
indicates a perfect negative correlation, while a plus one indicates a perfect positive correlation.
A correlation of zero means there is no relationship between the two variables. When there is
a negative correlation between two variables, as the value of one variable increases, the value
of the other variable decreases, and vice versa. In other words, for a negative correlation, the
variables work opposite each other. When there is a positive correlation between two variables,
as the value of one variable increases, the value of the other variable also increases. The
variables move together.
The standard error of a correlation coefficient is used to determine the confidence intervals
around a true correlation of zero. If your correlation coefficient falls outside of this range, then
it is significantly different than zero. The standard error can be calculated for interval or ratio-
type data (i.e., only for Pearson’s product-moment correlation).
The significance (probability) of the correlation coefficient is determined from the t-statistic.
The probability of the t-statistic indicates whether the observed correlation coefficient occurred
by chance if the true correlation is zero. In other words, it asks if the correlation is significantly
different than zero. When the t-statistic is calculated for Spearman’s rank-difference correlation
coefficient, there must be at least 30 cases before the t-distribution can be used to determine
the probability. If there are fewer than 30 cases, you must refer to a special table to find the
probability of the correlation coefficient.
Example
A company wanted to know if there is a significant relationship between the total number of
salespeople and the total number of sales. They collect data for five months.
Variable 1 Variable 2
207 6907
180 5991
220 6810
205 6553
190 6190
——————————–
Correlation coefficient = .921
Standard error of the coefficient = .068
t-test for the significance of the coefficient = 4.100
Degrees of freedom = 3
Two-tailed probability = .0263
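These figures can be reproduced as follows (a sketch assuming Python with scipy; the standard-error line uses the (1 − r²)/√n formula, which is an inference from the .068 figure above):

import numpy as np
from scipy import stats

salespeople = np.array([207, 180, 220, 205, 190])
sales = np.array([6907, 5991, 6810, 6553, 6190])
n = len(salespeople)

r, p = stats.pearsonr(salespeople, sales)     # r = 0.921, p = 0.0263 (two-tailed)
t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)  # t = 4.100 on n - 2 = 3 df
se = (1 - r ** 2) / np.sqrt(n)                # = 0.068, matching the figure above
print(round(r, 3), round(se, 3), round(t, 3), round(p, 4))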
Another Example
Respondents to a survey were asked to judge the quality of a product on a four-point Likert
scale (excellent, good, fair, poor). They were also asked to judge the reputation of the company
that made the product on a three-point scale (good, fair, poor). Is there a significant relationship
between respondents' perceptions of the company and their perceptions of the quality of the
product?
Since both variables are ordinal, Spearman's method is chosen. The first variable is the rating for the quality of the product. Responses are coded as 4=excellent, 3=good, 2=fair, and 1=poor.
The second variable is the perceived reputation of the company and is coded 3=good, 2=fair,
and 1=poor.
Variable 1 Variable 2
4 3
2 2
1 2
3 3
4 3
1 1
2 1
——————————————-
Correlation coefficient rho = .830
t-test for the significance of the coefficient = 3.332
Number of data pairs = 7
Probability must be determined from a table because of the small sample size.
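The reported rho can be reproduced with the shortcut formula applied to average ranks (a sketch assuming Python with scipy); note that with this many ties, a tie-corrected calculation gives a slightly smaller value:

import numpy as np
from scipy import stats

quality = np.array([4, 2, 1, 3, 4, 1, 2])     # 4=excellent ... 1=poor
reputation = np.array([3, 2, 2, 3, 3, 1, 1])  # 3=good, 2=fair, 1=poor
n = len(quality)

# Shortcut formula on the average ranks reproduces rho = .830:
d = stats.rankdata(quality) - stats.rankdata(reputation)
rho = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))
print(round(rho, 3))                          # 0.83

# Because of the many ties, scipy's tie-corrected estimate differs slightly:
rho_scipy, _ = stats.spearmanr(quality, reputation)
print(round(rho_scipy, 3))                    # ~0.816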
Regression Analysis
Simple regression is used to examine the relationship between one dependent and one
independent variable. After performing an analysis, the regression statistics can be used to
predict the dependent variable when the independent variable is known. Regression goes
beyond correlation by adding prediction capabilities.
People use regression on an intuitive level every day. In business, a well-dressed man is thought
to be financially successful. A mother knows that more sugar in her children’s diet results in
higher energy levels. The ease of waking up in the morning often depends on how late you
went to bed the night before. Quantitative regression adds precision by developing a
mathematical formula that can be used for predictive purposes.
For example, a medical researcher might want to use body weight (independent variable) to
predict the most appropriate dose for a new drug (dependent variable). The purpose of running
the regression is to find a formula that fits the relationship between the two variables. Then you
can use that formula to predict values for the dependent variable when only the independent
variable is known. A doctor could prescribe the proper dose based on a person’s body weight.
The regression line (known as the least squares line) is a plot of the expected value of the
dependent variable for all values of the independent variable. Technically, it is the line that
“minimizes the squared residuals”. The regression line is the one that best fits the data on a
scatterplot.
Using the regression equation, the dependent variable may be predicted from the independent
variable. The slope of the regression line (b) is defined as the rise divided by the run. The y
intercept (a) is the point on the y axis where the regression line would intercept the y axis. The
slope and y intercept are incorporated into the regression equation. The intercept is usually
called the constant, and the slope is referred to as the coefficient. Since the regression model is
usually not a perfect predictor, there is also an error term in the equation.
In the regression equation, y is always the dependent variable and x is always the independent variable. Mathematically, the linear regression model is described as:

y = a + bx + e
The significance of the slope of the regression line is determined from the t-statistic. It is the
probability that the observed correlation coefficient occurred by chance if the true correlation
is zero. Some researchers prefer to report the F-ratio instead of the t-statistic. The F-ratio is
equal to the t-statistic squared.
The t-statistic for the significance of the slope is essentially a test to determine if the regression
model (equation) is usable. If the slope is significantly different than zero, then we can use the
regression model to predict the dependent variable for any value of the independent variable.
On the other hand, take an example where the slope is zero. It has no prediction ability because
for every value of the independent variable, the prediction for the dependent variable would be
the same. Knowing the value of the independent variable would not improve our ability to
predict the dependent variable. Thus, if the slope is not significantly different than zero, don’t
use the model to make predictions.
The coefficient of determination (r-squared) is the square of the correlation coefficient. Its
value may vary from zero to one. It has the advantage over the correlation coefficient in that it
may be interpreted directly as the proportion of variance in the dependent variable that can be
accounted for by the regression equation. For example, an r-squared value of .49 means that
49% of the variance in the dependent variable can be explained by the regression equation. The
other 51% is unexplained.
The standard error of the estimate for regression measures the amount of variability in the
points around the regression line. It is the standard deviation of the data points as they are
distributed around the regression line. The standard error of the estimate can be used to develop
confidence intervals around a prediction.
Example

Six months of data on advertising expenditures (Variable 1) and sales volume (Variable 2) were collected:

Variable 1 Variable 2
4.2 27.1
6.1 30.4
3.9 25.0
5.7 29.7
7.3 40.1
5.9 28.8
————————————————–
You might make a statement in a report like this: A simple linear regression was performed on six months of data to determine whether there was a significant relationship between advertising expenditures and sales volume. The t-statistic for the slope was significant at the .05 critical alpha level, t(4)=3.96, p=.017. Thus, we reject the null hypothesis and conclude that there was a significant positive relationship between advertising expenditures and sales volume. Furthermore, 79.7% of the variability in sales volume could be explained by advertising expenditures.
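The statistics in the report above can be reproduced as follows (a sketch assuming Python with scipy):

import numpy as np
from scipy import stats

advertising = np.array([4.2, 6.1, 3.9, 5.7, 7.3, 5.9])
sales = np.array([27.1, 30.4, 25.0, 29.7, 40.1, 28.8])
n = len(advertising)

fit = stats.linregress(advertising, sales)
t = fit.slope / fit.stderr                    # t = 3.96 on n - 2 = 4 df
p = fit.pvalue                                # two-tailed p = .017
r_squared = fit.rvalue ** 2                   # = .797: share of variance explained

# Standard error of the estimate: spread of the points around the line.
residuals = sales - (fit.intercept + fit.slope * advertising)
s_est = np.sqrt((residuals ** 2).sum() / (n - 2))
print(round(t, 2), round(p, 3), round(r_squared, 3), round(s_est, 2))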