Take Home Exam 2


Candidate number: V11822 Take home exam 04/12/14

a)
SATmath: The estimated coefficient is 0.011. This means that for each one-point increase
in the pre-university maths test score, the model estimates that the student’s course
grade increases by 0.011, holding the other variables constant.

Male: The estimated coefficient is 0.736. This means that the estimated effect of being
male on the student’s course grade is 0.736. Thus, the estimated average difference
between men and women is 0.736, holding the other variables constant.

b)
The standard error measures how much an estimate varies from sample to sample and is
given by:

$$ \text{s.e.} = \frac{s}{\sqrt{n}} $$

where s.e. = standard error; s = sample standard deviation; and n = number of
observations.

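It can be computed from the t-statistic, which is the estimated coefficient divided by
its standard error:

$$ t = \frac{b}{\text{s.e.}} \quad\Rightarrow\quad \text{s.e.} = \frac{b}{t} $$

The estimated coefficient b for the variable male is 0.736 and the t-statistic is 0.8,
which means:

$$ \text{s.e.} = \frac{0.736}{0.8} = 0.92 $$

The standard error for the variable male is 0.92.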
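
Since the t-statistic is the coefficient divided by its standard error, the standard
error can be recovered as b / t. The short Python sketch below only illustrates that
arithmetic; the function name `standard_error_from_t` is hypothetical, and the inputs are
the coefficient and t-statistic reported for the variable male.

```python
def standard_error_from_t(coefficient: float, t_statistic: float) -> float:
    """Recover the standard error from t = b / s.e., i.e. s.e. = b / t."""
    if t_statistic == 0:
        raise ValueError("t-statistic must be non-zero to recover the standard error")
    return coefficient / t_statistic


# Coefficient and t-statistic reported for the variable male.
se_male = standard_error_from_t(0.736, 0.8)
print(round(se_male, 2))  # 0.92
```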

Whether the variable is statistically significant, that is, whether the estimated
coefficient from our sample reflects a non-zero population parameter, depends on the
underlying variability in the population and the sample size. We can test it using two
statistical methods: the t-statistic/p-value and the confidence interval:
P-value:
- The p-value is the calculated probability associated with the t-statistic. It is the
probability of observing a sample statistic at least this extreme if the null
hypothesis β=0 is true. For the variable male the t-statistic is 0.8 and the associated
p-value is 0.424. It is above our 5 % level of significance and we cannot reject the
hypothesis of no effect for the variable male. Therefore the effect is statistically
insignificant.
Confidence interval:
- Assuming that b is normally distributed around the mean, there is approximately a 95 %
probability that the true population parameter lies within 2 standard errors of the
regression coefficient b. Thus, the confidence interval for the estimated coefficient
for the male variable can be calculated from the formula:

$$ b \pm 2 \times \text{s.e.} $$

The confidence interval for the variable ‘male’:

$$ 0.736 \pm 2 \times 0.92 = [-1.104,\ 2.576] $$

We note that 0 lies within the confidence interval, so we cannot reject the
hypothesis that the effect is 0 at the 5 % level of significance. The effect is
statistically insignificant.
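
Both tests can be reproduced with a few lines of Python. This is a minimal sketch under
stated assumptions: the p-value uses a standard normal approximation (the degrees of
freedom are not given here, so the exact t-distribution cannot be used), the interval
uses the "2 standard errors" rule of thumb, and the function names are hypothetical.

```python
import math


def two_sided_p_value(t_statistic: float) -> float:
    """Two-sided p-value for H0: beta = 0, using a normal approximation (assumption)."""
    return math.erfc(abs(t_statistic) / math.sqrt(2))


def approx_95_interval(coefficient: float, standard_error: float) -> tuple:
    """Approximate 95 % confidence interval: b plus or minus 2 standard errors."""
    margin = 2 * standard_error
    return (coefficient - margin, coefficient + margin)


# Values for the variable male: b = 0.736, t = 0.8, s.e. = 0.92.
p_male = two_sided_p_value(0.8)
lower, upper = approx_95_interval(0.736, 0.92)

print(round(p_male, 3))                  # 0.424
print(round(lower, 3), round(upper, 3))  # -1.104 2.576

# The effect is significant at the 5 % level only if 0 lies outside the interval.
significant = not (lower <= 0 <= upper)
print(significant)                       # False
```

The two checks agree: the p-value is above 0.05 and the interval contains 0.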

c)
In short, the variable is included as a control in order to get the true effect of
attendance on grades. Even though we are interested in assessing the impact of
attendance on grades, it is necessary to consider potential alternative explanations. In
this case it seems plausible that, besides attendance, hours spent studying statistics
per week also have an impact on grades. Thus ‘hoursstudy’ is a cause in its own right.
Furthermore, it seems reasonable to expect a relationship between attendance and
self-study. By including ‘hoursstudy’ in the model, we can remove the effect of this
variable, ‘by holding it constant’, and thereby assess the true impact of attendance on
grades.

I would expect that the relationship is both direct and indirect and that the association
between attendance and grades would be larger without ‘hoursstudy’ in the regression.
Including ‘hoursstudy’ in the model means that the association between attendance
and grades decreases somewhat but, as seen, does not disappear completely as it would in
a spurious or chain relationship.
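
The effect of leaving out a control can be illustrated with a small simulation. The
sketch below is purely hypothetical: the data are simulated, not the exam data, and the
"true" effects (1.0 for attendance, 2.0 for hours of study) and all function names are
assumptions chosen for illustration. It compares the attendance slope with and without
‘hoursstudy’ in the regression.

```python
import random


def slope_simple(x, y):
    """OLS slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_controlled(x1, x2, y):
    """OLS slope on x1 when y is regressed on x1 and x2 (with intercept)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)


random.seed(0)
n = 5000
# Simulated (hypothetical) data: study hours and attendance are related.
hoursstudy = [random.gauss(0, 1) for _ in range(n)]
attendance = [0.5 * h + random.gauss(0, 1) for h in hoursstudy]
grade = [1.0 * a + 2.0 * h + random.gauss(0, 1)
         for a, h in zip(attendance, hoursstudy)]

b_without = slope_simple(attendance, grade)                # omits hoursstudy
b_with = slope_controlled(attendance, hoursstudy, grade)   # holds hoursstudy constant
print(b_without, b_with)
```

In this setup the slope without the control is inflated because attendance partly picks
up the effect of study hours; with the control it falls back towards the assumed direct
effect but does not vanish.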

d)
To say that lectures improve students’ grades is a statement of causality involving
three requirements: 1) association, 2) time order, and 3) absence of alternative
explanations.
1) Association between attendance and grades will be discussed below.
2) Time order is fulfilled. Students’ attendance at lectures cannot take place after
students get their course grade.
3) I will argue in the next question that there is no serious omitted variable that
causes both attendance and higher grades, which would make the relationship spurious.
Thus, absence of alternative explanations is fulfilled.

The model shows that at the 5 % level of significance we can reject the hypothesis that
having 9 or more absences, compared with having 0 absences from lectures, has no effect
on the course grade. Noting that the sign of the estimated coefficient is negative, this
means that with 95 % confidence students who skip 9 or more classes get lower grades
than students who have no absences. The estimated average effect is a course grade that
is 3.521 lower for students who skip 9 or more classes.

Likewise, with 95 % confidence students who have no absences get higher course grades
than students who have 5 or 6 absences. There is, however, not enough statistical
evidence to say that having 1 or 2 absences, 3 or 4 absences, or 7 or 8 absences,
compared with having 0 absences, has an impact on student course grades.
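
Because every absence group is compared with 0 absences, the absence variable appears to
enter the model as a set of dummy variables with 0 absences as the reference category.
That coding is an assumption, not something stated in the output. A minimal Python
sketch of such a coding, with hypothetical names, could look like this:

```python
# Non-reference categories; "0" absences is the assumed reference category.
ABSENCE_CATEGORIES = ["1-2", "3-4", "5-6", "7-8", "9+"]


def absence_category(absences: int) -> str:
    """Map a count of absences to its category label."""
    if absences < 0:
        raise ValueError("absences cannot be negative")
    if absences == 0:
        return "0"
    if absences >= 9:
        return "9+"
    lower = absences if absences % 2 == 1 else absences - 1
    return f"{lower}-{lower + 1}"


def absence_dummies(absences: int) -> dict:
    """One 0/1 indicator per non-reference category; all zeros means 0 absences."""
    category = absence_category(absences)
    return {label: int(label == category) for label in ABSENCE_CATEGORIES}


print(absence_dummies(0))   # every indicator is 0 (reference group)
print(absence_dummies(6))   # only the "5-6" indicator is 1
```

Each estimated coefficient is then the average difference in course grade between that
group and students with no absences.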


It is hard to explain why 5 or 6 absences have a negative impact on students’ grades
while 7 or 8 absences do not (both compared to 0 absences) at the 95 % level of
confidence. However, we can say with 90 % confidence that 7 or 8 absences compared to
0 absences have a negative effect. Still, it is important to emphasise that changing the
level of significance without a very good reason is problematic.

To sum up, I would argue this indicates a relationship between lecture attendance and
students’ grades. The model shows that with 95 % confidence students who have 9 or more
absences from lectures get lower grades compared with students who have no absences.
Bearing in mind the problem of changing the level of significance, it is possible to say
that with 90 % confidence students who have 5 or more absences from lectures get lower
grades compared with students who have no absences.

Based on this, the three requirements for a causal relationship are fulfilled and it is
possible to say that lecture attendance improves students’ grades.

e)

The adjusted R² tells how much of the variation in students’ course grades can be
explained by the independent variables, adjusted for the number of variables in the
model: absence from lectures, pre-university tests, hours of study and work per week,
parents’ education, ethnicity, and sex. The adjusted R² is 0.435, which means that the
model explains 43.5 % of the observed variation in the students’ course grades.
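
For reference, the adjusted R² penalises the ordinary R² for the number of independent
variables. The sketch below implements the standard formula; the sample size and number
of predictors in the example are hypothetical, since neither is given here, and the
function name is an assumption.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical example values (not the exam data): R^2 = 0.5, n = 101, k = 10.
example = adjusted_r_squared(0.5, 101, 10)
print(round(example, 3))  # 0.444
```

With these made-up inputs the adjusted value lies below the unadjusted 0.5, and the gap
grows as more variables are added without improving the fit.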

I do not see any serious omitted variables in the model, which could cause bias.
However, more precise coefficient estimates and a better fit of our model (even though
the latter is not an aim in itself) could be achieved if we added more variables.
I would expect the students’ socioeconomic background, pre-ability, motivation, and
habits to affect the students’ course grade.
1. A better measure of the students’ socioeconomic background could be achieved
by including: parents’ economy, housing type, and parents’ job status.
2. I consider the pre-university tests as a good measure of pre-ability.
3. I would expect students’ motivation to be an important factor for their grade. More
motivation means more desire to spend time on the subject as well as more
attention and concentration in lectures. The variables hours of study per week and
attendance can be seen as a reflection of motivation. Other measures for
motivation could be where the student sits during lectures, as I would expect that
the most motivated students sit in front. Furthermore, it is possible to ask the
students for their self-stated motivation and interest in statistics.
4. Previous studies show that diet, alcohol intake, sleep, and exercise have an
impact on performance. I would therefore assume that they would also affect
students’ course grades.

(Words: 1197)
