Homework 4
STAT 240
Name: Alexis Camargo
Date: January 21
Honor Code: “I have neither given or received, nor have I tolerated others’ use of unauthorized aid.”
1.
(a) Make a scatterplot of the data with “number of undergraduate students in year 2006” as the
explanatory variable and “number of undergraduate students in year 2011” as the response variable.
Include the least-squares regression line on your plot.
(b) Plot the residuals versus the number of undergraduate students in 2006.
(c) Give an explanation in the context of the problem of what it means for a state to have a positive
residual.
A positive residual for a state means that it had more undergraduate students in 2011 than the regression
line predicted from its 2006 enrollment.
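As a small numerical sketch of this definition: a residual is the observed value minus the predicted value. The predicted value below is the one from part (g); the observed value is a made-up number used only for illustration and is not taken from the data set.

```python
# Residual = observed response - response predicted by the regression line.
def residual(observed: float, predicted: float) -> float:
    return observed - predicted


predicted_2011 = 244_076  # prediction from part (g) for 200,000 students in 2006
observed_2011 = 250_000   # hypothetical observed value (assumption, not real data)

r = residual(observed_2011, predicted_2011)

# A positive residual means the point lies above the regression line.
print("residual:", r, "(positive)" if r > 0 else "(not positive)")
```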
(d) Identify the names of any states that you would consider outliers or influential observations. Give
reasons for your answers.
Arizona is the only state I would consider an outlier, because its residual is above 3. Other states look
suspicious, but only Arizona meets that criterion.
(e) Give a naive interpretation of the y-intercept of the regression line in the context of the problem,
and then explain why this interpretation is not valid.
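Naively, the y-intercept says that a state with 0 undergraduate students in 2006 would be predicted to
have 1782 undergraduate students in 2011. This interpretation is not valid because no state has 0
undergraduate students, so x = 0 lies outside the range of the data.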
(f) Give a complete interpretation of the slope of the regression line in the context of the problem.
The slope is the predicted change in the response variable for a one-unit increase in the explanatory
variable. Here, it is the predicted increase in a state's number of undergraduate students in 2011 for
each additional undergraduate student the state had in 2006.
(g) If there were a 51st state that had 200,000 undergraduate students in year 2006, predict how many
undergraduate students the state would have in 2011
The 51st state would have 244076 undergraduates in 2011 according to the regression line.
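The prediction can be checked with a short sketch. The slope is not stated in this write-up, so the code backs it out from the reported intercept (1782) and the reported prediction (244076); treat that implied slope as an assumption, since the reported numbers may be rounded.

```python
# Reported values from parts (e) and (g).
intercept = 1782
x_new = 200_000
y_pred_reported = 244_076

# Implied slope (assumption: derived from the two reported numbers, not quoted from output).
implied_slope = (y_pred_reported - intercept) / x_new


def predict(x: float) -> float:
    """Predicted 2011 enrollment from 2006 enrollment x, using the fitted line."""
    return intercept + implied_slope * x


print(f"implied slope: {implied_slope:.5f}")
print("prediction for 200,000 students:", round(predict(x_new)))
```

Under these assumptions, each additional undergraduate student in 2006 corresponds to about 1.21 predicted undergraduate students in 2011.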
2.
(a) X = size of a hospital (measured by its number of beds), Y = median number of days that patients
remain in the hospital. Data show that larger hospitals tend to have longer hospital stays.
This is a causal relationship, because it is logical to think that a hospital with more beds can care
for its patients for a longer time.
(b) X = marital status of men, Y = income of men. Data show that men who are married, divorced, or
widowed tend to earn quite a bit more than men who have never been married.
This is a confounding relationship. A man with a high income may be more likely to get married, which
would explain why married men earn more than single men. Alternatively, married men may have to work
more to meet their needs, and so they earn more. It is not a causation or a common response relationship.
(c) X = hours of television watched, Y = grades in school. Data show that children who watch more hours
of television tend to get lower grades in school.
This is a common response relationship, because watching many hours of television may mean that the
child does not have enough time to study, and that would be another factor leading to low grades.
3.
(c) Make a bar graph to illustrate the results from parts (a) and (b).
(d) Summarize the relationship between working classes and newspaper selections using your
results from parts (a)-(c).
As the graph shows, 35.68% of Times readers are blue collar workers and 64.31% are white collar workers,
while 68.46% of Post readers are blue collar workers and 31.53% are white collar workers. (These
percentages sum to 100% within each newspaper, and they are consistent with the proportions in parts
(e) and (f).)
(e) Find the distribution of newspaper selection among blue collar workers.
Blue Collar
Times 0.241791
Post 0.758209
(f) Find the distribution of newspaper selection among white collar workers.
White Collar
Times 0.555133
Post 0.444867
(g) Make a bar graph to illustrate the results from parts (e) and (f).
[Bar graph: distribution of newspaper selection (TIMES, POST) among blue collar and white collar workers]
(h) Summarize the relationship between working classes and newspaper selections using your
results from parts (e)-(g).
As the graph shows, 24.18% of blue collar workers read the Times and the remaining 75.82% read the Post.
Among white collar workers, 55.51% read the Times and the remaining 44.49% read the Post. Blue collar
workers therefore strongly favor the Post, while white collar workers lean toward the Times.
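The two kinds of conditional distribution in this question (conditioning on newspaper versus conditioning on working class) can be checked with a short sketch. The counts below are an assumption: the original two-way table is not shown in this write-up, and these are simply counts that reproduce the reported proportions.

```python
# Assumed two-way table of counts (NOT given in the write-up; chosen to match
# the reported proportions). Keys: (working class, newspaper).
counts = {
    ("blue", "Times"): 81,
    ("blue", "Post"): 254,
    ("white", "Times"): 146,
    ("white", "Post"): 117,
}


def newspaper_given_class(worker: str) -> dict:
    """Distribution of newspaper selection among one working class."""
    total = sum(n for (w, _), n in counts.items() if w == worker)
    return {p: n / total for (w, p), n in counts.items() if w == worker}


def class_given_newspaper(paper: str) -> dict:
    """Distribution of working class among readers of one newspaper."""
    total = sum(n for (_, p), n in counts.items() if p == paper)
    return {w: n / total for (w, p), n in counts.items() if p == paper}


for worker in ("blue", "white"):
    print(worker, {k: round(v, 6) for k, v in newspaper_given_class(worker).items()})
for paper in ("Times", "Post"):
    print(paper, {k: round(v, 4) for k, v in class_given_newspaper(paper).items()})
```

The two functions divide the same counts by different totals, which is why the percentages in parts (a)-(c) differ from those in parts (e)-(g).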
4.
Party A Party B
82.63% 79.48%
i.
Party A Party B
86.44% 59.09%
ii.
Western
Party A Party B
69.68% 72.08%
There is no contradiction between parts (a) and (b): in part (a) we look at the data as a whole (Party A
and Party B), while in part (b) we look at the data separately by region. Another important point is that
Party A has almost double the number of members that Party B has, so it does not matter if Party A has
less internal support than Party B in a region, because that would not mean Party A is a minority in
Congress. Still, this would be considered an instance of Simpson's paradox.
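To illustrate how pooled and per-region percentages can point in different directions, here is a minimal sketch. All counts are made up for illustration (they are not the homework data): Party B has the higher support rate in each region, yet Party A has the higher rate overall, because most of Party A's members are in the high-support region.

```python
# Hypothetical (supporters, members) counts per party and region.
# These numbers are invented for illustration only.
data = {
    "Party A": {"Region 1": (450, 500), "Region 2": (60, 100)},
    "Party B": {"Region 1": (95, 100), "Region 2": (325, 500)},
}


def rate(supporters: int, members: int) -> float:
    return supporters / members


def regional_rates(party: str) -> dict:
    """Support rate within each region for one party."""
    return {region: rate(s, m) for region, (s, m) in data[party].items()}


def overall_rate(party: str) -> float:
    """Support rate with both regions pooled together."""
    supporters = sum(s for s, _ in data[party].values())
    members = sum(m for _, m in data[party].values())
    return rate(supporters, members)


for party in data:
    by_region = {r: f"{v:.0%}" for r, v in regional_rates(party).items()}
    print(party, by_region, "overall:", f"{overall_rate(party):.0%}")
```

The reversal comes from the unequal group sizes: the pooled rate is a weighted average of the regional rates, and the two parties use very different weights.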