Statistics


Institute of Business Administration
Assignment #1
Question #1
(a) Following is the normal probability plot, box plot, histogram, and stem-and-leaf diagram of the
data.

Stem-and-leaf of WEIGHT N = 60
2 3 67
7 3 88899
12 4 01111
22 4 2222222333
(12) 4 444455555555
26 4 6666677777
16 4 8888888999
6 5 011
3 5 23
1 5
1 5 6

Leaf Unit = 1

(b) The plots above suggest that the data are approximately normally distributed and roughly
symmetric about the mean, so the confidence interval procedure for a population mean can
reasonably be applied to this data.

(c) 𝛼 = 5%, so zα/2 = 1.96
Mean = 2717.8/60 = 45.3
$$\text{C.I.} = 45.3 \pm 1.96\left(\frac{4.5}{\sqrt{60}}\right) = (44.16,\ 46.44)$$

We can claim with 95% confidence that the mean weight of Ethiopian-born school children between ages
12 and 15 lies between 44.16 and 46.44.
Question #2
(a) Random sample of 35.

28 30 28 27 19 27 34
33 28 29 51 32 31 28
26 27 30 24 21 14 25
25 22 22 28 25 20 25
24 20 20 23 31 27 28

(b) Mean= 932/35= 26.63, StDev= 6.08


$$\text{C.I.} = 26.63 \pm 2.032\left(\frac{6.08}{\sqrt{35}}\right) = (24.54,\ 28.72)$$

We can claim with 95% confidence that the mean mileage of all cars from the year in question lies
between 24.54 and 28.72 mpg.
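
The mean, standard deviation, and interval in part (b) can be re-derived from the 35 sample values. The following is a minimal sketch using only the Python standard library; the helper name `t_interval` is illustrative, and the critical value 2.032 is taken from part (b) rather than computed.

```python
import math
import statistics

# Mileage values (mpg) copied from the random sample in part (a).
mileage = [
    28, 30, 28, 27, 19, 27, 34,
    33, 28, 29, 51, 32, 31, 28,
    26, 27, 30, 24, 21, 14, 25,
    25, 22, 22, 28, 25, 20, 25,
    24, 20, 20, 23, 31, 27, 28,
]

T_CRIT = 2.032  # t value for df = 34 at 95% confidence, as used in part (b)


def t_interval(data, t_crit):
    """Return (mean, sample SD, lower limit, upper limit) for a one-mean t-interval."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    margin = t_crit * sd / math.sqrt(n)
    return mean, sd, mean - margin, mean + margin


mean, sd, lower, upper = t_interval(mileage, T_CRIT)
print(f"mean = {mean:.2f}, sd = {sd:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```

The interval is centered on the sample mean, and its half-width shrinks as the sample size grows, since the standard deviation is divided by the square root of n.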

(c) The confidence interval is a statement about the population mean, not about individual cars, so
it does not imply that every car has a mileage within this range. The sample we took was of 35 cars,
while the full list has 1076 cars with a population mean of 24.82 mpg. This population mean lies
inside our interval (24.54, 28.72), so in this case the interval does capture the true mean. With a
95% confidence level, about 5% of intervals constructed this way would miss it.

Question #3
(a) Mean = 5.9/20 = 0.295, StDev = 0.42
H0: The mean net percentage gain for jobs is equal to 0.2 (μ = 0.2)
H1: The mean net percentage gain for jobs is greater than 0.2 (μ > 0.2)
𝛼 = 5%
$$Z = \frac{0.295 - 0.20}{0.42/\sqrt{20}} = 1.011$$
Z (from table, right-tailed) = 1.645
When we compare the Z calculated and Z tabulated: Zcal < Ztab
Hence, we do not reject the null hypothesis. At the 5% significance level, the data do not provide
sufficient evidence that the mean net percentage gain for jobs is greater than 0.2.

(b)
Stem-and-leaf of GAIN N = 20
1 -1 1
1 -0
1 -0
2 -0 5
3 -0 2
4 -0 1
6 0 01
10 0 2233
10 0 45
8 0 6667
4 0 8889

Leaf Unit = 0.1

(c) With the outlier removed, n = 19: Mean = 7/19 = 0.368, StDev = 0.42

H0: The mean net percentage gain for jobs is equal to 0.2 (μ = 0.2)
H1: The mean net percentage gain for jobs is greater than 0.2 (μ > 0.2)
𝛼 = 5%
$$Z = \frac{0.368 - 0.20}{0.42/\sqrt{19}} = 1.744$$
Z (from table, right-tailed) = 1.645
When we compare the Z calculated and Z tabulated: Zcal > Ztab
Hence, this time we reject the null hypothesis. At the 5% significance level, the data without the
outlier provide sufficient evidence that the mean net percentage gain for jobs is greater than 0.2.

(d) With the outlier included, the calculated z value was 1.011 and the null hypothesis was not
rejected. With the outlier removed, the calculated z value rises to 1.744 and the null hypothesis is
rejected. A single observation therefore reverses the conclusion of the test. The outlier may
represent a very rare event, so it should be investigated before it is discarded, and the result of
the hypothesis test should be interpreted with caution.

Question #4
(a)

Standard Deviations:
StDev of Integration: 4.513
StDev of Standard: 7.221
(b) The pooled t-procedure assumes that the two population standard deviations are equal. As shown
in (a), the sample standard deviations are quite different (4.513 versus 7.221), so in this case we
are inclined to use the non-pooled procedure instead.

(c)
Mean I = 24.92 S1 = 4.513 N1 = 227
Mean 2 = 23 S2 = 7.221 N2 = 192
𝛼 = 1%

H0: The mean satisfaction level of integrated patients is equal to that of standard patients (μI = μS)
H1: The mean satisfaction level of integrated patients is greater than that of standard patients
(μI > μS)

Non-pooled:
$$t_{cal} = \frac{(24.92 - 23) - 0}{\sqrt{\frac{4.513^2}{227} + \frac{7.221^2}{192}}} = 3.194$$
df = 309 (from the non-pooled degrees-of-freedom formula, rounded down)
ttab = 2.338
Since tcal > ttab, we will reject the H0.
At the 1% significance level, the data provide sufficient evidence that patients with integrated
treatment had a higher mean satisfaction level than patients with standard treatment.
Pooled:
$$S_p = \sqrt{\frac{(227 - 1)\,4.513^2 + (192 - 1)\,7.221^2}{227 + 192 - 2}} = 5.909$$
$$t_{cal} = \frac{(24.92 - 23) - 0}{5.909\sqrt{\frac{1}{227} + \frac{1}{192}}} = 3.314$$
df = 227 + 192 - 2 = 417
ttab = 2.335
Since tcal > ttab, we again reject the H0 and reach the same conclusion: at the 1% significance
level, patients with integrated treatment had a higher mean satisfaction level.

(d) Mean I = 24.92 S1 = 4.513 N1 = 227


Mean 2 = 23 S2 = 7.221 N2 = 192

zα/2 = 2.33 (for a 98% confidence interval)

$$\text{Confidence Interval} = (24.92 - 23) \pm 2.33\sqrt{\frac{4.513^2}{227} + \frac{7.221^2}{192}} = (0.519,\ 3.321)$$
By calculating this interval, we can conclude with 98% confidence that the mean level of
satisfaction of patients with integrated treatment is higher than that of patients with standard
treatment, and that the difference in mean satisfaction lies between 0.519 and 3.321.
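
The non-pooled standard error drives both the test statistic in (c) and the interval in (d). A minimal sketch of that arithmetic follows, using only the summary statistics quoted above; the function name `two_sample_summary` is illustrative, and the critical value 2.33 is taken from part (d) rather than computed.

```python
import math


def two_sample_summary(mean1, sd1, n1, mean2, sd2, n2, critical):
    """Non-pooled standard error, test statistic, and interval for mean1 - mean2."""
    diff = mean1 - mean2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # non-pooled standard error
    stat = diff / se                           # statistic for H0: difference = 0
    margin = critical * se
    return se, stat, (diff - margin, diff + margin)


# Integrated group first, standard group second (values from parts (c) and (d)).
se, stat, (lower, upper) = two_sample_summary(
    24.92, 4.513, 227, 23, 7.221, 192, critical=2.33
)
print(f"SE = {se:.4f}, statistic = {stat:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Because the whole interval lies above zero, it points the same way as the test: the integrated group has the higher mean.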

Question #5

Weight Method
Mean: 23.71, StDev: 7.19
Groove Method
Mean: 19.95, StDev: 5.77
Mean of Difference
Mean: 3.75, StDev: 3.22

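𝛼 = 5%
H0: The mean of the differences between the two methods is equal to 0 (μd = 0)
H1: The mean of the differences between the two methods is not equal to 0 (μd ≠ 0)
Because the analysis is based on the differences between the two measurements, the samples are
paired, and the test uses the mean and standard deviation of the differences (n = 11):
$$t = \frac{3.75 - 0}{3.22/\sqrt{11}} = 3.863$$
df = 11 - 1 = 10
ttab = 2.228
When both t values are compared, |tcal| > ttab
Since tcal is greater, we reject our null hypothesis that the mean difference is zero. The weight
and groove methods therefore do not give the same results on average, with the weight method giving
higher values. We make this claim at the 5% significance level.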

Question #6
(a)

Stem-and-leaf of TEMP N = 93
1 96 7
3 96 89
8 97 00001
13 97 22233
19 97 444444
26 97 6666777
31 97 88889
45 98 00000000000111
(10) 98 2222222233
38 98 4444445555
28 98 66666666677
17 98 8888888
10 99 00001
5 99 2233
1 99 4

Leaf Unit = 0.1

(b) We can observe from the diagrams in (a) that the data are approximately normally distributed and
roughly symmetric about the mean, with most of the high-frequency values lying near the center.
Hence it is reasonable to apply one-standard-deviation χ2-procedures to the data.
(c)
𝛼 = 5%
H0: The mean body temperature of healthy humans is equal to 98.6 F (μ = 98.6)
H1: The mean body temperature of healthy humans is not equal to 98.6 F (μ ≠ 98.6)
Mean = 9125.5 / 93 = 98.123

Std = 0.63 F
$$Z_{cal} = \frac{98.123 - 98.6}{0.63/\sqrt{93}} = -7.302$$
Ztab = ±1.96
When we compare Z values: |Zcal| > |Ztab|
Since the absolute value of the calculated Z is greater than the tabulated value, we are going to
reject our null hypothesis. Furthermore, we can make the claim that the mean body temperature of
healthy humans is not equal to 98.6 F. We make this claim at the 5% significance level.

(d)
Confidence level = 95%
$$\text{C.I.} = 98.123 \pm 1.96\left(\frac{0.63}{\sqrt{93}}\right) = (97.99,\ 98.25)$$
We can claim with 95% confidence that the mean body temperature of healthy humans lies between
97.99 F and 98.25 F.
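
The test statistic in (c) and the interval in (d) share the same standard error, so both can be checked together. This is a minimal sketch under the values quoted above (mean 98.123, standard deviation 0.63, n = 93); the function name `z_test_and_interval` is illustrative, and 1.96 is the tabulated value rather than a computed quantile.

```python
import math


def z_test_and_interval(sample_mean, sd, n, mu0, z_crit):
    """Return the one-sample z statistic and the matching confidence interval."""
    se = sd / math.sqrt(n)            # standard error of the mean
    z = (sample_mean - mu0) / se      # distance from mu0 in standard errors
    margin = z_crit * se
    return z, (sample_mean - margin, sample_mean + margin)


z, (lower, upper) = z_test_and_interval(98.123, 0.63, 93, mu0=98.6, z_crit=1.96)
reject_h0 = abs(z) > 1.96  # two-tailed decision rule
print(f"z = {z:.3f}, reject H0: {reject_h0}, CI = ({lower:.2f}, {upper:.2f})")
```

The two results agree as they should for a two-tailed test: the hypothesized value 98.6 falls outside the 95% interval exactly when the null hypothesis is rejected at the 5% level.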

Question #7
α = 1%
H0 = Diabetic state and education are two independent entities.
H1 = Diabetic state and education are two dependent entities.

O E (O-E)2/E
33 18.73 10.872
218 232.27 0.877
25 30.9 1.127
389 383.1 0.091
20 30.82 3.799
393 382.18 0.306
17 14.55 0.413
178 180.45 0.033

Total= 17.518

χ2cal = 17.518
df = (4-1) * (2-1) = 3
χ2tab = 11.345

Since χ2cal > χ2tab, we will reject the null hypothesis.
At the 1% significance level we can claim that education and diabetic state are dependent
(associated). The procedure shows that an association exists, but it does not measure how strong
that association is.
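
The expected counts and the χ2 statistic in the table above can be reproduced from the observed counts alone. The sketch below assumes the eight observed values form a table of four rows by two columns, read in the order listed (an arrangement inferred from the expected counts, not stated in the answer); the function name `chi_square_statistic` is illustrative.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence: row total * column total / n
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Observed counts from the O column, paired row by row (assumed layout).
observed = [[33, 218], [25, 389], [20, 393], [17, 178]]
chi2, df = chi_square_statistic(observed)
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0 at 1%: {chi2 > 11.345}")
```

Because the expected counts are not rounded here, the statistic may differ slightly in the second decimal place from the tabulated total of 17.518.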
Question #8
(a)
Term Coef SE Coef T-Value P-Value VIF
Constant -12.87 2.56 -5.03 0.000
Age 0.7033 0.0496 14.18 0.000 1.76
Weight 0.9699 0.0631 15.37 0.000 8.42
BSA 3.78 1.58 2.39 0.033 5.33
Dur 0.0684 0.0484 1.41 0.182 1.24
Pulse -0.0845 0.0516 -1.64 0.126 4.41
Stress 0.00557 0.00341 1.63 0.126 1.83

(b)
BP-age, coefficient = 0.659
BP-weight, coefficient = 0.950
BP-BSA, coefficient = 0.866
BP-DUR, coefficient = 0.293
BP-Pulse, coefficient = 0.721
BP-Stress, coefficient = 0.164

(c) R2 equals 0.9962 (99.62%). This means that 99.62% of the variation in blood pressure is
explained by the six predictor variables taken together, so the model fits the data very closely.

(d)
Regression Equation
BP = -12.87 + 0.7033 Age + 0.9699 Weight + 3.78 BSA
+ 0.0684 Dur - 0.0845 Pulse+ 0.00557 Stress

BP = -12.87 + 0.7033 (49) + 0.9699 (93) + 3.78 (2)
+ 0.0684 (6.43) - 0.0845 (69.6) + 0.00557 (54)
= 114.21

Question #9

Mean of current year’s retail price: 3257.31/40 = $81.433


S = 7.61
n=40
µ = 78.01

𝛼 = 1%
H0: The mean retail price of all history books this year is equal to 78.01 (μ = 78.01)
H1: The mean retail price of all history books this year is greater than 78.01 (μ > 78.01)
$$Z_{cal} = \frac{81.433 - 78.01}{7.61/\sqrt{40}} = 2.845$$
Ztab = 2.326 (right-tailed)
When we compare Z values: Zcal > Ztab
Since the calculated value of Z is greater than the tabulated value, we are going to reject our null
hypothesis. Consequently, we can claim that the mean retail price of all history books this year is
greater than $78.01. We make this claim at the 1% significance level.

Question #10

X1 = 0.02426 S1 = 0.00488 N= 10
X2 = 0.03647 S2 = 0.06641 N= 15
𝛼 = 1%
H0: The mean dopamine activity in psychotic patients is equal to that in non-psychotic patients
(μ1 = μ2)
H1: The mean dopamine activity in psychotic patients is higher than that in non-psychotic patients
(μ1 > μ2)

$$t_{cal} = \frac{(0.02426 - 0.03647) - 0}{\sqrt{\frac{0.00488^2}{10} + \frac{0.06641^2}{15}}} = -0.709$$
with df = 14 (from the non-pooled degrees-of-freedom formula, rounded down)
ttab = 2.624
When we compare the t-values: tcal < ttab
Since our calculated value of t is less than the value found in the t-table, we do not reject our
null hypothesis. At the 1% significance level, the data do not provide sufficient evidence that
dopamine activity is higher in psychotic patients than in non-psychotic patients.

Question #11

H0: The residents of red, blue, and purple states are homogeneous concerning their views (the
distribution of views is the same for all three types of state)
H1: The residents of red, blue, and purple states are non-homogeneous concerning their views (the
distribution of views differs among the types of state)

𝛼 = 5%

O E (O-E)2/E
169 185.333 1.439
191 185.333 0.173
196 185.333 0.614
129 108 4.083
89 108 3.343
106 108 0.037
149 162.667 1.148
178 162.667 1.445
161 162.667 0.017
53 44 1.841
42 44 0.091
37 44 1.114

Total=15.345
χ2cal = 15.345
df = (4-1) * (3-1) = 6
χ2tab = 12.592
After comparison: χ2cal > χ2tab
Since the calculated value of χ2 is greater than the tabulated value, we are going to reject our
null hypothesis. At the 5% significance level, the data provide sufficient evidence that the
residents of red, blue, and purple states are non-homogeneous concerning their views. At the 1%
level, where χ2tab = 16.812, the null hypothesis would not be rejected.

Question #12
(a)

Using Minitab's correlation feature, the linear correlation coefficient is r = 0.108.

(b)

Using Minitab's correlation feature, the linear correlation coefficient is r = 0.721.
(c)

Using Minitab's correlation feature, the linear correlation coefficient is r = -0.309.

(d)
Population and area: the correlation coefficient is 0.108, a positive value, which indicates that
larger areas tend to have slightly larger populations. However, R2 is equal to 0.011664, so only
about 1.2% of the variation in one variable is explained by the other. A coefficient of such small
magnitude indicates a very weak linear relationship between the two variables.

Population and exotic plants: the correlation coefficient is 0.721, which indicates a fairly strong
positive linear relationship between the two variables. R2 is equal to 0.5198, so about 52% of the
variation in the number of exotic plants is explained by population. The number of exotic plants
tends to increase with population.

Area and exotic plants: the value of -0.309 indicates a negative linear relationship, which means
that as one variable increases the other tends to decrease. The magnitude of the coefficient is
fairly low, so the relationship is weak: as the size of the area increases, the number of exotic
plants tends to decrease only slightly.

Question #13
(a)

(b) Since both scatterplots show a roughly linear increasing trend, it is reasonable to fit a
regression line to each of them.
(c)
The regression equation is
LE_FEMALE = 77.80 + 0.4008 HEALTH_GDP
For the first scatter plot, women against health GDP, the regression equation indicates that each
one-percentage-point increase in health GDP is associated with an increase of 0.4008 years in the
life expectancy of women. The intercept, 77.80 years, is the predicted life expectancy when health
GDP is zero, which is an extrapolation and should not be interpreted literally.
The regression equation is
LE_MALE = 70.26 + 0.6100 HEALTH_GDP
For the second scatter plot, men against health GDP, the regression equation indicates that each
one-percentage-point increase in health GDP is associated with an increase of 0.6100 years in the
life expectancy of men. The intercept, 70.26 years, is the predicted life expectancy when health
GDP is zero, which is an extrapolation and should not be interpreted literally.

(d)
The United States is the outlier in both cases. Its health GDP of 15.3% is the highest in the whole
table, yet its life expectancy is just 75.2 years for men and just 80.4 years for women.
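
One way to see how far the United States falls from the fitted lines is to compare its observed life expectancies with the values predicted by the regression equations in (c). This is a minimal sketch using only the coefficients and the three United States figures quoted in this answer; the function name `predict` and the dictionary layout are illustrative.

```python
def predict(intercept, slope, health_gdp):
    """Predicted life expectancy from a fitted simple linear regression."""
    return intercept + slope * health_gdp


US_HEALTH_GDP = 15.3  # percent, as quoted in part (d)

# (intercept, slope, observed US life expectancy) from parts (c) and (d)
models = {
    "female": (77.80, 0.4008, 80.4),
    "male": (70.26, 0.6100, 75.2),
}

residuals = {}
for group, (intercept, slope, observed) in models.items():
    fitted = predict(intercept, slope, US_HEALTH_GDP)
    residuals[group] = observed - fitted  # negative means below the line
    print(f"{group}: fitted = {fitted:.2f}, observed = {observed}, "
          f"residual = {residuals[group]:.2f}")
```

Both residuals come out negative by several years, which matches the description above: the country with the highest health GDP sits well below the line in both plots.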

(e)
The regression equation is
LE_MALE = 66.19 + 1.092 HEALTH_GDP
Compared with the equation in (c), the estimated effect of a one-percentage-point increase in health
GDP on the life expectancy of males increased from 0.6100 to 1.092 years, while the intercept
decreased from 70.26 to 66.19.

The regression equation is


LE_FEMALE = 74.53 + 0.7882 HEALTH_GDP
Compared with the equation in (c), the estimated effect of a one-percentage-point increase in health
GDP on the life expectancy of females increased from 0.4008 to 0.7882 years, while the intercept
decreased from 77.80 to 74.53.
