TP_stat_inf_103957
******
Ministère de l’enseignement
supérieur et de la recherche
scientifique
******
Université Nationale des Sciences,
Technologies, Ingénierie et
Mathématiques
******
Ecole Nationale Supérieure de
Génie Mathématique et
Modélisation
TP
Exam 2017-2018
Alexandre DAHOUE
Uriel H. JOHNSON
2 SECTION B 12
2.1 B5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 B6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 B7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3 Additional exercise 27
3.1 Exercise 1 (3, page 39 of pdf Tests.hypotheses) . . . . . . . . . . . . . 27
3.2 Solution: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Exercise 2 (4, page 40 of pdf Test.hypotheses) . . . . . . . . . . . . . . 28
3.4 Solution: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5 Exercise 3 (7, page 41 of pdf Test.hypotheses) . . . . . . . . . . . . . . 30
3.6 Solution: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1 SECTION A
1.1 A1
The bin width is \(h = a_{k+1} - a_k\) for \(k = 1, 2, \ldots, K\).
Using the formula for Hist(x) in the intervals Bk = (ak , ak+1 ], we can write this as:
\[
\hat{\mu}_{\mathrm{Hist}} = \int_{a_1}^{a_{K+1}} x \cdot \sum_{k=1}^{K} \frac{v_k}{nh}\, \mathbf{1}_{B_k}(x)\, dx,
\]
where 1Bk (x) is the indicator function that is 1 if x ∈ Bk and 0 otherwise.
This integral simplifies as:
\[
\hat{\mu}_{\mathrm{Hist}} = \frac{1}{nh} \sum_{k=1}^{K} v_k \int_{a_k}^{a_{k+1}} x\, dx.
\]
\[
\int_{a_k}^{a_{k+1}} x\, dx = \left[\frac{x^2}{2}\right]_{a_k}^{a_{k+1}} = \frac{a_{k+1}^2}{2} - \frac{a_k^2}{2}.
\]
\[
\hat{\mu}_{\mathrm{Hist}} = \frac{1}{nh} \sum_{k=1}^{K} v_k \left( \frac{a_{k+1}^2}{2} - \frac{a_k^2}{2} \right).
\]
Since \(a_{k+1}^2 - a_k^2 = (a_{k+1} + a_k)(a_{k+1} - a_k) = h\,(a_{k+1} + a_k)\), the bin width \(h\) cancels and
\[
\hat{\mu}_{\mathrm{Hist}} = \frac{1}{2n} \sum_{k=1}^{K} v_k (a_{k+1} + a_k).
\]
This is the desired expression for the estimated mean based on the histogram.
Interval (5, 10] (10, 15] (15, 20] (20, 25] (25, 30] (30, 35]
Frequency 1 11 39 38 10 1
\[
\hat{\mu}_{\mathrm{Hist}} = \frac{1}{2n} \sum_{k=1}^{6} v_k (a_{k+1} + a_k),
\]
we substitute the frequency values and the midpoints:
\[
\hat{\mu}_{\mathrm{Hist}} = \frac{1}{100} \left( 1(7.5) + 11(12.5) + 39(17.5) + 38(22.5) + 10(27.5) + 1(32.5) \right).
\]
Simplifying:
\[
\hat{\mu}_{\mathrm{Hist}} = \frac{1990}{100} = 19.90.
\]
Thus, the estimated mean of the distribution is 19.90 .
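The arithmetic above can be checked directly; here is a minimal Python sketch of the histogram-mean formula (the variable names are illustrative, not from the original exercise):

```python
# Histogram mean: mu_hat = (1 / (2n)) * sum_k v_k * (a_{k+1} + a_k)
edges = [5, 10, 15, 20, 25, 30, 35]   # interval endpoints a_1, ..., a_{K+1}
freqs = [1, 11, 39, 38, 10, 1]        # frequencies v_k
n = sum(freqs)                        # total number of observations (100)

mu_hat = sum(v * (edges[k] + edges[k + 1]) for k, v in enumerate(freqs)) / (2 * n)
print(mu_hat)  # 19.9
```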
1.2 A2
E[θ̂] = θ.
In other words, the estimator does not systematically overestimate or underestimate the
true parameter value.
\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{and} \quad \bar{Y} = \frac{1}{m} \sum_{j=1}^{m} Y_j.
\]
Since X1 , X2 , . . . , Xn are independent and follow the normal distribution N (µ, σ 2 ), the
expected value of X̄ is:
" n
# n n
1X 1X 1X
E[X̄] = E Xi = E[Xi ] = µ = µ.
n i=1 n i=1 n i=1
2. Expected Value of Ȳ :
Similarly, since Y1 , Y2 , . . . , Ym are independent and follow the distribution N (µ, τ 2 ), the
expected value of Ȳ is:
" m
# m m
1 X 1 X 1 X
E[Ȳ ] = E Yj = E[Yj ] = µ = µ.
m j=1 m j=1 m j=1
We have already shown in part (i) that both X̄ and Ȳ are unbiased estimators of µ; specifically, \(E[\bar{X}] = E[\bar{Y}] = \mu\).
This follows from the fact that each of the Xi ’s and Yj ’s are independent and follow the
distributions N (µ, σ 2 ) and N (µ, τ 2 ), respectively. Thus, their sample means are unbiased
estimators of the population mean µ.
\[
\operatorname{Var}(\bar{X}) = \frac{\operatorname{Var}(X_i)}{n}.
\]
Since the Xi ’s are independent and follow N (µ, σ 2 ), the variance of each Xi is σ 2 . There-
fore, the variance of X̄ is:
\[
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}.
\]
\[
\operatorname{Var}(\bar{Y}) = \frac{\operatorname{Var}(Y_j)}{m}.
\]
Since the Yj ’s are independent and follow N (µ, τ 2 ), the variance of each Yj is τ 2 . Thus,
the variance of Ȳ is:
\[
\operatorname{Var}(\bar{Y}) = \frac{\tau^2}{m}.
\]
Summary of Variances:
- The variance of X̄ is \(\operatorname{Var}(\bar{X}) = \sigma^2/n\).
- The variance of Ȳ is \(\operatorname{Var}(\bar{Y}) = \tau^2/m\).
Thus, both X̄ and Ȳ are unbiased estimators of µ, and their variances are \(\sigma^2/n\) and \(\tau^2/m\), respectively.
µ̂ = wX̄ + (1 − w)Ȳ ,
We are tasked with finding the value of w that minimizes the variance of µ̂.
1. Variance of µ̂:
We know that:
\[
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}, \qquad \operatorname{Var}(\bar{Y}) = \frac{\tau^2}{m}.
\]
Therefore, the variance of µ̂ becomes:
\[
\operatorname{Var}(\hat{\mu}) = w^2 \frac{\sigma^2}{n} + (1-w)^2 \frac{\tau^2}{m}.
\]
\[
\frac{d}{dw} \operatorname{Var}(\hat{\mu}) = 2w \frac{\sigma^2}{n} - 2(1-w) \frac{\tau^2}{m}.
\]
Setting the derivative equal to zero:
\[
2w \frac{\sigma^2}{n} - 2(1-w) \frac{\tau^2}{m} = 0.
\]
Simplifying:
\[
w \frac{\sigma^2}{n} = (1-w) \frac{\tau^2}{m}.
\]
Expanding:
\[
w \frac{\sigma^2}{n} = \frac{\tau^2}{m} - w \frac{\tau^2}{m}.
\]
Collecting terms involving w:
\[
w \left( \frac{\sigma^2}{n} + \frac{\tau^2}{m} \right) = \frac{\tau^2}{m}.
\]
Solving for w:
\[
w = \frac{\tau^2/m}{\sigma^2/n + \tau^2/m} = \frac{n\tau^2}{m\sigma^2 + n\tau^2}.
\]
Thus, the value of w that minimizes the variance of µ̂ is:
\[
w = \frac{n\tau^2}{m\sigma^2 + n\tau^2}.
\]
To ensure that this value of w minimizes the variance, we calculate the second derivative
of the variance with respect to w.
The second derivative is:
\[
\frac{d^2}{dw^2} \operatorname{Var}(\hat{\mu}) = 2\frac{\sigma^2}{n} + 2\frac{\tau^2}{m}.
\]
Since both \(\sigma^2/n\) and \(\tau^2/m\) are positive, we have:
\[
\frac{d^2}{dw^2} \operatorname{Var}(\hat{\mu}) = 2\frac{\sigma^2}{n} + 2\frac{\tau^2}{m} > 0.
\]
Therefore, Var(µ̂) is convex in w, confirming that \(w = \frac{n\tau^2}{m\sigma^2 + n\tau^2}\) indeed minimizes the variance.
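Numerically, one can confirm that this weight minimizes the variance by comparing it with nearby weights; a small Python sketch with illustrative values of σ², τ², n, m (these particular numbers are assumptions, not from the exercise):

```python
# Var(mu_hat) = w^2 * sigma2 / n + (1 - w)^2 * tau2 / m,
# minimized at w* = n * tau2 / (m * sigma2 + n * tau2)
sigma2, tau2, n, m = 4.0, 9.0, 30, 50   # illustrative values

def var_mu(w):
    return w ** 2 * sigma2 / n + (1 - w) ** 2 * tau2 / m

w_star = n * tau2 / (m * sigma2 + n * tau2)

# The variance at w* is no larger than at any weight on a fine grid
assert var_mu(w_star) <= min(var_mu(k / 1000) for k in range(1001))
print(round(w_star, 4))  # 0.5745
```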
1.3 A3
This means we want to test whether the population mean is 13.5, or if it differs from this
value (i.e., the mean could be either greater than or less than 13.5).
Now, we will proceed with the hypothesis tests for the following two cases:
• Part (ii): Test with known variance σ² = 1.
• Part (iii): Test with unknown variance, using the sample estimate of variance.
Given that \(n = 8\), \(\sum_{i=1}^{n} x_i = 113.6627\), and \(\sigma^2 = 1\), we perform a two-tailed test. Since the sample size is small (n = 8), we use critical values from the t-distribution with n − 1 = 7 degrees of freedom rather than from the normal distribution, which is better suited to large samples.
The test statistic is given by:
\[
t = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
\]
\[
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{113.6627}{8} = 14.2078,
\]
so \(t = (14.2078 - 13.5)/(1/\sqrt{8}) \approx 2.00\).
The critical value for a two-tailed test at the 5% significance level with 7 degrees of
freedom is approximately ±2.3646. Since |t| = 2.00 is less than the critical value 2.3646,
we fail to reject the null hypothesis H0 at the 5% significance level.
\[
t = \frac{\bar{x} - \mu_0}{s' / \sqrt{n}}
\]
\[
s'^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \right)
\]
\[
s'^2 = \frac{1}{7} \left( 1621.391 - \frac{(113.6627)^2}{8} \right) = \frac{1}{7} \times 6.490 = 0.9271,
\]
so \(t = (\bar{x} - \mu_0)/(s'/\sqrt{8}) \approx 2.08\).
The critical value for a two-tailed test at the 5% significance level with 7 degrees of freedom is approximately ±2.3646. Since |t| = 2.08 is less than the critical value 2.3646, we fail to reject the null hypothesis H0 at the 5% significance level.
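Both test statistics can be reproduced from the given sums; a Python sketch (n, Σxᵢ, Σxᵢ², and the critical value are from the text; small differences from the text's rounded intermediate values may appear):

```python
import math

n, mu0 = 8, 13.5
sum_x, sum_x2 = 113.6627, 1621.391

x_bar = sum_x / n                                # sample mean, ~14.208

# Part (ii): known variance sigma^2 = 1
t_known = (x_bar - mu0) / (1 / math.sqrt(n))

# Part (iii): estimated variance s'^2
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
t_est = (x_bar - mu0) / math.sqrt(s2 / n)

t_crit = 2.3646                                  # two-tailed 5% value, 7 df (table)
print(round(t_known, 2), round(t_est, 2), abs(t_known) < t_crit, abs(t_est) < t_crit)
```

In both cases |t| stays below the critical value, so H0 is not rejected.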
1.4 A4
P (a(X) ≤ θ ≤ b(X)) = 1 − α.
This means that, in repeated sampling, the interval [a(X), b(X)] will contain the true
value of the parameter θ in 1−α of the cases. The confidence level 100(1−α)% represents
the probability that the confidence interval will contain the true value of the parameter.
In other words, if we performed the same experiment multiple times and calculated the
interval I(X) each time, this interval would contain the true value of θ in 1 − α of the
trials.
The estimator of the parameter θ is defined as \(\hat{\theta} = \bar{X}_1 - \bar{X}_2\), where \(\bar{X}_1\) and \(\bar{X}_2\) are independent and follow normal distributions:
\[
\bar{X}_1 \sim N\!\left(\mu_1, \frac{\sigma_1^2}{n}\right) \quad \text{and} \quad \bar{X}_2 \sim N\!\left(\mu_2, \frac{\sigma_2^2}{m}\right).
\]
\[
\operatorname{Var}(\hat{\theta}) = \operatorname{Var}(\bar{X}_1) + \operatorname{Var}(\bar{X}_2) = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}.
\]
\[
\hat{\theta} \sim N\!\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}\right).
\]
\[
n = 10, \quad m = 20, \quad \sum_{i=1}^{10} X_{1i} = 96.08, \quad \sum_{j=1}^{20} X_{2j} = 237.09.
\]
\[
\bar{X}_1 = \frac{96.08}{10} = 9.608, \qquad \bar{X}_2 = \frac{237.09}{20} = 11.8545.
\]
The variances are given as \(\sigma_1^2 = 2\) and \(\sigma_2^2 = 4\).
Now, to construct a 95% confidence interval for θ = µ1 − µ2 , we use the fact that the
sampling distribution of θ̂ is normal. Since the sample sizes n = 10 and m = 20 are
relatively small, we will use the Student’s t-distribution for the confidence interval.
The standard error of θ̂ is:
\[
SE(\hat{\theta}) = \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}} = \sqrt{\frac{2}{10} + \frac{4}{20}} = \sqrt{0.2 + 0.2} = \sqrt{0.4} \approx 0.6325.
\]
Using the t-distribution with ν = min(n − 1, m − 1) = min(10 − 1, 20 − 1) = 9 degrees
of freedom and a 95% confidence level, we look up the critical value t0.025,9 from the
t-distribution table, which is approximately 2.262.
Thus, the 95% confidence interval for θ = µ1 − µ2 is
\[
\hat{\theta} \pm t_{0.025,9}\, SE(\hat{\theta}) = -2.2465 \pm 2.262 \times 0.6325 \approx [-3.677,\; -0.816].
\]
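The interval can be recomputed from the summary statistics; a Python sketch (the quantile t₀.₀₂₅,₉ = 2.262 is a table value, and the bounds agree with the interval above up to rounding):

```python
import math

n, m = 10, 20
x1_bar = 96.08 / n                       # 9.608
x2_bar = 237.09 / m                      # 11.8545
sigma1_sq, sigma2_sq = 2.0, 4.0
t_crit = 2.262                           # t_{0.025, 9} from a table

theta_hat = x1_bar - x2_bar              # -2.2465
se = math.sqrt(sigma1_sq / n + sigma2_sq / m)   # sqrt(0.4) ~ 0.6325
lo = theta_hat - t_crit * se
hi = theta_hat + t_crit * se
print(round(lo, 4), round(hi, 4))
```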
2 SECTION B
2.1 B5
Expanding the expression for the sum of squared deviations from the sample mean:
\[
\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu + \mu - \bar{X})^2.
\]
We know that:
\[
\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2.
\]
Since \(E[(X_i - \mu)^2] = \sigma^2\) and \(E[(\bar{X} - \mu)^2] = \sigma^2/n\), we get:
\[
E[S^2] = \frac{1}{n-1}\left(n\sigma^2 - n \cdot \frac{\sigma^2}{n}\right) = \sigma^2.
\]
(iii) We are tasked with showing that the interval estimator:
\[
I(X) = \left( \frac{(n-1)S^2}{\chi^2_{n-1,\,1-\alpha/2}},\; \frac{(n-1)S^2}{\chi^2_{n-1,\,\alpha/2}} \right)
\]
is a 100(1 − α)% confidence interval for σ 2 .
To do so, we use the following steps:
Step 1: Chi-Squared Distribution of Sample Variance
The sample variance S 2 follows a scaled chi-squared distribution:
\[
\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},
\]
which means that \((n-1)S^2/\sigma^2\) follows a chi-squared distribution with n − 1 degrees of freedom.
Step 2: Confidence Interval Construction
To construct a confidence interval for σ 2 , we need to use the fact that the chi-squared
distribution is not symmetric, but its cumulative distribution function (CDF) gives us
the probability of the value falling within a particular range.
We want to find critical values corresponding to the desired confidence level. These critical values are denoted as:
- \(\chi^2_{n-1,\,\alpha/2}\), the critical value corresponding to the lower tail of the distribution;
- \(\chi^2_{n-1,\,1-\alpha/2}\), the critical value corresponding to the upper tail of the distribution.
Step 3: Deriving the Confidence Interval
Using the properties of the chi-squared distribution, we can derive the confidence interval
for σ 2 as follows:
\[
P\left( \chi^2_{n-1,\,\alpha/2} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{n-1,\,1-\alpha/2} \right) = 1 - \alpha.
\]
Inverting the inequalities yields:
\[
\frac{(n-1)S^2}{\chi^2_{n-1,\,1-\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{n-1,\,\alpha/2}}.
\]
• n = 10
• \(\sum_{i=1}^{n} x_i = 104.334\)
• \(\sum_{i=1}^{n} x_i^2 = 1132.207\)
Step 1: Calculate the Sample Mean X̄
\[
\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{104.334}{10} = 10.4334
\]
Step 2: Calculate the Sample Variance S 2
Using the formula for the sample variance:
\[
S^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \right)
\]
\[
S^2 = \frac{1}{9} \left( 1132.207 - \frac{(104.334)^2}{10} \right)
\]
First, calculate \((104.334)^2 / 10\):
\[
(104.334)^2 = 10885.5836 \;\Rightarrow\; \frac{10885.5836}{10} = 1088.5584
\]
Now substitute this into the variance formula:
\[
S^2 = \frac{1}{9}\left(1132.207 - 1088.5584\right) = \frac{1}{9}(43.6486) = 4.8499
\]
Step 3: Find the Critical Values from the Chi-Squared Distribution Table
For a 95% confidence interval, the critical values are taken from the chi-squared distribution table with n − 1 = 9 degrees of freedom: \(\chi^2_{9,\,0.025} = 2.700\) and \(\chi^2_{9,\,0.975} = 19.023\). The interval is therefore
\[
\left( \frac{(n-1)S^2}{\chi^2_{9,\,0.975}},\; \frac{(n-1)S^2}{\chi^2_{9,\,0.025}} \right) \approx [2.29,\; 16.17].
\]
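The computation can be sketched in Python with the two χ² quantiles hard-coded from a standard table (values may differ slightly from rounded intermediates in the text):

```python
n = 10
sum_x, sum_x2 = 104.334, 1132.207

s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)     # sample variance S^2
chi2_lo, chi2_hi = 2.700, 19.023             # chi^2_{9, 0.025} and chi^2_{9, 0.975}

lower = (n - 1) * s2 / chi2_hi
upper = (n - 1) * s2 / chi2_lo
print(round(s2, 4), round(lower, 2), round(upper, 2))
```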
2.2 B6
\[
L(p) = \prod_{i=1}^{n} p_X(X_i)
\]
\[
L(p) = \prod_{i=1}^{n} (1-p)^{X_i}\, p
\]
Since the product involves n terms, we can simplify the expression as:
\[
L(p) = p^n \prod_{i=1}^{n} (1-p)^{X_i}
\]
We want to maximize this likelihood function with respect to p. We take the log-likelihood function:
\[
\ell(p) = \log(L(p)) = n \log(p) + \left( \sum_{i=1}^{n} X_i \right) \log(1-p)
\]
Taking the derivative of ℓ(p) with respect to p and setting it to 0, we find:
\[
\frac{n}{p} = \frac{\sum_{i=1}^{n} X_i}{1-p}
\]
Cross-multiplying and simplifying:
\[
n - np = p \sum_{i=1}^{n} X_i
\]
\[
n = p \left( n + \sum_{i=1}^{n} X_i \right)
\]
Solving for p:
\[
p = \frac{n}{n + \sum_{i=1}^{n} X_i}
\]
Since
\[
\sum_{i=1}^{n} X_i = n\bar{X},
\]
this becomes
\[
p = \frac{n}{n + n\bar{X}} = \frac{1}{1 + \bar{X}}
\]
Therefore, the maximum likelihood estimator for p is:
\[
\hat{p} = \frac{1}{1 + \bar{X}}
\]
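As a sanity check, a short simulation sketch: drawing from the model \(P(X = k) = (1-p)^k p\) (number of failures before the first success) and applying p̂ = 1/(1 + X̄) should recover a value close to the true p (the sample size 100000 and the seed are arbitrary choices):

```python
import random

random.seed(0)
p_true = 0.25

def draw_geometric():
    """Number of failures before the first success, P(X = k) = (1 - p)^k * p."""
    k = 0
    while random.random() > p_true:
        k += 1
    return k

sample = [draw_geometric() for _ in range(100_000)]
x_bar = sum(sample) / len(sample)
p_hat = 1 / (1 + x_bar)                  # maximum likelihood estimate
print(round(p_hat, 3))                   # close to 0.25
```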
(iii) Showing the Relation Between \(P(a < p < b)\) and \(P\!\left(\frac{1-b}{b} < \bar{X} < \frac{1-a}{a}\right)\)
From Part (ii), we know that the maximum likelihood estimator for p is given by:
\[
\hat{p} = \frac{1}{1 + \bar{X}}
\]
Now, we want to compute the probability P (a < p < b). Using the relationship between
p and p̂, we have:
\[
P(a < p < b) = P\left( a < \frac{1}{1 + \bar{X}} < b \right)
\]
1. Take reciprocals (which reverses the inequalities):
\[
\frac{1}{b} < 1 + \bar{X} < \frac{1}{a}
\]
2. Subtract 1 from both sides:
\[
\frac{1}{b} - 1 < \bar{X} < \frac{1}{a} - 1
\]
3. Simplify each side:
\[
\frac{1-b}{b} < \bar{X} < \frac{1-a}{a}
\]
Thus, we have:
\[
P(a < p < b) = P\left( \frac{1-b}{b} < \bar{X} < \frac{1-a}{a} \right)
\]
Since X̄ is the sample mean, this relation allows us to calculate the probability for p in
terms of the sample mean X̄.
\[
E(X_1) = \frac{1-p}{p} = \frac{1 - 0.25}{0.25} = 3
\]
\[
\operatorname{Var}(X_1) = \frac{1-p}{p^2} = \frac{1 - 0.25}{(0.25)^2} = 12
\]
\[
E(\bar{X}) = 3, \qquad \operatorname{Var}(\bar{X}) = \frac{12}{100} = 0.12
\]
\[
SD(\bar{X}) = \sqrt{0.12} \approx 0.3464
\]
Now,using the result from part (iii), we can express the probability P (0.22 < p̂ < 0.26)
as:
\[
P(0.22 < \hat{p} < 0.26) = P\left( \frac{1 - 0.26}{0.26} < \bar{X} < \frac{1 - 0.22}{0.22} \right) = P(2.8462 < \bar{X} < 3.5455).
\]
\[
Z = \frac{\bar{X} - E(\bar{X})}{SD(\bar{X})}.
\]
For X̄ = 2.8462:
\[
Z_{\text{lower}} = \frac{2.8462 - 3}{0.3464} \approx -0.444.
\]
For X̄ = 3.5455:
\[
Z_{\text{upper}} = \frac{3.5455 - 3}{0.3464} \approx 1.576.
\]
Using the standard normal distribution table, we find \(\Phi(1.576) \approx 0.9425\) and \(\Phi(-0.444) \approx 0.3285\). Therefore, the approximate probability that 0.22 < p̂ < 0.26 is:
\[
\Phi(1.576) - \Phi(-0.444) \approx 0.614.
\]
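The normal-approximation probability can also be evaluated without tables, using the error function for Φ; a sketch following the steps above:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p, n = 0.25, 100
mean_xbar = (1 - p) / p                          # E(X_bar) = 3
sd_xbar = math.sqrt((1 - p) / p ** 2 / n)        # sqrt(12 / 100) ~ 0.3464

low = (1 - 0.26) / 0.26                          # ~2.8462
high = (1 - 0.22) / 0.22                         # ~3.5455
prob = phi((high - mean_xbar) / sd_xbar) - phi((low - mean_xbar) / sd_xbar)
print(round(prob, 4))                            # ~0.614
```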
2.3 B7
(i)
Consider a clinical study on a new treatment for Rhinovirus with n patients. Each
patient has a probability p of recovering, independently of other patients. Let Xi denote
the indicator variable for the i-th patient’s recovery:
\[
X_i = \begin{cases} 1 & \text{if the } i\text{-th patient recovers} \\ 0 & \text{if the } i\text{-th patient does not recover} \end{cases}
\]
H0 : p = p0 vs H1 : p > p0
We are interested in finding the expression for the sample proportion p̂ and its approxi-
mate distribution under the null hypothesis H0 .
The sample proportion p̂ is simply the average of the Xi ’s, i.e., the proportion of patients
that recover in the sample. The mathematical expression for the sample proportion is:
\[
\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i
\]
This quantity p̂ represents the observed proportion of recovered patients in the sample.
Xi ∼ Bernoulli(p0 )
\[
\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i
\]
E[p̂] = p0
Now, applying the **Central Limit Theorem (CLT)**, which states that the sample
mean of a large number of independent and identically distributed random variables will
approximate a normal distribution, we find that for large n, the sample proportion p̂
approximately follows a normal distribution:
\[
\hat{p} \sim N\!\left( p_0, \frac{p_0(1-p_0)}{n} \right)
\]
Thus, for large n, the distribution of p̂ is approximately normal with mean \(p_0\) and variance \(p_0(1-p_0)/n\).
Conclusion
\[
\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i
\]
Under the null hypothesis H0 : p = p0 , for large n, the approximate distribution of p̂ is:
\[
\hat{p} \sim N\!\left( p_0, \frac{p_0(1-p_0)}{n} \right)
\]
This result allows us to use the normal approximation for hypothesis testing and confi-
dence intervals regarding the recovery proportion p.
Step 1: Initial Expression for the Probability
Under the hypothesis test, we reject H0 if p̂ > γ, where γ is defined by the quadratic
formula. The probability of rejecting H0 is:
\[
P(\hat{p} > \gamma) = P\left( Z > \frac{\gamma - p}{\sqrt{\frac{p(1-p)}{n}}} \right),
\]
\[
\frac{\gamma - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{\gamma - p}{\sqrt{p(1-p)}} \cdot \sqrt{n}.
\]
Using the cumulative distribution function Φ of the standard normal distribution, we can
write the probability as:
\[
P(\hat{p} > \gamma) = 1 - \Phi\!\left( \frac{\gamma - p}{\sqrt{p(1-p)}} \cdot \sqrt{n} \right).
\]
Conclusion
The probability of rejecting H0 under the alternative hypothesis p > p0 is given by:
\[
P(\hat{p} > \gamma) = 1 - \Phi\!\left( \frac{\sqrt{n}\,(\gamma - p)}{\sqrt{p(1-p)}} \right).
\]
This expression provides the desired form where the square root of n appears in the
numerator, as requested.
(iii) Rejection Probability Based on Z2
We want to compute the approximate probability of rejecting H0 when using the proce-
dure based on Z2 with the given parameters.
Given Parameters
Step 1: Compute γ
Using the coefficients of the quadratic defining γ, \(a = 1 + z_\alpha^2/n\), \(b = -(2p_0 + z_\alpha^2/n)\), \(c = p_0^2\), and substituting the known values
- \(z_\alpha = -1.6449\),
- \(p_0 = 0.3\),
- \(n = 200\),
we compute:
\[
a = 1 + \frac{(-1.6449)^2}{200} = 1 + \frac{2.7057}{200} = 1 + 0.01353 = 1.01353,
\]
\[
b = -\left( 2(0.3) + \frac{(-1.6449)^2}{200} \right) = -(0.6 + 0.01353) = -0.61353,
\]
c = (0.3)2 = 0.09.
Thus,
\[
\gamma = \frac{0.61353 + \sqrt{(0.61353)^2 - 4(1.01353)(0.09)}}{2(1.01353)} = \frac{0.61353 + \sqrt{0.01155}}{2.02706} = \frac{0.61353 + 0.10746}{2.02706} \approx 0.3557.
\]
So, γ ≈ 0.3557.
Now that we have γ ≈ 0.3557, we compute the probability of rejecting H0 when the true
proportion is p = 0.35.
The probability of rejecting H0 is:
\[
P(\hat{p} > \gamma) = 1 - \Phi\!\left( \frac{\gamma - p}{\sqrt{p(1-p)}} \cdot \sqrt{n} \right).
\]
\[
\sqrt{p(1-p)} = \sqrt{0.35(1 - 0.35)} = \sqrt{0.35 \times 0.65} = \sqrt{0.2275} \approx 0.4769.
\]
Now calculate the argument of Φ:
\[
\frac{\sqrt{200}\,(0.3557 - 0.35)}{0.4769} = \frac{14.1421 \times 0.0057}{0.4769} \approx 0.169.
\]
From the standard normal distribution, Φ(0.169) ≈ 0.5671, so:
Conclusion
\[
P(\hat{p} > \gamma) \approx 1 - 0.5671 = 0.4329.
\]
This means that there is approximately a **43.3%** chance of rejecting H0 when the true proportion is p = 0.35, p0 = 0.3, and the sample size is n = 200, with a significance level of α = 0.05.
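The whole computation in this part can be scripted; a Python sketch using the error function for Φ (the value 1.6449 enters only through its square, so its sign does not matter):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z_alpha, p0, n, p = 1.6449, 0.3, 200, 0.35

# Coefficients of the quadratic defining gamma
a = 1 + z_alpha ** 2 / n
b = -(2 * p0 + z_alpha ** 2 / n)
c = p0 ** 2
gamma = (-b + math.sqrt(b ** 2 - 4 * a * c)) / (2 * a)

power = 1 - phi(math.sqrt(n) * (gamma - p) / math.sqrt(p * (1 - p)))
print(round(gamma, 4), round(power, 3))
```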
\[
Z_2 = \frac{\hat{p} - p_0}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}},
\]
where:
- p̂ is the sample proportion,
- p0 is the hypothesized proportion under the null hypothesis,
- n is the sample size.
Under the testing procedure, we reject H0 if and only if \(Z_2 > z_\alpha\). To find the equivalent condition in terms of p̂, we start with the expression for \(Z_2^2\):
\[
Z_2^2 = \left( \frac{\hat{p} - p_0}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}} \right)^2 = \frac{(\hat{p} - p_0)^2}{\frac{\hat{p}(1-\hat{p})}{n}}.
\]
\[
\frac{(\hat{p} - p_0)^2}{\frac{\hat{p}(1-\hat{p})}{n}} > z_\alpha^2.
\]
\[
(\hat{p} - p_0)^2 > z_\alpha^2 \cdot \frac{\hat{p}(1-\hat{p})}{n}.
\]
Next, we solve for p̂ by considering the equality case. We set the inequality to equality
to find the critical threshold γ:
\[
(\hat{p} - p_0)^2 = z_\alpha^2 \cdot \frac{\hat{p}(1-\hat{p})}{n}.
\]
Expanding both sides:
\[
\hat{p}^2 - 2p_0\hat{p} + p_0^2 = z_\alpha^2 \cdot \frac{\hat{p}(1-\hat{p})}{n}.
\]
Now, multiply out the right-hand side:
\[
\hat{p}^2 - 2p_0\hat{p} + p_0^2 = \frac{z_\alpha^2}{n}(\hat{p} - \hat{p}^2).
\]
Rearranging all terms involving p̂ to one side:
\[
\hat{p}^2 - 2p_0\hat{p} + p_0^2 = \frac{z_\alpha^2}{n}\hat{p} - \frac{z_\alpha^2}{n}\hat{p}^2.
\]
Now, group the terms involving p̂2 together:
\[
\hat{p}^2 + \frac{z_\alpha^2}{n}\hat{p}^2 - 2p_0\hat{p} + p_0^2 = \frac{z_\alpha^2}{n}\hat{p}.
\]
\[
\left( 1 + \frac{z_\alpha^2}{n} \right)\hat{p}^2 - \left( 2p_0 + \frac{z_\alpha^2}{n} \right)\hat{p} + p_0^2 = 0.
\]
This is a quadratic equation in p̂, which we solve using the quadratic formula:
\[
\hat{p} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a},
\]
where:
- \(a = 1 + \frac{z_\alpha^2}{n}\),
- \(b = -\left(2p_0 + \frac{z_\alpha^2}{n}\right)\),
- \(c = p_0^2\).
Substituting these values into the quadratic formula, we obtain:
\[
\hat{p} = \frac{\left( 2p_0 + \frac{z_\alpha^2}{n} \right) \pm \sqrt{\left( 2p_0 + \frac{z_\alpha^2}{n} \right)^2 - 4\left( 1 + \frac{z_\alpha^2}{n} \right) p_0^2}}{2\left( 1 + \frac{z_\alpha^2}{n} \right)}.
\]
The positive root of this quadratic equation corresponds to the critical value γ, so we
define:
\[
\gamma = \frac{-b + \sqrt{b^2 - 4ac}}{2a}.
\]
Thus, the rejection criterion is \(\hat{p} > \gamma\).
Conclusion
We have shown that under the testing procedure based on Z2 , H0 is rejected if and only
if p̂ > γ, where γ is given by the quadratic formula:
\[
\gamma = \frac{-b + \sqrt{b^2 - 4ac}}{2a},
\]
with the coefficients:
- \(a = 1 + \frac{z_\alpha^2}{n}\),
- \(b = -\left(2p_0 + \frac{z_\alpha^2}{n}\right)\),
- \(c = p_0^2\).
This shows that the test based on Z2 is equivalent to rejecting H0 when p̂ > γ, as required.
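The equivalence Z₂ > z_α ⇔ p̂ > γ can be checked numerically on a grid of p̂ values; a Python sketch with the parameters used earlier (z_α = 1.6449, p₀ = 0.3, n = 200):

```python
import math

z_alpha, p0, n = 1.6449, 0.3, 200

a = 1 + z_alpha ** 2 / n
b = -(2 * p0 + z_alpha ** 2 / n)
c = p0 ** 2
gamma = (-b + math.sqrt(b ** 2 - 4 * a * c)) / (2 * a)   # positive root

def z2(p_hat):
    return (p_hat - p0) / math.sqrt(p_hat * (1 - p_hat) / n)

# Z2 > z_alpha holds exactly when p_hat > gamma
for k in range(1, 200):
    p_hat = k / 200
    assert (z2(p_hat) > z_alpha) == (p_hat > gamma)
print(round(gamma, 4))
```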
3 Additional exercise
3.1 Exercise 1 (3, page 39 of pdf Tests.hypotheses)
In a survey, 232 of 620 randomly selected people say they are in favor of a reform. Can we conclude, at the 5% significance level, that the proportion of people in favor is strictly greater than 22%?
3.2 Solution:
This is a test of conformity to a reference value, where we are testing if the observed
proportion is greater than the reference value of 22%.
Hypotheses:
• Null hypothesis (H0): The proportion of people in favor of the reform is equal
to 22%, i.e., p = 0.22.
• Alternative hypothesis (H1): The proportion of people in favor of the reform is
greater than 22%, i.e., p > 0.22.
This is a one-tailed test where we are testing whether the proportion is greater than the reference value.
Decision Rule:
To make a decision, we will calculate the p-value corresponding to the observed data. If
the p-value is less than the significance level α = 0.05, we will reject the null hypothesis.
R Code:
prop.test(x = 232, n = 620, p = 0.22, alternative = "greater")
This will perform a one-sided test for the proportion, comparing the observed proportion
(232 out of 620) to the reference value of 0.22.
Result:
The output of the R code will give us the p-value. Running the code, we obtain:
[1] 1.48694e-20
Since the p-value is much smaller than the significance level of 0.05, we reject the null
hypothesis.
Conclusion:
Since the p-value is less than 0.05, we reject the null hypothesis at the 5% significance
level. Therefore, we can conclude that the proportion of people in favor of the reform is
strictly greater than 22%.
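As a cross-check, the same one-sided proportion test can be sketched in Python using the normal approximation with continuity correction (which is what R's prop.test applies by default); the p-value lands in the same vanishingly small range:

```python
import math

x, n, p0 = 232, 620, 0.22
p_hat = x / n                                    # observed proportion ~0.374

# One-sided z statistic with continuity correction
z = (p_hat - p0 - 0.5 / n) / math.sqrt(p0 * (1 - p0) / n)
p_value = 0.5 * math.erfc(z / math.sqrt(2))      # P(Z > z)
print(z, p_value)
```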
3.3 Exercise 2 (4, page 40 of pdf Test.hypotheses)
Fifteen strawberries are picked in field A and fifteen in field B, with the following weights:
Field A: 48.73, 43.44, 46.71, 51.62, 47.24, 54.64, 47.00, 48.40, 45.86, 47.70, 46.14, 47.68, 44.73, 51.69, 50.54
Field B: 44.89, 34.31, 42.74, 53.36, 41.98, 41.64, 47.24, 37.86, 45.89, 40.88, 40.85, 38.60, 44.38, 44.52, 38.26
The weight of a strawberry in field A can be modeled by a variable X1 , and the weight
of a strawberry in field B can be modeled by a variable X2 . We assume that X1 and X2
follow normal distributions with equal variances.
Can we conclude, at a 2% significance level, that the mean weight of a strawberry differs
between the fields?
3.4 Solution:
This is a **homogeneity test for two independent samples**, specifically a **two-sample
t-test** assuming equal variances. The goal is to compare the mean strawberry weights
between the two fields.
Hypotheses:
• Null hypothesis (H0): The mean weight of a strawberry is the same in both
fields, i.e., µA = µB.
• Alternative hypothesis (H1): The mean weight of a strawberry differs between
the fields, i.e., µA ≠ µB.
Decision Rule:
We will use a **two-sample t-test** assuming equal variances. If the p-value is smaller
than the significance level α = 0.02, we reject the null hypothesis.
R Code:
The following R code can be used to perform the two-sample t-test assuming equal
variances:
Result:
[1] 0.0003957631
Since the p-value 0.0003957631 is smaller than the significance level α = 0.02, we reject
the null hypothesis.
Conclusion:
We reject the null hypothesis at the 2% significance level. Therefore, we can conclude
that the mean weight of a strawberry differs between the two fields.
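The pooled t statistic can be recomputed by hand as a check; since the t CDF is not in the Python standard library, this sketch just compares |t| with the two-sided 2% critical value for 28 degrees of freedom (≈ 2.467, a table value):

```python
import math

field_a = [48.73, 43.44, 46.71, 51.62, 47.24, 54.64, 47.00, 48.40,
           45.86, 47.70, 46.14, 47.68, 44.73, 51.69, 50.54]
field_b = [44.89, 34.31, 42.74, 53.36, 41.98, 41.64, 47.24, 37.86,
           45.89, 40.88, 40.85, 38.60, 44.38, 44.52, 38.26]

def mean(xs):
    return sum(xs) / len(xs)

def ssd(xs):
    """Sum of squared deviations from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

na, nb = len(field_a), len(field_b)
sp2 = (ssd(field_a) + ssd(field_b)) / (na + nb - 2)          # pooled variance
t = (mean(field_a) - mean(field_b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
print(round(t, 3), abs(t) > 2.467)
```

Since |t| exceeds the critical value, this agrees with the rejection of H0 above.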
3.5 Exercise 3 (7, page 41 of pdf Test.hypotheses)
A salesman supplying gas stations wants to know if there is a relationship between the
purchase of bottled beers and the purchase of bags of chips. To test this, he randomly
selects tickets from a year’s worth of sales data:
- 92 customers bought both beers and chips
- 32 customers bought beers but not chips
- 10 customers bought chips but not beers
- 12 customers bought neither beers nor chips
He wants to make sure he only makes a Type I error 1% of the time when concluding
there is a link between the two purchases.
3.6 Solution:
This is a **Chi-Square test of independence** between two categorical variables: the
purchase of beers and the purchase of chips. The test will determine if there is a statis-
tically significant relationship between the two variables, or if they are independent of
each other.
Hypotheses:
• Null hypothesis (H0): The purchase of beers and the purchase of chips are independent.
• Alternative hypothesis (H1): The purchase of beers and the purchase of chips are not independent.
If the variables are independent, the distribution of purchases of beers and chips would not be related.
Decision Rule:
We will use the **Chi-Square test** for independence. The critical value for the test will
be determined by the significance level α = 0.01. If the p-value is smaller than α, we
reject the null hypothesis.
R Code:
The following R code can be used to perform the Chi-Square test for independence:
The contingency table is:
Beers No Beers
Chips 92 10
No Chips 32 12
# Observed frequencies
observed <- matrix(c(92, 10, 32, 12), nrow = 2, byrow = TRUE)
chisq.test(observed)
Result:
[1] 0.01407829
Since the p-value 0.01407829 is greater than the significance level α = 0.01, we do not
reject the null hypothesis.
Conclusion:
At the 1% significance level, we do not reject the null hypothesis. Therefore, we conclude
that there is no statistically significant relationship between the purchase of beers and
the purchase of chips. In other words, the purchase of beers and chips are independent
at this level of significance.
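The same test can be reproduced in Python with Yates' continuity correction (R's chisq.test applies it by default to 2×2 tables), using the χ² tail identity P(χ²₁ > x) = erfc(√(x/2)):

```python
import math

obs = [[92, 10], [32, 12]]               # rows: chips / no chips, cols: beers / no beers
row_tot = [sum(r) for r in obs]
col_tot = [sum(c) for c in zip(*obs)]
total = sum(row_tot)

stat = 0.0
for i in range(2):
    for j in range(2):
        expected = row_tot[i] * col_tot[j] / total
        stat += (abs(obs[i][j] - expected) - 0.5) ** 2 / expected   # Yates correction

p_value = math.erfc(math.sqrt(stat / 2))  # P(chi-squared with 1 df > stat)
print(round(stat, 3), round(p_value, 5))
```

The p-value matches the R output above (≈ 0.0141), which exceeds α = 0.01.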