蔡贻康 15220202201964 HW1
1 Assignment 1
1.1 Problem 1
Suppose Y is an n × p data matrix, that is, Y contains n observations on p variables y_i, i = 1, ..., p. The unbiased estimators of the variance of y_i and of the covariance between y_i and y_j can be written as:
$$s_{ii} = \frac{1}{n-1}\sum_{k=1}^{n}(y_{ik}-\bar{y}_i)^2 = \frac{1}{n-1}(y_i-\bar{y}_i)'(y_i-\bar{y}_i)$$

$$s_{ij} = \frac{1}{n-1}\sum_{k=1}^{n}(y_{ik}-\bar{y}_i)(y_{jk}-\bar{y}_j) = \frac{1}{n-1}(y_i-\bar{y}_i)'(y_j-\bar{y}_j)$$
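As a quick numerical check of these two formulas, the sketch below computes the sample covariance matrix by hand and compares it with R's built-in `cov()`. The matrix `Y` is made-up illustration data, not part of the assignment.

```r
# Hypothetical 5 x 3 data matrix (rows = observations, columns = variables).
Y <- matrix(c(2, 4, 4, 5, 7,
              1, 3, 2, 6, 8,
              9, 7, 8, 5, 3), nrow = 5)
n <- nrow(Y)

# Center each column by subtracting its mean.
Yc <- sweep(Y, 2, colMeans(Y))

# All s_ii and s_ij at once: (1 / (n - 1)) * Yc' Yc.
S_manual <- crossprod(Yc) / (n - 1)

# A single entry, s_12, written as the sum in the formula.
s12 <- sum((Y[, 1] - mean(Y[, 1])) * (Y[, 2] - mean(Y[, 2]))) / (n - 1)

all.equal(S_manual, cov(Y))    # TRUE
all.equal(s12, cov(Y)[1, 2])   # TRUE
```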
1.2 Problem 2
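a)
$$AA' = \begin{bmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{bmatrix}\begin{bmatrix} 4 & 3 \\ 8 & 6 \\ 8 & -9 \end{bmatrix} = \begin{bmatrix} 144 & -12 \\ -12 & 126 \end{bmatrix}$$
Setting |AA′ − λI| = 0 gives the characteristic equation
$$\lambda^2 - 270\lambda + 18000 = 0,$$
so the eigenvalues are
$$\lambda_1 = 120, \qquad \lambda_2 = 150,$$
with corresponding eigenvectors
$$v_1 = (1,\ 2)', \qquad v_2 = (1,\ -0.5)'.$$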
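The eigenvalues and eigenvectors of AA′ can be checked numerically. This is a minimal sketch using base R's `eigen()`; rescaling the eigenvectors so that the first component equals 1 is only a presentation choice, made here to match the hand calculation.

```r
A <- matrix(c(4, 8,  8,
              3, 6, -9), nrow = 2, byrow = TRUE)
AAt <- A %*% t(A)

e <- eigen(AAt)
e$values   # eigen() sorts in decreasing order: 150, then 120

# Rescale each eigenvector (column) so that its first component is 1.
V <- sweep(e$vectors, 2, e$vectors[1, ], "/")
V          # columns are approximately (1, -0.5)' and (1, 2)'
```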
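b)
$$A'A = \begin{bmatrix} 4 & 3 \\ 8 & 6 \\ 8 & -9 \end{bmatrix}\begin{bmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{bmatrix} = \begin{bmatrix} 25 & 50 & 5 \\ 50 & 100 & 10 \\ 5 & 10 & 145 \end{bmatrix}$$
Setting |A′A − λI| = 0 gives λ(λ² − 270λ + 18000) = 0, so
$$\lambda_1 = 0, \qquad \lambda_2 = 120, \qquad \lambda_3 = 150,$$
with corresponding eigenvectors
$$v_1 = (1,\ -0.5,\ 0)', \qquad v_2 = (1,\ 2,\ -1)', \qquad v_3 = (1,\ 2,\ 5)'.$$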
Then
$$P^{-1} = \begin{bmatrix} 0.2 & 0.8 \\ 0.4 & -0.4 \end{bmatrix}$$
and let
$$\Lambda = \begin{bmatrix} 120 & 0 \\ 0 & 150 \end{bmatrix}$$
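The nonzero eigenvalues of A′A coincide with those of AA′, and an eigenvector u of AA′ maps to an eigenvector A′u of A′A. The sketch below illustrates this for λ = 120; the variable names `u1` and `v` are introduced here only for illustration.

```r
A <- matrix(c(4, 8,  8,
              3, 6, -9), nrow = 2, byrow = TRUE)
AtA <- t(A) %*% A

round(eigen(AtA)$values, 6)   # 150 120 0

u1 <- c(1, 2)                 # eigenvector of AA' for lambda = 120
v  <- drop(t(A) %*% u1)       # (10, 20, -10), proportional to (1, 2, -1)

# v is an eigenvector of A'A with the same eigenvalue.
all.equal(drop(AtA %*% v), 120 * v)   # TRUE
```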
1.3 Problem 3
a) $E(y^{(1)}) = (4,\ 3)'$
1.4 Problem 4
> library(readxl)
> library(lava)
> Data1 <- read_excel("F:/PyCharm/Project/HW_MSA/Data1.xlsx")
> options(digits=3)
> z1 <- Data1['y1'] + Data1['y2'] + Data1['y3']
> z2 <- 2*Data1['y1'] - 3*Data1['y2'] + 2*Data1['y3']
> z3 <- -Data1['y1'] - 2*Data1['y2'] - 3*Data1['y3']
> z <- data.frame(z1, z2, z3)
> names(z)[1] <- 'y1'
> names(z)[2] <- 'y2'
> names(z)[3] <- 'y3'
> apply(z, 2, mean)[1:3]
(a) mean vector $\bar{z} = (38.4,\ 40.8,\ -51.7)'$
> cov(z[,1:3])
sample covariance matrix
$$S_z = \begin{bmatrix} 323.6 & 19.3 & -461 \\ 19.3 & 588.7 & 104 \\ -461.0 & 104.1 & 686 \end{bmatrix}$$
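Because z1, z2, z3 are linear combinations of y1, y2, y3, the same mean vector and covariance matrix can be obtained from the matrix forms z̄ = Aȳ and S_z = A S_y A′, where the rows of A hold the coefficients. The sketch below illustrates this; `Y` is simulated stand-in data (an assumption, since `Data1.xlsx` is not reproduced here), so the numbers will not match the results above.

```r
# Hypothetical stand-in for the three columns y1, y2, y3 of Data1.
set.seed(1)
Y <- matrix(rnorm(30), nrow = 10,
            dimnames = list(NULL, c("y1", "y2", "y3")))

# Rows of A hold the coefficients of z1, z2 and z3.
A <- matrix(c( 1,  1,  1,
               2, -3,  2,
              -1, -2, -3), nrow = 3, byrow = TRUE)

# Each row of Z is A %*% y for one observation.
Z <- Y %*% t(A)

# Mean vector and covariance matrix agree with the matrix formulas.
all.equal(colMeans(Z), drop(A %*% colMeans(Y)))                      # TRUE
all.equal(cov(Z), A %*% cov(Y) %*% t(A), check.attributes = FALSE)   # TRUE
```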
> cor(z[,1:3])
(b) sample correlation matrix
$$R_z = \begin{bmatrix} 1.0000 & 0.0441 & -0.978 \\ 0.0441 & 1.0000 & 0.164 \\ -0.9781 & 0.1637 & 1.000 \end{bmatrix}$$
> det(cov(z[,1:3])); tr(cov(z[,1:3]))
(c) generalized variance = 45996; total variance = 1599
> chol(cov(z)); chol(cor(z))
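(d) Cholesky decomposition:
$$L_{S_z} = \begin{bmatrix} 18 & 1.07 & -25.624 \\ 0 & 24.24 & 5.425 \\ 0 & 0.00 & 0.492 \end{bmatrix}, \qquad L_{R_z} = \begin{bmatrix} 1 & 0.0441 & -0.9781 \\ 0 & 0.9990 & 0.2071 \\ 0 & 0.0000 & 0.0188 \end{bmatrix}$$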
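The quantities in (c) and (d) can also be computed with base R only. The sketch below uses a small made-up positive definite matrix `S` (not the matrix from the data) and `sum(diag())` in place of `lava::tr()`. Note that `chol()` returns the upper triangular factor U with U′U = S, so the determinant equals the squared product of its diagonal.

```r
# Hypothetical 3 x 3 covariance matrix, chosen for easy arithmetic.
S <- matrix(c(4, 2, 0,
              2, 5, 1,
              0, 1, 3), nrow = 3, byrow = TRUE)

det(S)         # generalized variance: 44
sum(diag(S))   # total variance: 12

U <- chol(S)   # upper triangular factor, S = U'U
all.equal(crossprod(U), S)            # TRUE
all.equal(prod(diag(U))^2, det(S))    # TRUE
```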
Figure 1:
$$\mathrm{cov} = \begin{bmatrix}
2.500 & -2.780 & -0.378 & -0.463 & -0.585 & -2.232 & 0.171 \\
-2.780 & 300.516 & 3.909 & -1.387 & 6.763 & 30.791 & 0.624 \\
-0.378 & 3.909 & 1.522 & 0.674 & 2.315 & 2.822 & 0.142 \\
-0.463 & -1.387 & 0.674 & 1.182 & 1.088 & -0.811 & 0.177 \\
-0.585 & 6.763 & 2.315 & 1.088 & 11.364 & 3.127 & 1.044 \\
-2.232 & 30.791 & 2.822 & -0.811 & 3.127 & 30.979 & 0.595 \\
0.171 & 0.624 & 0.142 & 0.177 & 1.044 & 0.595 & 0.479
\end{bmatrix}$$
$$\mathrm{cor} = \begin{bmatrix}
1.000 & -0.1014 & -0.194 & -0.2695 & -0.110 & -0.254 & 0.156 \\
-0.101 & 1.0000 & 0.183 & -0.0736 & 0.116 & 0.319 & 0.052 \\
-0.194 & 0.1828 & 1.000 & 0.5022 & 0.557 & 0.411 & 0.166 \\
-0.270 & -0.0736 & 0.502 & 1.0000 & 0.297 & -0.134 & 0.235 \\
-0.110 & 0.1157 & 0.557 & 0.2969 & 1.000 & 0.167 & 0.448 \\
-0.254 & 0.3191 & 0.411 & -0.1340 & 0.167 & 1.000 & 0.154 \\
0.156 & 0.0520 & 0.166 & 0.2347 & 0.448 & 0.154 & 1.000
\end{bmatrix}$$
> plt_5<- plt[1:5,1:7]
> dist(plt_5,method="euclidean",diag=TRUE,upper=TRUE)
(c)
$$\mathrm{euclidean} = \begin{bmatrix}
0.00 & 10.54 & 9.49 & 13.30 & 9.11 \\
10.54 & 0.00 & 5.74 & 21.77 & 16.85 \\
9.49 & 5.74 & 0.00 & 18.08 & 13.08 \\
13.30 & 21.77 & 18.08 & 0.00 & 7.21 \\
9.11 & 16.85 & 13.08 & 7.21 & 0.00
\end{bmatrix}$$
> dist(plt_5,method="manhattan",diag=TRUE,upper=TRUE)
$$\mathrm{manhattan} = \begin{bmatrix}
0 & 21 & 20 & 27 & 19 \\
21 & 0 & 9 & 36 & 24 \\
20 & 9 & 0 & 33 & 21 \\
27 & 36 & 33 & 0 & 14 \\
19 & 24 & 21 & 14 & 0
\end{bmatrix}$$
The Manhattan distance reflects the distribution of the points better than the Euclidean distance.
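For reference, both distances can be computed by hand for a single pair of observations and compared with `dist()`. The vectors `x` and `y` below are made-up values on seven variables, not rows of `plt`.

```r
# Two hypothetical observations on seven variables.
x <- c(1, 3, 2, 5, 4, 0, 2)
y <- c(2, 1, 2, 8, 4, 1, 0)

euclid <- sqrt(sum((x - y)^2))   # square root of summed squared differences
manhat <- sum(abs(x - y))        # sum of absolute differences

d <- rbind(x, y)
all.equal(euclid, as.numeric(dist(d, method = "euclidean")))   # TRUE
all.equal(manhat, as.numeric(dist(d, method = "manhattan")))   # TRUE

# The Manhattan distance is never smaller than the Euclidean distance.
manhat >= euclid   # TRUE
```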
> det(cov(plt))
(d) The overall variability (generalized variance) = 35308
> chol(cor(plt))
$$\mathrm{CD} = \begin{bmatrix}
1 & -0.101 & -0.194 & -0.270 & -0.1098 & -0.2536 & 0.1561 \\
0 & 0.995 & 0.164 & -0.101 & 0.1051 & 0.2949 & 0.0682 \\
0 & 0.000 & 0.967 & 0.482 & 0.5356 & 0.3240 & 0.1914 \\
0 & 0.000 & 0.000 & 0.827 & 0.0237 & -0.3973 & 0.2313 \\
0 & 0.000 & 0.000 & 0.000 & 0.8303 & -0.0679 & 0.4212 \\
0 & 0.000 & 0.000 & 0.000 & 0.0000 & 0.7624 & 0.3048 \\
0 & 0.000 & 0.000 & 0.000 & 0.0000 & 0.0000 & 0.7813
\end{bmatrix}$$
> ev<- eigen(cor(plt)); L<- ev$values; V<-ev$vectors;
> V; diag(L); t(V)
$$A = \begin{bmatrix}
0.237 & 0.27845 & 0.643 & 0.1727 & 0.5605 & -0.2236 & -0.2415 \\
-0.206 & -0.52661 & 0.224 & 0.7781 & -0.1561 & -0.0057 & -0.0113 \\
-0.551 & -0.00682 & -0.114 & 0.0053 & 0.5734 & -0.1095 & 0.5852 \\
-0.378 & 0.43467 & -0.407 & 0.2905 & -0.0567 & -0.4502 & -0.4609 \\
-0.498 & 0.19977 & 0.197 & -0.0424 & 0.0502 & 0.7450 & -0.3378 \\
-0.325 & -0.56697 & 0.160 & -0.5079 & 0.0802 & -0.3306 & -0.4171 \\
-0.319 & 0.30788 & 0.541 & -0.1431 & -0.5661 & -0.2665 & 0.3139
\end{bmatrix}$$
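$$D = \begin{bmatrix}
2.34 & 0.00 & 0.0 & 0.000 & 0.000 & 0.000 & 0.000 \\
0.00 & 1.39 & 0.0 & 0.000 & 0.000 & 0.000 & 0.000 \\
0.00 & 0.00 & 1.2 & 0.000 & 0.000 & 0.000 & 0.000 \\
0.00 & 0.00 & 0.0 & 0.727 & 0.000 & 0.000 & 0.000 \\
0.00 & 0.00 & 0.0 & 0.000 & 0.653 & 0.000 & 0.000 \\
0.00 & 0.00 & 0.0 & 0.000 & 0.000 & 0.537 & 0.000 \\
0.00 & 0.00 & 0.0 & 0.000 & 0.000 & 0.000 & 0.156
\end{bmatrix}$$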
$$\mathrm{SD} = A D A^{T}$$
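The spectral decomposition can be verified by rebuilding the matrix from its eigenvectors and eigenvalues. The sketch below does this for a small made-up correlation matrix `R` (an assumption for illustration, not `cor(plt)`), and also checks that the eigenvectors are orthonormal and that the eigenvalues sum to the trace.

```r
# Hypothetical 3 x 3 correlation matrix.
R <- matrix(c(1.0, 0.5, 0.2,
              0.5, 1.0, 0.3,
              0.2, 0.3, 1.0), nrow = 3, byrow = TRUE)

ev <- eigen(R)
A <- ev$vectors        # columns are the eigenvectors
D <- diag(ev$values)   # eigenvalues on the diagonal

all.equal(A %*% D %*% t(A), R)      # TRUE: R = A D A'
all.equal(crossprod(A), diag(3))    # TRUE: A'A = I
all.equal(sum(ev$values), 3)        # TRUE: eigenvalues sum to the trace
```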