蔡贻康 15220202201964 HW1


1 ASSIGNMENT 1 1

1 Assignment 1

1.1 Problem 1
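Suppose Y is an n × p data matrix, i.e. Y has p variables (columns) \(y_i\), i = 1, ..., p, each observed n times. The unbiased
estimators of the variance of \(y_i\) and of the covariance between \(y_i\) and \(y_j\) can be written as follows, where \(y_i - \bar{y}_i\) denotes the column \(y_i\) with its sample mean subtracted from every entry: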

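\[
s_{ii} = \frac{1}{n-1} \sum_{k=1}^{n} (y_{ik} - \bar{y}_i)^2 = \frac{1}{n-1} (y_i - \bar{y}_i)'(y_i - \bar{y}_i)
\]

\[
s_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} (y_{ik} - \bar{y}_i)(y_{jk} - \bar{y}_j) = \frac{1}{n-1} (y_i - \bar{y}_i)'(y_j - \bar{y}_j)
\]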

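Then we can write S as:

\[
S = \begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{21} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \cdots & s_{pp}
\end{bmatrix}
= \frac{1}{n-1} \begin{bmatrix}
(y_1 - \bar{y}_1)'(y_1 - \bar{y}_1) & \cdots & (y_1 - \bar{y}_1)'(y_p - \bar{y}_p) \\
\vdots & \ddots & \vdots \\
(y_p - \bar{y}_p)'(y_1 - \bar{y}_1) & \cdots & (y_p - \bar{y}_p)'(y_p - \bar{y}_p)
\end{bmatrix}
= \frac{1}{n-1} (Y - \bar{Y})'(Y - \bar{Y})
\]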

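And we can find that:

\[
s_{ij} = \frac{1}{n-1} \left[ y_i' y_j - n \bar{y}_i \bar{y}_j \right]
\]

\[
S = \frac{1}{n-1} Y'Y - \frac{1}{n-1} \cdot \frac{1}{n} Y'JY = \frac{1}{n-1} Y' \left( I - \frac{1}{n} J \right) Y
\]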
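The identity above can be checked numerically. The following is a minimal sketch on made-up data, assuming J is the n × n matrix of ones (as the formula implies); the names `S_formula` and `S_elem` are illustrative only.

```r
# Toy data: n = 6 observations on p = 3 variables (values are made up)
set.seed(1)
Y <- matrix(rnorm(18), nrow = 6, ncol = 3)
n <- nrow(Y)

I_n <- diag(n)            # n x n identity
J   <- matrix(1, n, n)    # n x n matrix of ones (assumed meaning of J)

# Centering-matrix form: S = Y'(I - J/n)Y / (n - 1)
S_formula <- t(Y) %*% (I_n - J / n) %*% Y / (n - 1)

# Element-wise form: s_ij = (y_i'y_j - n * ybar_i * ybar_j) / (n - 1)
ybar   <- colMeans(Y)
S_elem <- (crossprod(Y) - n * tcrossprod(ybar)) / (n - 1)

# Largest absolute deviation from R's built-in estimator
max_diff_formula <- max(abs(S_formula - cov(Y)))
max_diff_elem    <- max(abs(S_elem - cov(Y)))
```

Both deviations should be at the level of floating-point rounding, since `cov()` uses the same n − 1 divisor.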

1.2 Problem 2
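a)
\[
AA' = \begin{bmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{bmatrix}
\begin{bmatrix} 4 & 3 \\ 8 & 6 \\ 8 & -9 \end{bmatrix}
= \begin{bmatrix} 144 & -12 \\ -12 & 126 \end{bmatrix}
\]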

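Setting \(|AA' - \lambda I| = 0\) gives the characteristic equation:

\[
\lambda^2 - 270\lambda + 18000 = 0
\]

so that:
\[
\lambda_1 = 120, \quad \lambda_2 = 150
\]

Solving \((AA' - \lambda_1 I)v_1 = 0\) and \((AA' - \lambda_2 I)v_2 = 0\) then gives the eigenvectors:

\[
v_1 = [1 \;\; 2]', \quad v_2 = [1 \;\; -0.5]'
\]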

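b)
\[
A'A = \begin{bmatrix} 4 & 3 \\ 8 & 6 \\ 8 & -9 \end{bmatrix}
\begin{bmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{bmatrix}
= \begin{bmatrix} 25 & 50 & 5 \\ 50 & 100 & 10 \\ 5 & 10 & 145 \end{bmatrix}
\]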
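Setting \(|A'A - \lambda I| = 0\) gives the characteristic equation:

\[
-\lambda^3 + 270\lambda^2 - 18000\lambda = 0
\]

\[
\lambda_1 = 0, \quad \lambda_2 = 120, \quad \lambda_3 = 150
\]

Substituting each \(\lambda_i\) into \((A'A - \lambda_i I)v_i = 0\), we get:

\[
v_1 = [1 \;\; -0.5 \;\; 0]', \quad v_2 = [1 \;\; 2 \;\; -1]', \quad v_3 = [1 \;\; 2 \;\; 5]'
\]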
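The eigenvalues and eigenvectors in parts a) and b) can be cross-checked with `eigen()`. This is a sketch only: `eigen()` returns unit-length eigenvectors in decreasing order of eigenvalue, so the helper `rescale` (a hypothetical name) divides each vector by its first entry to match the scaling used above.

```r
A <- matrix(c(4, 8, 8,
              3, 6, -9), nrow = 2, byrow = TRUE)

AAt <- A %*% t(A)   # 2 x 2
AtA <- t(A) %*% A   # 3 x 3

vals_AAt <- eigen(AAt)$values   # decreasing order: 150, 120
vals_AtA <- eigen(AtA)$values   # decreasing order: 150, 120, 0 (up to rounding)

# Rescale each eigenvector so that its first entry equals 1
rescale <- function(v) v / v[1]
V2 <- apply(eigen(AAt)$vectors, 2, rescale)   # columns: (1, -0.5), (1, 2)
V3 <- apply(eigen(AtA)$vectors, 2, rescale)   # columns: (1, 2, 5), (1, 2, -1), (1, -0.5, 0)
```

The nonzero eigenvalues of AA′ and A′A coincide, which is why 120 and 150 appear in both parts.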

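c) First, collect the eigenvectors of AA′ as the columns of P:

\[
P = \begin{bmatrix} 1 & 1 \\ 2 & -0.5 \end{bmatrix}
\]

Then

\[
P^{-1} = \begin{bmatrix} 0.2 & 0.4 \\ 0.8 & -0.4 \end{bmatrix}
\]

And let:

\[
\Lambda = \begin{bmatrix} 120 & 0 \\ 0 & 150 \end{bmatrix}
\]

Then we obtain: \(AA' = P \Lambda P^{-1}\)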
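A numerical check of part c), under the assumption that P holds the eigenvectors of AA′ as columns and that the matrix being diagonalized is the 2 × 2 product AA′ (A itself is 2 × 3, so it has no eigendecomposition). The name `AAt_rebuilt` is illustrative.

```r
P <- matrix(c(1,  1,
              2, -0.5), nrow = 2, byrow = TRUE)   # eigenvectors of AA' as columns
Lambda <- diag(c(120, 150))                        # eigenvalues in the same order

P_inv <- solve(P)                                  # inverse of P

# Rebuild AA' from its eigenvectors and eigenvalues
AAt_rebuilt <- P %*% Lambda %*% P_inv
```

Here `P_inv` should equal [0.2 0.4; 0.8 −0.4] and `AAt_rebuilt` should equal [144 −12; −12 126], the matrix from part a).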



1.3 Problem 3
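a) \( E(y^{(1)}) = (4, 3)' \)

b) \( E(Ay^{(1)}) = E(Y_1 + 2Y_2) = 4 + 2 \times 3 = 10 \)

c) \( \mathrm{Cov}(y^{(1)}) = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \)

d) \( \mathrm{Cov}(Ay^{(1)}) = \mathrm{var}(Y_1 + 2Y_2) = 3 + 4 \times 1 = 7 \)

e) \( E(By^{(2)}) = E((Y_3 - 2Y_4, \; 2Y_3 - Y_4)') = (0, 3)' \)

f)
\[
\mathrm{Cov}(By^{(2)}) = \begin{bmatrix}
9 - 2 \times 2 \times (-2) + 4 \times 4 & 2 \times 9 - 5 \times (-2) + 2 \times 4 \\
2 \times 9 - 5 \times (-2) + 2 \times 4 & 4 \times 9 - 2 \times 2 \times (-2) + 4
\end{bmatrix}
= \begin{bmatrix} 33 & 36 \\ 36 & 48 \end{bmatrix}
\]

g)
\[
\mathrm{Cov}(y^{(1)}, y^{(2)}) = \begin{bmatrix}
\mathrm{Cov}(Y_1, Y_3) & \mathrm{Cov}(Y_1, Y_4) \\
\mathrm{Cov}(Y_2, Y_3) & \mathrm{Cov}(Y_2, Y_4)
\end{bmatrix}
= \begin{bmatrix} 2 & 2 \\ 1 & 0 \end{bmatrix}
\]

h) Since \(Ay^{(1)}\) is a scalar and \(By^{(2)}\) has two components, the result is a 1 × 2 row:
\[
\mathrm{Cov}(Ay^{(1)}, By^{(2)}) = \begin{bmatrix}
\mathrm{Cov}(Y_1 + 2Y_2, \; Y_3 - 2Y_4) & \mathrm{Cov}(Y_1 + 2Y_2, \; 2Y_3 - Y_4)
\end{bmatrix}
= \begin{bmatrix} 2 - 2 \times 2 + 2 \times 1 & 2 \times 2 - 2 + 2 \times 2 \times 1 \end{bmatrix}
= \begin{bmatrix} 0 & 6 \end{bmatrix}
\]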
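The answers to Problem 3 follow from the rules E(Cy) = C E(y), Cov(Cy) = C Σ C′ and Cov(Ay⁽¹⁾, By⁽²⁾) = A Σ₁₂ B′. The sketch below applies them in R. The mean vector `mu` and covariance matrix `Sigma` are assumptions reconstructed from the answers above, not quoted from the problem statement, and the variable names are illustrative.

```r
# Assumed inputs, inferred from the answers above (not from the problem text)
mu    <- c(4, 3, 2, 1)
Sigma <- matrix(c(3, 0,  2,  2,
                  0, 1,  1,  0,
                  2, 1,  9, -2,
                  2, 0, -2,  4), nrow = 4, byrow = TRUE)

A <- matrix(c(1, 2), nrow = 1)                   # A y1 = Y1 + 2*Y2
B <- matrix(c(1, -2,
              2, -1), nrow = 2, byrow = TRUE)    # B y2 = (Y3 - 2*Y4, 2*Y3 - Y4)'

i1 <- 1:2   # positions of y1 = (Y1, Y2)'
i2 <- 3:4   # positions of y2 = (Y3, Y4)'

E_Ay1   <- A %*% mu[i1]                    # part b)
Cov_Ay1 <- A %*% Sigma[i1, i1] %*% t(A)    # part d)
E_By2   <- B %*% mu[i2]                    # part e)
Cov_By2 <- B %*% Sigma[i2, i2] %*% t(B)    # part f)
Cov_AB  <- A %*% Sigma[i1, i2] %*% t(B)    # part h), a 1 x 2 row
```

Under these assumed inputs the results reproduce the hand calculations: 10, 7, (0, 3)′, [33 36; 36 48] and [0 6].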

1.4 Problem 4
> library(readxl)
> library(lava)
> Data1 <- read_excel("F:/PyCharm/Project/HW_MSA/Data1.xlsx")
> options(digits=3)
> # z1, z2, z3: the three linear combinations of y1, y2, y3
> z1 <- Data1['y1']+Data1['y2']+Data1['y3']
> z2 <- 2*Data1['y1'] - 3*Data1['y2'] + 2*Data1['y3']
> z3 <- -Data1['y1'] - 2*Data1['y2'] - 3*Data1['y3']
> z <- data.frame(z1, z2, z3)
> # give the three columns of z distinct names
> names(z)[1] <- 'y1'
> names(z)[2] <- 'y2'
> names(z)[3] <- 'y3'
> apply(z, 2, mean)[1:3]
(a) mean vector \( \bar{z} = (38.4, \; 40.8, \; -51.7)' \)
> cov(z[,1:3])
sample covariance matrix
\[
S_z = \begin{bmatrix}
323.6 & 19.3 & -461 \\
19.3 & 588.7 & 104 \\
-461.0 & 104.1 & 686
\end{bmatrix}
\]
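Because z1, z2, z3 are linear combinations of y1, y2, y3, the mean vector and covariance matrix of z can also be obtained from those of y as C ȳ and C S_y C′, where C holds the coefficients used in the transcript. The sketch below illustrates this on made-up stand-in data (Data1.xlsx is not reproduced here); `Y`, `Z` and the other names are hypothetical.

```r
# Coefficient matrix of the three linear combinations used above
C <- matrix(c( 1,  1,  1,
               2, -3,  2,
              -1, -2, -3), nrow = 3, byrow = TRUE)

# Stand-in data (made up): 10 observations on y1, y2, y3
set.seed(42)
Y <- matrix(rnorm(30, mean = 20, sd = 5), ncol = 3,
            dimnames = list(NULL, c("y1", "y2", "y3")))

Z <- Y %*% t(C)   # each row holds (z1, z2, z3) for one observation

zbar_direct <- colMeans(Z)                     # mean vector computed from Z
zbar_rule   <- as.vector(C %*% colMeans(Y))    # C ybar

Sz_direct <- cov(Z)                            # covariance computed from Z
Sz_rule   <- C %*% cov(Y) %*% t(C)             # C S_y C'
```

The two routes agree up to rounding, so either can be used to obtain the mean vector and covariance matrix of z.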
> cor(z[,1:3])  
(b) sample correlation matrix
\[
R_z = \begin{bmatrix}
1.0000 & 0.0441 & -0.978 \\
0.0441 & 1.0000 & 0.164 \\
-0.9781 & 0.1637 & 1.000
\end{bmatrix}
\]
> det(cov(z[,1:3])); tr(cov(z[,1:3]))
(c) generalized variance = 45996; total variance = 1599
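The generalized variance is the determinant of the covariance matrix and the total variance is its trace. The sketch below computes both on a made-up matrix, using `sum(diag())` as a base-R alternative to the `tr()` call in the transcript, and cross-checks them against the eigenvalues.

```r
# Made-up 3 x 3 covariance matrix, used only for illustration
S <- matrix(c(4, 2, 0,
              2, 3, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)

gen_var   <- det(S)          # generalized variance |S|
total_var <- sum(diag(S))    # total variance tr(S)

# The same quantities from the eigenvalues of S
lam <- eigen(S, symmetric = TRUE)$values
gen_var_eig   <- prod(lam)   # product of eigenvalues equals |S|
total_var_eig <- sum(lam)    # sum of eigenvalues equals tr(S)
```

For this toy matrix the generalized variance is 12 and the total variance is 9.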
> chol(cov(z)); chol(cor(z))
(d) Cholesky decomposition:
\[
L_{S_z} = \begin{bmatrix}
18 & 1.07 & -25.624 \\
0 & 24.24 & 5.425 \\
0 & 0.00 & 0.492
\end{bmatrix},
\quad
L_{R_z} = \begin{bmatrix}
1 & 0.0441 & -0.9781 \\
0 & 0.9990 & 0.2071 \\
0 & 0.0000 & 0.0188
\end{bmatrix}
\]

> ev_Sz<- eigen(cov(z)); L_Sz<- ev_Sz$values; V_Sz<-ev_Sz$vectors;
> V_Sz;diag(L_Sz);t(V_Sz)

spectral decomposition and square root matrix of Sz:
\[
SD(S_z) = \begin{bmatrix}
-0.543 & 0.2039 & 0.814 \\
0.176 & 0.9761 & -0.127 \\
0.821 & -0.0747 & 0.566
\end{bmatrix}
\begin{bmatrix}
1014 & 0 & 0.0000 \\
0 & 585 & 0.0000 \\
0 & 0 & 0.0776
\end{bmatrix}
\begin{bmatrix}
-0.543 & 0.176 & 0.8208 \\
0.204 & 0.976 & -0.0747 \\
0.814 & -0.127 & 0.5663
\end{bmatrix}
\]

\[
SQ(S_z) = \begin{bmatrix}
-17.30 & 4.93 & 0.2268 \\
5.61 & 23.60 & -0.0353 \\
26.13 & -1.81 & 0.1578
\end{bmatrix}
\]
spectral decomposition and square root matrix of Rz:
\[
SD(R_z) = \begin{bmatrix}
-0.6999 & 0.164 & 0.695 \\
0.0865 & 0.986 & -0.146 \\
0.7090 & 0.042 & 0.704
\end{bmatrix}
\begin{bmatrix}
1.99 & 0.00 & 0.000000 \\
0.00 & 1.01 & 0.000000 \\
0.00 & 0.00 & 0.000175
\end{bmatrix}
\begin{bmatrix}
-0.700 & 0.0865 & 0.709 \\
0.164 & 0.9855 & 0.042 \\
0.695 & -0.1459 & 0.704
\end{bmatrix}
\]

\[
SQ(R_z) = \begin{bmatrix}
-1.390 & 0.1667 & 1.21 \times 10^{-4} \\
0.172 & 0.9996 & -2.55 \times 10^{-5} \\
1.408 & 0.0426 & 1.23 \times 10^{-4}
\end{bmatrix}
\]
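For reference, the three factorizations used in part (d) can be sketched on a small made-up matrix. Two points are worth stating as assumptions: `chol()` in R returns the upper-triangular factor U with U′U = S, and the square root shown here is the symmetric one, V D^{1/2} V′, which is one common definition and may differ from the convention used for the SQ matrices above.

```r
# Made-up positive definite matrix, used only for illustration
S <- matrix(c(4, 2, 0,
              2, 3, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)

# Cholesky: upper-triangular U with t(U) %*% U equal to S
U <- chol(S)

# Spectral decomposition: S = V D V'
ev <- eigen(S, symmetric = TRUE)
V  <- ev$vectors
D  <- diag(ev$values)

# Symmetric square root: S_half %*% S_half equals S
S_half <- V %*% diag(sqrt(ev$values)) %*% t(V)
```

Multiplying the factors back together (`t(U) %*% U`, `V %*% D %*% t(V)`, `S_half %*% S_half`) is a quick way to confirm each decomposition.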

> DATA_pollution1 <- read.csv("F:/备份 1/DATA_pollution.txt", sep="")


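Figure 1:

> wind<- DATA_pollution1$Wind
> Solar_radiation<- DATA_pollution1$Solar.radiation
> CO<- DATA_pollution1$CO
> NO<- DATA_pollution1$NO
> NO2<- DATA_pollution1$NO2
> O3<- DATA_pollution1$O3
> HC<- DATA_pollution1$HC
> plt<- cbind(wind,Solar_radiation,CO,NO,NO2,O3,HC)
> storage.mode(plt) <- "double"  # store all seven columns as doubles
> pairs(plt)
(a) Comments: several pairs of variables are visibly correlated, for example CO with NO and with NO2 (sample correlations 0.50 and 0.56 in part (b)), while wind is weakly negatively related to every variable except HC.
(b)
> mean_plt<- mean(plt);
mean = 15.8 (the grand mean over all entries of plt; colMeans(plt) gives the per-variable mean vector)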
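The difference between `mean()` and `colMeans()` on a matrix can be seen on a small made-up example; the matrix `m` below is hypothetical and only stands in for the data matrix.

```r
# Made-up 3 x 2 matrix standing in for the data matrix
m <- cbind(a = c(1, 2, 3),
           b = c(10, 20, 30))

grand_mean  <- mean(m)       # one number: the average of all six entries
mean_vector <- colMeans(m)   # one mean per column (per variable)
```

Here `grand_mean` is 11, while `mean_vector` holds 2 for column `a` and 20 for column `b`.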
> cov(plt); cor(plt)
Rows and columns are in the order wind, Solar_radiation, CO, NO, NO2, O3, HC.

\[
\mathrm{cov} = \begin{bmatrix}
2.500 & -2.780 & -0.378 & -0.463 & -0.585 & -2.232 & 0.171 \\
-2.780 & 300.516 & 3.909 & -1.387 & 6.763 & 30.791 & 0.624 \\
-0.378 & 3.909 & 1.522 & 0.674 & 2.315 & 2.822 & 0.142 \\
-0.463 & -1.387 & 0.674 & 1.182 & 1.088 & -0.811 & 0.177 \\
-0.585 & 6.763 & 2.315 & 1.088 & 11.364 & 3.127 & 1.044 \\
-2.232 & 30.791 & 2.822 & -0.811 & 3.127 & 30.979 & 0.595 \\
0.171 & 0.624 & 0.142 & 0.177 & 1.044 & 0.595 & 0.479
\end{bmatrix}
\]

\[
\mathrm{cor} = \begin{bmatrix}
1.000 & -0.1014 & -0.194 & -0.2695 & -0.110 & -0.254 & 0.156 \\
-0.101 & 1.0000 & 0.183 & -0.0736 & 0.116 & 0.319 & 0.052 \\
-0.194 & 0.1828 & 1.000 & 0.5022 & 0.557 & 0.411 & 0.166 \\
-0.270 & -0.0736 & 0.502 & 1.0000 & 0.297 & -0.134 & 0.235 \\
-0.110 & 0.1157 & 0.557 & 0.2969 & 1.000 & 0.167 & 0.448 \\
-0.254 & 0.3191 & 0.411 & -0.1340 & 0.167 & 1.000 & 0.154 \\
0.156 & 0.0520 & 0.166 & 0.2347 & 0.448 & 0.154 & 1.000
\end{bmatrix}
\]
> plt_5<- plt[1:5,1:7]
> dist(plt_5,method="euclidean",diag=TRUE,upper=TRUE)
(c)
\[
\text{euclidean} = \begin{bmatrix}
0.00 & 10.54 & 9.49 & 13.30 & 9.11 \\
10.54 & 0.00 & 5.74 & 21.77 & 16.85 \\
9.49 & 5.74 & 0.00 & 18.08 & 13.08 \\
13.30 & 21.77 & 18.08 & 0.00 & 7.21 \\
9.11 & 16.85 & 13.08 & 7.21 & 0.00
\end{bmatrix}
\]
> dist(plt_5,method="manhattan",diag=TRUE,upper=TRUE)
 
\[
\text{manhattan} = \begin{bmatrix}
0 & 21 & 20 & 27 & 19 \\
21 & 0 & 9 & 36 & 24 \\
20 & 9 & 0 & 33 & 21 \\
27 & 36 & 33 & 0 & 14 \\
19 & 24 & 21 & 14 & 0
\end{bmatrix}
\]
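Comment: the Manhattan distance describes how the observations are spread better than the Euclidean distance here, since it sums absolute differences and so is less dominated by the variable with the largest differences.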
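The two distance definitions can be compared on a pair of made-up points, where the results are easy to verify by hand. The object `pts` is hypothetical and unrelated to the pollution data.

```r
# Two made-up observations in the plane
pts <- rbind(p1 = c(0, 0),
             p2 = c(3, 4))

# Euclidean: square root of the sum of squared differences, sqrt(3^2 + 4^2)
d_euclid <- as.numeric(dist(pts, method = "euclidean"))

# Manhattan: sum of absolute differences, |3| + |4|
d_manhattan <- as.numeric(dist(pts, method = "manhattan"))
```

For these points the Euclidean distance is 5 and the Manhattan distance is 7.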
> det(cov(plt))
(d) The overall variability (generalized variance, the determinant of the sample covariance matrix) = 35308
> chol(cor(plt))
\[
CD = \begin{bmatrix}
1 & -0.101 & -0.194 & -0.270 & -0.1098 & -0.2536 & 0.1561 \\
0 & 0.995 & 0.164 & -0.101 & 0.1051 & 0.2949 & 0.0682 \\
0 & 0.000 & 0.967 & 0.482 & 0.5356 & 0.3240 & 0.1914 \\
0 & 0.000 & 0.000 & 0.827 & 0.0237 & -0.3973 & 0.2313 \\
0 & 0.000 & 0.000 & 0.000 & 0.8303 & -0.0679 & 0.4212 \\
0 & 0.000 & 0.000 & 0.000 & 0.0000 & 0.7624 & 0.3048 \\
0 & 0.000 & 0.000 & 0.000 & 0.0000 & 0.0000 & 0.7813
\end{bmatrix}
\]
> ev<- eigen(cor(plt)); L<- ev$values; V<-ev$vectors;
> V; diag(L); t(V)
 
\[
A = \begin{bmatrix}
0.237 & 0.27845 & 0.643 & 0.1727 & 0.5605 & -0.2236 & -0.2415 \\
-0.206 & -0.52661 & 0.224 & 0.7781 & -0.1561 & -0.0057 & -0.0113 \\
-0.551 & -0.00682 & -0.114 & 0.0053 & 0.5734 & -0.1095 & 0.5852 \\
-0.378 & 0.43467 & -0.407 & 0.2905 & -0.0567 & -0.4502 & -0.4609 \\
-0.498 & 0.19977 & 0.197 & -0.0424 & 0.0502 & 0.7450 & -0.3378 \\
-0.325 & -0.56697 & 0.160 & -0.5079 & 0.0802 & -0.3306 & -0.4171 \\
-0.319 & 0.30788 & 0.541 & -0.1431 & -0.5661 & -0.2665 & 0.3139
\end{bmatrix}
\]

\[
D = \begin{bmatrix}
2.34 & 0.00 & 0.0 & 0.000 & 0.000 & 0.000 & 0.000 \\
0.00 & 1.39 & 0.0 & 0.000 & 0.000 & 0.000 & 0.000 \\
0.00 & 0.00 & 1.2 & 0.000 & 0.000 & 0.000 & 0.000 \\
0.00 & 0.00 & 0.0 & 0.727 & 0.000 & 0.000 & 0.000 \\
0.00 & 0.00 & 0.0 & 0.000 & 0.653 & 0.000 & 0.000 \\
0.00 & 0.00 & 0.0 & 0.000 & 0.000 & 0.537 & 0.000 \\
0.00 & 0.00 & 0.0 & 0.000 & 0.000 & 0.000 & 0.156
\end{bmatrix}
\]

\[
SD = A D A^{T}
\]
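The spectral decomposition of a correlation matrix can be verified by multiplying the factors back together. The sketch below does this on made-up data (the pollution file is not reproduced here); `X`, `R_rebuilt`, `orth_check` and `eig_sum` are hypothetical names.

```r
# Made-up data: 20 observations on 4 variables
set.seed(7)
X <- matrix(rnorm(80), ncol = 4)
R <- cor(X)

ev <- eigen(R, symmetric = TRUE)
A  <- ev$vectors         # eigenvectors as columns
D  <- diag(ev$values)    # eigenvalues on the diagonal

R_rebuilt  <- A %*% D %*% t(A)   # should reproduce R
orth_check <- t(A) %*% A         # should be the identity matrix
eig_sum    <- sum(ev$values)     # equals the trace of R, i.e. the number of variables
```

The last line gives a quick sanity check for any correlation matrix: its eigenvalues sum to the number of variables, just as the seven diagonal entries of D above sum to about 7.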
