Data Analyzing by Using Z-Score Method and PCA: W.M.Safras Sc/2018/10464
W.M.SAFRAS
Sc/2018/10464
Outline
Z-score
PCA
Z-score
Standard Deviation
In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of
values. A low standard deviation indicates that the values tend to be close to the mean of the set,
while a high standard deviation indicates that the values are spread out over a wide range.
The standard deviation is given by
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
where σ = population standard deviation, µ = population mean, and N = size of the population.
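As a minimal sketch of the population standard deviation formula, the following Python snippet computes σ directly from the definition. The function name `population_std` and the sample values are illustrative assumptions, not part of the original slides.

```python
import math

def population_std(values):
    """Population standard deviation: sqrt(sum((x - mu)^2) / N)."""
    n = len(values)
    mu = sum(values) / n                      # population mean
    squared_dev = sum((x - mu) ** 2 for x in values)
    return math.sqrt(squared_dev / n)

# Illustrative values only (not taken from the source).
marks = [2, 4, 4, 4, 5, 5, 7, 9]
sigma = population_std(marks)
print(sigma)  # 2.0
```

Here the mean is 5 and the squared deviations sum to 32, so σ = sqrt(32 / 8) = 2.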
Explanation through an example
Suppose we are given the following data set:
\( x_{ij} \) = the \(i\)-th student's marks in the \(j\)-th subject; \( i = 1, 2, \dots, N \); \( j = 1, 2, 3 \)
We need to find the Z-scores for each subject.
First, we calculate the mean and standard deviation of each subject:
1. \( \mu_j = \frac{\sum_{i=1}^{N} x_{ij}}{N} \). With this we can calculate the population mean of each subject, which gives three mean values: \( \mu_1 \), \( \mu_2 \) and \( \mu_3 \).
2. \( \sigma_j = \sqrt{\frac{\sum_{i=1}^{N} (x_{ij} - \mu_j)^2}{N}} \). Using this equation we obtain \( \sigma_1 \), \( \sigma_2 \) and \( \sigma_3 \).
• \( z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j} \). From this equation we can find each student's Z-score in each subject.
• We assumed three subjects (\( j = 1, 2, 3 \)), so each student has three Z-score values.
• Finally, we need the average Z-score of each student, which we can use to rank the students:
\[ Z_i = \frac{\sum_{j=1}^{3} z_{ij}}{3} \]
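The procedure above can be sketched in Python as follows. The student labels, the marks, and the helper name `mean_and_std` are hypothetical and chosen only for illustration. The sketch assumes no subject has zero standard deviation.

```python
import math

# Hypothetical marks: one entry per student, three subjects each (j = 1, 2, 3).
marks = {
    "A": [50, 60, 70],
    "B": [60, 60, 50],
    "C": [70, 90, 60],
}

def mean_and_std(column):
    """Population mean and standard deviation of one subject's marks."""
    n = len(column)
    mu = sum(column) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / n)
    return mu, sigma

students = list(marks)
n_subjects = 3

# Steps 1 and 2: (mu_j, sigma_j) for each subject.
stats = [mean_and_std([marks[s][j] for s in students]) for j in range(n_subjects)]

# z_ij = (x_ij - mu_j) / sigma_j for every student and subject.
z_scores = {
    s: [(marks[s][j] - stats[j][0]) / stats[j][1] for j in range(n_subjects)]
    for s in students
}

# Z_i = average of the three subject Z-scores; a higher value ranks first.
average_z = {s: sum(z) / n_subjects for s, z in z_scores.items()}
ranking = sorted(students, key=lambda s: average_z[s], reverse=True)
print(ranking)  # ['C', 'A', 'B']
```

Because every subject's Z-scores sum to zero, the average Z-scores over all students also sum to zero, which is a quick sanity check on the calculation.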
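The Z-scores of each subject have mean 0. For subject 1, writing \(X\) for a student's mark in that subject:
\[ E(z_{i1}) = \frac{E(X - \mu_1)}{\sigma_1} = \frac{E(X) - \mu_1}{\sigma_1} = \frac{\mu_1 - \mu_1}{\sigma_1} = 0 \]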
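The variance of the Z-scores in subject 1 is 1, since \( \operatorname{var}(X) = \sigma_1^2 \):
\[ \operatorname{var}(z_{i1}) = \operatorname{var}\!\left(\frac{X - \mu_1}{\sigma_1}\right) = \frac{1}{\sigma_1^2} \operatorname{var}(X - \mu_1) = \frac{1}{\sigma_1^2} \operatorname{var}(X) = 1 \]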
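The two properties derived here (mean 0 and variance 1) can be checked numerically. The sketch below uses Python's standard `statistics` module; the marks in `x` are hypothetical.

```python
import statistics

# Hypothetical marks for one subject.
x = [35, 48, 52, 61, 74, 80]

mu = statistics.fmean(x)        # population mean
sigma = statistics.pstdev(x)    # population standard deviation

# Standardize every mark.
z = [(value - mu) / sigma for value in x]

z_mean = statistics.fmean(z)
z_var = statistics.pvariance(z)

# Both checks hold up to floating-point rounding.
print(abs(z_mean) < 1e-9, abs(z_var - 1.0) < 1e-9)
```

Note that the population versions (`pstdev`, `pvariance`) are used, matching the division by N in the formulas; the sample versions divide by N - 1 and would not give a variance of exactly 1.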
Principal Component Analysis (PCA)
A covariance matrix is a square matrix giving the covariance between each pair of elements of a given random vector.
The main diagonal of the matrix contains the variances, i.e. the covariance of each element with itself.
Formula
\[ \operatorname{cov}(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y) \]
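For a random vector \( (X_1, \dots, X_n) \), the covariance matrix has the form
\[ \begin{pmatrix} \operatorname{cov}(X_1, X_1) & \cdots & \operatorname{cov}(X_1, X_n) \\ \vdots & \ddots & \vdots \\ \operatorname{cov}(X_n, X_1) & \cdots & \operatorname{cov}(X_n, X_n) \end{pmatrix} \]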
The covariance matrix is symmetric about its main diagonal, since cov(x, y) = cov(y, x).
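A minimal sketch of building a covariance matrix from the formula above, in plain Python. The function name `covariance_matrix` and the data are illustrative assumptions; each row is one observation (for example, one student) and each column one variable (one subject).

```python
def covariance_matrix(rows):
    """Population covariance matrix of the columns of `rows` (divides by N)."""
    n = len(rows)
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [
            sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / n
            for b in range(k)
        ]
        for a in range(k)
    ]

# Hypothetical marks: 3 students (rows) x 3 subjects (columns).
data = [
    [50, 60, 70],
    [60, 60, 50],
    [70, 90, 60],
]
cov = covariance_matrix(data)

# Diagonal entries are the variances; off-diagonal entries mirror each other.
for row in cov:
    print(row)
```

For this data, the variance of the second subject is `cov[1][1]` = 200 and the covariance between the first two subjects is `cov[0][1]` = `cov[1][0]` = 100.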
Explanation about PCA through an example
• Now we select the eigenvector corresponding to the largest eigenvalue of the covariance matrix.
• This eigenvector gives the first principal component.
• Using it, we can compute the new Z-scores.
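\[ Z = x_1 \frac{s_{1i} - \mu_1}{\sigma_1} + x_2 \frac{s_{2i} - \mu_2}{\sigma_2} + x_3 \frac{s_{3i} - \mu_3}{\sigma_3} \]
Here \( x_1, x_2, x_3 \) are the components of the selected eigenvector and \( s_{ji} \) is the \(i\)-th student's mark in subject \(j\). Each fraction is that student's Z-score \( z_j \) in subject \(j\), so the equation becomes
\[ Z = x_1 z_1 + x_2 z_2 + x_3 z_3 \]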
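The PCA-weighted score can be sketched end to end in plain Python. Several things here are assumptions rather than part of the slides: the marks are hypothetical, the covariance matrix is assumed to be computed from the standardized marks, and the leading eigenvector is found by power iteration, a simple stand-in for a library eigen-solver such as NumPy's `numpy.linalg.eigh`. All function names are illustrative.

```python
import math

def standardize(rows):
    """Replace each mark by its Z-score within its subject (column)."""
    n, k = len(rows), len(rows[0])
    mus = [sum(r[j] for r in rows) / n for j in range(k)]
    sigmas = [math.sqrt(sum((r[j] - mus[j]) ** 2 for r in rows) / n) for j in range(k)]
    return [[(r[j] - mus[j]) / sigmas[j] for j in range(k)] for r in rows]

def covariance_matrix(rows):
    """Population covariance matrix of the columns of `rows`."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / n
             for b in range(k)] for a in range(k)]

def mat_vec(matrix, v):
    return [sum(m * c for m, c in zip(row, v)) for row in matrix]

def leading_eigenvector(matrix, iterations=500):
    """Power iteration: repeatedly apply the matrix and renormalize."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

# Hypothetical marks: 5 students (rows) x 3 subjects (columns).
marks = [
    [50, 60, 70],
    [60, 60, 50],
    [70, 90, 60],
    [80, 85, 75],
    [40, 55, 45],
]

z = standardize(marks)
cov = covariance_matrix(z)
eigenvalue, x = leading_eigenvector(cov)

# Z = x1*z1 + x2*z2 + x3*z3 for each student.
pca_scores = [sum(xj * zj for xj, zj in zip(x, row)) for row in z]
ranking = sorted(range(len(marks)), key=lambda i: pca_scores[i], reverse=True)
print(x, pca_scores, ranking)
```

An eigenvector is only defined up to sign, so a solver may return either x or -x, which would reverse the ranking. In this sketch all correlations are positive and the iteration starts from a positive vector, so the components of `x` come out positive and a higher score means better marks.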