Lecture 2 Annotated
Georgios Arvanitidis
Nearest Neighbor
3 October: C10, C12 (Project 1 due before 13:00)
7 Performance evaluation, Bayes, and Naive Bayes
10 October: C11, C13
What is data?
Types of attributes
There is a finite set of brands, thus NAME is discrete, and since the only operators that can be applied to NAME are equal and not equal, NAME is nominal. PROT, FAT and SOD are all continuous, and since zero means absence, they are ratio. TYPE and VIT are both discrete; however, TYPE is not ordinal (Hot is not better than Cold), thus TYPE must be considered nominal. VIT, on the other hand, is ordinal, as 0% is less than 25%, which in turn is less than 100%. An attribute that is ratio is also interval, ordinal and nominal, i.e. we can apply all the operations =, ≠, >, <, +, −, ×, / to a ratio attribute.
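The attribute-type reasoning above (nominal ⊂ ordinal ⊂ interval ⊂ ratio, with more operations allowed at each level) can be summarized in a small sketch; the attribute names follow the cereal example, and the dictionaries are an illustration, not a library API:

```python
# Which comparisons/operations are meaningful at each measurement level.
ALLOWED_OPS = {
    "nominal":  {"==", "!="},
    "ordinal":  {"==", "!=", "<", ">"},
    "interval": {"==", "!=", "<", ">", "+", "-"},
    "ratio":    {"==", "!=", "<", ">", "+", "-", "*", "/"},
}

# Types assigned in the discussion above.
attribute_types = {
    "NAME": "nominal",   # finite set of brands; only ==/!= make sense
    "TYPE": "nominal",   # Hot is not better than Cold
    "VIT":  "ordinal",   # 0% < 25% < 100%
    "PROT": "ratio",     # zero means absence
    "FAT":  "ratio",
    "SOD":  "ratio",
}

def ops_for(attr):
    """Return the set of operations applicable to a given attribute."""
    return ALLOWED_OPS[attribute_types[attr]]
```

For example, `ops_for("PROT")` returns all eight operations, since a ratio attribute is also interval, ordinal and nominal.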
Data quality
Noise
Outliers
Missing values
• Definition
• No value is stored for an attribute
in a data object
• Reasons for missing values
• Information is not collected or
measured
• People decline to give their age
• Attribute is not applicable
• Annual income is not applicable
to children
• Handling missing values
• Eliminate data objects
• Eliminate attributes
• Estimate missing values (e.g. an
average)
• Ignore the missing value in analysis
• Model the missing values
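Two of the handling strategies listed above (eliminating data objects, and estimating missing values with an average) can be sketched with NumPy; the toy matrix with NaN as the missing-value marker is hypothetical:

```python
import numpy as np

# Hypothetical data matrix; NaN marks a missing value.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [5.0, np.nan]])

# Strategy 1: eliminate data objects (rows) containing any missing value.
X_drop = X[~np.isnan(X).any(axis=1)]

# Strategy 2: estimate missing values with the column mean (simple imputation).
col_mean = np.nanmean(X, axis=0)           # mean over observed entries only
X_imp = np.where(np.isnan(X), col_mean, X) # replace NaNs by the column mean
```

Dropping rows loses data; mean imputation keeps all objects but biases the variance downward, which is why the slide also lists "ignore" and "model" as alternatives.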
18 DTU Compute Lecture 2 5 September, 2023
Dataset manipulations
Feature processing
One-out-of-K encoding
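One-out-of-K (one-hot) encoding replaces a nominal attribute with K binary columns, one per category. A minimal NumPy sketch, with hypothetical labels:

```python
import numpy as np

# Hypothetical nominal attribute with K = 2 categories.
labels = np.array(["Hot", "Cold", "Cold", "Hot"])

# Map each label to a category index (categories come out sorted: ['Cold', 'Hot']).
categories, idx = np.unique(labels, return_inverse=True)
K = len(categories)

# One row per data object, a single 1 in the column of its category.
onehot = np.eye(K, dtype=int)[idx]
```

Each row sums to 1, so no artificial ordering is imposed on the categories, which is exactly why this encoding is appropriate for nominal attributes.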
DNS
Image representation
Matrix multiplication
Example:
    [1 2] [5 6]   [19 22]
    [3 4] [7 8] = [43 50]
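The product above can be checked numerically with NumPy's `@` operator:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Entry (i, j) of C is the dot product of row i of A with column j of B:
# [[1*5+2*7, 1*6+2*8], [3*5+4*7, 3*6+4*8]] = [[19, 22], [43, 50]]
C = A @ B
```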
Matrix transpose
Norms
For a vector x ∈ ℝᴹ, the 1-norm is ‖x‖₁ = |x₁| + … + |x_M| and the 2-norm is ‖x‖₂ = (x₁² + … + x_M²)^(1/2). Every norm satisfies the triangle inequality ‖a + b‖ ≤ ‖a‖ + ‖b‖. For a matrix X, the Frobenius norm satisfies ‖X‖²_F = trace(XᵀX).
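The norms above can be checked with a small NumPy sketch; the vectors and matrix are hypothetical:

```python
import numpy as np

x = np.array([3.0, 4.0])
l1 = np.abs(x).sum()          # 1-norm: |3| + |4| = 7
l2 = np.sqrt((x ** 2).sum())  # 2-norm: sqrt(9 + 16) = 5

# Triangle inequality: ||a + b|| <= ||a|| + ||b||
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
assert np.linalg.norm(a + b) <= np.linalg.norm(a) + np.linalg.norm(b)

# Frobenius norm squared equals trace(X^T X)
X = np.array([[1.0, 2.0], [3.0, 4.0]])
fro2 = np.trace(X.T @ X)      # 1 + 4 + 9 + 16 = 30
```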
Vector spaces

Given n linearly independent vectors v₁, …, vₙ in ℝⁿ, we can find any point x ∈ ℝⁿ as a linear combination of these vectors, x = x₁v₁ + … + xₙvₙ with coefficients xᵢ ∈ ℝ.
Subspaces

Any point x in ℝ³ can be written as x = a₁e₁ + a₂e₂ + a₃e₃ ∈ ℝ³ with aᵢ ∈ ℝ, using the standard basis e₁ = (1, 0, 0), e₂ = (0, 1, 0), e₃ = (0, 0, 1). A plane through the origin is a 2-dimensional subspace of ℝ³: any point on this plane can be written as x = b₁v₁ + b₂v₂ ∈ ℝ³ with b₁, b₂ ∈ ℝ.
Basis of a (sub)space

Any point in ℝ³: x = a₁(1, 0, 0)ᵀ + a₂(0, 1, 0)ᵀ + a₃(0, 0, 1)ᵀ. Any point on the plane: y = b₁v₁ + b₂v₂. Each vᵢ is a vector, and by linear combinations we can find any point in a subspace/vector space.
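The basis idea can be made concrete in NumPy; the coefficients and the spanning vectors of the plane are hypothetical:

```python
import numpy as np

# Standard basis of R^3: rows of the identity matrix.
e1, e2, e3 = np.eye(3)

# Any point in R^3 is a linear combination of the basis vectors.
a = np.array([2.0, -1.0, 0.5])          # hypothetical coefficients
x = a[0] * e1 + a[1] * e2 + a[2] * e3   # equals (2, -1, 0.5)

# A 2-dimensional subspace (plane through the origin) spanned by v1, v2.
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 0.0])
y = 3.0 * v1 - 2.0 * v2                 # a point on the plane
```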
Projection

Let V = [v₁, …, v_K] ∈ ℝ^(M×K) collect K orthonormal basis vectors of a subspace. For a point x ∈ ℝᴹ, the coordinates of x in the subspace are b = Vᵀx = [xᵀv₁, …, xᵀv_K]ᵀ ∈ ℝᴷ, with each bᵢ = xᵀvᵢ ∈ ℝ, and the projection of x onto the subspace is x̂ = Vb = b₁v₁ + b₂v₂ + … + b_K v_K.
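The projection formula can be sketched in NumPy; the orthonormal basis V and the point x are hypothetical:

```python
import numpy as np

# Columns of V are K = 2 orthonormal basis vectors of a subspace of R^3.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

x = np.array([2.0, 3.0, 4.0])

b = V.T @ x      # subspace coordinates: b_i = x^T v_i
x_proj = V @ b   # projection: b_1 v_1 + ... + b_K v_K
```

Here the subspace is the xy-plane, so the projection simply zeroes out the third coordinate; for a general orthonormal V the same two lines apply.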
PCA derivation

L(v, λ) = vᵀX̃ᵀX̃v − λ(vᵀv − 1),   ∂L/∂v = 2X̃ᵀX̃v − 2λv = 0, or X̃ᵀX̃v = λv
Setting ∂L/∂v = 0 gives X̃ᵀX̃v = λv, so v is an eigenvector of X̃ᵀX̃. This means that Var[b] = (1/(N−1)) vᵀX̃ᵀX̃v = (1/(N−1)) λvᵀv = λ/(N−1), since vᵀv = 1: the eigenvalue λ measures the variance of the projected coordinates b.
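The eigenvalue relation can be verified numerically; the random data matrix is hypothetical, and `np.linalg.eigh` is used because X̃ᵀX̃ is symmetric:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)          # centered data X~

C = Xc.T @ Xc
lam, vecs = np.linalg.eigh(C)    # solves X~^T X~ v = lambda v (eigenvalues ascending)
v = vecs[:, -1]                  # eigenvector of the largest eigenvalue = 1st PC

b = Xc @ v                       # projected coordinates
# Var[b] = lambda / (N - 1), as derived above
```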
Explained Variance

Recall that from the SVD: X̃ = UΣVᵀ. In the original space, the coordinates of X̃ projected onto the first K components give the reconstruction X' = UΣ₍K₎V₍K₎ᵀ, where UΣ₍K₎ ∈ ℝ^(N×K) and V₍K₎ᵀ ∈ ℝ^(K×M). Useful facts: cov(X̃) ∝ X̃ᵀX̃, trace(AB) = trace(BA), and ‖X̃‖²_F = trace(X̃ᵀX̃) = trace(X̃X̃ᵀ). We can measure how much variance is retained in the reconstruction X':

Explained var. = ‖X'‖²_F / ‖X̃‖²_F
Explained var. = ‖X'‖²_F / ‖X̃‖²_F = (Σᵢ₌₁ᴷ σᵢ²) / (Σᵢ₌₁ᴹ σᵢ²), since

‖X̃‖²_F = trace(UΣVᵀ(UΣVᵀ)ᵀ)
        = trace(UΣVᵀVΣᵀUᵀ)
        = trace(UΣΣᵀUᵀ)
        = trace(UᵀUΣΣᵀ)
        = trace(ΣΣᵀ) = Σᵢ σᵢ²
The i-th principal component accounts for σᵢ² / Σⱼ σⱼ² = σᵢ² / ‖X‖²_F of the variation. We therefore have that the first PCA component accounts for 40.1²/5780.0 = 27.8%, the second 34.2²/5780.0 = 20.2%, and the first three principal components account for (40.1² + 34.2² + 28.1²)/5780.0 = 61.7% of the variation, whereas the fourth principal component accounts for 24.8²/5780.0 = 10.6%. Thus, the first three PCA components account for less than 70% of the variation in the data.
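The worked arithmetic above can be reproduced directly from the given singular values and total ‖X‖²_F = 5780.0:

```python
# Singular values and total squared Frobenius norm from the worked example.
sigmas = [40.1, 34.2, 28.1, 24.8]
total = 5780.0

pc1 = sigmas[0] ** 2 / total                           # first component
first_three = sum(s ** 2 for s in sigmas[:3]) / total  # first three components
pc4 = sigmas[3] ** 2 / total                           # fourth component
```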
C. PCA2 mainly discriminates between old subjects with low measurements of TB, DB, AlA, AsA, TP, AB, and A/G and young subjects with high values of TB, DB, AlA, AsA, TP, AB, and A/G.

D. The principal component directions are not guaranteed to be orthogonal to each other since the data has been standardized.

Solution:
AGE, GDR, TB, DB, AP, AlA, and AsA have negative coefficients of PCA1, whereas TP, AB, and A/G have positive coefficients, resulting in a negative projection onto the first principal component, thus this is correct. From the figure we observe that observations with low values of PCA1 and high values of PCA2 in general have a red dot, meaning they have a liver disease. For PCA2 we observe that AGE has a negative value whereas the remaining entities have positive values, while GDR and AP have small amplitudes. As a result, PCA2 mainly discriminates between young subjects with high measurements of TB, DB, AlA, AsA, TP, AB, and A/G and old subjects with low values of TB, DB, AlA, AsA, TP, AB, and A/G, hence this is correct. The principal component directions are always orthogonal to each other irrespective of the data preprocessing.
PCA as compression

To obtain the actual reconstructions in the original space, add the mean back to the projected data.
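The compression pipeline (center, truncate the SVD, add the mean back) can be sketched as follows; the data matrix and the choice K = 2 are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))

mu = X.mean(axis=0)
Xc = X - mu                                  # center: X~ = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

K = 2
Xk = (U[:, :K] * S[:K]) @ Vt[:K]             # rank-K reconstruction of X~
X_rec = mu + Xk                              # add the mean back: actual reconstruction

# Fraction of variance retained by the rank-K reconstruction.
ev = (S[:K] ** 2).sum() / (S ** 2).sum()
```

Storing U₍K₎, Σ₍K₎, V₍K₎ and the mean needs far fewer numbers than X when K ≪ M, which is the compression; `ev` is the explained-variance ratio from the previous slides.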