Final 2012 W

Machine Learning 1 — WS2012 — Module IN2064 Final Exam · Page 1
Machine Learning 1 — Final Exam
Preliminaries
• How to hand in:

– write your answers on these exam sheets only;
– write your immat but not your name on every page that you hand in.
– hand in all exam sheets.
• The exam is open book. You may use all the material you want, while obeying the following rules:
– you are not allowed to consult or communicate with other people, be it in the room or anywhere
outside, except for with the examinators;
– you must always place the screens of your computers and other used digital devices so that
the examiners can see what you are doing;
– failure to comply with these simple rules may lead to 0 points.
In short, we will be as fair as we can, but expect the same from you in return.
• The exam is limited to 3 × 60 minutes.
• This exam consists of 8 pages, 7 sections, 20 problems. You can earn up to 72 points.
Problem 1 [2 points]. Fill in your immatriculation number on every sheet you hand in. As you will
also write on these sheets, write it on this one, too. Make sure it is easily readable. Make sure you do
not write your name on any sheet you hand in.
imat:
Final Exam · Page 2 Machine Learning 1 — WS2012/2013 — Module IN2064
1 Linear (?) Classification
You are given a one-dimensional data set consisting of six points belonging to two classes. The set for
the first class is {−1, 0, 1} and for the second class {−.1, 0, .1}.
Problem 2 [2 points]. Consider a generative model where you want to model each class with a univariate
Gaussian:
p(x|c1 ) = N (µ1 , σ12 )

p(x|c2 ) = N (µ2 , σ22 )
Calculate the sufficient statistics µ1 , σ1 , µ2 and σ2 from the data set.
Problem 3 [2 points]. Draw the points, the conditional densities and the decision boundaries qualita-
tively into a single diagram.
Problem 4 [2 points]. You are given the data set in the plot in Figure 1. Plot the decision boundaries
of the following three models qualitatively into the plot:
• A generative model with Gaussian class conditionals where each class has its own mean and spherical
covariance matrix,
• A logistic regression model,
• A linear model trained with soft zero-one loss.
Figure 1: Some data points waiting to meet their decision boundaries.
imat:
2 Dice
Problem 5 [10 points]. As an avid player of board games you have a nice collection of non-standard
dice: You have a 3-sided, 5-sided, 7-sided, 11-sided and 20-sided die. The 5 dice are treasured in a
beautiful purple velvet bag. Without looking, a friend of yours randomly chooses a die from the bag and
rolls a 6. What is the probability that the 11-sided die was chosen? What is the probability that the
20-sided die was used for the role? Show your work!
Now your friend rolls (with the same die!) an 18. What is the probability now that the die is 11-sided?
What is the probability that it is 20-sided? Show your work!
imat:
3 Linear Regression
In Figure 2 you see 6 i.i.d noisy samples from an unknown function f : R → R. Every sample (a black dot
in the figure) is a pair (xi , zi = f (xi ) + ε), where ε is some Gaussian noise, with mean zero and
P5 variance
σ . We want to model this unknown function using polynomials of degree 5, that is ẑ(x) = i=0 wi xi .
2
Because we have the 6 samples, we decide to use maximum likelihood estimation in order to find the
parameters w = (w0 , w1 , w2 , w3 , w4 , w5 ). We also know that MLE is prone to overfitting and therefore
decide to use `2 regularisation. Mathematically this is formulated as follows:
6
" 5
#2 5
X X X
∗ i
w = arg min zj − wi (xj ) +λ wi2
w
j=1 i=0 i=1
where λ is the regularisation parameter. Note that w0 is not regularised.
Figure 2: Figure for problems in section 3. 6 i.i.d samples from an unknown function f . The horizontal
axis is the x-axis
Problem 6 [3 points]. Consider the 6 samples in Figure 2 again and draw (qualitatively) the solution
with λ = 0 into Figure 2.
Problem 7 [3 points]. What is the solution as λ → ∞? Again, draw (qualitatively) your solution into
Figure 2.
Problem 8 [3 points]. Now, we also regularise w0 , i.e., the above optimisation problem is slightly
changed to:
6
" 5
#2 5
X X X
∗ i
w = arg min zj − wi (xj ) +λ wi2
w
j=1 i=0 i=0
What happens as λ → ∞? Draw the exact solution again into Figure 2.
imat:
4 K-Means
Suppose we have some data points (the black dots) as in Figure 3. We want to apply K-Means to this
data, using k = 2 and centres initialised to the two circled data points.
Figure 3: Cluster the black dots with K-Means. The two circles denote the initial guess for the two
cluster centres, respectively.
Problem 9 [3 points]. What are the clusters after K-Means converges? Draw your solution into Figure 3,
i.e., show the rough location of the centres (using a star-like shape) and show the clusters by drawing two
bounding boxes around the points assigned to each centre.
Problem 10 [3 points]. Give a different initialisation that will produce the same result. Draw your
choice also into Figure 3, using triangles.
Figure 4: Can this clustering be produced by K-Means?
Problem 11 [4 points]. Dr. Iso Map claims, the clustering in Figure 4 (indicated by the bounding boxes)
results from a K-Means run. Is this possible? Give reason(s) for your answer.
imat:
5 Neural Networks
Zila could not pass on this offer. . . as engineer she always wanted to have a robot, and this was too good
to be true.
On arrival of the package, Zila is irritated to only find the 3-joint arm itself—no manual, no control board,
nothing. Fortunately, she recognises the connector of the robot: a standard ******1 connector. So off
she goes and purchases the corresponding computer interface, and can quickly connect to the robot.
The connector, she knows, allows for several interfaces. One is a simple position control interface: if one
sends a command (q1 , q2 , q3 )—the angles of the three rotary joints—the robot moves to that position. But
how can she use that to move the robot’s hand to a position (x, y, z)? Zila does not know the kinematics
of the robot.
How can she solve the inverse kinematics, i.e., find the function f (x, y, z) = (q1 , q2 , q3 )? Zila smartly
decides to use a neural network to solve the problem. She randomly generates about N = 1000 joint
positions (q1i , q2i , q3i ) and records the corresponding points (xi , y i , z i ) where the hand ends. To represent
the data, Zila takes a neural network
      
Xn X3 n
X 3
X Xn X3
F(x1 , x2 , x3 ) =  w1i σ  vij xj  , w2i σ  vij xj  , w3i σ  vij xj 
i=1 j=0 i=1 j=0 i=1 j=0
where x0 = 1 and σ(x) = tanh(x). To determine the parameters v and w she minimises the error
L≤N
1 X 2
E= f (xi , y i , z i ) − F(xi , y i , z i )
2
i=1
using back-propagation.
Problem 12 [4 points]. What would happen if we would take x0 = 0.5? What would happen if we
would take x0 = 0?
Problem 13 [4 points]. Explain why Zila’s (standard) choice of error function means, that she implicitly
assumes that her measurement error has a Gaussian distribution. Hint: consider what the corresponding
likelihood function is.
Problem 14 [4 points]. At what value of E should Zila stop training?
Problem 15 [6 points]. It works! Zila finds an acceptable number n of hidden units with which she can
get a reasonable representation of f . However, she finds that she does need many, many samples N to
find her optimal value of E.
She has a great idea: rather than using σ(x) = tanh(x), she goes ahead and trains with σ(x) = sin(x).
Why is this a good idea for learning the inverse kinematics of a robot with rotary joints?
1
We don’t want to be commercial here.
imat:
6 Stacking feature maps
Suppose you have found a feature map θ : Rn → Rm that transforms your data into a feature space
in which a SVM with a Gaussian kernel works well. However computing the feature map θ(x) is
computationally expensive and luckily you discover an efficient method to compute the scalar product
K(x, y) = θ(x)T θ(y) in your feature space without having to compute θ(x) and θ(y) explicitly.
Problem 16 [6 points]. Show how you can use the scalar product K(x, y) to efficiently compute the
Gaussian kernel in your feature space, that is
Kg (θ(x), θ(y))
where
|a − b|2

Kg (a, b) = exp −
2σ 2
is the Gaussian kernel.
imat:
7 Constrained optimisation
Given l points x1 , . . . , xl ∈ RN consider the problem of finding the ball, that is the set
BR (v) = {u : |u − v|2 ≤ R2 } ,
with minimum radius R that contains all points x1 , . . . , xl . For example, in two dimensions this is the
smallest circle that contains all given points as shown in the figure below.
R v
Problem 17 [3 points]. Formulate the problem of finding the smallest ball BR (v) that contains all
points x1 , . . . , xl as a constrained optimisation problem. Note that both R and v are not given, but have
to be found during the optimisation.
Problem 18 [2 points]. Formulate the Lagrangian corresponding to this constrained optimisation prob-
lem.
Problem 19 [4 points]. Calculate the Lagange dual function and formulate the Lagrange dual problem.
Do not attempt to solve the Lagrange dual problem.
Problem 20 [2 points]. In the two dimensional example shown in the figure above mark all points xi
which have their corresponding Lagrange multiplier αi = 0. Shortly explain how you chose these points.
imat:

Final 2012 W

Uploaded by

Copyright:

Available Formats

Final 2012 W

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Final 2012 W

Uploaded by

Copyright:

Available Formats

Machine Learning 1 — WS2012 — Module IN2064 Final Exam · Page 1

Machine Learning 1 — Final Exam

• How to hand in:

1 Linear (?) Classification

p(x|c1 ) = N (µ1 , σ12 )

Calculate the sufficient statistics µ1 , σ1 , µ2 and σ2 from the data set.

Figure 1: Some data points waiting to meet their decision boundaries.

where λ is the regularisation parameter. Note that w0 is not regularised.

What happens as λ → ∞? Draw the exact solution again into Figure 2.

Figure 4: Can this clustering be produced by K-Means?

Problem 14 [4 points]. At what value of E should Zila stop training?

6 Stacking feature maps

You might also like