QB AMT305 module 2
1. Suppose that you are asked to perform linear regression to learn the function that outputs
y given the D-dimensional input x. You are given N independent data points, and all D
attributes are linearly independent. Assuming that D is around 100, would you prefer the
closed-form solution or gradient descent to estimate the regressor?
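Hint for question 1 (a sketch, not a model answer): with D around 100 the normal equations reduce to a 100 x 100 linear system, which is cheap to solve directly, and linearly independent attributes (together with N >= D) make X^T X invertible. The helper name `solve_normal_equations` and the toy data below are illustrative assumptions.

```python
def solve_normal_equations(X, y):
    """Closed-form least squares: solve (X^T X) w = X^T y by Gaussian elimination."""
    d = len(X[0])
    # Build the d x d matrix A = X^T X and the vector b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    # Forward elimination with partial pivoting (on the order of d^3 operations).
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            factor = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


# Hypothetical toy data generated from y = 2*x1 - 3*x2 with no noise.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -3.0, -1.0, 1.0]
w = solve_normal_equations(X, y)
print(w)
```

For D = 100 the elimination step costs roughly a million arithmetic operations, so no learning rate or stopping rule has to be tuned; gradient descent tends to become attractive only when D is much larger.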
2. Suppose you have a three-class problem where the class label y ∈ {0, 1, 2} and each training
example X has 3 binary attributes X1, X2, X3 ∈ {0, 1}. How many parameters (probability
distributions) do you need to know to classify an example using the Naive Bayes classifier?
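Hint for question 2 (a sketch under the usual convention of counting independent parameters only): each class prior and each per-class attribute distribution must sum to 1, so one value per distribution is redundant. The function name `naive_bayes_param_count` is an illustrative assumption.

```python
def naive_bayes_param_count(num_classes, values_per_attribute):
    """Count the independent parameters of a discrete Naive Bayes model."""
    # Class priors P(y): the last one is fixed by the others.
    priors = num_classes - 1
    # For every class and attribute, P(X_j = v | y) has (values - 1) free entries.
    likelihoods = num_classes * sum(v - 1 for v in values_per_attribute)
    return priors, likelihoods


priors, likelihoods = naive_bayes_param_count(3, [2, 2, 2])
total = priors + likelihoods
print(priors, likelihoods, total)  # 2 9 11
```

If every table entry is counted instead of only the free ones, the same model has 3 priors and 18 conditional probabilities.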
3. Is principal component analysis a supervised learning problem? Justify your answer.
4. Explain the feature selection and feature extraction methods for dimensionality reduction.
5. Use the ID3 algorithm to construct a decision tree for the data in the following table.
6. (a) Classifier A attains 100% accuracy on the training set and 70% accuracy on the test
set. Classifier B attains 70% accuracy on the training set and 75% accuracy on the test
set. Which one is a better classifier? Justify your answer.
(b) How does the bias-variance trade-off affect machine learning algorithms?
7. Let X = R^2 and let C be the set of all possible rectangles in the two-dimensional plane
that are axis-aligned (not rotated). Show that this concept class is PAC
learnable.
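Hint for question 7 (a sketch of the standard textbook argument, not a proof): the learner returns the tightest axis-aligned rectangle around the positive examples, and m >= (4/ε) ln(4/δ) examples suffice for error at most ε with probability at least 1 - δ. The function names and the sample points are illustrative assumptions.

```python
import math


def sample_bound(epsilon, delta):
    """Number of examples sufficient for the tightest-fit rectangle learner."""
    return math.ceil((4.0 / epsilon) * math.log(4.0 / delta))


def tightest_rectangle(examples):
    """Smallest axis-aligned rectangle containing every positive example.

    `examples` is a list of ((x, y), label) pairs with label 1 for positive.
    Returns (x_min, x_max, y_min, y_max), or None if there are no positives.
    """
    positives = [point for point, label in examples if label == 1]
    if not positives:
        return None
    xs = [p[0] for p in positives]
    ys = [p[1] for p in positives]
    return (min(xs), max(xs), min(ys), max(ys))


def predict(rect, point):
    """Label a point 1 if it falls inside the learned rectangle, else 0."""
    if rect is None:
        return 0
    x_min, x_max, y_min, y_max = rect
    return int(x_min <= point[0] <= x_max and y_min <= point[1] <= y_max)


# Hypothetical labelled sample.
sample = [((1, 1), 1), ((3, 2), 1), ((2, 4), 1), ((6, 6), 0), ((0, 5), 0)]
rect = tightest_rectangle(sample)
print(rect, sample_bound(0.1, 0.05))
```

The learned rectangle always lies inside the true one, so it can only err on positive points; the bound follows from requiring each of four boundary strips of probability mass ε/4 to be hit by at least one example.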
8. The following dataset can be used to train a classifier that determines whether a
given person is likely to own a car or not. There are three features: education level
(primary, secondary, or university); residence (city or country); gender (female,
male). Use the ID3 algorithm to find the best attribute at the root level of the tree.
9. Consider a linear regression problem $y = w_1 x + w_0$, with a training set having m
examples $(x_1, y_1), \ldots, (x_m, y_m)$. Suppose that we wish to minimize the mean 5th
degree error (loss function) given by $\frac{1}{m}\sum_{i=1}^{m}(y_i - w_1 x_i - w_0)^5$.
(a) Calculate the gradient with respect to the parameter $w_1$.
(b) Write down pseudo-code for on-line gradient descent on $w_1$.
(c) Give one reason in favor of on-line gradient descent compared to batch gradient
descent, and one reason in favor of batch over on-line.
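Hint for question 9 (a sketch, with the learning rate, initial values and function names chosen for illustration): differentiating the loss gives a gradient of -(5/m) Σ x_i (y_i - w1 x_i - w0)^4 with respect to w1, and on-line gradient descent applies the single-example version of that gradient after each example.

```python
def loss(xs, ys, w1, w0):
    """Mean fifth-degree error from the question."""
    m = len(xs)
    return sum((y - w1 * x - w0) ** 5 for x, y in zip(xs, ys)) / m


def grad_w1(xs, ys, w1, w0):
    """Batch gradient of the loss with respect to w1."""
    m = len(xs)
    return -5.0 / m * sum(x * (y - w1 * x - w0) ** 4 for x, y in zip(xs, ys))


def online_gd_w1(xs, ys, w1=0.0, w0=0.0, lr=0.001, epochs=1):
    """On-line gradient descent on w1: one update per training example.

    w0 is held fixed because the question only asks about w1. An odd-degree
    loss has no lower bound, so this illustrates the update rule only and
    is not expected to converge.
    """
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            residual = y - w1 * x - w0
            w1 -= lr * (-5.0 * x * residual ** 4)
    return w1


# Hypothetical data, used only to exercise the functions.
xs, ys = [1.0, 2.0], [2.0, 3.0]
print(grad_w1(xs, ys, 0.5, 0.0))   # -92.65625
print(online_gd_w1([1.0], [2.0]))  # one step from w1 = 0
```

The batch version would accumulate `grad_w1` over all m examples before making a single update.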
10. Suppose the dataset had 9700 cancer-free images from 10000 images from cancer
patients. Find precision, recall and accuracy. Is it a good classifier? Justify.
13. Describe the different regression models and formulate any one of the error measurements
used in regression analysis.
14. State the ID3 algorithm used for decision tree classification.
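Hint for the ID3 questions (a sketch of the attribute-selection step only): ID3 picks, at each node, the attribute with the highest information gain, that is, the largest drop in label entropy after splitting. The attribute names and rows below are hypothetical placeholders and do not reproduce any table from this question bank.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute `attr`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Attribute ID3 would place at the current node."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical data: attribute "a" separates the classes, "b" does not.
rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
    {"a": "y", "b": "q"},
]
labels = ["yes", "yes", "no", "no"]
root = best_attribute(rows, labels, ["a", "b"])
print(root, information_gain(rows, labels, "a"), information_gain(rows, labels, "b"))
```

The full algorithm repeats this choice recursively on each subset until a node is pure or no attributes remain.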
15. What is meant by k-fold cross-validation? Given a dataset with 1600 instances,
show how k-fold validation is done with k = 10.
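Hint for question 15 (a sketch; the function name `k_fold_indices` is an illustrative assumption): with 1600 instances and k = 10, each fold holds 160 instances, and every round trains on 1440 instances and tests on the remaining 160.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k folds and return (train, test) index pairs."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_indices(1600, 10)
for round_no, (train, test) in enumerate(splits, start=1):
    print(round_no, len(train), len(test))
```

In practice the instances are usually shuffled before splitting, and the k test scores are averaged to give the final estimate.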
Use the above dataset to calculate the mean square error, mean absolute error and
root mean square error.
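Hint for the error measures (a sketch; the actual and predicted values are hypothetical, since the dataset referred to is not reproduced here): all three measures are computed from the per-example errors e_i = y_i - ŷ_i.

```python
import math


def regression_errors(actual, predicted):
    """Return (MSE, MAE, RMSE) for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n      # mean square error
    mae = sum(abs(e) for e in errors) / n     # mean absolute error
    rmse = math.sqrt(mse)                     # root mean square error
    return mse, mae, rmse


# Hypothetical values for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]
mse, mae, rmse = regression_errors(actual, predicted)
print(mse, mae, rmse)
```

MSE and RMSE penalize large errors more heavily than MAE because the errors are squared before averaging.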
16. Write a short note on logistic regression.
17. Use the following data to construct a linear regression model for an auto
insurance premium as a function of driving experience.
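Hint for question 17 (a sketch; the numbers are hypothetical placeholders because the data table is not reproduced here): for one predictor, the least-squares slope is S_xy / S_xx and the intercept is the mean of y minus the slope times the mean of x.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical placeholder data: years of driving experience vs premium.
experience = [1.0, 2.0, 3.0, 4.0]
premium = [90.0, 80.0, 70.0, 60.0]
slope, intercept = fit_line(experience, premium)
print(slope, intercept)  # -10.0 100.0
```

A negative slope would mean that the premium falls as driving experience increases.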
18. Explain the procedure to reduce the dimensionality of a dataset using principal
component analysis.
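Hint for question 18 (a sketch of the usual steps: center the data, form the covariance matrix, find its leading eigenvector, project onto it). Power iteration is used here only to keep the example dependency-free; it is a swapped-in technique, not the only way to obtain eigenvectors. The function name and data are illustrative assumptions.

```python
import math


def pca_first_component(data, iterations=200):
    """Return the first principal direction and the projected scores.

    Assumes the data are not constant and that the starting vector is not
    orthogonal to the leading eigenvector.
    """
    n, d = len(data), len(data[0])
    # Step 1: center each attribute on its mean.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Step 2: sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Step 3: leading eigenvector by power iteration.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Step 4: project the centered data onto that direction.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores


# Hypothetical 2-D data lying on the line y = x.
direction, scores = pca_first_component([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
print(direction, scores)
```

Keeping only the scores reduces these two-dimensional points to one dimension without losing any variance, because all of it lies along the direction found.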
19. Given the following data on a certain set of patients seen by a doctor, can the doctor conclude
that a person having chills, fever and a mild headache, but no runny nose, has the flu?