Sample MCQ


150511/290501 Data Science

SECTION-A (Marks = 01)

1. What are the three key components of Data Science?

a) Data, Algorithm, and Visualization.

b) Data, Model, and Visualization.

c) Data, Statistics, and Visualization.

d) Data, Machine Learning, and Visualization.

2. Which of the following is a supervised learning technique?

a) Linear Regression.

b) K-Means Clustering.

c) Hierarchical Clustering.

d) Apriori Algorithm.

3. Which of the following is an unsupervised learning technique?

a) Decision Trees.

b) Naive Bayes.

c) K-Means Clustering.

d) Gradient Boosting.

4. What is the goal of feature engineering?

a) To reduce the number of features.

b) To increase the number of features.

c) To transform the features into a more suitable representation for a machine learning algorithm.

d) To remove all features.

5. Which of the following is a data visualization technique?

a) K-Means Clustering.

b) Hierarchical Clustering.

c) Box Plot.

d) Apriori Algorithm.

6. What is the difference between precision and recall?


a) Precision measures the number of true positives, while recall measures the number of false
positives.

b) Precision measures the number of true positives, while recall measures the number of false
negatives.

c) Precision measures the number of false positives, while recall measures the number of true
negatives.

d) Precision measures the number of false negatives, while recall measures the number of true
positives.

7. Which of the following is a measure of the quality of a classification algorithm?

a) Precision.

b) Recall.

c) F1 Score.

d) All of the above.

8. What is the purpose of cross-validation?

a) To ensure that the model is not overfitting the data.

b) To ensure that the model is not underfitting the data.

c) To ensure that the model is not biased towards the training data.

d) To ensure that the model is not biased towards the test data.

9. Which of the following is a classification algorithm?

a) Linear Regression.

b) Random Forest.

c) K-Means Clustering.

d) Principal Component Analysis.

10. What is the purpose of hypothesis testing in data science?

a) To determine if a sample statistic is significantly different from a population parameter.

b) To determine the accuracy of a machine learning algorithm.

c) To determine the correlation between two variables.

d) To determine the causation between two variables.

11. What is the null hypothesis in hypothesis testing?

a) A hypothesis that states there is a significant difference between a sample statistic and a
population parameter.

b) A hypothesis that states there is no significant difference between a sample statistic and a
population parameter.

c) A hypothesis that states there is a perfect correlation between two variables.

d) A hypothesis that states there is a causal relationship between two variables.

12. What is the alternative hypothesis in hypothesis testing?

a) A hypothesis that states there is a significant difference between a sample statistic and a
population parameter.

b) A hypothesis that states there is no significant difference between a sample statistic and a
population parameter.

c) A hypothesis that states there is a perfect correlation between two variables.

d) A hypothesis that states there is a causal relationship between two variables.

13. What is the p-value in hypothesis testing?

a) The probability of observing a sample statistic as extreme or more extreme than the one
observed, assuming the null hypothesis is true.

b) The probability of observing a sample statistic as extreme or more extreme than the one
observed, assuming the alternative hypothesis is true.

c) The probability of observing a perfect correlation between two variables.

d) The probability of observing a causal relationship between two variables.

14. What is the significance level in hypothesis testing?

a) The probability of making a type II error.

b) The probability of making a type I error.

c) The probability of observing a perfect correlation between two variables.

d) The probability of observing a causal relationship between two variables.
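
For questions 13 and 14, the following is a minimal sketch of how a p-value is compared with a significance level. It assumes a two-sided z-test with a known population standard deviation; the sample numbers and the helper name `two_sided_p_value` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(sample_mean, pop_mean, pop_sd, n):
    """Two-sided p-value of a z-test, assuming the population SD is known."""
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # Probability, under the null hypothesis, of a statistic at least this extreme.
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05  # significance level: the accepted probability of a type I error
p = two_sided_p_value(sample_mean=52.0, pop_mean=50.0, pop_sd=10.0, n=100)
reject_null = p < alpha
```

With these made-up numbers the z statistic is 2, so the p-value falls slightly below 0.05 and the null hypothesis would be rejected at that level.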

15. What is a type II error in hypothesis testing?

a) Failing to reject the null hypothesis when it is actually false.

b) Rejecting the null hypothesis when it is actually true.

c) Failing to reject the alternative hypothesis when it is actually false.

d) Rejecting the alternative hypothesis when it is actually true.

16. Which of the following measures of central tendency is most affected by outliers?

a) Mean

b) Median

c) Mode

d) All of the above


17. What is the measure of central tendency that represents the middle value of a dataset?

a) Mean

b) Median

c) Mode

d) Standard deviation

18. The measure of central tendency that represents the most frequently occurring value in a
dataset is known as-

a) Mean

b) Median

c) Mode

d) Range

19. If a dataset has an even number of observations, what value is used as the median?

a) The mean of the two middle values

b) The first middle value

c) The last middle value

d) None of the above

20. What is the measure of central tendency that is used to represent the typical value of a
dataset?

a) Mean

b) Median

c) Mode

d) All of the above
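
The measures of central tendency in questions 16 to 20 can be tried out with Python's standard `statistics` module. The data below is hypothetical; the sketch shows the median of an even-sized dataset, the mode, and how a single outlier moves the mean and the median.

```python
from statistics import mean, median, mode

values = [2, 3, 3, 4, 5, 6]       # even number of observations
with_outlier = values + [100]     # same data plus one extreme value

middle = median(values)           # mean of the two middle values, 3 and 4
most_common = mode(values)        # most frequently occurring value

# Compare how far each measure moves when the outlier is added.
mean_shift = mean(with_outlier) - mean(values)
median_shift = median(with_outlier) - median(values)
```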

21. Which of the following is not a measure of dispersion?

a) Range

b) Standard deviation

c) Variance

d) Median

22. What is the range of a dataset?

a) The difference between the highest and lowest values

b) The average of the highest and lowest values

c) The sum of the highest and lowest values

d) None of the above


23. Which of the following measures of central tendency is affected by the size of the dataset?

a) Mean

b) Median

c) Mode

d) All of the above

24. What is the formula for calculating the variance?

a) (sum of values) / (number of values)

b) (sum of squared deviations) / (number of values)

c) (sum of deviations) / (number of values)

d) (sum of squared values) / (number of values)

25. What is the measure of spread that is equal to the square root of the variance?

a) Variance

b) Standard deviation

c) Range

d) Mean absolute deviation
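
As a worked check for questions 24 and 25, this sketch computes the population variance and standard deviation by hand and compares them with `statistics.pvariance` and `statistics.pstdev`. The data is hypothetical. The sample variance, which divides by n − 1 instead of n, is a different quantity (`statistics.variance`).

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
n = len(values)
mu = sum(values) / n

# Population variance: sum of squared deviations divided by the number of values.
variance = sum((x - mu) ** 2 for x in values) / n
std_dev = sqrt(variance)            # standard deviation is the square root of the variance

matches_library = isclose(variance, pvariance(values)) and isclose(std_dev, pstdev(values))
```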

26. What best defines Data Science?


a. A domain focusing solely on data storage
b. The study of data to extract meaningful insights
c. A field restricted to structured data only
d. The process of securing data

27. What is the impact of Data Science on businesses?


a. No impact at all
b. Decreased profitability
c. Improved decision-making and efficiency
d. Increased manufacturing costs

28. Which of the following is a built-in data type in Python?


a. Stack
b. Queue
c. List
d. Tree

29. What is the result of the expression `"Hello, " + "World!"` in Python?
a. "Hello,World!"
b. "Hello World!"
c. "Hello, World!"
d. Error

30. Which Python module is used to generate pseudo-random numbers?


a. `random`
b. `math`
c. `numpy`
d. `statistics`

31. To generate a random integer between 1 and 10 (inclusive), which `random` module
function should you use?
a. `randint(1, 10)`
b. `random(1, 10)`
c. `randrange(1, 10)`
d. `uniform(1, 10)`
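
The difference between the integer functions in question 31 lies in the endpoints. The sketch below draws many values and collects the distinct results; the fixed seed is only there to make the run repeatable.

```python
import random

random.seed(0)  # fixed seed only so the sketch is repeatable

# randint includes both endpoints; randrange excludes the stop value.
from_randint = {random.randint(1, 10) for _ in range(10_000)}
from_randrange = {random.randrange(1, 10) for _ in range(10_000)}
```

Note that `random.random()` takes no arguments and `random.uniform(1, 10)` returns a float.
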

32. In Python, how can you define a function that accepts an arbitrary number of positional
arguments?
a. Using the `*args` parameter
b. Using the `**kwargs` parameter
c. Using the `&args` parameter
d. Using the `#args` parameter
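
A short sketch of the syntax asked about in question 32. The function name `total` and the keyword `scale` are hypothetical.

```python
def total(*args, **kwargs):
    """Sum any number of positional arguments; keyword arguments land in kwargs."""
    scale = kwargs.get("scale", 1)   # 'scale' is a hypothetical keyword
    return scale * sum(args)

no_args = total()
three_args = total(1, 2, 3)
scaled = total(1, 2, 3, scale=10)
```

Inside the function, `args` is a tuple of the positional arguments and `kwargs` is a dict of the keyword arguments.
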
33. Which of the following is the primary data structure in NumPy for working with arrays?
a. List
b. Tuple
c. ndarray
d. Dictionary

34. How can you create a NumPy array containing integers from 0 to 9?
a. `np.array(0, 9)`
b. `np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])`
c. `np.arange(10)`
d. `np.linspace(0, 9, 10)`
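
The constructors in question 34 can be compared directly. This sketch assumes NumPy is installed and imported as `np`; it builds the same ten values three ways and leaves the element types to be inspected through `dtype`.

```python
import numpy as np

from_list = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
from_arange = np.arange(10)            # stop value 10 is excluded
from_linspace = np.linspace(0, 9, 10)  # 10 evenly spaced points, endpoints included

same_values = np.array_equal(from_arange, from_linspace)
arange_is_integer = np.issubdtype(from_arange.dtype, np.integer)
linspace_is_float = np.issubdtype(from_linspace.dtype, np.floating)
```

The values agree, but `np.linspace` returns floating-point elements, whereas `np.arange(10)` and the list-based call return integers.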

35. What is the data type of the elements in a NumPy array by default?
a. Integer
b. Float
c. String
d. None

36. What will be the result of the operation `np.array([1, 2, 3]) + np.array([4, 5, 6])` in
NumPy?
a. `[5, 7, 9]`
b. `[1, 2, 3, 4, 5, 6]`
c. `[1, 2, 3, 4, 5, 6]`
d. `[4, 5, 6]`
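
For question 36, it helps to contrast `+` on NumPy arrays with `+` on plain Python lists. A minimal sketch, assuming NumPy is imported as `np`:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

elementwise = a + b               # adds matching positions
joined = np.concatenate([a, b])   # joining arrays needs an explicit call
list_plus = [1, 2, 3] + [4, 5, 6] # for plain lists, + concatenates
```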

37. How can you access the element at the second row and third column of a NumPy array
`arr`?
a. `arr(2, 3)`
b. `arr[2, 3]`
c. `arr[1, 2]`
d. `arr[3, 2]`
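
Question 37 turns on zero-based indexing. The array below is a hypothetical 3x4 example used only to make the positions visible.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)  # hypothetical 3x4 array holding 0..11
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Indices start at 0, so the second row and third column is [1, 2].
second_row_third_col = arr[1, 2]
third_row_fourth_col = arr[2, 3]
```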

38. Which of the following is a universal function (ufunc) in NumPy?


a. `for loop`
b. `if statement`
c. `sqrt`
d. `class definition`

39. A sample of N observations is drawn independently from a normal distribution. The
sample variance follows
a. Normal distribution
b. Chi-square with N degrees of freedom
c. Chi-square with N − 1 degrees of freedom
d. t-distribution with N − 1 degrees of freedom

40. Gradient descent algorithm converges to the local minimum.


a. True
b. False

41. State whether the following statement is True or False. [1 mark] Covariance is a better
metric to analyze the association between two numerical variables than correlation.
a. True
b. False

42. Linear Regression is an optimization problem where we attempt to minimize [1 mark]


a. SSR (residual sum-of-squares)
b. SST (total sum-of-squares)
c. SSE (sum-squared error)
d. Slope

43. Which among the following is not a type of cross-validation technique?


a. LOOCV
b. k-fold cross-validation
c. Validation set approach
d. Bias variance trade off
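
To make the resampling schemes named in question 43 concrete, here is a rough sketch of how k-fold splits can be generated in plain Python. The helper `k_fold_indices` is hypothetical and does no shuffling; libraries such as scikit-learn provide fuller implementations.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k roughly equal folds."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
loocv = list(k_fold_indices(n_samples=10, k=10))  # LOOCV is k-fold with k = n
```

Each observation appears in exactly one validation fold, and the model is refit once per fold on the remaining data.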

44. Which among the following is a classification problem?


a. Predicting the average rainfall in a given month.
b. Predicting whether a patient is diagnosed with a disease or not.
c. Predicting the price of a house.
d. None

45. In Simple Linear Regression, if the equation of the regression line is given as `y = 2x +
3`, what is the predicted value of `y` when `x` is 5?
a. 13
b. 12
c. 16
d. None
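
The prediction in question 45 is a direct substitution into the fitted line. A tiny sketch, with the hypothetical helper `predict`:

```python
def predict(x, slope=2, intercept=3):
    """Evaluate the fitted line y = slope * x + intercept."""
    return slope * x + intercept

y_at_5 = predict(5)   # 2 * 5 + 3
```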

46. Which of the following is NOT a common assumption of linear regression?


a. Linearity.
b. Independence of residuals.
c. Multicollinearity.
d. Normality of residuals.

47. In logistic regression, what is the dependent variable typically representing?


a. A continuous numeric value.
b. A binary outcome or category.
c. Time series data.
d. A count of occurrences.

48. What is the purpose of residual analysis in regression?


a. To calculate the coefficient of determination (R-squared).
b. To identify outliers and assess the model's assumptions.
c. To compute the p-value for the model.
d. To estimate the model's parameters.

49. When performing polynomial regression, what does increasing the degree of the
polynomial typically result in?
a. Improved model simplicity.
b. Decreased model flexibility.
c. Overfitting the data.
d. Decreased model complexity.

50. Which type of regression is most suitable for handling multicollinearity among
independent variables?
a. Linear Regression.
b. Polynomial Regression.
c. Lasso Regression.
d. Logistic Regression.

SECTION-B (Marks = 1.5)


1. Which of the following statements sets val to 20 by indexing or slicing aTuple?
aTuple = ("Orange", (10, 20, 30), (5, 15, 25))
val =
a. val = aTuple[1:2][1]
b. val = aTuple[2][1]
c. val = aTuple[1:2](1)
d. val = aTuple[1][1]
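
The expressions in question 1 can be checked step by step. This sketch shows why indexing and slicing behave differently on a nested tuple; the intermediate names are hypothetical.

```python
aTuple = ("Orange", (10, 20, 30), (5, 15, 25))

inner = aTuple[1]        # the nested tuple (10, 20, 30)
val = aTuple[1][1]       # second element of that nested tuple
one_item = aTuple[1:2]   # a slice is still a tuple: ((10, 20, 30),)

try:
    one_item[1]          # only index 0 exists in a one-item tuple
    slice_then_index = "ok"
except IndexError:
    slice_then_index = "IndexError"
```

Calling a tuple with parentheses, as in `aTuple[1:2](1)`, raises a `TypeError` because tuples are not callable.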

2. What will the following code return?


def practice(tup):
    a, b, c = tup
    return a

aTuple = "Yellow", 20, "Red"
practice(aTuple)
a. ("Yellow", 20, "Red")
b. Yellow
c. 20
d. Red

3. Let f(x) = x^3 + 3x^2 − 24x + 7. Select the correct options from the following: [3 marks]
a. x = 2 will give the maximum for f(x).
b. x = 2 will give the minimum for f(x).
c. The stationary points for f(x) are 2 and 4.
d. None
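
One way to work through question 3 is the second-derivative test. The sketch below solves f'(x) = 0 with the quadratic formula and classifies each root; the derivatives are written out by hand in the comments.

```python
from math import sqrt

# f(x) = x^3 + 3x^2 - 24x + 7, so f'(x) = 3x^2 + 6x - 24 and f''(x) = 6x + 6.
a, b, c = 3, 6, -24
disc = b * b - 4 * a * c
stationary = sorted([(-b - sqrt(disc)) / (2 * a), (-b + sqrt(disc)) / (2 * a)])

def second_derivative(x):
    return 6 * x + 6

# Positive second derivative -> local minimum; negative -> local maximum.
kinds = {x: ("minimum" if second_derivative(x) > 0 else "maximum") for x in stationary}
```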

4. Let f(x, y) = −3x^2 − 6xy − 6y^2. The point (0, 0) is a
a. saddle point
b. maxima
c. minima
d. None

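
For question 4, the usual tool is the second partial derivative (Hessian) test. A minimal sketch, with the second partials of f(x, y) = −3x^2 − 6xy − 6y^2 computed by hand; both first partials vanish at (0, 0), so it is a stationary point.

```python
# f(x, y) = -3x^2 - 6xy - 6y^2 has constant second partial derivatives.
f_xx, f_xy, f_yy = -6, -6, -12

det_hessian = f_xx * f_yy - f_xy ** 2

if det_hessian < 0:
    kind = "saddle point"
elif det_hessian > 0:
    kind = "local maximum" if f_xx < 0 else "local minimum"
else:
    kind = "inconclusive"
```
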
5. You have a dataset with three variables: `X`, `Y`, and `Z`. The correlation coefficient
between `X` and `Y` is -0.6, and between `Y` and `Z` is 0.8. What is the correlation
coefficient between `X` and `Z`?
a. -0.48
b. -0.75
c. -1.33
d. None
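
Two pairwise correlations do not fix the third; they only bound it, because the 3x3 correlation matrix must be positive semidefinite. The sketch below computes that admissible range for the values in question 5.

```python
from math import sqrt

r_xy, r_yz = -0.6, 0.8

# Positive semidefiniteness confines r_xz to
# r_xy * r_yz +/- sqrt((1 - r_xy^2) * (1 - r_yz^2)).
centre = r_xy * r_yz
half_width = sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))
lower, upper = centre - half_width, centre + half_width
```

Here the range runs from about −0.96 to 0, so any value in that interval is consistent with the two given coefficients.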

6. Which of the following is an example of a dependent variable in regression analysis?

a) Age

b) Gender

c) Income

d) Sales

7. What is the difference between linear and logistic regression?

a) Linear regression is used for predicting continuous variables, while logistic regression is used for
predicting categorical variables.

b) Linear regression is used for predicting categorical variables, while logistic regression is used for
predicting continuous variables.

c) Linear regression and logistic regression are the same thing.

d) None of the above.

8. Which of the following is a measure of the goodness of fit in linear regression?

a) R-squared

b) Odds ratio

c) P-value

d) Coefficient of determination

9. Which of the following is a measure of the strength and direction of the relationship between
two variables in regression analysis?

a) Coefficient of determination

b) R-squared

c) Correlation coefficient

d) Odds ratio

10. What is the objective of k-means clustering?

a) To maximize the distance between clusters


b) To minimize the distance within clusters

c) To maximize the distance within clusters

d) To minimize the distance between clusters

11. What is the difference between k-means clustering and hierarchical clustering?

a) K-means clustering forms clusters by iteratively assigning data points to the nearest centroid,
while hierarchical clustering forms clusters by iteratively merging or splitting clusters.

b) K-means clustering forms clusters by iteratively merging or splitting clusters, while hierarchical
clustering forms clusters by iteratively assigning data points to the nearest centroid.

c) K-means clustering and hierarchical clustering are the same thing.

d) None of the above.

12. Which of the following is a limitation of k-means clustering?

a) It requires a priori knowledge of the number of clusters.

b) It can only handle numerical data.

c) It is sensitive to the initial positions of the centroids.

d) All of the above.

13. What is gradient descent?

a) An optimization algorithm used to minimize the cost function in machine learning.

b) A clustering algorithm used to group similar data points together.

c) A supervised learning algorithm used to predict categorical outcomes.

d) An unsupervised learning algorithm used to find patterns in data.

14. Which of the following is a hyperparameter of the gradient descent algorithm?

a) The learning rate

b) The number of iterations

c) The cost function

d) The data set

15. Which of the following is a disadvantage of using a low learning rate in gradient descent?

a) The algorithm may converge slowly.

b) The algorithm may overshoot the minimum.

c) The algorithm may get stuck in a local minimum.

d) None of the above.
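
The effect of the learning rate in questions 13 to 15 can be seen on a one-variable example. This is a sketch under stated assumptions: the cost f(x) = (x − 3)^2 and the helper `gradient_descent` are hypothetical and chosen only because the minimum at x = 3 is known in advance.

```python
def gradient_descent(learning_rate, steps=100, start=0.0):
    """Minimize the hypothetical cost f(x) = (x - 3)^2, whose gradient is 2(x - 3)."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * (x - 3)
    return x

well_tuned = gradient_descent(learning_rate=0.1)    # ends very close to 3
too_small = gradient_descent(learning_rate=0.001)   # still far from 3 after 100 steps
too_large = gradient_descent(learning_rate=1.1)     # overshoots and diverges
```

Both the learning rate and the number of steps are set by the user rather than learned from the data.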


SECTION-C (Marks = 2)

1. Consider the following system of linear equations:


x + y + z = −2
x + 2y − z = 1
2x + ay + bz = 2
Find the conditions on a and b for which the above system has no solution.
a. 2a + b − 6 = 0
b. a ≠ 4, 2a + b − 6 = 0
c. a = 4, b = −2
d. 2a + b − 6 ≠ 0
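
Question 1 can be worked by elimination. Subtracting the first equation from the second gives y − 2z = 3, and subtracting twice the first from the third gives (a − 2)y + (b − 2)z = 6. Substituting y = 3 + 2z leaves (2a + b − 6)z = 12 − 3a. The hypothetical helper below classifies the system from that last equation.

```python
def classify(a, b):
    """Classify the system x+y+z=-2, x+2y-z=1, 2x+ay+bz=2 by its solution count.

    Eliminating x and then y leaves (2a + b - 6) * z = 12 - 3a.
    """
    coefficient = 2 * a + b - 6   # also the determinant of the coefficient matrix
    right_side = 12 - 3 * a
    if coefficient != 0:
        return "unique solution"
    return "no solution" if right_side != 0 else "infinitely many solutions"
```

For example, `classify(1, 4)` hits the zero-coefficient case with a nonzero right-hand side, while `classify(4, -2)` makes both sides zero.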

2. Which among the following is true for the determinant of a matrix?


a. The determinant of a diagonal matrix is the product of its diagonal entries.
b. If one row of a matrix is a scalar multiple of another, the determinant is 1.
c. The determinant of a permutation matrix can only be 1.
d. None

3. Consider the following confusion matrix for the classification of Hatchback and SUV:

                          True
                          Hatchback   SUV
Prediction   Hatchback    55          5
             SUV          0           40

Find the accuracy of the model.
a. 0.95
b. 0.55
c. 0.45
d. 0.88

4. Consider the following confusion matrix for the classification of Hatchback and SUV:

                          True
                          Hatchback   SUV
Prediction   Hatchback    55          5
             SUV          0           40

Find the sensitivity of the model.
a. 0.95
b. 0.55
c. 1
d. 0.88
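
The metrics in questions 3 and 4 follow directly from the four cells of the matrix. The question does not say which class counts as positive, so the sketch below computes sensitivity under both readings; the variable names are hypothetical.

```python
# Rows are predictions, columns are true classes, as in the matrix above.
pred_hatch_true_hatch, pred_hatch_true_suv = 55, 5
pred_suv_true_hatch, pred_suv_true_suv = 0, 40

total = (pred_hatch_true_hatch + pred_hatch_true_suv
         + pred_suv_true_hatch + pred_suv_true_suv)
accuracy = (pred_hatch_true_hatch + pred_suv_true_suv) / total

# Sensitivity (recall) = true positives / all actual positives.
sensitivity_hatchback = pred_hatch_true_hatch / (pred_hatch_true_hatch + pred_suv_true_hatch)
sensitivity_suv = pred_suv_true_suv / (pred_suv_true_suv + pred_hatch_true_suv)
```

With Hatchback as the positive class, every actual Hatchback is caught; with SUV as the positive class, 5 of the 45 actual SUVs are missed.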

5. Given a Multiple Linear Regression model with three independent variables: `X1 = 4`, `X2
= 7`, and `X3 = 10`, and the coefficients: `β1 = 2`, `β2 = 3`, and `β3 = 1`, calculate the
predicted value of the dependent variable `Y`.
a. 31
b. 29
c. 23
d. 27
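
A multiple linear regression prediction is the intercept plus the weighted sum of the inputs. Question 5 states no intercept, so `beta_0 = 0` in this sketch is an assumption; any nonzero intercept would simply be added to the result.

```python
x = [4, 7, 10]       # X1, X2, X3
beta = [2, 3, 1]     # β1, β2, β3
beta_0 = 0           # assumption: the question states no intercept

# Y = β0 + β1*X1 + β2*X2 + β3*X3
y_hat = beta_0 + sum(b_i * x_i for b_i, x_i in zip(beta, x))
```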

6. What is the difference between simple linear regression and multiple regression?

a) Simple linear regression involves only one independent variable, while multiple regression
involves more than one independent variable.

b) Simple linear regression involves more than one independent variable, while multiple regression
involves only one independent variable.

c) Simple linear regression and multiple regression are the same thing.

d) None of the above.

7. What is the purpose of multivariate optimization?

a) To find the minimum or maximum of a function with multiple variables.

b) To find the optimal number of clusters in a data set.

c) To find the best model for a machine learning problem.

d) None of the above.

8. Which of the following is a method for finding the minimum of a function with multiple
variables without using derivatives?

a) Gradient descent

b) Newton's method

c) Levenberg-Marquardt algorithm

d) None of the above.

9. What is pruning in decision trees?

a) A technique used to reduce the size of a decision tree by removing unnecessary branches.

b) A technique used to increase the size of a decision tree by adding more branches.

c) A technique used to change the shape of a decision tree.

d) None of the above.

10. What is the goal of a support vector machine (SVM)?

a) To find the optimal decision boundary that separates two classes of data.

b) To cluster data points together based on their similarities.

c) To predict continuous outcomes.

d) None of the above.

1 Answer: b) Data, Model, and Visualization.

36 Answer: a) Simple linear regression involves only one independent variable, while multiple
regression involves more than one independent variable.
