Sample MCQ

150511/290501 Data Science
SECTION-A (Marks = 01)
1. What are the three key components of Data Science?
a) Data, Algorithm, and Visualization.
b) Data, Model, and Visualization.
c) Data, Statistics, and Visualization.
d) Data, Machine Learning, and Visualization.
2. Which of the following is a supervised learning technique?
a) Linear Regression.
b) K-Means Clustering.
c) Hierarchical Clustering.
d) Apriori Algorithm.
3. Which of the following is an unsupervised learning technique?
a) Decision Trees.
b) Naive Bayes.
c) K-Means Clustering.
d) Gradient Boosting.
4. What is the goal of feature engineering?
a) To reduce the number of features.
b) To increase the number of features.
c) To transform the features into a more suitable representation for a machine learning algorithm.
d) To remove all features.
5. Which of the following is a data visualization technique?
a) K-Means Clustering.
b) Hierarchical Clustering.
c) Box Plot.
d) Apriori Algorithm.
6. What is the difference between precision and recall?

a) Precision measures the number of true positives, while recall measures the number of false
positives.
b) Precision measures the number of true positives, while recall measures the number of false
negatives.
c) Precision measures the number of false positives, while recall measures the number of true
negatives.
d) Precision measures the number of false negatives, while recall measures the number of true
positives.
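Worked illustration (not part of the original paper): a minimal sketch of how the two metrics are computed from prediction counts. The counts `tp`, `fp`, and `fn` are hypothetical values chosen only for this example.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model found."""
    return tp / (tp + fn)


# Hypothetical counts, chosen only for illustration.
tp, fp, fn = 8, 2, 4

p = precision(tp, fp)        # 8 / (8 + 2)
r = recall(tp, fn)           # 8 / (8 + 4)
f1 = 2 * p * r / (p + r)     # harmonic mean of precision and recall
```

Precision is penalised by false positives and recall by false negatives, which is why the two can move in opposite directions for the same model.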
7. Which of the following is a measure of the quality of a classification algorithm?
a) Precision.
b) Recall.
c) F1 Score.
d) All of the above.
8. What is the purpose of cross-validation?
a) To ensure that the model is not overfitting the data.
b) To ensure that the model is not underfitting the data.
c) To ensure that the model is not biased towards the training data.
d) To ensure that the model is not biased towards the test data.
9. Which of the following is a classification algorithm?
a) Linear Regression.
b) Random Forest.
c) K-Means Clustering.
d) Principal Component Analysis.
10. What is the purpose of hypothesis testing in data science?
a) To determine if a sample statistic is significantly different from a population parameter.
b) To determine the accuracy of a machine learning algorithm.
c) To determine the correlation between two variables.
d) To determine the causation between two variables.
11. What is the null hypothesis in hypothesis testing?
a) A hypothesis that states there is a significant difference between a sample statistic and a
population parameter.
b) A hypothesis that states there is no significant difference between a sample statistic and a
population parameter.
c) A hypothesis that states there is a perfect correlation between two variables.
d) A hypothesis that states there is a causal relationship between two variables.
12. What is the alternative hypothesis in hypothesis testing?
a) A hypothesis that states there is a significant difference between a sample statistic and a
population parameter.
b) A hypothesis that states there is no significant difference between a sample statistic and a
population parameter.
c) A hypothesis that states there is a perfect correlation between two variables.
d) A hypothesis that states there is a causal relationship between two variables.
13. What is the p-value in hypothesis testing?
a) The probability of observing a sample statistic as extreme or more extreme than the one
observed, assuming the null hypothesis is true.
b) The probability of observing a sample statistic as extreme or more extreme than the one
observed, assuming the alternative hypothesis is true.
c) The probability of observing a perfect correlation between two variables.
d) The probability of observing a causal relationship between two variables.
14. What is the significance level in hypothesis testing?
a) The probability of making a type II error.
b) The probability of making a type I error.
c) The probability of observing a perfect correlation between two variables.
d) The probability of observing a causal relationship between two variables.
15. What is a type II error in hypothesis testing?
a) Failing to reject the null hypothesis when it is actually false.
b) Rejecting the null hypothesis when it is actually true.
c) Failing to reject the alternative hypothesis when it is actually false.
d) Rejecting the alternative hypothesis when it is actually true.
16. Which of the following measures of central tendency is most affected by outliers?
a) Mean
b) Median
c) Mode
d) All of the above
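Worked illustration (not part of the original paper): a small sketch comparing how the mean and the median react to one extreme value. The sample values are hypothetical.

```python
import statistics

values = [10, 12, 13, 15, 16]      # hypothetical sample
with_outlier = values + [200]      # same sample plus one extreme value

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

# How far each statistic moved after adding the outlier.
mean_shift = mean_after - mean_before
median_shift = median_after - median_before
```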

17. What is the measure of central tendency that represents the middle value of a dataset?
a) Mean
b) Median
c) Mode
d) Standard deviation
18. The measure of central tendency that represents the most frequently occurring value in a
dataset is known as-
a) Mean
b) Median
c) Mode
d) Range
19. If a dataset has an even number of observations, what value is used as the median?
a) The mean of the two middle values
b) The first middle value
c) The last middle value
d) None of the above
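Worked illustration (not part of the original paper): a sketch of the median calculation for an even number of observations, checked against the standard library. The data values are hypothetical.

```python
import statistics

data = [7, 3, 9, 5]                 # hypothetical sample with an even count
ordered = sorted(data)              # [3, 5, 7, 9]
mid = len(ordered) // 2

# With an even count there is no single middle value,
# so the two central values are averaged.
manual_median = (ordered[mid - 1] + ordered[mid]) / 2
library_median = statistics.median(data)
```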
20. What is the measure of central tendency that is used to represent the typical value of a
dataset?
a) Mean
b) Median
c) Mode
d) All of the above
21. Which of the following is not a measure of dispersion?
a) Range
b) Standard deviation
c) Variance
d) Median
22. What is the range of a dataset?
a) The difference between the highest and lowest values
b) The average of the highest and lowest values
c) The sum of the highest and lowest values
d) None of the above

23. Which of the following measures of central tendency is affected by the size of the dataset?
a) Mean
b) Median
c) Mode
d) All of the above
24. What is the formula for calculating the variance?
a) (sum of values) / (number of values)
b) (sum of squared deviations) / (number of values)
c) (sum of deviations) / (number of values)
d) (sum of squared values) / (number of values)
25. What is the measure of spread that is equal to the square root of the variance?
a) Variance
b) Standard deviation
c) Range
d) Mean absolute deviation
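Worked illustration (not part of the original paper): a sketch of the population variance and standard deviation computed step by step. The data values are hypothetical. Note that the sample variance divides by (number of values − 1) instead; `statistics.variance` implements that version.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical sample

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance: mean of the squared deviations from the mean.
population_variance = sum(squared_deviations) / len(data)

# Standard deviation: square root of the variance.
population_std = math.sqrt(population_variance)
```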
26. What best defines Data Science?

a. A domain focusing solely on data storage
b. The study of data to extract meaningful insights
c. A field restricted to structured data only
d. The process of securing data
27. What is the impact of Data Science on businesses?

a. No impact at all
b. Decreased profitability
c. Improved decision-making and efficiency
d. Increased manufacturing costs
28. Which of the following is a built-in data type in Python?

a. Stack
b. Queue
c. List
d. Tree
29. What is the result of the expression `"Hello, " + "World!"` in Python?
a. "Hello,World!"
b. "Hello World!"
c. "Hello, World!"
d. Error
30. Which Python module is used to generate pseudo-random numbers?

a. `random`
b. `math`
c. `numpy`
d. `statistics`
31. To generate a random integer between 1 and 10 (inclusive), which `random` module
function should you use?
a. `randint(1, 10)`
b. `random(1, 10)`
c. `randrange(1, 10)`
d. `uniform(1, 10)`
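Worked illustration (not part of the original paper): a sketch contrasting the ranges of `random.randint` and `random.randrange`. The seed and the number of draws are arbitrary choices made for repeatability.

```python
import random

random.seed(0)  # fixed seed only so that the run is repeatable

# randint includes both endpoints.
draws_randint = [random.randint(1, 10) for _ in range(1000)]

# randrange excludes the stop value, like range().
draws_randrange = [random.randrange(1, 10) for _ in range(1000)]

# uniform returns a float, not an integer.
sample_uniform = random.uniform(1, 10)
```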
32. In Python, how can you define a function that accepts an arbitrary number of positional
arguments?
a. Using the `*args` parameter
b. Using the `**kwargs` parameter
c. Using the `&args` parameter
d. Using the `#args` parameter
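Worked illustration (not part of the original paper): a sketch of a function that collects positional arguments, with a keyword-argument version for contrast. The function names `total` and `describe` are hypothetical.

```python
def total(*args):
    """Sum any number of positional arguments; args arrives as a tuple."""
    return sum(args)


def describe(**kwargs):
    """Collect keyword arguments into a dict and return the sorted names."""
    return sorted(kwargs)


no_arguments = total()
three_arguments = total(1, 2, 3)
keyword_names = describe(b=1, a=2)
```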
33. Which of the following is the primary data structure in NumPy for working with arrays?
a. List
b. Tuple
c. ndarray
d. Dictionary
34. How can you create a NumPy array containing integers from 0 to 9?
a. `np.array(0, 9)`
b. `np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])`
c. `np.arange(10)`
d. `np.linspace(0, 9, 10)`
35. What is the data type of the elements in a NumPy array by default?
a. Integer
b. Float
c. String
d. None
36. What will be the result of the operation `np.array([1, 2, 3]) + np.array([4, 5, 6])` in
NumPy?
a. `[5, 7, 9]`
b. `[1, 2, 3, 4, 5, 6]`
c. `[1, 2, 3, 4, 5, 6]`
d. `[4, 5, 6]`
37. How can you access the element at the second row and third column of a NumPy array
`arr`?
a. `arr(2, 3)`
b. `arr[2, 3]`
c. `arr[1, 2]`
d. `arr[3, 2]`
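Worked illustration (not part of the original paper): a sketch of zero-based indexing on a two-dimensional NumPy array. The 3×4 array built here is a hypothetical example.

```python
import numpy as np

# Hypothetical 3x4 array: rows are [0..3], [4..7], [8..11].
arr = np.arange(12).reshape(3, 4)

# Indices start at 0, so the second row is index 1
# and the third column is index 2.
second_row_third_col = arr[1, 2]

# Chained indexing reaches the same element.
same_element = arr[1][2]
```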
38. Which of the following is a universal function (ufunc) in NumPy?

a. `for loop`
b. `if statement`
c. `sqrt`
d. `class definition`
39. A sample of N observations is independently drawn from a normal distribution. The
sample variance follows
a. Normal distribution
b. Chi-square with N degrees of freedom
c. Chi-square with N − 1 degrees of freedom
d. t-distribution with N − 1 degrees of freedom
40. The gradient descent algorithm converges to the local minimum.

a. True
b. False
41. State whether the following statement is True or False. [1 mark] Covariance is a better
metric to analyze the association between two numerical variables than correlation.
a. True
b. False
42. Linear Regression is an optimization problem where we attempt to minimize [1 mark]

a. SSR (residual sum-of-squares)
b. SST (total sum-of-squares)
c. SSE (sum-squared error)
d. Slope
43. Which among the following is not a type of cross-validation technique?

a. LOOCV
b. k-fold cross-validation
c. Validation set approach
d. Bias variance trade off
44. Which among the following is a classification problem?

a. Predicting the average rainfall in a given month.
b. Predicting whether a patient is diagnosed with a disease or not.
c. Predicting the price of a house.
d. None
45. In Simple Linear Regression, if the equation of the regression line is given as `y = 2x +
3`, what is the predicted value of `y` when `x` is 5?
a. 13
b. 12
c. 16
d. None
46. Which of the following is NOT a common assumption of linear regression?

a. Linearity.
b. Independence of residuals.
c. Multicollinearity.
d. Normality of residuals.
47. In logistic regression, what is the dependent variable typically representing?

a. A continuous numeric value.
b. A binary outcome or category.
c. Time series data.
d. A count of occurrences.
48. What is the purpose of residual analysis in regression?

a. To calculate the coefficient of determination (R-squared).
b. To identify outliers and assess the model's assumptions.
c. To compute the p-value for the model.
d. To estimate the model's parameters.
49. When performing polynomial regression, what does increasing the degree of the
polynomial typically result in?
a. Improved model simplicity.
b. Decreased model flexibility.
c. Overfitting the data.
d. Decreased model complexity.
50. Which type of regression is most suitable for handling multicollinearity among
independent variables?
a. Linear Regression.
b. Polynomial Regression.
c. Lasso Regression.
d. Logistic Regression.
SECTION-B (Marks = 1.5)

1. Which of the following completes `val =` so that `val` is set to 20 by indexing or slicing `aTuple`?
aTuple = ("Orange", (10, 20, 30), (5, 15, 25))
val =
a. val = aTuple[1:2][1]
b. val = aTuple[2][1]
c. val = aTuple[1:2](1)
d. val = aTuple[1][1]
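Worked illustration (not part of the original paper): a sketch that evaluates each candidate expression and records either its value or the name of the exception it raises. The `candidates` and `results` dictionaries are hypothetical helpers for this example.

```python
aTuple = ("Orange", (10, 20, 30), (5, 15, 25))

# Each option is wrapped in a lambda so that a failing one
# does not stop the others from being evaluated.
candidates = {
    "a": lambda: aTuple[1:2][1],   # slice gives a 1-element tuple, index 1 is out of range
    "b": lambda: aTuple[2][1],     # second element of the third item
    "c": lambda: aTuple[1:2](1),   # parentheses try to call a tuple
    "d": lambda: aTuple[1][1],     # second element of the second item
}

results = {}
for label, expression in candidates.items():
    try:
        results[label] = expression()
    except (IndexError, TypeError) as exc:
        results[label] = type(exc).__name__
```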
2. What will the following code return?

def practice(tup):
    a, b, c = tup
    return a

aTuple = "Yellow", 20, "Red"
practice(aTuple)
a. ("Yellow", 20, "Red")
b. Yellow
c. 20
d. Red
3. Let f(x) = x^3 + 3x^2 − 24x + 7. Select the correct options from the following: [3 marks]
a. x = 2 will give the maximum for f(x).
b. x = 2 will give the minimum for f(x).
c. The stationary points for f(x) are 2 and 4.
d. None
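Worked illustration (not part of the original paper): a sketch of the first- and second-derivative test for f(x) = x^3 + 3x^2 − 24x + 7. The derivatives are written out by hand, and the integer scan over −10..10 is an assumption that works here only because f'(x) = 3(x − 2)(x + 4) has integer roots. Since a cubic is unbounded, any extremum found is a local one.

```python
def f(x):
    return x**3 + 3 * x**2 - 24 * x + 7


def f_prime(x):
    # First derivative: 3x^2 + 6x - 24 = 3(x - 2)(x + 4)
    return 3 * x**2 + 6 * x - 24


def f_second(x):
    # Second derivative: 6x + 6
    return 6 * x + 6


def classify(x):
    """Second-derivative test at a stationary point."""
    if f_prime(x) != 0:
        return "not stationary"
    curvature = f_second(x)
    if curvature > 0:
        return "local minimum"
    if curvature < 0:
        return "local maximum"
    return "inconclusive"


# Integer scan; sufficient here because the roots are integers.
stationary_points = [x for x in range(-10, 11) if f_prime(x) == 0]
labels = {x: classify(x) for x in stationary_points}
```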
4. Let f(x, y) = −3x^2 − 6xy − 6y^2. The point (0, 0) is a
a. saddle point
b. maxima
c. minima
d. None
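Worked illustration (not part of the original paper): a sketch of the second-partial-derivative test for f(x, y) = −3x^2 − 6xy − 6y^2 at (0, 0), where the gradient (−6x − 6y, −6x − 12y) vanishes. The second partials are computed by hand, and the helper name `classify_critical_point` is hypothetical.

```python
# Second partial derivatives of f(x, y) = -3x^2 - 6xy - 6y^2.
# They are constants, so they hold at (0, 0) as well.
f_xx = -6
f_xy = -6
f_yy = -12


def classify_critical_point(f_xx, f_xy, f_yy):
    """Classify a critical point from the determinant of the Hessian."""
    det = f_xx * f_yy - f_xy**2
    if det > 0:
        return "local maximum" if f_xx < 0 else "local minimum"
    if det < 0:
        return "saddle point"
    return "inconclusive"


hessian_determinant = f_xx * f_yy - f_xy**2
kind = classify_critical_point(f_xx, f_xy, f_yy)
```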
5. You have a dataset with three variables: `X`, `Y`, and `Z`. The correlation coefficient
between `X` and `Y` is -0.6, and between `Y` and `Z` is 0.8. What is the correlation
coefficient between `X` and `Z`?
a. -0.48
b. -0.75
c. -1.33
d. None
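Worked illustration (not part of the original paper): two pairwise correlations do not fix the third one on their own; they only restrict it to a range. The sketch below computes that range from the requirement that the 3×3 correlation matrix be positive semidefinite. The helper name `correlation_bounds` is hypothetical.

```python
import math


def correlation_bounds(r_xy, r_yz):
    """Range of r_xz that keeps the 3x3 correlation matrix positive semidefinite."""
    centre = r_xy * r_yz
    half_width = math.sqrt((1 - r_xy**2) * (1 - r_yz**2))
    return centre - half_width, centre + half_width


low, high = correlation_bounds(-0.6, 0.8)


def is_feasible(r_xz, tolerance=1e-9):
    """Check whether a proposed r_xz lies inside the admissible range."""
    return low - tolerance <= r_xz <= high + tolerance
```

Any value inside the computed range is consistent with the two given correlations, so further information about the data would be needed to pin down a single number.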
6. Which of the following is an example of a dependent variable in regression analysis?
a) Age
b) Gender
c) Income
d) Sales
7. What is the difference between linear and logistic regression?
a) Linear regression is used for predicting continuous variables, while logistic regression is used for
predicting categorical variables.
b) Linear regression is used for predicting categorical variables, while logistic regression is used for
predicting continuous variables.
c) Linear regression and logistic regression are the same thing.
d) None of the above.
8. Which of the following is a measure of the goodness of fit in linear regression?
a) R-squared
b) Odds ratio
c) P-value
d) Coefficient of determination
9. Which of the following is a measure of the strength and direction of the relationship between
two variables in regression analysis?
a) Coefficient of determination
b) R-squared
c) Correlation coefficient
d) Odds ratio
10. What is the objective of k-means clustering?
a) To maximize the distance between clusters

b) To minimize the distance within clusters
c) To maximize the distance within clusters
d) To minimize the distance between clusters
11. What is the difference between k-means clustering and hierarchical clustering?
a) K-means clustering forms clusters by iteratively assigning data points to the nearest centroid,
while hierarchical clustering forms clusters by iteratively merging or splitting clusters.
b) K-means clustering forms clusters by iteratively merging or splitting clusters, while hierarchical
clustering forms clusters by iteratively assigning data points to the nearest centroid.
c) K-means clustering and hierarchical clustering are the same thing.
12. Which of the following is a limitation of k-means clustering?
a) It requires a priori knowledge of the number of clusters.
b) It can only handle numerical data.
c) It is sensitive to the initial positions of the centroids.
d) All of the above.
13. What is gradient descent?
a) An optimization algorithm used to minimize the cost function in machine learning.
b) A clustering algorithm used to group similar data points together.
c) A supervised learning algorithm used to predict categorical outcomes.
d) An unsupervised learning algorithm used to find patterns in data.
14. Which of the following is a hyperparameter of the gradient descent algorithm?
a) The learning rate
b) The number of iterations
c) The cost function
d) The data set
15. Which of the following is a disadvantage of using a low learning rate in gradient descent?
a) The algorithm may converge slowly.
b) The algorithm may overshoot the minimum.
c) The algorithm may get stuck in a local minimum.
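Worked illustration (not part of the original paper): a minimal gradient descent sketch on f(x) = x^2, counting how many steps each learning rate needs to get close to the minimum at 0. The function name, starting point, tolerance, and the two learning rates are all hypothetical choices.

```python
def gradient_descent(learning_rate, start=10.0, tolerance=1e-6, max_steps=100_000):
    """Minimise f(x) = x^2 and return (final x, number of steps taken)."""
    x = start
    for step in range(1, max_steps + 1):
        gradient = 2 * x                 # derivative of x^2
        x -= learning_rate * gradient    # move against the gradient
        if abs(x) < tolerance:
            return x, step
    return x, max_steps


x_small, steps_small = gradient_descent(learning_rate=0.001)
x_moderate, steps_moderate = gradient_descent(learning_rate=0.1)
```

Both runs reach the same minimum, but the smaller learning rate needs far more iterations to get there.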

SECTION-C (Marks = 2)
1. Consider the following system of linear equations:

x + y + z = −2
x + 2y − z = 1
2x + ay + bz = 2
Find the conditions on a and b for which the above system has no solution.
a. 2a + b − 6 = 0
b. a ≠ 4, 2a + b − 6 = 0
c. a = 4, b = −2
d. 2a + b − 6 ≠ 0
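Worked illustration (not part of the original paper): subtracting the first equation from the second gives y − 2z = 3, and subtracting twice the first from the third gives (a − 2)y + (b − 2)z = 6. Substituting y = 3 + 2z leaves (2a + b − 6)z = 12 − 3a. The sketch below classifies the system from that reduced equation; the function name `classify_system` is hypothetical.

```python
def classify_system(a, b):
    """Classify the system using the reduced equation (2a + b - 6) z = 12 - 3a."""
    coefficient = 2 * a + b - 6   # equals the determinant of the coefficient matrix
    right_hand_side = 12 - 3 * a

    if coefficient != 0:
        return "unique solution"
    # 0 * z = right_hand_side: contradictory unless the right side is also 0.
    if right_hand_side != 0:
        return "no solution"
    return "infinitely many solutions"


example_inconsistent = classify_system(3, 0)    # 2a + b - 6 = 0 and a != 4
example_dependent = classify_system(4, -2)      # 2a + b - 6 = 0 and a = 4
example_regular = classify_system(1, 1)         # 2a + b - 6 != 0
```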
2. Which among the following is true for the determinant of a matrix?

a. The determinant of a diagonal matrix is the product of its diagonal entries.
b. If one row of a matrix is a scalar multiple of another, the determinant is 1.
c. The determinant of a permutation matrix can only be 1.
d. None
3. Consider the following confusion matrix for the classification of Hatchback and SUV:
                          True
                          Hatchback    SUV
Prediction   Hatchback    55           5
             SUV          0            40
Find the accuracy of the model.
a. 0.95
b. 0.55
c. 0.45
d. 0.88
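Worked illustration (not part of the original paper): a sketch of the accuracy calculation from the confusion matrix in this question. The dictionary layout, keyed by (predicted class, true class), is a representation chosen for this example.

```python
# Keys are (predicted class, true class); values are counts from the question.
confusion = {
    ("Hatchback", "Hatchback"): 55,
    ("Hatchback", "SUV"): 5,
    ("SUV", "Hatchback"): 0,
    ("SUV", "SUV"): 40,
}

# Correct predictions are the cells where prediction and truth agree.
correct = sum(count for (predicted, true), count in confusion.items() if predicted == true)
total = sum(confusion.values())

accuracy = correct / total
```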
4. Consider the following confusion matrix for the classification of Hatchback and SUV:
                          True
                          Hatchback    SUV
Prediction   Hatchback    55           5
             SUV          0            40
Find the sensitivity of the model.
a. 0.95
b. 0.55
c. 1
d. 0.88
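Worked illustration (not part of the original paper): sensitivity is TP / (TP + FN) for whichever class is treated as positive. The question does not state the positive class, so the sketch below computes both readings; the variable names are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Share of actual positives that were predicted as positive."""
    return true_positives / (true_positives + false_negatives)


# Assumption 1: Hatchback is the positive class.
# 55 true Hatchbacks were predicted Hatchback, 0 were predicted SUV.
sensitivity_hatchback = sensitivity(55, 0)

# Assumption 2: SUV is the positive class.
# 40 true SUVs were predicted SUV, 5 were predicted Hatchback.
sensitivity_suv = sensitivity(40, 5)
```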
5. Given a Multiple Linear Regression model with three independent variables: `X1 = 4`, `X2
= 7`, and `X3 = 10`, and the coefficients: `β1 = 2`, `β2 = 3`, and `β3 = 1`, calculate the
predicted value of the dependent variable `Y`.
a. 31
b. 29
c. 23
d. 27
6. What is the difference between simple linear regression and multiple regression?
a) Simple linear regression involves only one independent variable, while multiple regression
involves more than one independent variable.
b) Simple linear regression involves more than one independent variable, while multiple regression
involves only one independent variable.
c) Simple linear regression and multiple regression are the same thing.
7. What is the purpose of multivariate optimization?
a) To find the minimum or maximum of a function with multiple variables.
b) To find the optimal number of clusters in a data set.
c) To find the best model for a machine learning problem.
8. Which of the following is a method for finding the minimum of a function with multiple
variables without using derivatives?
a) Gradient descent
b) Newton's method
c) Levenberg-Marquardt algorithm
9. What is pruning in decision trees?
a) A technique used to reduce the size of a decision tree by removing unnecessary branches.
b) A technique used to increase the size of a decision tree by adding more branches.
c) A technique used to change the shape of a decision tree.
10. What is the goal of a support vector machine (SVM)?
a) To find the optimal decision boundary that separates two classes of data.
b) To cluster data points together based on their similarities.
c) To predict continuous outcomes.
1 Answer: b) Data, Model, and Visualization.
36 Answer: a) Simple linear regression involves only one independent variable, while multiple
regression involves more than one independent variable.
