DataMining - Workbook MCQ
2. We have Market Basket data for 1,000 rental transactions at a Video Store.
There are four videos for rent -- Video A, Video B, Video C and Video D. The
probability that both Video C and Video D are rented at the same time is known
as ________ .
Consider the following transaction database. Suppose that minsup is set to 40%
and minconf to 70%.
TransID  Items
T100     A, B, C, D
T200     A, B, C, E
T300     A, B, E, F, H
T400     A, C, H
3. The support of the item set A, B, E is ........
a. 50% b. 40% c. 70% d. 66%
4. Based on the given minimum support, the item set A, B, E is ........
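The support calculation behind questions 3 and 4 can be sketched as follows. This is a minimal illustration that assumes the database consists only of the four transactions T100 to T400 listed above; the helper name `support` is hypothetical.

```python
# Transactions as listed in the table above (assumption: these four rows
# are the whole database).
transactions = {
    "T100": {"A", "B", "C", "D"},
    "T200": {"A", "B", "C", "E"},
    "T300": {"A", "B", "E", "F", "H"},
    "T400": {"A", "C", "H"},
}

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in db.values() if itemset <= items)
    return hits / len(db)

MINSUP = 0.40  # minimum support given in the question

s = support({"A", "B", "E"}, transactions)
is_frequent = s >= MINSUP
print(f"support = {s:.0%}, frequent = {is_frequent}")
```

An item set counts as frequent when its support is at least minsup.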
8. The value of the lift in the previous question means that items are……….
14. What will the cluster centroids be if you want to proceed to the second
iteration?
a. C1: (4, 4), C2: (2, 2), C3: (7, 7) b. C1: (6, 6), C2: (4, 4), C3: (9, 9)
c. C1: (2, 2), C2: (0, 0), C3: (5, 5) d. None of these
15. What will be the Manhattan distance of observation (9, 9) from cluster
centroid C1 in the second iteration?
a. 10 b. 5 c. 13 d. None of these
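The two operations that questions 14 and 15 rely on, recomputing a centroid as the mean of its cluster members and measuring Manhattan distance, can be sketched as below. The data set for these questions is not reproduced in this workbook, so the points used here are purely hypothetical, as are the helper names `centroid` and `manhattan`.

```python
def centroid(points):
    """Mean of the points, coordinate by coordinate (k-means update step)."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))

def manhattan(p, q):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical cluster members, not taken from the workbook.
cluster = [(1, 1), (2, 3), (3, 2)]
c = centroid(cluster)       # (2.0, 2.0)
d = manhattan((5, 6), c)    # |5 - 2| + |6 - 2| = 7
print(c, d)
```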
16. Consider the given data: {3, 4, 5, 10, 21, 32, 43, 44, 46, 52, 59, 67}. Using
equal-width partitioning and four bins, how many values are there in the
first bin?
a. 3 b. 4 c. 5 d. 6
19. If smoothing by bin medians is applied to the previous bins, what is the new
value of the data in the first bin?
a. 4 b. 4.5 c. 5 d. 7.5
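Equal-width partitioning and smoothing by bin medians, as used in the two questions above, can be sketched as follows. The sketch assumes one common convention: bin width is (max - min) / k, bins are closed on the left, and the maximum value is placed in the last bin. Other textbooks may draw the bin boundaries slightly differently.

```python
from statistics import median

data = [3, 4, 5, 10, 21, 32, 43, 44, 46, 52, 59, 67]  # values from question 16
k = 4                                                  # number of bins

lo, hi = min(data), max(data)
width = (hi - lo) / k  # (67 - 3) / 4 = 16

# Assign each value to a bin; clamp so the maximum lands in the last bin.
bins = [[] for _ in range(k)]
for x in data:
    idx = min(int((x - lo) / width), k - 1)
    bins[idx].append(x)

# Smoothing by bin medians: every value is replaced by its bin's median.
smoothed = [[median(b)] * len(b) for b in bins if b]

for b, s in zip(bins, smoothed):
    print(b, "->", s)
```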
17. Supervised learning differs from unsupervised clustering in that
supervised learning requires
18. The correlation between the number of years an employee has worked for
a company and the salary of the employee is 0.75. What can be said about
employee salary and years worked?
b. Individuals that have worked for the company the longest have higher
salaries.
c. Individuals that have worked for the company the longest have lower
salaries.
d. The majority of employees have been with the company a long time.
e. The majority of employees have been with the company a short period of
time.
19. The correlation coefficient for two real-valued attributes is –0.85. What
does this value tell you?
B. As the value of one attribute increases the value of the second attribute
also increases.
C. As the value of one attribute decreases the value of the second attribute
increases.
c. representing data.
A. Knowledge Database
26. This approach is best when we are interested in finding all possible
interactions among a set of attributes.
a. decision tree
b. association rules
c. K-Means algorithm
d. genetic learning
27. If the information gains of the age, income and gender attributes are 0.42,
0.24 and 0.024, which one will you choose as the splitting attribute?
a. age
b. income
c. gender
d. all of them
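Selecting a splitting attribute by information gain amounts to taking the attribute with the largest gain. A minimal sketch using the values from question 27, assuming an ID3-style criterion (other algorithms, such as C4.5, use gain ratio instead):

```python
# Information gain values as stated in question 27.
gains = {"age": 0.42, "income": 0.24, "gender": 0.024}

# ID3-style choice: the attribute with the highest information gain.
best = max(gains, key=gains.get)
print(best)
```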
b. may be frequent
c. can't be frequent
d. all of them
a. cleaning
b. overfitting
c. dimensionality reduction
d. Dimensionality
30. The bottleneck of the Apriori algorithm is caused by all the following
except
31. Which of the following is the process of detecting and correcting wrong
data:
a. data cleaning
b. data selection
c. data integration
d. all of them
32. Which of the following is the process of combining data from different
sources:
a. data cleaning
b. data selection
c. data integration
d. all of them
33. Which of the following is an interestingness measure for association rules:
a. lift
b. Recall
c. Accuracy
d. Compactness
34. If the lift measure of the items bread and rice is equal to 0.5, this means that:
d. none of them
35. Nonparametric data reduction strategies include all the following except:
a. Histograms
b. Clustering
c. Sampling
d. Regression
36. If you want to give all attributes equal weight, which preprocessing task
will you use:
a. Cleaning
b. Transformation
c. Integration
d. Reduction
a. Transformation
b. Cluster analysis
c. Classification
d. Association rules
a. Cleaning
b. Transformation
c. Reduction
d. Integration
39. The termination conditions of decision tree induction include all of the
following except:
b. No noise
c. No remaining attributes
41. If the mean is larger than the median, then this might be an indication that
the data is
a. negatively skewed
b. positively skewed
c. symmetric
d. correlated
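The relationship between mean, median and skew in question 41 can be checked on a small example. The data below are hypothetical: a single large value stretches the right tail, which pulls the mean above the median.

```python
from statistics import mean, median

# Hypothetical sample with one large value in the right tail.
values = [1, 2, 2, 3, 3, 3, 4, 20]

m = mean(values)      # 38 / 8 = 4.75
med = median(values)  # average of the two middle values: 3.0

# A mean above the median suggests a longer right tail (positive skew).
print(m, med, m > med)
```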
42. _______ is the result of a tuple firing more than one rule with different
class predictions.
a. Association rule
b. Strong rule
c. Rule conflict
a. Covariance
b. Chi-square
c. Lift
d. Correlation co-efficient
44. We have Market Basket data for 1,000 rental transactions at a Video
Store. There are four videos for rent -- Video A, Video B, Video C and Video
D. The probability that Video D will be rented given that Video C has been
rented is known as ________ .
B. support
C. lift
D. confidence
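Support, confidence and lift for the video store setting can be sketched as below. Only the total of 1,000 transactions comes from the question; the rental counts for Video C, Video D and both together are hypothetical values chosen for illustration.

```python
N = 1000        # total transactions (from the question)
count_c = 200   # hypothetical: transactions containing Video C
count_d = 250   # hypothetical: transactions containing Video D
count_cd = 100  # hypothetical: transactions containing both C and D

# Support: probability that C and D are rented together.
support_cd = count_cd / N

# Confidence of C -> D: probability of D given that C was rented.
confidence_c_to_d = count_cd / count_c

# Lift: confidence divided by the baseline probability of D.
lift = confidence_c_to_d / (count_d / N)

print(support_cd, confidence_c_to_d, lift)
```

A lift above 1 indicates that the items occur together more often than independence would predict, a lift of 1 indicates independence, and a lift below 1 indicates that they occur together less often than expected.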
a. Validation data
b. Test data
c. Training data
d. Hidden data
46. This technique uses mean and standard deviation scores to transform
real-valued attributes.
a. decimal scaling
b. min-max normalization
c. z-score normalization
d. logarithmic normalization
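The transformation described in question 46 subtracts the mean from each value and divides by the standard deviation. A minimal sketch on hypothetical values, assuming the population standard deviation is used (some texts use the sample standard deviation instead):

```python
from statistics import mean, pstdev

# Hypothetical attribute values.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(values)       # 5
sigma = pstdev(values)  # population standard deviation: 2.0

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(v - mu) / sigma for v in values]
print(z_scores)
```

After this transformation the values have mean 0 and standard deviation 1, so attributes measured on different scales contribute comparably.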
a. Nominal
b. Ordinal
c. Interval
d. Ratio
49. Which of the following best describes the process of finding the
interquartile range for a set of data?
a. 13cm
b. 14cm
c. 164cm
d. 330cm
54. Supervised learning and unsupervised clustering both require at least one
a. hidden attribute.
b. output attribute.
c. input attribute.
d. categorical attribute.
55. Supervised learning differs from unsupervised clustering in that supervised
learning requires
a. at least one input attribute.
b. input attributes to be categorical.
c. at least one output attribute.
d. output attributes to be categorical.
56. Which of the following is a valid production rule for the decision tree
below?
Business Appointment?
No Yes
Decision = wear slacks
Temp above 70?
No Yes
Decision = wear jeans    Decision = wear shorts
a. Data warehousing
b. Data mining
c. Text mining
d. Data selection
60. Data mining can also be applied to other forms of data such as ................
i) Data streams ii) Sequence data iii) Networked data iv) Text data v) Spatial data
a. i, ii, iii and v only
b. ii, iii, iv and v only
c. i, iii, iv and v only
d. All i, ii, iii, iv and v
61. Which of the following is not a data mining functionality?
a. Characterization and Discrimination
b. Classification and regression
c. Selection and interpretation
d. Clustering and Analysis
62. ............................. is a summarization of the general characteristics or
features of a target class of data.
a. Data Characterization
b. Data Classification
c. Data discrimination
d. Data selection
63. ............................. is a comparison of the general features of the target
class data objects against the general features of objects from one or
multiple contrasting classes.
a. Data Characterization
b. Data Classification
c. Data discrimination
d. Data selection
64. Strategic value of data mining is ......................
a. cost-sensitive
b. work-sensitive
c. time-sensitive
d. technical-sensitive
a. i, ii and iv only
b. ii, iii and iv only
c. i, ii and iii only
d. All i, ii, iii and iv
67. The output of KDD is .............
a. Data
b. Information
c. Query
d. Useful information
68. A Bayesian classifier is
a. A class of learning algorithm that tries to find an optimum
classification of a set of examples using the probabilistic theory.
b. Any mechanism employed by a learning system to constrain the
search space of a hypothesis.
c. An approach to the design of learning algorithms that is inspired by
the fact that when people encounter new situations, they often
explain them by reference to familiar experiences, adapting the
explanations to fit the new situation.
d. None of these
69. Classification is
a. A subdivision of a set of examples into a number of classes.
b. A measure of the accuracy of the classification of a concept that is
given by a certain theory.
c. The task of assigning a classification to a set of examples
d. None of these
70. If the mean, median and mode of a distribution are 5, 6, 7 respectively, then
the distribution is:
a. skewed negatively
b. not skewed
c. skewed positively
d. symmetrical
e. bimodal
71. Which of the following measures of central tendency tends to be most
influenced by an extreme score?
a. median
b. mode
c. mean
72. Which of the following is not a measure of central tendency?
a. mean
b. median
c. mode
d. standard deviation
e. none of these
73. In a group of 12 scores, the largest score is increased by 36 points. What
effect will this have on the mean of the scores?
a. it will be increased by 12 points
b. it will remain unchanged
c. it will be increased by 3 points
d. it will increase by 36 points
e. there is no way of knowing exactly how many points the mean
will be increased.
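The effect described in question 73 follows from the definition of the mean: adding an amount to one score raises the sum by that amount, so the mean rises by that amount divided by the number of scores. A small check on hypothetical scores (only the group size of 12 and the increase of 36 come from the question):

```python
from statistics import mean

# Hypothetical group of 12 scores.
scores = list(range(50, 62))
assert len(scores) == 12

# Increase only the largest score by 36 points.
changed = sorted(scores)
changed[-1] += 36

# The sum grows by 36, so the mean grows by 36 / 12.
shift = mean(changed) - mean(scores)
print(shift)
```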
74. Non-parametric data reduction strategies include all of the following except
a. Histogram b. Regression c. Clustering d. Sampling
75. If you want to give all attributes an equal weight, which preprocessing task
will you use?
a. Cleaning b. Integration c. Transformation d. Reduction
77. Which of the following lists all parts of the five-number summary?
a. Mean, Median, Mode, Range, and Total
b. Minimum, Quartile1, Median, Quartile3, and Maximum
c. Smallest, Q1, Q2, Q3, and Q4
d. Minimum, Maximum, Range, Mean, and Median
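A five-number summary, together with the interquartile range mentioned in question 49, can be computed as in the sketch below. The data are hypothetical, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; other quartile conventions can give slightly different Q1 and Q3 values.

```python
from statistics import quantiles

# Hypothetical data set.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Three cut points dividing the data into four parts: Q1, median, Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

five_number = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1  # interquartile range: spread of the middle half of the data

print(five_number, iqr)
```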