Fdocuments - in - Data Mining MCQ
II M.Sc(IT) [2012-2014]
Semester III
Core: Data Warehousing and Mining - 363U1
Multiple Choice Questions.
C. Web Mining.
D. Text Mining.
ANSWER: B
4. The important aspect of the data warehouse environment is that data found within the data
warehouse is___________.
A. subject-oriented.
B. integrated.
C. time-variant.
D. All of the above.
ANSWER: D
1 of 34 8/20/2013 2:47 PM
D. FTP.
ANSWER: B
8. ____________predicts future trends & behaviors, allowing business managers to make proactive,
knowledge-driven decisions.
A. Data warehouse.
B. Data mining.
C. Datamarts.
D. Metadata.
ANSWER: B
11. ________________defines the structure of the data held in operational databases and used by
operational applications.
A. User-level metadata.
B. Data warehouse metadata.
C. Operational metadata.
D. Data mining metadata.
ANSWER: C
13. _________maps the core warehouse metadata to business concepts, familiar and useful to end
users.
A. Application level metadata.
B. User level metadata.
C. Enduser level metadata.
D. Core level metadata.
ANSWER: A
19. The key used in operational environment may not have an element of__________.
A. time.
B. cost.
C. frequency.
D. quality.
ANSWER: A
D. data warehouse
ANSWER: D
23. Data warehouse contains_____________data that is never found in the operational environment.
A. normalized.
B. informational.
C. summary.
D. denormalized.
ANSWER: C
24. Data redundancy between the environments results in less than ____________percent.
A. one.
B. two.
C. three.
D. four.
ANSWER: A
25. Bill Inmon has estimated___________of the time required to build a data warehouse, is consumed
in the conversion process.
A. 10 percent.
B. 20 percent.
C. 40 percent
D. 80 percent.
ANSWER: D
29. The biggest drawback of the level indicator in the classic star-schema is that it limits_________.
A. quantify.
B. qualify.
C. flexibility.
D. ability.
ANSWER: C
ANSWER: B
35. Transient data is _____________.
A. data in which changes to existing records cause the previous version of the records to be
eliminated.
B. data in which changes to existing records do not cause the previous version of the records to be
eliminated.
C. data that are never altered or deleted once they have been added.
D. data that are never deleted once they have been added.
ANSWER: A
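To make the distinction in question 35 concrete, here is a minimal sketch contrasting transient data (option A) with data whose earlier versions are kept (option B). The dictionary stores, the key "acct-1", and both function names are illustrative assumptions, not part of the course material.

```python
# Hypothetical in-memory stores keyed by record id.

def update_transient(store, key, value):
    """Transient data: the new version replaces (eliminates) the old one."""
    store[key] = value


def update_history_preserving(store, key, value):
    """Contrast case: every version is appended, nothing is eliminated."""
    store.setdefault(key, []).append(value)


transient = {}
history = {}
for balance in (100, 250):
    update_transient(transient, "acct-1", balance)
    update_history_preserving(history, "acct-1", balance)

print(transient["acct-1"])  # 250
print(history["acct-1"])    # [100, 250]
```

After two updates the transient store holds only the latest value, while the history-preserving store still holds both versions.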
A. completely denormalized.
B. partially denormalized.
C. completely normalized.
D. partially normalized.
ANSWER: C
B. Data Mining.
C. Analysis of large volumes of product sales data.
D. All of the above.
ANSWER: D
45. The data administration subsystem helps you perform all of the following, except__________.
A. backups and recovery.
B. query optimization.
C. security management.
D. create, change, and delete information.
ANSWER: D
46. The most common source of change data in refreshing a data warehouse is _______.
A. queryable change data.
B. cooperative change data.
C. logged change data.
D. snapshot change data.
ANSWER: A
47. ________ are responsible for running queries and reports against data warehouse tables.
A. Hardware.
B. Software.
C. End users.
D. Middle ware.
ANSWER: C
50. Dimensionality reduction reduces the data set size by removing ____________.
A. relevant attributes.
B. irrelevant attributes.
C. derived attributes.
D. composite attributes.
ANSWER: B
52. The assumption that the effect of an attribute value on a given class is independent of the values
of the other attributes is called _________.
A. value independence.
B. class conditional independence.
C. conditional independence.
D. unconditional independence.
ANSWER: B
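Naive Bayesian classifiers rely on the independence assumption described in question 52. As a sketch, using symbols that do not appear elsewhere in this question bank (a tuple X = (x_1, ..., x_n) and a class C_i are assumed notation), the assumption lets the class-conditional likelihood factor into one term per attribute:

```latex
P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)
```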
53. The main organizational justification for implementing a data warehouse is to provide ______.
A. cheaper ways of handling transportation.
B. decision support.
C. storing large volume of data.
D. access to data.
ANSWER: C
C. Selection.
D. Filtering.
ANSWER: D
D. Symmetric Microprogramming.
ANSWER: A
60. __________ are designed to overcome any limitations placed on the warehouse by the nature of the
relational data model.
A. Operational database.
B. Relational database.
C. Multidimensional database.
D. Data repository.
ANSWER: C
68. The term that is not associated with data cleaning process is ______.
A. domain consistency.
B. deduplication.
C. disambiguation.
D. segmentation.
ANSWER: D
D. technical-sensitive.
ANSWER: C
74. The terms equality and roll up are associated with ____________.
A. OLAP.
B. visualization.
C. data mart.
D. decision tree.
ANSWER: C
79. The first International conference on KDD was held in the year _____________.
A. 1996.
B. 1997.
C. 1995.
D. 1994.
ANSWER: C
B. data cleaning.
C. data cleansing.
D. data pruning.
ANSWER: B
81. ____________ contains information that gives users an easy-to-understand perspective of the
information stored in the data warehouse.
A. Business metadata.
B. Technical metadata.
C. Operational metadata.
D. Financial metadata.
ANSWER: A
82. _______________ helps to integrate, maintain and view the contents of the data warehousing
system.
A. Business directory.
B. Information directory.
C. Data dictionary.
D. Database.
ANSWER: B
84. Data marts that incorporate data mining tools to extract sets of data are called ______.
A. independent data mart.
B. dependent data marts.
C. intra-entry data mart.
D. inter-entry data mart.
ANSWER: B
85. ____________ can generate programs itself, enabling it to carry out new tasks.
A. Automated system.
B. Decision making system.
C. Self-learning system.
D. Productivity system.
ANSWER: C
87. Building the informational database is done with the help of _______.
A. transformation or propagation tools.
B. transformation tools only.
C. propagation tools only.
D. extraction tools.
ANSWER: A
88. How many components are there in a data warehouse?
A. two.
B. three.
C. four.
D. five.
ANSWER: D
90. ________ is data that is distilled from the low level of detail found at the current detailed level.
A. Highly summarized data.
B. Lightly summarized data.
C. Metadata.
D. Older detail data.
ANSWER: B
92. A directory to help the DSS analyst locate the contents of the data warehouse is seen in ______.
A. Current detail data.
B. Lightly summarized data.
C. Metadata.
D. Older detail data.
ANSWER: C
95. The data from the operational environment enter _______ of data warehouse.
A. Current detail data.
B. Older detail data.
96. The data in current detail level resides till ________ event occurs.
A. purge.
B. summarization.
C. archival.
D. all of the above.
ANSWER: D
D. units of measures.
ANSWER: B
98. The granularity of the fact is the _____ of detail at which it is recorded.
A. transformation.
B. summarization.
C. level.
D. transformation and summarization.
ANSWER: C
101. ___________ of data means that the attributes within a given entity are fully dependent on the
entire primary key of the entity.
A. Additivity.
B. Granularity.
C. Functional dependency.
D. Dimensionality.
ANSWER: C
105. Non-additive measures can often be combined with additive measures to create new _________.
A. additive measures.
B. non-additive measures.
C. partially additive.
D. All of the above.
ANSWER: A
106. A fact representing cumulative sales units over a day at a store for a product is a _________.
A. additive fact.
B. fully additive fact.
C. partially additive fact.
D. non-additive fact.
ANSWER: B
107. ____________ of data means that the attributes within a given entity are fully dependent on the
entire primary key of the entity.
A. Additivity.
B. Granularity.
C. Functional Dependency.
D. Dependency.
ANSWER: C
D. Association rules.
ANSWER: C
ANSWER: B
114. __________ is used to map a data item to a real valued prediction variable.
A. Regression.
B. Time series analysis.
C. Prediction.
D. Classification.
ANSWER: A
B. Information.
C. Query.
D. Process.
ANSWER: A
C. five.
D. six.
ANSWER: C
122. Converting data from different sources into a common format for processing is called ________.
A. selection.
B. preprocessing.
C. transformation.
D. interpretation.
ANSWER: C
126. __________ is used to proceed from very specific knowledge to more general information.
A. Induction.
B. Compression.
C. Approximation.
D. Substitution.
ANSWER: A
127. Describing some characteristics of a set of data by a general model is viewed as ____________
A. Induction.
B. Compression.
C. Approximation.
D. Summarization.
ANSWER: B
A. Induction.
B. Compression.
C. Approximation.
D. Summarization.
ANSWER: C
129. _______ are needed to identify training data and desired results.
A. Programmers.
B. Designers.
C. Users.
D. Administrators.
ANSWER: C
ANSWER: A
134. The ____________ of data could result in the disclosure of information that is deemed to be
confidential.
A. authorized use.
B. unauthorized use.
C. authenticated use.
D. unauthenticated use.
ANSWER: B
135. ___________ data are noisy and have many missing attribute values.
A. Preprocessed.
B. Cleaned.
C. Real-world.
D. Transformed.
ANSWER: C
139. Reducing the number of attributes to solve the high dimensionality problem is called ________.
A. dimensionality curse.
B. dimensionality reduction.
C. cleaning.
D. Overfitting.
ANSWER: B
140. Data that are not of interest to the data mining task are called ______.
A. missing data.
B. changing data.
C. irrelevant data.
D. noisy data.
ANSWER: C
B. Parallelization
C. Both A & B.
D. None of the above.
ANSWER: C
C. marketing strategies.
D. All of the above.
ANSWER: D
146. The value that says that transactions in D that support X also support Y is called ______________.
A. confidence.
B. support.
C. support count.
D. None of the above.
ANSWER: A
147. If T consists of 500000 transactions, of which 20000 contain bread, 30000 contain jam, and
10000 contain both bread and jam, then the support of bread and jam is _______.
A. 2%
B. 20%
C. 3%
D. 30%
ANSWER: A
148. If T consists of 500000 transactions, of which 20000 contain bread, 30000 contain jam, and
10000 contain both bread and jam, then the confidence of buying bread with jam is
_______.
A. 33.33%
B. 66.66%
C. 45%
D. 50%
ANSWER: D
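The arithmetic behind questions 147 and 148 can be checked with a short sketch. The function and variable names are illustrative assumptions. Note that the 50% answer corresponds to the rule bread => jam (transactions with both, divided by transactions with bread); the reverse rule jam => bread would give 33.33%.

```python
def support(count_both, total_transactions):
    """Fraction of all transactions that contain every item of the itemset."""
    return count_both / total_transactions


def confidence(count_both, count_antecedent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    return count_both / count_antecedent


total = 500_000   # transactions in T
bread = 20_000    # transactions containing bread
jam = 30_000      # transactions containing jam
both = 10_000     # transactions containing bread and jam

s = support(both, total)
c_bread_jam = confidence(both, bread)   # rule: bread => jam
c_jam_bread = confidence(both, jam)     # rule: jam => bread

print(f"support(bread, jam)      = {s:.2%}")            # 2.00%
print(f"confidence(bread => jam) = {c_bread_jam:.2%}")  # 50.00%
print(f"confidence(jam => bread) = {c_jam_bread:.2%}")  # 33.33%
```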
ANSWER: A
151. Which of the following is not a desirable feature of any efficient algorithm?
A. to reduce number of input operations.
B. to reduce number of output operations.
C. to be efficient in computing.
D. to have maximal code length.
ANSWER: D
152. All sets of items whose support is greater than the user-specified minimum support are called
_____________.
A. border set.
B. frequent set.
C. maximal frequent set.
D. lattice.
ANSWER: B
153. If a set is a frequent set and no superset of this set is a frequent set, then it is called ________.
A. maximal frequent set.
B. border set.
C. lattice.
D. infrequent sets.
ANSWER: A
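As an illustration of questions 152 and 153, the sketch below enumerates the frequent sets of a tiny transaction database by brute force and then keeps only the maximal ones. The transactions, the threshold, and the use of "at least" (>=) for the minimum support are assumptions made for the example; brute-force enumeration is exponential and only suitable for toy data.

```python
from itertools import combinations

# Hypothetical toy database; each transaction is a set of items.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
min_support = 3  # absolute count; assumed threshold


def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


items = sorted(set().union(*transactions))

# Frequent sets: every non-empty itemset that meets the minimum support.
frequent = [
    frozenset(combo)
    for k in range(1, len(items) + 1)
    for combo in combinations(items, k)
    if support_count(frozenset(combo)) >= min_support
]

# Maximal frequent sets: frequent sets with no frequent proper superset.
maximal = [s for s in frequent if not any(s < other for other in frequent)]

print(sorted("".join(sorted(s)) for s in frequent))  # ['a', 'ab', 'ac', 'b', 'bc', 'c']
print(sorted("".join(sorted(s)) for s in maximal))   # ['ab', 'ac', 'bc']
```

Here {a, b, c} appears in only two transactions, so it is not frequent, which makes each of the three pairs maximal.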
156. If an itemset is not a frequent set and no superset of this is a frequent set, then it is _______.
161. The _______ step eliminates the extensions of (k-1)-itemsets which are not found to be frequent,
from being considered for counting support.
A. Candidate generation.
B. Pruning.
C. Partitioning.
D. Itemset eliminations.
ANSWER: B
162. The a priori frequent itemset discovery algorithm moves _______ in the lattice.
A. upward.
B. downward.
C. breadthwise.
D. both upward and downward.
ANSWER: A
ANSWER: B
164. The number of iterations in a priori ___________.
A. increases with the size of the maximum frequent set.
B. decreases with increase in size of the maximum frequent set.
C. increases with the size of the data.
D. decreases with the increase in size of the data.
ANSWER: A
167. Itemsets in the ______ category of structures have a counter and the stop number with them.
A. Dashed.
B. Circle.
C. Box.
D. Solid.
ANSWER: A
168. The itemsets in the _______category structures are not subjected to any counting.
A. Dashes.
B. Box.
C. Solid.
D. Circle.
ANSWER: C
169. Certain itemsets in the dashed circle whose support count reach support value during an iteration
move into the ______.
A. Dashed box.
B. Solid circle.
C. Solid box.
D. None of the above.
ANSWER: A
170. Certain itemsets enter afresh into the system and get into the _______, which are essentially the
supersets of the itemsets that move from the dashed circle to the dashed box.
A. Dashed box.
B. Solid circle.
C. Solid box.
D. Dashed circle.
ANSWER: D
171. The itemsets that have completed one full pass move from dashed circle to ________.
A. Dashed box.
B. Solid circle.
C. Solid box.
D. None of the above.
ANSWER: B
B. a frequent-item-header table.
C. a frequent-item-node.
D. both A & B.
ANSWER: D
176. The paths from root node to the nodes labelled 'a' are called __________.
A. transformed prefix path.
B. suffix subpath.
C. transformed suffix path.
D. prefix subpath.
ANSWER: D
177. The transformed prefix paths of a node 'a' form a truncated database of patterns which co-occur
with 'a'; this is called _______.
A. suffix path.
B. FP-tree.
C. conditional pattern base.
D. prefix path.
ANSWER: C
178. The goal of _____ is to discover both the dense and sparse regions of a data set.
A. Association rule.
B. Classification.
C. Clustering.
D. Genetic Algorithm.
ANSWER: C
179. Which of the following is a clustering algorithm?
A. A priori.
B. CLARA.
C. Pincer-Search.
D. FP-growth.
ANSWER: B
180. _______ clustering techniques start with as many clusters as there are records, with each cluster
having only one record.
A. Agglomerative.
B. divisive.
C. Partition.
D. Numeric.
ANSWER: A
181. __________ clustering techniques start with all records in one cluster and then try to split that
cluster into small pieces.
A. Agglomerative.
B. Divisive.
C. Partition.
D. Numeric.
ANSWER: B
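The bottom-up strategy described in question 180 can be sketched as follows. This is a minimal illustration under stated assumptions: one-dimensional points, single-linkage distance (closest pair of members), and a hypothetical function name `agglomerative`. It is not a specific algorithm from the course text.

```python
def agglomerative(points, target_clusters):
    """Merge the two closest clusters repeatedly until target_clusters remain."""
    # Start with as many clusters as there are records.
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


result = agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], 3)
print(result)  # [[1.0, 1.5], [5.0, 5.2], [9.0]]
```

A divisive technique (question 181) runs in the opposite direction: it would start from the single cluster holding all five points and split it.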
182. Which of the following is a data set in the popular UCI machine-learning repository?
A. CLARA.
B. CACTUS.
C. STIRR.
D. MUSHROOM.
ANSWER: D
183. In ________ algorithm each cluster is represented by the center of gravity of the cluster.
A. k-medoid.
B. k-means.
C. STIRR.
D. ROCK.
ANSWER: B
184. In ___________ each cluster is represented by one of the objects of the cluster located near the
center.
A. k-medoid.
B. k-means.
C. STIRR.
D. ROCK.
ANSWER: A
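The difference between the two cluster representatives in questions 183 and 184 can be shown on a single cluster. The sketch below computes only the representative, not the full iterative algorithms; the sample points and the function names `centroid` and `medoid` are assumptions for illustration, and the medoid is taken as the member with the smallest total distance to the other members.

```python
import math


def centroid(cluster):
    """k-means style representative: the mean (center of gravity) of the points."""
    n = len(cluster)
    dims = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / n for d in range(dims))


def medoid(cluster):
    """k-medoid style representative: an actual member of the cluster,
    chosen to minimize the summed distance to all members."""
    return min(cluster, key=lambda c: sum(math.dist(c, p) for p in cluster))


# Hypothetical cluster with one outlier at (20, 20).
cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (20.0, 20.0)]

print(centroid(cluster))  # (6.0, 6.0)
print(medoid(cluster))    # (3.0, 3.0)
```

The centroid (6.0, 6.0) is not one of the records and is pulled toward the outlier, while the medoid is always one of the objects in the cluster.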
189. The cluster features of different subclusters are maintained in a tree called ___________.
A. CF tree.
B. FP tree.
C. FP growth tree.
D. B tree.
ANSWER: A
190. The ________ algorithm is based on the observation that the frequent sets are normally very few
in number compared to the set of all itemsets.
A. A priori.
B. Clustering.
C. Association rule.
D. Partition.
ANSWER: D
191. The partition algorithm uses _______ scans of the databases to discover all frequent sets.
A. two.
B. four.
C. six.
D. eight.
ANSWER: A
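The two-scan idea behind the partition algorithm (questions 190 and 191) can be sketched as below. The first scan finds the itemsets that are frequent inside each partition; the second scan counts the true support of those candidates over the whole database. The brute-force local search, the sample transactions, and all function names are assumptions chosen for brevity, not the implementation described in the course text.

```python
from itertools import combinations


def local_frequent(partition, min_fraction):
    """All itemsets frequent within one partition (brute force, toy data only)."""
    threshold = min_fraction * len(partition)
    items = sorted(set().union(*partition))
    found = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            if sum(1 for t in partition if itemset <= t) >= threshold:
                found.add(itemset)
    return found


def partition_algorithm(transactions, n_partitions, min_fraction):
    size = -(-len(transactions) // n_partitions)  # ceiling division
    partitions = [transactions[i:i + size]
                  for i in range(0, len(transactions), size)]
    # Scan 1: a globally frequent itemset must be locally frequent in at
    # least one partition, so the union of local results is a safe candidate set.
    candidates = set()
    for part in partitions:
        candidates |= local_frequent(part, min_fraction)
    # Scan 2: count the actual support of each candidate over all transactions.
    threshold = min_fraction * len(transactions)
    return {c for c in candidates
            if sum(1 for t in transactions if c <= t) >= threshold}


db = [
    {"bread", "jam"}, {"bread"}, {"bread", "jam", "milk"},
    {"milk"}, {"bread", "milk"}, {"jam", "milk"},
]
result = partition_algorithm(db, n_partitions=2, min_fraction=0.5)
print(sorted(sorted(s) for s in result))  # [['bread'], ['jam'], ['milk']]
```

In this run {bread, jam} is frequent in the first partition, so it becomes a candidate, but the second scan rejects it because it appears in only 2 of the 6 transactions.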
192. The basic idea of the apriori algorithm is to generate ________ item sets of a particular size and
scan the database.
A. candidate.
B. primary.
C. secondary.
D. superkey.
ANSWER: A
193. ________is the most well known association rule algorithm and is used in most commercial
products.
A. Apriori algorithm.
B. Partition algorithm.
C. Distributed algorithm.
D. Pincer-search algorithm.
ANSWER: A
194. An algorithm called________is used to generate the candidate item sets for each pass after the
first.
A. apriori.
B. apriori-gen.
C. sampling.
D. partition.
ANSWER: B
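Candidate generation for each pass after the first is commonly described as a join step followed by a prune step. The sketch below follows that description; the tuple representation of itemsets and the sample input are assumptions made for the example.

```python
from itertools import combinations


def apriori_gen(frequent_prev):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = sorted(tuple(sorted(s)) for s in frequent_prev)
    prev_set = set(prev)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            # Join step: combine two itemsets that agree on all but the last item.
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                # Prune step: drop the candidate if any (k-1)-subset is not frequent.
                if all(sub in prev_set
                       for sub in combinations(cand, len(cand) - 1)):
                    candidates.append(cand)
    return candidates


# Hypothetical frequent 2-itemsets.
frequent_2 = [{"a", "b"}, {"a", "c"}, {"a", "d"}, {"b", "c"}, {"b", "d"}]
candidates_3 = apriori_gen(frequent_2)
print(candidates_3)  # [('a', 'b', 'c'), ('a', 'b', 'd')]
```

The join also produces (a, c, d) and (b, c, d), but both are pruned because their subset (c, d) is not among the frequent 2-itemsets, so their support never needs to be counted.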
195. The basic partition algorithm reduces the number of database scans to ________ & divides it into
partitions.
A. one.
B. two.
C. three.
D. four.
ANSWER: B
197. ___________can be thought of as classifying an attribute value into one of a set of possible
classes.
A. Estimation.
B. Prediction.
C. Identification.
D. Clarification.
ANSWER: B
199. _________data consists of sample input data as well as the classification assignment for the data.
A. Missing.
B. Measuring.
C. Non-training.
D. Training.
ANSWER: D
200. Rule based classification algorithms generate ______ rule to perform the classification.
A. if-then.
B. while.
C. do while.
D. switch.
ANSWER: A
201. ____________ are a different paradigm for computing which draws its inspiration from
neuroscience.
A. Computer networks.
B. Neural networks.
C. Mobile networks.
D. Artificial networks.
ANSWER: B
D. muscles.
ANSWER: A
204. The ___________is a long, single fibre that originates from the cell body.
A. axon.
B. neuron.
C. dendrites.
D. strands.
ANSWER: A
207. _________ is the connectivity of the neuron that gives simple devices their real power.
A. Water.
B. Air.
C. Power.
D. Fire.
ANSWER: D
A. Artificial neurons.
B. Computational neurons.
C. Biological neurons.
D. Technological neurons.
ANSWER: A
209. The biological neuron's _________ is a continuous function rather than a step function.
A. read.
B. write.
C. output.
D. input.
ANSWER: C
210. The threshold function is replaced by continuous functions called ________ functions.
A. activation.
B. deactivation.
C. dynamic.
D. standard.
ANSWER: A
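The contrast between a step (threshold) function and a continuous activation function in questions 209 and 210 can be seen numerically. The logistic sigmoid used here is one common choice and is an assumption of this sketch; the question bank does not name a specific function.

```python
import math


def step(x, threshold=0.0):
    """Threshold function: the output jumps from 0 to 1 at the threshold."""
    return 1.0 if x >= threshold else 0.0


def sigmoid(x):
    """Logistic sigmoid: a smooth, continuous alternative to the step."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-2.0, -0.1, 0.0, 0.1, 2.0):
    print(f"x={x:+.1f}  step={step(x):.0f}  sigmoid={sigmoid(x):.3f}")
```

Around x = 0 the step output flips abruptly between 0 and 1, whereas the sigmoid passes smoothly through 0.5.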
213. In a feed-forward network, the connections between layers are ___________ from input to
output.
A. bidirectional.
B. unidirectional.
C. multidirectional.
D. directional.
ANSWER: B
ANSWER: A
216. RBF have only _______________ hidden layer.
A. four.
B. three.
C. two.
D. one.
ANSWER: D
217. RBF hidden layer units have a receptive field which has a ____________; that is, a particular input
value at which they have a maximal output.
A. top.
B. bottom.
C. centre.
D. border.
ANSWER: C
218. ___________ training may be used when a clear link between input data sets and target output
values does not exist.
A. Competitive.
B. Perception.
C. Supervised.
D. Unsupervised.
ANSWER: D
220. ________________ design involves deciding on their centres and the sharpness of their Gaussians.
A. DR.
B. AND.
C. XOR.
D. RBF.
ANSWER: D
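The two design parameters mentioned in questions 217 and 220, the centre and the sharpness of the Gaussian, can be illustrated with a single hidden unit. This is a minimal sketch for a one-dimensional input; the function name `gaussian_rbf` and the parameter name `width` are assumptions.

```python
import math


def gaussian_rbf(x, centre, width):
    """Output of one Gaussian RBF unit; a smaller width gives a sharper peak."""
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))


centre = 1.0
for x in (0.0, 0.5, 1.0, 1.5, 2.0):
    sharp = gaussian_rbf(x, centre, width=0.5)
    broad = gaussian_rbf(x, centre, width=2.0)
    print(f"x={x:.1f}  sharp={sharp:.3f}  broad={broad:.3f}")
```

Both units reach their maximal output of 1.0 exactly at the centre; the narrower unit falls off much faster as the input moves away from it.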
223. ____________ is one of the most popular models in the unsupervised framework.
A. SOM.
B. SAM.
C. OSM.
D. MSO.
ANSWER: A
224. The actual amount of reduction at each learning step may be guided by _________.
A. learning cost.
B. learning level.
C. learning rate.
D. learning time.
ANSWER: C
B. Teuvo Kohonen.
C. Tomoki Toda.
D. Julia.
ANSWER: B
227. In investment analysis, neural networks are used to predict the movement of _________ from
previous data.
A. engines.
B. stock.
C. patterns.
D. models.
ANSWER: B
228. SOMs are used to cluster a specific _____________ dataset containing information about the
patient's drugs etc.
A. physical.
B. logical.
C. medical.
D. technical.
ANSWER: C
D. 1985.
ANSWER: C
231. Genetic algorithms are search algorithms based on the mechanics of natural_______.
A. systems.
B. genetics.
C. logistics.
D. statistics.
ANSWER: B
ANSWER: A
239. Web content mining describes the discovery of useful information from the _______contents.
A. text.
B. web.
C. page.
D. level.
ANSWER: B
C. meta.
D. digital.
ANSWER: B
241. _______ mining is concerned with discovering the model underlying the link structures of the web.
A. Data structure.
B. Web structure.
C. Text structure.
D. Image structure.
ANSWER: B
243. The ________ proposes a measure of the standing of a node based on path counting.
A. open web.
B. close web.
C. link web.
D. hidden web.
ANSWER: B
244. In web mining, _______ is used to find natural groupings of users, pages, etc.
A. clustering.
B. associations.
C. sequential analysis.
D. classification.
ANSWER: A
245. In web mining, _________ is used to know the order in which URLs tend to be accessed.
A. clustering.
B. associations.
C. sequential analysis.
D. classification.
ANSWER: C
246. In web mining, _________ is used to know which URLs tend to be requested together.
A. clustering.
B. associations.
C. sequential analysis.
D. classification.
ANSWER: B
247. __________ describes the discovery of useful information from the web contents.
A. Web content mining.
B. Web structure mining.
C. Web usage mining.
D. All of the above.
ANSWER: A
248. _______ is concerned with discovering the model underlying the link structures of the web.
249. A link is said to be _________ link if it is between pages with different domain names.
A. intrinsic.
B. transverse.
C. direct.
D. contrast.
ANSWER: B
250. A link is said to be _______ link if it is between pages with the same domain name.
A. intrinsic.
B. transverse.
C. direct.
D. contrast.
ANSWER: A
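The distinction in questions 249 and 250 can be sketched with the Python standard library. As a simplifying assumption, "domain name" is approximated by the full hostname of each URL, so www.example.com and example.com would count as different; the URLs and the function name `classify_link` are hypothetical.

```python
from urllib.parse import urlparse


def classify_link(source_url, target_url):
    """Return 'intrinsic' for a same-host link, 'transverse' otherwise."""
    same_host = urlparse(source_url).hostname == urlparse(target_url).hostname
    return "intrinsic" if same_host else "transverse"


print(classify_link("http://example.com/a", "http://example.com/b"))  # intrinsic
print(classify_link("http://example.com/a", "http://example.org/"))   # transverse
```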
Staff Name
LAXMI.SREE.B.R.