454U8-Big Data Analytics
4. Sun also has the Hadoop Live CD ________ project, which allows running a fully functional Hadoop
cluster using a live CD
A. OpenOffice.org
B. GNU
C. OpenSolaris
D. Linux
ANSWER: C
8. Hadoop achieves reliability by replicating the data across multiple hosts, and hence does not require
________ storage on hosts.
A. RAID
B. ZFS
C. Operating System
D. DFS
ANSWER: A
9. Above the file systems comes the ________ engine, which consists of one Job Tracker, to which client
applications submit MapReduce jobs.
A. MapReduce
B. Google
C. Functional Programming
D. Facebook
ANSWER: A
10. The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix
operations.
A. Machine learning
B. Pattern recognition
C. Statistical classification
D. Artificial intelligence
ANSWER: A
11. ________ is a platform for constructing data flows for extract, transform, and load (ETL) processing
and analysis of large datasets.
A. Pig Latin
B. Oozie
C. Pig
D. Hive
ANSWER: C
13. _________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading.
A. Scalding
B. HCatalog
C. Cascalog
D. All of the mentioned
ANSWER: C
16. ________ is the most popular high-level Java API in the Hadoop ecosystem.
A. Scalding
B. HCatalog
C. Cascalog
D. Cascading
ANSWER: D
17. ___________ is a general-purpose computing model and runtime system for distributed data analytics.
A. Mapreduce
B. Drill
C. Oozie
D. None of the mentioned
ANSWER: A
18. The Pig Latin scripting language is not only a higher-level data flow language; it also has operators
similar to:
A. JSON
B. XML
C. XSL
D. SQL
ANSWER: D
19. _______ jobs are optimized for scalability but not latency
A. Mapreduce
B. Drill
C. Hive
D. Chukwa
ANSWER: C
20. ______ is a framework for performing remote procedure calls and data serialization.
A. Mapreduce
B. Drill
C. Avro
D. Chukwa
ANSWER: C
21. As companies move past the experimental phase with Hadoop, many cite the need for additional
capabilities, including
A. Improved data storage and information retrieval
B. Improved extract, transform and load features for data integration
C. Improved data warehousing functionality
D. Improved security, workload management and SQL support
ANSWER: D
23. According to analysts, for what can traditional IT systems provide a foundation when they are
integrated with big data technologies like Hadoop ?
A. Big data management and data mining
B. Data warehousing and business intelligence
C. Management of Hadoop clusters
D. Collecting and storing unstructured data
ANSWER: A
24. Hadoop is a framework that works with a variety of related tools. Common cohorts include
A. MapReduce, MySQL and Google Apps
B. MapReduce, Hive and HBase
C. MapReduce, Hummer and Iguana
D. MapReduce, Heron and Trumpet
ANSWER: B
28. __________ can best be described as a programming model used to develop Hadoop-based
applications that can process massive amounts of data.
A. MapReduce
B. Mahout
C. Oozie
D. All of the mentioned
ANSWER: A
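The answer to question 28 can be illustrated with a toy sketch of the MapReduce programming model in plain Python (no Hadoop involved; the function and variable names here are illustrative, not part of any Hadoop API): the map phase emits intermediate key/value pairs, the framework shuffles and sorts them by key, and the reduce phase consolidates each group.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    # The Mapper's job: emit an intermediate (word, 1) pair for every word.
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle_and_sort(pairs):
    # The framework groups intermediate pairs by key; sorting makes equal
    # keys adjacent so each group can be handed to a single reduce call.
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))

def reduce_phase(grouped):
    # The Reducer consolidates all values emitted for each key.
    return {key: sum(v for _, v in vals) for key, vals in grouped}

counts = reduce_phase(shuffle_and_sort(map_phase(["big data", "big deal"])))
print(counts)  # {'big': 2, 'data': 1, 'deal': 1}
```

This also shows why questions 34, 38 and 47 fit together: mapping and reducing are user code, while shuffle and sort happen between them inside the framework.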
31. A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the
JobTracker.
A. MapReduce
B. Mapper
C. TaskTracker
D. JobTracker
ANSWER: C
33. ___________ part of the MapReduce is responsible for processing one or more chunks of data and
producing the output results.
A. Maptask
B. Mapper
C. Task execution
D. All of the mentioned
ANSWER: A
34. _________ function is responsible for consolidating the results produced by each of the Map()
functions/tasks.
A. Map
B. Reduce
C. Reducer
D. Reduced
ANSWER: B
36. Although the Hadoop framework is implemented in Java, MapReduce applications need not be written
in
A. C
B. C++
C. Java
D. VB
ANSWER: C
37. ________ is a utility which allows users to create and run jobs with any executables as the mapper
and/or the reducer.
A. HadoopStrdata
B. Hadoop Streaming
C. Hadoop Stream
D. None of the mentioned
ANSWER: B
38. __________ maps input key/value pairs to a set of intermediate key/value pairs.
A. Mapper
B. Reducer
C. Both Mapper and Reducer
D. None of the mentioned
ANSWER: A
41. Mapper implementations are passed the JobConf for the job via the ________ method
A. JobConfigure.configure
B. JobConfigurable.configure
C. JobConfigurable.configureable
D. None of the mentioned
ANSWER: B
46. The output of the _______ is not sorted in the Mapreduce framework for Hadoop.
A. Mapper
B. Cascader
C. Scalding
D. None of the mentioned
ANSWER: D
47. Which of the following phases occur simultaneously ?
A. Reduce and Sort
B. Shuffle and Sort
C. Shuffle and Map
D. All of the mentioned
ANSWER: B
48. Mapper and Reducer implementations can use the ________ to report progress or just indicate that they
are alive.
A. Partitioner
B. OutputCollector
C. Reporter
D. All of the mentioned
ANSWER: C
49. __________ is a generalization of the facility provided by the MapReduce framework to collect data
output by the Mapper or the Reducer
A. Partitioner
B. OutputCollector
C. Reporter
D. All of the mentioned
ANSWER: B
50. _________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework
for execution.
A. Map Parameters
B. JobConf
C. MemoryConf
D. All of the mentioned
ANSWER: B
51. A ________ serves as the master and there is only one NameNode per cluster
A. Data Node
B. NameNode
C. Data block
D. Replication
ANSWER: B
54. ________ NameNode is used when the Primary NameNode goes down.
A. Rack
B. Data
C. Secondary
D. None
ANSWER: C
56. Which of the following scenario may not be a good fit for HDFS?
A. HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
B. HDFS is suitable for storing data related to applications requiring low latency data access
C. HDFS is suitable for storing data related to applications requiring high latency data access
D. None of the mentioned
ANSWER: A
57. The need for data replication can arise in various scenarios like :
A. Replication Factor is changed
B. DataNode goes down
C. Data Blocks get corrupted
D. All of the mentioned
ANSWER: D
58. ________ is the slave/worker node and holds the user data in the form of Data Blocks
A. DataNode
B. NameNode
C. Data block
D. Replication
ANSWER: A
59. HDFS provides a command line interface called __________ used to interact with HDFS.
A. HDFS Shell
B. FS Shell
C. DFSA Shell
D. None
ANSWER: B
63. Cloudera ___________ includes CDH and an annual subscription license (per node) to Cloudera
Manager and technical support.
A. Enterprise
B. Express
C. Standard
D. All the above
ANSWER: A
64. Cloudera Express includes CDH and a version of Cloudera ___________ lacking enterprise features
such as rolling upgrades and backup/disaster recovery
A. Enterprise
B. Express
C. Standard
D. Manager
ANSWER: D
68. _______ is an open source set of libraries, tools, examples, and documentation engineered to simplify building systems on top of the Hadoop ecosystem.
A. Kite
B. Kize
C. Ookie
D. All of the mentioned
ANSWER: A
69. To configure short-circuit local reads, you will need to enable ____________ on local Hadoop.
A. librayhadoop
B. libhadoop
C. libhad
D. hadoop
ANSWER: B
71. _______ can change the maximum number of cells of a column family
A. set
B. reset
C. alter
D. connect
ANSWER: C
74. You can delete a column family from a table using the method _________ of the HBaseAdmin class.
A. delColumn()
B. removeColumn()
C. deleteColumn()
D. All of the mentioned
ANSWER: C
75. Point out the wrong statement
A. To read data from an HBase table, use the get() method of the HTable class
B. You can retrieve data from the HBase table using the get() method of the HTable class
C. While retrieving data, you can get a single row by id, or get a set of rows by a set of row ids, or scan
an entire table or a subset of rows
D. None of the mentioned
ANSWER: D
77. The ________ class provides the getValue() method to read the values from its instance
A. Get
B. Result
C. Put
D. Value
ANSWER: B
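Questions 74 through 77 concern the HBase read path. As a conceptual aid only (a toy Python model of HBase's nested-map data layout, not the real Java HTable/Result/Get API), a get fetches a single row by its id while a scan walks rows in sorted key order:

```python
# Toy model: table -> {row_key: {"family:qualifier": value}}.
# This mimics HBase's sorted key/value layout, not its actual client API.
table = {
    "row1": {"cf:name": "alpha", "cf:count": "3"},
    "row2": {"cf:name": "beta",  "cf:count": "7"},
}

def get(table, row_key):
    # Like HTable.get(): retrieve one row by its id (empty dict if absent).
    return table.get(row_key, {})

def scan(table, start=None, stop=None):
    # Like a Scan: iterate rows in sorted key order within [start, stop).
    for key in sorted(table):
        if (start is None or key >= start) and (stop is None or key < stop):
            yield key, table[key]

print(get(table, "row1")["cf:name"])        # alpha
print([k for k, _ in scan(table, "row2")])  # ['row2']
```

The sorted-row iteration is why HBase can serve both single-row lookups and range scans efficiently, as option C of question 75 states.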
78. ________ communicate with the client and handle data-related operations.
A. Master Server
B. Region Server
C. Htable
D. All of the mentioned
ANSWER: B
80. HBase uses the _______ File System to store its data
A. Hive
B. Impala
C. Hadoop
D. Scala
ANSWER: C
83. Which of the following is true about the base plotting system ?
A. Margins and spacings are adjusted automatically depending on the type of plot and the data
B. Plots are typically created with a single function call
C. Plots are created and annotated with separate functions
D. The system is most useful for conditioning plots
ANSWER: C
87. Which of the following functions is typically used to add elements to a plot in the base graphics system
A. lines()
B. hist()
C. plot()
D. boxplot()
ANSWER: A
88. Which function opens the screen graphics device for the Mac ?
A. bitmap()
B. quartz()
C. pdf()
D. png()
ANSWER: B
93. In 2004, ________ purchased the S language from Lucent for $2 million
A. Insightful
B. Amazon
C. IBM
D. All the above
ANSWER: A
94. In 1991, R was created by Ross Ihaka and Robert Gentleman in the Department of Statistics at the
University of _________.
A. John Hopkins
B. California
C. Harvard
D. Auckland
ANSWER: D
97. R is technically much closer to the Scheme language than it is to the original _____ language.
A. B
B. C
C. R
D. S
ANSWER: D
98. The R-help and _____ mailing lists have been highly active for over a decade now
A. R-mail
B. R-devel
C. R-dev
D. R-d
ANSWER: B
100. The copyright for the primary source code for R is held by the ______ Foundation.
A. A
B. C
C. C++
D. R
ANSWER: D
104. The _________ R system contains, among other things, the base package which is required to run R
A. root
B. child
C. base
D. none of the above
ANSWER: C
109. Advanced users can write ___ code to manipulate R objects directly.
A. C
B. C++
C. Java
D. None of the mentioned
ANSWER: A
117. If a command is not complete at the end of a line, R will give a different prompt, by default it is :
A. *
B. -
C. +
D. All the above
ANSWER: C
118. Command lines entered at the console are limited to about ________ bytes
A. 3000
B. 4095
C. 5000
D. None
ANSWER: B
119. The ________ text editor provides more general support mechanisms via ESS for working interactively with
R.
A. EAC
B. Emacs
C. Shell
D. None
ANSWER: B
120. What would be the result of the following R code? > x <- 1 > print(x)
A. 1
B. 2
C. 3
D. 4
ANSWER: A
126. _______ will divert all subsequent output from the console to an external file.
A. sink
B. div
C. dip
D. exp
ANSWER: A
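Question 126 refers to R's sink() function, which diverts all subsequent console output to an external file until sink() is called again. A rough analogy in Python (an illustration of the idea only, not R's actual mechanism) uses contextlib.redirect_stdout:

```python
import io
from contextlib import redirect_stdout

# Like R's sink("out.txt") ... sink(): everything printed inside the
# with-block is diverted away from the console into the buffer instead.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print("this goes to the buffer, not the console")

print(buffer.getvalue().strip())  # this goes to the buffer, not the console
```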
127. The entities that R creates and manipulates are known as ________
A. task
B. objects
C. function
D. expression
ANSWER: B
128. Which of the following can be used to display the names of (most of) the objects which are currently
stored within R ?
A. object()
B. objects()
C. list()
D. none of the above
ANSWER: B
130. What will be the output of the following code snippet? > paste("a", "b", se = ":")
A. a+b
B. a-b
C. ab
D. none
ANSWER: D
134. You can check to see whether an R object is NULL with the _________ function.
A. is.nullobj()
B. null()
C. is.null()
D. obj.null()
ANSWER: C
138. For YARN, the ___________ Manager UI provides host and port information.
A. Data Node
B. NameNode
C. Resource
D. Replication
ANSWER: C
139. Point out the correct statement
A. The Hadoop framework publishes the job flow status to an internally running web server on the
master nodes of the Hadoop cluster
B. Each incoming file is broken into 32 MB by default
C. Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault
tolerance
D. None of the mentioned
ANSWER: A
140. For ________, the HBase Master UI provides information about the HBase Master uptime.
A. Oozie
B. HBase
C. Kafka
D. Afka
ANSWER: B
141. __________ is a standard Java API for monitoring and managing applications.
A. JVM
B. JVN
C. JMX
D. JMY
ANSWER: C
142. __________ Manager's Service feature monitors dozens of service health and performance metrics
about the services and role instances running on your cluster.
A. Microsoft
B. Cloudera
C. Amazon
D. None of the above
ANSWER: B
143. The IBM _____________ Platform provides all the foundational building blocks of trusted
information, including data integration, data warehousing, master data management, big data and
information governance.
A. InfoStream
B. InfoSphere
C. InfoSurface
D. InfoData
ANSWER: B
146. InfoSphere DataStage uses a client/server design where jobs are created and administered via a
________ client against central repository on a server
A. Ubuntu
B. Windows
C. Debian
D. Solaris
ANSWER: B
148. DataStage originated at __________, a company that developed two notable products: UniVerse
database and the DataStage ETL tool.
A. VMark
B. Vzen
C. Hatez
D. SMark
ANSWER: A
Staff Name
Suguna M.