4 - When creating a master workbook in BigSheets, what is responsible for formatting the data output
in the workbook?
A. Avro
B. Formula
C. Reader
D. Diagram
5 - Which AQL statement can be used to create components of a Text Analytics extractor?
6 - In the context of a Text Analytics project, which set of AQL commands will identify and extract any
matching person names when run across an input data source?
extract 'Names'
on R.text as match
from Document R;
from Document R;
on R.text as match
from Document R;
A. Hive
B. Data Warehouse
C. Zookeeper
A. social networks
B. GPS
C. cell phones
D. RFID
9 - Which of the four characteristics of Big Data indicates that many data formats can be stored and
analyzed in Hadoop?
A. Volume
B. Velocity
C. Volatility
D. Variety
10 - Which class of software is incorrectly perceived as the only software used for Big Data analysis?
A. Hadoop
B. Data Warehouse
C. Analytics
D. Data Quality
A. 16 MB
B. 32 MB
C. 64 MB
D. 128 MB
12 - Which MapReduce task is responsible for reading a portion of input data and producing <key,
value> pairs?
A. Map
B. Combiner
C. Reduce
D. Shuffle
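For readers who want a concrete picture of the <key, value> pairs mentioned in the question above, here is a minimal Python sketch of a word-count style map step. It is a plain-Python illustration, not Hadoop API code; the function name map_words and the sample lines are hypothetical.

```python
def map_words(offset, line):
    """Emit one (word, 1) pair for every word in a single input line."""
    # offset mimics the position key a record reader would supply; unused here.
    for word in line.split():
        yield (word.lower(), 1)


# A hypothetical input split: a small portion of a larger input file.
input_split = ["Hello World", "hello Hadoop"]

pairs = [
    pair
    for offset, line in enumerate(input_split)
    for pair in map_words(offset, line)
]
print(pairs)
```

Each call handles one record from one split, which is why many such calls can run in parallel on different portions of the input.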
13 - Which class in MapReduce takes each record and transforms it into a <key, value> pair?
A. RecordReader
B. InputFormat
C. LineRecordReader
D. InputSplitter
14 - Which part of the MapReduce engine controls job execution on multiple slaves?
A. JobTracker
B. Job Client
C. TaskTracker
D. MapTask
16 - When using BigInsights Eclipse to develop a new application, what must be done prior to testing the
application in the cluster?
17 - Which tool is included as part of the IBM BigInsights Eclipse development environment?
B. code converter
C. code generator
18 - Which part of the BigInsights web console provides information that would help a user
troubleshoot and diagnose a failed application job?
B. Cluster tab
C. HTML, Avro
data = [ { from: 101, to: 102, msg: "Hello" }, { from: 103, to: 104, msg: "World!" }, { from: 105, to: 106,
msg: "Hello World" } ];
[
{ "message": "Hello" }
]
What is the correct sequence of JAQL commands to select only the message text from sender 101?
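To make the intended result concrete, the following is a Python analogue of the filter-then-project operation the question describes. It is not JAQL syntax; it only sketches the logic of keeping records from sender 101 and returning just the message text, using the sample data from the question.

```python
# Sample records from the question, written as Python dictionaries.
data = [
    {"from": 101, "to": 102, "msg": "Hello"},
    {"from": 103, "to": 104, "msg": "World!"},
    {"from": 105, "to": 106, "msg": "Hello World"},
]

# Step 1 (filter): keep only records whose sender is 101.
# Step 2 (transform): project each kept record down to its message text.
result = [{"message": record["msg"]} for record in data if record["from"] == 101]
print(result)
```

The order matters: the filter has to run while the "from" field is still present, before the projection discards it.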
A. Indexing
B. Clustering
C. Bucketing
D. Sub-partitioning
A. SOAP
B. JDBC
C. RPC
D. Jet
23 - What is the name of the interface that allows Hive to read in data from a table, and write it back out
to HDFS in any custom format?
A. Thrift
B. Big SQL
C. SerDe
D. JDBC
24 - Which Hadoop component forms the software framework for distributed computing across a
cluster of commodity computers?
A. Parallel Computing
B. MapReduce
C. OLAP
D. Stream Processing
25 - Which administrative console feature of BigInsights is a visualization and analysis tool designed to
work with structured and unstructured data?
A. BigR
B. MapReduce
C. BigSheets
D. Text Analytics
26 - Which two technologies form the foundation of Hadoop? (Choose two.)
A. HDFS
B. MapReduce
C. GPFS
D. HBase
27 - Which two Hadoop features make it very cost-effective for Big Data analytics? (Choose two.)
28 - Which feature of Jaql provides native functions and modules that allow you to build reusable
packages?
B. extensibility
C. support for XML data sources
A. Hive
B. Pig
C. Big SQL
D. HBase
31 - Which component of IBM Watson Explorer forms the foundation of the framework and allows
Watson to extract and index data from any source?
A. meta data
B. connector framework
C. application framework
D. search engine
32 - Which two Hadoop query languages do not require data to have a schema? (Choose two.)
A. Avro
B. Hive
C. Big SQL
D. Pig
E. Jaql
33 - Which method is used by Jaql to operate on large arrays?
A. SQL
B. Parallelization
C. Buckets
D. Partitions
34 - What is the name of the Hadoop-based query language developed by Facebook that facilitates
SQL-like queries?
A. Big SQL
B. Pig
C. Jaql
D. Hive
35 - What is the correct sequence of the three main steps in Text Analytics?
36 - When using IBM Text Analytics, which technique is used to identify and recognize patterns of text?
A. Annotation
B. Dictionary
C. Regular Expression
D. Tokenization
37 - IBM Text Analytics is embedded into the key components of which IBM solution?
A. InfoSphere BigInsights
B. Eclipse
C. Hadoop
D. Rational
A. Results on the real data are computed and the output is explored.
A. Table
B. Workbook
C. Reader
D. View
40 - In HDFS 2.0, how much RAM is required in a NameNode for one million blocks of data?
A. 1 GB
B. 2 GB
C. 5 GB
D. 10 GB
41 - High availability was added to which part of the HDFS file system in HDFS 2.0 to prevent loss of
metadata?
A. Blockpool
B. NameNode
C. DataNode
D. CheckPoint
42 - For scalability purposes, data in an HDFS cluster is broken down into what default block size?
A. 16 MB
B. 32 MB
C. 64 MB
D. 128 MB
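As a small worked example of what a block size means in practice, the sketch below computes how many blocks a file occupies for a given block size. The function name block_count and the 300 MB file size are illustrative assumptions, and the block sizes are passed in as parameters rather than taken from any particular Hadoop release.

```python
import math


def block_count(file_size_mb, block_size_mb):
    """Return how many blocks a file of file_size_mb occupies."""
    # The final block only holds the remainder, so round up.
    return math.ceil(file_size_mb / block_size_mb)


# A hypothetical 300 MB file under two candidate block sizes.
for block_size in (64, 128):
    print(block_size, "MB blocks:", block_count(300, block_size))
```

Because the last block stores only the leftover bytes, a file does not consume a full block of disk space for its final piece.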
43 - Where does HDFS store the file system namespace and properties?
A. Hive
B. hdfs.conf
C. DataNode
D. FsImage
A. EditLog
B. FsImage
C. NameSpace
D. DataNode
45 - Which two prerequisites must be fulfilled when running a Java MapReduce program on the cluster
using Eclipse? (Choose two.)
C. It is stored in memory.
48 - Which statement is true about where the output of a Map task is stored?
B. It is stored in memory.
51 - You have been asked to create an HBase table and populate it with all the sales transactions
generated in the company in the last quarter. Currently, these transactions reside in a 300 MB
tab-delimited file in HDFS. What is the most efficient way for you to accomplish this task?
A. Pre-create the column families when creating the table and use the put command to load the
data
B. Pre-create regions by specifying splits in the create table command and use the insert command
to load the data
C. Pre-create the column families when creating the table and bulk load the data
D. Pre-create regions by specifying splits in the create table command and bulk load the data
52 - Which two commands are used to retrieve data from an HBase table? (Choose two.)
A. List
B. Scan
C. Describe
D. Read
E. Get
53 - What happens when the following Hive command is executed?
LOAD DATA INPATH '/tmp/department.del' OVERWRITE INTO TABLE department;
A. The department.del file is moved from the HDFS /tmp directory to the location
corresponding to the Hive table.
B. The department.del file is copied from the HDFS /tmp directory to the location corresponding
to the Hive table.
C. The department.del file is moved from the /tmp directory in the local file system to HDFS.
D. The department.del file is copied from the local file system /tmp directory to the location
corresponding to the Hive table in HDFS.
A. hive.conf
B. hive-site.xml
C. hive-env.config
D. hive-conf.xml
A. start.sh
B. start-all.sh
D. start.sh biginsights
56 - Which two commands are used to copy a data file from the local file system to HDFS? (Choose two.)
/mydir/test_file
59 - In 2003, IBM's System S was the first prototype of a new IBM Stream Computing solution that
performed which type of processing?
60 - Which Advanced Analytics toolkit in InfoSphere Streams is used for developing and building
predictive models?
A. SPSS
B. Geospatial
C. Time Series
D. CEP
A. Filter
B. Reader
C. Data Qualifier
D. Crawler
A. Hive
B. Data Warehouse
D. Zookeeper
A. range partitioning
B. time partitioning
C. interval partitioning
D. value partitioning
64 - What does NameNode use as a transaction log to persistently record changes to file system
metadata?
A. FSImage
B. CheckPoint
C. EditLog
D. DataNode
65 - Which Hadoop query language can manipulate semi-structured data and support the JSON format?
A. Python
B. Jaql
C. AQL
D. Ruby
66 - Under IBM's Text Analytics framework, what programming language is used to write rules that can
extract information from unstructured text sources?
A. Hive
B. AQL
C. InfoSphere Streams
D. Avro
67 - Your company wants to utilize text analytics to gain an understanding of general public opinion
concerning company products. Which type of input source would you analyze for that information?
A. Twitter feeds
B. server logs
C. voicemail
D. email
68 - Which JSON-based query language was developed by IBM and donated to the open-source Hadoop
community?
A. Jaql
B. Pig
C. Avro
D. Big SQL
B. It uses sub-queries.
C. It supports HCatalog.
70 - The Enterprise Edition of BigInsights offers which feature not available in the Standard Edition?
A. Dashboards
B. Eclipse Tooling
C. Adaptive MapReduce
D. Big SQL
A. EXT4
B. XFS
C. NTFS
D. HDFS
72 - Hadoop is the primary software tool used for which class of computing?
73 - Which element of the Big Data Platform can cost-effectively store and manage many petabytes of
structured and unstructured information?
A. Data Warehouse
B. Contextual Discovery
C. Stream Computing
D. Hadoop System
74 - Under the MapReduce architecture, how does a JobTracker detect the failure of a TaskTracker?
C. receives no heartbeat
75 - Under the MapReduce architecture, when a line of data is split between two blocks, which class will
read over the split to the end of the line?
A. LineRecordReader
B. RecordReader
C. InputSplitter
D. InputFormat
76 - What is the process in MapReduce that moves all data for a given key to the same worker node?
A. Map
B. Shuffle
C. Reduce
D. Split
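The idea of routing every value for one key to a single worker can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Hadoop code: the function name route_by_key, the CRC32-based partitioning rule, and the two-worker setup are all hypothetical choices made for the example.

```python
import zlib
from collections import defaultdict


def route_by_key(pairs, num_workers):
    """Group (key, value) pairs so each key lands on exactly one worker."""
    partitions = [defaultdict(list) for _ in range(num_workers)]
    for key, value in pairs:
        # A deterministic hash of the key picks the destination worker.
        index = zlib.crc32(key.encode("utf-8")) % num_workers
        partitions[index][key].append(value)
    return partitions


pairs = [("hello", 1), ("world", 1), ("hello", 1), ("hadoop", 1)]
partitions = route_by_key(pairs, num_workers=2)
for index, partition in enumerate(partitions):
    print("worker", index, dict(partition))
```

Because the destination depends only on the key, both "hello" pairs end up in the same partition regardless of which input portion produced them.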
77 - Which IBM Business Analytics solution facilitates collaboration and unifies disparate data across
multiple systems into a single access point?
A. InfoSphere Streams
D. HBase
78 - Which type of application can be published when using the BigInsights Application Publish wizard in
Eclipse?
A. BigSheets
B. Java
C. Hive
D. Big SQL
79 - Which BigInsights tool is used to access the BigInsights Applications Catalog?
A. Web Console
B. Application Wizard
C. Eclipse Plug-in
D. Eclipse Console
80 - Which built-in Hive storage file format improves performance by providing semi-columnar data
storage with good compression?
A. SEQUENCEFILE
B. DBOUTPUTFILE
C. TEXTFILE
D. RCFILE
A. sub-partitioning
B. indexing
C. clustering
D. bucketing
82 - In Hadoop's rack-aware replica placement, what is the correct default block node placement?
A. It is stored in memory.
85 - Which open-source software project provides a coordination service for HBase including naming,
configuration management and synchronization?
A. Zookeeper
B. Spark
C. Hive
D. Pig
86 - In the context of a Text Analytics project, which set of AQL commands will identify and extract any
matching person names when run across an input data source?
D. Operations analysis
E. Security/Intelligence extension
D. higher scalability
89 - What is one of the primary reasons that Hive is often used with Hadoop?
90 - Which class of software can store and manage only structured data?
A. MapReduce
B. Data Warehouse
C. HDFS
D. Hadoop
91 - Which of the four key Big Data Use Cases is used to lower risk and detect fraud?
D. Security/Intelligence Extension
92 - Which of the four characteristics of Big Data deals with trusting data sources?
A. Veracity
B. Volume
C. Velocity
D. Variety
93 - Which two file actions can HDFS complete? (Choose two.)
A. Index
B. Update
C. Execute
D. Create
E. Delete
94 - Which capability of Jaql gives it a significant advantage over other query languages?