
1 - What is the primary, outstanding feature offered by HBase?

A. It is optimized for high-speed, sequential batch operations.

B. It handles sharding automatically.

C. It is designed to run on high-end servers like IBM i or IBM z.

D. It can scale up to 512 terabytes per Hadoop cluster.

2 - Which statement is true about the data model used by HBase?

A. The table schema only defines column families.

B. A cell is specified by array, instance and version.

C. Data is stored in hierarchical arrays.

D. Schema defines a fixed number of columns per row.

3 - Which statement describes the data model used by HBase?

A. a multidimensional sorted map

B. a hierarchical tree structure

C. an object-relational data structure

D. a record/set-based network model
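The "multidimensional sorted map" model in option A can be pictured as a nested mapping keyed by row key, column-family:qualifier, and timestamp. The Python sketch below is only an analogy of that model, not HBase's actual implementation:

```python
# Rough analogy of HBase's data model: a sorted map of
# row key -> (column family:qualifier -> (timestamp -> value)).
table = {
    "row1": {"cf1:name": {1700000002: "Alice", 1700000001: "Alicia"}},
    "row2": {"cf1:name": {1700000003: "Bob"}},
}

def get_latest(table, row, column):
    """Return the most recent version of a cell, as a default HBase get does."""
    versions = table[row][column]
    latest_ts = max(versions)  # versions are keyed by timestamp
    return versions[latest_ts]

print(get_latest(table, "row1", "cf1:name"))  # -> Alice
```

The timestamp dimension is what makes the map "multidimensional": each cell can hold several versions, and a read returns the newest one by default.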

4 - When creating a master workbook in BigSheets, what is responsible for formatting the data output
in the workbook?

A. Avro

B. Formula

C. Reader

D. Diagram

5 - Which AQL statement can be used to create components of a Text Analytics extractor?

A. create AQL module <module_name>;

B. create view <view_name> as <select or extract statement>;

C. create object : <object_name> <object_type>;

D. create AQL table <table_name> as <select or extract statement>;

6 - In the context of a Text Analytics project, which set of AQL commands will identify and extract any
matching person names when run across an input data source?

A. create dictionary NamesDict as

extract 'Names'

on R.text as match

from Document R;


B. create view Names as

extract 'John', 'Mary', 'Eric', 'Eva'

from Document R;

export view Names;

C. create dictionary NamesDict as

('John', 'Mary', 'Eric', 'Eva');

export dictionary NamesDict;

D. create dictionary NamesDict as

('John', 'Mary', 'Eric', 'Eva');

create view Names as

extract dictionary 'NamesDict'

on R.text as match

from Document R;

output view Names;
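The dictionary-then-extract pattern in option D can be mimicked in plain Python to show the semantics: build a dictionary of terms, then match it against the input text. This is a hypothetical emulation for illustration, not AQL itself:

```python
import re

# Emulate AQL's "extract dictionary 'NamesDict' on R.text as match":
# match any dictionary term in the input and collect the hits.
names_dict = ["John", "Mary", "Eric", "Eva"]
pattern = re.compile(r"\b(" + "|".join(map(re.escape, names_dict)) + r")\b")

def extract_names(text):
    """Return every dictionary term found in the text, in order of appearance."""
    return pattern.findall(text)

print(extract_names("Eva met John and Bob downtown."))  # ['Eva', 'John']
```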

7 - Which BigInsights feature helps to extract information from text data?

A. Hive

B. Data Warehouse

C. Zookeeper

D. Text Analytics Engine


8 - Which well known Big Data source has become the most popular source for business analytics?

A. social networks

B. GPS

C. cell phones

D. RFID

9 - Which of the four characteristics of Big Data indicates that many data formats can be stored and
analyzed in Hadoop?

A. Volume

B. Velocity

C. Volatility

D. Variety

10 - Which class of software is incorrectly perceived as the only software used for Big Data analysis?

A. Hadoop

B. Data Warehouse

C. Analytics

D. Data Quality

11 - What is the default data block size in BigInsights?

A. 16 MB

B. 32 MB

C. 64 MB

D. 128 MB
12 - Which MapReduce task is responsible for reading a portion of input data and producing <key,
value> pairs?

A. Map

B. Combiner

C. Reduce

D. Shuffle
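The Map task (option A) reads its slice of the input and emits <key, value> pairs. A minimal word-count-style mapper in Python might look like this sketch:

```python
def map_task(input_split):
    """Emit a (word, 1) pair for each word in the input split."""
    for line in input_split:
        for word in line.split():
            yield (word.lower(), 1)

pairs = list(map_task(["Hello World", "Hello Hadoop"]))
print(pairs)  # [('hello', 1), ('world', 1), ('hello', 1), ('hadoop', 1)]
```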

13 - Which class in MapReduce takes each record and transforms it into a <key, value> pair?

A. RecordReader

B. InputFormat

C. LineRecordReader

D. InputSplitter

14 - Which part of the MapReduce engine controls job execution on multiple slaves?

A. JobTracker

B. Job Client

C. TaskTracker

D. MapTask

15 - What is the responsibility of the InputSplitter class on an HDFS file?

A. to read split files into records

B. to transform it into splits

C. to join two split blocks

D. to prevent fragmented splits

16 - When using BigInsights Eclipse to develop a new application, what must be done prior to testing the
application in the cluster?


A. compile the project


B. configure runtime properties

C. authenticate with Administrator permissions

D. install test environment

17 - Which tool is included as part of the IBM BigInsights Eclipse development environment?

A. test results analyzer

B. code converter

C. code generator

D. java script compiler

18 - Which part of the BigInsights web console provides information that would help a user
troubleshoot and diagnose a failed application job?

A. Oozie workflow panel

B. Cluster tab

C. Application Status tab

D. Application tab and MapReduce service panel

19 - Which file formats can Jaql read?

A. Binary, Delimited, Portable Document Format (PDF)

B. JSON, MS Word document

C. HTML, Avro

D. JSON, Avro, Delimited

20 - Given the following array:

data = [ { from: 101, to: 102, msg: "Hello" }, { from: 103, to: 104, msg: "World!" }, { from: 105, to: 106,
msg: "Hello World" } ];

And the following example of expected output:

[ { "message": "Hello" } ]

What is the correct sequence of Jaql commands to select only the message text from sender 101?

A. data -> filter $.from == 101

B. data -> filter $.from == 101 -> expand {message: $.msg};

C. data -> filter $.from == 101 -> transform {message: $.msg};

D. data -> transform {message: $.msg} -> filter $.from == 101;
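The filter-then-transform pipeline in option C has a direct Python analogue, shown here only to illustrate the semantics (Jaql itself chains the steps with its pipe operator):

```python
data = [
    {"from": 101, "to": 102, "msg": "Hello"},
    {"from": 103, "to": 104, "msg": "World!"},
    {"from": 105, "to": 106, "msg": "Hello World"},
]

# Keep records where from == 101, then reshape each one to {message: msg},
# mirroring: filter $.from == 101, then transform {message: $.msg}.
result = [{"message": rec["msg"]} for rec in data if rec["from"] == 101]
print(result)  # [{'message': 'Hello'}]
```

Filtering first matters: once the records are reshaped to hold only "message", the sender field is gone and can no longer be tested.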

21 - Which method is used by Hive to group data by hash within partitions?

A. Indexing

B. Clustering

C. Bucketing

D. sub-partitioning

22 - Which type of client interface does Hive support?

A. SOAP

B. JDBC

C. RPC

D. Jet

23 - What is the name of the interface that allows Hive to read in data from a table, and write it back out
to HDFS in any custom format?

A. Thrift

B. BigSQL

C. SerDe

D. JDBC

24 - Which Hadoop component forms the software framework for distributed computing across a
cluster of commodity computers?

A. Parallel Computing
B. MapReduce

C. OLAP

D. Stream Processing

25 - Which administrative console feature of BigInsights is a visualization and analysis tool designed to
work with structured and unstructured data?

A. BigR

B. MapReduce

C. BigSheets

D. Text Analytics

26 - Which two technologies form the foundation of Hadoop? (Choose two.) (Please select ALL that
apply)

A. HDFS

B. MapReduce

C. GPFS

D. HBase

27 - Which two Hadoop features make it very cost-effective for Big Data analytics? (Choose two.) (Please
select ALL that apply)

A. processes large data sets

B. processes transactional data

C. runs on commodity hardware

D. processes highly structured data

E. processes several small files

28 - Which feature of Jaql provides native functions and modules that allow you to build re-usable
packages?

A. JSON data structures

B. extensibility
C. support for XML data sources

D. MapReduce-based query language

29 - What is IBM's SQL interface to InfoSphere BigInsights?

A. Hive

B. Pig

C. Big SQL

D. HBase

30 - BigSQL has which advantage over Hive?

A. It provides better SerDe drivers.

B. It uses better storage handlers.

C. It uses the superior HCatalog table manager.

D. It supports standard SQL statements.

31 - Which component of IBM Watson Explorer forms the foundation of the framework and allows
Watson to extract and index data from any source?

A. meta data

B. connector framework

C. application framework

D. search engine

32 - Which two Hadoop query languages do not require data to have a schema? (Choose two.) (Please
select ALL that apply)

A. Avro

B. Hive

C. BigSql

D. Pig

E. Jaql
33 - Which method is used by Jaql to operate on large arrays?

A. SQL

B. Parallelization

C. Buckets

D. Partitions

34 - What is the name of the Hadoop-based query language developed by Facebook that facilitates SQL-
like queries?

A. BigSql

B. Pig

C. Jaql

D. Hive

35 - What is the correct sequence of the three main steps in Text Analytics?

A. categorize subjects; index columns; derive patterns

B. structure text; derive patterns; interpret output

C. index text; categorize subjects; parse the data

D. sort tables; index text; derive patterns

36 - When using IBM Text Analytics, which technique is used to identify and recognize patterns of text?


A. Annotation

B. Dictionary

C. Regular Expression

D. Tokenization

37 - IBM Text Analytics is embedded into the key components of which IBM solution?

A. InfoSphere BigInsights

B. Eclipse
C. Hadoop

D. Rational

38 - What happens when a user runs a workbook in BigSheets?

A. Results on the real data are computed and the output is explored.

B. Jaql scripts are compiled and run in the background.

C. Sample data is processed in a simulated environment.

D. Data is filtered and transformed as desired by the user.

39 - Which BigSheets component presents a spreadsheet-like representation of data?

A. Table

B. Workbook

C. Reader

D. View

40 - In HDFS 2.0, how much RAM is required in a Namenode for one million blocks of data?

A. 1 GB

B. 2 GB

C. 5 GB

D. 10 GB

41 - High availability was added to which part of the HDFS file system in HDFS 2.0 to prevent loss of
metadata?

A. Blockpool

B. NameNode

C. DataNode

D. CheckPoint
42 - For scalability purposes, data in an HDFS cluster is broken down into what default block size?

A. 16 MB

B. 32 MB

C. 64 MB

D. 128 MB

43 - Where does HDFS store the file system namespace and properties?

A. Hive

B. hdfs.conf

C. DataNode

D. FsImage

44 - In the master/slave architecture, what is considered the slave?

A. EditLog

B. FsImage

C. NameSpace

D. DataNode

45 - Which two pre-requisites must be fulfilled when running a Java MapReduce program on the Cluster,
using Eclipse? (Choose two.) (Please select ALL that apply)

A. Hadoop services must be running.

B. A connection to the BigInsights Server must be defined.

C. Zookeeper must be running.

D. BigInsights services must be running.

E. A repository connection under Data Source Explorer panel must be defined.


46 - What are two main components of a Java MapReduce job? (Choose two.) (Please select ALL that
apply)

A. Scheduler class which should extend org.apache.hadoop.mapreduce.Scheduler class

B. Reducer class which should extend org.apache.hadoop.mapreduce.Reducer class

C. Job class which should extend org.apache.hadoop.mapreduce.Application class

D. Mapper class which should extend org.apache.hadoop.mapreduce.Mapper class

E. Configuration class which should extend org.apache.hadoop.mapreduce.JobConfiguration class

47 - Which statement is true about storage of the output of REDUCE task?

A. It is stored on the local disk.

B. It is stored in HDFS, but only one copy on the local machine.

C. It is stored in memory.

D. It is stored in HDFS using the number of copies specified by replication factor.

48 - Which statement is true about where the output of MAP task is stored?

A. It is stored in HDFS using the number of copies specified by replication factor.

B. It is stored in memory.

C. It is stored on the local disk.

D. It is stored in HDFS, but only one copy on the local machine.

49 - After the following sequence of commands are executed:

create 'table1', 'columnfamily1', 'columnfamily2', 'columnfamily3'

put 'table1', 'row1', 'columnfamily1: c11', 'r1v11'

put 'table1', 'row1', 'columnfamily1: c12', 'r1v12'

put 'table1', 'row1', 'columnfamily2: c21', 'r1v21'

put 'table1', 'row1', 'columnfamily3: c31', 'r1v31'

put 'table1', 'row2', 'columnfamily1: d11', 'r2v11'

put 'table1', 'row2', 'columnfamily1: d12', 'r2v12'


put 'table1', 'row2', 'columnfamily2: d21', 'r2v21'

What value will the count 'table1' command return?

Answer: the value is 2
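The count is 2 because HBase's count command reports distinct row keys, not cells. A small Python simulation of the puts above (assuming the shell commands behave as shown) makes this concrete:

```python
# Simulate the puts: each put writes one cell under a row key.
puts = [
    ("row1", "columnfamily1:c11", "r1v11"),
    ("row1", "columnfamily1:c12", "r1v12"),
    ("row1", "columnfamily2:c21", "r1v21"),
    ("row1", "columnfamily3:c31", "r1v31"),
    ("row2", "columnfamily1:d11", "r2v11"),
    ("row2", "columnfamily1:d12", "r2v12"),
    ("row2", "columnfamily2:d21", "r2v21"),
]

table = {}
for row, column, value in puts:
    table.setdefault(row, {})[column] = value

# count returns the number of row keys: row1 and row2.
print(len(table))  # 2
```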

50 - Which element(s) must be specified when creating an HBase table?

A. only the table name

B. only the table name and column family(s)

C. table name, column names and column types

D. table name, column family(s) and column names

51 - You have been asked to create an HBase table and populate it with all the sales transactions,
generated in the company in the last quarter. Currently, these transactions reside in a 300 MB tab
delimited file in HDFS. What is the most efficient way for you to accomplish this task?

A. pre-create the column families when creating the table and use the put command to load the
data

B. pre-create regions by specifying splits in create table command and use the insert command
to load data

C. pre-create the column families when creating the table and bulk loading the data

D. pre-create regions by specifying splits in create table command and bulk loading the data

52 - Which two commands are used to retrieve data from an HBase table? (Choose two.) (Please select
ALL that apply)

A. List

B. Scan

C. Describe

D. Read

E. Get
53 - When the following HIVE command is executed: LOAD DATA INPATH '/tmp/department.del'
OVERWRITE INTO TABLE department; What happens?

A. The department.del file is moved from the HDFS /tmp directory to the location
corresponding to the Hive table.

B. The department.del file is copied from the HDFS /tmp directory to the location corresponding
to the Hive table.

C. The department.del file is moved from the local file system /tmp directory in the local file
system to HDFS.

D. The department.del file is copied from the local file system /tmp directory to the location
corresponding to the Hive table in HDFS.

54 - Which file allows you to update Hive configuration?

A. hive.conf

B. hive-site.xml

C. hive-env.config

D. hive-conf.xml

55 - Which command is used for starting all the BigInsights components?

A. start.sh

B. start-all.sh

C. start all biginsights

D. start.sh biginsights

56 - Which two commands are used to copy a data file from the local file system to HDFS? (Choose two.)
(Please select ALL that apply)

A. hadoop fs -sync test_file test_file

B. hadoop fs -put test_file test_file

C. hadoop fs -cp test_file test_file

D. hadoop fs -copyFromLocal test_file test_file

E. hadoop fs -get test_file test_file


57 - Which statement is true about the following command?

hadoop dfsadmin -report

A. It displays the list of users having administration privileges.

B. It displays basic file system information and statistics.

C. It displays the list of the all files and directories in HDFS.

D. It is not a valid Hadoop command.

58 - A user enters following command:

hadoop fs -ls /mydir/test_file

and receives the following output:

-rw-r--r-- 3 biadmin supergroup 714002200 2014-11-21 14:21

/mydir/test_file

What does the 3 indicate?

A. the number of times blocks in file were replicated

B. the number of blocks in which this file was stored

C. the version of the file stored with this name

D. expected replication factor for this file
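The second field of hadoop fs -ls output is the file's replication factor. A small parser, sketched under the assumption of the fixed column layout shown above:

```python
def replication_factor(ls_line):
    """Extract the replication factor (second field) from an ls output line."""
    fields = ls_line.split()
    return int(fields[1])

line = "-rw-r--r-- 3 biadmin supergroup 714002200 2014-11-21 14:21 /mydir/test_file"
print(replication_factor(line))  # 3
```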

59 - In 2003, IBM's System S was the first prototype of a new IBM Stream Computing solution that
performed which type of processing?

A. On Line Transaction Processing

B. Complex Event Processing

C. On Line Analytic Processing

D. Real Time Analytic Processing

60 - Which Advanced Analytics toolkit in InfoSphere Streams is used for developing and building
predictive models?

A. SPSS

B. Geospatial
C. Time Series

D. CEP

61 - Which BigSheets component applies a schema to the underlying data at runtime?

A. Filter

B. Reader

C. Data Qualifier

D. Crawler

62 - Which BigInsights feature helps to extract information from text data?

A. Hive

B. Data Warehouse

C. Text Analytics Engine

D. Zookeeper

63 - Which type of partitioning is supported by Hive?

A. range partitioning

B. time partitioning

C. interval partitioning

D. value partitioning

64 - What does NameNode use as a transaction log to persistently record changes to file system
metadata?

A. FSImage

B. CheckPoint

C. EditLog

D. DataNode
65 - Which Hadoop query language can manipulate semi-structured data and support the JSON format?

A. Python

B. Jaql

C. AQL

D. Ruby

66 - Under IBM's Text Analytics framework, what programming language is used to write rules that can
extract information from unstructured text sources?

A. Hive

B. AQL

C. InfoSphere Streams

D. Avro

67 - Your company wants to utilize text analytics to gain an understanding of general public opinion
concerning company products. Which type of input source would you analyze for that information?

A. Twitter feeds

B. server logs

C. voicemail

D. email

68 - Which JSON-based query language was developed by IBM and donated to the open-source Hadoop
community?

A. Jaql

B. Pig

C. Avro

D. Big SQL

69 - Why does Big SQL perform better than Hive?

A. It has better storage handlers.

B. It uses sub-queries.
C. It supports Hcatalog.

D. It has a better optimizer.

70 - The Enterprise Edition of BigInsights offers which feature not available in the Standard Edition?

A. Dashboards

B. Eclipse Tooling

C. Adaptive MapReduce

D. Big SQL

71 - Which file system spans all nodes in a Hadoop cluster?

A. EXT4

B. XFS

C. NTFS

D. HDFS

72 - Hadoop is the primary software tool used for which class of computing?

A. Online Transaction processing

B. Decision Support Systems

C. Big Data Analytics

D. Online Analytical processing

73 - Which element of the Big Data Platform can cost-effectively store and manage many petabytes of
structured and unstructured information?

A. Data Warehouse

B. Contextual Discovery

C. Stream Computing

D. Hadoop System
74 - Under the MapReduce architecture, how does a JobTracker detect the failure of a TaskTracker?

A. receives report from Child JVM

B. receives report from MapReduce

C. receives no heartbeat

D. detects that NameNode is offline

75 - Under the MapReduce architecture, when a line of data is split between two blocks, which class will
read over the split to the end of the line?

A. LineRecordReader

B. RecordReader

C. InputSplitter

D. InputFormat

76 - What is the process in MapReduce that moves all data from one key to the same worker node?

A. Map

B. Shuffle

C. Reduce

D. Split
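The shuffle (option B) routes every value for a given key to the same worker so one reducer sees the complete group. The grouping step can be sketched in Python as:

```python
from collections import defaultdict

def shuffle(mapped_pairs):
    """Group all values sharing a key, as the shuffle phase does
    before handing each key's values to a single reducer."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return dict(grouped)

pairs = [("hello", 1), ("world", 1), ("hello", 1)]
print(shuffle(pairs))  # {'hello': [1, 1], 'world': [1]}
```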

77 - Which IBM Business Analytics solution facilitates collaboration and unifies disparate data across
multiple systems into a single access point?

A. InfoSphere Streams

B. IBM Watson Explorer

C. InfoSphere Text Analytics

D. HBase

78 - Which type of application can be published when using the BigInsights Application Publish wizard in
Eclipse?

A. BigSheets

B. Java

C. Hive

D. Big SQL
79 - Which BigInsights tool is used to access the BigInsights Applications Catalog?

A. Web Console

B. Application Wizard

C. Eclipse Plug-in

D. Eclipse Console

80 - Which built-in Hive storage file format improves performance by providing semi-columnar data
storage with good compression?

A. SEQUENCEFILE

B. DBOUTPUTFILE

C. TEXTFILE

D. RCFILE

81 - Which method is used by Hive to group data by hash within partitions?

A. sub-partitioning

B. indexing

C. clustering

D. bucketing

82 - In Hadoop's rack-aware replica placement, what is the correct default block node placement?

A. 1 block in 1 rack, 2 blocks in a second rack

B. 2 blocks in 1 rack, 2 blocks in a second rack

C. 3 blocks in same rack

D. 4 blocks in 4 separate racks

83 - Which element(s) must be specified when creating an HBase table?

A. only the table name

B. only the table name and column family(s)

C. table name, column names and column types

D. table name, column family(s) and column names


84 - Which statement is true about storage of the output of REDUCE task?

A. It is stored in memory.

B. It is stored in HDFS, but only one copy on the local machine.

C. It is stored in HDFS using the number of copies specified by replication factor.

D. It is stored on the local disk.

85 - Which open-source software project provides a coordination service for HBase including naming,
configuration management and synchronization?

A. Zookeeper

B. Spark

C. Hive

D. Pig

86 - In the context of a Text Analytics project, which set of AQL commands will identify and extract any
matching person names when run across an input data source?

A. create dictionary NamesDict as


('John', 'Mary', 'Eric', 'Eva');
create view Names as
extract dictionary 'NamesDict'
on R.text as match
from Document R;
output view Names;

B. create dictionary NamesDict as


('John', 'Mary', 'Eric', 'Eva');
export dictionary NamesDict;
C. create view Names as
extract 'John', 'Mary', 'Eric', 'Eva'
from Document R;
export view Names;
D. create dictionary NamesDict as
extract 'Names'
on R.text as match
from Document R;
87 - In which two use cases does IBM Watson Explorer differentiate itself from competing products?
(Choose two.) (Please select ALL that apply)

A. Data Warehouse augmentation

B. 360-degree view of the customer

C. Big Data exploration

D. Operations analysis

E. Security/Intelligence extension

88 - BigInsights offers which added value to Hadoop?

A. enhanced web-based UI and tools

B. higher levels of fault tolerance

C. increased agility and flexibility

D. higher scalability

89 - What is one of the primary reasons that Hive is often used with Hadoop?

A. Hive creates indexes for tables in Hadoop data sets.

B. It provides a graphical client to access Hadoop data.

C. MapReduce is difficult to use.

D. It provides a hierarchical schema for Hadoop data sets.

90 - Which class of software can store and manage only structured data?

A. MapReduce

B. Data Warehouse

C. HDFS

D. Hadoop

91 - Which of the four key Big Data Use Cases is used to lower risk and detect fraud?

A. Big Data Exploration

B. Data Warehouse Augmentation


C. Operations Analysis

D. Security/Intelligence Extension

92 - Which of the four characteristics of Big Data deals with trusting data sources?

A. Veracity

B. Volume

C. Velocity

D. Variety

93 - Which two file actions can HDFS complete? (Choose two.) (Please select ALL that apply)

A. Index

B. Update

C. Execute

D. Create

E. Delete

94 - Which capability of Jaql gives it a significant advantage over other query languages?

A. It provides a built-in command-line shell.

B. It can load data from HDFS.

C. It can handle deeply nested, semi-structured data.

D. It supports the HiveQL query language.

95 - What is the primary benefit of using Hive with Hadoop?

A. Hadoop data can be accessed through SQL statements.

B. Queries perform much faster than with MapReduce.

C. It supports materialized views.

D. It provides support for transactions.
