
1 - What is the primary, outstanding feature offered by HBase?

A. It is optimized for high-speed, sequential batch operations.

B. It handles sharding automatically.

C. It is designed to run on high-end servers like IBM i or IBM z.

D. It can scale up to 512 terabytes per Hadoop cluster.

2 - Which statement is true about the data model used by HBase?

A. The table schema only defines column families.

B. A cell is specified by array, instance and version.

C. Data is stored in hierarchical arrays.

D. Schema defines a fixed number of columns per row.

3 - Which statement describes the data model used by HBase?

A. a multidimensional sorted map

B. a hierarchical tree structure

C. an object-relational data structure

D. a record/set-based network model
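The "multidimensional sorted map" model in option A can be pictured as a nested mapping keyed by row key, column-family:qualifier, and timestamp. The Python sketch below is only an analogy of that model, not HBase's actual implementation:

```python
# Rough analogy of HBase's data model: a sorted map of
# row key -> (column family:qualifier -> (timestamp -> value)).
table = {
    "row1": {"cf1:name": {1700000002: "Alice", 1700000001: "Alicia"}},
    "row2": {"cf1:name": {1700000003: "Bob"}},
}

def get_latest(table, row, column):
    """Return the most recent version of a cell, as a default HBase get does."""
    versions = table[row][column]
    latest_ts = max(versions)  # versions are keyed by timestamp
    return versions[latest_ts]

print(get_latest(table, "row1", "cf1:name"))  # -> Alice
```

The timestamp dimension is what makes the map "multidimensional": each cell can hold several versions, and a read returns the newest one by default.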

4 - When creating a master workbook in BigSheets, what is responsible for formatting the data output
in the workbook?

A. Avro

B. Formula

C. Reader

D. Diagram

5 - Which AQL statement can be used to create components of a Text Analytics extractor?

A. create AQL module <module_name>;

B. create view <view_name> as <select or extract statement>;

C. create object : <object_name> <object_type>;

D. create AQL table <table_name> as <select or extract statement>;

6 - In the context of a Text Analytics project, which set of AQL commands will identify and extract any
matching person names when run across an input data source?

A. create dictionary NamesDict as

extract 'Names'

on R.text as match

from Document R;


B. create view Names as

extract 'John', 'Mary', 'Eric', 'Eva'

from Document R;

export view Names;

C. create dictionary NamesDict as

('John', 'Mary', 'Eric', 'Eva');

export dictionary NamesDict;

D. create dictionary NamesDict as

('John', 'Mary', 'Eric', 'Eva');

create view Names as

extract dictionary 'NamesDict'

on R.text as match

from Document R;

output view Names;
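The dictionary-then-extract pattern in option D can be mimicked in plain Python to show the semantics: build a dictionary of terms, then match it against the input text. This is a hypothetical emulation for illustration, not AQL itself:

```python
import re

# Emulate AQL's "extract dictionary 'NamesDict' on R.text as match":
# match any dictionary term in the input and collect the hits.
names_dict = ["John", "Mary", "Eric", "Eva"]
pattern = re.compile(r"\b(" + "|".join(map(re.escape, names_dict)) + r")\b")

def extract_names(text):
    """Return every dictionary term found in the text, in order of appearance."""
    return pattern.findall(text)

print(extract_names("Eva met John and Bob downtown."))  # ['Eva', 'John']
```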

7 - Which BigInsights feature helps to extract information from text data?

A. Hive

B. Data Warehouse

C. Zookeeper

D. Text Analytics Engine


8 - Which well known Big Data source has become the most popular source for business analytics?

A. social networks

B. GPS

C. cell phones

D. RFID

9 - Which of the four characteristics of Big Data indicates that many data formats can be stored and
analyzed in Hadoop?

A. Volume

B. Velocity

C. Volatility

D. Variety

10 - Which class of software is incorrectly perceived as the only software used for Big Data analysis?

A. Hadoop

B. Data Warehouse

C. Analytics

D. Data Quality

11 - What is the default data block size in BigInsights?

A. 16 MB

B. 32 MB

C. 64 MB

D. 128 MB
12 - Which MapReduce task is responsible for reading a portion of input data and producing <key,
value> pairs?

A. Map

B. Combiner

C. Reduce

D. Shuffle
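The Map task (option A) reads its slice of the input and emits <key, value> pairs. A minimal word-count-style mapper in Python might look like this sketch:

```python
def map_task(input_split):
    """Emit a (word, 1) pair for each word in the input split."""
    for line in input_split:
        for word in line.split():
            yield (word.lower(), 1)

pairs = list(map_task(["Hello World", "Hello Hadoop"]))
print(pairs)  # [('hello', 1), ('world', 1), ('hello', 1), ('hadoop', 1)]
```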

13 - Which class in MapReduce takes each record and transforms it into a <key, value> pair?

A. RecordReader

B. InputFormat

C. LineRecordReader

D. InputSplitter

14 - Which part of the MapReduce engine controls job execution on multiple slaves?

A. JobTracker

B. Job Client

C. TaskTracker

D. MapTask

15 - What is the responsibility of the InputSplitter class on an HDFS file?

A. to read split files into records

B. to transform it into splits

C. to join two split blocks

D. to prevent fragmented splits

16 - When using BigInsights Eclipse to develop a new application, what must be done prior to testing the
application in the cluster?


A. compile the project


B. configure runtime properties

C. authenticate with Administrator permissions

D. install test environment

17 - Which tool is included as part of the IBM BigInsights Eclipse development environment?

A. test results analyzer

B. code converter

C. code generator

D. java script compiler

18 - Which part of the BigInsights web console provides information that would help a user
troubleshoot and diagnose a failed application job?

A. Oozie workflow panel

B. Cluster tab

C. Application Status tab

D. Application tab and MapReduce service panel

19 - Which file formats can Jaql read?

A. Binary, Delimited, Portable Document Format (PDF)

B. JSON, MS Word document

C. HTML, Avro

D. JSON, Avro, Delimited

20 - Given the following array:

data = [ { from: 101, to: 102, msg: "Hello" }, { from: 103, to: 104, msg: "World!" }, { from: 105, to: 106,
msg: "Hello World" } ];

And the following example of expected output:

[ { "message": "Hello" } ]

What is the correct sequence of Jaql commands to select only the message text from sender 101?

A. data -> filter $.from == 101

B. data -> filter $.from == 101 -> expand {message: $.msg};

C. data -> filter $.from == 101 -> transform {message: $.msg};

D. data -> transform {message: $.msg} -> filter $.from == 101;
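The filter-then-transform pipeline in option C has a direct Python analogue, shown here only to illustrate the semantics (Jaql itself chains the steps with its pipe operator):

```python
data = [
    {"from": 101, "to": 102, "msg": "Hello"},
    {"from": 103, "to": 104, "msg": "World!"},
    {"from": 105, "to": 106, "msg": "Hello World"},
]

# Keep records where from == 101, then reshape each one to {message: msg},
# mirroring: filter $.from == 101, then transform {message: $.msg}.
result = [{"message": rec["msg"]} for rec in data if rec["from"] == 101]
print(result)  # [{'message': 'Hello'}]
```

Filtering first matters: once the records are reshaped to hold only "message", the sender field is gone and can no longer be tested.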

21 - Which method is used by Hive to group data by hash within partitions?

A. Indexing

B. Clustering

C. Bucketing

D. sub-partitioning

22 - Which type of client interface does Hive support?

A. SOAP

B. JDBC

C. RPC

D. Jet

23 - What is the name of the interface that allows Hive to read in data from a table, and write it back out
to HDFS in any custom format?

A. Thrift

B. BigSQL

C. SerDe

D. JDBC

24 - Which Hadoop component forms the software framework for distributed computing across a
cluster of commodity computers?

A. Parallel Computing
B. MapReduce

C. OLAP

D. Stream Processing

25 - Which administrative console feature of BigInsights is a visualization and analysis tool designed to
work with structured and unstructured data?

A. BigR

B. MapReduce

C. BigSheets

D. Text Analytics

26 - Which two technologies form the foundation of Hadoop? (Choose two.) (Please select ALL that
apply)

A. HDFS

B. MapReduce

C. GPFS

D. HBase

27 - Which two Hadoop features make it very cost-effective for Big Data analytics? (Choose two.) (Please
select ALL that apply)

A. processes large data sets

B. processes transactional data

C. runs on commodity hardware

D. processes highly structured data

E. processes several small files

28 - Which feature of Jaql provides native functions and modules that allow you to build re-usable
packages?

A. JSON data structures

B. extensibility
C. support for XML data sources

D. MapReduce-based query language

29 - What is IBM's SQL interface to InfoSphere BigInsights?

A. Hive

B. Pig

C. Big SQL

D. HBase

30 - BigSQL has which advantage over Hive?

A. It provides better SerDe drivers.

B. It uses better storage handlers.

C. It uses the superior HCatalog table manager.

D. It supports standard SQL statements.

31 - Which component of IBM Watson Explorer forms the foundation of the framework and allows
Watson to extract and index data from any source?

A. meta data

B. connector framework

C. application framework

D. search engine

32 - Which two Hadoop query languages do not require data to have a schema? (Choose two.) (Please
select ALL that apply)

A. Avro

B. Hive

C. BigSql

D. Pig

E. Jaql
33 - Which method is used by Jaql to operate on large arrays?

A. SQL

B. Parallelization

C. Buckets

D. Partitions

34 - What is the name of the Hadoop-based query language developed by Facebook that facilitates SQL-
like queries?

A. BigSql

B. Pig

C. Jaql

D. Hive

35 - What is the correct sequence of the three main steps in Text Analytics?

A. categorize subjects; index columns; derive patterns

B. structure text; derive patterns; interpret output

C. index text; categorize subjects; parse the data

D. sort tables; index text; derive patterns

36 - When using IBM Text Analytics, which technique is used to identify and recognize patterns of text?


A. Annotation

B. Dictionary

C. Regular Expression

D. Tokenization

37 - IBM Text Analytics is embedded into the key components of which IBM solution?

A. InfoSphere BigInsights

B. Eclipse
C. Hadoop

D. Rational

38 - What happens when a user runs a workbook in BigSheets?

A. Results on the real data are computed and the output is explored.

B. Jaql scripts are compiled and run in the background.

C. Sample data is processed in a simulated environment.

D. Data is filtered and transformed as desired by the user.

39 - Which BigSheets component presents a spreadsheet-like representation of data?

A. Table

B. Workbook

C. Reader

D. View

40 - In HDFS 2.0, how much RAM is required in a Namenode for one million blocks of data?

A. 1 GB

B. 2 GB

C. 5 GB

D. 10 GB

41 - High availability was added to which part of the HDFS file system in HDFS 2.0 to prevent loss of
metadata?

A. Blockpool

B. NameNode

C. DataNode

D. CheckPoint
42 - For scalability purposes, data in an HDFS cluster is broken down into what default block size?

A. 16 MB

B. 32 MB

C. 64 MB

D. 128 MB

43 - Where does HDFS store the file system namespace and properties?

A. Hive

B. hdfs.conf

C. DataNode

D. FsImage

44 - In the master/slave architecture, what is considered the slave?

A. EditLog

B. FsImage

C. NameSpace

D. DataNode

45 - Which two pre-requisites must be fulfilled when running a Java MapReduce program on the Cluster,
using Eclipse? (Choose two.) (Please select ALL that apply)

A. Hadoop services must be running.

B. A connection to the BigInsights Server must be defined.

C. Zookeeper must be running.

D. BigInsights services must be running.

E. A repository connection under Data Source Explorer panel must be defined.


46 - What are two main components of a Java MapReduce job? (Choose two.) (Please select ALL that
apply)

A. Scheduler class which should extend org.apache.hadoop.mapreduce.Scheduler class

B. Reducer class which should extend org.apache.hadoop.mapreduce.Reducer class

C. Job class which should extend org.apache.hadoop.mapreduce.Application class

D. Mapper class which should extend org.apache.hadoop.mapreduce.Mapper class

E. Configuration class which should extend org.apache.hadoop.mapreduce.JobConfiguration class

47 - Which statement is true about storage of the output of REDUCE task?

A. It is stored on the local disk.

B. It is stored in HDFS, but only one copy on the local machine.

C. It is stored in memory.

D. It is stored in HDFS using the number of copies specified by replication factor.

48 - Which statement is true about where the output of MAP task is stored?

A. It is stored in HDFS using the number of copies specified by replication factor.

B. It is stored in memory.

C. It is stored on the local disk.

D. It is stored in HDFS, but only one copy on the local machine.

49 - After the following sequence of commands are executed:

create 'table1', 'columnfamily1', 'columnfamily2', 'columnfamily3'

put 'table1', 'row1', 'columnfamily1: c11', 'r1v11'

put 'table1', 'row1', 'columnfamily1: c12', 'r1v12'

put 'table1', 'row1', 'columnfamily2: c21', 'r1v21'

put 'table1', 'row1', 'columnfamily3: c31', 'r1v31'

put 'table1', 'row2', 'columnfamily1: d11', 'r2v11'

put 'table1', 'row2', 'columnfamily1: d12', 'r2v12'


put 'table1', 'row2', 'columnfamily2: d21', 'r2v21'

What value will the count 'table1' command return?

Answer: the value is 2
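The count is 2 because HBase's count command reports distinct row keys, not cells. A small Python simulation of the puts above (assuming the shell commands behave as shown) makes this concrete:

```python
# Simulate the puts: each put writes one cell under a row key.
puts = [
    ("row1", "columnfamily1:c11", "r1v11"),
    ("row1", "columnfamily1:c12", "r1v12"),
    ("row1", "columnfamily2:c21", "r1v21"),
    ("row1", "columnfamily3:c31", "r1v31"),
    ("row2", "columnfamily1:d11", "r2v11"),
    ("row2", "columnfamily1:d12", "r2v12"),
    ("row2", "columnfamily2:d21", "r2v21"),
]

table = {}
for row, column, value in puts:
    table.setdefault(row, {})[column] = value

# count returns the number of row keys: row1 and row2.
print(len(table))  # 2
```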

50 - Which element(s) must be specified when creating an HBase table?

A. only the table name

B. only the table name and column family(s)

C. table name, column names and column types

D. table name, column family(s) and column names

51 - You have been asked to create an HBase table and populate it with all the sales transactions,
generated in the company in the last quarter. Currently, these transactions reside in a 300 MB tab
delimited file in HDFS. What is the most efficient way for you to accomplish this task?

A. pre-create the column families when creating the table and use the put command to load the
data

B. pre-create regions by specifying splits in create table command and use the insert command
to load data

C. pre-create the column families when creating the table and bulk loading the data

D. pre-create regions by specifying splits in create table command and bulk loading the data

52 - Which two commands are used to retrieve data from an HBase table? (Choose two.) (Please select
ALL that apply)

A. List

B. Scan

C. Describe

D. Read

E. Get
53 - When the following HIVE command is executed: LOAD DATA INPATH '/tmp/department.del'
OVERWRITE INTO TABLE department; What happens?

A. The department.del file is moved from the HDFS /tmp directory to the location
corresponding to the Hive table.

B. The department.del file is copied from the HDFS /tmp directory to the location corresponding
to the Hive table.

C. The department.del file is moved from the local file system /tmp directory in the local file
system to HDFS.

D. The department.del file is copied from the local file system /tmp directory to the location
corresponding to the Hive table in HDFS.

54 - Which file allows you to update Hive configuration?

A. hive.conf

B. hive-site.xml

C. hive-env.config

D. hive-conf.xml

55 - Which command is used for starting all the BigInsights components?

A. start.sh

B. start-all.sh

C. start all biginsights

D. start.sh biginsights

56 - Which two commands are used to copy a data file from the local file system to HDFS? (Choose two.)
(Please select ALL that apply)

A. hadoop fs -sync test_file test_file

B. hadoop fs -put test_file test_file

C. hadoop fs -cp test_file test_file

D. hadoop fs -copyFromLocal test_file test_file

E. hadoop fs -get test_file test_file


57 - Which statement is true about the following command?

hadoop dfsadmin -report

A. It displays the list of users having administration privileges.

B. It displays basic file system information and statistics.

C. It displays the list of the all files and directories in HDFS.

D. It is not a valid Hadoop command.

58 - A user enters following command:

hadoop fs -ls /mydir/test_file

and receives the following output:

-rw-r--r-- 3 biadmin supergroup 714002200 2014-11-21 14:21

/mydir/test_file

What does the 3 indicate?

A. the number of times blocks in file were replicated

B. the number of blocks in which this file was stored

C. the version of the file stored with this name

D. expected replication factor for this file
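The second field of hadoop fs -ls output is the file's replication factor. A small parser, sketched under the assumption of the fixed column layout shown above:

```python
def replication_factor(ls_line):
    """Extract the replication factor (second field) from an ls output line."""
    fields = ls_line.split()
    return int(fields[1])

line = "-rw-r--r-- 3 biadmin supergroup 714002200 2014-11-21 14:21 /mydir/test_file"
print(replication_factor(line))  # 3
```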

59 - In 2003, IBM's System S was the first prototype of a new IBM Stream Computing solution that
performed which type of processing?

A. On Line Transaction Processing

B. Complex Event Processing

C. On Line Analytic Processing

D. Real Time Analytic Processing

60 - Which Advanced Analytics toolkit in InfoSphere Streams is used for developing and building
predictive models?

A. SPSS

B. Geospatial
C. Time Series

D. CEP

61 - Which BigSheets component applies a schema to the underlying data at runtime?

A. Filter

B. Reader

C. Data Qualifier

D. Crawler

62 - Which BigInsights feature helps to extract information from text data?

A. Hive

B. Data Warehouse

C. Text Analytics Engine

D. Zookeeper

63 - Which type of partitioning is supported by Hive?

A. range partitioning

B. time partitioning

C. interval partitioning

D. value partitioning

64 - What does NameNode use as a transaction log to persistently record changes to file system
metadata?

A. FSImage

B. CheckPoint

C. EditLog

D. DataNode
65 - Which Hadoop query language can manipulate semi-structured data and support the JSON format?

A. Python

B. Jaql

C. AQL

D. Ruby

66 - Under IBM's Text Analytics framework, what programming language is used to write rules that can
extract information from unstructured text sources?

A. Hive

B. AQL

C. InfoSphere Streams

D. Avro

67 - Your company wants to utilize text analytics to gain an understanding of general public opinion
concerning company products. Which type of input source would you analyze for that information?

A. Twitter feeds

B. server logs

C. voicemail

D. email

68 - Which JSON-based query language was developed by IBM and donated to the open-source Hadoop
community?

A. Jaql

B. Pig

C. Avro

D. Big SQL

69 - Why does Big SQL perform better than Hive?

A. It has better storage handlers.

B. It uses sub-queries.
C. It supports Hcatalog.

D. It has a better optimizer.

70 - The Enterprise Edition of BigInsights offers which feature not available in the Standard Edition?

A. Dashboards

B. Eclipse Tooling

C. Adaptive MapReduce

D. Big SQL

71 - Which file system spans all nodes in a Hadoop cluster?

A. EXT4

B. XFS

C. NTFS

D. HDFS

72 - Hadoop is the primary software tool used for which class of computing?

A. Online Transaction processing

B. Decision Support Systems

C. Big Data Analytics

D. Online Analytical processing

73 - Which element of the Big Data Platform can cost-effectively store and manage many petabytes of
structured and unstructured information?

A. Data Warehouse

B. Contextual Discovery

C. Stream Computing

D. Hadoop System
74 - Under the MapReduce architecture, how does a JobTracker detect the failure of a TaskTracker?

A. receives report from Child JVM

B. receives report from MapReduce

C. receives no heartbeat

D. detects that NameNode is offline

75 - Under the MapReduce architecture, when a line of data is split between two blocks, which class will
read over the split to the end of the line?

A. LineRecordReader

B. RecordReader

C. InputSplitter

D. InputFormat

76 - What is the process in MapReduce that moves all data from one key to the same worker node?

A. Map

B. Shuffle

C. Reduce

D. Split
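The shuffle (option B) routes every value for a given key to the same worker so one reducer sees the complete group. The grouping step can be sketched in Python as:

```python
from collections import defaultdict

def shuffle(mapped_pairs):
    """Group all values sharing a key, as the shuffle phase does
    before handing each key's values to a single reducer."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return dict(grouped)

pairs = [("hello", 1), ("world", 1), ("hello", 1)]
print(shuffle(pairs))  # {'hello': [1, 1], 'world': [1]}
```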

77 - Which IBM Business Analytics solution facilitates collaboration and unifies disparate data across
multiple systems into a single access point?

A. InfoSphere Streams

B. IBM Watson Explorer

C. InfoSphere Text Analytics

D. HBase

78 - Which type of application can be published when using the BigInsights Application Publish wizard in
Eclipse?

A. BigSheets

B. Java

C. Hive

D. Big SQL
79 - Which BigInsights tool is used to access the BigInsights Applications Catalog?

A. Web Console

B. Application Wizard

C. Eclipse Plug-in

D. Eclipse Console

80 - Which built-in Hive storage file format improves performance by providing semi-columnar data
storage with good compression?

A. SEQUENCEFILE

B. DBOUTPUTFILE

C. TEXTFILE

D. RCFILE

81 - Which method is used by Hive to group data by hash within partitions?

A. sub-partitioning

B. indexing

C. clustering

D. bucketing

82 - In Hadoop's rack-aware replica placement, what is the correct default block node placement?

A. 1 block in 1 rack, 2 blocks in a second rack

B. 2 blocks in 1 rack, 2 blocks in a second rack

C. 3 blocks in same rack

D. 4 blocks in 4 separate racks

83 - Which element(s) must be specified when creating an HBase table?

A. only the table name

B. only the table name and column family(s)

C. table name, column names and column types

D. table name, column family(s) and column names


84 - Which statement is true about storage of the output of REDUCE task?

A. It is stored in memory.

B. It is stored in HDFS, but only one copy on the local machine.

C. It is stored in HDFS using the number of copies specified by replication factor.

D. It is stored on the local disk.

85 - Which open-source software project provides a coordination service for HBase including naming,
configuration management and synchronization?

A. Zookeeper

B. Spark

C. Hive

D. Pig

86 - In the context of a Text Analytics project, which set of AQL commands will identify and extract any
matching person names when run across an input data source?

A. create dictionary NamesDict as


('John', 'Mary', 'Eric', 'Eva');
create view Names as
extract dictionary 'NamesDict'
on R.text as match
from Document R;
output view Names;

B. create dictionary NamesDict as


('John', 'Mary', 'Eric', 'Eva');
export dictionary NamesDict;
C. create view Names as
extract 'John', 'Mary', 'Eric', 'Eva'
from Document R;
export view Names;
D. create dictionary NamesDict as
extract 'Names'
on R.text as match
from Document R;
87 - In which two use cases does IBM Watson Explorer differentiate itself from competing products?
(Choose two.) (Please select ALL that apply)

A. Data Warehouse augmentation

B. 360-degree view of the customer

C. Big Data exploration

D. Operations analysis

E. Security/Intelligence extension

88 - BigInsights offers which added value to Hadoop?

A. enhanced web-based UI and tools

B. higher levels of fault tolerance

C. increased agility and flexibility

D. higher scalability

89 - What is one of the primary reasons that Hive is often used with Hadoop?

A. Hive creates indexes for tables in Hadoop data sets.

B. It provides a graphical client to access Hadoop data.

C. MapReduce is difficult to use.

D. It provides a hierarchical schema for Hadoop data sets.

90 - Which class of software can store and manage only structured data?

A. MapReduce

B. Data Warehouse

C. HDFS

D. Hadoop

91 - Which of the four key Big Data Use Cases is used to lower risk and detect fraud?

A. Big Data Exploration

B. Data Warehouse Augmentation


C. Operations Analysis

D. Security/Intelligence Extension

92 - Which of the four characteristics of Big Data deals with trusting data sources?

A. Veracity

B. Volume

C. Velocity

D. Variety

93 - Which two file actions can HDFS complete? (Choose two.) (Please select ALL that apply)

A. Index

B. Update

C. Execute

D. Create

E. Delete

94 - Which capability of Jaql gives it a significant advantage over other query languages?

A. It provides a built-in command-line shell.

B. It can load data from HDFS.

C. It can handle deeply nested, semi-structured data.

D. It supports the HiveQL query language.

95 - What is the primary benefit of using Hive with Hadoop?

A. Hadoop data can be accessed through SQL statements.

B. Queries perform much faster than with MapReduce.

C. It supports materialized views.

D. It provides support for transactions.
