Bda Test1 Key Answers
PART-A
1a) Big Data is a high-volume, high-velocity and high-variety information asset that requires
new forms of processing for enhanced decision making, insight discovery and
process optimization. 8M
Characteristics:
• Volume: Volume is the amount or quantity of data generated by an application. The
size determines the processing considerations needed for handling that data.
• Velocity: Velocity is a measure of how fast data is generated and processed.
• Variety: Big Data comprises a variety of data. Data is generated from multiple sources
in a system, which introduces variety and therefore complexity. The data comes in
various forms and formats.
• Veracity: Veracity is the quality of the captured data, which can vary greatly and
affects how accurately the data can be analysed.
Big Data Types:
• Social networks and web data, such as Facebook and Twitter.
• Transactions data and business processes data, such as credit card transactions and flight
bookings.
• Customer master data, such as data for facial recognition and for the name, date of birth,
gender, location and income category.
• Machine-generated data, such as machine-to-machine or Internet of Things data and data
from sensors, trackers, web logs and computer system logs.
• Human-generated data, such as biometrics data, human-machine interaction data, email
records with a mail server and a MySQL database of student grades.
1b)
Big Data analytics in health care uses the following data sources: 7M
1. Clinical records
2. Pharmacy records
3. Electronic medical records
4. Diagnosis logs and notes
5. Additional data, such as deviations from a person's usual activities, medical leave from a job,
and social interactions.
Health care using Big Data can facilitate the following:
1. Provisioning of value-based and customer-centric healthcare
2. Utilizing the Internet of Things for health care
3. Preventing fraud, waste and abuse in the healthcare industry, and reducing healthcare costs
4. Improving outcomes
5. Monitoring patients in real time
• Value-based and customer-centric healthcare means cost-effective patient care:
improving healthcare quality using the latest knowledge, using electronic health and
medical records, and improving coordination among healthcare-providing agencies,
which reduces avoidable overuse and healthcare costs.
• The healthcare Internet of Things creates unstructured data.
• The data enables monitoring of device data for patient parameters such as
BP and ECG.
• Prevention of fraud, waste and abuse uses Big Data predictive analytics and helps resolve
excessive or duplicate claims in a systematic manner.
• Real-time patient monitoring uses machine learning algorithms which process real-time
events.
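As a rough illustration of resolving duplicate claims systematically, the sketch below flags claims that share the same patient, procedure and date. It is a simple rule-based stand-in, not predictive analytics; the field names and the sample records are hypothetical.

```python
from collections import Counter

# Hypothetical claim records; the field names are assumptions for illustration.
claims = [
    {"claim_id": "C1", "patient": "P10", "procedure": "ECG", "date": "2023-01-05"},
    {"claim_id": "C2", "patient": "P11", "procedure": "BP", "date": "2023-01-05"},
    {"claim_id": "C3", "patient": "P10", "procedure": "ECG", "date": "2023-01-05"},
]


def claim_key(claim):
    """Fields that together identify the same billed service."""
    return (claim["patient"], claim["procedure"], claim["date"])


def duplicate_claims(records):
    """Return the ids of claims whose key occurs more than once."""
    counts = Counter(claim_key(c) for c in records)
    return [c["claim_id"] for c in records if counts[claim_key(c)] > 1]


flagged = duplicate_claims(claims)
print(flagged)
```

A predictive approach would replace the exact-match rule with a model that scores each claim, but the flow of grouping, scoring and flagging stays the same.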
2a) Data processing architecture consists of five layers: (i) identification of data sources, (ii)
acquisition, ingestion, extraction, pre-processing and transformation of data, (iii) data storage in files,
servers, clusters or the cloud, (iv) data processing, and (v) data consumption by a number of programs
and tools.
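The five layers can be pictured as a chain of functions, one per layer. The following is a minimal sketch under stated assumptions: the source name `sensor_log`, the record format and the in-memory list used as storage are all hypothetical, and a real system would use tools such as those named in this answer.

```python
from statistics import mean


def identify_sources():
    """L1: identify the data sources (a hypothetical in-memory source)."""
    return {"sensor_log": ["t1,36.8", "t2,37.1", "bad-record", "t3,38.4"]}


def ingest(sources):
    """L2: acquire, pre-process and transform the raw lines into records."""
    records = []
    for name, lines in sources.items():
        for line in lines:
            parts = line.split(",")
            if len(parts) != 2:
                continue  # pre-processing: discard malformed input
            records.append({"source": name, "id": parts[0], "value": float(parts[1])})
    return records


def store(records, storage):
    """L3: store the records (a list stands in for files, a cluster or the cloud)."""
    storage.extend(records)
    return storage


def process(storage):
    """L4: process the stored data into a summary."""
    values = [r["value"] for r in storage]
    return {"count": len(values), "mean_value": round(mean(values), 2)}


def consume(result):
    """L5: consume the result, here as a one-line report."""
    return f"{result['count']} readings, mean {result['mean_value']}"


storage = []
report = consume(process(store(ingest(identify_sources()), storage)))
print(report)
```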
L1 considers the following aspects in a design:
• Amount of data needed at the ingestion layer 2 (L2)
• Push from L1 or pull by L2, as per the mechanism for the usages
• Source data types: database, files, web or service
• Source formats, i.e., semi-structured, unstructured or structured
• Ingestion and ETL processes either in real time, which means store and use the data as generated,
or in batches
• Batch processing uses discrete datasets at scheduled or periodic intervals of time
• Data processing software such as MapReduce, Hive, Pig, Spark, Spark Mahout and Spark
Streaming
• Processing in scheduled batches, in real time, or hybrid processing, as per synchronous or
asynchronous processing requirements at L5
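The difference between batch and real-time processing in the list above can be sketched in a few lines. This is an illustration in plain Python, not Spark or MapReduce code; the function names and sample readings are hypothetical.

```python
def batch_total(dataset):
    """Batch: process a complete, discrete dataset in one scheduled run."""
    return sum(dataset)


def streaming_totals(events):
    """Real time: update and emit the running result as each event arrives."""
    running = 0
    for value in events:
        running += value
        yield running


readings = [4, 7, 1, 9]

batch_result = batch_total(readings)                      # one result per run
stream_results = list(streaming_totals(iter(readings)))   # one result per event

print(batch_result)
print(stream_results)
```

The batch function needs the whole dataset before it can answer, while the streaming generator produces a usable result after every event; a hybrid design combines the two.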
The following are the five application areas, in order of the popularity of Big Data use cases:
1. Customer value analytics (CVA), using the inputs of evaluated purchase patterns, preferences, quality, price and
post-sales servicing requirements
2. Operational analytics for optimizing company operations
3. Detection of fraud and compliance
4. New products and innovations in service
5. Enterprise data warehouse optimization
3b) i) The format in which HBase stores rows line by line is: 4M
4a) 7M
4b) i) The format in which HBase stores rows line by line is: 4M
Row-Key Column-Family {Column-specifier: Version: Value}
The first row is stored in HBase as follows:
CCSR_id '2206' {'DT': 1600080000024: '121217', 'JLRWS': 1600081010821: '28', 'HWS':
1600082010582: '23', 'ZWS': 1600082018001: '38', 'NWS': 1600080158868: '8', 'SSWS':
1600038028229: '50'}
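The row format above can be modelled as a nested mapping: row key, then column family, then column specifier, then version, then value. The sketch below uses plain Python dictionaries rather than the HBase client API. It assumes that '2206' is the row key (the CCSR_id value) and uses a hypothetical column family name `cf`, since the family name is not clear from the row as written. The helpers `put` and `get_latest` are illustrative names only.

```python
# row key -> column family -> column specifier -> {version: value}
table = {
    "2206": {  # row key (assumed to be the CCSR_id value)
        "cf": {  # hypothetical column family name
            "DT": {1600080000024: "121217"},
            "JLRWS": {1600081010821: "28"},
            "HWS": {1600082010582: "23"},
            "ZWS": {1600082018001: "38"},
            "NWS": {1600080158868: "8"},
            "SSWS": {1600038028229: "50"},
        }
    }
}


def put(tbl, row, family, specifier, version, value):
    """Add a new version of a cell; older versions are kept."""
    tbl.setdefault(row, {}).setdefault(family, {}).setdefault(specifier, {})[version] = value


def get_latest(tbl, row, family, specifier):
    """Return (version, value) for the highest version number of a cell."""
    versions = tbl[row][family][specifier]
    latest = max(versions)
    return latest, versions[latest]


# Hypothetical later write to the same cell: a new version, not an overwrite.
put(table, "2206", "cf", "HWS", 1600082010999, "25")

print(get_latest(table, "2206", "cf", "HWS"))
print(get_latest(table, "2206", "cf", "JLRWS"))
```

Keeping every version under its own timestamp is what lets a read return either the latest value or an earlier one for the same row and column.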