Chapter 10
• DataNode
• A DataNode stores data in an HDFS file system.
• A functional HDFS filesystem has more than one DataNode,
with data replicated across them.
• DataNodes respond to requests from the NameNode for
filesystem operations.
• Client applications can talk directly to a DataNode,
once the NameNode has provided the location of the
data.
• Similarly, MapReduce operations assigned to TaskTracker
instances near a DataNode talk directly to that DataNode to
access the files.
• TaskTracker instances can be deployed on the same servers
that host DataNode instances, so that MapReduce operations
are performed close to the data.
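The read path described above (ask the NameNode where the data is, then talk to the DataNodes directly) can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the classes `DataNode` and `NameNode`, the functions `read_file` and `build_demo_cluster`, and the block IDs are hypothetical and are not the Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical stand-in for a DataNode: holds block contents by block ID."""
    name: str
    blocks: dict = field(default_factory=dict)

    def read_block(self, block_id: str) -> bytes:
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Hypothetical stand-in for the NameNode: keeps locations only, no file data."""
    # file path -> ordered list of (block_id, [names of DataNodes holding a replica])
    block_map: dict = field(default_factory=dict)

    def get_block_locations(self, path: str) -> list:
        return self.block_map[path]


def read_file(path: str, namenode: NameNode, datanodes: dict) -> bytes:
    """Ask the NameNode where the blocks live, then read them from DataNodes directly."""
    data = b""
    for block_id, replica_names in namenode.get_block_locations(path):
        # Try replicas in order; fall back to the next one if a DataNode is missing.
        for name in replica_names:
            node = datanodes.get(name)
            if node is not None and block_id in node.blocks:
                data += node.read_block(block_id)
                break
        else:
            raise IOError(f"no live replica for block {block_id}")
    return data


def build_demo_cluster():
    """Three DataNodes, one file split into two blocks, each block stored twice."""
    dn1 = DataNode("dn1", {"blk_1": b"hello ", "blk_2": b"world"})
    dn2 = DataNode("dn2", {"blk_1": b"hello "})
    dn3 = DataNode("dn3", {"blk_2": b"world"})
    nn = NameNode({"/demo.txt": [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn1", "dn3"])]})
    return nn, {"dn1": dn1, "dn2": dn2, "dn3": dn3}
```

Because every block has a replica on a second DataNode, removing `dn1` from the dictionary of live nodes still lets `read_file` return the whole file, which is the point of replicating data across DataNodes.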
MapReduce Job Execution Workflow
• MapReduce job execution starts when a client application submits a job to the JobTracker.
• The JobTracker returns a JobID to the client application. The JobTracker talks to the NameNode to
determine the location of the data.
• The JobTracker locates TaskTracker nodes with available slots at or near the data.
• The TaskTrackers send heartbeat messages to the JobTracker at regular intervals to
reassure the JobTracker that they are still alive. These messages also inform the JobTracker of the
number of available slots, so the JobTracker can stay up to date on where in the cluster new
work can be delegated.
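The workflow above can be sketched as a toy simulation: TaskTrackers report free slots through heartbeats, and the JobTracker hands out a JobID and prefers a node that already holds the input data. The class `JobTracker`, its methods, and the `job_0001` ID format are assumptions made for illustration, not Hadoop's real interfaces; the `data_locations` dictionary stands in for the lookup the JobTracker would make against the NameNode.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class JobTracker:
    """Hypothetical, heavily simplified JobTracker."""
    # Stand-in for NameNode metadata: input path -> set of hosts holding the data.
    data_locations: dict
    # host -> number of free slots reported in that host's latest heartbeat.
    free_slots: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def heartbeat(self, host: str, available_slots: int) -> None:
        """A TaskTracker reports that it is alive and how many slots it has free."""
        self.free_slots[host] = available_slots

    def submit_job(self, input_path: str) -> tuple:
        """Return (job_id, chosen_host); prefer a host that already holds the data."""
        job_id = f"job_{next(self._ids):04d}"
        local_hosts = self.data_locations.get(input_path, set())
        candidates = [h for h, n in self.free_slots.items() if n > 0]
        if not candidates:
            return job_id, None  # no free slot anywhere: the job has to wait
        # Data-local hosts sort first (False < True); ties are broken by host name.
        host = min(candidates, key=lambda h: (h not in local_hosts, h))
        self.free_slots[host] -= 1
        return job_id, host
```

For example, with `node2` holding the data and one free slot, the first submission goes to `node2`; a second submission falls back to another node with a free slot because `node2` is now full.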
• NoSQL is an umbrella term for all databases and data stores that don’t
follow the RDBMS principles
  • A class of products
  • A collection of several (related) concepts about data storage and manipulation
  • Often related to large data sets
• Limits to scaling up (or vertical scaling: make a “single” machine
more powerful) → dataset is just too big!
• Scaling out (or horizontal scaling: adding more smaller/cheaper
servers) is a better choice
• Different approaches for horizontal scaling (multi-node database):
  • Master/Slave
  • Sharding (partitioning)
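As a rough illustration of sharding (partitioning), the sketch below spreads keys over several in-memory dictionaries by hashing the key. The names `shard_for` and `ShardedStore` are hypothetical, and hash-modulo placement is only one possible partitioning scheme, chosen here because it is short.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (Python's hash() is salted per run)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Hypothetical store in which each shard stands in for a separate server."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Each key lives on exactly one shard, so reads and writes for different keys can be served by different machines. A known weakness of plain modulo placement is that changing the number of shards moves most keys; schemes such as consistent hashing are commonly used to reduce that.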
What is NoSQL?
Key features (advantages):
• non-relational
• don’t require a schema
• data are replicated to multiple nodes (so, identical & fault-tolerant)
and can be partitioned:
  • down nodes easily replaced
  • no single point of failure
• horizontally scalable
• cheap, easy to implement (open-source)
• massive write performance
• fast key-value access
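Several of these features (schema-free values, replication to multiple nodes, easy replacement of a down node, key-value access) can be shown together in a toy in-memory store. `ReplicatedStore` and its methods are hypothetical names; copying every write to every live node is a simplifying assumption, not how any particular NoSQL product replicates data.

```python
class ReplicatedStore:
    """Hypothetical schema-free key-value store that copies each write to all live nodes."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}  # node name -> local copy
        self.down = set()                               # names of failed nodes

    def put(self, key, value):
        for name, data in self.nodes.items():
            if name not in self.down:
                data[key] = value

    def get(self, key):
        # Any live replica can answer, so one failed node does not block reads.
        for name, data in self.nodes.items():
            if name not in self.down and key in data:
                return data[key]
        raise KeyError(key)

    def fail(self, name):
        self.down.add(name)

    def replace(self, name, new_name):
        """Swap a node for a fresh one seeded from any live replica."""
        source = next(d for n, d in self.nodes.items() if n not in self.down)
        del self.nodes[name]
        self.down.discard(name)
        self.nodes[new_name] = dict(source)
```

Values are stored as-is, so two records under different keys may have different fields, which is what "don't require a schema" means in practice. The sketch assumes at least one node stays alive; `replace` raises `StopIteration` otherwise.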
Hadoop Schedulers
• The Hadoop scheduler is a pluggable component, which lets Hadoop
support different scheduling algorithms.
• The default scheduler in Hadoop is FIFO.
• Two advanced schedulers are also available –
• the Fair Scheduler, developed at Facebook
• the Capacity Scheduler, developed at Yahoo.
• The pluggable scheduler framework provides the flexibility to
support a variety of workloads with varying priority and
performance constraints.
• Efficient job scheduling makes Hadoop a multi-tasking system
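The idea of a pluggable scheduler can be sketched as one small interface with interchangeable implementations. Everything here is hypothetical: `Scheduler`, `FifoScheduler`, `RoundRobinByUserScheduler`, and `drain` are illustration names, and the round-robin class is only a toy stand-in for fair sharing between users, not the algorithm used by Hadoop's Fair Scheduler or Capacity Scheduler.

```python
from abc import ABC, abstractmethod
from collections import deque


class Scheduler(ABC):
    """Hypothetical plug-in interface: the framework only calls these two methods."""

    @abstractmethod
    def add_job(self, user, job): ...

    @abstractmethod
    def next_job(self): ...


class FifoScheduler(Scheduler):
    def __init__(self):
        self.queue = deque()

    def add_job(self, user, job):
        self.queue.append(job)

    def next_job(self):
        return self.queue.popleft() if self.queue else None


class RoundRobinByUserScheduler(Scheduler):
    """Toy fair-sharing policy: take one job per user in rotation."""

    def __init__(self):
        self.queues = {}  # user -> deque of jobs; dict order is the rotation order

    def add_job(self, user, job):
        self.queues.setdefault(user, deque()).append(job)

    def next_job(self):
        if not self.queues:
            return None
        user = next(iter(self.queues))
        queue = self.queues.pop(user)
        job = queue.popleft()
        if queue:
            self.queues[user] = queue  # re-insert at the back of the rotation
        return job


def drain(scheduler, submissions):
    """Submit (user, job) pairs, then return the order in which jobs are scheduled."""
    for user, job in submissions:
        scheduler.add_job(user, job)
    order = []
    while (job := scheduler.next_job()) is not None:
        order.append(job)
    return order
```

With the submissions `[("alice", "a1"), ("alice", "a2"), ("alice", "a3"), ("bob", "b1")]`, the FIFO policy schedules `a1, a2, a3, b1`, while the round-robin policy schedules `a1, b1, a2, a3`. The calling code (`drain`) is the same in both cases, which is what makes the scheduler pluggable.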
Book website: Bahga & Madisetti, © 2015
FIFO Scheduler
• FIFO is the default scheduler in Hadoop; it maintains a work
queue in which jobs are queued.
• The scheduler pulls jobs in first-in, first-out order (oldest job
first) for scheduling.
• There is no concept of priority or job size in the FIFO scheduler.
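One consequence of ignoring job size can be shown with a few lines of arithmetic. The sketch below assumes, purely for illustration, that jobs run one at a time and that each job's runtime is known; `fifo_completion_times` is a hypothetical helper, not part of Hadoop.

```python
from collections import deque


def fifo_completion_times(jobs):
    """jobs: list of (name, runtime) in submission order; returns name -> finish time."""
    queue = deque(jobs)
    clock = 0
    finished = {}
    while queue:
        name, runtime = queue.popleft()  # oldest job first; size and priority ignored
        clock += runtime
        finished[name] = clock
    return finished
```

If a job with runtime 100 is submitted just before a job with runtime 1, the short job finishes at time 101; if the order is reversed, it finishes at time 1. FIFO never reorders the queue, so a short job can wait behind a long one.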
Wireless Technology: Benefits and Drawbacks

Bluetooth Low Energy (BLE)
Benefits:
• Low power consumption
• High data rate
• No single point of failure
• Better scalability
• Better reliability
• Faster (automatic) reconfiguration
Drawbacks:
• Limited data handling capacity
• Takes longer to set up
• Limited (short) range of operation
• Limited connection handling capacity (only seven devices per master/slave connection)
• Susceptible to attacks
• Only star (no mesh) topology
• No security implemented

WiMax
Benefits:
• Able to support very high speed voice and data transfers over longer distances
• Single base station can support a large number of users
• Low operational cost
• Symmetrical bandwidth over long ranges
• Hundreds of clients can be served from a single WiMAX station
Drawbacks:
• Line of Sight (LoS) connection is needed
• Serving a large number of clients may result in lower available bandwidths
• Vulnerable to disruption by environmental factors such as rain, noise, etc.
• High power consumption
• High initial cost
• High latency

Wi-Fi
Benefits:
• High data rates supported
• Easier and cheaper to set up
• Universally standardized
• Supports advanced encryption standards for enhancing security
Drawbacks:
• Becomes slower with increasing user connections
• High power consumption
• Topology has a single point of failure
• Vulnerable to attacks
• Requires large memory capacity and processing power
• Signals blocked by obstacles
• Limited to indoor operations

LoRa & LoRaWAN
Benefits:
• Very large ranges possible
• Supports star-of-stars topology
• Large number of clients per gateway module
• Supports variable data rates
• Operates in unlicensed band
• Has the ability to trade off between range and data rate
• Offers three types (classes) of devices supporting different purposes
• Larger areas can be covered with few gateway nodes
• Supports interoperability with other standards
Drawbacks:
• Only point-to-point (no mesh) connection
• Use of gateways may cause bottlenecks and become single points of failure
• Support for variable frame length reduces predictability
• Low bandwidth support
• Suffers from near/far problem
• Relatively high packet loss rates during congestion times
• All gateway nodes are tuned to the same frequencies, reducing the ability to control them individually