GCC Unit-I


UNIT-I: Introduction

➢ Evolution of distributed computing: the evolutionary changes that occurred in parallel, distributed, and cloud computing over the past three to four decades, driven by applications with variable workloads and large data sets.
➢ We study:
▪ High-performance and high-throughput computing systems in parallel computers,
▪ Service-oriented architecture,
▪ Computational grids,
▪ Peer-to-peer networks,
▪ Internet clouds, and
▪ The Internet of Things.
➢ All these systems are distinguished by their:
▪ Hardware architectures,
▪ OS platforms,
▪ Processing algorithms,
▪ Communication protocols, and
▪ Service models applied.
➢ Finally, the unit introduces essential issues of scalability, performance, availability, security, and energy efficiency in distributed systems.
Unit-I Topics

➢ Scalable computing over the Internet
➢ Technologies for network-based systems
➢ Clusters of cooperative computers
➢ Grid computing infrastructures
➢ Cloud computing
➢ Service-oriented architecture
➢ Introduction to grid architecture and standards: elements of grid, overview of grid architecture
1.1 Scalable computing over the Internet: Computing technology has changed tremendously over the last six decades. Here, we assess evolutionary changes in machine architecture, OS platform, network connectivity, and application workload.
1.1.1 The Age of Internet Computing
Billions of people use the Internet every day. As a result, supercomputer sites and large data
centers must provide high-performance computing services to huge numbers of Internet
users concurrently. Because of this demand the Linpack Benchmark for high-performance
computing (HPC) applications is no longer optimal for measuring system performance. The
emergence of computing clouds instead demands high-throughput computing (HTC)
systems built with parallel and distributed computing technologies.

We have to upgrade data centers using fast servers, storage systems, and high-bandwidth
networks.
1.1.1.1 The Platform Evolution: Computer technology has gone through five generations of development, with each generation spanning 10 to 20 years. Some generations overlapped.

The evolutionary trend is toward parallel, distributed, and cloud computing with clusters, MPPs, P2P networks, grids, clouds, web services, and the Internet of Things.
1.1.1.2 High-Performance Computing:
HPC: requires large amounts of computing power for short periods of time.
The speed of HPC systems increased from Gflops in the early 1990s to Pflops in 2010. The next milestones are Eflops and, eventually, Zflops.
This improvement was driven mainly by demand from the scientific, engineering, and manufacturing communities.
1.1.1.3 High-Throughput Computing:
HTC: requires large amounts of computing, but for much longer times.
The development of market-oriented high-end computing systems is undergoing a strategic change from an HPC paradigm to an HTC paradigm.
HPC will not vanish completely, however.
1.1.1.4 Three New Computing Paradigms:
With the introduction of SOA, Web 2.0 services become available.
Advances in virtualization make it possible to see the growth of Internet clouds as a new
computing paradigm.
The maturity of radio-frequency identification (RFID), Global Positioning System (GPS),
and sensor technologies has triggered the development of the Internet of Things (IoT).
Design Objectives:
Efficiency: Effective Utilization of Resources related to job throughput, data access,
storage, and power efficiency.
Dependability: Reliability and Self Management to achieve High-throughput service
with QoS, even under failure conditions.
Adaptation in the programming model: Ability to support large job requests over
massive data sets and virtual cloud services under various workload and service models.
Flexibility in application deployment: Ability to run well in both HPC (science and engineering) and HTC (business) applications.
1.1.2 Scalable Computing Trends and New Paradigms
Designers and programmers need to predict the technological capabilities of future systems.
Considerable research activity is needed to support such predictions.
Moore’s law indicates that processor speed doubles every 18 months. Although Moore’s
law has been proven valid over the last 30 years, it is difficult to say whether it will
continue to be true in the future.
Gilder’s law indicates that network bandwidth has doubled each year in the past. Will that
trend continue in the future?
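To make the two laws concrete, the following minimal sketch compares the growth they imply over the same time horizon. It assumes steady exponential growth with the doubling periods quoted above (18 months for processor speed, one year for bandwidth); the function and constant names are illustrative only.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Overall growth after `years` if a quantity doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


MOORE_DOUBLING_YEARS = 1.5   # processor speed doubles every 18 months (as stated above)
GILDER_DOUBLING_YEARS = 1.0  # network bandwidth doubles every year (as stated above)

for horizon in (3, 6, 9):
    cpu = growth_factor(horizon, MOORE_DOUBLING_YEARS)
    net = growth_factor(horizon, GILDER_DOUBLING_YEARS)
    print(f"After {horizon} years: CPU speed x{cpu:.0f}, bandwidth x{net:.0f}")
```

Under these assumptions bandwidth outgrows processor speed, and the gap widens with every additional year.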
Distributed systems emphasize both resource distribution and concurrency, or a high degree of parallelism (DoP). To understand how, we need to discuss the degree of parallelism and the special requirements for distributed computing.
1.1.2.1 Degree of Parallelism: Early computers were bit-serial. Bit-level parallelism (BLP) later converted bit-serial processing to word-level processing.
Over the years, users graduated from 4-bit microprocessors to 8-, 16-, 32-, and 64-bit CPUs.
This led to the next wave of improvement, called instruction-level parallelism (ILP), in which the processor executes multiple instructions simultaneously.
Over the past 30 to 40 years, ILP has been achieved through:
Pipelining: overlaps the execution stages of successive instructions.
Superscalar computing: issues multiple instructions per cycle within a single processor.
Very long instruction word (VLIW) architectures and multithreading.
ILP requires branch prediction, dynamic scheduling, speculation, and compiler support to work efficiently.
Data-level parallelism (DLP) was made popular through SIMD and vector machines using vector or array types of instructions.
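The step from bit-serial to word-level processing can be illustrated with a small sketch. The two functions below are hypothetical teaching aids, not a model of any real ALU: one adds two unsigned integers one bit per step, the other treats the whole word as a single operation, and both report how many steps they took.

```python
def bit_serial_add(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Add two unsigned integers one bit per step, as a bit-serial unit would.

    Returns (sum modulo 2**width, number of steps taken).
    """
    carry = 0
    result = 0
    steps = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        total = bit_a + bit_b + carry
        result |= (total & 1) << i   # sum bit for position i
        carry = total >> 1           # carry into position i + 1
        steps += 1
    return result, steps


def word_level_add(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Add the same operands as one word-wide operation (counted as a single step)."""
    mask = (1 << width) - 1
    return (a + b) & mask, 1


print(bit_serial_add(100, 55))   # same sum, eight steps
print(word_level_add(100, 55))   # same sum, one step
```

Both functions produce the same result; only the number of steps differs, which is the gain that bit-level parallelism provides.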
1.1.2.2 Innovative Applications: Both HPC and HTC systems desire transparency in many
application aspects. Transparency means that the resource sharing should happen automatically without
the awareness of concurrent execution by multiple users.
Domain | Specific Applications
Science and engineering | Scientific simulations, genomic analysis, etc.; earthquake prediction, global warming, weather forecasting, etc.
Business, education, services industry, and health care | Telecommunication, content delivery, e-commerce, etc.; banking, stock exchanges, transaction processing, etc.; air traffic control, electric power grids, distance education, etc.; health care, hospital automation, telemedicine, etc.
Internet and web services, and government applications | Internet search, data centers, decision-making systems, etc.; traffic monitoring, worm containment, cyber security, etc.; digital government, online tax return processing, social networking, etc.
Mission-critical applications | Military command and control, intelligent systems, crisis management, etc.
1.1.2.3 The Trend toward Utility Computing: Utility computing focuses on a business model in which customers receive computing resources from a paid service provider.

These computing paradigms share some common characteristics:
Ubiquitous: They are available everywhere in daily life. Reliability and scalability are two major design objectives in these computing models.
Autonomic: They aim at autonomic operations that can be self-organized to support dynamic discovery.
Composable: They are composable with QoS and SLAs.
1.1.2.4 The Hype Cycle of New Technologies:

Any new and emerging computing and information technology may go through a hype cycle.
The expectations rise sharply from the trigger period to a high peak of inflated expectations.
Through a short period of disillusionment, the expectation may drop to a valley and then
increase steadily over a long enlightenment period to a plateau of productivity.
1.1.3 The Internet of Things and Cyber-Physical Systems
Internet of Things and Cyber-Physical systems are two Internet development trends.
1.1.3.1 The Internet of Things:
The dynamic connections will grow exponentially into a new dynamic network of networks,
called the Internet of Things (IoT).
The concept of the IoT was introduced in 1999 at MIT.
The IoT refers to the networked interconnection of everyday objects, tools, devices, or
computers. One can view the IoT as a wireless network of sensors that interconnect all things in
our daily life.
The idea is to tag every object using RFID or a related sensor or electronic technology such as
GPS.
With the introduction of the IPv6 protocol, 2^128 IP addresses are available to distinguish all the objects on Earth, including all computers and pervasive devices.
The IoT needs to be designed to track 100 trillion static or moving objects simultaneously.
The IoT demands universal addressability of all of the objects or things.
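As a quick check on the scale involved, the short calculation below divides the IPv6 address space by the 100 trillion objects mentioned above. It is only an arithmetic sketch; it ignores reserved address ranges and the way real networks allocate prefixes.

```python
IPV6_ADDRESS_BITS = 128
TRACKED_OBJECTS = 100 * 10**12   # 100 trillion objects, the design target quoted above

total_addresses = 2 ** IPV6_ADDRESS_BITS
addresses_per_object = total_addresses // TRACKED_OBJECTS

print(f"IPv6 address space: {total_addresses:.3e} addresses")
print(f"Addresses per tracked object: {addresses_per_object:.3e}")
```

Even with 100 trillion tagged objects, each object could in principle receive an astronomically large block of addresses, which is why IPv6 is seen as sufficient for universal addressability.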
In the IoT era, all objects and devices are instrumented and interconnected, and they interact with each other intelligently.
The IoT is still in its infancy stage of development. Many prototype IoTs with restricted areas of
coverage are under experimentation at the time of this writing.
Cloud computing researchers expect to use the cloud and future Internet technologies to support
fast, efficient, and intelligent interactions among humans, machines, and any objects on Earth.
1.1.3.2 Cyber-Physical Systems
CPSs are systems that link the physical world (e.g., through sensors or actuators) with the
virtual world of information processing. They are composed from diverse constituent parts
that collaborate together to create some global behaviour. These constituents will include
software systems, communications technology, and sensors/actuators that interact with the
real world, often including embedded technologies.
1.2 Technologies for network-based systems
We explore hardware, software, and network technologies for distributed computing system design and applications.
The following figure shows the improvement in processor and network technologies over 40 years.

1.2.1 Multicore CPUs and Multithreading Technologies


A multi-core processor is a computer processor integrated circuit with two or more separate processing units, called cores.
Multithreading is the ability of a central processing unit to provide multiple threads of execution concurrently, supported by the
operating system.
1.2.1.1 Advances in CPU Processors

Processor speed is measured in millions of instructions per second (MIPS) and network bandwidth is measured in megabits per second
(Mbps) or gigabits per second (Gbps). The unit GE refers to 1 Gbps Ethernet bandwidth.
The figure shows the architecture of a typical multicore processor. Each core is essentially a processor with its own private cache (L1 cache). Multiple cores are housed in the same chip with an L2 cache that is shared by all cores.

1.2.1.2 Multicore CPU and Many-Core GPU Architectures


Multicore CPUs may increase from tens of cores to hundreds or more in the future. However, the CPU has reached its limit in exploiting massive data-level parallelism due to the memory wall problem. This has triggered the development of many-core GPUs with hundreds or more thin cores.
Both IA-32 and IA-64 instruction set architectures are built into commercial CPUs.
x86 processors have been extended to serve HPC and HTC systems in some high-end server processors.
Many RISC processors have been replaced with multicore x86 processors and many-core GPUs in the Top 500 systems.
The GPU has also been applied in large clusters to build supercomputers in MPPs.

In the future, asymmetric or heterogeneous chip multiprocessors may house both fat CPU cores and thin GPU cores on the same chip.
1.2.1.3 Multithreading Technology

The figure above shows the dispatch of five independent threads of instructions to four pipelined data paths (functional units).
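A loose software analogy may help here. The sketch below schedules five independent units of work onto a pool of four workers using Python's standard thread pool. This is only an analogy under stated assumptions: the workers are OS-level threads rather than hardware functional units, and names such as `run_stream` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

NUM_INSTRUCTION_STREAMS = 5   # five independent threads of instructions, as in the figure
NUM_DATA_PATHS = 4            # four data paths, modeled here as pool workers


def run_stream(stream_id: int) -> tuple[int, str]:
    """Pretend to execute one instruction stream and report which worker ran it."""
    checksum = sum(i * stream_id for i in range(1000))
    return checksum, threading.current_thread().name


with ThreadPoolExecutor(max_workers=NUM_DATA_PATHS, thread_name_prefix="path") as pool:
    results = list(pool.map(run_stream, range(NUM_INSTRUCTION_STREAMS)))

workers_used = {name for _, name in results}
print(f"{len(results)} streams ran on {len(workers_used)} worker(s)")
```

Because there are more streams than workers, at least one worker handles more than one stream, which mirrors how the hardware must share four data paths among five threads.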
1.2.2 GPU Computing to Exascale and Beyond
A GPU is a graphics coprocessor or accelerator mounted on a computer’s graphics card or
video card.
A GPU offloads the CPU from tedious graphics tasks in video editing applications.
The world’s first GPU, the GeForce 256, was marketed by NVIDIA in 1999.
These GPU chips can process a minimum of 10 million polygons per second.
1.2.2.1 How GPUs Work
Early GPUs functioned as coprocessors attached to the CPU.
Today, the NVIDIA GPU has been upgraded to 128 cores on a single chip. Each core can handle eight threads of instructions, so up to 8 x 128 = 1,024 threads can execute concurrently.
Modern GPUs are not restricted to accelerated graphics or video coding. They are used in HPC systems to power supercomputers with massive parallelism at multicore and multithreading levels. GPUs are designed to handle large numbers of floating-point operations in parallel.
The GPU offloads data-intensive calculations from the CPU.
GPUs are widely used in mobile phones, game consoles, embedded systems, PCs, and servers.
1.2.2.2 GPU Programming Model

GPU computing uses a GPU along with a CPU for massively parallel execution on hundreds or thousands of processing cores.
The GPU has a many-core architecture with hundreds of simple processing cores organized as multiprocessors. Each core can have one or more threads.
Legend: SFU: special function unit; LD/ST: load/store unit.

The figure above shows the NVIDIA Fermi GPU, built with 16 streaming multiprocessors (SMs) of 32 CUDA cores each; only one SM is shown. The full chip therefore comprises up to 512 streaming processors (SPs), known as CUDA cores.
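The core and thread counts quoted in this section follow from simple multiplication, as the sketch below shows. The figure's description of 16 SMs with 32 CUDA cores each gives the chip-wide total. The `locate_core` helper is hypothetical and assumes cores are numbered consecutively, SM by SM, purely for illustration.

```python
FERMI_SMS = 16            # streaming multiprocessors on the Fermi chip
CUDA_CORES_PER_SM = 32    # CUDA cores (streaming processors) in each SM
fermi_total_cores = FERMI_SMS * CUDA_CORES_PER_SM

EARLIER_GPU_CORES = 128   # cores on the NVIDIA chip described in 1.2.2.1
THREADS_PER_CORE = 8
earlier_gpu_threads = EARLIER_GPU_CORES * THREADS_PER_CORE


def locate_core(core_index: int) -> tuple[int, int]:
    """Map a chip-wide core index to (SM number, core number within that SM).

    Assumes a hypothetical consecutive numbering of cores across SMs.
    """
    if not 0 <= core_index < fermi_total_cores:
        raise ValueError("core index out of range")
    return divmod(core_index, CUDA_CORES_PER_SM)


print(fermi_total_cores, earlier_gpu_threads, locate_core(33))
```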
1.2.2.3 Power Efficiency of the GPU
Bill Dally of Stanford University considers power and massive parallelism as the major
benefits of GPUs over CPUs for the future.
By extrapolating current technology and computer architecture, it was estimated that 60
Gflops/watt per core is needed to run an exaflops system.
Dally has estimated that the CPU chip consumes about 2 nJ/instruction, while the GPU chip requires 200 pJ/instruction, which is one-tenth that of the CPU.

Figure: GPU performance (middle line, measured at 5 Gflops/W/core in 2011), compared with the lower CPU performance (lower line, measured at 0.8 Gflops/W/core in 2011) and the estimated 60 Gflops/W/core needed for exascale (EF, upper curve) in the future.
Exascale computing refers to computing systems capable of at least one exaFLOPS, or a
billion billion (i.e. a quintillion) calculations per second.
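The energy and efficiency figures in this section can be compared directly once they are put in the same units. The sketch below uses only the numbers quoted above (Dally's per-instruction estimates and the 2011 Gflops/W/core values); the variable names are illustrative.

```python
CPU_NJ_PER_INSTRUCTION = 2.0     # Dally's estimate for a CPU chip
GPU_PJ_PER_INSTRUCTION = 200.0   # Dally's estimate for a GPU chip

cpu_pj_per_instruction = CPU_NJ_PER_INSTRUCTION * 1000.0   # 1 nJ = 1000 pJ
energy_ratio = cpu_pj_per_instruction / GPU_PJ_PER_INSTRUCTION

GPU_GFLOPS_PER_WATT = 5.0        # measured per core in 2011
CPU_GFLOPS_PER_WATT = 0.8        # measured per core in 2011
EXASCALE_GFLOPS_PER_WATT = 60.0  # estimated requirement for an exaflops system

gpu_gap = EXASCALE_GFLOPS_PER_WATT / GPU_GFLOPS_PER_WATT
cpu_gap = EXASCALE_GFLOPS_PER_WATT / CPU_GFLOPS_PER_WATT

print(f"CPU uses {energy_ratio:.0f}x the energy per instruction of the GPU")
print(f"Efficiency still needed: GPU x{gpu_gap:.0f}, CPU x{cpu_gap:.0f}")
```

On these figures the GPU is about ten times more energy-efficient per instruction, yet it would still need roughly a twelvefold improvement in Gflops/W to meet the exascale estimate.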
1.2.3 Memory, Storage, and Wide-Area Networking
1.2.3.1 Memory Technology

The figure shows the improvement in memory and disk technologies over 33 years. The Seagate Barracuda XT disk has a capacity of 3 TB in 2011.
The upper curve plots the growth of DRAM chip capacity from 16 KB in 1976 to 64 GB in
2011. It shows a 4x increase in capacity every three years.
For hard drives, capacity increased from 260 MB in 1981 to 250 GB in 2004. The Seagate
Barracuda XT hard drive reached 3 TB in 2011. This represents an approximately 10x
increase in capacity every eight years.
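The quoted growth rates can be checked against the endpoint capacities. The sketch below assumes steady exponential growth between the first and last data points and uses binary units (1 KB = 2^10 bytes); both are simplifying assumptions, and the helper name is hypothetical.

```python
import math


def years_per_factor(start_value: float, end_value: float,
                     start_year: int, end_year: int, factor: float) -> float:
    """Years needed for one `factor`-fold increase, assuming steady exponential growth."""
    total_growth = end_value / start_value
    return (end_year - start_year) * math.log(factor) / math.log(total_growth)


KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40   # binary units (an assumption)

dram_years_per_4x = years_per_factor(16 * KB, 64 * GB, 1976, 2011, factor=4)
disk_years_per_10x = years_per_factor(260 * MB, 3 * TB, 1981, 2011, factor=10)

print(f"DRAM: 4x roughly every {dram_years_per_4x:.1f} years")
print(f"Disk: 10x roughly every {disk_years_per_10x:.1f} years")
```

Both results land close to the rates stated above, about three years per 4x for DRAM and somewhat under eight years per 10x for disks.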
1.2.3.2 Disks and Storage Technology
The rapid growth of flash memory and solid-state drives (SSDs) also impacts the future of HPC
and HTC systems.
A typical SSD can handle 300,000 to 1 million write cycles per block, so the SSD can last for several years, even under conditions of heavy write usage. Flash and SSD will demonstrate impressive speedups in many applications.
Eventually, power consumption, cooling, and packaging will limit large system development.
Power increases linearly with respect to clock frequency and quadratically with respect to the voltage applied on chips. Clock rate cannot be increased indefinitely.
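The power relationship stated above (linear in clock frequency, quadratic in voltage) can be written as a one-line function. This is a simplified sketch of relative dynamic power only; it ignores leakage and other effects, and the function name is illustrative.

```python
def relative_dynamic_power(freq_scale: float, voltage_scale: float) -> float:
    """Relative power after scaling clock and voltage, using P proportional to f * V**2."""
    return freq_scale * voltage_scale ** 2


# Doubling the clock at the same voltage doubles the power.
print(relative_dynamic_power(2.0, 1.0))
# Raising the voltage by 20 percent at the same clock costs about 44 percent more power.
print(round(relative_dynamic_power(1.0, 1.2), 2))
# Halving the clock and lowering the voltage by 20 percent cuts power to about a third.
print(round(relative_dynamic_power(0.5, 0.8), 2))
```

The quadratic voltage term is why small voltage increases are expensive, and why simply raising the clock rate is not a sustainable path to higher performance.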
1.2.3.3 System-Area Interconnects
SAN: Storage Area Network
NAS: Network Attached Storage

The figure shows three interconnection networks for connecting servers, client hosts, and storage devices: the LAN connects client hosts and servers, the SAN connects servers with disk arrays, and the NAS connects clients with large storage systems in the network environment.
1.2.3.4 Wide-Area Networking
Ethernet bandwidth grew rapidly from 10 Mbps in 1979 to 1 Gbps in 1999.
An increase factor of two per year on network performance was reported, which is faster than
Moore’s law on CPU speed doubling every 18 months. The implication is that more
computers will be used concurrently in the future. High-bandwidth networking increases
the capability of building massively distributed systems.
Most data centers are using Gigabit Ethernet as the interconnect in their server clusters.
1.2.4 Virtual Machines and Virtualization Middleware
A virtual machine is defined as a computer file, typically called an image,
which behaves like an actual computer.
Virtual machines (VMs) offer novel solutions to underutilized resources,
application inflexibility, software manageability, and security concerns in
existing physical machines.
To build large clusters, grids, and clouds, we need to access large amounts of
computing, storage, and networking resources in a virtualized manner.
1.2.4.1 Virtual Machines
The following figure illustrates the architectures of three VM configurations.

A hypervisor or virtual machine monitor (VMM) is computer software, firmware, or hardware that creates and runs virtual machines. A computer on which a hypervisor runs one or more virtual machines is called a host machine, and each virtual machine is called a guest machine.
1.2.4.2 VM Primitive Operations

First, the VMs can be multiplexed between hardware machines as shown in Figure(a).
Second, a VM can be suspended and stored in stable storage, as shown in Figure(b).
Third, a suspended VM can be resumed or provisioned to a new hardware platform as
shown in Figure(c).
Finally, a VM can be migrated from one hardware platform to another as shown in
Figure(d).
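The four primitive operations can be sketched as a toy model. Everything below is hypothetical and for illustration only: the `VM`, `Host`, and `StableStorage` classes do not correspond to any real hypervisor API, and a real migration moves memory and device state rather than a Python object.

```python
from dataclasses import dataclass, field
from enum import Enum


class VMState(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"


@dataclass
class VM:
    name: str
    state: VMState = VMState.RUNNING


@dataclass
class Host:
    name: str
    vms: list = field(default_factory=list)

    def start(self, vm: VM) -> None:
        """Multiplexing: several VMs can share one hardware machine."""
        vm.state = VMState.RUNNING
        self.vms.append(vm)


@dataclass
class StableStorage:
    images: dict = field(default_factory=dict)


def suspend(vm: VM, host: Host, storage: StableStorage) -> None:
    """Suspension: take the VM off its host and keep its image in stable storage."""
    host.vms.remove(vm)
    vm.state = VMState.SUSPENDED
    storage.images[vm.name] = vm


def resume(name: str, storage: StableStorage, target: Host) -> VM:
    """Provision: resume a suspended VM, possibly on a new hardware platform."""
    vm = storage.images.pop(name)
    target.start(vm)
    return vm


def migrate(vm: VM, source: Host, target: Host) -> None:
    """Migration: move a VM from one hardware platform to another."""
    source.vms.remove(vm)
    target.start(vm)


host_a, host_b, storage = Host("A"), Host("B"), StableStorage()
vm1, vm2 = VM("vm1"), VM("vm2")
host_a.start(vm1)
host_a.start(vm2)                  # (a) two VMs multiplexed on host A
suspend(vm1, host_a, storage)      # (b) vm1 stored in stable storage
resume("vm1", storage, host_b)     # (c) vm1 resumed on host B
migrate(vm2, host_a, host_b)       # (d) vm2 migrated from A to B
print([vm.name for vm in host_b.vms])
```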
1.2.4.3 Virtual Infrastructures
A virtual infrastructure lets you share your physical resources of multiple machines
across your entire infrastructure. A virtual machine lets you share the resources of a
single physical computer across multiple virtual machines for maximum efficiency.
Resources are shared across multiple virtual machines and applications.

The above figure shows Growth and cost breakdown of data centers over the years.
1.2.5 Data Center Virtualization for Cloud Computing
This section discusses the basic architecture and design considerations of data centers.
Cloud architecture is built with commodity hardware and network devices.
Almost all cloud platforms choose the popular x86 processors. Low-cost terabyte disks and
Gigabit Ethernet are used to build data centers.
1.2.5.1 Data Center Growth and Cost Breakdown
The cost to build and maintain data center servers has increased over the years.
According to a 2009 IDC report, typically only 30 percent of data center costs go toward purchasing IT equipment (such as servers and disks), 33 percent is attributed to the chiller, 18 percent to the uninterruptible power supply (UPS), 9 percent to computer room air conditioning (CRAC), and the remaining 7 percent to power distribution, lighting, and transformer costs.
Thus, about 60 percent of the cost to run a data center is allocated to management and
maintenance.
The server purchase cost did not increase much with time. The cost of electricity and cooling
did increase from 5 percent to 14 percent in 15 years.
1.2.5.2 Low-Cost Design Philosophy
High-bandwidth networks with high-end switches or routers may not fit the economics of cloud
computing.
Given a fixed budget, commodity switches and networks are more desirable in data centers.
The software layer handles network traffic balancing, fault tolerance, and expandability.
Currently, nearly all cloud computing data centers use Ethernet as their fundamental network
technology.
1.2.5.3 Convergence of Technologies
Cloud computing is enabled by the convergence of technologies in four areas:
(1) Hardware virtualization and multi-core chips: Enable the existence of dynamic configurations
in the cloud.
(2) Utility and Grid computing: Lay the necessary foundation for computing clouds.
(3) SOA, Web 2.0, and WS mashups: Recent advances in SOA, Web 2.0, and mashups of platforms are pushing the cloud another step forward.
A mashup is a technique by which a website or web application uses data, presentation, or functionality from two or more sources to create a new service. Mashups are made possible via web services or public APIs that (generally) allow free access. Most mashups are visual and interactive in nature.
(4) Autonomic computing and data center automation: Achievements in autonomic computing and automated data center operations contribute to the rise of cloud computing.
1.3 SYSTEM MODELS FOR DISTRIBUTED AND CLOUD COMPUTING
Distributed and cloud computing systems are built over a large number of autonomous computer
nodes. These node machines are interconnected by SANs, LANs, or WANs in a hierarchical
manner.
Massive systems are considered highly scalable, and can reach web-scale connectivity, either
physically or logically. Massive systems are classified into four groups: clusters, P2P networks,
computing grids, and Internet clouds over huge data centers.
1.3.1 Clusters of Cooperative Computers
A computing cluster consists of interconnected stand-alone computers which work cooperatively
as a single integrated computing resource.
1.3.1.1 Cluster Architecture: The following figure shows the architecture of a typical server cluster built around a low-latency, high-bandwidth interconnection network.

Servers are interconnected by a high-bandwidth SAN or LAN with shared I/O devices and disk arrays. The cluster acts as a single computer attached to the Internet.
To build a larger cluster with more nodes, the interconnection network can be built with multiple levels of Gigabit Ethernet, Myrinet, or InfiniBand switches. Myrinet (ANSI/VITA 26-1998) is a high-speed local area networking system designed by the company Myricom. InfiniBand (IB) is a computer-networking communications standard used in high-performance computing that features very high throughput and very low latency.
Through hierarchical construction using a SAN, LAN, or WAN, one can build scalable clusters.
The cluster is connected to the Internet via a virtual private network (VPN) gateway. The gateway IP address locates the cluster.
Each server node is autonomous: all of its resources are managed by its own OS.
1.3.1.2 Single-System Image
A single system image (SSI) is the property of a system that hides the heterogeneous and distributed nature of the available resources and presents them to users and applications as a single unified computing resource.
Cluster designers design a cluster operating system or some middleware to support SSI at various
levels, including the sharing of CPUs, memory, and I/O across all cluster nodes.
1.3.1.3 Hardware, Software, and Middleware Support
Clusters exploiting massive parallelism are commonly known as MPPs.
The building blocks are computer nodes (PCs, workstations, servers, or SMP), special communication software such as PVM (Parallel Virtual Machine) or MPI (Message Passing Interface), and a network interface card in each computer node.
Most clusters run under the Linux OS.
The computer nodes are interconnected by a high-bandwidth network (such as Gigabit Ethernet, Myrinet, InfiniBand, etc.).
Special cluster middleware support is needed to create SSI or high availability (HA). Both sequential and parallel applications can run on the cluster, and special parallel environments are needed to facilitate use of the cluster resources.
With distributed shared memory (DSM), memory can be shared by all servers.
1.3.1.4 Major Cluster Design Issues
▪ Middleware or OS extensions were developed in user space to achieve SSI at selected functional levels. Without this middleware, cluster nodes cannot work together effectively to achieve cooperative computing.
▪ The software environments and applications must rely on the middleware to achieve high performance.
▪ Scalable performance: scaling of resources (cluster nodes, memory, I/O bandwidth, etc.) should lead to a proportional increase in performance. Both scale-up and scale-down capabilities are needed, depending on requirements.
▪ Efficient message passing.
▪ High system availability: systems that are durable and likely to operate continuously without failure for a long time.
▪ Seamless fault tolerance: the ability of a system to continue operating properly when some of its components fail.
▪ Cluster-wide job management.
1.3.2 Grid Computing Infrastructures
Grid computing infrastructure supports the sharing and coordinated use of resources in dynamic, global, heterogeneous distributed environments. These resources include computers, data, telecommunication and network facilities, and software applications.
▪ An Internet service such as Telnet enables a local computer to connect to a remote computer.
▪ A web service such as HTTP enables remote access to web pages.
1.3.2.1 Computational Grids
A computing grid offers an infrastructure that couples computers, software/middleware, special
instruments, and people and sensors together.
The grid is often constructed across LAN, WAN, or Internet backbone networks at a regional,
national, or global scale.
Enterprises or organizations present grids as integrated computing resources. Grids can also be viewed as virtual platforms to support virtual organizations.
The computers used in a grid are primarily workstations, servers, clusters, and supercomputers. Personal computers, laptops, and PDAs can be used as access devices to a grid system.
The figure below shows a computational grid or data grid providing computing utility, data, and information services through resource sharing and cooperation among participating organizations.
1.3.2.2 Grid Families
National grid projects are followed by industrial grid platform development by IBM, Microsoft,
Sun, HP, Dell, Cisco, EMC, Platform Computing, and others.
New grid service providers (GSPs) and new grid applications have emerged rapidly, similar to the
growth of Internet and web services in the past two decades.
Grid systems are classified in essentially two categories: computational or data grids and P2P
grids.

Computing or data grids are built primarily at the national level.


1.3.3 Peer-to-Peer Network Families:
The client-server architecture is a well-established distributed system: client machines (PCs and workstations) are connected to a central server for compute, e-mail, file access, and database applications.
The P2P architecture offers a distributed model of networked systems.
A P2P network is client-oriented instead of server-oriented.
1.3.3.1 P2P Systems:
In a P2P system, every node acts as both a client and a server.
All client machines act autonomously to join or leave the system freely.
This implies that no master-slave relationship exists among the peers. No central coordination or
central database is needed.
The figure shows the architecture of a P2P network at two abstraction levels.

Initially, the peers are totally unrelated. Each peer machine joins or leaves the P2P network voluntarily.

The physical network is simply an ad hoc network formed at various Internet domains randomly using the TCP/IP and NAI protocols. Thus, the physical network varies in size and topology dynamically.
1.3.3.2 Overlay Networks
The overlay is a virtual network formed by mapping each physical machine with
its ID, logically, through a virtual mapping.
When a new peer joins the system, its peer ID is added as a node in the overlay
network. When an existing peer leaves the system, its peer ID is removed from
the overlay network automatically.
There are two types of overlay networks: unstructured and structured. An
unstructured overlay network is characterized by a random graph.
Structured overlay networks follow certain connectivity topology and rules for
inserting and removing nodes (peer IDs) from the overlay graph.
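A structured overlay can be sketched as a logical ring of peer IDs. The example below is a minimal illustration, not the scheme of any particular P2P system: it assumes peer IDs are derived by hashing a machine name into a small 16-bit ID space, and that a key is served by the first peer at or after the key's ID on the ring. The class and function names are hypothetical, and ID collisions are not handled.

```python
import bisect
import hashlib

ID_BITS = 16  # small ID space so the numbers stay readable (an assumption)


def peer_id(name: str) -> int:
    """Map a machine or key name to a logical ID by hashing (hypothetical scheme)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class RingOverlay:
    """Structured overlay: peer IDs kept in sorted order on a logical ring."""

    def __init__(self) -> None:
        self._ids: list[int] = []
        self._names: dict[int, str] = {}

    def join(self, name: str) -> int:
        """Add the peer's ID as a node in the overlay."""
        pid = peer_id(name)
        if pid not in self._names:
            bisect.insort(self._ids, pid)
            self._names[pid] = name
        return pid

    def leave(self, name: str) -> None:
        """Remove the peer's ID from the overlay."""
        pid = peer_id(name)
        if pid in self._names:
            self._ids.remove(pid)
            del self._names[pid]

    def responsible_peer(self, key: str) -> str:
        """Return the first peer at or after the key's ID, wrapping around the ring."""
        if not self._ids:
            raise LookupError("overlay is empty")
        index = bisect.bisect_left(self._ids, peer_id(key))
        return self._names[self._ids[index % len(self._ids)]]


overlay = RingOverlay()
for machine in ("peer-a", "peer-b", "peer-c"):
    overlay.join(machine)

owner = overlay.responsible_peer("song.mp3")
overlay.leave(owner)                       # the owner departs voluntarily
new_owner = overlay.responsible_peer("song.mp3")
print(owner, "->", new_owner)
```

An unstructured overlay would instead connect each new peer to a few randomly chosen existing peers, with no rule that ties a key to a particular node.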
1.3.3.3 P2P Application Families
Based on application, P2P networks are classified into four groups.

The first family is for distributed file sharing of digital contents (music, videos, etc.) on the P2P
network.
The second family, collaboration P2P networks, includes MSN or Skype chatting, instant messaging, and collaborative design, among others.
The third family is for distributed P2P computing in specific applications. For example, SETI@home provides 25 Tflops of distributed computing power, collectively, over 3 million Internet host machines.
The fourth family, P2P platforms such as JXTA, .NET, and FightingAID@home, supports naming, discovery, communication, security, and resource aggregation in some P2P applications.
1.3.3.4 P2P Computing Challenges
P2P computing faces three types of heterogeneity problems, in hardware, software, and network requirements (items 1 to 3 below), along with several broader design considerations (items 4 to 10).
1. There are too many hardware models and architectures to select from.
2. Incompatibility exists between software and the OS.
3. Different network connections and protocols make it too complex to apply in real applications.
4. System scaling is directly related to performance and bandwidth.
5. Data location also affects collective performance.
6. Data locality, network proximity, and interoperability are three design objectives in
distributed P2P applications.
7. P2P performance is affected by routing efficiency and self-organization by
participating peers.
8. Fault tolerance, failure management, and load balancing are other important issues
in using overlay networks.
9. Peers are strangers to one another. Security, privacy, and copyright violations are
major worries by those in the industry in terms of applying P2P technology in
business applications.
10. The distributed nature of P2P networks also increases robustness, because limited
peer failures do not form a single point of failure.
1.3.4 Cloud Computing over the Internet
Cloud computing is the on-demand delivery of compute power, database, storage, applications,
and other IT resources via the internet.
1.3.4.1 Internet Clouds
Cloud computing applies a virtualized platform with elastic resources on demand by provisioning
hardware, software, and data sets dynamically.
The figure shows virtualized resources from data centers forming an Internet cloud, provisioned with hardware, software, storage, network, and services for paid users to run their applications.

The idea is to move desktop computing to a service-oriented platform using server clusters and
huge databases at data centers.
1.3.4.2 The Cloud Landscape
The Cloud Landscape Described, Categorized, and Compared.
Traditionally, distributed computing systems have encountered several performance bottlenecks: constant system maintenance, poor utilization, and increasing costs associated with hardware/software upgrades.
The following figure depicts the cloud landscape and major cloud players, based on three cloud service models.

Internet clouds offer four deployment modes: private, public, managed, and hybrid.
These modes carry different security implications. The different SLAs imply that the security responsibility is shared among all the cloud providers, the cloud resource consumers, and the third-party cloud-enabled software providers.
The following list highlights eight reasons to adopt the cloud for upgraded Internet applications and web services:
1. Desired location in areas with protected space and higher energy efficiency
2. Sharing of peak-load capacity among a large pool of users, improving overall
utilization
3. Separation of infrastructure maintenance duties from domain-specific
application development
4. Significant reduction in cloud computing cost, compared with traditional
computing paradigms
5. Cloud computing programming and application development
6. Service and data discovery and content/service distribution
7. Privacy, security, copyright, and reliability issues
8. Service agreements, business models, and pricing policies
4 Service-Oriented Architecture (SOA)
Service-oriented architecture is a style of software design where services are provided to the
other components by application components, through a communication protocol over a
network.
The basic principles of service-oriented architecture are independent of vendors, products and
technologies.
4.1 Layered Architecture for Web Services and Grids; Web Services and Tools
The following figure shows the layered architecture for web services and grids.
A web service is a computer program running on either the local or a remote machine with a set of well-defined interfaces (ports) specified in XML (WSDL: Web Services Description Language).

WSDL, Java methods, and the CORBA Interface Definition Language (IDL) are interface specifications for distributed systems. These interfaces are linked with customized, high-level communication systems: SOAP, RMI, and IIOP, respectively.

The WSDL is an XML-based interface description language that is used for describing the
functionality offered by a web service.
CORBA describes a messaging mechanism by which objects distributed over a network can
communicate with each other irrespective of the platform and language used to develop those
objects.
SOAP (Simple Object Access Protocol) is a message protocol that allows distributed elements of
an application to communicate. SOAP can be carried over a variety of lower-level protocols,
including the web-related Hypertext Transfer Protocol (HTTP).
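To make the SOAP message structure concrete, here is a minimal sketch that builds and parses a SOAP 1.1 envelope with Python's standard library. The envelope namespace is the standard SOAP 1.1 one; the `GetTemperature` operation, its `City` argument, and the `http://example.org/weather` namespace are hypothetical names invented for this illustration.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/weather"  # hypothetical application namespace

def build_request(city):
    """Wrap a hypothetical GetTemperature call in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetTemperature")
    ET.SubElement(call, f"{{{APP_NS}}}City").text = city
    return ET.tostring(envelope, encoding="unicode")

def parse_request(xml_text):
    """Extract the operation name and its arguments from the envelope body."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]  # first child of Body is the operation element
    operation = call.tag.split("}")[1]  # strip the "{namespace}" prefix
    args = {child.tag.split("}")[1]: child.text for child in call}
    return operation, args

message = build_request("Chennai")
print(parse_request(message))
```

The resulting string could be sent as the body of an HTTP POST, but nothing in the envelope itself depends on HTTP, which is why SOAP can ride on other lower-level protocols as well.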

RMI (Remote Method Invocation) is a Java API that allows an object to invoke a method on an
object that exists in another address space, which could be on the same machine or on a remote
machine.

IIOP (Internet Inter-ORB Protocol) is a protocol that makes it possible for distributed programs
written in different programming languages to communicate over the Internet.
These communication systems support features including particular message patterns (such as
Remote Procedure Call or RPC), fault recovery, and specialized routing.
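The RPC message pattern mentioned above can be sketched in a few lines. This example uses XML-RPC from Python's standard library as a stand-in (it is not SOAP, RMI, or IIOP, but it follows the same request/reply idea). The `add` procedure is hypothetical, and the request and reply are handed over in-process instead of crossing a network, so the sketch runs anywhere.

```python
import xmlrpc.client

# Hypothetical procedures exposed by the "remote" side.
PROCEDURES = {"add": lambda a, b: a + b}

def client_marshal(method, *args):
    """Client stub: encode the procedure name and arguments as an XML message."""
    return xmlrpc.client.dumps(args, methodname=method)

def server_handle(request_xml):
    """Server skeleton: decode the request, run the procedure, encode the reply."""
    params, method = xmlrpc.client.loads(request_xml)
    result = PROCEDURES[method](*params)
    return xmlrpc.client.dumps((result,), methodresponse=True)

def client_unmarshal(response_xml):
    """Client stub: decode the reply back into a Python value."""
    (result,), _ = xmlrpc.client.loads(response_xml)
    return result

def remote_call(method, *args):
    # In a real deployment the request and reply would cross the network;
    # here they are passed directly so the sketch runs in one process.
    return client_unmarshal(server_handle(client_marshal(method, *args)))

print(remote_call("add", 2, 3))
```

The caller only sees `remote_call("add", 2, 3)`; marshalling, dispatch, and unmarshalling are hidden in the stubs, which is the essence of the pattern.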
Often, these communication systems are built on message-oriented middleware (enterprise bus)
infrastructure such as WebSphere MQ or Java Message Service (JMS), which provide rich
functionality and support virtualization of routing, senders, and recipients.
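What "virtualization of senders and recipients" means can be shown with a toy in-process broker. This is only a sketch of the publish/subscribe idea; it does not use or imitate the WebSphere MQ or JMS APIs, and the class and topic names are assumptions made for illustration.

```python
from collections import defaultdict

class MessageBroker:
    """Toy in-process broker; real middleware adds persistence, transactions, and networking."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver to whoever is subscribed; the sender never names a recipient."""
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])  # number of deliveries made

broker = MessageBroker()
inbox_a, inbox_b = [], []
broker.subscribe("jobs", inbox_a.append)
broker.subscribe("jobs", inbox_b.append)
delivered = broker.publish("jobs", {"id": 1, "task": "render"})
print(delivered, inbox_a, inbox_b)
```

The publisher addresses a topic, not a machine, so recipients can be added, moved, or replaced without changing the sender.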
In the case of fault tolerance, the features in the Web Services Reliable Messaging (WSRM)
framework mimic the OSI layer capability (as in TCP fault tolerance) modified to match the
different abstractions (such as messages versus packets, virtualized addressing) at the entity
levels.
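The idea of TCP-style fault tolerance applied to messages instead of packets can be sketched as follows. This is not the WSRM protocol itself, only an illustration of its ingredients under simplifying assumptions: sequence numbers, cumulative acknowledgements, retransmission, and duplicate elimination. The lossy channel is simulated by a hypothetical `drop` set.

```python
class Receiver:
    """Delivers each message exactly once and in order, using sequence numbers."""

    def __init__(self):
        self.expected = 0      # next sequence number to deliver
        self.buffer = {}       # out-of-order messages waiting for a gap to fill
        self.delivered = []

    def receive(self, seq, payload):
        if seq >= self.expected:
            self.buffer[seq] = payload        # already-delivered duplicates are ignored
        while self.expected in self.buffer:   # release any in-order run
            self.delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
        return self.expected - 1              # cumulative ack (-1 if nothing delivered yet)


def send_reliably(messages, receiver, drop):
    """Resend unacknowledged messages until all are acked.

    `drop` is a set of (seq, attempt) pairs that the simulated channel loses.
    Returns the total number of transmissions, including resends.
    """
    acked = -1
    attempt = 0
    transmissions = 0
    while acked < len(messages) - 1:
        for seq in range(acked + 1, len(messages)):
            transmissions += 1
            if (seq, attempt) in drop:
                continue                      # message lost in transit
            acked = max(acked, receiver.receive(seq, messages[seq]))
        attempt += 1
    return transmissions


receiver = Receiver()
sent = send_reliably(["a", "b", "c"], receiver, drop={(1, 0)})
print(receiver.delivered, sent)
```

Message "b" is lost on the first round, so "c" waits in the buffer; the second round resends both, the receiver delivers "b" then "c", and the duplicate "c" is discarded.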
Security is a critical capability that either uses or reimplements capabilities such as Internet
Protocol Security (IPsec) and secure sockets found in the OSI layers.
4.2 The Evolution of SOA
The following figure shows SOA evolution over the years.
The evolution of SOA: grids of clouds and grids, where “SS” refers to a sensor
service and “fs” to a filter or transforming service.
SOA applies to building grids, clouds, grids of clouds, clouds of grids, clouds of clouds (also
known as interclouds), and systems of systems.
A sensor can be a ZigBee device, a Bluetooth device, a WiFi access point, a personal computer, a
GPA, or a wireless phone.
Raw data is collected by sensor services.
All the SS devices interact with large or small computers, many forms of grids, databases, the
compute cloud, the storage cloud, the filter cloud, the discovery cloud, and so on.
Filter services (fs in the figure) are used to eliminate unwanted raw data, in order to respond to
specific requests from the web, the grid, or web services.
A collection of filter services forms a filter cloud.
SOA aims to search for, or sort out, the useful data from the massive amounts of raw data items.
Processing this data will generate useful information, and subsequently, the knowledge for our
daily use.
Finally, we make intelligent decisions based on both biological and machine wisdom.
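The path from raw data to information described above (sensor services feeding filter services) can be sketched as a small pipeline. The device names, reading kinds, and the plausible temperature range are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    sensor_id: str
    kind: str       # e.g. "temperature"
    value: float

def sensor_service(raw):
    """SS: turn raw (id, kind, value) tuples into readings."""
    for sensor_id, kind, value in raw:
        yield Reading(sensor_id, kind, value)

def make_filter_service(kind, low, high):
    """fs: keep only readings of one kind within a plausible range."""
    def filter_service(readings):
        return (r for r in readings if r.kind == kind and low <= r.value <= high)
    return filter_service

def summarize(readings):
    """Turn filtered data into information: here, a simple mean."""
    values = [r.value for r in readings]
    return sum(values) / len(values) if values else None

raw = [
    ("zigbee-1", "temperature", 21.5),
    ("wifi-ap-3", "signal", -60.0),
    ("zigbee-2", "temperature", 999.0),   # faulty reading, removed by the filter
    ("phone-7", "temperature", 22.5),
]
temperature_fs = make_filter_service("temperature", -50.0, 60.0)
mean_temperature = summarize(temperature_fs(sensor_service(raw)))
print(mean_temperature)
```

Several such filter services, each answering a different kind of request, would together play the role of the filter cloud.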
4.3 Grids versus Clouds
Computing grids are a form of distributed computing whereby a “super
virtual computer” is composed of many networked, loosely coupled computers
acting together to perform very large tasks. Since 2000, the emergence of
grids has revolutionized many services already offered by the Internet because
grids offer rapid computation, large-scale data storage, and flexible collaboration
by harnessing together the power of a large number of commodity computers
or clusters of other basic machines. The grid was devised for use in scientific
fields, such as particle physics and bioinformatics, where large volumes of data,
or very rapid processing, or both, are necessary.
Cloud computing is a computing term or metaphor that evolved in the late
2000s, based on utility and consumption of computer resources. Cloud
computing involves deploying groups of remote servers and software networks
that allow different kinds of data sources to be uploaded for real-time processing
to generate computing results without the need to store processed data on the
cloud. The main cloud characteristics include resource pooling and elasticity.
According to the National Institute of Standards and Technology (NIST), cloud
computing means accessing shared and configurable computing resources on demand
through broad network access.
5 Grid computing standards: The Global Grid Forum (GGF) is the community of users,
developers, and vendors leading the global standardization effort for grid computing. The following
figure shows grid standards and toolkits for scientific and engineering applications.

5.1 Open Grid Services Architecture (OGSA)


OGSA is a common standard for general public use of grid services. Key features include a
distributed execution environment, Public Key Infrastructure (PKI) services using a local
certificate authority (CA), trust management, and security policies in grid computing.

OGSA definitions and criteria apply to hardware, platforms, and software in standards-based
grid computing. OGSA is, in effect, an extension and refinement of the service-oriented
architecture (SOA).
5.2 Globus Toolkits
Globus is a middleware library jointly developed by the U.S. Argonne National Laboratory and
the USC Information Sciences Institute.
This library implements some of the OGSA standards for resource discovery, allocation, and
security enforcement in a grid environment.
The Globus packages support multisite mutual authentication with PKI certificates.
Globus is a set of software components and capabilities that includes:
 A set of service implementations that address resource management, data movement, service
discovery, and related issues
 Tools for building web services
 A powerful standards-based security infrastructure for authentication and authorization
 Client APIs (in Java, C, and Python) and command-line programs for accessing these services
 Detailed documentation on these various components
5.3 IBM Grid Toolbox
IBM has extended Globus for business applications.
The IBM Grid Toolbox can assist enterprises that deploy, manage, and control grid computing, as
well as developers who create products that assist in managing and deploying grids. This
grid-enabling toolkit contains standardized development code, much of which was harvested from
the open source community, plus an added database and run-time environment.
6 Grid Architecture: The architecture of a grid system is described in terms of layers.
Higher layers are user-centric and lower layers are hardware-centric.
Network layer: the bottom layer, which assures the connectivity of the resources in the grid.
Resource layer: made up of the actual resources that are part of the grid, such as computers,
storage systems, electronic data catalogues, and even sensors such as telescopes or other
instruments, which can be connected directly to the network.
Middleware layer: provides the tools that enable the various elements (servers, storage,
networks, etc.) to participate in a unified grid environment.
Application layer: includes the different user applications (science, engineering, business,
financial), as well as the portals and development toolkits that support those applications.
Characteristics:
1. Large scale: the number of resources ranges from a few to millions.
2. Geographical distribution: resources may be spread geographically.
3. Heterogeneity: data, files, programs, sensors, scientific instruments, display devices, PDAs
and organizers, computers, supercomputers, networks, etc.
4. Resource sharing and coordination: one organization allows other organizations to access
its resources.
5. Multiple administration
6. Transparent access: a grid should be seen as a single virtual computer.
7. Dependable access: assure the delivery of services under established QoS requirements.
8. Consistent access: provide services under all circumstances through standard services,
protocols, and interfaces, thus hiding the heterogeneity of the resources.
9. Pervasive access: the grid must grant access to available resources by adapting to a
dynamic environment in which resource failure is commonplace.
7 Elements of Grid
Grid computing combines elements such as distributed computing, high-performance computing
and disposable computing as per the application requirements.
Present-day grids are:
Computational grids: Computing systems linked to perform grid computing. In a
computational grid, a large computational task is divided up among individual machines, which
run calculations in parallel and then return results to the original computer.
Scavenging grids: systems that use otherwise idle machines, moving projects from one PC to
another as needed. A well-known example is the Search for Extraterrestrial Intelligence (SETI)
computation, which includes more than three million computers.
Data grids: A data grid is an architecture or set of services that gives individuals or groups of
users the ability to access, modify and transfer extremely large amounts of geographically
distributed data.
Market-Oriented grids: deal with price setting and negotiation, grid economy management
and utility driven scheduling and resource allocation.
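The divide, compute in parallel, and return pattern described for computational grids above can be sketched on a single machine. A thread pool stands in for the individual grid machines, which is an assumption made so the example is self-contained; the workload (a sum of squares) is likewise only illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def split(task, parts):
    """Divide a list of work items into roughly equal chunks."""
    size = max(1, -(-len(task) // parts))  # ceiling division, at least 1
    return [task[i:i + size] for i in range(0, len(task), size)]

def worker(chunk):
    """Work done on one 'machine': here, a sum of squares."""
    return sum(x * x for x in chunk)

def run_on_grid(task, machines=4):
    """Scatter chunks to workers, run them in parallel, gather and combine results."""
    chunks = split(task, machines)
    with ThreadPoolExecutor(max_workers=machines) as pool:
        partial_results = list(pool.map(worker, chunks))
    return sum(partial_results)  # the "original computer" combines the answers

print(run_on_grid(list(range(1, 101))))
```

On a real grid the workers would be separate machines reached through middleware, and the combining step would also have to cope with slow or failed nodes.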
7.1 Key Components of grid computing
Resource management: a grid must be aware of what resources are available for different
tasks.
Security management: the grid needs to take care that only authorized users can access and
use the available resources.
Data management: data must be transported, cleansed, parceled and processed.
Services management: users and applications must be able to query the grid in an effective and
efficient manner.
There are three views of grid computing: Physical View, Functional View, and Service View.

The functional constituents of a grid are: Security, Resource Broker, Scheduler, Data
Management, Job and Resource Management, and Resources.
A resource is an entity that is to be shared, for example: computers, storage, data and software.
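How a resource broker and scheduler might cooperate can be sketched as follows. The best-fit matching rule, the class names, and the resource figures are assumptions made for illustration; real grid schedulers apply much richer policies.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    cpus: int
    storage_gb: int
    busy: bool = False

@dataclass
class Job:
    name: str
    cpus: int
    storage_gb: int

class ResourceBroker:
    """Knows which resources exist and matches jobs to free ones."""

    def __init__(self, resources):
        self.resources = list(resources)

    def candidates(self, job):
        """Free resources that satisfy the job's CPU and storage needs."""
        return [r for r in self.resources
                if not r.busy and r.cpus >= job.cpus and r.storage_gb >= job.storage_gb]

    def schedule(self, job):
        """Pick the smallest adequate resource (best fit) and mark it busy."""
        options = self.candidates(job)
        if not options:
            return None  # the job would have to wait in a queue
        chosen = min(options, key=lambda r: (r.cpus, r.storage_gb))
        chosen.busy = True
        return chosen.name

broker = ResourceBroker([
    Resource("cluster-a", 64, 1000),
    Resource("pc-1", 4, 100),
    Resource("server-b", 16, 500),
])
print(broker.schedule(Job("simulation", 8, 200)))
```

Choosing the smallest adequate resource keeps the large cluster free for jobs that actually need it.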
A centrally managed security system is impractical. The Grid Security Infrastructure (GSI) provides a
“single sign-on”, run-anywhere authentication service, with support for local control over access
rights and mapping from global to local identities.
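The mapping from global to local identities, with access rights left under local control, can be sketched as below. The example is modeled loosely on the map-file idea used in grid middleware; the exact file format, the certificate subjects, and the account names here are assumptions for illustration, and certificate verification itself is omitted.

```python
import shlex

# Hypothetical map entries: a quoted global certificate subject, then a local account.
GRID_MAP = '''
"/O=Grid/OU=Example/CN=Jane Doe" jdoe
"/O=Grid/OU=Example/CN=Ravi Kumar" rkumar
'''

def load_map(text):
    """Parse lines of the form: "<global subject>" <local user>."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, local_user = shlex.split(line)
        mapping[subject] = local_user
    return mapping

def authorize(subject, mapping, local_policy):
    """Map a global identity to a local account, then apply the site's own policy."""
    local_user = mapping.get(subject)
    if local_user is None:
        return None                      # unknown global identity
    return local_user if local_user in local_policy else None

mapping = load_map(GRID_MAP)
site_policy = {"jdoe"}  # accounts this site currently allows to run jobs
print(authorize("/O=Grid/OU=Example/CN=Jane Doe", mapping, site_policy))
```

The user signs on once with a global identity, while each site keeps the final say through its own map and policy.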
