Assignment 1: CT074-3-M-RELM Reliability Management


CT074-3-M-RELM Reliability Management TP063111

CT074-3-M-RELM
Reliability Management

Assignment 1
Evaluating major issues regarding Reliability Management for Cloud-based Application of Mass Media Company

Md Sakhiruj jaman salim


Student ID: TP063111

Intake: Sept 2021

Submission Date: 29 Oct 2021


Evaluating major issues regarding Reliability Management for Cloud-based Application of Mass Media Company
Md Sakhiruj jaman salim, Dr. Kesava Pillai
Asia Pacific University of Technology & Innovation,
Technology Park, Kuala Lumpur, Malaysia

Asia Pacific University of Technology & Innovation,
Technology Park, Kuala Lumpur, Malaysia

Abstract: Cloud-based applications, or cloud computing, a new computing model, have attracted wide attention from mass media companies and other industries. Based on resource virtualization technology, cloud computing can provide users with infrastructure, platforms, software, and other services in flexible modes such as pay-per-use. As a result, more and more companies are choosing to host their scientific or commercial applications on cloud computing. However, as the number of users grows, data centers are expanding rapidly and their architectures are becoming more complex, and frequent failures of cloud computing systems cause huge losses. Therefore, ensuring reliability in large-scale, complex cloud computing systems has become a very challenging problem. In the context of cloud computing reliability, this paper presents an overview of cloud computing architecture and service models and describes cloud computing reliability, its design principles, and the factors that affect it. It then briefly describes how the reliability of cloud computing systems is evaluated, and finally presents future challenges to the reliability of cloud computing systems.

Keywords: Cloud Computing, Reliability, Cloud Architecture, Evaluation, Future Challenge.

1. Introduction

It is difficult to define cloud computing as a single technology because it has developed without a clear starting point and without a clear view of its future path. Cloud computing is a model that provides users with highly flexible computing services built on several computing developments. Many media companies choose cloud computing because, instead of buying IT infrastructure, developing private computing platforms, and managing hardware and software upgrades, they can outsource their computing to cloud service providers. As a result, cloud computing companies are gaining attention from many mass media companies. Cloud service providers such as Amazon Web Services, Microsoft Azure, Google Cloud, Alibaba Cloud, and IBM Cloud are releasing cloud platforms for academia, organizations, and industry. Cloud computing has been considered a major revolution in the IT industry. According to Fortune Business Insights, the global cloud computing market generated revenue of $219.00 billion in 2020. During the COVID-19 pandemic, the global market grew by 13.7% in 2020 compared with the average for 2017-2019, and the market is projected to grow from $250.04 billion in 2021 to $791.48 billion in 2028.

Cloud computing technology provides dynamic data centers and easy-to-use resources, but it also faces many challenges arising from its features, such as resource virtualization, free and open-source software, the browser as a platform, web application frameworks, and autonomic systems. Cloud computing systems keep growing in scale and in hardware resources, which also increases the uncertainty of those resources. As a result, the possibility of runtime failure also increases. If a cloud computing system goes down because of a hardware or software issue, cloud-dependent mass media companies are affected, and the failure is costly for both service providers and users. That is why the reliability of cloud computing systems is a major concern for mass media companies and has become an important research topic.

The reliability of cloud computing is a very important topic, but it is difficult to analyze because of large-scale service sharing, wide network areas, diverse software and hardware equipment, and the complex interconnections between them. Thus, conventional software or hardware reliability models alone cannot be applied to the study of cloud computing reliability. The main aim of this paper is to evaluate the major reliability issues of cloud-based applications, or cloud computing technology.

2. Overview of Cloud Computing

This section introduces the basic concepts and services of cloud computing, giving an overview that any reader can understand easily.

According to Mell and Grance (2011), the National Institute of Standards and Technology (NIST) provides an accurate definition of cloud computing. In this definition, cloud computing is a powerful, convenient, on-demand model that allows access to various computational resources such as applications, network servers, additional storage, databases, and other services. This system provides
quick and flexible service to its end-users or clients. The management effort of cloud computing rests with the service provider, which gives end-users the full benefit of the service. In simple terms, cloud computing is a service that gives end-users or clients access to resources (tools or applications) over the Internet (Frankenfield, 2021).

Cloud computing's on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service make it extremely useful for end-users. Many mass media industries support this model because it enables end-users to obtain resources such as hardware and software via the Internet in a measurable and virtual manner (Lewis, 2010). As a result, cloud computing not only provides practically unlimited operation for its end-users but also provides economic benefits to the industries that use it. Cloud computing therefore offers significant economic benefits, which suit clients who are concerned with cost-effectiveness.

2.1 Basic Concept of cloud computing

Cloud computing is a significant model that provides accessible computing for end-users. Its working models are as follows:

2.1.1 Cloud Deployment Models

Cloud computing has several deployment models for delivering resources to end-users. There are four types of cloud:

Public Clouds: A public cloud can be used by the general public, including individuals, corporations, and other types of organizations. Public clouds are created, installed, and maintained by third parties or vendors over the Internet, and services are provided on a pay-per-use basis. These clouds are also called provider clouds. Public clouds can be combined with each other and allow companies to take advantage of shared IT resources and services.

Private Clouds: This cloud computing environment is used within the boundaries, and only for the benefit, of one organization. Private cloud services are developed, deployed, and maintained for a single organization. They are mainly created by IT departments in initiatives that seek to optimize the use of infrastructure resources in applications using grid and virtualization concepts. In addition, a private cloud is more secure and offers more control than a public cloud.

Community Clouds: A community cloud is created for various organizations to share resources. This environment extends beyond the confines of a single organization, although it is not necessarily a public cloud. Community clouds are created for a specific shared purpose, such as common security requirements.

Hybrid Clouds: A hybrid cloud is made up of a combination of the above three deployment models, and it inherits the properties of the clouds from which it is made.

2.1.2 Cloud Service Models

There are three basic service models for cloud computing, as listed below:

Software as a Service (SaaS): The SaaS model provides access to application software, often referred to as on-demand software. SaaS provides an application to many users, regardless of their location, instead of the traditional model of one application per desktop. With SaaS, there is no need to install, set up, and run the application; these activities are handled centrally by the service provider.

Platform as a Service (PaaS): PaaS provides a platform for program and application development without requiring users to acquire any software. Applications are created with programming languages and tools that are supported by the PaaS provider.

Infrastructure as a Service (IaaS): IaaS provides a cloud computing environment for sharing resources such as servers, data storage, networks, and operating systems. IaaS helps users use shared resources to set up and run their applications. Clients purchase these resources as fully outsourced services on demand.

Figure 1: Overview of cloud computing deployment and service models.

3. Overview of Cloud Computing Reliability

Ensuring reliability in the cloud computing environment is an important issue. Reliability is defined as the probability that a given item will perform its specified function for a given time period (Kaur, 2015). This section first introduces the definition of cloud computing system reliability, then presents design principles for reliable systems, and finally gives an overview of the major factors affecting reliability.

3.1 Definition of the reliability of cloud computing systems

The reliability of a cloud computing system measures how consistently the system provides continuous, hassle-free service. Reliability can be defined as follows: under certain conditions, the system provides consistently uninterrupted and hassle-free service within an agreed time period. In terms of service capability, cloud computing is a service-oriented architecture (Duan, 2015), as shown in Figure 2. Through the client network, cloud computing systems can provide a variety of services anytime and anywhere, such as infrastructure as a service (IaaS),
platform as a service (PaaS), and software as a service (SaaS). Therefore, the reliability of cloud computing systems is closely linked to the specific service model. When a cloud is created, the services provided by the computing system are expected to be reliable: each service model makes the service provider responsible for service reliability, and the specific responsibilities vary with the model. For example, at the IaaS level the service provider should ensure that hardware resources are always in a stable operational condition and that hardware failures do not affect the quality of the services running on the infrastructure, while at the SaaS level the service provider must ensure that the software system does not contain serious software errors, to avoid software failures that affect the user's experience.

Figure 2: Cloud computing service model (Wikipedia contributors, 2021)

3.2 Design principles of reliable cloud computing systems

According to Gill and Buyya (2020), three design principles are proposed to avoid cloud computing system failures during operation, to restore cloud computing services if the system fails, and thus to ensure the reliability of the cloud computing system.

Design for resilience: Cloud computing services can tolerate the failure of system components without human interaction. The system detects failures and takes corrective action automatically, so that users are not aware of service interruptions. When a service fails, the system continues to provide some functionality instead of crashing completely.

Design for data integrity: Even if a system fails, the service continues to process, store, and discard data as in normal operation, so that the integrity of user-managed data is maintained.

Design for recoverability: When an abnormal situation occurs, the system ensures that the service is automatically restored as soon as possible.

Following these design principles will help improve the reliability of cloud computing systems and reduce the impact of failures. In addition, if an application service instance fails, the unfinished and delayed parts of the service should eventually be able to complete smoothly. Once the system fails, the service provider or user should take appropriate steps to recover the service so that the failure does not reduce its effectiveness. At the same time, manual labor should be minimized during the service recovery process.

3.3 Factors that affect the reliability of cloud computing systems

According to Javadi et al. (2012), a system failure can be defined as an event in which the system cannot function properly as agreed. When the system departs from normal operation, it can be considered to have failed. System failure is therefore a major factor affecting system reliability. In a cloud computing system, the resources are numerous, the architecture is very complex, the number of services is very high, and different service models depend on one another, so failures occur more often than in conventional systems. Failure is therefore the main threat to the reliability of cloud computing systems.

In software testing, a failure is an unwanted or unacceptable external behavior that occurs while the software is running. A fault is an unwanted or unacceptable internal state of the running software, such as entering the wrong conditional branch during execution. Errors are unwanted or unacceptable deviations, such as code errors that exist in software programs, data, or documents. This paper extends the software-testing definition of failure to hardware, software, and other components or systems that behave incorrectly. For example, if aging memory causes read and write errors so that the computer system does not work properly, the system can be said to have a memory failure. This section provides an overview of common failures and the major causes of failure in cloud computing systems.

Based on the characteristics of failure, there are three main types of failures in cloud computing systems: resource failure, service failure, and other failures. Resource failure refers to the failure of an underlying resource, such as a hardware failure, software error, power outage, or network outage. Currently, most fault-tolerance work focuses on resource failure. Service failures occur when the service provider cannot meet the quality of service specified in the service level agreement. Resource failure usually results in service failure when no fault-tolerance measures are taken, but service failure can still occur when physical resources function normally. Other failures of cloud computing systems usually refer to unforeseen failures with natural or man-made causes, such as high-energy particles, cyber-attacks, and human error. Understanding the causes of failure is crucial to ensuring the reliability and availability of cloud computing services. Figure 2 gives an overview of the common causes of failure in cloud computing systems.

3.3.1 Hardware failure: Hardware failure refers to system failures caused by the failure of hardware such as hard disks, memory, and other devices. About 50% of all data center failures are caused by hardware. As the size and age of a data center increase, so does the frequency of hard disk failures. Studies by Vishwanath et al. show that 78% of hardware failures are caused by disk drives, and that the number of hard drive failures increases rapidly with usage time. Therefore, regular disk
replacement or the use of redundant disk arrays can significantly reduce the risk that a disk failure causes a system failure, and so increase system reliability.

3.3.2 Software failure: As the systems and software in cloud computing become more complex, software failure has become a major cause of system crashes. Software failures are mainly caused by software design errors, failed updates, and potential operating failures due to system reboots. According to research by AppDynamics, software failures cost 1,000 businesses between $1.25 billion and $2.5 billion every year. Sometimes an unexpected error in the software update or upgrade process can crash the entire system. According to Proffitt (2013), about 20 percent of restarts fail due to data inconsistencies. Memory leaks, non-deterministic thread behavior, data mishandling, storage fragmentation, and similar issues may also cause other system failures or performance degradation.

3.3.3 Power failure: In cloud computing data centers, power outages account for about 33 percent of service disruptions, and such outages can easily occur during a natural disaster or war. In 2012, six of the 27 major power outages at cloud computing data centers were caused by Hurricane Sandy, disrupting all customer services. Another major cause of power failure is the failure of uninterruptible power supply (UPS) systems, which accounts for about 25% of power outages, with a single failure causing a loss of about $1,000.

3.3.4 Network Failure: In distributed computing architectures, especially cloud computing systems, all services are supported by communication networks, and all information is exchanged between servers. Disruption of the underlying network can therefore disrupt cloud computing services. For some cloud-based real-time applications, network performance plays a key role: short-lived network congestion can cause transmission delays that breach the Service Level Agreement (SLA) offered by the system. Some service failures are due to network disconnections, and network service interruptions may have physical or logical causes.

3.3.5 Service Failure: In cloud computing systems, service failures can occur whether or not resource failures occur. Bai et al.'s research shows that the occurrence of service failures is closely related to the stage of the job. On the one hand, at the request stage, a job submitted by the user with a specific service requirement is stored in a ready queue. At this stage, resource overload may occur, for example at peak request times, and the user will then be unable to access the service. In this case, although the system's built-in resources are working well, they may not be able to accommodate all requests, leading to service failures. On the other hand, at the execution stage, the job is committed to the underlying physical resources, so the service may be disrupted by resource constraints.

3.3.6 Soft Bugs: With the continuous development of CMOS technology and the continued scaling of processor voltage, soft bugs have become an important concern in modern computer systems. High-energy particles, noise, and hardware aging can cause instantaneous and intermittent faults in hardware circuits, which can lead to soft bugs. As soft bugs spread through the system, they can manifest as various forms of system failure, such as incorrect output or a system crash. At the scale of a cloud computing system, this problem can get worse. However, system designers can mitigate the effects of soft errors on services through error prediction and error tolerance.

3.3.7 Cyber Attack: In recent years, network attacks have become one of the main reasons for the rapid growth of data center failures. According to a report released by the Ponemon Institute, companies have greatly improved their cyber resilience since 2015: the percentage of companies that have achieved a high level of cyber resilience increased from 35% in 2015 to 53% in 2020, a relative increase of 51%. Although the volume and intensity of attacks increased over the previous 12 months (67% and 64%, respectively), companies are feeling more confident.

3.3.8 Human Error: Like cyber-attacks, human factors account for a significant proportion (22%) of cloud computing system failures, with the average cost of a human-induced failure being $489. Research by Zhao et al. shows that inexperience and operational mistakes are the main causes of human error, and that human error was responsible for a significant proportion of failures in the early days of cloud computing infrastructure. Therefore, as cloud computing system managers gain more experience, the likelihood of human mistakes can be reduced.

4. Reliability evaluation of cloud computing systems

To ensure the reliability of cloud computing systems, researchers need to design appropriate methods and techniques to mitigate the effects of system failures. Before designing a particular approach, researchers must first determine evaluation criteria for system reliability based on actual requirements. This section introduces three perspectives for evaluating system reliability.

4.1 Time indicators for measuring system reliability

In general, for cloud computing service providers, highly reliable cloud computing systems are characterized by a small number of failures, long operating times, and short repair times after failures. According to Alavian (2020), there are three time indicators for measuring the reliability of cloud computing systems, which reflect the system's ability to maintain normal operation over a period of time.

Mean-time-to-failure (MTTF): MTTF is the average time between the start of normal operation and the occurrence of a failure, that is, the average time the system runs without failure. The longer the average failure-free time of the system, the higher its reliability.

Mean-time-to-repair (MTTR): MTTR is the average interval from the time the system fails to the time the repair is complete and the system can work again. The shorter the average repair time, the better the recovery performance of the system.

Mean-time-between-failures (MTBF): MTBF is the average interval between two
adjacent failures. The greater the average interval between failures, the higher the reliability of the system and the stronger its ability to perform correctly.

4.2 Reliability of cloud service system

A cloud service system is designed around a cloud management system (CMS), which performs four functions: 1) managing the request queue of job requests from different users, 2) managing computing resources on the Internet (e.g., personal computers, clusters, supercomputers), 3) managing data resources on the network (e.g., databases, web pages), and 4) assigning subtasks to multiple computing resources that can access the data resources. When a user requests a cloud service, the CMS first uses workflows to describe the subtasks that the cloud service contains, the data resources that the subtasks require, and the dependencies between the resources, and then distributes the subtasks to computing resource nodes that can access the required data resources.

In the process of managing cloud services, various factors can affect the reliability of cloud services, including request queue overflow, request timeout, data resource loss, computing resource loss, software failure, database failure, hardware failure, and network failure. According to Zhou et al., evaluating reliability in cloud computing is not an easy task, so they present a tool named FTCloudSim (Fat-Tree cloud simulation) that helps evaluate and improve cloud service reliability. FTCloudSim is a simulation-based tool that provides an extensible process for increasing cloud service reliability, and it manages failures with a checkpoint-based mechanism.

FTCloudSim simulates a cloud computing system in five steps: 1) fat-tree data center network construction, 2) failure and repair event triggering, 3) checkpoint image generation and storage, 4) checkpoint-based cloudlet recovery, and 5) result generation. The advantages and disadvantages of each stage are assessed with three metrics. The first metric measures cloud service reliability and includes the total execution time and the average lost time. The second metric is network resource usage, which collects all the checkpoint image data transferred by all available switches. The last metric is the disk usage for storing checkpoint images.

4.3 Reliability of cloud resource systems

In a cloud computing environment, service providers need to manage a variety of resource components such as processors, memory modules, storage units, and network switches. The more components there are, the more likely the cloud computing system is to fail. If service providers understand the failure characteristics of different resource components, they can manage computing resources better, make their systems fault-tolerant, and provide high-performance services. The reliability of cloud computing systems is therefore studied through the interrelationship between software component failures (including processes, hypervisors, and management programs), hardware component (server node) failures, and other component failures.

Allocating cloud computing resources efficiently and reliably makes scheduling in a cloud computing environment a highly complex task, because many alternative computing resources with different capabilities are available in the market. Effective task scheduling methods can fulfill users' requirements and improve resource usage. According to recent research (M., 2021), cloud service providers simultaneously receive a large number of computing requests from users with different needs and preferences. Each requested task differs from the others: some require little cost and few computing resources, while others require higher computing capacity, more bandwidth, and more computing resources. When service providers receive tasks from users, the tasks can be compared pairwise using comparison matrix techniques. Service providers negotiate work requirements with users, including network bandwidth, completion time, task cost, and task reliability. In a cloud computing environment, computing and storage resources can then be allocated to each task according to its job requirements.

5. Cloud computing reliability challenge

Currently, many new applications have been created and deployed based on cloud computing, and more and more applications are shifting from traditional computing platforms to cloud-based platforms. In the future, with further development of science and technology, the application of cloud computing will become more widespread. However, because cloud computing is built on virtualization technology and features a multi-tenant, large-scale, and complex architecture, it is very difficult to properly manage the software and hardware resources of a cloud computing system. At the same time, the development of data protection, edge computing, and other technologies, and their deep integration with cloud computing, has brought many challenges to the reliability of cloud computing. Taken together, future research on cloud computing reliability could continue in the following areas:

5.1 Service reliability problems of virtualization technology in case of hardware resource failure. Virtualization technology is the key to realizing cloud computing. Virtualization failure and resource contention are two main problems in computing systems, and both increase response time. Improving the reliability of cloud computing systems can reduce problems for real-time applications such as video broadcasting and video conferencing by reducing delays in data transfer (Gill et al., 2020).

5.2 Data security problems due to system reliability. With the rapid growth of the cloud computing market, users' reliance on cloud computing is increasing, and more and more users are storing personal data in the cloud. Vulnerabilities in a cloud computing system can give attackers access to other people's personal data, and network intrusions can cause widespread security issues and hidden threats to services. Thus, improving the reliability of cloud computing systems to ensure user data protection has become a major challenge for cloud computing service providers (Wenxue et al., 2019).
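
Challenges such as these can be tracked with the time indicators defined in Section 4.1. The following Python sketch is a minimal illustration, not a method taken from the cited works. It assumes a hypothetical, chronologically ordered outage log measured in hours, with the system up at the start of observation and every outage already repaired; the names `Outage` and `time_indicators` are illustrative only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Outage:
    """One entry in a hypothetical outage log (times in hours)."""
    failed_at: float    # when the service stopped working
    restored_at: float  # when repair finished and service resumed


def time_indicators(outages: list[Outage], start: float = 0.0) -> dict[str, float]:
    """Estimate MTTF, MTTR and MTBF from a chronologically ordered outage log.

    Assumes the system is up at `start` and that every outage has been repaired.
    """
    if len(outages) < 2:
        raise ValueError("at least two outages are needed to estimate MTBF")

    uptimes: list[float] = []    # failure-free operation before each failure
    downtimes: list[float] = []  # time from each failure until repair completes
    up_since = start
    for outage in outages:
        if not (up_since <= outage.failed_at <= outage.restored_at):
            raise ValueError("outages must be ordered and non-overlapping")
        uptimes.append(outage.failed_at - up_since)
        downtimes.append(outage.restored_at - outage.failed_at)
        up_since = outage.restored_at

    # Interval between each pair of adjacent failures.
    gaps = [later.failed_at - earlier.failed_at
            for earlier, later in zip(outages, outages[1:])]

    return {"MTTF": mean(uptimes), "MTTR": mean(downtimes), "MTBF": mean(gaps)}


if __name__ == "__main__":
    log = [Outage(100.0, 102.0), Outage(300.0, 301.0), Outage(450.0, 453.0)]
    print(time_indicators(log))
```

With the sample log, the sketch reports MTTF = 149.0, MTTR = 2.0, and MTBF = 175.0 hours. MTBF is computed from the gaps between adjacent failures, matching the definition in Section 4.1, so a longer MTTF and MTBF and a shorter MTTR all point to a more reliable system.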

5.3 Reliability leads to increased service costs. To improve the reliability of services provided by cloud computing systems, many service providers use redundant resources to improve fault tolerance. Although adding redundant resources can significantly improve the reliability of services, it also increases the cost of system services accordingly. This not only raises the cost for users of cloud computing services but can also reduce the rate of return for service providers. Thus, when studying how to improve system reliability, future researchers will also need to design reasonable resource allocation strategies that reduce the cost of using cloud computing services (Wenxue et al., 2019).

5.4 Cloud computing systems face a data synchronization challenge because data is stored in geographically distributed locations, which overloads cloud services. To solve this problem, rapid elasticity can be used to detect overloaded cloud services and add new instances to handle the current workload. Furthermore, efficient data backup is required to recover data in case of server downtime (Gill et al., 2020).

5.5 Reliability degrades service performance. To improve the tolerance of cloud computing systems to failures, almost all data centers are built to be fault-tolerant. Cloud computing systems also face the challenge of data synchronization because data is stored in large data centers, which overloads cloud services. To solve this problem, rapid elasticity can be used to detect overloaded cloud services and add new instances to handle the current workload. Furthermore, efficient data backup is required to recover data in case of server downtime.

5.6 The development of the Internet of Things (IoT), smart cities, smart homes, and other IoT applications will greatly enrich people's lives in the near future. Because of its unlimited resources and convenient usage mode, cloud computing has naturally become the first choice for providing intelligence to many IoT end devices. However, because its large pool of resources is provided in a centralized manner, cloud computing suffers from high network latency, network congestion, and low reliability for some IoT applications. The difficulty of central control and the relative limitation of resources in cloud computing will have a major impact on service efficiency. Ensuring the reliability of application services when cloud computing and IoT are deeply integrated is an important direction for future research (Botta, 2016).

6. Conclusion

Cloud computing, as a new computing technology, can

computing. This paper provides some reliability evaluation of cloud computing. In addition, solving the cost growth and service performance degradation caused by improving system reliability, and finding and preventing the potential information security problems that improving reliability may bring, will be great challenges. At the same time, maintaining the reliability of application services when hardware resources fail is also a challenge. In the future, with the deep integration of cloud computing with edge computing, the Internet of Things, and other emerging technologies, cloud computing reliability will face more and greater challenges.

7. References

[1] Fortune Business Insights. (2021, May). Cloud computing market size, share & COVID-19 impact analysis (No. 102697). https://www.fortunebusinessinsights.com/cloud-computing-market-102697

[2] Frankenfield, J. (2021, May 19). How cloud computing works. Investopedia. Retrieved October 24, 2021, from https://www.investopedia.com/terms/c/cloud-computing.asp

[3] Mell, P. M. (2018, November 10). The NIST Definition of Cloud Computing. NIST. https://www.nist.gov/publications/nist-definition-cloud-computing

[4] Mesbahi, M. R., Rahmani, A. M., & Hosseinzadeh, M. (2018). Reliability and high availability in cloud computing environments: a reference roadmap. Human-Centric Computing and Information Sciences, 8(1). https://doi.org/10.1186/s13673-018-0143-8

[5] Proffitt, B. (2013, March 5). Software-as-a-Service: The Dirty Little Secrets Of SaaS. ReadWrite. https://readwrite.com/2013/03/05/software-as-a-service-the-dirty-little-secrets-of-saas/

[6] Wikipedia contributors. (2021, October 23). Cloud computing. Wikipedia. https://en.wikipedia.org/wiki/Cloud_computing#cite_note-71

[7] Gill, S. S., & Buyya, R. (2020). Failure Management for Reliable Cloud Computing: A Taxonomy, Model, and Future Directions. Computing in Science and Engineering, 22(3), 52-62. https://doi.org/10.1109/MCSE.2018.2873866

[8] Javadi, B., Thulasiraman, P., & Buyya, R. (2012).
provide users with highly scalable computing services, has Enhancing performance of failure-prone clusters by
attracted wide attention from mass media companies and adaptive provisioning of cloud resources. The Journal
industries. Presently, cloud computing services are of Supercomputing, 63(2), 467–489.
increasing with the huge number of users and make https://doi.org/10.1007/s11227-012-0826-2
complexity for data centers, also increasing operational [9] Vishwanath, K. V., & Nagappan, N. (2010).
failures, that's why reliability is the major challenge of the Characterizing cloud computing hardware reliability.
cloud computing technology. Firstly, this paper introduces Proceedings of the 1st ACM Symposium on Cloud
the basics of cloud computing, and its service model tries to Computing - SoCC ’10. DOI:
present a brief description. Second, define the reliability of https://doi.org/10.1145/1807128.1807161
cloud computing systems, the current reliability of services, [10] Parkerson, S. (2015, February 11). New AppDynamics
the design policy of reliability, and the major influencing Study Shows Critical Failures Can Cost $1 Million Per
factors of reliability that are making a huge impact on cloud Hour. App Developer Magazine.
https://appdevelopermagazine.com/2371/2015/2/11/New-AppDynamics-Study-Shows-Critical-Failures-Can-Cost-$1-Million-Per-Hour/
[11] F. (2013, February 27). Lessons Learned from Recent Cloud Outages. Flexera Blog.
https://www.flexera.com/blog/cloud/lessons-learned-from-recent-cloud-outages/
[12] Bai, Y., Zhang, H., & Fu, Y. (2016). Reliability modeling and analysis of cloud service based on complex network. 2016 Prognostics and System Health Management Conference (PHM-Chengdu).
https://doi.org/10.1109/phm.2016.7819907
[13] Bai, Y., Zhang, H., & Fu, Y. (2016). Reliability modeling and analysis of cloud service based on complex network. 2016 Prognostics and System Health Management Conference (PHM-Chengdu).
https://doi.org/10.1109/phm.2016.7819907
[14] Ponemon, L. (2020, July 7). The 2020 Cyber Resilient Organization: Preparation and Technology Differentiate High Performers. Security Intelligence.
https://securityintelligence.com/posts/2020-cyber-resilient-organization-preparation-technology-differentiate-high-performers/
[15] Zhao, E., & Wu, C. (2021). Long-term safety assessment of large-scale arch dam based on non-probabilistic reliability analysis. Structures, 32, 298–312.
https://doi.org/10.1016/j.istruc.2021.03.012
[16] Zhou, A., Wang, S., Yang, C., Sun, L., Sun, Q., & Yang, F. (2015). FTCloudSim: support for cloud service reliability enhancement simulation. International Journal of Web and Grid Services, 11(4), 347.
https://doi.org/10.1504/ijwgs.2015.072804
[17] M., B. K. (2021). Hybrid Evolutionary Algorithm based Task Scheduling Mechanism for Resource Allocation in Cloud Environment. Revista Gestão Inovação e Tecnologias, 11(4), 194–209.
https://doi.org/10.47059/revistageintec.v11i4.2101
[18] Gill, S. S., & Buyya, R. (2020). Failure Management for Reliable Cloud Computing: A Taxonomy, Model, and Future Directions. Computing in Science & Engineering, 22(3), 52–63.
https://doi.org/10.1109/mcse.2018.2873866
[19] Botta, A., de Donato, W., Persico, V., & Pescapé, A. (2016). Integration of Cloud computing and Internet of Things: A survey. Future Generation Computer Systems, 56, 684–700.
https://doi.org/10.1016/j.future.2015.09.021
[20] Alazie Dagnaw, G., & Ebabye Tsige, S. (2019). Challenges and Opportunities of Cloud Computing in Social Network; Survey. Internet of Things and Cloud Computing, 7(3), 73.
https://doi.org/10.11648/j.iotcc.20190703.13
[21] Duan Wenxue, Hu Ming, Zhou Qiong, Wu Tingming, Zhou Junlong, Liu Xiao, Wei Tongquan, & Chen Mingsong. (2020). Reliability in Cloud Computing System: A Review. Journal of Computer Research and Development, 57(1), 102–123.
https://crad.ict.ac.cn/EN/Y2020/V57/I1/102
[22] Yu, X. X., & Bian, J. (2017). Reliability Analysis of Cloud Computing Service System. DEStech Transactions on Computer Science and Engineering, aice-ncs.
https://doi.org/10.12783/dtcse/aice-ncs2016/5746
[23]
[24] Alazie Dagnaw, G., & Ebabye Tsige, S. (2019). Challenges and Opportunities of Cloud Computing in Social Network; Survey. Internet of Things and Cloud Computing, 7(3), 73.
https://doi.org/10.11648/j.iotcc.20190703.13
[25] Alavian, P., Eun, Y., Liu, K., Meerkov, S. M., & Zhang, L. (2019). The (α, β)-Precise Estimates of MTBF and MTTR: Definitions, Calculations, and Induced Effect on Machine Efficiency Evaluation. IFAC-PapersOnLine, 52(13), 1004–1009.
https://doi.org/10.1016/j.ifacol.2019.11.326