AXEL ROOTH
do not run Linux get charged by the hour) of usage; this is what is known as pay-as-you-go. Depending on hardware configuration there is a differing price for the usage amount, meaning that it is in the customer's interest to match hardware capacity with requirements to avoid paying unnecessary costs.
With on-demand instances it is possible to scale capacity up or down, and therefore to match capacity with fluctuating requirements. The problem is that adjustments are not guaranteed to be committed instantly; in the case of Microsoft Azure, scaling can take up to 30 minutes. To save cost it would be beneficial for the customer to reliably know – up to 30 minutes in advance – whether they should scale up, scale down or keep the current level.
There is a specific reason why this, and previous, research focuses on the CPU utilization part of CC. Here, the reason for choosing to analyze CPU – and not the other parts that together with CPU constitute cloud computing, such as bandwidth and memory – will be elaborated upon.
Pricing for both memory and bandwidth (data transfer) relies on bytes used. Although there is different underlying hardware optimized for different types of applications, with varying pricing, storage pricing for both AWS and Azure is directly related to how many gigabytes are used on average per month [9] [10]. Storage can of course fluctuate over time, and thus affect price per month, but it is usually not as volatile as CPU usage. The cost of bandwidth is also calculated based on bytes used each month, and AWS offers a discount for larger volumes [11] [10]. To save money on bandwidth it would likely be best for the customer to look at ways to reduce data transfer.
When it comes to CPU the issues are more complex. As mentioned, it is common for CSPs to use pay-as-you-go pricing. The customer is billed based on the amount of time the CPU was used in some way; it could be CPU cycles, seconds or hours. This may be interpreted as customers being charged the minimal amount for their computing needs, but that notion is somewhat misleading.
AWS and Google Cloud call it vCPU and Azure calls it vCore. The "v" stands for virtual, since customers do not have to worry about how the hardware infrastructure is implemented by the CSP. One vCore (or vCPU) essentially corresponds to a core on a physical CPU [12]. Consequently, more vCores roughly translate to more computing power.
The final cost that the customer is billed is based in part on how much CPU time was consumed and in part on how many vCores were used during that time. This applies to all of AWS, Azure and Google Cloud [10] [13] [14]. In addition, the price varies depending on which generation of CPU the cores are a part of [15], but this will not affect this paper's research.
This paper will focus on how companies can save money by scaling the number of vCores used to improve utilization. Part of the reason to use vCores is to allow scaling, so this feature should be taken advantage of [10].

D. Companies and cloud computing
More and more companies are switching over to public CC from private clouds and other types of on-premise computing [1]. There are many aspects associated with this change in how the companies operate.
First of all, cloud computing allows companies to focus on improving customer experience, increasing productivity and lowering cost, and – by enabling quicker time to market – it also helps in generating revenue.
Second of all, CC is helping companies during the digitization trend, where users demand access to systems from anywhere and at any time. Also, the ability of CC to rapidly scale is well suited for companies building APIs, since the APIs might handle huge amounts of information, which increases the chances of unpredictable load.
Third and last, when switching to CC, a company can be seen as moving away from CapEx (capital expenditure) to OpEx (operational expenditure). Instead of paying for hardware, software and implementation, the costs are ongoing. This can be misinterpreted as reducing investment in IT while the costs of operating are increasing [1]. To avoid confusion the transition must be communicated clearly and might require new ways of analyzing IT.

E. Cost management
A company's cost management is crucial for its business to run efficiently. If costs exceed profits it can have a detrimental effect on the future of the firm. In the worst case, poor cost management could even lead to bankruptcy. However, tracking and managing a firm's costs is not always easy. Therefore, companies should emphasise the importance of cost management and implement tools and metrics to help managers understand their actions.
Research has shown that computerised tools can be beneficial for managers in retaining that understanding [16]. As mentioned, companies moving their operations to CC also switch from CapEx to OpEx. Managers therefore have to reevaluate how their cost management is handled, which can be difficult with limited resources.
As was just suggested, computerized tools can be beneficial for understanding and managing cost. Microsoft Azure, for instance, has some tools built into its PaaS platform which can be a great start for a new way to manage cost. However, managers should not be satisfied and stop at that point, because these tools may not be suited for that particular firm. Companies have therefore started to build tools to analyse costs on top of Microsoft Azure, tailored to the firm's own specific needs regarding cost management. This paper will provide an example of what such a tool could look like.

III. RELATED WORK
Prediction of workload in CC services has been researched extensively. Calheiros et al. [17] point out the importance of understanding future demand in order to provide good quality of service, avoid customer dissatisfaction and avoid potentially losing customers. Therefore, predicting the workload is
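As a minimal illustration of the vCore-hour billing described above (cost is driven both by how long the CPU is used and by how many vCores are provisioned), the following sketch compares an always-on configuration with one that scales down outside office hours. The hourly rate and the scaling schedule are hypothetical assumptions for illustration only, not any provider's actual prices:

```python
# Hypothetical pay-as-you-go estimate: billing is assumed to be
# vCores x hours used x rate per vCore-hour.
RATE_PER_VCORE_HOUR = 0.10  # USD; hypothetical rate

def monthly_cost(vcores: int, hours: float, rate: float = RATE_PER_VCORE_HOUR) -> float:
    """Cost = number of vCores x hours of usage x rate per vCore-hour."""
    return vcores * hours * rate

# Running 8 vCores around the clock for a 30-day month...
always_on = monthly_cost(8, 30 * 24)
# ...versus keeping 8 vCores only during ~22 workdays x 10 office hours,
# and scaling down to 2 vCores the rest of the time.
office_hours = 22 * 10
off_hours = 30 * 24 - office_hours
scaled = monthly_cost(8, office_hours) + monthly_cost(2, off_hours)
print(f"always on: ${always_on:.2f}, scaled: ${scaled:.2f}")
```

The gap between the two totals is exactly the saving that a reliable 30-minutes-ahead utilization forecast would let a customer capture automatically.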
Figure 1. RNN architecture, folded to the left and unfolded to the right [23]

A recurrent neural network (RNN) is one of the machine learning models that will be used for predictions in this paper. RNNs work by recurrently using their own output to predict the next output with the help of the next input, as shown in Figure 1, thus creating a sort of memory of past data. Therefore, RNNs are well suited for sequenced data and time series prediction. Furthermore, there exist different definitions of standard RNNs. In this paper a standard RNN is defined as an Elman network, with the equations of the RNN shown below [23].

h_t = \sigma(W_h x_t + U_h h_{t-1} + b_h)   (2)

y_t = \sigma(W_y h_t + b_y)   (3)

The most basic RNNs do, however, have rather insufficient memory because of the vanishing gradient problem [24]. Therefore, other types of RNNs were created to account for the lack of memory and be able to remember longer sequences to improve accuracy. One of these RNNs is called long short-term memory (LSTM), which this paper uses to compare with the standard RNN.
1) Long short-term memory: The LSTM architecture is based on three gates: the forget gate f_t (4), the input gate i_t (5) and the output gate o_t (6) [25]. An overall preview of the architecture is shown in Figure 2.

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)   (4)

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)   (5)

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)   (6)

Figure 2. LSTM architecture with forget gate f_t, input gate i_t and output gate o_t [25]

\sigma(x) = \frac{1}{1 + e^{-x}}   (7)

\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}   (8)

The difference between the forget gate and the input gate is the candidate vector \hat{c}_t (9), which together with i_t is used in the next phase to calculate the cell state c_t (10); the cell state is then used together with the output gate vector to update the hidden state h_t (11).

\hat{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)   (9)

c_t = f_t \ast c_{t-1} + i_t \ast \hat{c}_t   (10)

h_t = o_t \ast \tanh(c_t)   (11)

The hidden state vector and the cell state vector are then fed back into the LSTM to account for the recurrent feature of the network. Therefore, with every iteration the hidden state and the cell state are updated to fit the data accordingly, while the weights and biases stay the same until updated with back-propagation through time (BPTT) [26].
2) Optimizer Adam: The optimizer used to update the weights after each input sequence during training is Adam (adaptive moment estimation). The method uses an adaptive learning rate to update the parameters, meaning that the learning rate is updated individually for each parameter. Adam achieves this by using both the first and the second moment of the gradient g_t of the cost function, where m_t is the mean and v_t is the uncentered variance. \beta_1 and \beta_2 are hyper-parameters which are recommended to be initialized to 0.9 and 0.999 respectively.

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t   (12)

v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2   (13)

The hyper-parameters \eta (the learning rate) and \epsilon are set to 0.001 and 10^{-7} respectively, which is rarely changed [27].

\hat{m}_t = \frac{m_t}{1 - \beta_1^t}   (14)

\hat{v}_t = \frac{v_t}{1 - \beta_2^t}   (15)

\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \hat{m}_t   (16)

D. Machine learning implementation in practice
In practice, implementing a machine learning model to solve a problem can be a nuisance and demand persistence. Successfully finding an accurate model is not a trivial task. Having a systematic design framework when implementing a machine learning model can be a savior. Therefore, this paper adheres to the following common practices.
Andrew Ng proposes four key points to consider for a practitioner [28]:
• Determine your goals - what error metric you are going to use and what your target value is. The goals and error metrics should be driven by the task the machine learning model is intended to solve.
• Set up an end-to-end pipeline as soon as possible, and make sure to find an appropriate estimation of the performance metrics.
• Measure the system accordingly to find performance bottlenecks in the model due to over-fitting, under-fitting, defects in data or software bugs.
• Frequently make incremental changes, such as finding more data, adjusting hyper-parameters, or changing the model, based on findings regarding performance bottlenecks.
With this design framework in mind, there are also three fundamental concepts within machine learning that should be elaborated upon.
1) Error measurements: There exists an abundance of standard error measurements. Which measurement eventually gets chosen should stem from the characteristics of the problem that the model is facing. The chosen error measurement can then be used to measure the model's performance. A better performing model will produce smaller errors. One measurement of a model's performance is the mean square error (MSE) [22].

MSE = \frac{1}{m} \sum_{i=1}^{m} (\hat{y} - y)_i^2   (17)

Upon examination of the equation above, it is apparent that if the predictions, \hat{y}, are closer to the real values, y, the error becomes smaller. When squaring the difference of the prediction and real value, three things happen. Firstly, it is ensured that all summed values are positive, to avoid positive and negative values cancelling each other and thus underestimating the error. Secondly, larger individual errors receive greater punishment, since their weight in the final error will be greater than the weight of their initial size. Thirdly, the dimension of the measured error unit changes. The m in the equation is the total number of predictions, and by dividing by m the mean squared error of all the predictions is obtained.
The fact that the error becomes smaller – and is even 0 when predictions are completely accurate (although this is unlikely) – can be taken advantage of when training a machine learning model [22]. To improve the performance the model simply has to reduce the error.
As mentioned, there are many different measures to choose from and they should be chosen with regard to the problem at hand. To broaden the perspectives and help interpretation of results, a second error measurement was chosen as well: mean absolute error (MAE).

MAE = \frac{1}{m} \sum_{i=1}^{m} |(\hat{y} - y)_i|   (18)

Unlike MSE, the MAE is only a function of one characteristic instead of three. MAE can therefore be a more natural error measurement to interpret [29].
2) Training and testing: Furthermore, one should acknowledge the importance of how to go about testing the model's performance. It is not recommended to use all of the provided data when training the algorithm, because that would compromise measuring the algorithm's ability to generalize. Therefore, it is recommended to split the data into a training set, a validation set and a test set, with a common ratio of 8:1:1. The training set and validation set are used during training, where the validation set is used to alert for over-fitting of the training data. The test set is used to understand how the model performs on unseen data, and it is therefore attractive to adjust the model to find the lowest possible test error.
3) Baseline models: When a test error has been calculated, the question of what a decent error result is emerges. In practice, the error will never reach 0, so a practitioner has to find another way of benchmarking the test error [22]. What is most commonly used is a baseline model, which is not necessarily a machine learning model. For time series prediction there exist plenty of methods, but moving average and weighted moving average are two common ones. Moving average is simply calculated as:

\hat{y}_{t+1} = \frac{1}{m} \sum_{i=t-m+1}^{t} y_i   (19)

where t+1 is the next time-step to be predicted and m is the total number of steps used in the average.
If the test error is worse than the baseline model's, the practitioner should then use the design framework's fourth key point and change the model accordingly.

E. One-way ANOVA
To ensure that results are faithful, an analysis of variance (ANOVA) is used. First a null-hypothesis must be established. If the null-hypothesis were to be true, there is no significance to be found in the results. An alternative hypothesis which is
considered true if the null-hypothesis is rejected must also be prepared.
An F-test is then conducted to calculate the probability (p-value) of the null-hypothesis being false. If the probability is below a chosen threshold called \alpha, five percent being a common threshold for \alpha, the null-hypothesis can be rejected and the alternative hypothesis be considered as truthful [30].

F. Central Processing Unit (CPU)
Since a general understanding of a CPU is beneficial for the full understanding of this paper, a simple explanation is provided.
The central processing unit is the primary component processing instructions in a machine. These instructions vary among logical, arithmetic, control and I/O (input/output) operations. Essentially, the CPU is the brain of a machine. The data processed in the CPU is interpreted in the form of bytes, 1's and 0's. The particular process of executing commands is called the instruction cycle. The instruction cycle is divided into three stages: fetch, decode and execute. First some data is fetched in bytes from the program memory. The instructions are then decoded into signals which the internal components of the CPU can comprehend. The CPU can then start processing the instruction and make the executions in the form of actions [31].

V. METHODOLOGY
This section details how the study was conducted. First the data used for the experiments is explained in detail. Then follows an overview of which models were used and how they were implemented.

A. Description of dataset
The data used to train and test the models was received from the engineering consultancy firm Afry (Afry is their brand name; the company is formally called ÅF Pöyry AB). Afry operates globally but the lion's share of operations is within Europe, and in particular across Scandinavia. As a consequence, the behavior of the roughly 16000 employees will to a great extent be related to Scandinavian office hours, weekdays and holidays. This is apparent in Figure 3, which will be discussed in more detail.
The data reaches from March 21 to May 10, year 2021, and is from a database with company news connected to the intranet. Every time an employee logs into the intranet or visits the main page, a request is made to read the database so news can be displayed. Each read, write or delete operation requires CPU power, and more active users require more CPU power.
In Figure 3 a pattern is discernible. The first day in the dataset is a Sunday, which can be seen as low activity in the figure. During the weeks, higher activity can be seen at the start of the workdays with a decline throughout the day. During the weekends the activity is close to 0. The second week coincided with the Easter holiday from Friday stretching to Monday, which is reflected as low activity for those days in the figure.
The data points in the dataset are the Max CPU Utilization Percentage for each minute during the period that the dataset stretches over. A choice was made to use max CPU utilization instead of average CPU utilization to avoid under-provisioning. In total there are 58691 data points.

B. Forecasting models used
Three different forecasting models are implemented:
1) LSTM,
2) Standard RNN,
3) Moving Average.
LSTM and standard RNN are both part of this paper's research, whereas moving average will serve as a baseline model for analysing results. Since LSTM is a type of RNN, the parameters that need to be chosen are identical for LSTM and standard RNN.
For the first research question each model uses the 60 previous minutes as input to predict one step, or one minute, ahead. The second research question uses the same number of input steps but instead predicts several steps ahead. Both the LSTM and RNN have 1 feature as input (historical CPU), meaning they are univariate, and use 11 hidden units to produce 1 output. Adam is used as the optimizer for both LSTM and RNN, and the learning rate was 0.001.
Moving average also used 60 input steps, since it is fair to compare the models under the same conditions. If a different amount of input steps were chosen it would complicate comparing performance, since one model would use less input for its predictions, which is an entirely different aspect that would need to be considered.

C. Procedure
In order to be able to train the LSTM and RNN models, the input data X was reshaped into the dimensions (number
Figure 3. CPU percentage utilization from March 21 to May 10, 2021 (outliers removed in graph to improve readability)
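A sketch of this windowing and reshaping step, assuming the Keras-style dimension order (samples, time steps, features) and the parameters stated above (60 input steps, 1 feature); the helper function and variable names are illustrative, not the paper's actual code:

```python
import numpy as np

def make_windows(series, n_steps=60):
    """Slice a 1-D series into (samples, time steps, features) windows,
    with the value right after each window as the prediction target."""
    X, y = [], []
    for i in range(len(series) - n_steps):
        X.append(series[i:i + n_steps])
        y.append(series[i + n_steps])
    X = np.array(X).reshape(-1, n_steps, 1)  # 1 feature: univariate CPU history
    return X, np.array(y)

# Stand-in for minute-wise max CPU utilization percentages.
cpu = np.random.rand(200) * 100
X, y = make_windows(cpu)
print(X.shape)  # (140, 60, 1)
```

A Keras [32] model matching the description in this section could then be built as, for example, `keras.Sequential([keras.layers.LSTM(11, input_shape=(60, 1)), keras.layers.Dense(1)])`, compiled with `keras.optimizers.Adam(learning_rate=0.001)`; swapping `LSTM` for `SimpleRNN` gives the standard RNN counterpart.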
Furthermore, by visually inspecting the predictions in Figure 4 it is apparent that all three methods have made fairly accurate predictions and follow the pattern fairly well. A few decimals of difference in MSE and MAE do not seem sufficient to completely disqualify any of the methods used. However, while that statement holds true, the moving average seems to predict prematurely compared to the real data, which in turn impairs the accuracy of the prediction.
When observing the RNN and the LSTM, both methods converge with a similar pattern, with a few distinctions. LSTM seems more adept at predicting the peaks; nevertheless, it also often falls short here.
Compared with Figure 4, both predictions in Figure 5 are less accurate and less smooth as a result. The most drastic change is the RNN prediction, where the model has piecewise linear patterns which do not correspond to the real data whatsoever. LSTM behaved better overall, which was confirmed by the error metrics in Table 2. Where both models struggled the most was still during drastic changes of CPU capacity.
better decisions will be important. The reason is the complexity of data and cloud management that companies have to account for. Therefore, automating some tasks and liberating managers to focus on more difficult tasks at hand can indeed be beneficial.
One task that could be automated with accurate predictive tools is regulating the cloud capacity: making sure that the business is not overpaying for the cloud providers' services while still operating at an optimal level. Predictive models such as the LSTM can therefore be a great tool to have in a company's toolbox. To excel even more, the model should be integrated with additional software to be able to change the amount of vCores needed during a particular time. This will indeed demand more ingenious work to handle. However, if the benefits transcend the drawbacks, acting upon an implementation should be an obvious choice.
While the incentive for creating an optimizing tool usually is budgetary, managers should acknowledge the environmental aspect of using excessive computing power in their businesses. Cloud waste can to some extent be minimized by the provider, but the best understanding of one's business is retained by the ones who operate that business. This makes the argument that cloud waste is not only an issue that has to be solved on the host end of the cloud, but rather through co-operation by the involved parties. A step in the right direction towards achieving zero cloud waste could be implementing a tool incorporating a model similar to the LSTM, saving both managers' time and effort in the process while reducing unnecessary cost and keeping the ecological footprint at a minimum.

C. Other considerations
When analysing results there are a few points that are believed to be very important to bear in mind. In total six considerations are presented.
Firstly, very little data (given the context) was used to train the model. The model still seems to perform well, but the results can likely be improved upon. In the real world one way to improve performance is to collect more data. However, the collection of data must be contrasted with the expected benefit from improved results [22].
In the context of this paper's research, collecting more data requires little effort but a lot of time and patience; Microsoft Azure only stores historical minute-wise data for the past month. What is believed to be more important in this context is that having a model which performs well with as little training data as possible could in itself be an advantage.
Secondly, this paper only uses a univariate LSTM. It is very likely that a multivariate model (a model which uses more than one feature) could produce better results. Examples of features to consider are: hour of the day, day of the week, weekend or workday, holiday or not, and so on. An attempt was made to use a multivariate LSTM but was abandoned at an early stage due to bad performance, which is believed to be due to the curse of dimensionality [22]. The curse of dimensionality entails the need for more data, which could be seen as a disadvantage when evaluating the model holistically.
Thirdly, the data was not very volatile. It followed a somewhat clear pattern over the weeks, and in addition the CPU utilization percentage remained low even during the peaks. It is difficult to say how the model would fare if there were a bigger spread between the peaks and troughs.
Fourthly, the testing data used to evaluate the model still came from the same company. Thus it is not known how well the trained model generalizes to other companies. If it is possible to develop a model that generalizes between similar companies it would be advantageous. This would enable companies without much data of their own, or without the computational capacity to train a model on their existing data, to instead implement a pre-trained model, perhaps eliminating the problems that were mentioned as being associated with collecting more data.
Fifthly, what is considered a good result is very much subjective. When comparing the models it is reasonable to say that the LSTM achieved good results if it outperformed both the RNN and the baseline moving average. In the eyes of a manager, however, it is difficult to say what is regarded as a good enough result. A good enough result could be both higher or lower, and it could differ between managers.
Lastly, if a company were to apply this, there is an important problem that has to be solved for it to be practical. The problem stems from the fact that the measurement used is CPU utilization percentage. If the company adjusts the number of vCores, meaning increased or decreased computational power, the model has no understanding of how this affects the measurement. To solve this, the values must somehow be translated to a different measurement of computational power, which might require collaborating with the cloud service providers.

VIII. CONCLUSION
When embarking upon this research topic the aim was to increase the ability to predict CPU utilization of a cloud server by using an LSTM. Previous research had showed that RNNs perform well with this type of prediction, and this paper shows that LSTM outperforms the RNN, although on a somewhat different dataset with a clearer pattern. With little training data the LSTM was able to obtain results with a high degree of accuracy for one-step predictions as well as up to 30 steps.
Even if a few uncertainties remain that need to be solved before this paper's findings can see practical use without afterthought, the findings should at least serve as a proof of concept. This paper has showed that:
1) LSTM outperforms a standard RNN when predicting CPU utilization,
2) LSTM can predict CPU utilization up to 30 steps (30 minutes) while retaining a high degree of accuracy.

IX. FUTURE WORK
As mentioned, several uncertainties remain which could be investigated in future work by other researchers. These suggestions are:
1) Train an LSTM model with more data and with data of a different pattern,
2) Test a multivariate LSTM model,
3) Evaluate how well a pre-trained model generalizes to a separate – but similar – company or application,
4) Research further how to apply the model in practice and what the practical implications are of a model that can accurately predict CPU utilization on a cloud server.

ACKNOWLEDGMENT
Foremost, we would like to express our sincerest gratitude towards the organizations and people who gave us invaluable support in our research.
A sincere thank you to Kemal Karahmetovic at Afry for always being available to help us when necessary.
We also wish to show our gratitude to the other employees at Afry who helped us on at least one occasion. A big thank you to Patrik Sjölin, Emelie Hedqvist and Peter Wallberg.
We want to thank our supervisors at KTH – Royal Institute of Technology. Thank you Jonas Beskow and Mattias Wiggberg for valuable feedback during our research.

REFERENCES
[1] J. Ward, "The rise and rise of Cloud Computing", ey.com, 2019. [Online]. Available: https://www.ey.com/en_ie/technology/the-rise-and-rise-of-cloud-computing.
[2] "Making the cloud pay: How industrial companies can accelerate impact from the cloud", mckinsey.com, 2020. [Online]. Available: https://www.mckinsey.com/industries/advanced-electronics/our-insights/making-the-cloud-pay-how-industrial-companies-can-accelerate-impact-from-the-cloud#.
[3] S. Lohr, "Cloud Computing Is Not the Energy Hog That Had Been Feared (Published 2020)", Nytimes.com, 2020. [Online]. Available: https://www.nytimes.com/2020/02/27/technology/cloud-computing-energy-usage.html.
[4] M. Armbrust et al., "A view of cloud computing", dl.acm.org, 2021. [Online]. Available: https://dl.acm.org/doi/fullHtml/10.1145/1721654.1721672#T1.
[5] Duggan, Martin; Mason, Karl; Duggan, Jim; Howley, Enda; Barrett, Enda. (2017). Predicting Host CPU Utilization in Cloud Computing using Recurrent Neural Networks. 10.23919/ICITST.2017.8356348.
[6] "What Is Cloud Computing? A Beginner's Guide — Microsoft Azure", Azure.microsoft.com, 2021. [Online]. Available: https://azure.microsoft.com/en-us/overview/what-is-cloud-computing/.
[7] P. Mell and T. Grance, "The NIST definition of Cloud Computing", nist.gov, 2011. [Online]. Available: https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf.
[8] "AWS leads $42 bn global Cloud services market in Q1, Microsoft follows", Business-standard.com, 2021. [Online]. Available: https://www.business-standard.com/article/technology/aws-leads-42-bn-global-cloud-services-market-in-q1-microsoft-follows-121050200180_1.html.
[9] "Amazon EC2 Reserved Instances", Amazon Web Services, Inc., 2021. [Online]. Available: https://aws.amazon.com/ec2/pricing/reserved-instances/.
[10] "Pricing - Azure SQL Database Single Database — Microsoft Azure", Azure.microsoft.com, 2021. [Online]. Available: https://azure.microsoft.com/en-us/pricing/details/azure-sql-database/single/#pricing.
[11] "Amazon S3 Simple Storage Service Pricing - Amazon Web Services", Amazon Web Services, Inc., 2021. [Online]. Available: https://aws.amazon.com/s3/pricing/.
[12] "Knowledge center — Microsoft Azure", Azure.microsoft.com, 2021. [Online]. Available: https://azure.microsoft.com/en-us/resources/knowledge-center/what-is-a-vcore/.
[13] "Optimize CPU options - Amazon Elastic Compute Cloud", Docs.aws.amazon.com, 2021. [Online]. Available: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-optimize-cpu.html.
[14] "Machine types — Compute Engine Documentation — Google Cloud", Google Cloud, 2021. [Online]. Available: https://cloud.google.com/compute/docs/machine-types.
[15] S. Stein, "Översikt över köpmodell för virtuella kärnor - Azure SQL Database & Azure SQL Managed Instance" (in Swedish: "Overview of the vCore purchase model"), Docs.microsoft.com, 2021. [Online]. Available: https://docs.microsoft.com/sv-se/azure/azure-sql/database/service-tiers-vcore?tabs=azure-portal.
[16] Mansor, Zulkefli; Razali, Rozilawati; Yahaya, Jamaiah; Yahya, Saadiah; Arshad, Noor. (2016). Issues and Challenges of Cost Management in Agile Software Development Projects. Advanced Science Letters. 22. 1981-1984. 10.1166/asl.2016.7752.
[17] R. N. Calheiros, E. Masoumi, R. Ranjan and R. Buyya, "Workload Prediction Using ARIMA Model and Its Impact on Cloud Applications' QoS," in IEEE Transactions on Cloud Computing, vol. 3, no. 4, pp. 449-458, 1 Oct.-Dec. 2015, doi: 10.1109/TCC.2014.2350475.
[18] Cao, J., Fu, J., Li, M. and Chen, J. (2014), CPU load prediction for cloud environment based on a dynamic ensemble model. Softw. Pract. Exper., 44: 793-804. https://doi.org/10.1002/spe.2231
[19] L. Nashold and R. Krishnan, "Using LSTM and SARIMA Models to Forecast Cluster CPU Usage", arXiv.org, 2021. [Online]. Available: https://arxiv.org/abs/2007.08092.
[20] Hamilton, James Douglas. Time Series Analysis. Princeton University Press, 1994.
[21] T. Mitchell, Machine Learning. New York: McGraw-Hill, 1997.
[22] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning. MIT Press, 2016.
[23] Elman, Jeffrey L. (1990). "Finding Structure in Time". Cognitive Science. 14 (2): 179-211. doi:10.1016/0364-0213(90)90002-E
[24] Hochreiter, Sepp. "The vanishing gradient problem during learning recurrent neural nets and problem solutions." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6.02 (1998): 107-116.
[25] Sepp Hochreiter; Jürgen Schmidhuber (1997). "Long short-term memory". Neural Computation. 9 (8): 1735-1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276. S2CID 1915014
[26] Werbos, Paul J. "Backpropagation through time: what it does and how to do it." Proceedings of the IEEE 78.10 (1990): 1550-1560.
[27] D. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization", arXiv.org, 2021. [Online]. Available: https://arxiv.org/abs/1412.6980.
[28] A. Ng, Machine Learning Yearning, 1st ed. 2018.
[29] C. Willmott and K. Matsuura, "Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance", Int-res.com, 2021. [Online]. Available: https://www.int-res.com/articles/cr2005/30/c030p079.pdf.
[30] Abenius, T., 2021. Envägs variansanalys (ANOVA) för test av olika väntevärde i flera grupper (in Swedish: "One-way analysis of variance (ANOVA) for testing different expected values in several groups"). [Online]. Math.chalmers.se. Available at: <http://www.math.chalmers.se/Stat/Grundutb/CTH/lma136/1112/25-ANOVA.pdf>
[31] Kuck, David (1978). Computers and Computations, Vol 1. John Wiley & Sons, Inc. p. 12. ISBN 978-0471027164.
[32] "Keras documentation: About Keras", Keras.io, 2021. [Online]. Available: https://keras.io/about/.

AUTHORS
Axel Rooth is currently studying Industrial Engineering and Management and is about to pursue a Master's degree in machine learning at KTH Royal Institute of Technology in Stockholm, Sweden. He contributed to all parts of the research paper and was mainly responsible for implementing the algorithms.
"Hacking life since 98"

Filip Nääs Starberg is currently studying Industrial Engineering and Management and is about to pursue a Master's degree in machine learning at KTH Royal Institute of Technology in Stockholm, Sweden. He is a good person and wants a career where he can earn a living. He contributed to all parts of the research paper but was mainly responsible for researching the cloud computing market.
"Earning cred since 98"
TRITA-EECS-EX-2021:366