Metricas Web
Duarte Duarte
Customers interact with e-commerce websites in multiple ways and the companies operating them
rely on optimizing success metrics for profit. Changing what, how and when content such as
product recommendations and ads are displayed can influence customers’ actions.
Multiple algorithms and techniques in data mining and machine learning have been applied in
this context. Summarizing and analysing user behaviour can be expensive and tricky, since it is hard
to extrapolate patterns that never occurred before and the causal aspects of the system are usually
not taken into consideration. Commonly used online techniques have the downside of a high
operational cost. However, there have been studies on characterizing user behaviour and
interactions in e-commerce websites that could be used to improve this process.
The goal of this dissertation is to create a framework capable of running a multi-agent simulation,
regarding users of an e-commerce website as agents that react to stimuli that influence their actions.
By taking input from web mining, which includes both static and dynamic content of websites as
well as user personas, the simulation should collect success metrics so that the experiment
being run can be evaluated.
Resumo
Customers interact with e-commerce websites in multiple ways, and the companies operating
them rely on optimizing success metrics such as CTR (Click-through Rate), CPC (Cost per
Conversion), Basket and Lifetime Value and User Engagement for profit. Changing how, where
and when web page content such as product recommendations and advertising is displayed can
influence customers' actions.
Multiple algorithms and techniques in data mining and machine learning have been applied in
this context. Summarizing and analysing user behaviour can be costly and complicated, because
it is hard to extrapolate patterns that never occurred before and the causal aspects of the system
are usually not taken into consideration. Commonly used online techniques have the problem of
a high operational cost. However, there are studies on characterizing user behaviour and
interactions in e-commerce websites that can be used to improve this process.
The goal of this dissertation is to create a framework capable of running a multi-agent
simulation, taking into account the users of an e-commerce website, who react to stimuli that
influence their actions. By extracting data from web mining, which includes both static and
dynamic content of websites as well as user profiles, the simulation should report success
metrics so that the experiment can be evaluated.
“In recent years, hundreds of the brightest minds of modern
civilization have been hard at work not curing cancer.
Instead, they have been refining techniques
for getting you and me to click on banner ads.”
Steve Hanov
Contents
1 Introduction
  1.1 Context
  1.2 Motivation
  1.3 Problem and goals
  1.4 Report Structure

5 Implementation
  5.1 Methodology
  5.2 Requirements
    5.2.1 Website Representation
    5.2.2 Navigation Agents
    5.2.3 Website Agents
    5.2.4 Simulation Engine
    5.2.5 Reporting
    5.2.6 Limitations
  5.3 Architecture
    5.3.1 Multi-agent Architecture
    5.3.2 Simulation Engine
    5.3.3 Class Model
    5.3.4 Graphical User Interface
  5.4 Scalability
  5.5 Technology

6 Validation
  6.1 Sanity checks
    6.1.1 Expected number of agents in the simulation
    6.1.2 Expected number of visits
    6.1.3 Expected bounce rate
  6.2 Online store
    6.2.1 Input data and configuration
    6.2.2 Simulation
    6.2.3 Results

References
List of Figures
List of Tables
2.1 Main building blocks of Web experience and their sub-categories [Con04]
5.1 Simulation running time (in seconds) for different numbers of navigation agents and simulation steps
Abbreviations
Chapter 1
Introduction
This chapter introduces the report, describing the context, motivation and objectives that drive
the dissertation. It ends with a description of the report structure.
1.1 Context
Customers interact with e-commerce websites in multiple ways and the companies operating them
rely on optimizing success metrics such as CTR (Click through Rate), CPC (Cost per Conversion),
Basket and Lifetime Value and User Engagement for profit. Changing what, how and when content
such as product recommendations and ads are displayed can influence customers’ actions.
Multiple algorithms and techniques in data mining and machine learning have been applied in
this context.
1.2 Motivation
Modelling user behaviour on the web is not a new problem. It has been applied with different
objectives, from improving the performance of cache servers to improving search engines,
influencing purchase patterns or recommending related pages or products [DK01, JS00]. However,
all these approaches were done with a machine learning mindset: predicting which page the
user or customer will browse next. This requires extensive use of existing and historical training
datasets, which might not expose all the causal aspects of the system. What if the data (or the
time needed to gather it) is simply not available?
Let us imagine that we developed a new recommendation engine algorithm. One of the most common
ways to evaluate it is A/B testing1, a randomized experiment where one group of users is
presented with one version of the engine (control) and the other group is shown the improved
version. By analysing how the two groups behave differently, it is possible to assess the
relative quality of the two versions. However, this approach may not be feasible in all
situations. For the experiment to be statistically significant, the number of users shown each
version of the product must be large enough. The experiment also takes time to run, and the
metrics used to compare both versions might not be easy to choose [Ama15].
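As an aside, checking whether an observed difference in conversions is statistically significant is commonly done with a two-proportion z-test. The sketch below is illustrative only; the function name and the figures are ours, not taken from the dissertation:

```python
from math import sqrt, erf

def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test on conversion counts of variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)           # pooled conversion rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # standard error under H0
    z = (p_b - p_a) / se
    p_value = 1 - erf(abs(z) / sqrt(2))           # two-sided p-value
    return z, p_value

# 5% vs 6% conversion over 10,000 users per group
z, p = ab_test_z(500, 10_000, 600, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these (invented) numbers the difference is significant at the usual 5% level, which illustrates why group sizes matter: the same 1-point lift over a few hundred users would not be.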
The goal of this dissertation is to create a framework capable of running a multi-agent
simulation (chapter 3), regarding users of an e-commerce website as agents that react to stimuli
that influence their actions (chapter 2). Furthermore, some statistical constructs such as Bayesian
networks, Markov chains or probability distributions (chapter 4) can be used to guide how these
agents interact with the system. By taking input from web mining (Web structure mining (WSM), Web
usage mining (WUM) and Web content mining (WCM)), which includes both static and dynamic
content of websites as well as user personas, the simulation should collect success metrics so that
the experiment being run can be evaluated.
This dissertation is focused on the framework for the simulation and not on the required input
of the simulation; however, that input is a very important aspect. Fortunately, web mining has been
well studied. The work of [Dia16] proposes a methodology to extract and combine the sheer amount of
data related to an e-commerce website, including structure, content and user behaviour models.
The final chapter, chapter 7, concludes the work realized, presents the main contributions of
this dissertation and proposes future work.
Chapter 2
Literature Review: E-commerce background
In this chapter we discuss some key concepts related to e-commerce, for the purpose of giving
context to the dissertation. We discuss the typical customer life cycle in an e-commerce website,
some metrics that might be used, and some ways in which the customer interaction with the website
might be influenced and improved.
2.1 Introduction
E-commerce, or electronic commerce, can be described as the trading of products or services over
the Internet (or other computer networks). The types of e-commerce businesses we are interested
in are those that sell their goods directly to the customer, e.g. online shopping, using an online
store or catalog of products. Some popular online stores [AI16] are Amazon1 , Ebay2 and Alibaba3 .
• Reach happens outside of the website and refers to the number of potential customers. For
example, if the online store is advertised on a social network, the reach is the number of
users who were served the ad on that other website; they may or may not ignore it.
1 http://www.amazon.com/
2 http://www.ebay.com/
3 http://www.alibaba.com/
Literature Review: E-commerce background
[Figure: stages of the customer life cycle: Reach, Acquisition, Conversion, Retention, Loyalty, Abandonment, Attrition, Churn and Reactivation]
• Acquisition is the next stage, where the user decides to act on and visits the website (or
some other action like subscribing to a newsletter).
• Conversion is the stage where a visitor stops being a user and starts being a customer. It
usually means that the user made a purchase but some companies might consider a sign up
or registration in the website as a conversion.
• Retention focuses on making existing customers, that made at least one purchase before,
repeat purchases.
• Loyalty is a stronger form of retention, which represents a greater trust level of the customer
in the store.
• Abandonment is defined by the customers that started the buying process but do not finish
it. For example, a customer may add items to the online shopping cart but instead of moving
to the next step, e.g. enter credit card details, they exit the website or go elsewhere. This
may happen in any store with a multi-step buying process, which is very common.
• Attrition happens when a retained customer ceases buying from the store and starts using a
competitor store.
• Churn is defined by the number of customers that attrited during a certain period divided
by the total number of customers at the end of that period. It measures how much of the
customer base "rolls over" in a certain time period.
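The churn definition above translates directly into a formula; a minimal sketch (the function name and the figures are ours):

```python
def churn_rate(attrited_customers: int, customers_at_period_end: int) -> float:
    """Fraction of the customer base that 'rolled over' during the period:
    customers that attrited divided by total customers at the period's end."""
    if customers_at_period_end == 0:
        return 0.0
    return attrited_customers / customers_at_period_end

# 40 customers attrited out of a base of 800 at the end of the quarter
print(churn_rate(40, 800))  # 0.05
```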
[MAFM99] describes how Customer Behavior Model Graphs (CBMGs) can be used to analyse the
workload of an e-commerce store server and how metrics can be derived directly from the CBMG alone.
• Conversion Rate (CR) is the percentage of visitors that buy a product or a service;
• Shopping Cart Abandonment is the percentage of visitors that added a product to the online
cart but did not complete the process;
• Customer Lifetime Value (LTV) is the projected value that a customer will spend on the
store;
• Clicks to Buy (CTB) is the average number of clicks a visitor has to do to complete a buy
order;
• Churn Rate is the percentage of customers that do not make a repeat purchase;
• Bounce Rate is the percentage of visitors that arrive at the homepage of the online store but
leave immediately, without clicking anything or visiting a different page.
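Assuming a simple per-session log (the field names below are hypothetical, not the framework's), several of these metrics can be computed in a few lines:

```python
# Each session records pages viewed, whether a product was added to the cart,
# and whether a purchase was completed. All data here is invented.
sessions = [
    {"pages": 1, "added_to_cart": False, "purchased": False},  # a bounce
    {"pages": 5, "added_to_cart": True,  "purchased": True},
    {"pages": 3, "added_to_cart": True,  "purchased": False},  # abandoned cart
    {"pages": 2, "added_to_cart": False, "purchased": False},
]

n = len(sessions)
conversion_rate = sum(s["purchased"] for s in sessions) / n
bounce_rate = sum(s["pages"] == 1 for s in sessions) / n
carts = [s for s in sessions if s["added_to_cart"]]
cart_abandonment = sum(not s["purchased"] for s in carts) / len(carts)

print(conversion_rate, bounce_rate, cart_abandonment)  # 0.25 0.25 0.5
```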
There are other common metrics such as Acquisition Cost, Cost per Conversion, Net Yield or
Connection Rate; however, they are associated with promotion campaigns that happen outside of
the store website and are therefore not interesting in the context of our work.
Table 2.1: Main building blocks of Web experience and their sub-categories [Con04]

Functionality factors
  Usability: convenience; site navigation; information architecture; ordering/payment process; search facilities and process; site speed; findability/accessibility
  Interactivity: customer service; interaction with company personnel; customization; network effects

Psychological factors
  Trust: transaction security; customer data misuse; customer data safety; uncertainty reducing elements; guarantees/return policies

Content factors
  Aesthetics: design; presentation quality; design elements; style/atmosphere
  Marketing mix: communication; product; fulfillment; price; promotion; characteristics
Regarding usability of the online store, providing a personalized experience to each customer
can be very beneficial for both the customer and the business. A common way to do this is by
recommending products that the customer might be interested in [AT05]. For example, if we
know that a customer buys mostly football related products, recommending her more products in
the same category might increase sales.
2.6 Summary
In this chapter we gave a brief overview of e-commerce, starting with the customer life cycle,
how to measure it using metrics, and presenting a common way to model users' behaviour, the
CBMG.
Chapter 3
Literature Review: System Simulation
This chapter intends to introduce some approaches to computational simulation systems and en-
gines, namely agent based and discrete event simulation. To finish the chapter, we show some
novel approaches to simulation.
3.1 Introduction
Simulations are used to reproduce the behaviour of a system. They have been applied to different
areas like physics, weather, biology, economics and many others. There are many types of sim-
ulations: stochastic or deterministic, steady-state or dynamic, continuous or discrete and local or
distributed [Wik15]. These categories are not exhaustive nor exclusive.
In this literature review, we are particularly interested in studying simulations which can model
stochastic processes rather than dynamic systems (dynamic systems are usually described by
differential equations and are continuous by definition).
In agent based simulation (ABS), sometimes described as agent based computing [Woo98, Jen99],
the individual entities in the model are represented discretely and maintain a set of behaviours,
beliefs or rules that determine how their state is updated. [Nia11] lists three different approaches
to agent based modelling and simulation:
• Multi-agent oriented programming focuses on adding some intelligence to agents and observing
their interactions;
Literature Review: System Simulation
• Agent-based or massively multi-agent modelling where the main idea is to build simple
models for the agents which interact with a large population of other agents to observe the
global behaviour.
[SA10] describes ABS as “well suited to modelling systems with heterogeneous, autonomous
and pro-active actors, such as human-centred systems”, which makes it a good candidate to
be used in the development of this dissertation. However, the existing literature is quite confusing
and broad, using different terms to refer to the same concepts, without clear distinctions between
different agent based approaches and without consensus [Nia11, Bra14].
Many platforms and frameworks were developed to support agent-based modelling and sim-
ulation. Some notable examples include Repast [Col03], NetLogo [WE99], StarLogo [Res96] or
MASON [PL05]. An updated list is maintained at OpenABM [Ope16].
Agents have been applied to the e-commerce context mostly in two distinct areas: recommendation
systems [XB07, WBS08] and negotiation [RKP02, MGM99]. No relevant literature was
found regarding simulating user behaviour in websites with agents.
• Event, an occurrence that changes the state of the system (e.g a customer enters the website);
• Event list (or future event list or pending event set), a list of future events, ordered by time
of occurrence;
The event list is one of the fundamental parts of the system and has been widely researched
[HOP+86, Jon86, TT00, DGW13].
Pidd [Pid98] proposes a three-phase approach that consists of: jumping to the next chronological
event, executing all the unconditional events (or type B) that happen at that moment and then
executing all the conditional events (or type C). This approach uses fewer resources than
simpler approaches. There have also been studies on how
to scale DES to distributed and parallel (PDES) executions [Mis86, Fuj90].
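Pidd's three-phase loop can be sketched as follows. This is a minimal illustration under our own naming conventions, not code from any particular DES framework:

```python
import heapq
from itertools import count

_tick = count()  # tiebreaker so simultaneous events never compare callbacks

def schedule(heap, time, callback):
    heapq.heappush(heap, (time, next(_tick), callback))

def three_phase_run(b_events, c_activities, until):
    """Three-phase loop after Pidd: A) jump to the next chronological event,
    B) run all unconditional (type B) events due now, C) run every
    conditional (type C) activity whose condition holds."""
    clock = 0.0
    while b_events and b_events[0][0] <= until:
        clock = b_events[0][0]                  # A phase: advance the clock
        while b_events and b_events[0][0] == clock:
            _, _, do = heapq.heappop(b_events)  # B phase: bound events
            do(clock)
        for cond, do in c_activities:           # C phase: conditional events
            if cond():
                do(clock)
    return clock

# a customer arrives at t=1 and checks out at t=3
events, log = [], []
schedule(events, 1.0, lambda t: log.append(("arrive", t)))
schedule(events, 3.0, lambda t: log.append(("checkout", t)))
three_phase_run(events, c_activities=[], until=10.0)
print(log)  # [('arrive', 1.0), ('checkout', 3.0)]
```

The counter in each heap entry is the detail that keeps the event list well ordered when two events share a timestamp, which is exactly the case the B phase must handle.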
[SMG+ 10] states that “DES is useful for problems (...) in which the processes can be well
defined and their emphasis is on representing uncertainty through stochastic distributions”, which
makes DES a good candidate to model the problem at hand.
3.5 Summary
This literature review shows that there is vast research regarding simulation, either agent based
or DES; however, not everyone is speaking the same language. The extensions to DES seen above
are particularly interesting since they can be used to scale the simulation to a greater number of
entities as well as to model real-world processes with more fidelity.
Chapter 4
Literature Review: Probabilistic Models
4.1 Introduction
Probabilistic or statistical models represent explicit assumptions about a problem domain. Such
a model usually encompasses random variables1 , in the form of probability distributions, and the
relations and dependences between the variables [WB13].
In the following sections we describe a common way to represent probabilistic models:
probabilistic graphical models (PGM) or, simply, graphical models.
A PGM is a graph based model where the nodes represent random variables and the (directed or
undirected) edges represent a conditional dependence between variables. An example is shown in
figure 4.1.
PGMs and their extensions, examples of which we show in the following sections, are
exceptionally well suited for reasoning and for reaching conclusions based on available information
(both domain expertise and data), even in the presence of uncertainty. PGMs provide a general
framework that allows representation, inference and learning on these models [KF09].
There is extensive research and available literature in this area. Some notable examples in-
clude, but are not limited to, the books "Probabilistic Graphical Models: Principles and Tech-
niques" by Daphne Koller and Nir Friedman [KF09] and "Pattern Recognition and Machine
Learning" (Chapter 8: Graphical Models) by Christopher Bishop [Bis06]. It is also worth
mentioning that there is a MOOC2 named "Probabilistic Graphical Models", also by Daphne Koller
(Stanford), freely available on Coursera3 .
1 Variable whose value is given by a probability distribution, commonly represented by Θ.
2 Massive Open Online Course
3 https://www.coursera.org/course/pgm
Literature Review: Probabilistic Models
In the following sections, we describe three important categories of graphical models: Bayesian
networks, Markov random fields and their extension to hidden Markov models. There are plenty of
other graphical models; however, they were deemed not relevant enough to be included in this
literature review.
Formally [Pea88], a Bayesian network B represents a joint probability distribution (JPD) over
a set of variables U and can be defined by a pair B = \langle G, \Theta \rangle. G is a DAG
(directed acyclic graph) whose vertices represent the random variables X_1, \dots, X_n, and
\Theta represents the set of parameters that quantify the network: for each possible value x_i
of X_i and \pi_{x_i} of \Pi_{X_i} (the set of parents of X_i in G), it contains a parameter
\theta_{x_i \mid \pi_{x_i}} = P_B(x_i \mid \pi_{x_i}). Therefore, the JPD can be defined as

P_B(X_1, \dots, X_n) = \prod_{i=1}^{n} P_B(X_i \mid \Pi_{X_i}) = \prod_{i=1}^{n} \theta_{X_i \mid \Pi_{X_i}}   (4.3)

which expresses the factorization properties of the JPD. [Bis06, section 8.1] goes into detail on
how to apply equation 4.3.

4 H: hypothesis or assumptions on which the probabilities are based
These properties of Bayesian networks make them an excellent tool for expressing causal
relationships. Heckerman [Hec96] lists multiple advantages of Bayesian networks for modelling and data
analysis: “readily handles situations where some data entries are missing”, “gain understanding
about a problem domain and to predict the consequences of intervention”, “ideal representation
for combining prior knowledge and data” and “efficient and principled approach for avoiding the
overfitting of data”.
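To make equation 4.3 concrete, a toy two-variable network can be factorized in a few lines. The network and all probabilities below are made-up illustrations, not data from this dissertation:

```python
# Toy Bayesian network Rain -> WetGrass; all probabilities are invented.
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {True:  {True: 0.9, False: 0.1},
                    False: {True: 0.2, False: 0.8}}

def joint(rain: bool, wet: bool) -> float:
    """Equation 4.3: the joint is the product of each node's
    conditional distribution given its parents."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# the factorized joint sums to 1 over all assignments
total = sum(joint(r, w) for r in (True, False) for w in (True, False))
print(joint(True, True), total)
```

The point of the factorization is that the full joint never has to be stored: only the (much smaller) per-node conditional tables are.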
Regarding the area of e-commerce specifically, some research has been done where Bayesian
networks are applied. [NMK14] is an attempt at predicting sales in e-commerce using social media
data. [MCGM02] also proposes a Bayesian based model to predict online purchasing behaviour
using navigational clickstream data.
Markov random fields (MRF) or Markov networks are undirected graphical models [Kin80] (in
contrast to Bayesian networks which are directed and acyclic). The nodes still represent variables
or group of variables however the links do not carry arrows. The concept was originally proposed
as the general setting for the Ising model5 [Kin80]. Again, Bishop [Bis06] provides a very good
overview of this topic.
MRFs factorize as

p(x_1, \dots, x_n) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C)   (4.4)

where C is a clique6 of the graph, x_C is the set of variables in that clique, Z is a constant
used to normalize the distribution (it might be defined for each x) and \psi_C is a compatibility
or potential function [WJ08, section 2.1.2] [Bis06, section 8.3]. Equation 4.4 highlights an
important property of MRFs: the Markov property or memoryless property. That is, the conditional
probability distribution of future states depends only on the present state.
Markov models were shown to be well suited for modelling and predicting e-commerce pur-
chasing and user’s browsing behaviour [DK01].
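As an illustration of such a model, a first-order transition matrix can be estimated directly from observed page sequences. The sketch below uses invented page names and clickstreams:

```python
from collections import Counter, defaultdict

def transition_matrix(clickstreams):
    """Estimate first-order Markov transition probabilities from
    page sequences: P(next page | current page)."""
    counts = defaultdict(Counter)
    for stream in clickstreams:
        for src, dst in zip(stream, stream[1:]):
            counts[src][dst] += 1
    matrix = {}
    for src, nxt in counts.items():
        total = sum(nxt.values())
        matrix[src] = {dst: c / total for dst, c in nxt.items()}
    return matrix

streams = [["home", "product", "cart"],
           ["home", "product", "home"],
           ["home", "search", "product"]]
P = transition_matrix(streams)
print(P["home"])  # home -> product 2/3, home -> search 1/3
```

The memoryless property shows up directly in the data structure: the estimate for the next page is conditioned only on the current page, never on the whole path.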
Hidden Markov models (HMMs) are PGMs with unobserved or hidden states. They are considered
a dynamic Bayesian network7 . They were originally defined in the 1960s by Baum and
colleagues [BP66]. [Rab89] defines HMMs as “the resulting model (...) is a doubly embedded
stochastic process that is not observable, but can only be observed through another set of stochastic
processes that produce the sequence of observations”.
A common example found in the literature is the Coin Toss Model [Rab89]: imagine someone on
one side of a curtain performing a coin (or multiple coin) tossing experiment. The person
will not tell us what she is doing, only the outcome of each coin flip (heads or tails). Multiple
HMMs can be built to explain the coin toss outcomes, i.e., assuming that one, two or more biased
coins are being used in the experiment. Figure 4.2 shows a possible model that accounts for 3
coins being tossed.
• N which is the number of states in the model, where individual states are represented by
S = \{S_1, \dots, S_N\} and the state at time t is q_t;
• M which is the number of distinct observation symbols per state (individual symbols are
represented by V = \{V_1, \dots, V_M\});

a_{ij} = p(q_{t+1} = S_j \mid q_t = S_i), \quad 1 \leq i, j \leq N   (4.5)

\pi_i = p(q_1 = S_i), \quad 1 \leq i \leq N   (4.7)
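Given only the initial distribution π (equation 4.7) and the transition matrix A (equation 4.5), a hidden state sequence can already be sampled. The sketch below assumes a two-state model with made-up probabilities:

```python
import random

def sample_states(pi, A, T, seed=42):
    """Sample a hidden state sequence q_1..q_T using pi (eq. 4.7)
    for the first state and A (eq. 4.5) for each transition."""
    rng = random.Random(seed)

    def draw(dist):
        r, acc = rng.random(), 0.0
        for state, p in enumerate(dist):
            acc += p
            if r < acc:
                return state
        return len(dist) - 1  # guard against floating-point rounding

    q = [draw(pi)]
    for _ in range(T - 1):
        q.append(draw(A[q[-1]]))  # next state depends only on the current one
    return q

# two hidden states that strongly prefer to stay where they are
pi = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.1, 0.9]]
print(sample_states(pi, A, T=10))
```

Attaching the observation symbols would additionally require the per-state emission probabilities, which we leave out here.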
4.6 Summary
In this section we reviewed the literature on graphical models. They are an excellent tool for
modelling real-world phenomena, enabling decision making under uncertainty and noisy observations.
There are multiple categories of graphical models; however, we focused on Bayesian and Markov
networks and hidden Markov models, due to their applicability to the work at hand (in chapter 5
we define how PGMs can be applied).
Chapter 5
Implementation
This chapter presents the implementation of the framework. It starts by describing the
methodology followed in the realization of the dissertation, followed by the identification of
requirements, the architecture of the framework and a study of the scalability of the solution,
and ends with a description of the technology used in the implementation.
5.1 Methodology
Like any software development project, a simulation project also has a life cycle. In this section
we describe the steps to apply in the simulation methodology, based on Ulgen et al. [UBJK94] and
Banks et al. [BCNN04, section 1.11], which can be summarized as follows:
1. Problem formulation: Clear statement of the problem by the analyst and stakeholders;
2. Setting of objectives and overall project plan: Questions to be answered by the simulation,
plans for the study, cost and number of days for each phase, with the results expected at
each stage;
3. Model conceptualization: Select, modify and iterate over the assumptions that characterize
the system;
4. Data collection: Collect the necessary data to run and validate the model, assuming that
required data will change with the increasing complexity of the system;
5. Model translation: Translating the conceptual model into a computer program;

6. Verification: Making sure that the program behaves correctly according to its inputs;
7. Validation: Calibration of the model, comparing the model against the actual system;

8. Experimental design: Deciding which alternatives and scenarios to simulate and how each run is configured;
9. Production runs and analysis: Estimate measures of performance for the systems that are
being simulated;
10. Documentation and reporting: Document both the program and the progress of the study;
11. Implementation: End result of the study, including the entire simulation process.
5.2 Requirements
We have looked at several e-commerce websites, both national and worldwide, like Amazon1 ,
eBay2 , PCDIGA3 , Clickfiel4 and KuantoKusta5 , and analysed features and characteristics common to
all of them, in order to better assess what the framework should be able to represent and model.
To keep things simple and realistically implementable in the given time frame, some limitations
had to be imposed. This section lists the requirements and assumptions of the framework.
• The common entry point is named homepage but it is possible to enter the website directly
from a different page;
• Pages have a purpose, like displaying information about a product, listing multiple products,
informing about warranty and payment of products and services, etc. We categorize the
pages by using tags;
• Product pages have, at least, the product name, its description and price;
• A virtual shopping cart is used as a staging area for the products that are going to be bought;
• Checkout is the act of taking all the products in the shopping cart and effectively buying and
paying for them;
• Usually, a customer has to create an account and log in to the website in order to buy
something or access restricted pages.
1 https://www.amazon.com/
2 http://www.ebay.com/
3 https://www.pcdiga.com
4 http://clickfiel.pt/
5 http://www.kuantokusta.pt/
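The requirements above suggest a simple data model for the website. The sketch below is ours; class and field names are illustrative assumptions, not the framework's actual API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Product:
    name: str          # product pages have at least name, description, price
    description: str
    price: float

@dataclass
class Page:
    url: str
    tags: set = field(default_factory=set)     # what the page is for
    links: list = field(default_factory=list)  # outbound hyperlinks (Page objects)
    product: Optional["Product"] = None        # set only on product pages

@dataclass
class Website:
    homepage: Page                             # the common entry point
    pages: list = field(default_factory=list)

phone = Product("Phone X", "A fine phone", 499.0)
home = Page("/", tags={"homepage"})
detail = Page("/phone-x", tags={"product"}, product=phone)
home.links.append(detail)
site = Website(homepage=home, pages=[home, detail])
print(site.homepage.links[0].product.name)  # Phone X
```

Note that nothing forces entry at the homepage: any `Page` can be a starting point, matching the first requirement above.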
[Figure: flowchart of the simulation methodology: problem formulation; setting of objectives and overall project plan; model conceptualization and data collection; model translation; verification (repeated until verified); validation (repeated until validated); experimental design; production runs and analysis (with additional runs as needed); documentation and reporting; implementation]
• A simulation run can have thousands of navigation agents entering the simulation at each
step;
5.2.5 Reporting
• Once a simulation run ends, it can be analysed by taking a look at its results, metrics and
other previously stored characteristics;
• At least two simulation runs can be put side by side so they can be quickly compared;
• The calculated metrics should be relevant to the business; some examples are [Wat15]:
– Bounce rate
– Conversion rate
– Total/average order value
– Average order value
– Items per order
– New visitor conversion rate
– Shopping cart sessions
– Shopping cart conversion rate
– Shopping cart abandonment rate
– Average session length
– Number of browsing sessions
– Page views per session
– Product views per session
5.2.6 Limitations
Some requirements did not make it to the actual implementation:
• Adding a product to the cart and the checkout are a single step;
• There are no customer accounts, logins, registration, sign ups or sign ins;
• Visual information about the pages and products is not represented, e.g., a customer cannot
pick a product to buy because its associated picture is appealing;
• Interactions with the website are limited and "hard coded" (listed in sub-section 5.3.1), not
extensible;

• The metrics gathered during the simulation are limited; we have implemented some of the
metrics listed above, but not exhaustively.
5.3 Architecture
5.3.1 Multi-agent Architecture
The simulation framework encompasses two different kinds of agents, navigation agents and
website agents, as shown in figure 5.2.
Navigation agents represent users interacting with the website. They have a limited view of
the system: they have access to the website (pages and links between them) and they know the
current page they are visiting. At each simulation step, the framework asks each navigation agent
which action it will pick. The action may be to visit another page (BrowseToAction), exit
the website (ExitAction), add a product to the cart (AddToCartAction), finish the purchase
(CheckoutAction) or simply do nothing (IdleAction). Also related to the navigation agents
subsystem, an implementation of NavigationAgentFactory is used to decide how many
navigation agents are added to the system in each step. For example, a simplistic implementation
might create a fixed number of navigation agents, or one closer to reality could follow a
Poisson distribution model [GÖ03].
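A NavigationAgentFactory following the Poisson arrival model mentioned above could be sketched as follows; the class name, method and parameters are our own assumptions, not the framework's actual API:

```python
import math
import random

class PoissonAgentFactory:
    """Sketch of a NavigationAgentFactory: the number of navigation agents
    entering the website at each step follows a Poisson(lam) distribution."""

    def __init__(self, lam: float, seed: int = 7):
        self.limit = math.exp(-lam)
        self.rng = random.Random(seed)

    def agents_for_step(self) -> int:
        # Knuth's algorithm: multiply uniforms until the product
        # drops below e^-lam; the number of multiplications is Poisson
        k, p = 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= self.limit:
                return k
            k += 1

factory = PoissonAgentFactory(lam=5.0)
arrivals = [factory.agents_for_step() for _ in range(2000)]
print(sum(arrivals) / len(arrivals))  # close to 5
```

Swapping the factory implementation is enough to change the arrival process of the whole simulation, which is exactly the extension point the text describes.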
Website agents are able to modify the pages before they are served to the users. They have
a broader view of the system than navigation agents. They are notified of all the actions that
navigation agents do. The most common use case of the website agents is to recommend products
to the users: before the page is served to a user, a website agent can modify a section of the page
to display a custom list of products, based on the previous activity of the other users or preferences
of the current user. However they are not limited to only recommendations, a website agent might
replace a page’s content entirely, increase or decrease the price of products (e.g promotions, sales),
do nothing, etc.
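A concrete website agent of the recommending kind could look like the sketch below. The notify/modify_page interface mirrors the description above, but the implementation details are our own assumptions:

```python
from collections import Counter

class PopularityWebsiteAgent:
    """Sketch of a website agent that recommends the most-purchased
    products, based on the activity of all navigation agents."""

    def __init__(self, slots: int = 3):
        self.purchases = Counter()
        self.slots = slots

    def notify(self, nav_agent, action):
        # the agent observes every navigation agent action;
        # here it only counts products in completed checkouts
        if action.get("type") == "checkout":
            self.purchases.update(action["products"])

    def modify_page(self, nav_agent, page):
        # fill the page's recommendation slot before it is served
        page["recommended"] = [p for p, _ in self.purchases.most_common(self.slots)]
        return page

agent = PopularityWebsiteAgent(slots=2)
agent.notify(None, {"type": "checkout", "products": ["tv", "phone", "tv"]})
page = agent.modify_page(None, {"url": "/home"})
print(page["recommended"])  # ['tv', 'phone']
```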
The framework does not assume how these agents behave; however, the interactions between
them are limited. The agents do not send messages to each other and may only interact
indirectly, through the framework (e.g a website agent modifies a page before it is "seen" by a
navigation agent). While a simulation run might have hundreds or thousands of navigation agents,
to simplify, each run only has one website agent instance (this does not impose a limit on the
solution, the agent can still be modelled after a composite agent6 ).
It is out of the scope of the framework to provide concrete implementations of the agents, but
we provide two implementations of navigation agents and three implementations of website agents as a
way to validate and verify the simulation runs. This will be further discussed in chapter 6.
Implementation
The simulation engine follows a fairly standard and simple discrete event simulation architecture,
as described in section 3.3. The domain model we are dealing with allows certain simplifications of the
simulation:
• the event list only contains events scheduled for the next step;
• the events do not depend on each other, so they require no synchronization and the engine
may be implemented single-threaded.
The process that the simulation engine follows is described next. In each simulation loop, the
engine starts by calling NEWNAVIGATIONAGENTS(), which adds new navigation agents to the sim-
ulation. The number and type of these agents are decided by the NavigationAgentFactory.
After that, each navigation agent currently active (i.e. one that did not leave the website) chooses an
action to perform (buy, browse, etc.). Depending on the action that was picked, the engine updates its
internal state. The simulation state is represented by WebsiteState and contains statistics and
other performance metrics. Whenever the picked action implies presenting a page from the website
to the navigation agent, the website agent can modify that page before it is presented, by calling
MODIFYPAGE(NAVAGENT, PAGE). The website agent is also notified about all actions that the
navigation agents perform (NOTIFY(NAVAGENT, ACTION)). The simulation is configured to end after a
fixed number of steps; otherwise it could run forever.
This process is illustrated in figure 5.3.
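The loop can be rendered roughly as follows. This is a simplified, single-threaded sketch with stand-in types and names, not the engine's actual code:

```scala
// Schematic per-step loop, paraphrasing the process described above.
sealed trait Action
case object Browse extends Action
case object Exit extends Action

def runSimulation(maxSteps: Int,
                  newAgentsPerStep: () => List[String],   // stands in for NavigationAgentFactory
                  emitAction: String => Action): Map[String, Int] = {
  var active = List.empty[String]                          // navigation agents still on the website
  var stats = Map("visits" -> 0, "exits" -> 0)             // stand-in for WebsiteState
  for (_ <- 1 to maxSteps) {                               // fixed step count, so the run terminates
    active = active ++ newAgentsPerStep()                  // newNavigationAgents()
    val (stay, gone) = active.partition(a => emitAction(a) != Exit)
    stats = stats
      .updated("visits", stats("visits") + stay.size)
      .updated("exits", stats("exits") + gone.size)
    active = stay
  }
  stats
}
```

The real engine additionally routes each action through the website agent (modifyPage, notify) before updating WebsiteState.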
In this subsection we describe the classes used to represent the entities in the simulation
engine (figure 5.4).
Website represents a website; it contains a set of pages and a reference to its homepage,
the entry point of the website. A Page has a set of links, which are all the outbound hyperlinks
the page contains, and a set of tags, used to categorize the page (e.g. electronics
category, clothing category, cart page, product search page, etc.); the page may also contain a
Product, if it is a product page. A Product has a name, a description and a price.
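One possible Scala encoding of this domain model is sketched below; field names not fixed by the text (such as the use of page ids for links) are assumptions:

```scala
// Sketch of the domain model described above (simplified).
case class Product(name: String, description: String, price: Double)

case class Page(id: String,
                links: Set[String],       // ids of the pages this page links to
                tags: Set[String],        // e.g. "electronics", "cart", "product-search"
                product: Option[Product]) // present only on product pages

case class Website(pages: Map[String, Page], homepageId: String) {
  def homepage: Page = pages(homepageId)                   // the entry point of the website
  def outlinks(p: Page): Set[Page] = p.links.flatMap(pages.get)
}
```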
The Simulation is an abstract class that contains an agenda, which stores all the Actions
(arbitrary functions) to be executed in the next steps. It provides a way to enqueue work in the
simulation using SCHEDULE(DELAY, ACTION) and a RUN() method that consumes the agenda
until there is no more work to do. A subclass of Simulation, WebsiteSimulation repre-
sents a simulation happening over websites. It contains the Website itself, a WebsiteState, a
NavigationUserFactory and a WebsiteAgent. The WebsiteState is used to keep track
of all the statistics and metrics that the simulation produces. This state can be stored in a database
so that the results can be analysed once the simulation is finished.
NavigationAgent is an interface that represents users interacting with the website. Imple-
mentations must provide EMITACTION, which returns the Action the agent wants to
perform based on its internal state and current page. These agents are added to the simulation by
an implementation of NavigationAgentFactory. WebsiteAgent is an interface that repre-
sents the agents that may modify the website and that are notified about all navigation agent
activity. The code for these three interfaces is displayed in listing 5.1.
The mutability of the system is confined to the Simulation (due to its agenda) and to the
WebsiteState, which is updated every simulation step.
The extension points of the framework are the agent interfaces (NavigationAgent,
NavigationAgentFactory and WebsiteAgent) and WebsiteState (e.g. to provide addi-
tional tracking metrics or visualizations).
trait NavigationAgentFactory[+T] {
  def users: Iterator[List[T]]
}

trait NavigationAgent {
  def emitAction(currentPage: Page, website: Website): Action
}

trait WebsiteAgent {
  def modifyPage(page: Page, user: NavigationAgent): Page
  def notifyUserAction(user: NavigationAgent, currentPage: Option[Page],
                       action: Action)
}
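A minimal concrete implementation of NavigationAgent might look like the following random-walk agent. The stand-in Page and Action types below are simplified versions of the framework's own, so the sketch is self-contained:

```scala
import scala.util.Random

// Simplified stand-ins for the framework types used by the traits in listing 5.1.
case class Page(id: String, links: List[String])
sealed trait Action
case class BrowseToAction(pageId: String) extends Action
case object ExitAction extends Action
case object IdleAction extends Action

// Hypothetical navigation agent: exits with probability pExit, otherwise follows a
// random outbound link (or idles on a dead-end page).
class RandomWalkAgent(pExit: Double, rng: Random = new Random()) {
  def emitAction(currentPage: Page): Action =
    if (rng.nextDouble() < pExit) ExitAction
    else currentPage.links match {
      case Nil   => IdleAction
      case links => BrowseToAction(links(rng.nextInt(links.size)))
    }
}
```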
• The simulation list view (GET /simulations) (figure A.1) displays a table with all the
simulation runs stored in the database. It shows the identifier, name, agent types and
timestamp of each run.
• The detail view (GET /simulations/<id>) (figure A.2) displays information regarding
a single simulation run. This info describes the simulation and contains the types of the
agents used, start and finish times of the simulation, collected metrics (e.g. bounce rate,
conversion rate, total order value, etc.), visits per page, visits per page category,
purchases per product and others. This information is displayed using mostly tables and
charts.
5.4 Scalability
To assess the scalability and performance of the simulation engine, a set of benchmarks was run;
they are described next. The tests were run on a Windows 10 laptop with an Intel Core i7-4710HQ
CPU @ 2.50GHz (8 CPUs). A modified7 version of the library Benchmark.scala8 was used, which
is based on Ruby's Benchmark module9 . The focus is not necessarily on the raw speed of the
engine but rather on how the simulation time varies when the number of agents in the system or
the number of simulation steps increases.
[Figure 5.4: class diagram of the simulation entities: the Action hierarchy, WebsiteAgent,
NavigationAgent, NavigationAgentFactory, Website, Page, Product, Simulation,
WebsiteSimulation and WebsiteState.]
The test performed consists of running the same simulation with an increasing number of
navigation agents and number of simulation steps, set up in the following way:
7 Changed each measurement to run the same block of code 10 times, drop the first 2 runs and take the average of
the 8 runs instead of running it only once.
8 https://github.com/balagez/Benchmark.scala
9 http://ruby-doc.org/stdlib-1.9.2/libdoc/benchmark/rdoc/Benchmark.html
• Website: Toy sample website with 9 pages and 32 total links between pages (1 homepage,
1 cart page, 3 product list pages and 4 product pages);
• Navigation agent: Sample agent implementation which picks the next action randomly,
configured with a chance of exiting the website of 1/3 and a chance of adding a product to
the cart of 1/20;
The result of the 100 simulation runs is shown in figure 5.5 (whose data is in table 5.1). A
quick analysis shows that the simulation time scales linearly (mean R² = 0.99149, σ = 0.00648) with
both the number of agents and the number of simulation steps. For instance, a simulation with
1000 steps and 10000 navigation agents (entering the system each step) took 41.95 seconds. These
initial results are very satisfactory; however, they should be improved, especially when the number
of steps is increased, so that simulations that span a longer period of time can be evaluated (e.g.
simulate the effect of seasonal customers over an entire year).
Figure 5.5: Simulation running time for different numbers of navigation agents and simulation
steps (per-series linear fits with R² between 0.9768 and 0.9979)
5.5 Technology
Scala10 was the language of choice to implement the framework and accompanying projects. Scala
is a statically typed, general purpose programming language that leverages both object oriented
10 http://www.scala-lang.org/
Table 5.1: Simulation running time (in seconds) for different numbers of navigation agents and
simulation steps

Steps \ Agents   1000   2000   3000   4000   5000   6000   7000   8000   9000   10000
100 0.62 0.69 1.45 1.72 1.81 2.34 2.76 3.23 3.42 4.16
200 0.67 1.43 2.20 3.17 3.84 4.49 5.22 6.30 6.93 7.89
300 0.95 2.07 3.90 4.44 5.72 7.01 8.05 11.62 11.89 12.53
400 1.32 2.91 4.42 5.88 7.69 9.39 12.01 12.93 14.15 16.18
500 1.58 3.48 5.34 7.49 11.03 11.44 13.65 15.77 18.20 20.00
600 2.10 4.31 6.59 9.26 11.52 14.71 19.44 21.47 22.83 24.29
700 2.55 5.24 7.97 11.14 13.89 18.91 18.89 22.04 26.33 33.35
800 2.56 6.10 10.46 14.48 16.31 18.56 24.49 27.52 31.19 36.26
900 2.77 6.94 11.37 15.97 18.77 22.11 25.36 33.56 36.57 39.21
1000 3.07 8.47 12.64 17.40 21.59 25.62 28.94 36.46 39.40 41.95
and functional programming paradigms, while being fully interoperable with the JVM (and Java).
The version of Scala used was 2.11.8, with sbt 0.13.1111 , the de facto build tool for Scala
projects.
The library Breeze (version 0.12)12 , part of the ScalaNLP13 project, was used for numerical
processing and statistics.
Apache SparkTM 1.514 was used to train recommendation models in one of the implementations
of website agents.
GraphStream 1.315 was used to visualize websites as a dynamic graph.
To store simulation run results, the MongoDB 3.2.316 database was used due to its practicality
and support for rapid development.
The frontend was built with the Play Framework 2.517 .
11 http://www.scala-sbt.org/
12 https://github.com/scalanlp/breeze
13 http://www.scalanlp.org/
14 http://spark.apache.org/
15 http://graphstream-project.org/
16 https://www.mongodb.com/
17 https://www.playframework.com/
Chapter 6
Validation
To validate the framework, two testbeds were prepared. The first is a collection of small fabricated
test cases where we compare the output of multiple simulation runs to the expected results. The
second deals with a real use of the framework, applied to an online store.
This test compares the number of navigation agents expected to be alive at each simulation step
with the actual number of them.
At each simulation step, k navigation agents enter the system and a fraction $p_{\text{exit}}$ of the
agents leaves, which leads to the recurrence in equation 6.1:
\begin{equation}
\begin{cases}
a_1 = (1 - p_{\text{exit}})\,k \\
a_n = (1 - p_{\text{exit}})(k + a_{n-1})
\end{cases}
\;\Leftrightarrow\;
a_n = \frac{k\,(p_{\text{exit}} - 1)\left((1 - p_{\text{exit}})^n - 1\right)}{p_{\text{exit}}}
\tag{6.1}
\end{equation}
• Website: Sample website with 9 pages and 32 total links between pages;
• Navigation agent: Sample agent implementation with a chance of exiting the website of
1/3 ($p_{\text{exit}}$);
Replacing the values in equation 6.1: $a_n = -200\left(\left(\frac{2}{3}\right)^n - 1\right)$. After a few simulation
steps, the expected number of agents in the system stabilizes:
$\lim_{n\to\infty} -200\left(\left(\frac{2}{3}\right)^n - 1\right) = 200$.
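The closed form can be checked against the recurrence numerically. The sketch below assumes k = 100 agents entering per step, which is the value implied by the limit of 200 when $p_{\text{exit}} = 1/3$ (k is not stated explicitly in this excerpt):

```scala
// Numerical check of equation 6.1 with pExit = 1/3 and the assumed k = 100.
val pExit = 1.0 / 3.0
val k = 100.0

// Closed form: a_n = k (pExit - 1) ((1 - pExit)^n - 1) / pExit
def closedForm(n: Int): Double =
  k * (pExit - 1.0) * (math.pow(1.0 - pExit, n) - 1.0) / pExit

// Recurrence: a_1 = (1 - pExit) k ;  a_n = (1 - pExit)(k + a_{n-1})
def recurrence(n: Int): Double =
  (1 to n).foldLeft(0.0)((prev, _) => (1.0 - pExit) * (k + prev))
```

Both agree for any n, and for large n the value approaches 200, matching the limit above.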
The results of a simulation run were gathered and plotted in figure 6.1. Triangles (△) represent
the actual value ($A_t$) and circles (•) represent the expected value ($E_t$) according to the equations
above. SMAPE (symmetric mean absolute percentage error) [Mak93] is used to measure the accu-
racy of the results: $\text{SMAPE} = \frac{1}{n}\sum_{t=1}^{n}\frac{|E_t - A_t|}{|A_t| + |E_t|} = 3.07\%$, which is a reasonably low error rate given
the randomness of the system.
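The SMAPE computation itself is a short fold over the two series; a sketch of the formula as used above:

```scala
// SMAPE as defined above: mean of |E_t - A_t| / (|A_t| + |E_t|) over all steps.
// (Note: this is the variant without the factor of 2 in the denominator.)
def smape(expected: Seq[Double], actual: Seq[Double]): Double = {
  require(expected.size == actual.size && expected.nonEmpty)
  val terms = expected.zip(actual).map { case (e, a) =>
    math.abs(e - a) / (math.abs(a) + math.abs(e))
  }
  terms.sum / terms.size
}
```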
[Figure 6.1: Expected (•) and actual (△) number of agents in the system over 1000 simulation
steps.]
• Website: Website configured as displayed in figure 6.2. The homepage links to 5 product
pages and each product page links to the cart page;
• Navigation agent: The agent picks one linked page at random; however, if the current page
is a product page, it always buys the product, and if the current page is the cart page, it
leaves the website;
Table 6.1 displays the expected and observed number of visits for a simulation run as
described above. The percent error was calculated and the obtained results are very close to the
predicted values.
This case compares the bounce rate for a website that has only one page. We define the bounce
rate as the percentage of navigation agent sessions that view only a single page before exiting the
website.
The simulation run was configured in the following way:
As expected, the simulation results yield a 100% bounce rate (all of the visits were to the home-
page), 10000 unique users (100 × 1000) and no purchases, as can be seen in figure 6.3.
Figure 6.3: Screenshot of the frontend results for this test run
The website consists of 2540 pages with 343201 links between pages, spanning 25 base product
categories and 103 sub-categories. There are 750 product list pages, 1748 product pages, 1 cart
page and 41 uncategorised/generic pages, visualised in figure 6.4.
Figure 6.4: Distribution of the type of pages in the website (left) and distribution of the categories
of the products (right)
To simulate users and customers (the NavigationAgents) interacting with this particular
website, a model based on affinities was built. This model is composed of the affinities themselves
(a mapping between product categories and the likelihood of the user liking or having interest
in products of that category), the probability of buying a product, the probability of exiting the
website, and the arrival rate.
Because real website usage data is not available for this website, a sample profile was created
with the following properties: the affinities were set up as displayed in table 6.2, the probability of
buying set to 5%, the probability of leaving the website to 15%, and the rate of arrival to the
website following a Poisson distribution with λ = 500.
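An affinity-driven agent of this kind can be sketched as follows. The class and method names are illustrative (not the framework's actual implementation), and category selection uses simple roulette-wheel sampling over the affinity weights:

```scala
import scala.util.Random

// Sketch of the affinity-based "thought process": at each step the agent either buys,
// exits, or browses to a page whose category is drawn according to its affinity weights.
class AffinityAgent(affinities: Map[String, Double], pBuy: Double, pExit: Double,
                    rng: Random = new Random()) {
  // Draw a category proportionally to its affinity weight (roulette-wheel selection).
  def pickCategory(): String = {
    val items = affinities.toList
    var r = rng.nextDouble() * items.map(_._2).sum
    items.find { case (_, w) => r -= w; r <= 0 }.map(_._1).getOrElse(items.last._1)
  }

  // Returns a label for the chosen behaviour; the real agent would emit an Action.
  def step(): String =
    if (rng.nextDouble() < pBuy) "buy"
    else if (rng.nextDouble() < pExit) "exit"
    else s"browse:${pickCategory()}"
}
```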
Table 6.2: Affinity profile used in the simulation

Category             Weight
Computadores         14.29%
MSI                  14.29%
Pen Drives            7.14%
Portáteis            14.29%
Intel 2011           14.29%
Cartões de Memória    7.14%
Brand                14.29%
Processadores        14.29%
6.2.2 Simulation
The simulation was configured as described in the subsection above. All the navigation agents use
the same profile. The "thought" process of each agent is fairly simple: at each step, it tries to
buy a product or exit the website in accordance with the probabilities defined a priori, or navigates
to a different page based on its categories, with preference as stated by the affinity table. The
simulation was run for 30 steps.
6.2.3 Results
The results of a sample simulation run are summarized in tables 6.3 and 6.4, and they match
expectations: the number of unique users is 14894 against an expected value of 15000 (500 × 30);
the bounce rate is 14.58% against a prior leaving rate of 15%; and the conversion rate is 4.77%
against a prior buy rate of 5%.
Table 6.3: Visits per category for a sample simulation run
Table 6.4: Summary of a sample simulation run

Field            Value
Unique users     14894
Bounce rate      14.58%
Conversion rate  4.77%
Purchases        676
NavAgentFactory  AffinityFactory
NavAgent         AffinityUser
WebsiteAgent     DummyWebsiteAgent
Start time       Thu Jul 07 14:14:36 BST 2016
End time         Thu Jul 07 14:14:39 BST 2016
Chapter 7
Conclusion and Future Work
This chapter concludes the work carried out for this dissertation; it presents the main contributions
and proposes future work on the framework.
• A novel way to exploit multi-agent interaction in this context, by using two representations
of agents, the navigation agents (i.e. users) and website agents (i.e. recommendation engines),
which interact with each other indirectly and form a feedback loop;
• A tool which appeals to both the academic community and the industry. The framework
can be used, for example, to validate and test recommendation engines and algorithms, or
by an e-commerce company to optimize its own platform (e.g. A/B testing);
• An implementation available to the community, licensed under MIT and hosted on GitHub1 .
Conclusion and Future Work
• Parallel simulator. The implemented discrete event simulator is single-threaded and
consumes the events sequentially, which limits the size of the simulation, as seen in section
5.4. The implementation could be changed to a parallel simulator that processes
more events in the same time frame, taking advantage of multi-core setups.
• More metrics. We have implemented only a handful of metrics to be calculated after each
simulation run; however, there is a myriad of other metrics and statistics that we have not
looked at. Further development could increase the pool of available metrics.
• Metrics extensibility. Related to the point above, the current implementation hardcodes the
calculation of certain metrics in the framework itself, and it is not very practical to extend
the system with new metrics. The framework could be modified to ease the process
of adding new metrics; the visitor design pattern [Gam95] seems particularly well suited for
this task.
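A visitor-style metrics extension could take roughly the following shape. This is a speculative sketch of the proposal, not part of the current framework, and all names are illustrative:

```scala
// Each metric is a visitor over simulation events, so new metrics can be added
// without touching the engine itself.
sealed trait Event
case class PageVisited(user: String, pageId: String) extends Event
case class ProductBought(user: String, price: Double) extends Event

trait MetricVisitor {
  def visit(event: Event): Unit
  def result: Double
}

// Example metric: accumulates the total value of all purchases it observes.
class TotalOrderValue extends MetricVisitor {
  private var total = 0.0
  def visit(event: Event): Unit = event match {
    case ProductBought(_, price) => total += price
    case _                       => ()
  }
  def result: Double = total
}

// The engine would simply fan events out to all registered metric visitors.
def dispatch(events: Seq[Event], metrics: Seq[MetricVisitor]): Unit =
  for (e <- events; m <- metrics) m.visit(e)
```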
• Metrics for website agents. While there are plenty of metrics for the navigation agents (i.e.
users/consumers), metrics for website agents (i.e. recommendation engines) were overlooked
and are not present in the current implementation. It might be useful to gather metrics and
statistics regarding the behaviour of website agents.
• Hypothesis testing. Especially relevant when comparing two simulation runs: simply com-
paring single numeric metrics side by side might not be the best approach. In the field of
statistical hypothesis testing (e.g. A/B testing) there has been plenty of research on which
standard tests to use in each case. For example, to compare conversion rates Fisher's exact
test could be used, while to compare the number of products bought a χ² test would be
more appropriate [Wik16].
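For illustration, the Pearson χ² statistic for a 2×2 table (converted vs. not converted, per variant) can be computed directly; this sketch is not part of the framework, and the resulting statistic would be compared against the χ² critical value with 1 degree of freedom (3.841 at α = 0.05):

```scala
// Pearson's chi-squared statistic for a 2x2 contingency table of
// converted / not-converted counts for variants A and B.
def chiSquared2x2(convA: Long, totalA: Long, convB: Long, totalB: Long): Double = {
  val observed = Array(
    Array(convA.toDouble, (totalA - convA).toDouble),
    Array(convB.toDouble, (totalB - convB).toDouble))
  val rowSums = observed.map(_.sum)
  val colSums = Array(observed(0)(0) + observed(1)(0), observed(0)(1) + observed(1)(1))
  val grand = rowSums.sum
  (for (i <- 0 to 1; j <- 0 to 1) yield {
    val expected = rowSums(i) * colSums(j) / grand // expected count under independence
    math.pow(observed(i)(j) - expected, 2) / expected
  }).sum
}
```

With identical conversion rates the statistic is 0; a large statistic indicates the two runs' rates differ significantly.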
• Limited number of actions. The set of actions emitted by the navigation agents is finite and
not exhaustive, and the framework does not currently support extending it.
• Visual aspects. Pages are currently represented by their name/URL, tags and links, leaving
no space to represent visual information and other meta-data. A navigation agent cannot
use visual properties of the pages or products (usability, aesthetics) to decide which
action to take. If the intention is to model human behaviour with fidelity, this might be a
major hindrance. A future version of the framework should take these aspects into account;
however, this requires further research, since it is not obvious how to model and represent
these concepts.
References
[ADW02] Corin R Anderson, Pedro Domingos, and Daniel S Weld. Relational Markov Models
and their Application to Adaptive Web Navigation. pages 143–152, 2002.
[AI16] Inc. Alexa Internet. Alexa - top sites by category: Shopping, 2016. [Online; accessed
24-June-2016].
[Ama15] Xavier Amatriain. How do you measure and evaluate the quality of recommendation
engines?, 2015. Available on http://qr.ae/RUNcIK, accessed last time at 14 of
February 2016.
[AT05] Gediminas Adomavicius and Alexander Tuzhilin. Toward the next generation of rec-
ommender systems: A survey of the state-of-the-art and possible extensions. IEEE
Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[Bay63] Thomas Bayes. An essay towards solving a problem in the doctrine of chances.
Philosophical Transactions, 53:370–418, 1763.
[BCNN04] Jerry Banks, John Carson, Barry L Nelson, and David Nicol. Discrete-Event System
Simulation. PrenticeHall international series in industrial and systems engineering,
page 624, 2004.
[BE+ 67] Leonard E Baum, John Alonzo Eagon, et al. An inequality with applications to
statistical estimation for probabilistic functions of markov processes and to a model
for ecology. Bull. Amer. Math. Soc, 73(3):360–363, 1967.
[BP66] Leonard E. Baum and Ted Petrie. Statistical Inference for Probabilistic Functions
of Finite State Markov Chains. The Annals of Mathematical Statistics, 37(6):1554–
1563, 1966.
[Col03] Nick Collier. Repast: An extensible framework for agent simulation. The University
of Chicago’s Social Science Research, 36:371–375, 2003.
[Con04] Efthymios Constantinides. Influencing the online consumer’s behavior: the Web
experience. Internet Research, 14(2):111–126, 2004.
[DGW13] Tom Dickman, Sounak Gupta, and Philip A. Wilsey. Event pool structures for pdes
on many-core beowulf clusters. In Proceedings of the 1st ACM SIGSIM Conference
on Principles of Advanced Discrete Simulation, SIGSIM PADS ’13, pages 103–114,
New York, NY, USA, 2013. ACM.
[Dia16] João Pedro Matos Teixeira Dias. Reverse engineering static content and dynamic
behaviour of e-commerce websites for fun and profit, July 2016.
[DK01] M Deshpande and G Karypis. Selective Markov Models for Predicting Web Page
Access. Proc. of First SIAM Intl Conf on Data Mining, 4(2):163–184, 2001.
[FJ05] G David Forney Jr. The viterbi algorithm: A personal history. arXiv preprint
cs/0504020, 2005.
[FRJ11] Pau Fonseca i Casas, Miquel Ramo Nñerola, and Angel A. Juan. Using specifica-
tion and description language to represent users’ profiles in OMNET++ simulations.
Proceedings of the 2011 Symposium on Theory of Modeling & Simulation: DEVS
Integrative M&S Symposium, 2011.
[Fuj90] Richard M. Fujimoto. Parallel discrete event simulation. Commun. ACM, 33:30–53,
1990.
[Gam95] Erich Gamma. Design patterns: elements of reusable object-oriented software. Pear-
son Education India, 1995.
[GÖ03] Şule Gündüz and M Tamer Özsu. A poisson model for user accesses to web pages.
In Computer and Information Sciences-ISCIS 2003, pages 332–339. Springer, 2003.
[HOP+ 86] James O Henriksen, Robert M O’Keefe, C Dennis Pegden, Robert G Sargent,
Brian W Unger, and Douglas W Jones. Implementations of time (panel). Proceed-
ings of the 18th conference on Winter simulation, pages 409–416, 1986.
[J S00] J. Srivastava, R. Cooley, and M. Deshpande. Web Usage Mining: Discovery and Applica-
tions of Usage Patterns from Web Data. ACM SIGKDD Explorations Newsletter,
1(2):12–23, 2000.
[KF09] Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and
techniques. MIT press, 2009.
[KKK13] Aditya Kurve, Khashayar Kotobi, and George Kesidis. An agent-based framework
for performance modeling of an optimistic parallel discrete event simulator. Complex
Adaptive Systems Modeling, 1(1):12, 2013.
[Mac05] David J. C. MacKay. Information Theory, Inference, and Learning Algorithms.
Cambridge University Press, 2005.
[MAFM99] Daniel A. Menascé, Virgilio A. F. Almeida, Rodrigo Fonseca, and Marco A. Mendes.
A Methodology for Workload Characterization of E-commerce Sites. Proceedings
of the 1st ACM conference on Electronic commerce - EC ’99, pages 119–128, 1999.
[Mak93] Spyros Makridakis. Accuracy measures: theoretical and practical concerns. Interna-
tional Journal of Forecasting, 9(4):527–529, 1993.
[MCGM02] Wendy M Moe, Hugh Chipman, Edward I George, and Robert E McCulloch. A
Bayesian Treed Model of Online Purchasing Behavior Using In-Store Navigational
Clickstream. (April), 2002.
[MGM99] Pattie Maes, Robert H Guttman, and Alexandros G Moukas. Agents that buy and
sell. Communications of the ACM, 42(3):81–ff, 1999.
[Mis86] Jayadev Misra. Distributed discrete-event simulation. ACM Comput. Surv., 18(1):39–
65, March 1986.
[MJ00] James H Martin and Daniel Jurafsky. Speech and language processing. International
Edition, 2000.
[Nia11] Muaz Ahmed Khan Niazi. Towards A Novel Unified Framework for Developing For-
mal , Network and Validated Agent-Based Simulation Models of Complex Adaptive
Systems. page 275, 2011.
[NMK14] Wamukekhe Everlyne Nasambu, Waweru Mwangi, and Michael Kimwele. Predict-
ing Sales In E-commerce Using Bayesian Network Model. 11(6):144–152, 2014.
[Pea85] Judea Pearl. Bayesian Networks A Model of Self-Activated Memory for Evidential
Reasoning, 1985.
[PL05] Liviu Panait and Sean Luke. Cooperative multi-agent learning: The state of the art.
Autonomous Agents and Multi-Agent Systems, 11(3):387–434, 2005.
[Rab89] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications
in speech recognition, 1989.
[Ram31] Frank P Ramsey. Truth and Probability. The Foundations of Mathematics and Other
Logical Essays, Ch. VII(1926):156–198, 1931.
[Res96] Mitchel Resnick. Starlogo: An environment for decentralized modeling and decen-
tralized thinking. In Conference companion on Human factors in computing systems,
pages 11–12. ACM, 1996.
[RKP02] Iyad Rahwan, Ryszard Kowalczyk, and Ha Hai Pham. Intelligent agents for auto-
mated one-to-many e-commerce negotiation. In Australian Computer Science Com-
munications, volume 24, pages 197–204. Australian Computer Society, Inc., 2002.
[SA10] Peer-Olaf Siebers and Uwe Aickelin. Introduction to Multi-Agent Simulation. Ecol-
ogy, pages 1–25, 2010.
[SC00] Jim Sterne and Matt Cutler. E-Metrics: Business Metrics for the New Economy.
page 61, 2000.
[Sha88] Ross D Shachter. Decision Making Using Probabilistic Inference Methods. 1988.
[SMG+ 10] P. O. Siebers, Charles M. Macal, J. Garnett, D. Buxton, and M. Pidd. Discrete-
event simulation is dead, long live agent-based simulation! Journal of Simulation,
4(3):204–210, 2010.
[TK74] Amos Tversky and Daniel Kahneman. Judgment under Uncertainty: Heuristics and
Biases. Science, 185(4157):1124–1131, 1974.
[TT00] Kah Leong Tan and Li-Jin Thng. Snoopy calendar queue. In Proceedings of the
32Nd Conference on Winter Simulation, WSC ’00, pages 487–495, San Diego, CA,
USA, 2000. Society for Computer Simulation International.
[UBJK94] Onur M. Ulgen, John J. Black, Betty Johnsonbaugh, and Roger Klungle. Simula-
tion Methodology - a Practitioner’S Perspective. International Journal of Industrial
Engineering, Applications and Practice, 1(2):16, 1994.
[Var01] Andras Varga. The OMNeT++ Discrete Event Simulation System. Proceedings of
the European Simulation Multiconference, pages 319–324, 2001.
[Wat15] IBM Watson. Cyber Monday Report 2015. Technical report, 2015.
[WB13] John Winn and Christopher Bishop. Model-based machine learning, 2013. Available
on http://www.mbmlbook.com/, accessed last time at 9 of February 2016.
[WBS08] Frank Edward Walter, Stefano Battiston, and Frank Schweitzer. A model of a trust-
based recommendation system on a social network. Autonomous Agents and Multi-
Agent Systems, 16(1):57–74, 2008.
[WE99] Uri Wilensky and I Evanston. Netlogo: Center for connected learning and computer-
based modeling. Northwestern University, Evanston, IL, pages 49–52, 1999.
[Wik15] Wikipedia. Computer simulation — wikipedia, the free encyclopedia, 2015. [Online;
accessed 14-February-2016].
[Wik16] Wikipedia. A/b testing — Wikipedia, the free encyclopedia, 2016. [Online; accessed
26-June-2016].
[WJ08] Martin J. Wainwright and M I Jordan. Graphical Models, Exponential Families, and
Variational Inference. Foundations and Trends in Machine Learning, 1(1-2):1–305,
2008.
[XB07] Bo Xiao and Izak Benbasat. E-commerce product recommendation agents: Use,
characteristics, and impact. Mis Quarterly, 31(1):137–209, 2007.
[XY09] Yi Xie and Shun-zheng Yu. A Large-Scale Hidden Semi-Markov Model for Anomaly
Detection on User Browsing Behaviors. IEEE/ACM Transactions on Networking,
17(1):54–65, 2009.
Appendix A
Graphical User Interface
The following images are screenshots of the interface developed to visualize simulation runs.
Figure A.1: Screenshot of the simulations list page
Figure A.3: Screenshot of the simulation comparison page