Acta Informatica (2024) 61:1–21
https://doi.org/10.1007/s00236-023-00445-5
ORIGINAL ARTICLE
Discovering workflow nets of concurrent iterative processes
Tonatiuh Tapia-Flores2 · Ernesto López-Mellado1
Received: 22 August 2022 / Accepted: 10 August 2023 / Published online: 14 September 2023
© The Author(s) 2023
Abstract
A novel and efficient method for discovering concurrent workflow processes is presented. It
allows building a suitable workflow net (WFN) from a large event log λ, which represents
the behaviour of complex iterative processes involving concurrency. First, the t-invariants
are determined from λ; this allows computing the causal and concurrent relations between
the events and the implicit causal relations between events that do not appear consecutively
in λ. Then a 1-bounded WFN is built, which may subsequently be adjusted if its t-invariants
do not match those computed from λ. The discovered model allows firing all the traces
in λ. The procedures derived from the method run in polynomial time in |λ|; they have been
implemented and tested on artificial logs.
1 Introduction
Modelling is a crucial stage in the design of a system; models obtained from the functional specifications of a system are helpful to synthesise control or management systems of discrete
event processes. Conversely, for analysing an existing system, obtaining a model of such a
system is worthwhile. In this case, models are built manually by an expert on the process or
automatically by a computer tool that handles the behaviour exhibited by the process.
Computer aided modelling
The automated modelling of discrete event processes from event data in the form of event
sequences issued by the process is a challenging approach to performing reverse engineering
analysis. Nowadays, several work teams in the research areas of discrete event systems (DES)
and workflow management systems (WMS) are interested in this research matter.
The first publications on automated modelling, called language learning techniques, have
been proposed in computer science. The goal was to obtain formal models (finite automata
or grammars) to represent languages from positive samples of accepted words [1, 2].
Corresponding author: Ernesto López-Mellado, [email protected]
Tonatiuh Tapia-Flores, [email protected]
1 CINVESTAV Unidad Guadalajara, Av. del Bosque 1145, 45019 Zapopan, Jal., Mexico
2 Nextiva Mexico, Rubén Darío 425, 44680 Guadalajara, Jal., Mexico
Process identification
In the DES area, the problem is named process identification; several approaches and
methods have been proposed to build models that represent the behaviour of automated manufacturing processes, exhibited as sequences of events recorded during the process execution.
The incremental approach proposed in [3, 4] obtains 1-bounded interpreted Petri nets (PN)
from a large stream of the process output signals. In Giua and Seatzu [5], a method based on
the statement and solution of an integer linear programming problem is proposed; it allows
building PN from a set of event sequences. Extensions to this method have been proposed in
[6, 7]. Then, in Klein et al. [8], a technique to determine finite automata from input–output
sequences is presented; it is applied to fault detection in industrial processes; an extension
to this method allows obtaining distributed models [9]. In Estrada-Vargas et al. [10], input–
output identification of automated manufacturing processes is addressed; the identification
method builds an interpreted PN from a set of sequences of input–output vectors sampled
from the controller during the cyclic operation of the system. This method is extended to
incrementally update the model when new sequences are processed [11]. Surveys on DES
identification are presented in [12, 13].
Process discovery
In the research area of WMS [14–16], the equivalent concern is named process discovery;
in its statement, the systems dealt with are business processes whose behaviour is presented
by a multiset of task sequences from a finite alphabet. Earlier methods have been proposed
in [17, 18]; in [19], Agrawal proposed a method in which a finite automaton, called the
conformal graph, is obtained. Cook [18] presented a probabilistic technique to determine
the concurrent and direct relations between tasks; the obtained model is a graph akin to a PN.
Later, in [20], a technique to discover finite automata from task sequences is presented. In
Wang et al. [21] a discovery method called Algorithm Alpha is presented; in this method, an
event log composed of several traces is mined, yielding a subclass of PN called workflow net
(WFN). Numerous publications present extensions of this algorithm, namely [21–23]. Wide
overviews on discovery techniques are [24–28].
Problem and approach
The aim of the discovery/identification methods is to build models that represent the
behaviour captured in the event log. However, a common issue in the current discovery
techniques is that the obtained models represent more behaviour than that exhibited by the
process; that is, the language of the model built is greater than that represented by the event
log. This problem arises when concurrent iterative processes are dealt with.
In this paper, a novel model discovery method is proposed. The technique allows synthesising
a suitable WFN from a large log λ of task traces, which include the iterative behaviour drawn
from complex business processes that exhibit concurrency and iterations. The obtained WFN
has a reduced surplus language with respect to the event log.
The discovery method follows the approach held in [29, 30] for DES and it is adapted to the
WMS field for dealing with WFN; furthermore, an important extension is proposed allowing
addressing more complex behaviours such as causal dependencies between tasks that do not
appear consecutively in the traces; this feature allows reducing the exceeding language. The
method determines from a log λ, causal and concurrency relations between tasks and the
t-invariants Y of the PN to discover. Then, the obtained t-invariants allow determining a first
structure of a PN N1. Afterwards, N1 can be adjusted if some of its t-invariants J do not
agree with the t-invariants Y(λ) derived from λ. This paper is a revised extension of the conference paper
presented in [31], in which the method for inferring the t-invariants is presented.
All the algorithms derived from this method have polynomial-time complexity. They have
been implemented and tested with artificial logs obtained from known WFN inspired from
models reported in literature. Tests on artificial logs and the computational complexity are
compared with those obtained by the process mining method named the alpha++ algorithm [32].
Paper organization
The paper is organised as follows. In Sect. 2, the basic notions on PN are recalled. Section 3
formulates the discovery problem. Section 4 introduces basic relations derived from the tasks
sequences. Section 5 proposes a technique for determining the t-invariants from λ. In Sect. 6,
the WFN discovery method is presented. Section 7 outlines implementation and presents the
tests. Finally, Sect. 8 discusses the main features and limitations of the proposed method
in the scope of relevant related works.
2 Background
This section recalls the basic concepts and notation of ordinary PN and WFN used in this
paper.
Definition 1 An ordinary Petri Net structure G is a bipartite digraph represented by the 3-tuple
G = (P, T, F), where:
• P = {p1, p2, ..., p|P|} and T = {t1, t2, ..., t|T|} are finite sets of nodes named places and
transitions, respectively;
• F ⊆ (P × T) ∪ (T × P) is a relation representing the arcs between the nodes. For any node
x ∈ P ∪ T, •x = {y | (y, x) ∈ F} and x• = {y | (x, y) ∈ F}.
• The incidence matrix of G is $C = C^{+} - C^{-}$, where
$C^{-} = [c^{-}_{ij}]$; $c^{-}_{ij} = 1$ if $(p_i, t_j) \in F$, and $c^{-}_{ij} = 0$ otherwise; and
$C^{+} = [c^{+}_{ij}]$; $c^{+}_{ij} = 1$ if $(t_j, p_i) \in F$, and $c^{+}_{ij} = 0$ otherwise;
$C^{-}$ and $C^{+}$ are called the pre-incidence and post-incidence matrices, respectively.
Thus, $C = [c_{ij}]$, where $c_{ij} \in \{-1, 0, 1\}$.
Definition 2 A marking M : P → N≥0 determines the number of tokens within the places,
where N≥0 is the set of non-negative integers. A marking M, usually denoted by a vector
in (N≥0)^|P|, describes the current state of the modelled system.
PN dynamics
• A Petri Net system or Petri Net (PN) is the pair N = (G, M0), where G is a PN
structure and M0 is an initial marking.
• In a PN system, a transition $t_j$ is enabled at marking $M_k$ if $\forall p_i \in P$, $M_k(p_i) \ge c^{-}_{ij}$.
• An enabled transition $t_j$ can be fired, reaching a new marking $M_{k+1}$. This behaviour is
represented as $M_k \xrightarrow{t_j} M_{k+1}$. The new marking can be computed as $M_{k+1} = M_k + C u_k$,
where $u_k$ is the firing vector; when $t_j$ is fired, $u_k(j) = 1$ whilst $u_k(i) = 0$, $\forall i \neq j$. This
equation is called the PN state equation.
• The reachability set of a PN is the set of all markings reachable from M0 by firing
only enabled transitions; this set is denoted by R(G, M0).
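The enabling rule and the state equation above can be illustrated with a minimal Python sketch. The net used here (two places, two transitions forming a cycle) is hypothetical and is not taken from the paper; the names PRE, POST, enabled and fire are chosen only for this illustration.

```python
# Hypothetical two-place, two-transition net (not taken from the paper):
#   p1 -> t1 -> p2 -> t2 -> p1
# Rows are places, columns are transitions.
PRE = [[1, 0],    # C-: arc (p1, t1)
       [0, 1]]    # C-: arc (p2, t2)
POST = [[0, 1],   # C+: arc (t2, p1)
        [1, 0]]   # C+: arc (t1, p2)

# Incidence matrix C = C+ - C-
C = [[POST[i][j] - PRE[i][j] for j in range(len(PRE[0]))]
     for i in range(len(PRE))]


def enabled(marking, j):
    """t_j is enabled at M if M(p_i) >= c-_ij for every place p_i."""
    return all(marking[i] >= PRE[i][j] for i in range(len(marking)))


def fire(marking, j):
    """State equation M' = M + C u, with u the unit vector selecting t_j."""
    if not enabled(marking, j):
        raise ValueError(f"transition index {j} is not enabled")
    return [marking[i] + C[i][j] for i in range(len(marking))]


M0 = [1, 0]          # one token in p1
M1 = fire(M0, 0)     # fire t1
M2 = fire(M1, 1)     # fire t2, which restores the initial marking
print(C, M1, M2)
```

Firing t1 and then t2 returns the net to M0, which is the kind of repetitive behaviour that the t-invariants of the next definition capture.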
Definition 3 A PN system is 1-bounded or safe iff, for any Mi ∈ R(G, M0 ) and any p ∈
P, Mi ( p) ≤ 1. A PN system is live iff, for every reachable marking Mi ∈ R(G, M0 ) and
t ∈ T there is a Mk ∈ R(G, Mi ) such that t is enabled in Mk .
Definition 4 A t-invariant Yi of a PN is a non-negative integer solution to the equation
C Yi = 0. The support of Yi (t-support), denoted as <Yi>, is the set of transitions whose
corresponding elements in Yi are positive. Yi is said to be minimal if its support is not
included in the support of other t-invariants. A t-component G(Yi) is the subnet of the PN induced
by <Yi>: G(Yi) = (Pi, Ti, Fi), where Pi = •<Yi> ∪ <Yi>•, Ti = <Yi>, and
Fi = ((Pi × Ti) ∪ (Ti × Pi)) ∩ F. For a t-invariant Yi, if the initial marking M0 enables
a transition ti ∈ <Yi> and ti is fired, then M0 can be reached again by firing only transitions in
<Yi>.
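As a small illustration of Definition 4, the following Python sketch checks the equation C Y = 0 and extracts the support. The incidence matrix belongs to a hypothetical cyclic net (p1 → t1 → p2 → t2 → p1), not to a net from the paper, and excluding the zero vector is an assumption made here to avoid the trivial solution.

```python
# Incidence matrix of a hypothetical cyclic net p1 -> t1 -> p2 -> t2 -> p1
# (rows: places p1, p2; columns: transitions t1, t2).
C = [[-1, 1],
     [1, -1]]
TRANSITIONS = ["t1", "t2"]


def is_t_invariant(C, y):
    """True if y is a non-negative, non-zero integer vector with C y = 0."""
    if any(v < 0 for v in y) or not any(y):
        return False  # the zero vector is excluded here by assumption
    return all(sum(c * v for c, v in zip(row, y)) == 0 for row in C)


def support(y, names):
    """Transitions whose entries in y are positive."""
    return {name for name, v in zip(names, y) if v > 0}


Y = [1, 1]
print(is_t_invariant(C, Y), support(Y, TRANSITIONS))
```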
Definition 5 A WorkFlow net (WFN) N is a subclass of PN with the following properties
[14]: (1) it has two special places, i and o. Place i is a source place: •i = ∅, and place o is
a sink place: o• = ∅. (2) If a transition te is added to the PN connecting place o to i, then the
resulting PN, called extended WFN, is strongly connected.
Definition 6 A WFN N is said to be sound iff (N , M0 ) is safe and for any marking Mi ∈
R(N , M0 ), o ∈ Mi → Mi = [o] and [o] ∈ R(N , Mi ) and (N , M0 ) contains no dead
transitions. The extended WFN of a sound WFN is live and 1-bounded.
3 The discovery problem
In this section, the problem of WFN discovery in the context of workflow management
systems is formulated and then, the proposed method is outlined.
3.1 Problem statement
Definition 7 Let T = {t1, t2, ..., tn} be a finite set of tasks; a workflow log λ is a multiset of
task traces σi ∈ T*, |σi| < ∞. Given a workflow log λ = {σ1, σ2, ..., σm}, the PN discovery
problem consists of building a sound WFN using only transitions in T, which reproduces the
observed log. The number of places is unknown.
Example 1 Consider the log λ = {σ1, σ2, σ3, σ4, σ5, σ6, σ7} composed of the following task
traces, obtained from the execution of some process: σ1 = t1 t6 t3 t4 t7; σ2 = t1 t3 t6 t4 t5 t3 t4 t7;
σ3 = t2 t3 t4 t5 t3 t4 t8; σ4 = t2 t3 t4 t8; σ5 = t1 t3 t4 t5 t6 t3 t4 t7; σ6 = t1 t3 t4 t5 t3 t4 t6 t7; σ7 = t1 t3 t4 t9.
A suitable discovery technique should be able to build a model such as depicted in Fig. 1
from the previous traces.
3.2 Assumptions
It is assumed that the event log is complete and generated by a process that behaves as an
unknown sound WFN, which has no duplicate task labels or silent transitions. The soundness
requirement implies that the process behaviour captured in the event log corresponds to
a “well-behaved” process, which does not exhibit deadlocks or anomalies such as buffer
overflows.
Fig. 1 Sound workflow net
3.3 Outline of the method
The proposed discovery method obtains a model of an unknown or ill-known process; the
model reproduces the observed behaviour and exhibits the causality and concurrency relationships between the tasks.
The method builds a 1-bounded PN (a WFN including a transition te ) from which all
the task traces σi in the log λ can be fired. It focuses on the computation of the causal and
concurrent relations between the tasks. This is accomplished by computing the t-invariants
Y from λ, which must exist in a strongly connected PN exhibiting iterative behaviour; also,
t-invariants are used to find causal relations between tasks that do not appear consecutively
(called implicit causal relations).
In the first stage, the method determines from λ several binary relations between transitions; based on these relations, the t-invariants are discovered. Then, the causal and concurrent
relations are determined, and together with the computed t-invariants, a first structure N of
a WFN is built. Finally, the t-invariants are used again for adjusting the language of N by
determining implicit causality between tasks.
4 Basic concepts and relations
First, several relations obtained directly from λ are introduced. Some definitions have been
taken and adapted from [29, 31].
4.1 Structuring the observed behaviour
Definition 8 (Event precedence relation) The precedence relationship between transitions
that are observed consecutively is stated by the relation R< ⊆ T × T, which is defined as
R< = {(ta, tb) | ∃σi ∈ λ. σi(j) = ta and σi(j + 1) = tb; 1 ≤ j < |σi|}; σi(j) denotes the
symbol in position j in the trace σi. Thus, ta R< tb (denoted also as ta < tb) expresses that ta
has been observed immediately before tb in at least one trace σi. When ta is related by R< to more than
one task, this will be denoted as ta < t1, t2, ..., tn. The relationship between transitions that
never occur consecutively in the traces of λ is given by (T × T) \ R<; a pair in this relation is
denoted as ta >< tb.
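The relation R< can be obtained in a single pass over the log. The following Python sketch is one possible way to do so for the traces of Example 1; the names LOG, TRACES, precedence and successors are illustrative and do not come from the paper.

```python
# Traces of Example 1, one string per trace.
LOG = [
    "t1 t6 t3 t4 t7",
    "t1 t3 t6 t4 t5 t3 t4 t7",
    "t2 t3 t4 t5 t3 t4 t8",
    "t2 t3 t4 t8",
    "t1 t3 t4 t5 t6 t3 t4 t7",
    "t1 t3 t4 t5 t3 t4 t6 t7",
    "t1 t3 t4 t9",
]
TRACES = [line.split() for line in LOG]


def precedence(traces):
    """R<: pairs (ta, tb) observed consecutively in at least one trace."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}


def successors(relation, task):
    """Tasks tb such that task < tb, sorted for display."""
    return sorted(b for a, b in relation if a == task)


R_PREC = precedence(TRACES)
print(successors(R_PREC, "t4"))
```

A pair such as (t2, t8), which never occurs consecutively, is in the complement relation, i.e. t2 >< t8.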
4.1.1 Causal and concurrency relationships
The aim of existing PN discovery methods is to determine from observed precedence between
transitions, actual causal or concurrency relationships between transitions, which will be
useful to build a PN structure. Below, notions and properties for determining causality and
concurrency relationships are stated.
Definition 9 (Two-length cycles) Two transitions ta , tb are in a two-length cycle relation (Tc)
if the tasks traces in the log λ contain the sub-sequences ta tb ta or tb ta tb . Tc is the set of
transition pairs (ta , tb ) fulfilling this condition.
It is clear that simple substructures of PN can be straightforwardly determined from Tc.
Definition 10 (Causal and concurrent relations) Every pair of consecutive transitions
(ta, tb) ∈ R< may be classified into one of the following relationships:
• Causal relationship, denoted as [ta, tb], expresses that the occurrence of ta enables tb;
in a PN structure this implies that there must be at least one place from ta to tb. The
set of transition pairs in a causal relation in λ, named CausalR, is defined as follows:
CausalR = {(ta, tb) | (ta < tb ∧ ¬(ta || tb)) ∨ (ta, tb) ∈ Tc}.
• Concurrent relationship, denoted as ta || tb. It means that when both ta and tb are simultaneously enabled, if ta fires first, tb is not disabled, and vice versa; in a PN structure, ta || tb
implies that there are no places connecting ta to tb and vice versa. ta || tb is determined if
(ta, tb), (tb, ta) ∈ R<, i.e., ta, tb have been observed consecutively in the task traces in
the log λ in both orders, and ta, tb do not form a Tc. Then, the set of concurrent transition
pairs derived from λ is ConcR = {(ta, tb) | ta < tb ∧ tb < ta ∧ (ta, tb) ∉ Tc}.
Example 2 From the task traces of Example 1, the following relations among tasks
were found: (t1 < t3, t6), (t2 < t3), (t3 < t4, t6), (t4 < t5, t6, t7, t8, t9), (t5 <
t3, t6), (t6 < t7, t3, t4, t5), Tc = ∅, ConcR = {(t3, t6), (t4, t6), (t5, t6)}, and CausalR =
{(t1, t3), (t1, t6), (t2, t3), (t3, t4), (t4, t5), (t4, t8), (t4, t7), (t4, t9), (t5, t3), (t6, t7)}.
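Definitions 9 and 10 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name relations is hypothetical, ConcR is stored with both orientations of each pair, and length-one loops (a task immediately followed by itself) are not treated specially.

```python
LOG = [
    "t1 t6 t3 t4 t7",
    "t1 t3 t6 t4 t5 t3 t4 t7",
    "t2 t3 t4 t5 t3 t4 t8",
    "t2 t3 t4 t8",
    "t1 t3 t4 t5 t6 t3 t4 t7",
    "t1 t3 t4 t5 t3 t4 t6 t7",
    "t1 t3 t4 t9",
]
TRACES = [line.split() for line in LOG]


def relations(traces):
    """Return (R<, Tc, ConcR, CausalR) computed from a list of traces."""
    prec = {(a, b) for tr in traces for a, b in zip(tr, tr[1:])}
    # Two-length cycles: sub-sequences ta tb ta (Definition 9).
    tc = set()
    for tr in traces:
        for a, b, c in zip(tr, tr[1:], tr[2:]):
            if a == c and a != b:
                tc |= {(a, b), (b, a)}
    # Concurrent: observed in both orders and not a two-length cycle.
    conc = {(a, b) for a, b in prec if (b, a) in prec and (a, b) not in tc}
    # Causal: consecutive but not concurrent, or part of a two-length cycle.
    causal = (prec - conc) | tc
    return prec, tc, conc, causal


R_PREC, TC, CONC, CAUSAL = relations(TRACES)
print(sorted(CONC))
print(sorted(CAUSAL))
```

Run on the seven traces as printed in Example 1, the sketch finds t5 t6 but never t6 t5 as consecutive tasks, so it places (t5, t6) in CausalR, whereas Example 2 lists it in ConcR; all other pairs coincide with Example 2.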
5 Discovering t-invariants
During the execution of workflow processes, the tasks occur sequentially as cases; this is
captured as task traces. If for all cases, their tasks appear once, there is no iterative behaviour
captured in the traces; then the alphabet of every trace is the support of a t-invariant. However,
processes often include repetitive subprocesses, such as those modelled by the WFN in Fig. 1;
for such processes, extracting the minimal support of t-invariants is not trivial.
This section describes a novel algorithm to derive the minimal supports of t-invariants
from an event log that includes traces involving repetitive behaviour. We will refer in the
presentation to the t-invariants of the extended WFN.
The t-invariants computed from λ are those that the WFN to be built must exhibit.
Thus, the method presented herein determines the t-invariants of the unknown
WFN that generates the log. Several notions used for defining the t-invariants computation
technique are introduced below.
A trace σ that contains transitions t j such that #(t j , σ ) > 1, where #(t j , σ ) is the number
of occurrences of t j in σ , includes sub-sequences representing a repetitive behaviour.
Fig. 2 Gr graph of the cycs in Example 1
Definition 11 (Cyclic sub-sequences) We will call cyc a sub-sequence of σ starting with a
task ti and ending just before the next occurrence of ti in σ. If ∀tj ∈ cyc, #(tj, cyc) = 1, then it is called
an elementary cyc, which is denoted as cyce. A cyc may contain other cycs.
Example 3 Several traces of Example 1 contain a cyc; this is the case of σ2 = t1 t3 t6 t4 t5 t3 t4 t7,
where cyc = t3 t6 t4 t5; furthermore, it is elementary (cyce) because ∀tj ∈ cyc, #(tj, cyc) = 1;
in contrast, σ1 and σ4 do not include a cyc.
Proposition 1 The tasks in a σx ∈ λ form the support of a t-invariant of the extended WFN.
The t-invariant is minimal iff ∀tj ∈ σx, #(tj, σx) = 1 and ∄σy . σx ⊆ σy
Proof (Direct) As stated in Definition 5 (2), the extended WFN is strongly connected. Thus,
the transitions in σi, whose first and last tasks belong to i• and •o, respectively, together
with te can be fired repeatedly. When #(tj, σi) = 1 for every tj ∈ σi, then σi does not have
cycs; therefore the transitions in σi are the support of a minimal t-invariant.
In Example 1, it is easy to see that the tasks in σ1 and σ4 (together with te of the extended
WFN) are the supports of minimal t-invariants. □
Proposition 2 A trace σi ∈ λ includes cycs if it contains tasks belonging to two or more
t-invariants.
Proof (Contraposition) Suppose that all the tasks in σi belong to a single t-invariant. Then,
all the tasks tj in σi occur once, i.e. #(tj, σi) = 1; thus, there is no iterative behaviour, that
is, σi does not share tasks with nested cycles (cycs). □
The algorithm prunes the interleaving tasks in traces to separate them by supports of
minimal t-invariants. The procedure for determining the t-invariants processes recursively
every trace σi from the most external cyc in σi to shorter nested cycs.
Definition 12 (Causality graph) The causality graph of an elementary cyc, Gr, describes
the relations between tasks in a cyce. Gr(cyce) = (V, E), where V = {tk | tk ∈ cyce} and
E = {(tk, tl) ∈ V × V | (tk, tl) ∈ CausalR}.
The Gr that can be formed with the cycs of Example 3 and the CausalR set of Example
2 is shown in Fig. 2.
Definition 13 (Strongly connected subgraphs) The function Scc(Gr) returns the set of strongly
connected components {G1^sc(V1, E1), G2^sc(V2, E2), ..., Gn^sc(Vn, En)} in Gr.
Proposition 3 Let Gi^sc(Vi, Ei) be a strongly connected component in a Gr; then, the transitions in Vi such that |Vi| > 1, form the support of a minimal t-invariant of the WFN to
discover.
Proof (Contraposition) Suppose that the transitions in Vi are not the support of a t-invariant;
then, there exists at least one tk ∉ Vi that must occur to perform the repetitive firing of transitions
in Gr. Thus, there are no cycles in Gr and then, it is not strongly connected. □
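Definitions 12 and 13 can be illustrated with the elementary cycle of Example 3 and the CausalR set of Example 2. The Python sketch below is a minimal illustration: it computes strongly connected components by mutual reachability, which is adequate for the small graphs considered here but is not necessarily the procedure used by the authors; the function names are hypothetical.

```python
# CausalR as listed in Example 2.
CAUSAL = {("t1", "t3"), ("t1", "t6"), ("t2", "t3"), ("t3", "t4"),
          ("t4", "t5"), ("t4", "t8"), ("t4", "t7"), ("t4", "t9"),
          ("t5", "t3"), ("t6", "t7")}


def causality_graph(cyc, causal):
    """Gr(cyce) = (V, E): causal pairs restricted to the tasks of the cycle."""
    V = set(cyc)
    E = {(a, b) for a, b in causal if a in V and b in V}
    return V, E


def reachable(v, E):
    """Nodes reachable from v (v included) following the edges in E."""
    seen, stack = {v}, [v]
    while stack:
        x = stack.pop()
        for a, b in E:
            if a == x and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def scc(V, E):
    """Strongly connected components: sets of mutually reachable nodes."""
    reach = {v: reachable(v, E) for v in V}
    return {frozenset(u for u in V if u in reach[v] and v in reach[u])
            for v in V}


V, E = causality_graph(["t3", "t6", "t4", "t5"], CAUSAL)
COMPONENTS = scc(V, E)
NESTED_SUPPORTS = [set(c) for c in COMPONENTS if len(c) > 1]
print(NESTED_SUPPORTS)
```

For this cycle the only component with more than one task is {t3, t4, t5}, while t6 forms a trivial component of its own.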
Next, several simple operators allowing handling task traces are outlined.
• Sym. The operator sym(σ ) returns the set of different transitions (the alphabet) used in
a sequence σ .
• Pos. The operator pos(t, σ ) returns the set of positions where the transition t appears in
a trace σ .
• Clear. The operator clear(s, A), where s ∈ T* is a sequence and A ∈ 2^T is a set of tasks,
returns a sequence such that the occurrences of ti ∈ A in s are deleted; if sym(s) ∩ A = ∅
then clear(s, A) = s; if sym(s) = A then clear(s, A) = ε.
• Replace. The operator replace(r, s, t), where r, s, and t are sequences (|r| ≥ |s| ≥
|t| ≥ 0), returns a sequence such that the first occurrence of s in r is replaced by t;
replace(r, s, t) returns r if s is not a sub-sequence of r, and returns t if r = s; when
t = ε, the first occurrence of substring s is deleted from r.
Example 4 Consider σ2 = t1, t3, t6, t4, t5, t3, t4, t7, cyce = t3, t6, t4, t5, and τ = t3, t4, t5.
The result of applying the above operators is sym(τ) = {t3, t4, t5}; pos(t3, σ2) =
{2, 6}; clear(cyce, τ) = t6; replace(σ2, cyce, t6) = t1, t6, t3, t4, t7.
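The four operators are simple enough to be written directly in Python. The following sketch represents traces as lists of task names and reproduces Example 4; positions are 1-based as in the text, and pos returns a sorted list rather than a set, which is a choice made only for this illustration.

```python
def sym(seq):
    """Alphabet of a sequence."""
    return set(seq)


def pos(task, seq):
    """1-based positions where task occurs in seq."""
    return [i for i, x in enumerate(seq, start=1) if x == task]


def clear(seq, tasks):
    """Copy of seq without the occurrences of the given tasks."""
    return [x for x in seq if x not in tasks]


def replace(r, s, t):
    """Replace the first occurrence of sub-sequence s in r by t."""
    n = len(s)
    if n == 0:
        return list(r)
    for i in range(len(r) - n + 1):
        if r[i:i + n] == s:
            return r[:i] + t + r[i + n:]
    return list(r)  # s does not occur in r


SIGMA2 = ["t1", "t3", "t6", "t4", "t5", "t3", "t4", "t7"]
CYC_E = ["t3", "t6", "t4", "t5"]
TAU = ["t3", "t4", "t5"]

print(sym(TAU))
print(pos("t3", SIGMA2))
print(clear(CYC_E, set(TAU)))
print(replace(SIGMA2, CYC_E, ["t6"]))
```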
Below, a procedure for extracting elementary cycles of traces is presented (Algorithm 1). It
explores nested cycs and then returns one elementary cyc (cyce) if one exists; otherwise,
it returns the empty set.
Algorithm 1 e-cycle(σ)
Require: a non-empty trace σ
Ensure: cyce: a set of elementary cycles
1: cyce ⇐ ∅
2: if |σ| = |sym(σ)| then ⊲ there are no cycles
3:   return cyce
4: end if
5: ∀tx ∈ σ | #(tx, σ) > 1 ⊲ analyses the repeated tasks
6:   aux ⇐ pos(tx, σ)
7:   i ⇐ first item in aux
8:   j ⇐ second item in aux
9:   cyc ⇐ sub-sequence of σ from i to j − 1
10:   if |cyc| = |sym(cyc)| then
11:     cyce ⇐ cyce ∪ {cyc}
12:   else
13:     return e-cycle(cyc) ⊲ the sub-sequence is analysed
14: return cyce
In line 5 of Algorithm 1, every task tx that appears more than once in σ is analysed.
The sub-sequence between the first and the second occurrence is analysed to verify whether it is an
elementary cycle; if not, the sub-sequence is analysed recursively to extract the inner cycle.
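A possible Python rendering of Algorithm 1 is sketched below. It follows the pseudocode line by line under two assumptions that the text leaves open: repeated tasks are visited in order of first appearance, and the result is returned as a list without duplicates instead of a set. The function name e_cycle is illustrative.

```python
def e_cycle(trace):
    """Return the elementary cycles found in a trace (list of task names)."""
    cycles = []
    if len(trace) == len(set(trace)):
        return cycles                      # no task is repeated
    for task in dict.fromkeys(trace):      # tasks in order of first appearance
        positions = [i for i, x in enumerate(trace) if x == task]
        if len(positions) < 2:
            continue                       # task occurs only once
        first, second = positions[0], positions[1]
        cyc = trace[first:second]          # up to just before the repetition
        if len(cyc) == len(set(cyc)):
            if cyc not in cycles:
                cycles.append(cyc)         # elementary cycle
        else:
            return e_cycle(cyc)            # look for the inner cycle
    return cycles


SIGMA1 = "t1 t6 t3 t4 t7".split()
SIGMA2 = "t1 t3 t6 t4 t5 t3 t4 t7".split()
print(e_cycle(SIGMA1))
print(e_cycle(SIGMA2))
```

For σ2 the sketch returns the cycle t3 t6 t4 t5 of Example 3 and also t4 t5 t3, which is the sub-sequence between the two occurrences of t4; for σ1 it returns the empty list.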
Algorithm 2 presents the procedure for obtaining the minimal t-invariant supports. Consider that each trace in the event log ends with the task te.
Each trace in λ is analysed (line 2 of Algorithm 2); if a trace does not have repeated tasks,
then its symbols are added as a t-invariant support; otherwise, the elementary cycles are
extracted from σi to obtain the corresponding graphs; consequently, the supports of nested
t-invariants are found.
Algorithm 2 Getting minimal t-invariants supports
Require: λ
1: Y(λ) ⇐ ∅; A ⇐ ∅
2: ∀σi ∈ λ
3:   if |σi| = |sym(σi)| ∧ sym(σi) ∉ Y(λ) then
4:     if sym(σi) ⊂ Y(λ) then
5:       Y(λ) ⇐ Y(λ) ∪ {sym(σi)} ⊲ minimal t-inv
6:   else ⊲ σi contains iterations
7:     repeat
8:       cyce ⇐ e-cycle(σi)
9:       Graphs ⇐ Scc(Gr(cyce))
10:       ∀Gi ∈ Graphs
11:         if |Vi| > 1 ∧ Vi ∉ Y(λ) then
12:           Y(λ) ⇐ Y(λ) ∪ {Vi} ⊲ extracting nested t-inv
13:           A ⇐ A ∪ Vi
14:       σi ⇐ replace(σi, cyce, clear(cyce, A))
15:       cyce ⇐ e-cycle(σi)
16:     until cyce = ∅
17:     if σi ∉ Y(λ) then Y(λ) ⇐ Y(λ) ∪ {sym(σi)} ⊲ minimal supports
18: return Y(λ)
Property 1. Algorithm 2 determines all the t-invariant supports of the extended WFN to build
from λ.
Proof It is easy to observe that in the repeat loop, the procedure extracts the evident cycles
including te, and the nested iterations of traces. □
The supports of t-invariants obtained by the application of the above algorithm to the task
traces in Example 1 are <Y1> = {t1, t3, t4, t6, t7, te}, <Y2> = {t2, t3, t4, t8, te}, <Y3> =
{t3, t4, t5}, and <Y4> = {t1, t3, t4, t9, te}. Notice that in <Y1>, <Y2>, and <Y4>, the
transition te of the extended WFN is included, since such invariants involve transitions in i•
and •o, whilst te is not included in <Y3> because it is the support of a nested t-invariant.
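The whole procedure can be sketched end to end in Python. This is an illustrative reading of Algorithm 2, not the authors' implementation, and it makes several assumptions: the first elementary cycle found is processed at each step, the tasks removed from a cycle are those of its non-trivial strongly connected components (instead of the accumulated set A), strongly connected components are computed by mutual reachability, and a guard stops the loop if a cycle yields no component with more than one task. All function names are hypothetical.

```python
LOG = [
    "t1 t6 t3 t4 t7",
    "t1 t3 t6 t4 t5 t3 t4 t7",
    "t2 t3 t4 t5 t3 t4 t8",
    "t2 t3 t4 t8",
    "t1 t3 t4 t5 t6 t3 t4 t7",
    "t1 t3 t4 t5 t3 t4 t6 t7",
    "t1 t3 t4 t9",
]


def causal_relation(traces):
    """CausalR of Definition 10 (two-length cycles are kept as causal)."""
    prec = {(a, b) for tr in traces for a, b in zip(tr, tr[1:])}
    tc = {(a, b) for tr in traces
          for a, b, c in zip(tr, tr[1:], tr[2:]) if a == c and a != b}
    tc |= {(b, a) for a, b in tc}
    conc = {(a, b) for a, b in prec if (b, a) in prec and (a, b) not in tc}
    return (prec - conc) | tc


def first_elementary_cycle(trace):
    """First elementary cycle of the trace, or [] if no task is repeated."""
    for task in dict.fromkeys(trace):
        idx = [i for i, x in enumerate(trace) if x == task]
        if len(idx) > 1:
            cyc = trace[idx[0]:idx[1]]
            if len(cyc) == len(set(cyc)):
                return cyc
            return first_elementary_cycle(cyc)
    return []


def strongly_connected(V, E):
    """Components as sets of mutually reachable nodes."""
    def reach(v):
        seen, stack = {v}, [v]
        while stack:
            x = stack.pop()
            for a, b in E:
                if a == x and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen
    r = {v: reach(v) for v in V}
    return {frozenset(u for u in V if u in r[v] and v in r[u]) for v in V}


def replace_first(seq, sub, new):
    """Replace the first occurrence of sub in seq by new."""
    n = len(sub)
    for i in range(len(seq) - n + 1):
        if seq[i:i + n] == sub:
            return seq[:i] + new + seq[i + n:]
    return seq


def t_invariant_supports(log):
    traces = [line.split() + ["te"] for line in log]  # each trace ends with te
    causal = causal_relation(traces)
    supports = []
    for trace in traces:
        sigma = list(trace)
        while True:
            cyc = first_elementary_cycle(sigma)
            if not cyc:
                break
            V = set(cyc)
            E = {(a, b) for a, b in causal if a in V and b in V}
            nested = [c for c in strongly_connected(V, E) if len(c) > 1]
            for comp in nested:
                if comp not in supports:
                    supports.append(comp)          # nested t-invariant support
            removed = set().union(*nested)
            if not removed:
                break                              # guard: nothing to prune
            rest = [x for x in cyc if x not in removed]
            sigma = replace_first(sigma, cyc, rest)
        outer = frozenset(sigma)
        if outer not in supports:
            supports.append(outer)                 # support including te
    return supports


SUPPORTS = t_invariant_supports(LOG)
for s in SUPPORTS:
    print(sorted(s))
```

Applied to the traces of Example 1, the sketch yields the four supports <Y1> to <Y4> listed above.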
6 Building the PN model
Causal relations [ti, tj] imply the existence of a place between the related transitions. For
source and sink places (i, o), causal relations are denoted as [−, tj] and [ti, −], respectively.
Using this basic structure, named dependency, together with the computed t-invariants, a
technique for building a PN is now presented.
Definition 14 (First and last tasks) $T_I = \bigcup_{\sigma_k \in \lambda} first(\sigma_k)$, and $T_O = \bigcup_{\sigma_k \in \lambda} last(\sigma_k)$,
where first(σk) and last(σk) provide the first and last tasks in σk, respectively.
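For the log of Example 1, the sets of first and last tasks can be computed as in the following short Python sketch (variable names are illustrative):

```python
LOG = [
    "t1 t6 t3 t4 t7",
    "t1 t3 t6 t4 t5 t3 t4 t7",
    "t2 t3 t4 t5 t3 t4 t8",
    "t2 t3 t4 t8",
    "t1 t3 t4 t5 t6 t3 t4 t7",
    "t1 t3 t4 t5 t3 t4 t6 t7",
    "t1 t3 t4 t9",
]
TRACES = [line.split() for line in LOG]

T_I = {trace[0] for trace in TRACES}    # first tasks: output transitions of place i
T_O = {trace[-1] for trace in TRACES}   # last tasks: input transitions of place o
print(sorted(T_I), sorted(T_O))
```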
6.1 Composing substructures of dependencies
The substructures corresponding to causal dependencies must be composed by merging all
the transitions that have the same label ti into a single one. The merging of transitions may
lead to merge also the places in the involved dependencies; the merging strategy is simple;
it is performed using two construction operators [31].
Fig. 3 Operators for merging dependencies
Operator 1. The composition of two dependencies in the form [ti , t j ] and [t j , tk ] yields
a sequential substructure including two places, allowing firing the sequence ti t j tk ; this is
illustrated in Fig. 3a.
Operator 2. The composition of two dependencies whose first transitions are the same ([ti, tj] and [ti, tk]) yields two possible substructures:
• (a) The places of each dependency are merged into a single one iff the transitions
tj and tk belong to different t-invariants. This substructure is called Or-split; it is
denoted as [ti, tj + tk].
• (b) The places of the dependencies are not merged iff both transitions tj and tk belong
to the same t-invariant. This substructure is called And-split; it is denoted as [ti, tj || tk].
Similarly, for dependencies having the same second transition ([ti, tk] and [tj, tk]), the
substructure yielded will be either [ti + tj, tk] (Or-join) or [ti || tj, tk] (And-join). In both
cases the observations (ti, tk), (tj, tk) ∈ R<, which have induced the dependencies, are
preserved. This merging operator is illustrated in Fig. 3b. In general, a set of dependencies in
the form ([ti, tj], [ti, tk], ..., [ti, tr]) may produce either [ti, tj + tk + ... + tr] or [ti, tj || tk || ... || tr]
according to the relations between transitions, i.e., whether tj, tk, ..., tr belong to different t-invariants or tj, tk, ..., tr belong to the same t-invariant, respectively.
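The decision taken by Operator 2 can be sketched as a membership test on the t-invariant supports. In the Python sketch below, "belonging to the same t-invariant" is read as "sharing at least one computed support"; this reading, the function names and the extra "mixed" outcome are assumptions made for illustration only.

```python
# Supports computed for Example 1.
SUPPORTS = [
    {"t1", "t3", "t4", "t6", "t7", "te"},   # <Y1>
    {"t2", "t3", "t4", "t8", "te"},         # <Y2>
    {"t3", "t4", "t5"},                     # <Y3>
    {"t1", "t3", "t4", "t9", "te"},         # <Y4>
]


def share_support(a, b, supports):
    """True if some support contains both tasks."""
    return any(a in s and b in s for s in supports)


def classify(tasks, supports):
    """'and' if every pair shares a support, 'or' if no pair does."""
    pairs = [(a, b) for i, a in enumerate(tasks) for b in tasks[i + 1:]]
    if not pairs:
        return "single"
    shared = [share_support(a, b, supports) for a, b in pairs]
    if all(shared):
        return "and"
    if not any(shared):
        return "or"
    return "mixed"   # not covered by the two cases of Operator 2


print(classify(["t3", "t6"], SUPPORTS))               # successors of t1
print(classify(["t5", "t7", "t8", "t9"], SUPPORTS))   # successors of t4
print(classify(["t1", "t2", "t5"], SUPPORTS))         # predecessors of t3
```

With the supports of Example 1, the successors of t1 form an And-split and the successors of t4 form an Or-split, which agrees with the composed dependencies [t1, t6 || t3] and [t4, t5 + t7 + t8 + t9].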
The merging of transitions can be applied iteratively to composed dependencies that
exactly match with one expression of transitions of type ti + t j or ti ||t j . For example, the
composition of dependencies [ti + t j , tk ] and [ti + t j , tr ] produces [ti + t j , tk + tr ] if both tk
and tr do not belong to the same invariant.
All t j in dependencies in the form [−, t j ] have the input place i. Similarly, for transitions
in dependencies of the form [ti , −], they have the same output place o.
Property 2. The application of the merging operators Operator 1 and Operator 2 to
the dependencies derived from the pairs in CausalR, and the knowledge of the t-invariant
supports, leads to a net structure WFN N1, which includes all the transitions.
Proof Operator 1 forms paths of places and transitions, whilst Operator 2 determines
when split and join substructures are created according to the computed t-invariants. □
Fig. 4 N1 built from λ
In Example 2, the application of merging operators to the relations in CausalR
yields the set of composed dependencies: [t1, t6 || t3], [t2, t3], [t3, t4], [t4, t5 + t7 + t8 +
t9], [t5, t3], [t6, t7], [t1 + t2 + t5, t3]. Afterwards, the dependencies obtained by applying
Operator 1 and Operator 2 are i: [−, t ∈ TI], p1: [t1 + t2 + t5, t3], p1 − p2: [t1, t3 || t6], p3:
[t3, t4], p5: [t4, t5 + t7 + t8 + t9], p4 − p5: [t6 || t4, t7], o: [t ∈ TO, −]. The subsequent
merging of transitions in dependency substructures yields the WFN N1 shown in Fig. 4.
6.2 Model adjustment
The discovered model N1 replays all the traces in λ; besides, it could execute some additional
traces (surplus language). Eventually, it is possible that N1 could not replay some traces in λ.
The WFN in Fig. 4 reproduces λ of Example 1, but also other traces; in particular, the traces
t2 t3 t4 t9 and t1 t3 t4 t8 , which do not belong to λ, can be fired in N1 . This behaviour is because the
computed model N1 does not include PN elements (places and arcs) that ensure behaviours
of dependencies not exhibited explicitly by the traces in λ, named implicit dependencies;
therefore, N1 must be adjusted.
6.2.1 Implicit dependencies
In a PN, the implicit dependencies represent the recall of the occurrence of a ti , which is used
as a precondition to enable a non-immediate subsequent transition t j . In general, an implicit
dependency [ti , t j ] represents a constraint in the flow of tokens in the PN by ensuring that t j
can be fired only when ti has occurred before; thus, the absence of such an implicit dependency
will allow the occurrence of more sequences in the net.
Definition 15 (Implicit dependency) In a 1-bounded PN, [ti , t j ] is called an implicit dependency, if albeit there exists a place between the transitions, the occurrence of ti does not
produce a marking that immediately enables t j ; i.e., it is necessary for the occurrence of at
least one transition tk before t j .
After building the first model, implicit dependencies may be deduced to be included in
N1 in two ways: Type 1: adding a new place between two transitions, or Type 2: using a
place already included in N1. These situations are illustrated in Fig. 5, where the dependency
[tx, tw] is represented by a new place pi in Fig. 5a, and [tx, ty] is represented using a previously
computed place pj in Fig. 5b. Similarly, for the dependency [tx, tz], the place of [ty, tz] is
used (Fig. 5c).
Fig. 5 N1 Implicit dependencies
The following notions and conditions are useful to find both kinds of implicit dependencies
in the traces of λ, which will be added to N1 .
Definition 16 (Implicit precedence) Let ti, tj ∈ T be tasks. ti has an implicit precedence
over tj, denoted as ti ≪ tj, if ti >< tj and for every trace σk ∈ λ, ti always appears before
tj.
Implicit precedence between two transitions suggests an implicit dependency, but it is
necessary to analyse other underlying properties to ensure the existence of such a dependency.
Definition 17 (Support-dependent tasks) The set of tasks support-dependent of a Yi ∈ Y(λ),
denoted as Sd(Yi), contains the tasks tx ∈ T which appear only in the support of Yi: Sd(Yi) =
{tx ∈ <Yi> | ∄Yj ∈ Y(λ), j ≠ i, tx ∈ <Yj>}.
For the t-invariant supports of Example 1 (<Y1> = {t1, t3, t4, t6, t7, te}, <Y2> =
{t2, t3, t4, t8, te}, <Y3> = {t3, t4, t5}, <Y4> = {t1, t3, t4, t9, te}), the support-dependent sets are
Sd(Y1) = {t6, t7}, Sd(Y2) = {t2, t8}, Sd(Y3) = {t5}, Sd(Y4) = {t9}.
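The support-dependent sets follow directly from the supports by set difference. A minimal Python sketch, with illustrative names, is:

```python
SUPPORTS = {
    "Y1": {"t1", "t3", "t4", "t6", "t7", "te"},
    "Y2": {"t2", "t3", "t4", "t8", "te"},
    "Y3": {"t3", "t4", "t5"},
    "Y4": {"t1", "t3", "t4", "t9", "te"},
}


def support_dependent(supports):
    """Sd(Yi): tasks of <Yi> that appear in no other support."""
    result = {}
    for name, tasks in supports.items():
        others = set().union(*(s for n, s in supports.items() if n != name))
        result[name] = tasks - others
    return result


SD = support_dependent(SUPPORTS)
for name in sorted(SD):
    print(name, sorted(SD[name]))
```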
6.2.2 Implicit dependencies of Type 1
Now we can state the conditions in which a place must be added to relate two transitions that
are not observed consecutively.
Proposition 4 Let ti and tj be transitions in N1. If (i) ti and tj are related by an implicit
precedence (ti ≪ tj), and (ii) there exists a support-dependent set Sd(Yk) that contains
both transitions, then ti and tj are related by an implicit dependency [ti, tj], which must
be added to the structure of N1. The set of all the implicit dependencies of N1 is IDep =
{[ti, tj] | (ti ≪ tj) ∧ ∃Yk such that {ti, tj} ⊆ Sd(Yk)}.
Proof (Contraposition) Suppose that the dependency [ti, tj] must not be added to the structure
of N1; this is because
• (i) a place pi of the dependency [ti, tj] already exists as the result of applying Operator
1 or Operator 2; therefore, such transitions are not related by an implicit precedence, i.e.
¬(ti ≪ tj), or
• (ii) ti does not need to occur always before tj; then, both transitions may fire independently
since they belong to different t-invariants; thus, there is no support-dependent set that
contains both transitions. □
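The two conditions of Proposition 4 can be checked mechanically on a log. The sketch below is only illustrative and rests on several assumptions: the relation >< is defined earlier in the paper and is not reproduced in this section, so it is replaced here by a stand-in ("the two tasks are never adjacent in any trace"); "always appears before" is read as "ti precedes the first occurrence of tj in every trace that contains tj"; the log is hypothetical, and all function names are invented for the example.

```python
from itertools import permutations


def support_dependent(supports):
    """Tasks that appear only in the support of one t-invariant (Definition 17)."""
    sd = {}
    for name, tasks in supports.items():
        others = set()
        for other_name, other_tasks in supports.items():
            if other_name != name:
                others |= other_tasks
        sd[name] = tasks - others
    return sd


def never_adjacent(log, a, b):
    """Assumed stand-in for a >< b: a and b are never consecutive in a trace."""
    for trace in log:
        for x, y in zip(trace, trace[1:]):
            if {x, y} == {a, b}:
                return False
    return True


def always_before(log, a, b):
    """Assumed reading: whenever b occurs, a occurs before the first b."""
    seen_together = False
    for trace in log:
        if b in trace:
            if a not in trace or trace.index(a) > trace.index(b):
                return False
            seen_together = True
    return seen_together


def implicit_dependencies(log, supports):
    """Pairs (ti, tj) with ti << tj and both tasks in the same Sd(Yk)."""
    ideps = set()
    for tasks in support_dependent(supports).values():
        for a, b in permutations(sorted(tasks), 2):
            if never_adjacent(log, a, b) and always_before(log, a, b):
                ideps.add((a, b))
    return ideps


SUPPORTS = {
    "Y1": {"t1", "t3", "t4", "t6", "t7", "te"},
    "Y2": {"t2", "t3", "t4", "t8", "te"},
    "Y3": {"t3", "t4", "t5"},
    "Y4": {"t1", "t3", "t4", "t9", "te"},
}

# Hypothetical traces, chosen only to be compatible with the supports above.
LOG = [
    ["t1", "t3", "t4", "t6", "t7", "te"],
    ["t1", "t6", "t3", "t4", "t7", "te"],
    ["t2", "t3", "t4", "t5", "t3", "t4", "t8", "te"],
    ["t1", "t3", "t4", "t9", "te"],
]

print(implicit_dependencies(LOG, SUPPORTS))
```

On this toy log the only pair retained is (t2, t8); the pair (t6, t7) is discarded because both tasks are observed consecutively, so a place for it would already result from the observed causal relations.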
Corollary 1 Let [ti, tj] be an implicit dependency where ti, tj ∈ <Yr> and Yr ∈ Y(λ). If
CYr = 0, then a new place pk ∉ P2 must be added to N2 to ensure [ti, tj].
Proof (Contraposition) Suppose that pk ∈ P2; since it is linked to either ti or tj, then
CYr ≠ 0 (this is the case of dependencies of Type 2). □
Fig. 6 N2 built by adding p6 : [t2 , t8 ] to N1
The conditions of Proposition 4 determine the existence of places that do not represent causal
relationships. This is valuable because implicit dependencies are not exhibited in λ. The
absence of such places would cause an exceeding language in the PN. In Example 1, the
transitions t2, t8 meet the conditions of the proposition because t2 ≪ t8 and {t2, t8} ⊆ Sd(Y2);
therefore, [t2, t8] must be added to N1, yielding the model N2 shown in Fig. 6.
6.2.3 Implicit dependencies of Type 2
Now, the supports of the minimal t-invariants of N2 in Fig. 6 are J(N2): <J1> =
{t1, t3, t4, t6, t7, te}, <J2> = {t2, t3, t4, t8, te}, <J3> = {t3, t4, t5}; these invariants differ from
Y(λ) computed in Subsection 6.1. The discrepancy between Y(λ) and J(N2) arises because
the computed PN does not include the arcs (implicit dependencies of Type 2, Fig. 5b, c) that
ensure the behaviours due to implicit dependencies not exhibited in λ.
N2 must be adjusted by determining the suitable implicit dependencies that transform N2
into N3, whose t-invariants match Y(λ). The mismatch is detected when ∃Yr ∈ Y(λ)
such that CYr ≠ 0, where C is the incidence matrix of N2. To amend N2, the following strategy
must be applied.
Consider a Yr ∈ Y(λ). Let pk be the place that corresponds to the row in which CYr ≠ 0,
more precisely, C(pk)Yr ≠ 0. To determine the dependency [ti, tj], another transition of N2
must be linked through pk to one of the transitions in ·pk (Fig. 5c) or pk· (Fig. 5b), following
the construction procedure derived from the proof of the proposition stated below.
Proposition 5 Let [ti, tj] be an implicit dependency where ti, tj ∈ <Yr> and Yr ∈ Y(λ).
[ti, tj] must be added to N2 through a place pk of N2 if C(pk)Yr ≠ 0, to ensure C(pk)Yr = 0
([ti, tj] is of Type 2).
Proof (Direct) To ensure C( pk )Yr = 0, two cases are considered:
• (i) C( pk )Yr = 1 This requires that C( pk , t j ) = −1 to get C( pk )Yr = 0; thus ti ∈ · pk
(the arc ( pk , t j ) must be added to get [ti , t j ]).
• (ii) C( pk )Yr = −1 This requires that C( pk , t j ) = 1 to get C( pk )Yr = 0; thus ti ∈ pk ·
(the arc (t j , pk ) must be added to get [ti , t j ]).
Since ti ∈ Sd(Yr ), the added arc ( pk , ti ) only affects Yr . Similarly, since t j ∈ Sd(Yr ) the
new arc (t j , pk ) does not alter the other t-invariants.
□
Proposition 6 If all the implicit dependencies are added to N2 through places pk such that
C(pk)Yr ≠ 0, for every Yr ∈ Y(λ) with CYr ≠ 0, then the amended net N3 fulfils CY = 0.
Proof (Direct) When all the amendments to N2 are performed through the procedure derived
from the proof of Proposition 5, the amended net N3 fulfils CY(λ) = 0 and then Y(λ) =
J(N3). □
Algorithm 3 summarises the procedure derived from the previous result to obtain the implicit
dependencies.
Algorithm 3 Determining implicit dependencies
Require: N2 = (P2, T2, F2), IDep, Y(λ)
  P3 ⇐ P2; T3 ⇐ T2; F3 ⇐ F2
  ∀[ti, tj] ∈ IDep
    if C(N2)Y(λ) ≠ 0 then                          ⊲ Type 2 dependency
      ∀Yi ∈ Y(λ) | C(N2)Yi ≠ 0
        ∀pr ∈ P2
          if C(pr)Yi = 1 then F3 ⇐ F3 ∪ {(pr, tj)}
          if C(pr)Yi = −1 then F3 ⇐ F3 ∪ {(ti, pr)}
    else
      Create a new pk | k > |P3|                   ⊲ Type 1 dependency: pk ∉ P3
      P3 ⇐ P3 ∪ {pk}
      F3 ⇐ F3 ∪ {(ti, pk), (pk, tj)}
  return N3 = (P3, T3, F3)
Consider N2 in Fig. 6, obtained from the event log in Example 5. First, J(N2) is computed:
<J1> = {t1, t3, t4, t6, t7, te}, <J2> = {t2, t3, t4, t8, te}, <J3> = {t3, t4, t5}.
There is a mismatch between both sets since Y(λ) ⊄ J(N2). It can be noticed that
Y4 ∉ J(N2), whilst Y1 = J1, Y2 = J2 and Y3 = J3. In the analysis of Y4, pk = p2 because
it fulfils the condition C(p2)Y4 ≠ 0, as shown in Eq. (1).
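\[
\begin{bmatrix}
-1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & -1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & -1 & -1 & -1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & -1
\end{bmatrix}
\cdot
\begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\tag{1}
\]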
The support-dependent sets (Sd) computed in Y3 (∀Yi ∈ Y | Yi ∈ J(N2)) are Sd(t1) =
Sd(t9) = {t1, t9}. The transition t1 ∈ ·p2 is selected to find the implicit dependency [t1, tj]
because t1 ∈ <Y4>. The transition that fulfils the condition t1 ≪ tj is tj = t9; therefore,
the implicit dependency [t1, t9] is added to N2 by the arc (p2, t9). Finally, the amended PN
N3, which replays λ, is shown in Fig. 7.
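The check of Eq. (1) and the Type 2 amendment can be replayed numerically. The sketch below is a hedged illustration in plain Python, not the authors' software: the matrix is the one read from Eq. (1) with columns ordered t1, ..., t9, te; the row labels p0 to p7 (so that the third row is the place the text calls p2) are an assumption, and the helper names vector, times and amend_type2 are invented for the example. The amendment follows the two arc rules of Algorithm 3.

```python
TRANSITIONS = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "te"]

# One row per place, one column per transition:
# +1 means the transition feeds the place, -1 means it consumes from it.
# Row labels p0..p7 are an assumption made for this sketch.
C = [
    [-1, -1,  0,  0,  0,  0,  0,  0,  0,  1],
    [ 1,  1, -1,  0,  1,  0,  0,  0,  0,  0],
    [ 1,  0,  0,  0,  0, -1,  0,  0,  0,  0],
    [ 0,  0,  1, -1,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  1, -1,  0,  0,  0],
    [ 0,  0,  0,  1, -1,  0, -1, -1, -1,  0],
    [ 0,  1,  0,  0,  0,  0,  0, -1,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  1,  1,  1, -1],
]

SUPPORTS = {
    "Y1": {"t1", "t3", "t4", "t6", "t7", "te"},
    "Y2": {"t2", "t3", "t4", "t8", "te"},
    "Y3": {"t3", "t4", "t5"},
    "Y4": {"t1", "t3", "t4", "t9", "te"},
}


def vector(support):
    """0/1 vector of a support, in the column order of TRANSITIONS."""
    return [1 if t in support else 0 for t in TRANSITIONS]


def times(matrix, y):
    """Matrix-vector product, one entry per place."""
    return [sum(c * v for c, v in zip(row, y)) for row in matrix]


def amend_type2(matrix, y, ti, tj):
    """Copy of the matrix with one arc added on every row where C*y != 0."""
    amended = [row[:] for row in matrix]
    for row, value in zip(amended, times(matrix, y)):
        if value == 1:
            row[TRANSITIONS.index(tj)] -= 1   # arc (p, tj)
        elif value == -1:
            row[TRANSITIONS.index(ti)] += 1   # arc (ti, p)
    return amended


y4 = vector(SUPPORTS["Y4"])
before = times(C, y4)                      # only the row of p2 is non-zero
C3 = amend_type2(C, y4, "t1", "t9")        # adds the arc (p2, t9)
after = {name: times(C3, vector(s)) for name, s in SUPPORTS.items()}
print(before)
print(after)
```

After the amendment, every product in after is the zero vector, so the four supports computed from the log are consistent with the adjusted incidence matrix.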
Remark 1 The procedure of Algorithm 3 does not need to compute the t-invariants of N2. It
only operates on the computed invariants Y(λ) that do not agree with the computed net N2,
that is, on every Yr ∈ Y(λ) such that CYr ≠ 0.
Fig. 7 Resulting net after model adjustment
Property 3. Given an event log of task traces λ ∈ T ∗ , a safe PN model N (N3 ) that reproduces
λ can be obtained by computing the invariants from λ, applying Operators 1 and 2, and
performing the amendments of Algorithm 3.
Proof The causality between transitions, stated by the pairs in CausalR ∪ R′<, represents
the precedence relationship between consecutive transitions in the traces of λ that are
not in ConcR. Then, the t-invariants can be determined from λ (Property 1). Besides, the
substructure associated with a dependency [ti, tj] ensures the consecutive occurrence of these
transitions; then, based on the t-invariants, the application of Operator 1 and Operator
2 to all the dependencies [ti, tj] leads to a PN structure that ensures the flow determined by
the dependencies (Property 2). Finally, the adjustments to N1 provided by Algorithm 3 allow
matching the t-invariants determined from λ with those of the discovered model. □
6.3 Complexity of the method
The method is based on the notions introduced in Section 4, whose determination procedures have a computational complexity of O(|λ|). The complexity of computing the
t-invariants Y(λ) is O((|V| + |E|) ∗ ||λ||), which is the time for determining the strongly
connected components in a graph with |V| nodes and |E| edges, multiplied by the size of the
log. Notice that this is the worst case, in which each trace has a different e-cycle. Finally,
the complexity of the procedure to compute each implicit dependency is O(|P| ∗ |T|); it is
related to the matrix-vector product. Thus, the complexity of the algorithm is polynomial
in |λ|.
7 Implementation issues
7.1 Testing scheme
Algorithms and auxiliary procedures derived from the proposed discovery method have been
implemented; the software has been tested on numerous WFNs of diverse structural complexity. The tests were performed on artificial logs following the scheme shown in Fig. 8. First, a
WFN N including a transition te is proposed; then, with the help of the PN editor/simulator
PIPE [33], a workflow log λ is produced. Then the discovery method module processes λ
Fig. 8 Test procedure of the discovery method
Fig. 9 Detecting overlapped cycles
Fig. 10 Detecting cycles in parallel threads
yielding a model coded in XML, which is displayed using PIPE again. The obtained model
N’ is then compared to N. This scheme allows testing the method in a controlled manner by
rediscovering WFNs with diverse structures, which include cycles nested into t-components,
concurrency, and implicit dependencies.
7.2 Illustrative experiments
Artificial logs were produced using PN models that include diverse substructures, which
exhibit complex repetitive behaviour: overlapped cycles (Fig. 9) and cycles in parallel threads
(Fig. 10). Logs used in these experiments, named λ1 and λ2 , are given below.
λ1 = {t0 t1 t2 t4 t5 t6 t2 t3 t1 t2 t4 t5 t7 t8 t2 t4 t5 t7 }, {t0 t1 t2 t3 t1 t2 t4 t5 t6 t2 t4 t5 t7 t8 t2 t4 t5 t7 t9 }
λ2 = {t0 t1 t4 t5 t6 t2 t3 t1 t2 t4 t5 t7 }, {t0 t1 t4 t2 t5 t3 t6 t4 t1 t5 t2 t6 t3 t4 t5 t1 t2 t3 t1 t6 t2 },
{t4 t3 t1 t2 t3 t5 t6 t1 t4 t5 t2 t7 }, {t0 t4 t1 t5 t2 t7 }, {t0 t1 t4 t2 t5 t7 }
Other experiments have been performed using several WFNs reported in the literature.
Figure 11 shows five models obtained by applying the method to logs taken from [34],
Fig. 11 WFN discovered from logs in [21]
which present implicit dependencies. The dashed places and their respective input/output
arcs correspond to implicit dependencies of type 1, whereas the dashed arcs joining existing
places correspond to implicit dependencies of type 2.
A complete log for Fig. 11a is ACD, BCFE, BFCE; from this log the implicit dependency
of type 1 [A, D] is detected. For the WFN shown in Fig. 11b, the corresponding complete log is
ACD, BCE, AFCE, ACFE; in this case, implicit dependencies of type 2 [A, D] and [B, E] are
found. The corresponding log for the WFN in Fig. 11c is ACFBGE, AFCBGE, AFBCGE,
AFBGCE, AFDGE; from this log, the method determined that [A, D], [D, E] are implicit
dependencies of type 2. The processing of the complete log ACDEGH, ACDGEH, ACGDEH,
BCDFH yields the workflow net in Fig. 11d; for this log, the method first found the implicit
dependencies [A, E], [A, G], [B, F]; nevertheless, the net still does not match the t-invariants
of the log; hence, a type 2 implicit dependency [C, F] is devised. Finally, the corresponding
log of the WFN in Fig. 11e is FBG, ABC, FDBEG, FBDEG, FDEBG, ADEDEBG, ABDEC;
in this case, implicit dependencies of type 1 [A, C] and [F, G] are found, and the arcs assuring
implicit dependencies of type 2 are (A, pk), (F, pk), (pk, C), and (pk, G). Similar to the
procedure for obtaining the WFN of Fig. 11b, pk is the result of merging two places.
Fig. 12 Actual, observed, and computed behaviours
8 Discussion
8.1 Main features
The proposed discovery method includes strategies alternative to those found in the literature,
namely the search for invariants and the discovery of concurrent cyclic behaviours. The discovered WF-net is a qualitative model that allows reproducing the logs obtained from the
execution of WF processes that behave as sound WF-nets, as specified in Sect. 3.1. This
feature, called fitness in [35], is assured to have the value 1.0 since all the precedences declared
by pairs in R< (issued from λ) are represented in the discovered model, as stated in Property
3. Furthermore, the procedures that implement the method are based on algorithms that run
in time polynomial in the size of the log, which is a welcome feature for dealing with large logs.
Compared with an outstanding published method, the alpha++ algorithm [32],
our approach can discover the reported models; besides, its computational complexity is
lower.
8.2 Limitations and challenges
The first limitation we can point out arises from the assumptions stated in the problem
formulation, which require that the obtained model uses each transition in T only once. This
constraint, issued from the standard problem formulation, can be relaxed when task symbols
may be associated with more than one transition or when non-observable (silent) transitions are
allowed. Another assumption held in this paper is that, in the observed behaviour, the traces are
recorded correctly; in particular, no task is missing from a trace. Although the method can build
WF-nets that reproduce the input logs, the discovered model could represent exceeding
behaviour due to cycles in the synthesised PN. For example, the trace abcbcbd includes
a repetition of the sub-trace cb; then, the model will represent ab(cb)+ d. The language
overrepresentation (computed as a measure of precision in [35]) is due in part to the above
feature; this analysis is out of the scope of the paper and is currently a research matter of
the authors. The relationships between the actual, observed, and computed behaviours are
depicted in Fig. 12.
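The exceeding behaviour in the example above can be made concrete by treating the language ab(cb)+d as a regular expression. This is only an illustrative sketch in Python: the list of candidate traces and the name accepted are assumptions introduced here, and a regular expression stands in for the cyclic net solely for this one-thread example.

```python
import re

# Language represented by the cyclic model of the example: a b (c b)+ d
MODEL = re.compile(r"ab(cb)+d")


def accepted(trace: str) -> bool:
    """True when the whole trace belongs to the model language."""
    return MODEL.fullmatch(trace) is not None


observed = {"abcbcbd"}                                   # the only logged trace
candidates = ["abcbcbd", "abcbd", "abcbcbcbd", "abd"]    # hypothetical traces

# Traces accepted by the model although they were never observed.
exceeding = [t for t in candidates if accepted(t) and t not in observed]
print(exceeding)
```

The observed trace is accepted, which corresponds to a fitness of 1.0, while traces with one or three repetitions of cb are also accepted although they are not in the log; this is the loss of precision discussed in the text.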
During the tests of the method using artificial logs obtained through known WFN, we
detected some particular sound WF-nets in which this method fails to rediscover all dependencies between tasks. Since the method is based on representing repetitive behaviour exhibited
by the log through inferring the t-invariants, it cannot distinguish the supports of t-invariants
in which one or several tasks need to occur a given number of times to reproduce the
repetitive behaviour. In other words, when a t-component has a cycle in its execution, the
algorithm may find more than one t-invariant; the outcome is a WF-net that can reproduce
the observed log and other traces involving such a cycle, which are not in the log. Consider
the two WFNs of Fig. 13; the net in Fig. 13a is executed to generate the complete workflow log ABCFGECDH, ABCECDFGH, ABFCGECDH, ABFCEGCDH, ABCEFCDGH,
ABCFEGCDH; notice that this WF-net (more precisely, the extended WF-net) has only one
Fig. 13 WF-Net with a nested cycle in the t-invariant
t-invariant. During the application of the discovery method, two t-invariant supports, <Y1>
= {A, B, C, F, D, G, H} and <Y2> = {C, E}, are computed; then the WF-net built is that shown in
Fig. 13b, which in fact (the extended WF-net) has two t-invariants. The implicit dependencies
of Type 1 [B, E] and [E, D] are missed and should be computed to rediscover the WFN used
to generate the logs. In particular, for this example, a subsequent analysis must determine
that the cycle of transitions C and E in < Y2 > occurs once every time < Y1 > is executed.
The dependency between the executions of the t-invariants is still under research.
9 Conclusion
The discovery method proposed in this paper is based on determining the supports of t-invariants from the log λ; it allows building an initial model, which can be adjusted later, if
needed, with the help of the computed t-invariants; the final model includes implicit causal
relationships between transitions that have not been observed consecutively in the traces of λ.
The discovered WFN replays all the traces in λ from M0 and may accept exceeding
iterative sub-sequences, which correspond to the behaviour inherent to PN with repetitive
components. Based on polynomial-time algorithms, the method allows processing large event
logs. The implemented software has been tested on artificial logs corresponding to WFNs
with diverse structures; the tests showed the accuracy and efficiency of the method when
complex PN structures are addressed. Future work concerns the application of the method
to event logs issued from actual processes. Current research addresses the problem of PN
discovery from incomplete observed sequences and quality measures to assess the obtained
model with respect to the event log.
Acknowledgements The first author is Tonatiuh Tapia-Flores; he has been sponsored by CONACYT under
the Ph.D. Grant No. 263566.
Declarations
Conflict of interest The authors declare that they have no conflict of interest financial or non-financial with
any person or organisation.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included in the
article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is
not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
1. Gold, M.E.: Language identification in the limit. Inf. Control 10(5), 447–474 (1967)
2. Angluin, D.: Queries and concept learning. Mach. Learn. 2(4), 319–342 (1988). https://doi.org/10.1023/
a:1022821128753
3. Meda-Campana, M., Ramírez-Treviño, A., López-Mellado, E.: Asymptotic identification of discrete event
systems. In: Proceedings of the 39th IEEE conference on, pp. 2266–2271. IEEE (2000)
4. Meda-Campana, M., López-Mellado, E.: Identification of concurrent discrete event systems using petri
nets. In: Proceedings of the 17th IMACS world congress on computational and applied mathematics, pp.
11–15 (2005)
5. Giua, A., Seatzu, C.: Identification of free-labeled petri nets via integer programming. In: Decision and
Control, 2005 and 2005 European Control Conference. CDC-ECC’05. 44th IEEE conference on, pp.
7639–7644 (2005). IEEE
6. Cabasino, M.P., Giua, A., Seatzu, C.: Linear programming techniques for the identification of
place/transition nets. In: Decision and control, 2008. CDC 2008. 47th IEEE conference on, pp. 514–
520 (2008). IEEE
7. Dotoli, M., Pia Fanti, M., Mangini, A.M., Ukovich, W.: Identification of the unobservable behaviour of
industrial automation systems by petri nets. Control. Eng. Pract. 19(9), 958–966 (2011)
8. Klein, S., Litz, L., Lesage, J.-J.: Fault detection of discrete event systems using an identification approach.
In: 16th IFAC world congress (2005)
9. Roth, M., Schneider, S., Lesage, J.-J., Litz, L.: Fault detection and isolation in manufacturing systems
with an identified discrete event model. Int. J. Syst. Sci. 43(10), 1826–1841 (2012)
10. Estrada-Vargas, A.P., López-Mellado, E., Lesage, J.-J.: Input-output identification of controlled discrete
manufacturing systems. Int. J. Syst. Sci. 45(3), 456–471 (2014)
11. Estrada-Vargas, A.P., Lesage, J.-J., López-Mellado, E.: A stepwise method for identification of controlled
discrete manufacturing systems. Int. J. Comput. Integr. Manuf. 28(2), 187–199 (2015). https://doi.org/
10.1080/0951192X.2013.874591
12. Estrada-Vargas, A.P., López-Mellado, E., Lesage, J.-J.: A comparative analysis of recent identification
approaches for discrete-event systems. Math. Prob. Eng. (2010). https://doi.org/10.1155/2010/453254
13. Cabasino, M.P., Darondeau, P., Fanti, M.P., Seatzu, C.: Model identification and synthesis of discrete-event systems. In: Zhou, M., Li, H.X., Weijnen, M. (eds.) Contemporary issues in systems science and
engineering. Wiley, London (2013)
14. Aalst, W.M.: The application of petri nets to workflow management. J. Circuits Syst. Comput. 8(01),
21–66 (1998)
15. Ou-Yang, C., Winarjo, H.: Petri-net integration–an approach to support multi-agent process mining.
Expert Syst. Appl. 38(4), 4039–4051 (2011). https://doi.org/10.1016/j.eswa.2010.09.066
16. Ma, J., Wang, K., Xu, L.: Modelling and analysis of workflow for lean supply chains. Enterp. Inf. Syst.
5(4), 423–447 (2011). https://doi.org/10.1080/17517575.2011.580007
17. Cook, J.E., Wolf, A.L.: Automating process discovery through event-data analysis. In: 1995 17th international conference on software engineering, pp. 73–73 (1995). https://doi.org/10.1145/225014.225021
18. Cook, J.E., Du, Z., Liu, C., Wolf, A.L.: Discovering models of behavior for concurrent workflows. Comput.
Ind. 53(3), 297–319 (2004)
19. Agrawal, R., Gunopulos, D., Leymann, F.: Mining Process Models from Workflow Logs. In: Schek, H.J.,
Saltor, F., Ramos, I., Alonso, G. (eds.) EDBT Lecture
Notes in Computer Science, vol. 1377, pp. 469–483. Springer, Berlin (1998). https://doi.org/10.1007/BFb0101003
20. Aalst, W., Weijters, T., Maruster, L.: Workflow mining: discovering process models from event logs.
Knowledge and data engineering, ieee transactions on 16(9), 1128–1142 (2004)
21. Wang, D., Ge, J., Hu, H., Luo, B.: A new process mining algorithm based on event type. In: Dependable,
Autonomic and Secure Computing (DASC), 2011 IEEE Ninth international conference on, pp. 1144–1151
(2011). IEEE
22. Wen, L., Wang, J., Sun, J.: Detecting implicit dependencies between tasks from event logs. In: Proceedings
of the 8th Asia-Pacific Web conference on frontiers of WWW research and development. APWeb’06, pp.
591–603. Springer, Berlin, Heidelberg (2006). https://doi.org/10.1007/11610113_52
23. Wang, D., Ge, J., Hu, H., Luo, B., Huang, L.: Discovering process models from event multiset. Expert
Syst. Appl. 39(15), 11970–11978 (2012)
24. Aalst, W.M.P.: Process mining: discovery, conformance and enhancement of business Processes, 1st edn.
Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-19345-3
25. Augusto, A., Conforti, R., Dumas, M., Rosa, M.L., Maggi, F.M., Marrella, A., Mecella, M., Soo, A.:
Automated discovery of process models from event logs: review and benchmark. IEEE Trans. Knowl.
Data Eng. 31(4), 686–705 (2019). https://doi.org/10.1109/TKDE.2018.2841877
26. Santos Garcia, C., Meincheim, A., Faria Junior, E.R., Dallagassa, M.R., Sato, D.M.V., Carvalho, D.R.,
Santos, E.A.P., Scalabrin, E.E.: Process mining techniques and applications—a systematic mapping study.
Expert Syst. Appl. 133, 260–295 (2019). https://doi.org/10.1016/j.eswa.2019.05.003
27. Aalst, W.: Process mining. Data science in action. Springer, Berlin (2016)
28. Aalst, J.C.: Process Mining Handbook. Lecture Notes in Business Information Processing, vol. 448, 1st
edn. Springer, Berlin (2022). https://doi.org/10.1007/978-3-031-08848-3
29. Estrada-Vargas, A.P., López-Mellado, E., Lesage, J.J.: A black-box identification method for automated
discrete-event systems. IEEE Trans. Autom. Sci. Eng. 99, 1–16 (2015). https://doi.org/10.1109/TASE.
2015.2445332
30. Tapia-Flores, T., López-Mellado, E., Estrada-Vargas, A.P., Lesage, J.J.: Petri net discovery of discrete
event processes by computing t-invariants. In: Emerging technology and factory automation (ETFA),
2014 IEEE, pp. 1–8 (2014). https://doi.org/10.1109/ETFA.2014.7005080
31. Tapia-Flores, T., López-Mellado, E.: Inferring the repetitive behaviour from event logs for process mining
discovery. In: Prasath, R., Gelbukh, A. (eds.) Min. Intell. Knowl. Explorat., pp. 164–173. Springer, Cham
(2017)
32. Wen, L., Aalst, W.M.P., Wang, J., Sun, J.: Mining process models with non-free-choice constructs. Data
Min. Knowl. Disc. 15(2), 145–180 (2007). https://doi.org/10.1007/s10618-007-0065-y
33. Dingle, N.J., Knottenbelt, W.J., Suto, T.: Pipe2: a tool for the performance evaluation of generalised
stochastic petri nets. SIGMETRICS Perform. Eval. Rev. 36(4), 34–39 (2009). https://doi.org/10.1145/
1530873.1530881
34. Leemans, S.J.J., Fahland, D., Aalst, W.M.P.: In: Colom, J.-M., Desel, J. (eds.) Discovering Block-Structured Process Models from Event Logs - A Constructive Approach, pp. 311–329. Springer, Berlin,
Heidelberg (2013). https://doi.org/10.1007/978-3-642-38697-8_17
35. Buijs, J.C.A.M., Dongen, B.F., Aalst, W.M.P.: Quality dimensions in process discovery: the importance of
fitness, precision, generalization and simplicity. Int. J. Cooper. Inf. Syst. 23(01), 1440001 (2014). https://
doi.org/10.1142/S0218843014400012
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.