
Discovering workflow nets of concurrent iterative processes

2023, Acta Informatica

A novel and efficient method for discovering concurrent workflow processes is presented. It allows building a suitable workflow net (WFN) from a large event log λ that represents the behaviour of complex iterative processes involving concurrency. First, the t-invariants are determined from λ; this allows computing the causal and concurrent relations between the events, as well as the implicit causal relations between events that do not appear consecutively in λ. Then, a 1-bounded WFN is built, which may be adjusted if its t-invariants do not match those computed from λ. The discovered model allows firing all the traces in λ. The procedures derived from the method run in polynomial time in |λ|; they have been implemented and tested on artificial logs.

Acta Informatica (2024) 61:1–21, https://doi.org/10.1007/s00236-023-00445-5
ORIGINAL ARTICLE
Tonatiuh Tapia-Flores (2) · Ernesto López-Mellado (1)
Received: 22 August 2022 / Accepted: 10 August 2023 / Published online: 14 September 2023
© The Author(s) 2023

1 Introduction
Modelling is a crucial stage in the design of a system; models obtained from the functional specifications of a system are helpful to synthesise control or management systems for discrete event processes. Conversely, when analysing an existing system, obtaining a model of it is worthwhile. In this case, models are built either manually, by an expert on the process, or automatically, by a computer tool that processes the behaviour exhibited by the process.
Computer aided modelling
The automated modelling of discrete event processes from event data, in the form of event sequences issued by the process, is a challenging approach to reverse engineering analysis. Nowadays, several research teams in the areas of discrete event systems (DES) and workflow management systems (WMS) are interested in this matter.
The first publications on automated modelling, known as language learning techniques, were proposed in computer science. The goal was to obtain formal models (finite automata or grammars) representing languages from positive samples of accepted words [1, 2].
Corresponding author: Ernesto López-Mellado. (1) CINVESTAV Unidad Guadalajara, Av. del Bosque 1145, 45019 Zapopan, Jal., Mexico. (2) Nextiva Mexico, Rubén Darío 425, 44680 Guadalajara, Jal., Mexico.
Process identification
In the DES area, the problem is named process identification; several approaches and methods have been proposed to build models that represent the behaviour of automated manufacturing processes, exhibited as sequences of events recorded during process execution. The incremental approach proposed in [3, 4] obtains 1-bounded interpreted Petri nets (PN) from a large stream of process output signals. In Giua and Seatzu [5], a method based on the statement and solution of an integer linear programming problem is proposed; it allows building a PN from a set of event sequences. Extensions of this method have been proposed in [6, 7]. Then, in Klein et al. [8], a technique to determine finite automata from input–output sequences is presented and applied to fault detection in industrial processes; an extension of this method allows obtaining distributed models [9]. In Estrada-Vargas et al. [10], input–output identification of automated manufacturing processes is addressed; the identification method builds an interpreted PN from a set of sequences of input–output vectors sampled from the controller during the cyclic operation of the system. This method was extended to incrementally update the model when new sequences are processed [11]. Surveys on DES identification are presented in [12, 13].
Process discovery
In the research area of WMS [14–16], the equivalent concern is named process discovery; here, the systems dealt with are business processes whose behaviour is represented by a multiset of task sequences over a finite alphabet. Early methods were proposed in [17, 18]; in [19], Agrawal proposed a method in which a finite automaton, called the conformal graph, is obtained. In [18], Cook presented a probabilistic technique to determine the concurrent and direct relations between tasks; the obtained model is a graph akin to a PN. Later, in [20], a technique to discover finite automata from task sequences was presented. In Wang et al. [21], a discovery method called Algorithm Alpha is presented; in this method, an event log composed of several traces is mined, yielding a subclass of PN called workflow net (WFN). Numerous publications present extensions of this algorithm, namely [21–23]. Broad overviews of discovery techniques are given in [24–28].
Problem and approach
The aim of discovery/identification methods is to build models that represent the behaviour captured in the event log. However, a common issue in current discovery techniques is that the obtained models represent more behaviour than that exhibited by the process; that is, the language of the built model is larger than the one represented by the event log. This problem arises when concurrent iterative processes are dealt with. In this paper, a novel model discovery method is proposed. The technique allows synthesising a suitable WFN from a large log λ of task traces, which include the iterative behaviour of complex business processes exhibiting concurrency and iterations. The obtained WFN has a reduced surplus language with respect to the event log.
The discovery method follows the approach adopted in [29, 30] for DES, adapted to the WMS field to deal with WFN; furthermore, an important extension is proposed that allows addressing more complex behaviours, such as causal dependencies between tasks that do not appear consecutively in the traces; this feature allows reducing the surplus language. The method determines, from a log λ, the causal and concurrency relations between tasks and the t-invariants Y of the PN to be discovered. Then, the obtained t-invariants allow determining a first structure of a PN N1. Afterwards, N1 can be adjusted if some of its t-invariants J do not agree with those derived from λ (Y). This paper is a revised extension of the conference paper [31], in which the method for inferring the t-invariants was presented.
All the algorithms derived from this method have polynomial-time complexity. They have been implemented and tested with artificial logs obtained from known WFN inspired by models reported in the literature. The test results on artificial logs and the computational complexity are compared with those of the process mining method named the alpha++ algorithm [32].
Paper organization
The paper is organised as follows. In Sect. 2, the basic notions on PN are recalled. Section 3 formulates the discovery problem. Section 4 introduces basic relations derived from the task sequences. Section 5 proposes a technique for determining the t-invariants from λ. In Sect. 6, the WFN discovery method is presented. Section 7 outlines the implementation and presents the tests. Finally, Sect. 8 discusses the main features and limitations of the proposed method in the scope of relevant related works.

2 Background
This section recalls the basic concepts and notation of ordinary PN and WFN used in this paper.
Definition 1 An ordinary Petri net structure G is a bipartite digraph represented by the 3-tuple G = (P, T, F), where:
• P = {p1, p2, ..., p|P|} and T = {t1, t2, ..., t|T|} are finite sets of nodes named places and transitions, respectively;
• F ⊆ (P × T) ∪ (T × P) is a relation representing the arcs between the nodes. For any node x ∈ P ∪ T, •x = {y | (y, x) ∈ F} and x• = {y | (x, y) ∈ F};
• the incidence matrix of G is $C = C^{+} - C^{-}$, where $C^{-} = [c^{-}_{ij}]$ with $c^{-}_{ij} = 1$ if $(p_i, t_j) \in F$ and $c^{-}_{ij} = 0$ otherwise, and $C^{+} = [c^{+}_{ij}]$ with $c^{+}_{ij} = 1$ if $(t_j, p_i) \in F$ and $c^{+}_{ij} = 0$ otherwise; $C^{-}$ and $C^{+}$ are called the pre-incidence and post-incidence matrices, respectively. Thus, $C = [c_{ij}]$ with $c_{ij} \in \{-1, 0, 1\}$.
Definition 2 A marking $M : P \to \mathbb{N}_{\geq 0}$ determines the number of tokens within the places, where $\mathbb{N}_{\geq 0}$ is the set of non-negative integers. A marking M, usually denoted as a vector in $\mathbb{N}_{\geq 0}^{|P|}$, describes the current state of the modelled system.
PN dynamics
• A Petri net system, or Petri net (PN), is the pair N = (G, M0), where G is a PN structure and M0 is an initial marking.
• In a PN system, a transition tj is enabled at marking Mk if $\forall p_i \in P,\ M_k(p_i) \geq c^{-}_{ij}$.
• An enabled transition tj can be fired, reaching a new marking Mk+1; this is represented as $M_k \xrightarrow{t_j} M_{k+1}$. The new marking can be computed as $M_{k+1} = M_k + C u_k$, where $u_k$ is the firing vector: when tj is fired, $u_k(j) = 1$ whilst $u_k(i) = 0$ for all $i \neq j$. This equation is called the PN state equation.
• The reachability set of a PN is the set of all the markings reachable from M0 by firing only enabled transitions; it is denoted by R(G, M0).
Definition 3 A PN system is 1-bounded, or safe, iff for any Mi ∈ R(G, M0) and any p ∈ P, Mi(p) ≤ 1. A PN system is live iff, for every reachable marking Mi ∈ R(G, M0) and every t ∈ T, there is an Mk ∈ R(G, Mi) such that t is enabled in Mk.
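The enabling rule and the state equation can be illustrated with a short Python sketch. It is only an illustration, not the authors' implementation: matrices are stored as lists of rows (places) by columns (transitions), and the two-place, two-transition cyclic net in the usage lines is a hypothetical toy example.

```python
from typing import List

Matrix = List[List[int]]  # rows = places, columns = transitions


def incidence(pre: Matrix, post: Matrix) -> Matrix:
    """C = C+ - C-, computed entry by entry."""
    return [[post[i][j] - pre[i][j] for j in range(len(pre[i]))] for i in range(len(pre))]


def enabled(m: List[int], pre: Matrix, j: int) -> bool:
    """t_j is enabled at M iff M(p_i) >= c-_ij for every place p_i."""
    return all(m[i] >= pre[i][j] for i in range(len(m)))


def fire(m: List[int], pre: Matrix, post: Matrix, j: int) -> List[int]:
    """Return M' = M + C u, where u is the unit firing vector of t_j."""
    if not enabled(m, pre, j):
        raise ValueError(f"t{j + 1} is not enabled")
    c = incidence(pre, post)
    u = [1 if k == j else 0 for k in range(len(pre[0]))]
    return [m[i] + sum(c[i][k] * u[k] for k in range(len(u))) for i in range(len(m))]


# Hypothetical toy net (not from the paper): p1 -> t1 -> p2 -> t2 -> p1.
PRE = [[1, 0],   # p1 is the input place of t1
       [0, 1]]   # p2 is the input place of t2
POST = [[0, 1],  # t2 puts a token in p1
        [1, 0]]  # t1 puts a token in p2
M0 = [1, 0]
M1 = fire(M0, PRE, POST, 0)  # fire t1
M2 = fire(M1, PRE, POST, 1)  # fire t2, which leads back to M0
```

Firing t1 and then t2 returns the toy net to M0, so the firing count vector (1, 1) satisfies C(1, 1)ᵀ = 0, which is the t-invariant condition of Definition 4.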
Definition 4 A t-invariant Yi of a PN is a non-negative integer solution to the equation CYi = 0. The support of Yi (t-support), denoted as <Yi>, is the set of transitions whose corresponding elements in Yi are positive. Yi is said to be minimal if its support does not strictly contain the support of any other t-invariant. A t-component G(Yi) is the subnet of the PN induced by <Yi>: G(Yi) = (Pi, Ti, Fi), where Pi = •<Yi> ∪ <Yi>•, Ti = <Yi> and Fi = ((Pi × Ti) ∪ (Ti × Pi)) ∩ F. If, from a marking M0, a sequence containing each transition tj ∈ <Yi> exactly Yi(tj) times can be fired, then M0 is reached again, since M0 + CYi = M0 by the state equation.
Definition 5 A WorkFlow net (WFN) N is a subclass of PN with the following properties [14]: (1) it has two special places, i and o: place i is a source place (•i = ∅) and place o is a sink place (o• = ∅); (2) if a transition te connecting place o to place i is added to the PN, the resulting PN, called the extended WFN, is strongly connected.
Definition 6 A WFN N with initial marking M0 = [i] is said to be sound iff (N, M0) is safe; for any marking Mi ∈ R(N, M0), [o] ∈ R(N, Mi), and o ∈ Mi implies Mi = [o]; and (N, M0) contains no dead transitions. The extended WFN of a sound WFN is live and 1-bounded.

3 The discovery problem
In this section, the problem of WFN discovery in the context of workflow management systems is formulated; then, the proposed method is outlined.
3.1 Problem statement
Definition 7 Let T = {t1, t2, ..., tn} be a finite set of tasks; a workflow log λ is a multiset of task traces σi ∈ T*, |σi| < ∞. Given a workflow log λ = {σ1, σ2, ..., σm}, the PN discovery problem consists of building a sound WFN, using only transitions in T, that reproduces the observed log. The number of places is unknown.
Example 1 Consider the log λ = {σ1, σ2, σ3, σ4, σ5, σ6, σ7}, composed of the following task traces resulting from the execution of some process.
σ1 = t1 t6 t3 t4 t7; σ2 = t1 t3 t6 t4 t5 t3 t4 t7; σ3 = t2 t3 t4 t5 t3 t4 t8; σ4 = t2 t3 t4 t8; σ5 = t1 t3 t4 t5 t6 t3 t4 t7; σ6 = t1 t3 t4 t5 t3 t4 t6 t7; σ7 = t1 t3 t4 t9.
A suitable discovery technique should be able to build a model such as the one depicted in Fig. 1 from these traces.
3.2 Assumptions
It is assumed that the event log is complete and generated by a process that behaves as an unknown sound WFN, which has no duplicate task labels or silent transitions. The soundness requirement implies that the process behaviour captured in the event log corresponds to a "well-behaved" process, which does not exhibit deadlocks or anomalies such as buffer overflows.
Fig. 1 Sound workflow net
3.3 Outline of the method
The proposed discovery method obtains a model of an unknown or ill-known process; the model reproduces the observed behaviour and exhibits the causality and concurrency relationships between the tasks. The method builds a 1-bounded PN (a WFN including a transition te) in which all the task traces σi of the log λ can be fired. It focuses on computing the causal and concurrent relations between the tasks. This is accomplished by computing from λ the t-invariants Y, which must exist in a strongly connected PN exhibiting iterative behaviour; the t-invariants are also used to find causal relations between tasks that do not appear consecutively (called implicit causal relations). In the first stage, the method determines several binary relations between transitions from λ; based on these relations, the t-invariants are discovered. Then, the causal and concurrent relations are determined and, together with the computed t-invariants, a first structure N of a WFN is built. Finally, the t-invariants are used again to adjust the language of N by determining implicit causality between tasks.
4 Basic concepts and relations
First, several relations obtained directly from λ are introduced. Some definitions have been taken and adapted from [29, 31].
4.1 Structuring the observed behaviour
Definition 8 (Event precedence relation) The precedence relationship between transitions that are observed consecutively is stated by the relation R< ⊆ T × T, defined as R< = {(ta, tb) | ∃σi ∈ λ, σi(j) = ta and σi(j + 1) = tb, 1 ≤ j < |σi|}, where σi(j) denotes the symbol in position j of the trace σi. Thus, ta R< tb (also denoted as ta < tb) expresses that ta has been observed immediately before tb in at least one trace σi. When ta is related by R< to more than one task, this is denoted as ta < t1, t2, ..., tn. The relationship between transitions that never occur consecutively in the traces of λ is given by (T × T) \ R<; a pair in this relation is denoted as ta >< tb.
4.1.1 Causal and concurrency relationships
The aim of existing PN discovery methods is to determine, from the observed precedence between transitions, the actual causal or concurrency relationships between them, which are useful to build a PN structure. Below, notions and properties for determining causality and concurrency relationships are stated.
Definition 9 (Two-length cycles) Two transitions ta, tb are in a two-length cycle relation (Tc) if the task traces in the log λ contain the sub-sequences ta tb ta or tb ta tb. Tc is the set of transition pairs (ta, tb) fulfilling this condition. Simple substructures of the PN can be determined straightforwardly from Tc.
Definition 10 (Causal and concurrent relations) Every pair of consecutive transitions (ta, tb) ∈ R< can be classified into one of the following relationships:
• Causal relationship, denoted as [ta, tb], expresses that the occurrence of ta enables tb; in a PN structure, this implies that there must be at least one place from ta to tb.
The set of transition pairs in a causal relation in λ, named CausalR, is defined as CausalR = {(ta, tb) | (ta < tb ∧ ¬(ta || tb)) ∨ (ta, tb) ∈ Tc}.
• Concurrent relationship, denoted as ta || tb, means that when both ta and tb are simultaneously enabled, if ta fires first, tb is not disabled, and vice versa; in a PN structure, ta || tb implies that there are no places connecting ta to tb or vice versa. ta || tb is determined if (ta, tb), (tb, ta) ∈ R<, i.e., ta and tb have been observed consecutively in both orders in the task traces of λ, and ta, tb do not form a Tc. Then, the set of concurrent transition pairs derived from λ is ConcR = {(ta, tb) | ta < tb ∧ tb < ta ∧ (ta, tb) ∉ Tc}.
Example 2 From the task traces of Example 1, the following relations among tasks are found: (t1 < t3, t6), (t2 < t3), (t3 < t4, t6), (t4 < t5, t6, t7, t8, t9), (t5 < t3, t6), (t6 < t7, t3, t4, t5); Tc = ∅; ConcR = {(t3, t6), (t4, t6), (t5, t6)}; and CausalR = {(t1, t3), (t1, t6), (t2, t3), (t3, t4), (t4, t5), (t4, t8), (t4, t7), (t4, t9), (t5, t3), (t6, t7)}.

5 Discovering t-invariants
During the execution of workflow processes, tasks occur sequentially within cases; each case is captured as a task trace. If, in every case, each task appears only once, no iterative behaviour is captured in the traces; then the alphabet of every trace is the support of a t-invariant. However, processes often include repetitive subprocesses, such as those modelled by the WFN in Fig. 1; for such processes, extracting the minimal supports of the t-invariants is not trivial. This section describes a novel algorithm to derive the minimal supports of t-invariants from an event log that includes traces involving repetitive behaviour. Throughout the presentation, we refer to the t-invariants of the extended WFN. The t-invariants computed from λ are those that the structure of the WFN to be built must fulfil.
Thus, the method presented herein determines the t-invariants of the unknown WFN that generates the log. Several notions used for defining the t-invariant computation technique are introduced below. A trace σ that contains transitions tj such that #(tj, σ) > 1, where #(tj, σ) is the number of occurrences of tj in σ, includes sub-sequences representing repetitive behaviour.
Fig. 2 Gr graph of the cycs in Example 1
Definition 11 (Cyclic sub-sequences) We call cyc a sub-sequence of σ starting with an occurrence of a task tj and ending just before the next occurrence of tj in σ. If ∀tk ∈ cyc, #(tk, cyc) = 1, then it is called an elementary cyc, denoted as cyce. A cyc may contain other cycs.
Example 3 Several traces of Example 1 contain a cyc; this is the case of σ2 = t1 t3 t6 t4 t5 t3 t4 t7, where cyc = t3 t6 t4 t5; furthermore, it is elementary (cyce) because ∀tj ∈ cyc, #(tj, cyc) = 1. In contrast, σ1 and σ4 do not include any cyc.
Proposition 1 The tasks in a trace σx ∈ λ, together with te, form the support of a t-invariant of the extended WFN. This t-invariant is minimal iff ∀tj ∈ σx, #(tj, σx) = 1 and there is no σy ∈ λ whose set of tasks is strictly contained in that of σx.
Proof (Direct) As stated in Definition 5 (2), the extended WFN is strongly connected. Since the first and last tasks of σx belong to i• and •o, respectively, the transitions in σx, together with te, can be fired repeatedly. When #(tj, σx) = 1 for every tj ∈ σx, σx does not contain cycs; therefore, the transitions in σx are the support of a minimal t-invariant. In Example 1, it is easy to see that the tasks in σ1 and σ4 (together with te of the extended WFN) are the supports of minimal t-invariants. □
Proposition 2 A trace σi ∈ λ includes cycs if it contains tasks belonging to two or more t-invariants.
Proof (Contraposition) Suppose that all the tasks in σi belong to a single t-invariant. Then, all the tasks tj in σi occur once, i.e.,
#(tj, σi) = 1; thus, there is no iterative behaviour, that is, σi does not contain nested cycles (cycs). □
The algorithm prunes interleaved tasks in traces in order to separate them according to the supports of minimal t-invariants. The procedure for determining the t-invariants recursively processes every trace σi, from the outermost cyc in σi down to shorter nested cycs.
Definition 12 (Causality graph) The causality graph Gr of an elementary cyc describes the relations between the tasks in cyce: Gr(cyce) = (V, E), where V = {tk | tk ∈ cyce} and E = {(tk, tl) ∈ V × V | (tk, tl) ∈ CausalR}. The Gr that can be formed with the cyc of Example 3 and the CausalR set of Example 2 is shown in Fig. 2.
Definition 13 (Strongly connected subgraphs) The function Scc(Gr) returns the set of strongly connected components $\{G^{sc}_1(V_1, E_1), G^{sc}_2(V_2, E_2), \ldots, G^{sc}_n(V_n, E_n)\}$ of Gr.
Proposition 3 Let $G^{sc}_i(V_i, E_i)$ be a strongly connected component of a Gr; then, if |Vi| > 1, the transitions in Vi form the support of a minimal t-invariant of the WFN to be discovered.
Proof (Contraposition) Suppose that the transitions in Vi are not the support of a t-invariant; then, there exists at least one tk ∉ Vi that must occur to allow the repetitive firing of the transitions in Gr. Thus, there are no cycles in Gr, and hence it is not strongly connected. □
Next, several simple operators for handling task traces are outlined.
• Sym. The operator sym(σ) returns the set of different transitions (the alphabet) used in a sequence σ.
• Pos. The operator pos(t, σ) returns the set of positions where the transition t appears in a trace σ.
• Clear. The operator clear(s, A), where s ∈ T* is a sequence and A ∈ 2^T is a set of tasks, returns the sequence obtained by deleting from s the occurrences of every ti ∈ A; if sym(s) ∩ A = ∅, then clear(s, A) = s; if sym(s) = A, then clear(s, A) = ε (the empty sequence).
• Replace.
The operator replace(r, s, t), where r, s and t are sequences (|r| ≥ |s| ≥ |t| ≥ 0), returns the sequence obtained by replacing the first occurrence of s in r by t; replace(r, s, t) returns r if s is not a sub-sequence of r, and returns t if r = s; when t = ε, the first occurrence of the substring s is deleted from r.
Example 4 Consider σ2 = t1 t3 t6 t4 t5 t3 t4 t7, cyce = t3 t6 t4 t5, and τ = t3 t4 t5. Applying the above operators yields sym(τ) = {t3, t4, t5}; pos(t3, σ2) = {2, 6}; clear(cyce, sym(τ)) = t6; and replace(σ2, cyce, t6) = t1 t6 t3 t4 t7.
Below, a procedure for extracting elementary cycles from traces is presented (Algorithm 1). It explores nested cycs and returns an elementary cyc (cyce) if one exists; otherwise, it returns the empty set.
Algorithm 1 e-cycle(σ)
Require: a non-empty trace σ
Ensure: cyce: a set of elementary cycs
1: cyce ⇐ ∅
2: if |σ| = |sym(σ)| then ⊲ there are no cycles
3:   return cyce
4: end if
5: ∀tx ∈ σ | #(tx, σ) > 1 ⊲ analyses the repeated tasks
6:   aux ⇐ pos(tx, σ)
7:   i ⇐ first item in aux
8:   j ⇐ second item in aux
9:   cyc ⇐ sub-sequence of σ from i to j − 1
10:  if |cyc| = |sym(cyc)| then
11:    cyce ⇐ cyce ∪ {cyc}
12:  else
13:    return e-cycle(cyc) ⊲ the sub-sequence is analysed
14: return cyce
In line 5 of Algorithm 1, every task tx that appears more than once in σ is analysed. The sub-sequence between its first and second occurrences is analysed to verify whether it is an elementary cycle; if not, the sub-sequence is analysed recursively to extract the inner cycle. Algorithm 2 presents the procedure for obtaining the minimal t-invariant supports. It is assumed that each trace in the event log ends with the task te.
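Algorithm 1, together with the procedure of Algorithm 2 described next, can be sketched in Python. This is a hedged illustration rather than the authors' implementation, under these assumptions: tasks are strings, each trace already ends with te, CausalR is passed in (here the set listed in Example 2), e_cycle returns the first elementary cyc it meets instead of a set, and strongly connected components are computed by plain mutual reachability. The names e_cycle, scc and t_invariant_supports are hypothetical. The guard that leaves the repeat loop when clear(cyce, A) removes nothing is also an addition of this sketch; without it, a cyc whose Gr has no cycle would be processed forever.

```python
from typing import FrozenSet, List, Optional, Sequence, Set, Tuple


def e_cycle(sigma: Sequence[str]) -> Optional[Tuple[str, ...]]:
    """Algorithm 1 (sketch): return the first elementary cyc of sigma, or None."""
    if len(sigma) == len(set(sigma)):
        return None                                  # no repeated task: no cycles
    for tx in dict.fromkeys(sigma):                  # tasks in order of first appearance
        pos = [k for k, t in enumerate(sigma) if t == tx]
        if len(pos) < 2:
            continue
        cyc = tuple(sigma[pos[0]:pos[1]])            # first occurrence up to before the second
        if len(cyc) == len(set(cyc)):
            return cyc                               # elementary cyc found
        return e_cycle(cyc)                          # analyse the inner sub-sequence
    return None


def scc(nodes: Set[str], edges: Set[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Strongly connected components by mutual reachability (fine for small Gr)."""
    def reach(s: str) -> Set[str]:
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for a, b in edges:
                if a == u and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen
    r = {v: reach(v) for v in nodes}
    return {frozenset(w for w in nodes if w in r[v] and v in r[w]) for v in nodes}


def t_invariant_supports(log: List[List[str]], causal: Set[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Algorithm 2 (sketch): minimal t-invariant supports Y(lambda)."""
    Y: Set[FrozenSet[str]] = set()
    A: Set[str] = set()                              # tasks of the nested supports found so far
    for trace in log:
        sigma = list(trace)
        cyc = e_cycle(sigma)
        while cyc is not None:                       # repeat ... until no cyc is left
            V = set(cyc)
            E = {(a, b) for a, b in causal if a in V and b in V}   # Gr(cyc_e)
            for comp in scc(V, E):
                if len(comp) > 1:                    # Proposition 3
                    Y.add(comp)
                    A |= comp
            kept = [t for t in cyc if t not in A]    # clear(cyc_e, A)
            if len(kept) == len(cyc):
                break                                # no progress: guard added in this sketch
            k = next(i for i in range(len(sigma)) if tuple(sigma[i:i + len(cyc)]) == cyc)
            sigma = sigma[:k] + kept + sigma[k + len(cyc):]        # replace(sigma, cyc_e, kept)
            cyc = e_cycle(sigma)
        Y.add(frozenset(sigma))                      # sym of the reduced trace
    return Y


# Example 1 traces, each closed with te; CausalR as listed in Example 2.
LOG = [(s + " te").split() for s in [
    "t1 t6 t3 t4 t7", "t1 t3 t6 t4 t5 t3 t4 t7", "t2 t3 t4 t5 t3 t4 t8", "t2 t3 t4 t8",
    "t1 t3 t4 t5 t6 t3 t4 t7", "t1 t3 t4 t5 t3 t4 t6 t7", "t1 t3 t4 t9"]]
CAUSAL = {("t1", "t3"), ("t1", "t6"), ("t2", "t3"), ("t3", "t4"), ("t4", "t5"),
          ("t4", "t8"), ("t4", "t7"), ("t4", "t9"), ("t5", "t3"), ("t6", "t7")}
Y = t_invariant_supports(LOG, CAUSAL)
```

On the traces of Example 1, the sketch returns the four supports <Y1> to <Y4> reported for this example; for σ2, for instance, the cyc t3 t6 t4 t5 yields the nested support {t3, t4, t5} and reduces the trace to t1 t6 t3 t4 t7 te.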
Each trace in λ is analysed (line 2 of Algorithm 2); if a trace does not have repeated tasks, its symbols are added as a t-invariant support; otherwise, the elementary cycles are extracted from σi to obtain the corresponding graphs; consequently, the supports of nested t-invariants are found.
Algorithm 2 Getting minimal t-invariant supports
Require: λ
1: Y(λ) ⇐ ∅; A ⇐ ∅
2: ∀σi ∈ λ
3:   if |σi| = |sym(σi)| ∧ sym(σi) ∉ Y(λ) then ⊲ minimal t-inv
4:     if sym(σi) ⊄ Y(λ) then
5:       Y(λ) ⇐ Y(λ) ∪ {sym(σi)}
6:   else ⊲ σi contains iterations
7:     repeat
8:       cyce ⇐ e-cycle(σi)
9:       Graphs ⇐ Scc(Gr(cyce))
10:      ∀Gi ∈ Graphs ⊲ extracting nested t-inv
11:        if |Vi| > 1 ∧ Vi ∉ Y(λ) then
12:          Y(λ) ⇐ Y(λ) ∪ {Vi}
13:          A ⇐ A ∪ Vi
14:      σi ⇐ replace(σi, cyce, clear(cyce, A))
15:      cyce ⇐ e-cycle(σi)
16:    until cyce = ∅
17:    if sym(σi) ∉ Y(λ) then Y(λ) ⇐ Y(λ) ∪ {sym(σi)} ⊲ minimal supports
18: return Y(λ)
Property 1 Algorithm 2 determines all the t-invariant supports of the extended WFN to be built from λ.
Proof It is easy to observe that, in the repeat loop, the procedure extracts the evident cycles including te as well as the nested iterations of the traces. □
The supports of the t-invariants obtained by applying the above algorithm to the task traces of Example 1 are <Y1> = {t1, t3, t4, t6, t7, te}, <Y2> = {t2, t3, t4, t8, te}, <Y3> = {t3, t4, t5} and <Y4> = {t1, t3, t4, t9, te}. Notice that te is included in <Y1>, <Y2> and <Y4>, since these invariants involve transitions in i• and •o, whilst te is not included in <Y3>, because it is the support of a nested t-invariant.

6 Building the PN model
Causal relations [ti, tj] imply the existence of a place between the related transitions. For the source and sink places (i, o), causal relations are denoted as [−, tj] and [ti, −], respectively.
Using this basic structure, named dependency, together with the computed t-invariants, a technique for building a PN is now presented.
Definition 14 (First and last tasks) TI = {first(σk) | σk ∈ λ} and TO = {last(σk) | σk ∈ λ}, where first(σk) and last(σk) return the first and last tasks of σk, respectively.
6.1 Composing substructures of dependencies
The substructures corresponding to causal dependencies must be composed by merging all the transitions that have the same label ti into a single one. Merging transitions may also lead to merging the places of the involved dependencies; the merging strategy is simple and is performed using two construction operators [31].
Fig. 3 Operators for merging dependencies
Operator 1. The composition of two dependencies of the form [ti, tj] and [tj, tk] yields a sequential substructure including two places, which allows firing the sequence ti tj tk; this is illustrated in Fig. 3a.
Operator 2. The composition of two dependencies whose first transitions are the same ([ti, tj] and [ti, tk]) yields one of two possible substructures:
• (a) The places of the two dependencies are merged into a single one iff tj and tk belong to different t-invariants. This substructure is called Or-split and is denoted as [ti, tj + tk].
• (b) The places of the dependencies are not merged iff tj and tk belong to the same t-invariant. This substructure is called And-split and is denoted as [ti, tj || tk].
Similarly, for dependencies having the same second transition ([ti, tk] and [tj, tk]), the resulting substructure is either [ti + tj, tk] (Or-join) or [ti || tj, tk] (And-join). In both cases, the observations (ti, tk), (tj, tk) ∈ R<, which induced the dependencies, are preserved. This merging operator is illustrated in Fig. 3b.
In general, a set of dependencies of the form [ti, tj], [ti, tk], ..., [ti, tr] may produce either [ti, tj + tk + ... + tr] or [ti, tj || tk || ... || tr], according to whether tj, tk, ..., tr belong to different t-invariants or to the same t-invariant, respectively. The merging of transitions can be applied iteratively to composed dependencies that exactly match an expression of transitions of type ti + tj or ti || tj. For example, the composition of the dependencies [ti + tj, tk] and [ti + tj, tr] produces [ti + tj, tk + tr] if tk and tr do not belong to the same t-invariant. All tj in dependencies of the form [−, tj] have the input place i. Similarly, all ti in dependencies of the form [ti, −] share the output place o.
Property 2 Applying the merging operators Operator 1 and Operator 2 to the dependencies derived from the pairs in CausalR, using the knowledge of the t-invariant supports, leads to a net structure, the WFN N1, which includes all the transitions.
Proof Operator 1 forms paths of places and transitions, whilst Operator 2 determines when split and join substructures are created according to the computed t-invariants. □
Fig. 4 N1 built from λ
In Example 2, applying the merging operators to the relations in CausalR yields the set of composed dependencies [t1, t6 || t3], [t2, t3], [t3, t4], [t4, t5 + t7 + t8 + t9], [t5, t3], [t6, t7], [t1 + t2 + t5, t3]. The dependencies obtained by further applying Operator 1 and Operator 2 are: i: [−, t ∈ TI]; p1: [t1 + t2 + t5, t3]; p1 − p2: [t1, t3 || t6]; p3: [t3, t4]; p5: [t4, t5 + t7 + t8 + t9]; p4 − p5: [t6 || t4, t7]; o: [t ∈ TO, −]. The subsequent merging of transitions in the dependency substructures yields the WFN N1 shown in Fig. 4.
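The Or/And decision of Operator 2 only needs the t-invariant supports. The following Python sketch assumes that "belonging to the same t-invariant" means that one support contains all the transitions involved, and that "different t-invariants" means that no support contains two of them; the function name branch_kind and the "MIXED" outcome for partial overlaps are hypothetical additions, since the operators as stated do not cover that case.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def branch_kind(ts: Iterable[str], supports: Set[FrozenSet[str]]) -> str:
    """Classify the transitions at the shared end of several dependencies (Operator 2).

    Assumed reading: AND if one support holds them all, OR if no support holds two of them.
    """
    items = list(ts)
    if any(all(t in s for t in items) for s in supports):
        return "AND"    # same t-invariant: keep one place per dependency
    if not any(a in s and b in s for a, b in combinations(items, 2) for s in supports):
        return "OR"     # pairwise different t-invariants: merge the places into one
    return "MIXED"      # partial overlap: outside Operator 2 as stated


# Supports <Y1>..<Y4> of Example 1.
SUPPORTS = {frozenset(s.split()) for s in
            ["t1 t3 t4 t6 t7 te", "t2 t3 t4 t8 te", "t3 t4 t5", "t1 t3 t4 t9 te"]}
```

With these supports, the successors t3, t6 of t1 and the predecessors t6, t4 of t7 are classified as And, whereas the successors t5, t7, t8, t9 of t4 and the predecessors t1, t2, t5 of t3 are classified as Or, in agreement with [t1, t3 || t6], [t6 || t4, t7], [t4, t5 + t7 + t8 + t9] and [t1 + t2 + t5, t3].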
6.2 Model adjustment
The discovered model N1 is expected to replay all the traces in λ; however, it may also execute additional traces (surplus language), and in some cases N1 may be unable to replay some traces in λ. The WFN in Fig. 4 reproduces λ of Example 1, but also other traces; in particular, the traces t2 t3 t4 t9 and t1 t3 t4 t8, which do not belong to λ, can be fired in N1. This happens because N1 lacks the PN elements (places and arcs) that enforce dependencies not exhibited explicitly by the traces in λ, named implicit dependencies; therefore, N1 must be adjusted.
6.2.1 Implicit dependencies
In a PN, implicit dependencies represent the memory of the occurrence of a transition ti, which is used as a precondition to enable a non-immediate subsequent transition tj. In general, an implicit dependency [ti, tj] represents a constraint on the flow of tokens in the PN, ensuring that tj can be fired only when ti has occurred before; thus, the absence of such an implicit dependency allows more sequences to occur in the net.
Definition 15 (Implicit dependency) In a 1-bounded PN, [ti, tj] is called an implicit dependency if, although there exists a place between the two transitions, the occurrence of ti does not produce a marking that immediately enables tj; i.e., at least one other transition tk must occur before tj.
After building the first model, implicit dependencies can be deduced and included in N1 in two ways: Type 1, by adding a new place between two transitions, or Type 2, by using a place already included in N1. These situations are illustrated in Fig. 5, where the dependency [tx, tw] is represented by a new place pi (Fig. 5a), and [tx, ty] is represented using a previously computed place pj (Fig. 5b). Similarly, for the dependency [tx, tz], the place of [ty, tz] is used (Fig. 5c).
Fig. 5 N1 Implicit dependencies
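The surplus traces mentioned above can be checked with a token-game replay. The Python sketch below is an assumption-laden illustration: the place names p1 to p5 follow the composed dependencies listed in Sect. 6.1, but the arcs of N1 are our reading of that list rather than a copy of Fig. 4, and the net encoding and the function replay are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Net = Dict[str, Tuple[Set[str], Set[str]]]  # transition -> (input places, output places)

# Hypothetical reading of N1 (Fig. 4) from the composed dependencies of Sect. 6.1.
N1: Net = {
    "t1": ({"i"}, {"p1", "p2"}), "t2": ({"i"}, {"p1"}),
    "t3": ({"p1"}, {"p3"}), "t4": ({"p3"}, {"p5"}),
    "t5": ({"p5"}, {"p1"}), "t6": ({"p2"}, {"p4"}),
    "t7": ({"p4", "p5"}, {"o"}), "t8": ({"p5"}, {"o"}), "t9": ({"p5"}, {"o"}),
}


def replay(net: Net, trace: List[str]) -> Optional[Counter]:
    """Token game from M0 = [i]: final marking, or None if some task is not enabled."""
    m: Counter = Counter({"i": 1})
    for t in trace:
        pre, post = net[t]
        if any(m[p] < 1 for p in pre):
            return None
        m.subtract(pre)   # consume one token from each input place
        m.update(post)    # produce one token in each output place
    return +m             # keep only the marked places


# The seven traces of Example 1 (without te).
LOG = [s.split() for s in [
    "t1 t6 t3 t4 t7", "t1 t3 t6 t4 t5 t3 t4 t7", "t2 t3 t4 t5 t3 t4 t8", "t2 t3 t4 t8",
    "t1 t3 t4 t5 t6 t3 t4 t7", "t1 t3 t4 t5 t3 t4 t6 t7", "t1 t3 t4 t9"]]
```

With this encoding, all seven traces of λ fire from [i], and so do t2 t3 t4 t9 and t1 t3 t4 t8; the latter ends with a token left in p2.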
The following notions and conditions are useful to find, in the traces of λ, both kinds of implicit dependencies to be added to N1.
Definition 16 (Implicit precedence) Let ti, tj ∈ T be tasks. ti has an implicit precedence over tj, denoted as ti ≪ tj, if ti >< tj and, in every trace σk ∈ λ, ti always appears before tj.
An implicit precedence between two transitions suggests an implicit dependency, but other underlying properties must be analysed to ensure the existence of such a dependency.
Definition 17 (Support-dependent tasks) The set of support-dependent tasks of a Yi ∈ Y(λ), denoted as Sd(Yi), contains the tasks tx ∈ T that appear only in the support of Yi: Sd(Yi) = {tx ∈ <Yi> | ∄Yj ∈ Y(λ), j ≠ i, tx ∈ <Yj>}. For the t-invariant supports of Example 1 (<Y1> = {t1, t3, t4, t6, t7, te}, <Y2> = {t2, t3, t4, t8, te}, <Y3> = {t3, t4, t5}, <Y4> = {t1, t3, t4, t9, te}), the support-dependent sets are Sd(Y1) = {t6, t7}, Sd(Y2) = {t2, t8}, Sd(Y3) = {t5} and Sd(Y4) = {t9}.
6.2.2 Implicit dependencies of Type 1
We can now state the conditions under which a place must be added to relate two transitions that are not observed consecutively.
Proposition 4 Let ti and tj be transitions in N1. If (i) ti and tj are related by an implicit precedence (ti ≪ tj), and (ii) there exists a support-dependent set Sd(Yk) that contains both transitions, then ti and tj are related by an implicit dependency [ti, tj], which must be added to the structure of N1. The set of all the implicit dependencies of N1 is IDep = {[ti, tj] | (ti ≪ tj) ∧ ∃Yk such that {ti, tj} ⊆ Sd(Yk)}.
Proof (Contraposition) Suppose that the dependency [ti, tj] must not be added to the structure of N1; this is because either
• (i) a place of the dependency [ti, tj] already exists as the result of applying Operator 1 or Operator 2; therefore, these transitions are not related by an implicit precedence, i.e.,
$\neg(t_i \ll t_j)$, or
• (ii) $t_i$ does not always need to occur before $t_j$; then both transitions may fire independently, since they belong to different t-invariants; thus, there is no support-dependent set that contains both transitions. □

Corollary 1 Let $[t_i, t_j]$ be an implicit dependency where $t_i, t_j \in \langle Y_r \rangle$ and $Y_r \in Y(\lambda)$. If $CY_r = 0$, then a new place $p_k \notin P_2$ must be added to $N_2$ to ensure $[t_i, t_j]$.

Proof (Contraposition) Suppose that $p_k \in P_2$; since it is linked to either $t_i$ or $t_j$, then $CY_r \neq 0$ (this is the case of dependencies of Type 2). □

Fig. 6 $N_2$ built by adding $p_6 : [t_2, t_8]$ to $N_1$

The conditions of Proposition 4 determine the existence of places that do not represent causal relationships. This is valuable because implicit dependencies are not exhibited in $\lambda$; the absence of such places would make the PN accept an exceeding language. In Example 1, the transitions $t_2$ and $t_8$ meet the conditions of the proposition because $t_2 \ll t_8$ and $\{t_2, t_8\} \subseteq Sd(Y_2)$; therefore, $[t_2, t_8]$ must be added to $N_1$, yielding the model $N_2$ shown in Fig. 6.

6.2.3 Implicit dependencies of Type 2

Now, the supports of the minimal t-invariants of $N_2$ in Fig. 6 are $J(N_2)$: $\langle J_1 \rangle = \{t_1, t_3, t_4, t_6, t_7, t_e\}$, $\langle J_2 \rangle = \{t_2, t_3, t_4, t_8, t_e\}$, $\langle J_3 \rangle = \{t_3, t_4, t_5\}$; these invariants differ from $Y(\lambda)$ computed in Subsection 6.1. The discrepancy between $Y(\lambda)$ and $J(N_2)$ arises because the computed PN does not include the arcs (implicit dependencies of Type 2, Fig. 5b, c) that enforce the behaviours due to implicit dependencies not exhibited in $\lambda$. $N_2$ must be adjusted by determining the suitable implicit dependencies that transform $N_2$ into $N_3$, whose t-invariants match $Y(\lambda)$. The mismatch is detected when $\exists Y_r \in Y(\lambda)$ such that $CY_r \neq 0$, where $C$ is the incidence matrix of $N_2$. To amend $N_2$, the following strategy is applied. Consider a $Y_r \in Y(\lambda)$ with $CY_r \neq 0$.
Let $p_k$ be the place that corresponds to a row in which $CY_r \neq 0$; more precisely, $C(p_k)Y_r \neq 0$. To determine the dependency $[t_i, t_j]$, another transition of $N_2$ must be linked through $p_k$ to one of the transitions in ${}^\bullet p_k$ (Fig. 5b) or $p_k^\bullet$ (Fig. 5c), following the construction procedure derived from the proof of the proposition stated below.

Proposition 5 Let $[t_i, t_j]$ be an implicit dependency where $t_i, t_j \in \langle Y_r \rangle$ and $Y_r \in Y(\lambda)$. If $C(p_k)Y_r \neq 0$, then $[t_i, t_j]$ must be added to $N_2$ through the place $p_k$ of $N_2$ to ensure $C(p_k)Y_r = 0$ ($[t_i, t_j]$ is of Type 2).

Proof (Direct) To ensure $C(p_k)Y_r = 0$, two cases are considered:
• (i) $C(p_k)Y_r = 1$. This requires $C(p_k, t_j) = -1$ to get $C(p_k)Y_r = 0$; thus $t_i \in {}^\bullet p_k$ (the arc $(p_k, t_j)$ must be added to get $[t_i, t_j]$).
• (ii) $C(p_k)Y_r = -1$. This requires $C(p_k, t_i) = 1$ to get $C(p_k)Y_r = 0$; thus $t_j \in p_k^\bullet$ (the arc $(t_i, p_k)$ must be added to get $[t_i, t_j]$).
Since $t_j \in Sd(Y_r)$, the added arc $(p_k, t_j)$ only affects $Y_r$. Similarly, since $t_i \in Sd(Y_r)$, the new arc $(t_i, p_k)$ does not alter the other t-invariants. □

Proposition 6 If all the implicit dependencies are added to $N_2$ through places $p_k$ such that $C(p_k)Y_r \neq 0$, for every $Y_r \in Y(\lambda)$ with $CY_r \neq 0$, then the amended net $N_3$ fulfils $CY = 0$.

Proof (Direct) When all the amendments to $N_2$ are performed through the procedure derived from the proof of Proposition 5, the amended net $N_3$ fulfils $CY(\lambda) = 0$, and then $Y(\lambda) = J(N_3)$. □

Algorithm 3 summarises the procedure derived from the previous results to obtain the implicit dependencies.
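Algorithm 3 receives $IDep$ as one of its inputs. The sketch below shows how Definition 17 and the Type 1 test of Proposition 4 could be evaluated on Example 1. It assumes the implicit precedences are already known (here only $t_2 \ll t_8$ and $t_1 \ll t_9$, the two precedences mentioned in this section) instead of deriving them from traces, so the $t_i >< t_j$ condition of Definition 16 is not checked; the function names are hypothetical.

```python
# Supports of the t-invariants of Example 1, as listed in Section 6.2.1.
SUPPORTS = {
    "Y1": {"t1", "t3", "t4", "t6", "t7", "te"},
    "Y2": {"t2", "t3", "t4", "t8", "te"},
    "Y3": {"t3", "t4", "t5"},
    "Y4": {"t1", "t3", "t4", "t9", "te"},
}


def support_dependent(supports):
    """Definition 17: the tasks of each support that occur in no other support."""
    sd = {}
    for name, support in supports.items():
        others = set().union(*(s for n, s in supports.items() if n != name))
        sd[name] = support - others
    return sd


def type1_dependencies(precedences, sd):
    """Proposition 4: keep pairs (t_i, t_j), t_i << t_j, contained in some Sd(Y_k)."""
    return {(ti, tj) for ti, tj in precedences
            if any({ti, tj} <= s for s in sd.values())}


sd = support_dependent(SUPPORTS)
# Assumed input: the implicit precedences mentioned in the text, not computed here.
precedences = {("t2", "t8"), ("t1", "t9")}
idep = type1_dependencies(precedences, sd)
print(idep)  # {('t2', 't8')}
```

The pair $(t_1, t_9)$ is rejected because $t_1$ also belongs to $\langle Y_1 \rangle$ and therefore is not in $Sd(Y_4) = \{t_9\}$; this agrees with the text, where $[t_1, t_9]$ is handled as a Type 2 dependency.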
Algorithm 3 Determining implicit dependencies
Require: $N_2 = (P_2, T_2, F_2)$, $IDep$, $Y(\lambda)$
1: $P_3 \Leftarrow P_2$; $T_3 \Leftarrow T_2$; $F_3 \Leftarrow F_2$
2: for all $[t_i, t_j] \in IDep$ do
3:   if $C(N_2)Y(\lambda) \neq 0$ then
4:     for all $Y_i \in Y(\lambda)$ such that $C(N_2)Y_i \neq 0$ do
5:       for all $p_r \in P_2$ do
6:         if $C(p_r)Y_i = 1$ then $F_3 \Leftarrow F_3 \cup \{(p_r, t_j)\}$   ▷ Type 2 dependency
7:         if $C(p_r)Y_i = -1$ then $F_3 \Leftarrow F_3 \cup \{(t_i, p_r)\}$   ▷ Type 2 dependency
8:   else
9:     create a new place $p_k$ with $k > |P_3|$   ▷ Type 1 dependency: $p_k \notin P_3$
10:    $P_3 \Leftarrow P_3 \cup \{p_k\}$
11:    $F_3 \Leftarrow F_3 \cup \{(t_i, p_k), (p_k, t_j)\}$
12: return $N_3 = (P_3, T_3, F_3)$

Consider $N_2$ in Fig. 6, obtained from the event log in Example 1. First, $J(N_2)$ is computed: $\langle J_1 \rangle = \{t_1, t_3, t_4, t_6, t_7, t_e\}$, $\langle J_2 \rangle = \{t_2, t_3, t_4, t_8, t_e\}$, $\langle J_3 \rangle = \{t_3, t_4, t_5\}$. There is a mismatch between both sets, since $Y(\lambda) \not\subseteq J(N_2)$: $Y_4 \notin J(N_2)$, whilst $Y_1 = J_1$, $Y_2 = J_2$ and $Y_3 = J_3$. In the analysis of $Y_4$, $p_k = p_2$ because it fulfils the condition $C_{N_2}(p_2)Y_4 \neq 0$, as shown in Eq. (1), where the columns are ordered $t_1, \dots, t_9, t_e$ and the only nonzero entry of the result is the row of $p_2$:

$$
C_{N_2}\,Y_4 =
\begin{bmatrix}
-1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\
1 & 1 & -1 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & -1 & 0 & -1 & -1 & -1 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & -1
\end{bmatrix}
\cdot
\begin{bmatrix}1\\0\\1\\1\\0\\0\\0\\0\\1\\1\end{bmatrix}
=
\begin{bmatrix}0\\0\\1\\0\\0\\0\\0\\0\end{bmatrix}
\tag{1}
$$

The support-dependent sets (Sd) computed in $Y_3$ ($\forall Y_i \in Y \mid Y_i \in J(N_2)$) are $Sd(t_1) = Sd(t_9) = \{t_1, t_9\}$. The transition $t_1 \in {}^\bullet p_2$ is selected to find the implicit dependency $[t_1, t_j]$ because $t_1 \in \langle Y_4 \rangle$. The transition that fulfils the condition $t_1 \ll t_j$ is $t_j = t_9$; therefore, the implicit dependency $[t_1, t_9]$ is added to $N_2$ through the arc $(p_2, t_9)$. Finally, the amended PN $N_3$, which replays $\lambda$, is shown in Fig. 7.

Remark 1 The procedure of Algorithm 3 does not need to compute the t-invariants of $N_2$.
It only operates on the computed invariants $Y_r \in Y(\lambda)$ that do not agree with the computed net $N_2$, i.e., those for which $CY_r \neq 0$.

Fig. 7 Resulting net after model adjustment

Property 3 Given an event log of task traces $\lambda \subseteq T^*$, a safe PN model $N$ ($N_3$) that reproduces $\lambda$ can be obtained by computing the invariants from $\lambda$, applying Operators 1 and 2, and performing the amendments of Algorithm 3.

Proof The causality between transitions, stated by the pairs in $CausalR \cup R'_{<}$, represents the precedence relationship between consecutive transitions in the traces of $\lambda$ that are not in $ConcR$. Then, the t-invariants can be determined from $\lambda$ (Property 1). Besides, the substructure associated with a dependency $[t_i, t_j]$ ensures the consecutive occurrence of these transitions; then, based on the t-invariants, applying Operator 1 and Operator 2 to all the dependencies $[t_i, t_j]$ leads to a PN structure that ensures the flow determined by the dependencies (Property 2). Finally, the adjustments to $N_1$ provided by Algorithm 3 make the t-invariants determined from $\lambda$ match those of the discovered model. □

6.3 Complexity of the method

The method is based on the notions introduced in Section 4, whose determination procedures have a computational complexity of $O(|\lambda|)$. The complexity of computing the t-invariants $Y(\lambda)$ is $O((|V| + |E|) \cdot \|\lambda\|)$, i.e., the time for determining the strongly connected components of a graph with $|V|$ nodes and $|E|$ edges, multiplied by the size of the log. This bound corresponds to the worst case, in which each trace has a different e-cycle. Finally, the complexity of the procedure to compute each implicit dependency is $O(|P| \cdot |T|)$, the cost of the matrix-vector product $CY_r$. Thus, the complexity of the algorithm is polynomial in $|\lambda|$.
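The $O(|P| \cdot |T|)$ term is the cost of evaluating $CY_r$ row by row. The sketch below replays the Type 2 step of Algorithm 3 on the worked example: it uses the incidence matrix of Eq. (1) with columns assumed to be ordered $t_1, \dots, t_9, t_e$, finds the rows with $C(p)Y_r \neq 0$, and applies the two cases of Proposition 5 for a given dependency. The row labels other than $p_2$ and $p_6$, the dependency input and the function names are illustrative assumptions, not the authors' implementation.

```python
# Incidence matrix of N2 as read from Eq. (1); columns t1..t9, te.
# Row labels are hypothetical except p2 and p6, which the text names.
T = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "te"]
P = ["i", "p1", "p2", "p3", "p4", "p5", "p6", "o"]
C = [
    [-1, -1,  0,  0,  0,  0,  0,  0,  0,  1],
    [ 1,  1, -1,  0,  1,  0,  0,  0,  0,  0],
    [ 1,  0,  0,  0,  0, -1,  0,  0,  0,  0],
    [ 0,  0,  1, -1,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  1, -1,  0,  0,  0],
    [ 0,  0,  0,  1, -1,  0, -1, -1, -1,  0],
    [ 0,  1,  0,  0,  0,  0,  0, -1,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  1,  1,  1, -1],
]
# Y4 of Example 1 as a 0/1 vector over T (support {t1, t3, t4, t9, te}).
Y4 = [1 if t in {"t1", "t3", "t4", "t9", "te"} else 0 for t in T]


def residual(c, y):
    """C·y computed row by row: O(|P|·|T|) multiplications."""
    return [sum(cij * yj for cij, yj in zip(row, y)) for row in c]


def amend_type2(c, y, ti, tj):
    """Apply the two cases of Proposition 5 to every row with C(p)y != 0.

    Returns a new matrix and the list of added arcs; residuals are assumed
    to be +1 or -1, and the pre/post-set conditions are not re-checked.
    """
    c = [row[:] for row in c]  # leave the caller's matrix untouched
    i, j = T.index(ti), T.index(tj)
    arcs = []
    for r, value in enumerate(residual(c, y)):
        if value == 1:       # case (i): t_j must consume from p
            c[r][j] -= 1
            arcs.append((P[r], tj))
        elif value == -1:    # case (ii): t_i must produce into p
            c[r][i] += 1
            arcs.append((ti, P[r]))
    return c, arcs


print(residual(C, Y4))   # [0, 0, 1, 0, 0, 0, 0, 0]
C3, added = amend_type2(C, Y4, "t1", "t9")
print(added)             # [('p2', 't9')]
print(residual(C3, Y4))  # [0, 0, 0, 0, 0, 0, 0, 0]
```

The sketch recovers the arc $(p_2, t_9)$ of the worked example, and the amended matrix still annuls $Y_1$, $Y_2$ and $Y_3$ because $t_9$ does not belong to their supports.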
7 Implementation issues

7.1 Testing scheme

The algorithms and auxiliary procedures derived from the proposed discovery method have been implemented; the software has been tested on numerous WFNs of diverse structural complexity. The tests were performed on artificial logs following the scheme shown in Fig. 8. First, a WFN $N$ including a transition $t_e$ is proposed; then, a workflow log $\lambda$ is produced with the help of the PN editor/simulator PIPE [33]. The discovery module then processes $\lambda$, yielding a model coded in XML, which is displayed using PIPE again. The obtained model $N'$ is then compared to $N$. This scheme allows testing the method in a controlled manner by rediscovering WFNs with diverse structures, which include cycles nested into t-components, concurrency, and implicit dependencies.

Fig. 8 Test procedure of the discovery method

7.2 Illustrative experiments

Artificial logs were produced using PN models that include diverse substructures exhibiting complex repetitive behaviour: overlapped cycles (Fig. 9) and cycles in parallel threads (Fig. 10). The logs used in these experiments, named $\lambda_1$ and $\lambda_2$, are given below.

Fig. 9 Detecting overlapped cycles
Fig. 10 Detecting cycles in parallel threads

λ1 = {t0 t1 t2 t4 t5 t6 t2 t3 t1 t2 t4 t5 t7 t8 t2 t4 t5 t7}, {t0 t1 t2 t3 t1 t2 t4 t5 t6 t2 t4 t5 t7 t8 t2 t4 t5 t7 t9}

λ2 = {t0 t1 t4 t5 t6 t2 t3 t1 t2 t4 t5 t7}, {t0 t1 t4 t2 t5 t3 t6 t4 t1 t5 t2 t6 t3 t4 t5 t1 t2 t3 t1 t6 t2}, {t4 t3 t1 t2 t3 t5 t6 t1 t4 t5 t2 t7}, {t0 t4 t1 t5 t2 t7}, {t0 t1 t4 t2 t5 t7}

Other experiments have been performed using several WFNs reported in the literature.
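The text does not state how the rediscovered model $N'$ is compared to $N$. One simple possibility, sketched here purely as an assumption, is a structural comparison up to renaming of places: since both nets share the same transitions, and the nets considered are ordinary (unweighted arcs), they coincide when the multisets of place signatures (preset, postset) are equal. The function names and the tiny example nets are hypothetical.

```python
from collections import Counter


def signatures(net):
    """Multiset of (preset, postset) pairs, one entry per place.

    `net` maps a place name to (input transitions, output transitions);
    place names are discarded, so nets are compared up to place renaming.
    """
    return Counter((frozenset(pre), frozenset(post)) for pre, post in net.values())


def same_structure(n1, n2):
    """True when both nets have the same places up to renaming (ordinary arcs)."""
    return signatures(n1) == signatures(n2)


# Hypothetical nets: sequence t1 -> t2 -> t3 plus a place for [t1, t3].
original = {"p1": ({"t1"}, {"t2"}), "p2": ({"t2"}, {"t3"}), "p3": ({"t1"}, {"t3"})}
renamed = {"a": ({"t2"}, {"t3"}), "b": ({"t1"}, {"t3"}), "c": ({"t1"}, {"t2"})}
missing = {"a": ({"t2"}, {"t3"}), "c": ({"t1"}, {"t2"})}

print(same_structure(original, renamed))  # True
print(same_structure(original, missing))  # False: the [t1, t3] place is absent
```

A check of this kind would flag exactly the situation discussed in Section 8.2, where a rediscovered net misses an implicit-dependency place.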
Figure 11 shows five models obtained by applying the method to logs taken from [34], which present implicit dependencies.

Fig. 11 WFN discovered from logs in [21]

The dashed places and their respective input/output arcs correspond to implicit dependencies of Type 1, whereas the dashed arcs joining existing places correspond to implicit dependencies of Type 2. A complete log for Fig. 11a is ACD, BCFE, BFCE; from this log, the Type 1 implicit dependency [A, D] is detected. For the WFN shown in Fig. 11b, the corresponding complete log is ACD, BCE, AFCE, ACFE; in this case, the Type 2 implicit dependencies [A, D] and [B, E] are found. The corresponding log for the WFN in Fig. 11c is ACFBGE, AFCBGE, AFBCGE, AFBGCE, AFDGE; from this log, the method determined that [A, D] and [D, E] are implicit dependencies of Type 2. Processing the complete log ACDEGH, ACDGEH, ACGDEH, BCDFH yields the workflow net in Fig. 11d; for this log, the method first found the implicit dependencies [A, E], [A, G] and [B, F]; nevertheless, the net still does not match the t-invariants of the log; hence, a Type 2 implicit dependency [C, F] is added. Finally, the corresponding log of the WFN in Fig. 11e is FBG, ABC, FDBEG, FBDEG, FDEBG, ADEDEBG, ABDEC; in this case, the Type 1 implicit dependencies [A, C] and [F, G] are found, and the arcs ensuring the Type 2 implicit dependencies are $(A, p_k)$, $(F, p_k)$, $(p_k, C)$ and $(p_k, G)$. As in the procedure for obtaining the WFN of Fig. 11b, $p_k$ results from merging two places.

Fig. 12 Actual, observed, and computed behaviours

8 Discussion

8.1 Main features

The proposed discovery method includes strategies that are alternatives to those found in the literature, namely the search for invariants and the discovery of concurrent cyclic behaviours.
The discovered WF-net is a qualitative model that can reproduce the logs obtained from the execution of WF processes that behave as sound WF-nets, as specified in Sect. 3.1. This property, called fitness in [35], is guaranteed to take the value 1.0, since all the precedences declared by the pairs in $R_<$ (derived from $\lambda$) are represented in the discovered model, as stated in Property 3. Furthermore, the procedures that implement the method are based on algorithms that run in polynomial time on the size of the log, which is a welcome feature for dealing with large logs. Compared with a well-known published method, the alpha++ algorithm [32], our approach can discover the reported models with lower computational complexity.

8.2 Limitations and challenges

The first limitation arises from the assumptions stated in the problem formulation, which require that each task in $T$ be represented by exactly one transition in the obtained model. This constraint, inherited from the standard problem formulation, could be relaxed by allowing task symbols to be associated with more than one transition, or by allowing non-observable (silent) transitions. Another assumption of this paper is that the traces in the observed behaviour are recorded correctly; in particular, no task is missing from a trace.

Although the method builds WF-nets that reproduce the input logs, the discovered model may represent exceeding behaviour due to cycles in the synthesised PN. For example, the trace abcbcbd includes a repetition of the sub-trace cb; then, the model will represent ab(cb)+d. This language overrepresentation (quantified by the precision measure in [35]) is partly due to this feature; its analysis is beyond the scope of this paper and is currently being investigated by the authors. The relationships between the actual, observed, and computed behaviours are depicted in Fig. 12.
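The overrepresentation in the ab(cb)+d example can be made concrete with a regular expression standing in for the language of the discovered model; this is an assumption for illustration only, since the method produces a PN rather than a regular expression. The sketch computes a simple trace-level fitness (the fraction of log traces accepted, a simplification of the fitness dimension discussed in [35]) and enumerates short accepted traces that are absent from the log.

```python
import re
from itertools import product

# Stand-in for the discovered model's language in the example: ab(cb)+d.
MODEL = re.compile(r"ab(cb)+d")
LOG = {"abcbcbd"}


def trace_fitness(log, model):
    """Fraction of log traces accepted by the model (a simplified fitness)."""
    return sum(1 for t in log if model.fullmatch(t)) / len(log)


def surplus(log, model, alphabet="abcd", max_len=9):
    """Traces up to max_len over `alphabet` accepted by the model but not in the log."""
    found = []
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            trace = "".join(letters)
            if model.fullmatch(trace) and trace not in log:
                found.append(trace)
    return found


extra = surplus(LOG, MODEL)
print(trace_fitness(LOG, MODEL))  # 1.0
print(extra)                      # ['abcbd', 'abcbcbcbd']
```

Fitness is maximal, yet every number of cb repetitions is accepted; the exhaustive enumeration is exponential in `max_len` and is meant only for tiny alphabets.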
During the tests of the method using artificial logs obtained from known WFNs, we detected some particular sound WF-nets for which the method fails to rediscover all the dependencies between tasks. Since the method represents the repetitive behaviour exhibited by the log by inferring t-invariants, it cannot distinguish supports of t-invariants in which one or several tasks must occur a given number of times to reproduce the repetitive behaviour. In other words, when a t-component contains a cycle in its execution, the algorithm may find more than one t-invariant; the outcome is a WF-net that can reproduce the observed log as well as other traces involving such a cycle that are not in the log.

Consider the two WFNs of Fig. 13; the net in Fig. 13a is executed to generate the complete workflow log ABCFGECDH, ABCECDFGH, ABFCGECDH, ABFCEGCDH, ABCEFCDGH, ABCFEGCDH; notice that this WF-net (more precisely, the extended WF-net) has only one t-invariant. During the application of the discovery method, two t-invariant supports, $\langle Y_1 \rangle = \{A, B, C, F, D, G, H\}$ and $\langle Y_2 \rangle = \{C, E\}$, are computed; the resulting WF-net is the one shown in Fig. 13b, which in fact (its extended WF-net) has two t-invariants. The Type 1 implicit dependencies [B, E] and [E, D] are missed, and they should be computed to rediscover the WFN used to generate the logs. In particular, for this example, a subsequent analysis must determine that the cycle of transitions C and E in $\langle Y_2 \rangle$ occurs once every time $\langle Y_1 \rangle$ is executed. The dependency between the executions of t-invariants is still under research.

Fig. 13 WF-Net with a nested cycle in the t-invariant
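The subsequent analysis mentioned above is left open by the authors. As a hedged illustration only, one could check, on the log of Fig. 13a, whether the occurrence counts of every trace (its Parikh vector) decompose as $a \cdot Y_1 + b \cdot Y_2$ with a constant ratio $b/a$. The sketch assumes each task has multiplicity 1 in its invariant and reads $a$ and $b$ from A and E, which each belong to a single support; the function name is hypothetical.

```python
from collections import Counter

LOG = ["ABCFGECDH", "ABCECDFGH", "ABFCGECDH",
       "ABFCEGCDH", "ABCEFCDGH", "ABCFEGCDH"]
# Supports computed by the method for this log (multiplicities assumed 1).
Y1 = Counter("ABCFDGH")
Y2 = Counter("CE")


def decompose(trace):
    """Return (a, b) with Parikh(trace) == a*Y1 + b*Y2, or None if impossible."""
    parikh = Counter(trace)
    a, b = parikh["A"], parikh["E"]  # A occurs only in Y1, E only in Y2
    # Counter addition drops zero counts, so the comparison ignores them.
    expected = (Counter({t: a * n for t, n in Y1.items()})
                + Counter({t: b * n for t, n in Y2.items()}))
    return (a, b) if parikh == expected else None


ratios = {decompose(t) for t in LOG}
print(ratios)  # {(1, 1)}: the C-E cycle runs once per execution of Y1
```

A single ratio across the whole log is the kind of evidence that could justify merging the two supports back into one t-invariant; a trace such as ABCDFGH would instead yield $(1, 0)$.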
9 Conclusion

The discovery method proposed in this paper is based on determining the supports of t-invariants from the log $\lambda$; it allows building an initial model, which can be adjusted later, if needed, with the help of the computed t-invariants. The final model includes implicit causal relationships between transitions that have not been observed consecutively in the traces of $\lambda$. The discovered WFN replays all the traces in $\lambda$ from $M_0$ and may accept exceeding iterative sub-sequences, which correspond to the behaviour inherent to PNs with repetitive components. Since it is based on polynomial-time algorithms, the method allows processing large event logs. The implemented software has been tested on artificial logs corresponding to WFNs with diverse structures; the tests showed the accuracy and efficiency of the method when complex PN structures are addressed. Further work concerns the application of the method to event logs issued from actual processes. Current research addresses the problem of PN discovery from incomplete observed sequences, as well as quality measures to assess the obtained model with respect to the event log.

Acknowledgements The first author, Tonatiuh Tapia-Flores, has been sponsored by CONACYT under Ph.D. Grant No. 263566.

Declarations

Conflict of interest The authors declare that they have no financial or non-financial conflict of interest with any person or organisation.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References
1. Gold, M.E.: Language identification in the limit. Inf. Control 10(5), 447–474 (1967)
2. Angluin, D.: Queries and concept learning. Mach. Learn. 2(4), 319–342 (1988). https://doi.org/10.1023/a:1022821128753
3. Meda-Campana, M., Ramirez-Treviro, A., López-Mellado, E.: Asymptotic identification of discrete event systems. In: Proceedings of the 39th IEEE conference on, pp. 2266–2271. IEEE (2000)
4. Meda-Campana, M., López-Mellado, E.: Identification of concurrent discrete event systems using petri nets. In: Proceedings of the 17th IMACS world congress on computational and applied mathematics, pp. 11–15 (2005)
5. Giua, A., Seatzu, C.: Identification of free-labeled petri nets via integer programming. In: Decision and Control, 2005 and 2005 European Control Conference. CDC-ECC’05. 44th IEEE conference on, pp. 7639–7644. IEEE (2005)
6. Cabasino, M.P., Giua, A., Seatzu, C.: Linear programming techniques for the identification of place/transition nets. In: Decision and control, 2008. CDC 2008. 47th IEEE conference on, pp. 514–520. IEEE (2008)
7. Dotoli, M., Pia Fanti, M., Mangini, A.M., Ukovich, W.: Identification of the unobservable behaviour of industrial automation systems by petri nets. Control. Eng. Pract. 19(9), 958–966 (2011)
8. Klein, S., Litz, L., Lesage, J.-J.: Fault detection of discrete event systems using an identification approach. In: 16th IFAC world congress (2005)
9. Roth, M., Schneider, S., Lesage, J.-J., Litz, L.: Fault detection and isolation in manufacturing systems with an identified discrete event model. Int. J. Syst. Sci. 43(10), 1826–1841 (2012)
10. Estrada-Vargas, A.P., López-Mellado, E., Lesage, J.-J.: Input-output identification of controlled discrete manufacturing systems. Int. J. Syst. Sci. 45(3), 456–471 (2014)
11. Estrada-Vargas, A.P., Lesage, J.-J., López-Mellado, E.: A stepwise method for identification of controlled discrete manufacturing systems. Int. J. Comput. Integr. Manuf. 28(2), 187–199 (2015). https://doi.org/10.1080/0951192X.2013.874591
12. Estrada-Vargas, A.P., López-Mellado, E., Lesage, J.-J.: A comparative analysis of recent identification approaches for discrete-event systems. Math. Prob. Eng. (2010). https://doi.org/10.1155/2010/453254
13. Cabasino, M.P., Darondeau, P., Fanti, M.P., Seatzu, C.: Model identification and synthesis of discrete-event systems. In: Zhou, M., Li, H.X., Weijnen, M. (eds.) Contemporary issues in systems science and engineering. Wiley, London (2013)
14. Aalst, W.M.: The application of petri nets to workflow management. J. Circuits Syst. Comput. 8(01), 21–66 (1998)
15. Ou-Yang, C., Winarjo, H.: Petri-net integration–an approach to support multi-agent process mining. Expert Syst. Appl. 38(4), 4039–4051 (2011). https://doi.org/10.1016/j.eswa.2010.09.066
16. Ma, J., Wang, K., Xu, L.: Modelling and analysis of workflow for lean supply chains. Enterp. Inf. Syst. 5(4), 423–447 (2011). https://doi.org/10.1080/17517575.2011.580007
17. Cook, J.E., Wolf, A.L.: Automating process discovery through event-data analysis. In: 1995 17th international conference on software engineering, pp. 73–73 (1995). https://doi.org/10.1145/225014.225021
18. Cook, J.E., Du, Z., Liu, C., Wolf, A.L.: Discovering models of behavior for concurrent workflows. Comput. Ind. 53(3), 297–319 (2004)
19. Agrawal, R., Gunopulos, D., Leymann, F.: Mining process models from workflow logs. In: Schek, H.J., Saltor, F., Ramos, I., Alonso, G. (eds.) EDBT, Lecture Notes in Computer Science, vol. 1377, pp. 469–483. Springer, Berlin (1998). https://doi.org/10.1007/BFb0101003
20. Aalst, W., Weijters, T., Maruster, L.: Workflow mining: discovering process models from event logs. IEEE Trans. Knowl. Data Eng. 16(9), 1128–1142 (2004)
21. Wang, D., Ge, J., Hu, H., Luo, B.: A new process mining algorithm based on event type. In: Dependable, Autonomic and Secure Computing (DASC), 2011 IEEE Ninth international conference on, pp. 1144–1151. IEEE (2011)
22. Wen, L., Wang, J., Sun, J.: Detecting implicit dependencies between tasks from event logs. In: Proceedings of the 8th Asia-Pacific Web conference on frontiers of WWW research and development. APWeb’06, pp. 591–603. Springer, Berlin, Heidelberg (2006). https://doi.org/10.1007/11610113_52
23. Wang, D., Ge, J., Hu, H., Luo, B., Huang, L.: Discovering process models from event multiset. Expert Syst. Appl. 39(15), 11970–11978 (2012)
24. Aalst, W.M.P.: Process mining: discovery, conformance and enhancement of business processes, 1st edn. Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-19345-3
25. Augusto, A., Conforti, R., Dumas, M., Rosa, M.L., Maggi, F.M., Marrella, A., Mecella, M., Soo, A.: Automated discovery of process models from event logs: review and benchmark. IEEE Trans. Knowl. Data Eng. 31(4), 686–705 (2019). https://doi.org/10.1109/TKDE.2018.2841877
26. Santos Garcia, C., Meincheim, A., Faria Junior, E.R., Dallagassa, M.R., Sato, D.M.V., Carvalho, D.R., Santos, E.A.P., Scalabrin, E.E.: Process mining techniques and applications – a systematic mapping study. Expert Syst. Appl. 133, 260–295 (2019). https://doi.org/10.1016/j.eswa.2019.05.003
27. Aalst, W.: Process mining. Data science in action. Springer, Berlin (2016)
28. Aalst, J.C.: Process Mining Handbook. Lecture Notes in Business Information Processing, vol. 448, 1st edn. Springer, Berlin (2022). https://doi.org/10.1007/978-3-031-08848-3
29. Estrada-Vargas, A.P., López-Mellado, E., Lesage, J.J.: A black-box identification method for automated discrete-event systems. IEEE Trans. Autom. Sci. Eng. 99, 1–16 (2015). https://doi.org/10.1109/TASE.2015.2445332
30. Tapia-Flores, T., López-Mellado, E., Estrada-Vargas, A.P., Lesage, J.J.: Petri net discovery of discrete event processes by computing t-invariants. In: Emerging technology and factory automation (ETFA), 2014 IEEE, pp. 1–8 (2014). https://doi.org/10.1109/ETFA.2014.7005080
31. Tapia-Flores, T., López-Mellado, E.: Inferring the repetitive behaviour from event logs for process mining discovery. In: Prasath, R., Gelbukh, A. (eds.) Min. Intell. Knowl. Explorat., pp. 164–173. Springer, Cham (2017)
32. Wen, L., Aalst, W.M.P., Wang, J., Sun, J.: Mining process models with non-free-choice constructs. Data Min. Knowl. Disc. 15(2), 145–180 (2007). https://doi.org/10.1007/s10618-007-0065-y
33. Dingle, N.J., Knottenbelt, W.J., Suto, T.: PIPE2: a tool for the performance evaluation of generalised stochastic petri nets. SIGMETRICS Perform. Eval. Rev. 36(4), 34–39 (2009). https://doi.org/10.1145/1530873.1530881
34. Leemans, S.J.J., Fahland, D., Aalst, W.M.P.: Discovering block-structured process models from event logs – a constructive approach. In: Colom, J.-M., Desel, J. (eds.), pp. 311–329. Springer, Berlin, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38697-8_17
35. Buijs, J.C.A.M., Dongen, B.F., Aalst, W.M.P.: Quality dimensions in process discovery: the importance of fitness, precision, generalization and simplicity. Int. J. Cooper. Inf. Syst. 23(01), 1440001 (2014). https://doi.org/10.1142/S0218843014400012