Papers by Artem Polyvyanyy
Fundamenta Informaticae, 2024
A model of an information system describes its processes and how resources are involved in these processes to manipulate data objects. This paper presents an extension to the Petri nets formalism suitable for describing information systems in which states refer to object instances of predefined types and resources are identified as instances of special object types. Several correctness criteria for resource- and object-aware information systems models are proposed, supplemented with discussions on their decidability for interesting classes of systems. These new correctness criteria can be seen as generalizations of the classical soundness property of workflow models concerned with process control flow correctness.
International Conference on Process Mining, 2023
A transhumeral prosthesis restores missing anatomical segments below the shoulder, including the hand. Active prostheses utilize real-valued, continuous sensor data to recognize patient target poses, or goals, and proactively move the artificial limb. Previous studies have examined how well the data collected in stationary poses, without considering the time steps, can help discriminate the goals. In this case study paper, we focus on using time series data from surface electromyography electrodes and kinematic sensors to sequentially recognize patients' goals. Our approach involves transforming the data into discrete events and training an existing process mining-based goal recognition system. Results from data collected in a virtual reality setting with ten subjects demonstrate the effectiveness of our proposed goal recognition approach, which achieves significantly better precision and recall than state-of-the-art machine learning techniques and is less confident when wrong, which is beneficial when approximating smoother movements of prostheses.
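As an illustration of the transformation step described in this abstract, the minimal sketch below discretizes one continuous sensor channel into symbolic change events; the equal-width binning, channel name, and bin labels are assumptions made for illustration, not the paper's actual encoding.

```python
import numpy as np

def discretize(signal, n_bins=3, labels=("low", "mid", "high")):
    # Bin each sample into equal-width amplitude ranges.
    edges = np.linspace(signal.min(), signal.max(), n_bins + 1)
    bins = np.clip(np.digitize(signal, edges[1:-1]), 0, n_bins - 1)
    return [labels[b] for b in bins]

def to_events(signal, channel="emg_1", n_bins=3):
    # Collapse runs of identical symbols so each event marks a level change.
    symbols = discretize(np.asarray(signal, dtype=float), n_bins)
    events, prev = [], None
    for t, s in enumerate(symbols):
        if s != prev:
            events.append((t, f"{channel}:{s}"))
            prev = s
    return events

# A noisy ramp becomes a short sequence of discrete events.
print(to_events([0.1, 0.1, 0.4, 0.5, 0.9, 0.95]))
```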
Artificial Intelligence, 2023
The problem of goal recognition asks to automatically infer an accurate probability distribution over the possible goals an autonomous agent is attempting to achieve in the environment. The state-of-the-art approaches for goal recognition operate under full knowledge of the environment and the possible operations the agent can take. This knowledge, however, is often not available in real-world applications. Given historical observations of the agents' behaviors in the environment, we learn skill models that capture how the agents achieved the goals in the past. Next, given fresh observations of an agent, we infer their goals by diagnosing deviations between the observations and all the available skill models. We present a framework that serves as an outline for implementing such data-driven goal recognition systems, and an instance of the framework implemented using process mining techniques. The evaluations we conducted using our publicly available implementation confirm that the approach is well-defined, i.e., all system parameters impact its performance; has high accuracy over a wide range of synthetic and real-world domains, comparable with the more knowledge-demanding state-of-the-art approaches; and operates fast.
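The inference step, which scores fresh observations against every learned skill model and turns the deviations into a distribution over goals, can be sketched as follows; the softmax-style normalization is an illustrative choice, not necessarily the one used in the published system.

```python
import math

def goal_posterior(deviations):
    # Lower deviation = better match between the fresh observations and
    # that goal's skill model; a softmax over negated scores yields a
    # probability distribution over the candidate goals.
    weights = {g: math.exp(-d) for g, d in deviations.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Alignment-style deviation costs for three hypothetical candidate goals.
print(goal_posterior({"goal_A": 1.0, "goal_B": 4.0, "goal_C": 6.5}))
```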
International Conference on Business Process Management, 2023
Process discovery studies ways to use event data generated by business processes and recorded by IT systems to construct models that describe the processes. Existing discovery algorithms are predominantly concerned with constructing process models that represent the control flow of the processes. Agent system mining argues that business processes often emerge from interactions of autonomous agents and uses event data to construct models of the agents and their interactions. This paper presents and evaluates Agent Miner, an algorithm for discovering, from event data, models of the agents and their interactions that together compose the system that executed the processes which generated the data. The conducted evaluation, using our open-source implementation of Agent Miner and publicly available industrial datasets, confirms that our algorithm can provide insights into the process participants and their interaction patterns, and often discovers models that describe the business processes more faithfully than process models discovered using conventional process discovery algorithms.
Application and Theory of Petri Nets and Concurrency, 2023
A process discovery algorithm aims to construct a model from data generated by historical system executions such that the model describes the system well. Consequently, one desired property of a process discovery algorithm is rediscoverability, which ensures that the algorithm can construct a model that is behaviorally equivalent to the original system. A system often simultaneously executes multiple processes that interact through object manipulations. This paper presents a framework for developing process discovery algorithms for constructing models that describe interacting processes based on typed Jackson Nets, which use identifiers to refer to the objects they manipulate. Typed Jackson Nets enjoy the reconstructability property, which states that the composition of the processes and the interactions of a decomposed typed Jackson Net yields a model that is bisimilar to the original system. We exploit this property to demonstrate that if a process discovery algorithm ensures rediscoverability, the system of interacting processes is rediscoverable.
Advanced Information Systems Engineering, 2023
Increasing the success rate of a process, i.e., the percentage of cases that end in a positive outcome, is a recurrent process improvement goal. At runtime, there are often certain actions (a.k.a. treatments) that workers may execute to lift the probability that a case ends in a positive outcome. For example, in a loan origination process, a possible treatment is to issue multiple loan offers to increase the probability that the customer takes a loan. Each treatment has a cost. Thus, when defining policies for prescribing treatments to cases, managers need to consider the net gain of the treatments. Also, the effect of a treatment varies over time: treating a case earlier may be more effective than later in a case. This paper presents a prescriptive monitoring method that automates this decision-making task. The method combines causal inference and reinforcement learning to learn treatment policies that maximize the net gain. The method leverages a conformal prediction technique to speed up the convergence of the reinforcement learning mechanism by separating cases that are likely to end up in a positive or negative outcome from uncertain cases. An evaluation on two real-life datasets shows that the proposed method outperforms a state-of-the-art baseline.
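The net-gain reasoning in this abstract can be made concrete with a small sketch; the scalar gain model and parameter names below are illustrative assumptions, not the paper's estimator, which learns treatment policies via causal inference and reinforcement learning.

```python
def net_gain(uplift, outcome_value, treatment_cost):
    # Expected net gain of treating a case now: the causal lift in the
    # probability of a positive outcome, valued in outcome units, minus
    # the cost of the treatment itself.
    return uplift * outcome_value - treatment_cost

def should_treat(uplift, outcome_value, treatment_cost):
    # Prescribe the treatment only when its expected net gain is positive.
    return net_gain(uplift, outcome_value, treatment_cost) > 0

# A 12% estimated uplift on a $500 outcome versus a $40 treatment.
print(should_treat(uplift=0.12, outcome_value=500.0, treatment_cost=40.0))
```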
Data & Knowledge Engineering, 2023
Process analytics is a collection of data-driven techniques for, among other tasks, making predictions for individual process instances or overall process models. At the instance level, various novel techniques have been recently devised, tackling analytical tasks such as next-activity, remaining-time, or outcome prediction. However, there is a notable void regarding predictions at the process model level. It is the ambition of this article to fill this gap. More specifically, we develop a technique to forecast the entire process model from historical event data. A forecasted model is a will-be process model representing a probable description of the overall process for a given period in the future. Such a forecast helps, for instance, to anticipate and prepare for the consequences of upcoming process drifts and emerging bottlenecks. Our technique builds on a representation of event data as multiple time series, each capturing the evolution of a behavioural aspect of the process model, such that corresponding time series forecasting techniques can be applied. Our implementation demonstrates the feasibility of process model forecasting using real-world event data. A user study using our Process Change Exploration tool confirms the usefulness and ease of use of the produced process model forecasts.
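One plausible instantiation of the "multiple time series" idea is to track the frequency of each directly-follows relation per time window and forecast each series independently; the sketch below uses plain exponential smoothing as a stand-in, whereas the article applies established time series forecasting techniques more broadly.

```python
from collections import Counter

def directly_follows_counts(traces):
    # Count each directly-follows pair (a, b) over a list of traces.
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

def forecast_next_window(windows, alpha=0.5):
    # Forecast the next window's directly-follows counts by simple
    # exponential smoothing of each pair's per-window count series.
    per_window = [directly_follows_counts(w) for w in windows]
    pairs = set().union(*per_window)
    forecast = {}
    for pair in pairs:
        level = float(per_window[0][pair])
        for counts in per_window[1:]:
            level = alpha * counts[pair] + (1 - alpha) * level
        forecast[pair] = level
    return forecast

# Three monthly windows of traces over activities a, b, c.
w1 = [["a", "b", "c"], ["a", "b"]]
w2 = [["a", "b", "c"], ["a", "c"]]
w3 = [["a", "c"], ["a", "c", "b"]]
print(forecast_next_window([w1, w2, w3]))
```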
Information Systems, 2023
A process discovery algorithm aims to construct a process model that represents well the real-world process stored in event data; that is, it is precise, generalizes the data correctly, and is simple. At the same time, it is reasonable to expect that better-quality input event data should lead to constructed process models of better quality. However, existing process discovery algorithms omit the discussion of this relationship between the inputs and outputs and, as it turns out, often do not guarantee it. We demonstrate the latter claim using several quality measures for event data and discovered process models. Consequently, this paper calls for more rigor in the design of process discovery algorithms, including properties that relate the qualities of the inputs and outputs of these algorithms. We present four incremental maturity stages for process discovery algorithms, along with concrete guidelines for formulating relevant properties and experimental validation. We then use these stages to review several state-of-the-art process discovery algorithms to confirm the need to reflect on how we perform algorithmic process discovery.
Proceedings of the ICPM Doctoral Consortium and Demo Track 2022 co-located with 4th International Conference on Process Mining (ICPM 2022), 2022
ProLift is a Web-based tool that uses causal machine learning, specifically uplift trees, to discover rules for optimizing business processes based on execution data (event logs). ProLift allows users to upload an event log, to specify case treatments and case outcomes, and to visualize treatment rules that increase the probability of positive case outcomes. The target audience of ProLift includes researchers and practitioners interested in leveraging causal machine learning for process improvement.
IEEE Transactions on Knowledge and Data Engineering, 2022
Through the application of process mining, organisations can improve their business processes by leveraging data recorded as a result of the performance of these processes. Over the past two decades, the field of process mining evolved considerably, offering a rich collection of analysis techniques with different objectives and characteristics. Despite the advances in this field, a solid statistical foundation is still lacking. Such a foundation would allow analysis outcomes to be found or judged using the notion of statistical significance, thus providing a more objective way to assess these outcomes. This paper contributes several statistical tests and association measures that treat process behaviour as a variable. The sensitivity of these tests to their parameters is evaluated and their applicability is illustrated through the use of real-life event logs.The presented tests and measures constitute a key contribution to a statistical foundation for process mining.
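To give a flavour of treating process behaviour as a variable, the sketch below runs a generic permutation test on a trace-level statistic of two event logs; the specific tests and association measures contributed by the paper are defined differently, so treat this only as an outline of the statistical-significance idea.

```python
import random

def trace_statistic(log):
    # A simple scalar summary of process behaviour: mean trace length.
    # Any trace-level statistic could stand in here.
    return sum(len(t) for t in log) / len(log)

def permutation_test(log_a, log_b, n_perm=2000, seed=42):
    # Two-sided permutation test for whether two event logs differ in the
    # chosen behavioural statistic; returns an estimated p-value.
    rng = random.Random(seed)
    observed = abs(trace_statistic(log_a) - trace_statistic(log_b))
    pooled = list(log_a) + list(log_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[: len(log_a)], pooled[len(log_a):]
        if abs(trace_statistic(a) - trace_statistic(b)) >= observed:
            hits += 1
    return hits / n_perm

# Logs with systematically different trace lengths yield a small p-value.
log_a = [["a", "b"]] * 30 + [["a", "b", "c"]] * 10
log_b = [["a", "b", "c", "d"]] * 30 + [["a", "b"]] * 10
print(permutation_test(log_a, log_b))
```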
Process Mining Handbook, 2022
User interaction logs allow us to analyze the execution of tasks in a business process at a finer level of granularity than event logs extracted from enterprise systems. The fine-grained nature of user interaction logs opens up a number of use cases. For example, by analyzing such logs, we can identify best practices for executing a given task in a process, or we can elicit differences in performance between workers or between teams. Furthermore, user interaction logs allow us to discover repetitive and automatable routines that occur during the execution of one or more tasks in a process. Along this line, this chapter introduces a family of techniques, called Robotic Process Mining (RPM), which allow us to discover repetitive routines that can be automated using robotic process automation technology. The chapter presents a structured landscape of concepts and techniques for RPM, including techniques for user interaction log preprocessing, techniques for discovering frequent routines, notions of routine automatability, as well as techniques for synthesizing executable routine specifications for robotic process automation.
Proceedings of the Workshop on Process Management in the AI Era (PMAI 2022) co-located with 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence (IJCAI-ECAI 2022), 2022
Goal Recognition (GR) is a research problem that studies ways to infer the goal of an intelligent agent based on its observed behavior and knowledge of the environment. A common assumption of GR is that the underlying environment is stationary. However, in many real-world scenarios, it is necessary to recognize agents' goals over extended periods. Therefore, it is reasonable to assume that the environment will change throughout a series of goal recognition tasks. This paper introduces the problem of continuous GR over a changing environment. The solution to this problem is a GR system capable of recognizing agents' goals over an extended period where the environment in which the agents operate changes. To support the evaluation of candidate solutions to this new GR problem, in this paper, we present the Goal Recognition Amidst Changing Environments (GRACE) tool for generating instances of the new problem. Specifically, the tool can be configured to generate GR problems that account for different environmental changes and drifts. GRACE can generate a series of modified environments over discrete time steps and the data induced by agents operating in the environment while completing different goals.
Process Querying Methods, 2022
Process querying studies concepts and methods from fields such as big data, process modeling and analysis, business process intelligence, and process analytics and applies them to retrieve and manipulate real-world and designed processes. This chapter reviews state-of-the-art methods for process querying, summarizes techniques used to implement process querying methods, discusses typical applications of process querying, identifies research gaps, and suggests directions for future research in process querying.
Process Querying Methods, 2022
A process is a collection of actions that were already, are currently being, or must be taken in order to achieve a goal, where an action is an atomic unit of work, for instance, a business activity or an instruction of a computer program. A process repository is an organized collection of models that describe processes, for example, a business process repository and a software repository. Process repositories without facilities for process querying and process manipulation are like databases without Structured Query Language, that is, collections of elements without effective means for deriving value from them. Process Query Language (PQL) is a domain-specific programming language for managing processes described in models stored in process repositories. PQL can be used to query and manipulate process models based on possibly infinite collections of processes that they represent, including processes that support concurrent execution of actions. This chapter presents PQL, its current features, publicly available implementation, planned design and implementation activities, and open research problems associated with the design of the language.
Process Querying Methods, 2022
This chapter gives a brief introduction to the research area of process querying. Concretely, it articulates the motivation and aim of process querying, gives a definition of process querying, presents the core artifacts studied in process querying, and discusses a framework for guiding the design, implementation, and evaluation of methods for process querying.
Information Systems, 2022
There are many fields of computing in which having access to large volumes of data allows very precise models to be developed. For example, machine learning employs a range of algorithms that deliver important insights based on analysis of data resources. Similarly, process mining develops algorithms that use event data induced by real-world processes to support the modeling of – and hence understanding and long-term improvement of – those processes.
In process mining, the quality of the learned process models is assessed using conformance checking techniques, which measure how well the models represent and generalize the data. This article presents the entropic relevance measure for conformance checking of stochastic process models, that is, models that also provide information regarding the likelihood of observing each sequence of events. Accurate stochastic conformance measurement allows identification of models that describe the data better, including the captured sequences of process events and their frequencies; information about the likelihood of the described processes is an essential step toward simulating and forecasting future processes.
Entropic relevance represents a blend between the traditional precision and recall quality criteria in conformance checking, in that it penalizes both observed processes that the model does not describe and processes that are permitted by the model yet were not observed. Entropic relevance can be computed in time linear in the size of the input data, and it measures a fundamentally different phenomenon than other existing measures. Our evaluation over industrial datasets confirms the feasibility of using the measure in practice.
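The following sketch conveys the flavour of entropic relevance as an average per-trace encoding cost; the fallback code used here for traces the model does not capture is a simplification of the article's actual background coding scheme, so the numbers are illustrative only.

```python
import math
from collections import Counter

def entropic_relevance(log, model_prob, n_activities):
    # Average bits to encode each trace: covered traces cost
    # -log2 p(trace) under the stochastic model; uncovered traces fall
    # back to a uniform background code over the activity alphabet plus
    # a stop symbol. A simplified coding scheme for illustration only.
    counts = Counter(tuple(t) for t in log)
    bits = 0.0
    for trace, freq in counts.items():
        p = model_prob(trace)
        if p > 0:
            cost = -math.log2(p)
        else:
            cost = (len(trace) + 1) * math.log2(n_activities + 1)
        bits += freq * cost
    # One selector bit per trace flags covered vs. uncovered.
    return bits / len(log) + 1.0

# A model that puts all of its probability mass on two traces.
dist = {("a", "b", "c"): 0.7, ("a", "c"): 0.3}
log = [["a", "b", "c"]] * 8 + [["a", "c"]] + [["a", "x", "c"]]
print(entropic_relevance(log, lambda t: dist.get(t, 0.0), n_activities=4))
```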
Information Systems, 2022
Robotic Process Automation (RPA) is a technology to automate routine work such as copying data across applications or filling in document templates using data from multiple applications. RPA tools allow organizations to automate a wide range of routines. However, identifying and scoping routines that can be automated using RPA tools is time-consuming. Manual identification of candidate routines via interviews, walk-throughs, or job shadowing allows analysts to identify the most visible routines, but these methods are not suitable when it comes to identifying the long tail of routines in an organization. This article proposes an approach to discover automatable routines from logs of user interactions with IT systems and to synthesize executable specifications for such routines. The proposed approach focuses on discovering routines in which a user transfers data from a set of fields (or cells) in an application to another set of fields in the same or a different application (data transfer routines). The approach starts by discovering frequent routines at a control-flow level (candidate routines). It then determines which of these candidate routines are automatable and synthesizes an executable specification for each such routine. Finally, it identifies semantically equivalent routines so as to output a set of non-redundant routines. The article reports on an evaluation of the approach using a combination of synthetic and real-life logs. The evaluation results show that the approach can discover automatable routines that are known to be present in a UI log and that it discovers routines that users recognize as such in real-life logs.
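The first discovery step can be illustrated with a toy scan over a user-interaction log that surfaces frequent field-to-field transfers as candidate data-transfer routines; the record layout and field names are assumptions, and the real approach additionally checks automatability and synthesizes executable specifications.

```python
from collections import Counter

def copy_paste_pairs(ui_log):
    # Scan ordered interaction records and count how often a copy from
    # one field is next pasted into another field. Frequent pairs are
    # candidate data-transfer routines.
    pairs = Counter()
    last_copy = None
    for record in ui_log:
        if record["action"] == "copy":
            last_copy = record["field"]
        elif record["action"] == "paste" and last_copy is not None:
            pairs[(last_copy, record["field"])] += 1
            last_copy = None
    return pairs

# A recurring transfer from a CRM field into a billing form (hypothetical).
log = [
    {"action": "copy", "field": "crm.customer_name"},
    {"action": "paste", "field": "billing.name"},
    {"action": "click", "field": "billing.submit"},
] * 5
print(copy_paste_pairs(log).most_common(1))
```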
Advanced Information Systems Engineering, 2022
Process mining extracts value from the traces recorded in the event logs of IT systems, with process discovery being the task of inferring a process model for a log emitted by some unknown system. Generalization is one of the quality criteria applied to process models to quantify how well the model describes future executions of the system. Generalization is also perhaps the least understood of those criteria, largely because it measures properties over the entire future behavior of the system when the only available sample of behavior is that provided by the log. In this paper, we apply a bootstrap approach from computational statistics, allowing us to define an estimator of the model's generalization based on the log it was discovered from. We show that standard process mining assumptions lead to a consistent estimator that makes fewer errors as the quality of the log increases. Experiments confirm the ability of the approach to support industry-scale data-driven systems engineering.
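The core bootstrap idea, resampling the log to estimate the sampling distribution of a log-based quality measure, can be sketched as follows; the paper's generalization estimator is defined more carefully, so treat this as an outline under simplified assumptions, with 'measure' standing in for any log-to-number quality function.

```python
import random

def bootstrap_estimate(log, measure, n_resamples=1000, seed=7):
    # Resample traces with replacement, recompute the quality measure on
    # each resample, and report the mean with a 95% percentile interval.
    rng = random.Random(seed)
    values = []
    for _ in range(n_resamples):
        resample = [rng.choice(log) for _ in range(len(log))]
        values.append(measure(resample))
    values.sort()
    lo = values[int(0.025 * n_resamples)]
    hi = values[int(0.975 * n_resamples) - 1]
    return sum(values) / n_resamples, (lo, hi)

# Example measure: fraction of distinct traces (a crude variability proxy).
log = [("a", "b"), ("a", "b"), ("a", "c"), ("a", "b", "c")] * 10
measure = lambda l: len(set(l)) / len(l)
print(bootstrap_estimate(log, measure))
```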
Application and Theory of Petri Nets and Concurrency, 2022
A model of an information system describes its processes and how these processes manipulate data objects. Object-aware extensions of Petri nets focus on modeling both the life-cycle of objects, and their interactions. In this paper, we focus on Petri nets with identifiers, where identifiers are used to refer to objects. These objects should "behave" well in the system from inception to termination. We formalize this intuition in the notion of identifier soundness, and show that although this property is undecidable in general, useful subclasses exist that guarantee identifier soundness by construction.
Information Systems, 2021
Initially, process mining focused on discovering process models from event data, but in recent years the use and importance of conformance checking has increased. Conformance checking aims to uncover differences between a process model and an event log. Many conformance checking techniques and measures have been proposed. Typically, these take into account the frequencies of traces in the event log, but do not consider the probabilities of these traces in the model. This asymmetry leads to various complications. Therefore, we define conformance for stochastic process models taking into account frequencies and routing probabilities. We use the earth movers' distance between stochastic languages representing models and logs as an intuitive conformance notion. In this paper, we show that this form of stochastic conformance checking enables detailed diagnostics projected on both model and log. To pinpoint differences and relate these to specific model elements, we extend the so-called 'reallocation matrix' to consider paths. The approach has been implemented in ProM and our evaluations show that stochastic conformance checking is possible in real-life settings.
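Under the simplest trace distance (0/1), the earth movers' distance between two stochastic languages reduces to one minus the probability mass the distributions share, as the sketch below shows; the full approach also supports finer-grained trace distances and the path-aware reallocation matrix described above.

```python
def unit_emd(model_lang, log_lang):
    # Earth movers' distance between two stochastic languages (dicts
    # mapping traces to probabilities) under the unit trace distance:
    # all shared mass stays in place, the rest must be moved at cost 1.
    traces = set(model_lang) | set(log_lang)
    shared = sum(min(model_lang.get(t, 0.0), log_lang.get(t, 0.0))
                 for t in traces)
    return 1.0 - shared

# The model and the log agree on most, but not all, of their mass.
model = {("a", "b", "c"): 0.6, ("a", "c"): 0.4}
log = {("a", "b", "c"): 0.5, ("a", "c"): 0.2, ("a", "b"): 0.3}
print(unit_emd(model, log))  # 0.3 of the probability mass must be moved
```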
In this tutorial, we demonstrate our recent approach to modeling and verification of models of information systems in three parts. Firstly, we present our Information Systems Modeling Language (ISML) for describing information and process constraints and the interplay between these two types of constraints [1,2]. Secondly, we demonstrate Information Systems Modeling Suite (ISM Suite) [3], an integrated environment for developing, simulating, and analyzing models of information systems described in ISML, released under an open-source license. In this part, using our tools, we show several example pitfalls at the level of information and process interplay. Finally, we discuss current and future research directions that aim at strengthening the theoretical foundations and practical aspects of our approach to the design of information systems.