A Machine Learning Based Framework for Business Process Mining

  • Ghalia Tello

Student thesis: Doctoral Thesis


Business Process Mining is a discipline that aims to analyze business processes using event data logged by IT systems. Extracting the business knowledge embedded in event logs can help improve business process efficiency and reduce operational risks. For instance, preventing business rule violations can protect companies from major losses. Accordingly, several approaches have been proposed to verify the compliance of process executions with the target business model. However, the existing approaches have some major shortcomings. First, most of them are reactive: they detect model violations after they occur rather than preventing them in the first place. Second, only a few attempts have been made to predict model violations using Machine Learning (ML) techniques, and these rely specifically on memory-less ML techniques. Such approaches require a considerable amount of manual intervention in data preparation. Finally, such approaches assume that event logs are at the same level of granularity as the activities in the process models, i.e. that there is a one-to-one mapping between process model activities and events recorded during process execution. However, real-life event logs are typically much less structured and more complex than the predefined business activities they refer to: event logs and process model activities are almost always defined at different levels of granularity. The challenges posed by this discrepancy can be addressed by techniques that establish a mapping between granularity levels, such as log-lifting. To address these issues, this work proposes an integrated end-to-end framework to predict business model violations based on a stream of low-level event logs. The proposed framework integrates three main components (log-segmentation, log-lifting, and predictive modeling) and consists of as many phases.
The purpose of the log-segmentation phase is to identify the potential segments in a flow of low-level events: each segment corresponds to an unknown high-level activity. For this, we propose a segmentation algorithm based on maximum likelihood with n-gram analysis. In the log-lifting phase, event segments are mapped to their corresponding high-level activities using a supervised machine learning technique. Several machine learning classification methods are explored, including Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and Random Forests. For the predictive phase, we use a Bidirectional Long Short-Term Memory (BiLSTM) model, a special recurrent neural network that exploits long-range sequence context. The model is integrated with an attention mechanism, which improves its ability to capture discriminating features. With this model, we boost the overall performance of our framework. The proposed framework can forecast important attributes of upcoming violations, such as the type and the time-location of the violation. This information is very useful when determining the type of countermeasures that need to be taken. We demonstrate the applicability of our framework using real-life event logs. The experimental results confirm the ability of the framework to perform the realistic task of predicting business model violations from a stream of low-level event logs.
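To make the log-segmentation idea more concrete, the following is a minimal sketch of one way n-gram statistics could be used to cut a stream of low-level events into segments. It is not the algorithm proposed in the thesis: it assumes bigrams only, estimates transition probabilities by maximum likelihood from the stream itself, and places a boundary wherever a transition is less likely than a fixed threshold. The function names, the toy event stream, and the threshold value are all hypothetical.

```python
from collections import Counter


def bigram_probs(events):
    """Maximum-likelihood estimate of P(next | current) from one event stream."""
    pair_counts = Counter(zip(events, events[1:]))
    # Count each event only where it has a successor, so the ratios sum to 1.
    first_counts = Counter(events[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(events, probs, threshold=0.5):
    """Split the stream wherever the transition probability is below threshold."""
    if not events:
        return []
    segments, current = [], [events[0]]
    for prev, nxt in zip(events, events[1:]):
        # An unlikely transition is treated as the start of a new segment
        # (assumption of this sketch, not a claim about the thesis method).
        if probs.get((prev, nxt), 0.0) < threshold:
            segments.append(current)
            current = []
        current.append(nxt)
    segments.append(current)
    return segments


# Hypothetical low-level event stream built from two recurring patterns,
# "abc" and "xy", standing in for two unknown high-level activities.
stream = list("abcxyabcabcxyxyabc")
probs = bigram_probs(stream)
segments = segment(stream, probs, threshold=0.7)
print(["".join(s) for s in segments])
```

On this toy stream, transitions inside a pattern (such as a to b) always occur and so have probability 1, while transitions between patterns are split across several successors and fall below the threshold, which is what lets the boundaries be recovered. Each resulting segment would then be passed to the log-lifting classifier.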
Date of Award: Oct 2020
Original language: American English


  • Process Mining
  • Log Segmentation
  • Log lifting
  • Event Abstraction
  • Violation Prediction
  • Machine Learning
  • Artificial Neural Networks
  • Bidirectional Long Short-Term Memory model
