Abstract
Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models rests on ad hoc assumptions about the underlying data distributions, which do not necessarily match the non-parametric distributions typically observed in process data. Moreover, mainstream machine-learning approaches tend to ignore the challenges posed by concurrency in operational processes. Data encoding is a key element in smoothing the mismatch between these assumptions and the actual characteristics of process data, but its potential is poorly exploited. In this paper, we argue that a deeper understanding of the challenges associated with training machine learning models on process data is essential for establishing a robust integration of process mining and machine learning. Our analysis aims to lay the groundwork for a methodology that aligns machine learning with process mining requirements. We encourage further research in this direction to advance the field and effectively address these critical issues.
Original language | British English |
---|---|
Pages (from-to) | 24583-24595 |
Number of pages | 13 |
Journal | IEEE Access |
Volume | 12 |
State | Published - 2024 |
Keywords
- concurrency
- encoding
- machine learning
- non-parametric distribution
- non-stationary
- process mining
- training
- zero-shot learning