SQL Miner

Extracting Declarative Process Models from Relational Event Logs

The SQL Miner

Flexible business processes can often be modelled more easily using a declarative rather than a procedural modelling approach. Process mining aims at automating the discovery of business process models. Existing declarative process mining approaches either suffer from performance issues with real-life event logs or limit their expressiveness to a specific set of constaint types. Lately, RelationalXES, a relational database architecture for storing event log data, has been introduced. We introduce a mining approach that directly works on relational event data by querying the log with conventional SQL. By leveraging database performance technology, the mining procedure is fast without limiting itself to detecting certain control-flow constraints. Queries can be customised and cover process perspectives beyond control-flow, e.g., organisational aspects. We evaluated the performance and the capabilities of our approach with regard to several real-life event logs.

SQL Miner Functionality

Background

Process mining is the area of research that embraces the automated discovery, conformance checking and enhancement of business process models. All involved techniques are evidence-based, as the input is always event logs that comprise a collection of computer recorded information that track the executions of process instances. Declarative languages like Declare, Dynamic Condition Response (DCR) graphs or Declarative Process Intermediate Language (DPIL) can be used to represent these models, and tools like DeclareMiner, MINERful or DPILMiner offer capabilities to automatically discover such models from event logs. Existing declarative process mining approaches either suffer from performance issues with real-life event logs or limit their search space to a specific and fixed set of constaints to be able to cope with the size of real-life event logs. Both issues are highlighted in current literature. To the best of our knowledge an approach that is fast and customisable does not exist.

Solution

We fill this research gap by introducing a declarative mining approach that works on event data that is stored in relational databases by querying the log with conventional SQL. Process mining by means of SQL queries turns out to be an integrated and language overspanning solution to process discovery. By leveraging relational database performance technology, e.g., indexes on data columns it is fast without limiting itself to certain predefined constraints. Queries can be tailored to arbitrary aspects of a process, e.g., control-flow as well as organisational issues. Furthermore, the results can be transformed to each of the mentioned process modelling languages.

The Structured Query Language (SQL) is a declarative language designed for managing data held in a relational database. It is based upon relational algebra. We show the approach with an example, namely, the Response(A,B) constraint template, which aims at discovering all the activity combinations (a,b) where b is forced to be executed if a was completed at some point before. The query for discovering Response constraints is organised as follows:

SELECT  `response', A, B, [Support], [Confidence]
FROM Log l1, [ActivityCombinations] c
WHERE [Constraint Expressions Subqueries]
GROUP BY c.A, c.B
HAVING [Support] > minSupp AND [Confidence] > minConf

Other constraint templates of the Declare language, like AlternateResponse, Precedence or NotSuccession can be mapped in an analogous manner. Subsequently, the result set can be post-processed with existing pruning methods for declarative process mining, like pruning based on constraint hierarchies. Furthermore, resource assignment contraints can be discovered by means of SQL queries. Resource assignment in business processes is extensively discussed by the workflow resource patterns. These patterns capture the various ways in which resources are represented and utilised in processes. As an example, we explain the SQL query to extract role-based resource assignment constraints, i.e., that a certain activity can only be executed by resources assigned to a certain role. This constraint type is captured by the RoleBasedAllocation(A,R) template in the DPIL language and consists of parameters for activities A and roles R. Here, we assume organisational information to be available in a relational table Relations. Without loss of generality Relations has the attributes Resource, RelationType and Group. The FROM, WHERE and GROUP BY clauses of the query are then given as follows:

FROM Log l1, Relations r1, [ActivityCombinations] c
WHERE l.Activity = c.A AND r1.RelationType = 'role' 
AND l.Resource = r1.Resource AND
NOT EXISTS(SELECT *  FROM Log l2, Relation r2
           WHERE l2.Resource = r2.Resource AND r2.RelationType = 'role' 
           AND l2.Activity = l1.Activity AND NOT r2.Group = r1.Group)
GROUP BY c.A, r1.Group

Material for CAiSE Paper:

>>> Download Business Trip example event log

Declare Control-Flow Constraint Download SQL Query
Response Response.sql
Alternate Response AlternateResponse.sql
Chain Response ChainResponse.sql
Precedence Precedence.sql
AlternatePrecedence AlternatePrecedence.sql
ChainPrecedence ChainPrecedence.sql
Responded Existence RespondedExistence.sql
NotCoExistence NotCoExistence.sql
NotSuccession NotSuccession.sql
Resource Assignment Constraint Download SQL Query
Direct Allocation DirectAllocation.sql
Role-Based Allocation RoleBasedAllocation.sql
Binding of Duties BindingOfDuties.sql
Separation of Duties SeparationOfDuties.sql
Cross-Perspective Constraint Download SQL Query
Role-Based Response RoleBasedResponse.sql
Role-Based Precedence RoleBasedPrecedence.sql

Screencast