Process Mining Without PM4PY





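Building a graph from a process log is easy. Analysts today have a range of specialized tools at their disposal, such as Celonis, Disco, PM4PY, and ProM, that make process analysis easier. Finding deviations in the graph and drawing the right conclusions from them is much harder.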



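What should you do if a proven tool that you particularly like is unavailable for some reason, or if you want more freedom in your calculations when working with graphs? How hard is it to write a miner yourself and implement the functions you need for working with graphs? We will do this in practice using standard Python libraries, run the calculations, and use them to answer a detailed question that could interest a process owner.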



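One caveat up front: the solution in this article is not an industrial-grade implementation. It is an attempt to start working with logs on your own, using simple code that visibly works and is therefore easy to adapt. It should not be used on big data; that would require significant rework, for example vectorized computation or a different way of collecting and aggregating information about events.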
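To make the "vectorized computation" remark concrete, here is a minimal sketch of counting directly-follows pairs (event b occurs immediately after event a in the same case) without an explicit Python loop. It assumes pandas and the three column names used later in this article; the function name `directly_follows_counts` is hypothetical and is not part of the author's code.

```python
import pandas as pd

CASE, ACT, TS = 'case:concept:name', 'concept:name', 'time:timestamp'

def directly_follows_counts(df: pd.DataFrame) -> pd.Series:
    """Count how often activity b directly follows activity a within a case."""
    ordered = df.sort_values([CASE, TS], kind='stable')
    # Next activity inside the same case; NaN for the last event of each case.
    nxt = ordered.groupby(CASE)[ACT].shift(-1)
    pairs = pd.DataFrame({'source': ordered[ACT], 'target': nxt}).dropna()
    return pairs.groupby(['source', 'target']).size()

# Tiny hand-made log, for illustration only.
log = pd.DataFrame({
    CASE: ['c1', 'c1', 'c1', 'c2', 'c2'],
    ACT: ['A', 'B', 'C', 'A', 'B'],
    TS: pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03',
                        '2020-01-01', '2020-01-05']),
})
counts = directly_follows_counts(log)
```

The result is a Series indexed by (source, target) pairs, which could replace a per-row loop when the log grows large.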



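Before building a graph, you need to perform the calculations. Calculating the graph is, in effect, the miner mentioned above. To do it, you collect what is known about the events, that is, the vertices of the graph and the connections between them, and record it, for example in a dictionary. The dictionary is filled by the calc procedure (the code is on GitHub). The filled dictionary is then passed as an argument to the graph drawing procedure (see the code at the link above). That procedure formats the data as shown below: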



digraph f {"Permit SUBMITTED by EMPLOYEE (6255)" -> "Permit APPROVED by ADMINISTRATION (4839)" [label=4829 color=black penwidth=4.723857205400346] 
"Permit SUBMITTED by EMPLOYEE (6255)" -> "Permit REJECTED by ADMINISTRATION (83)" [label=83 color=pink2 penwidth=2.9590780923760738] 
"Permit SUBMITTED by EMPLOYEE (6255)" -> "Permit REJECTED by EMPLOYEE (231)" [label=2 color=pink2 penwidth=1.3410299956639813] 
start [color=blue shape=diamond] 
end [color=blue shape=diamond]}


and passes it to the Graphviz engine for rendering.
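The formatting step can be sketched as follows. This is not the author's draw procedure: the function name `to_dot` and the input layout (one dictionary of vertex counts, one dictionary of edge counts keyed by pairs) are assumptions made for illustration. The pen width formula log10(n) + 1.04 is inferred from the sample output above, where it reproduces the three printed values; the edge colors and the start/end vertices are left out because the rule behind them is not shown.

```python
import math

def to_dot(node_counts, edge_counts, name='f'):
    """Format vertex and edge counts as a Graphviz DOT digraph string."""
    lines = []
    for (src, dst), n in sorted(edge_counts.items()):
        # Vertex labels carry the event count, as in the sample output.
        a = f'{src} ({node_counts[src]})'
        b = f'{dst} ({node_counts[dst]})'
        # Assumption: this formula matches the penwidth values in the sample.
        width = math.log10(n) + 1.04
        lines.append(f'"{a}" -> "{b}" [label={n} penwidth={width}]')
    return 'digraph ' + name + ' {' + '\n'.join(lines) + '}'

example_dot = to_dot(
    {'Permit SUBMITTED': 3, 'Permit APPROVED': 2},
    {('Permit SUBMITTED', 'Permit APPROVED'): 2},
)
```

A string like this can be written to a file and rendered with the Graphviz command line tool, for example `dot -Tpng graph.dot -o graph.png`.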



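Let's start building and examining graphs with the implemented miner. We will repeat the same steps of reading and sorting the data, computing the graph, and drawing it, as in the examples below. As an example, we take the International Declarations event log from the BPIC2020 competition. Link to the competition.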



We read the data from the log and sort it by date and time. The .xes file was converted to .xlsx beforehand.



import pandas as pd

# Read the log and keep only the case id, the event name, and the timestamp.
df_full = pd.read_excel('InternationalDeclarations.xlsx')
df_full = df_full[['id-trace','concept:name','time:timestamp']]
df_full.columns = ['case:concept:name', 'concept:name', 'time:timestamp']
df_full['time:timestamp'] = pd.to_datetime(df_full['time:timestamp'])
# Sort events chronologically within each case.
df_full = df_full.sort_values(['case:concept:name','time:timestamp'], ascending=[True,True])
df_full = df_full.reset_index(drop=True)
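The article does not show how the .xes file was converted. Since XES is plain XML, one possible route that stays within the standard library is sketched below. The function name `xes_to_rows` is hypothetical, and the sample log is hand-made; only the `concept:name` and `time:timestamp` keys are taken from the article.

```python
import xml.etree.ElementTree as ET

def local(tag):
    # Drop a namespace prefix such as '{http://www.xes-standard.org/}'.
    return tag.rsplit('}', 1)[-1]

def attrs(elem):
    # XES stores attributes as child elements carrying 'key' and 'value'.
    return {c.get('key'): c.get('value') for c in elem if c.get('key') is not None}

def xes_to_rows(xes_text):
    """Return (case id, event name, timestamp) tuples from XES XML text."""
    rows = []
    for trace in ET.fromstring(xes_text).iter():
        if local(trace.tag) != 'trace':
            continue
        case_id = attrs(trace).get('concept:name')
        for event in trace:
            if local(event.tag) == 'event':
                a = attrs(event)
                rows.append((case_id, a.get('concept:name'), a.get('time:timestamp')))
    return rows

sample = """<log xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="declaration 1"/>
    <event>
      <string key="concept:name" value="Permit SUBMITTED by EMPLOYEE"/>
      <date key="time:timestamp" value="2018-01-09T10:00:00+01:00"/>
    </event>
    <event>
      <string key="concept:name" value="Start trip"/>
      <date key="time:timestamp" value="2018-02-01T00:00:00+01:00"/>
    </event>
  </trace>
</log>"""
rows = xes_to_rows(sample)
```

The resulting tuples can be passed to `pd.DataFrame(rows, columns=[...])` with the three column names used above. For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring`, and a very large log would be better served by incremental parsing.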


Let's compute the graph.



dict_tuple_full = calc(df_full)
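The real calc lives in the author's repository and is not reproduced in this article. As a rough idea of what such a miner has to collect, here is a minimal sketch that counts vertices and directly-follows edges, with artificial start and end vertices like those in the DOT sample. The name `calc_sketch`, its input format, and its return value are assumptions, not the author's interface.

```python
from collections import Counter
from itertools import groupby

def calc_sketch(rows):
    """Count vertices and directly-follows edges.

    rows: (case_id, event_name) pairs, already sorted by case and then by time.
    Returns (node_counts, edge_counts).
    """
    node_counts, edge_counts = Counter(), Counter()
    for _, events in groupby(rows, key=lambda r: r[0]):
        trace = [name for _, name in events]
        node_counts.update(trace)
        # Pad each trace with artificial start and end vertices.
        path = ['start'] + trace + ['end']
        # Every pair of neighbours in the path is one edge occurrence.
        edge_counts.update(zip(path, path[1:]))
    return node_counts, edge_counts

demo_rows = [('c1', 'A'), ('c1', 'B'), ('c2', 'A'), ('c2', 'B'), ('c2', 'B')]
nodes, edges = calc_sketch(demo_rows)
```

With the DataFrame from the previous step, the input could be produced by `df_full[['case:concept:name', 'concept:name']].itertuples(index=False, name=None)`.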


Let's draw the graph.



draw(dict_tuple_full,'InternationalDeclarations_full')


After these steps, we get the process graph:







Because the resulting graph is unreadable, let's simplify it.



There are several ways to improve readability or simplify a graph:



  1. filter by the weight of vertices or edges;
  2. remove noise;
  3. group events by name similarity.
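The article goes on to use the third approach. For comparison, the first one, filtering by edge weight, can be sketched in a few lines. The edge dictionary layout (pairs of event names mapped to counts) and the names `filter_edges` and `remaining_vertices` are assumptions for illustration; the structure of the author's dictionary is not shown here.

```python
def filter_edges(edge_counts, min_count):
    """Keep only edges that occur at least min_count times."""
    return {edge: n for edge, n in edge_counts.items() if n >= min_count}

def remaining_vertices(edge_counts):
    """Vertices that still take part in at least one edge."""
    return {v for edge in edge_counts for v in edge}

# Hand-made weights, for illustration only.
demo_edges = {('A', 'B'): 4829, ('A', 'C'): 83, ('A', 'D'): 2}
kept = filter_edges(demo_edges, min_count=50)
```

The threshold trades detail for readability: a high value hides rare paths, which are often exactly the deviations an analyst is looking for.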


Let's use approach 3.



Let's create a dictionary for merging events:



_dict = {'Permit SUBMITTED by EMPLOYEE': 'Permit SUBMITTED',
 'Permit APPROVED by ADMINISTRATION': 'Permit APPROVED',
 'Permit APPROVED by BUDGET OWNER': 'Permit APPROVED',
 'Permit APPROVED by PRE_APPROVER': 'Permit APPROVED',
 'Permit APPROVED by SUPERVISOR': 'Permit APPROVED',
 'Permit FINAL_APPROVED by DIRECTOR': 'Permit FINAL_APPROVED',
 'Permit FINAL_APPROVED by SUPERVISOR': 'Permit FINAL_APPROVED',
 'Start trip': 'Start trip',
 'End trip': 'End trip',
 'Permit REJECTED by ADMINISTRATION': 'Permit REJECTED',
 'Permit REJECTED by BUDGET OWNER': 'Permit REJECTED',
 'Permit REJECTED by DIRECTOR': 'Permit REJECTED',
 'Permit REJECTED by EMPLOYEE': 'Permit REJECTED',
 'Permit REJECTED by MISSING': 'Permit REJECTED',
 'Permit REJECTED by PRE_APPROVER': 'Permit REJECTED',
 'Permit REJECTED by SUPERVISOR': 'Permit REJECTED',
 'Declaration SUBMITTED by EMPLOYEE': 'Declaration SUBMITTED',
 'Declaration SAVED by EMPLOYEE': 'Declaration SAVED',
 'Declaration APPROVED by ADMINISTRATION': 'Declaration APPROVED',
 'Declaration APPROVED by BUDGET OWNER': 'Declaration APPROVED',
 'Declaration APPROVED by PRE_APPROVER': 'Declaration APPROVED',
 'Declaration APPROVED by SUPERVISOR': 'Declaration APPROVED',
 'Declaration FINAL_APPROVED by DIRECTOR': 'Declaration FINAL_APPROVED',
 'Declaration FINAL_APPROVED by SUPERVISOR': 'Declaration FINAL_APPROVED',
 'Declaration REJECTED by ADMINISTRATION': 'Declaration REJECTED',
 'Declaration REJECTED by BUDGET OWNER': 'Declaration REJECTED',
 'Declaration REJECTED by DIRECTOR': 'Declaration REJECTED',
 'Declaration REJECTED by EMPLOYEE': 'Declaration REJECTED',
 'Declaration REJECTED by MISSING': 'Declaration REJECTED',
 'Declaration REJECTED by PRE_APPROVER': 'Declaration REJECTED',
 'Declaration REJECTED by SUPERVISOR': 'Declaration REJECTED',
 'Request Payment': 'Request Payment',
 'Payment Handled': 'Payment Handled',
 'Send Reminder': 'Send Reminder'}
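Every entry in this dictionary follows the same pattern: the merged name is the original name with the " by ROLE" suffix removed. Assuming that pattern holds for all event names in the log, the mapping could also be generated rather than typed by hand. The helper name `merge_name` is hypothetical.

```python
def merge_name(event_name):
    """Drop the ' by ROLE' suffix; names without it are returned unchanged."""
    return event_name.split(' by ', 1)[0]

# A few names taken from the dictionary above.
names = [
    'Permit APPROVED by BUDGET OWNER',
    'Declaration FINAL_APPROVED by DIRECTOR',
    'Declaration REJECTED by MISSING',
    'Start trip',
    'Request Payment',
]
auto_dict = {name: merge_name(name) for name in names}
```

Building the mapping from `df_full['concept:name'].unique()` has a side benefit: pandas `Series.map` with a dictionary turns any name missing from the dictionary into NaN, and a generated mapping covers every name that actually occurs.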


Let's group the events and draw the process graph again.



df_full_gr = df_full.copy()
# Replace each event name with its merged group name.
df_full_gr['concept:name'] = df_full_gr['concept:name'].map(_dict)
dict_tuple_full_gr = calc(df_full_gr)
draw(dict_tuple_full_gr,'InternationalDeclarations_full_gr')




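After grouping events by name similarity, the graph is more readable. Let's try to find answers to the questions. Link to the list of questions. For example, how many declarations were not preceded by a pre-approved permit?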



To answer this question, let's filter the graph by the events of interest and draw the process graph again.



df_full_gr_f = df_full_gr[df_full_gr['concept:name'].isin(['Permit SUBMITTED',
                                                            'Permit APPROVED',
                                                            'Permit FINAL_APPROVED',
                                                            'Declaration FINAL_APPROVED',
                                                            'Declaration APPROVED'])]
df_full_gr_f = df_full_gr_f.reset_index(drop=True)
dict_tuple_full_gr_f = calc(df_full_gr_f)
draw(dict_tuple_full_gr_f,'InternationalDeclarations_full_gr_isin')






With the resulting graph, the question is easy to answer: 116 and 312 declarations were not preceded by a pre-approved permit.



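You can also "drill down" into edges 116 and 312 (filter by 'case:concept:name' to keep only the cases that take part in the edge of interest) and make sure that the resulting graphs contain no permit-related events.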
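The drill-down code that follows relies on a dictionary named d_case_start2, which is not defined in this article. One plausible reading, and it is only an assumption, is that it maps an event name to the case ids whose first event in the filtered log is that event. Under that assumption, such a mapping could be built as sketched here; the name `cases_by_first_event` is hypothetical.

```python
from collections import defaultdict

def cases_by_first_event(rows):
    """Map each event name to the set of cases that begin with it.

    rows: (case_id, event_name) pairs, sorted by time within each case.
    """
    first = {}
    for case_id, name in rows:
        # setdefault keeps the first event seen for each case.
        first.setdefault(case_id, name)
    result = defaultdict(set)
    for case_id, name in first.items():
        result[name].add(case_id)
    return dict(result)

demo_rows = [
    ('c1', 'Permit SUBMITTED'), ('c1', 'Declaration APPROVED'),
    ('c2', 'Declaration APPROVED'),
    ('c3', 'Declaration FINAL_APPROVED'),
]
starts = cases_by_first_event(demo_rows)
```

A set of case ids obtained this way can be passed straight to `.isin(...)` to keep only the cases behind one edge.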



Let's drill down into edge 116:



# Note: d_case_start2 is not defined in this article.
df_116 = df_full_gr_f[df_full_gr_f['case:concept:name'].isin(d_case_start2['Declaration FINAL_APPROVED'])]
df_116 = df_116.reset_index(drop=True)
dict_tuple_116 = calc(df_116)
draw(dict_tuple_116,'InternationalDeclarations_full_gr_isin_116')






Let's drill down into edge 312:



df_312 = df_full_gr_f[df_full_gr_f['case:concept:name'].isin(d_case_start2['Declaration APPROVED'])]
df_312 = df_312.reset_index(drop=True)
dict_tuple_312 = calc(df_312)
draw(dict_tuple_312,'InternationalDeclarations_full_gr_isin_312')






Since the resulting graphs contain no permit-related events, the answers 116 and 312 are confirmed.



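As you can see, writing a miner and implementing the functions needed to work with the graph is a task that Python's built-in functions, with Graphviz as the graph engine, handle successfully.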


