Parallel and Distributed Pattern Mining

OAI: oai:igi-global.com:251898 • DOI: 10.4018/IJRSDA.2019070101

Published by: IGI Global

Abstract

Processing large data sets is difficult, and the MapReduce framework offers one solution to this problem. The framework can analyze and process vast amounts of data by distributing the computational work across a large set of machines or a cluster of virtual servers running in a cloud. Process mining provides an important bridge between data mining and business process analysis; its techniques extract information from event logs. Process mining generally involves two steps: correlation definition (or discovery), and inference (or composition). First, the authors mine small patterns from log traces using existing techniques. These patterns represent the execution of the traces recorded in the log file of a business process. Each pattern is represented by a finite state automaton or its equivalent regular expression, and the final model combines only two types of pattern, represented by the regular expressions (ab)* and (ab*c)*. Second, they compute these patterns in parallel and then combine them using the Hadoop framework. The computation has two steps: a map step, which mines patterns from execution traces, and a reduce step, which combines these small patterns. The results show that the approach is scalable, general, and precise, and that using the Hadoop framework reduces execution time.
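
The map and reduce steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it runs in plain Python rather than on Hadoop, handles only the (ab)* pattern, and assumes each event is a single character and each trace is a string. The function names (map_trace, reduce_pairs, mine_alternations) are hypothetical.

```python
import re
from collections import defaultdict
from itertools import permutations


def map_trace(trace):
    """Map step (sketch): for one trace, emit ((a, b), holds) for every
    ordered pair of distinct events, where holds says whether the trace
    projected onto {a, b} matches the regular expression (ab)*."""
    events = sorted(set(trace))
    for a, b in permutations(events, 2):
        # Keep only occurrences of a and b, preserving their order.
        projected = "".join(e for e in trace if e in (a, b))
        pattern = f"({re.escape(a)}{re.escape(b)})*"
        yield (a, b), re.fullmatch(pattern, projected) is not None


def reduce_pairs(mapped):
    """Reduce step (sketch): keep a pair only if the pattern held in
    every trace that emitted that pair."""
    result = defaultdict(lambda: True)
    for key, holds in mapped:
        result[key] = result[key] and holds
    return sorted(key for key, ok in result.items() if ok)


def mine_alternations(traces):
    """Run the map step over all traces, then reduce the emitted pairs.
    On a real cluster the map calls would run in parallel; here they
    are simply chained in one process."""
    mapped = (kv for trace in traces for kv in map_trace(trace))
    return reduce_pairs(mapped)


if __name__ == "__main__":
    print(mine_alternations(["abab", "abcab"]))
```

Because each trace is mapped independently and the reduce step only combines per-pair booleans, the map calls could be distributed across workers without coordination, which is the property the abstract attributes to the MapReduce formulation.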

IT Research & Theory • Computer Science and Information Technology • Data Mining and Databases