Visualization of generic utility of sequential patterns
Peer reviewed, Journal article
MetadataShow full item record
Original versionWiktorski, T., Krolak, A., Rosinska, K., Strumillo, P., & Lin, J. C.-W. (2020). Visualization of Generic Utility of Sequential Patterns. IEEE Access, 8, 78004-78014. 10.1109/ACCESS.2020.2989165
Most of the literature on utility pattern mining (UPM) assumes that the particular patterns' utility in known in advance. Concurrently, in frequent pattern mining (FPM) it is assumed that all patterns take the same value. In reality, the information about the utility of patterns is not or hardly available in most cases. Moreover, the utility and frequency of the particular pattern are not directly proportional. An algorithm for estimating a generic pattern utility has been recently proposed, but the numeric results might be difficult to interpret. In particular, in datasets with many independent instances or groups. In this paper, we present an approach to generating utility bitmaps that provide visual representation of the numeric data obtained using generic pattern utility algorithm. We demonstrate validity of this approach on two datasets: PAMAP2 Physical Activity Monitoring Data Set, an open dataset from the UCI Machine Learning Repository, and an ECG dataset collected using Biopac Student Lab during Ruffier's test. For PAMAP2 dataset, utility bitmaps allow for immediate separation of various physical activities. Variation between participants are present, but do not overshadow differences between the activity types. For the ECG dataset, utility bitmaps immediately indicate age and fitness differences between the participants, even thought this information was not available to the algorithm. In both cases, partial similarity in bitmaps can be traced back to partial similarity in activities or participants generating the data. Based on these tests, the approach seems to be promising for exploratory analysis of large collections of long time series and possibly other sequential patterns such as distance series common in sports data analysis and depth series common in petroleum engineering.