Mining Productive Itemsets in Dynamic Databases

Li, Xiang; Li, Jiaxuan; Fournier-Viger, Philippe; Nawaz, M. Saqib; Yao, Jie; Lin, Jerry Chun-Wei

dc.contributor.author	Li, Xiang
dc.contributor.author	Li, Jiaxuan
dc.contributor.author	Fournier-Viger, Philippe
dc.contributor.author	Nawaz, M. Saqib
dc.contributor.author	Yao, Jie
dc.contributor.author	Lin, Jerry Chun-Wei
dc.date.accessioned	2021-03-15T08:27:36Z
dc.date.available	2021-03-15T08:27:36Z
dc.date.created	2021-01-14T13:41:48Z
dc.date.issued	2020
dc.identifier.citation	Li, X., Li, J., Fournier-Viger, P., Nawaz, M. S., Yao, J., & Lin, J. C.-W. (2020). Mining Productive Itemsets in Dynamic Databases. IEEE Access, 8, 140122-140144.	en_US
dc.identifier.issn	2169-3536
dc.identifier.uri	https://hdl.handle.net/11250/2733292
dc.description.abstract	Discovering frequent itemsets is a data analysis task used in numerous domains. It consists of finding sets of items (itemsets) that frequently appear in a set of database records (also called transactions). Though discovering frequent itemsets is useful, it can produce a large number of spurious patterns. As a result, the user may spend a great deal of time analyzing the itemsets found by a frequent itemset mining algorithm to find truly interesting patterns. Hence, discovering statistically significant patterns in databases has emerged as a key research topic in recent years. The most popular model for identifying statistically significant itemsets is to discover non-redundant productive itemsets. The state-of-the-art algorithm to extract this set of patterns is OPUS-Miner. A key drawback of that algorithm is that it is designed to be applied to a static database. A second drawback of OPUS-Miner is that it discovers all patterns in a database. In other words, the user cannot search for itemsets containing some specific items. This paper addresses these issues by defining the novel problem of discovering targeted non-redundant productive itemsets in dynamic databases. An algorithm named IDPI+ (Interactive Discovery of Productive Itemsets) is presented. It stores transactions in a tree structure, which can then be interactively queried to identify productive and non-redundant itemsets containing specific items. A structure named Query-Tree is also introduced to process many queries at the same time. Moreover, to handle dynamic databases, efficient transaction insertion and deletion algorithms are provided to update the tree. An experimental evaluation on benchmark datasets containing various types of data showed that IDPI+ can handle thousands of queries per second on a desktop computer. Moreover, IDPI+ was found to be more than an order of magnitude faster than a baseline algorithm.	en_US
dc.language.iso	eng	en_US
dc.publisher	IEEE	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	Mining Productive Itemsets in Dynamic Databases	en_US
dc.type	Journal article	en_US
dc.type	Peer reviewed	en_US
dc.description.version	publishedVersion	en_US
dc.source.pagenumber	140122 - 140144	en_US
dc.source.volume	8	en_US
dc.source.journal	IEEE Access	en_US
dc.identifier.doi	10.1109/ACCESS.2020.3012817
dc.identifier.cristin	1871369
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	1

Associated file(s)

File name: Li.pdf
Size: 1.645Mb
Format: PDF

This item appears in the following collection(s)

Import from CRIStin [3604]
Institutt for datateknologi, elektroteknologi og realfag [1163]

Unless otherwise stated, this item is licensed under Attribution 4.0 International