Itemset mining is one of the most essential tasks in data mining. In this paper, we focus on minimum length as a mining measure for closed itemset mining. Our task is formalized as follows: given a database and a user-specified minimum length threshold, find all closed itemsets whose length is at least that threshold. Closed itemset mining based on a minimum length threshold is preferable when it is difficult for users to determine an appropriate minimum support value. For this task, we propose TripleEye, an efficient algorithm for closed itemset mining that is based on the intersection of transactions in a database. Our algorithm uses information about inclusion relations between itemsets to avoid generating duplicate itemsets and to reduce the computational cost of intersection. During the mining procedure, this information is maintained in a novel tree structure called the Ordered Inclusion Tree. Experiments show that our algorithm dramatically reduces the computational cost compared with a naive intersection-based algorithm. On dense databases, our algorithm also runs up to twice as fast as conventional algorithms.
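To make the task definition concrete, the following is a minimal sketch of a naive intersection-based miner, the kind of baseline the abstract compares against. It is not TripleEye and does not use the Ordered Inclusion Tree; the function name `mine_closed_itemsets` and the sample data are hypothetical and chosen only for illustration. The sketch relies on the fact that every intersection of a non-empty group of transactions is a closed itemset, and that an intersection shorter than the threshold can be discarded because further intersections can only shrink it.

```python
from typing import FrozenSet, Hashable, Iterable, Set


def mine_closed_itemsets(
    transactions: Iterable[Iterable[Hashable]], min_length: int
) -> Set[FrozenSet[Hashable]]:
    """Naive intersection-based closed itemset mining (illustrative only).

    Returns every closed itemset with at least `min_length` items.
    Assumes min_length >= 1.
    """
    closed: Set[FrozenSet[Hashable]] = set()
    for raw in transactions:
        t = frozenset(raw)
        # Candidates: the transaction itself plus its intersection with
        # every closed itemset found so far. Storing results in a set
        # removes duplicates, which is the costly part a smarter
        # algorithm would try to avoid.
        candidates = {t} | {c & t for c in closed}
        # Intersections only shrink, so short candidates can be dropped.
        closed |= {c for c in candidates if len(c) >= min_length}
    return closed


if __name__ == "__main__":
    # Hypothetical toy database of four transactions.
    database = [
        {"a", "b", "c"},
        {"a", "b", "d"},
        {"a", "c", "d"},
        {"a", "b", "c", "d"},
    ]
    for itemset in sorted(mine_closed_itemsets(database, 2), key=sorted):
        print(sorted(itemset))
```

In this sketch each new transaction is intersected with every stored itemset, so the work grows with the number of closed itemsets found so far, and the same itemset may be produced many times. Those are the two costs the abstract says the inclusion information is meant to reduce.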
ASJC Scopus subject areas
- Computer Science(all)