Skip to main content

Approximating a Collection of Frequent Sets

Foto Afrati
National Technical University of Athens, Greece

One of most well-studied problems in data mining is computing the collection of frequent item sets in large transactional databases. One obstacle in the applicability of frequent set mining, often mentioned by data miners, is that the size of the output collection can be far too large to be carefully examined and understood by the users. Even restricting the output to the border of the frequent item-set collection does not help much to alleviate the problem. We address the issue of overwhelmingly large output size by introducing and studying the following problem: What are the k sets that best approximate a collection of frequent item sets? Our measure of approximating a set collection by k sets, is defined to be the size of the collection covered by the span of the k sets. Depending on the input representation, we examine different problem variants for which we demonstrate the hardness of the corresponding problems and we provide polynomial-time approximation algorithms. We also give empirical evidence showing that the approximation methods work well in practice. This is joint work with Aris Gionis and Heikki Mannila (University of Helsinki)