The goal of pattern mining is to discover patterns in data. Many techniques have been proposed for this task, differing in the type of patterns they find. To ensure that only patterns of interest are found, a common approach is to impose constraints on the patterns. Constraint-based mining systems exist in which multiple constraints can be specified. However, combining constraints in new ways or adding complex constraints requires changing the underlying algorithms. A truly general approach to constraint-based pattern mining has been missing.
In this thesis we propose a general, declarative approach to pattern mining based on constraint programming. In a declarative approach one specifies what patterns need to be found, instead of algorithmically specifying how they must be found. Constraint programming offers a methodology in which a problem is stated in terms of constraints and a generic solver finds the solutions.
A first contribution of this thesis is that we show how constraint programming can be used to solve constraint-based, closed and discriminative itemset mining problems as well as combinations thereof. A second contribution is that we demonstrate how the difference in performance between general constraint solvers and specialised mining algorithms can be reduced. A third contribution is the introduction of the k-pattern set mining problem, which involves finding a set of k patterns that together satisfy constraints. We propose a high-level declarative language for k-pattern set mining as well as a transformation of this language to constraint programming. Finally we apply our declarative pattern mining framework on a challenging problem in bioinformatics, namely cis- regulatory module detection. For this application, the ability to add domain- specific constraints and to combine them with existing constraints is essential.
Hence we investigate, for the first time, how constraint programming can be used in pattern mining. We conclude on this promising approach with several remaining challenges.