The white paper 'Pragmatic Data Mining' (18 pages, 362K
pdf, August 2006) describes an algorithm to extract the
distinguishing features from sets of rules and group these into
new rules. The first part is an introductory overview with some
test results, the second part a mathematical description with
proofs, and the third lists a 'prototype' implementation in the
functional programming language Haskell. Note: the code listing in PDM is now outdated.
Key Terms: categorical data analysis, multivariate data analysis, rule based knowledge, data mining, pragmatism
The white paper 'Deriving Heuristic Rules from Facts' (21
pages, 210K pdf, January 2007) is a successor to 'Pragmatic Data
Mining', but it can be read separately. The first section defines
a fact model and a rule model in terms of partitions of a set.
The second section treats the reduction algorithm and proves that it
finds all possible shortest rules, so that it produces a normal
form representation of the rule model. The third section shows
how to get the partial order, if there is one, of reduced
antecedents through abduction. Entailment and overlap of
underlying sets may allow further rule simplification. The fourth
section treats additions to a fact model (empirical induction)
and shows that reduction also works when some rules are
ambiguous. The effect of new rules on a reduced normal form is
summarized in two general propositions. Through semantic
extrapolation, soundness and completeness of reduced rules are
defined. All examples use a data table from a classic paper by
J.R. Quinlan, and the reductions were obtained using the computer
program listed in the earlier paper.
Key Terms: categorical data analysis, multivariate data analysis, data mining, rule based knowledge, machine learning
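To give a rough idea of what reducing facts to shortest rules can mean, here is a minimal brute-force sketch in Python. It is not the algorithm from either paper, nor the Haskell program they list. The toy fact table, the attribute names and the function names (matches, is_consistent, shortest_rules) are all hypothetical and chosen for illustration only. The sketch assumes that a rule is acceptable when every fact matching its antecedent has the same consequent, and that an antecedent is "shortest" when no condition can be dropped from it.

```python
from itertools import combinations

# Hypothetical toy fact table (not the table used in the papers):
# each fact is a pair (attribute values, consequent).
FACTS = [
    ({"sky": "sunny", "wind": "weak"}, "play"),
    ({"sky": "sunny", "wind": "strong"}, "play"),
    ({"sky": "rainy", "wind": "weak"}, "play"),
    ({"sky": "rainy", "wind": "strong"}, "stay"),
]


def matches(antecedent, attrs):
    """True if the fact's attributes satisfy every (attribute, value) condition."""
    return all(attrs.get(name) == value for name, value in antecedent)


def is_consistent(antecedent, consequent, facts):
    """True if every fact matching the antecedent has the given consequent."""
    return all(c == consequent for attrs, c in facts if matches(antecedent, attrs))


def shortest_rules(facts):
    """Brute force: for each fact, collect the consistent antecedents
    from which no condition can be removed."""
    rules = set()
    for attrs, consequent in facts:
        items = sorted(attrs.items())
        found = []
        # Try smaller antecedents first, so any superset of an
        # already accepted antecedent can be skipped as redundant.
        for size in range(len(items) + 1):
            for combo in combinations(items, size):
                candidate = frozenset(combo)
                if any(previous <= candidate for previous in found):
                    continue
                if is_consistent(combo, consequent, facts):
                    found.append(candidate)
        rules.update((antecedent, consequent) for antecedent in found)
    return rules


if __name__ == "__main__":
    for antecedent, consequent in sorted(
        shortest_rules(FACTS), key=lambda r: (len(r[0]), sorted(r[0]))
    ):
        conditions = " and ".join(f"{a}={v}" for a, v in sorted(antecedent))
        print(f"if {conditions} then {consequent}")
```

On this toy table the four facts reduce to three rules: sky=sunny gives play, wind=weak gives play, and sky=rainy together with wind=strong gives stay. The enumeration is exponential in the number of attributes, so it only serves to make the idea concrete on small tables.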