Abstract

This dissertation enhances data mining processes by formalizing them in a logic framework. It focuses on improving the efficiency of association rule mining and on extending the use of association rules to prediction, both based on the proposed framework.

Although data mining has been studied extensively, most of the work concentrates on specific application domains. A logic framework that formally represents the important notions and processes in data mining has attracted little attention. SPICE (Symbolic integration of Probability Inference and Concept Extraction) is therefore proposed. In SPICE, the logic representations of concepts, patterns, and previously unknown and potentially interesting patterns are formalized. Two primary data mining tasks, association rule mining and classification, are formally represented as pattern discovery processes in SPICE.

Based on the SPICE framework, a new special type of pattern, the Maximal Potentially Useful (MaxPUF) pattern, is formalized. MaxPUF patterns lead to a new class of association rules, called MaxPUF rules. These rules are characterized by having the minimum antecedent among all the high-confidence rules for the same consequent. At the same time, this minimum antecedent includes the most important factors that imply the consequent with high confidence. Thus, MaxPUF rules are interesting and potentially useful to the user. Mining MaxPUF rules provides a solution to the rule redundancy problem in association rule mining, which occurs when a large number of rules are generated and many of them are uninteresting or unimportant.
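
The idea of keeping only the minimum antecedent among high-confidence rules for one consequent can be illustrated with a small Python sketch. This is not the dissertation's formal MaxPUF definition, which the abstract does not give. It assumes the standard confidence measure (support of antecedent and consequent together, divided by support of the antecedent) and reads "minimum" as subset-minimal. The toy transactions, the 0.9 threshold, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy transactions, for illustration only.
transactions = [
    {"bread", "milk", "butter"},
    {"bread", "milk", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "milk", "butter", "jam"},
    {"bread"},
    {"milk"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """Standard rule confidence: supp(A and C) / supp(A)."""
    denom = support_count(antecedent, transactions)
    if denom == 0:
        return 0.0
    return support_count(antecedent | consequent, transactions) / denom


def minimal_high_confidence_antecedents(consequent, transactions, min_conf):
    """Subset-minimal antecedents A with confidence(A -> consequent) >= min_conf.

    Assumption: "minimum antecedent" is read as subset-minimal. Candidates
    are enumerated by increasing size, so any superset of an already
    accepted antecedent is redundant and skipped.
    """
    items = sorted(set().union(*transactions) - consequent)
    minimal = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            antecedent = frozenset(combo)
            if any(m <= antecedent for m in minimal):
                continue  # a smaller antecedent already qualifies
            if confidence(antecedent, consequent, transactions) >= min_conf:
                minimal.append(antecedent)
    return minimal


result = minimal_high_confidence_antecedents(
    frozenset({"butter"}), transactions, min_conf=0.9
)
print(result)
```

On this toy data neither bread nor milk alone reaches the threshold for the consequent butter, but the pair does, so the larger rule with antecedent bread, milk, and jam is treated as redundant. The exhaustive enumeration is exponential in the number of items and is only meant to make the notion of redundancy concrete.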

A new mining approach called Succinct Worthy Association Rule Mining (SWARM) is proposed to improve mining efficiency. Unlike previous mining approaches, which prune only the infrequent itemsets, SWARM adopts a new pruning strategy that deletes less important items during the mining process. Because far fewer itemset candidates are generated once these items have been deleted, SWARM is more efficient than previous approaches. In SWARM, the MaxPUF rules are used to help identify less important items.

In addition, the use of association rules for prediction is studied, and a new prediction rule model is proposed. The experimental results show that the discovered prediction rules can be used for prediction with good results.

Overall, this dissertation introduces a logic framework for data mining and develops methodologies based on the proposed framework to enhance data mining.

Details

Title
New data mining models based on formal concept analysis and probability logic
Author
Jiang, Liying
Year
2006
Publisher
ProQuest Dissertations Publishing
ISBN
978-0-542-65436-7
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
305293758
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.