Abstract

Compression of data is a useful way to economize on computing resources. In machine learning from examples, input data is in the form of a decision table. Generally, a decision table consists of a set of examples, a set of attributes, a decision variable, a set of values, and a function that returns a value for each pair of an example and either an attribute or the decision. Naturally, large decision tables make learning difficult. This paper presents a methodology to compress a decision table using a triple of partitions on the sets of examples, attributes, and attribute values, respectively.
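
As an illustration of the decision table structure described above, a minimal Python sketch follows. The example names, attribute names, and values are hypothetical toy data and do not come from the dissertation; the dictionary representation is only one possible encoding of the table function.

```python
# Hypothetical toy decision table; names and values are illustrative only.
examples = ["e1", "e2", "e3"]
attributes = ["temperature", "headache"]
decision = "flu"

# The table function, stored as a mapping from
# (example, attribute-or-decision) to a value.
table = {
    ("e1", "temperature"): "high",   ("e1", "headache"): "yes", ("e1", "flu"): "yes",
    ("e2", "temperature"): "normal", ("e2", "headache"): "no",  ("e2", "flu"): "no",
    ("e3", "temperature"): "high",   ("e3", "headache"): "no",  ("e3", "flu"): "yes",
}

def value(example, column):
    """Return the table entry for an example and an attribute or the decision."""
    return table[(example, column)]

# The set of values is everything the table function can return.
values = set(table.values())
```

Here `value("e2", "flu")` looks up the decision for example `e2`, and `values` collects the value set directly from the table entries.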

The compression is accomplished by grouping together examples, attributes, and attribute values into blocks such that blocks of examples and blocks of attributes are transformed into blocks of attribute values the same way their members are transformed in the original decision table. Thus the original decision table is compressed into a smaller decision table while preserving the original structure.
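
One reading of this condition can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the dissertation's algorithm: the decision column is left out, partitions are plain lists of sets, and the names `block_of` and `compress` as well as the toy table are hypothetical. The sketch checks that every entry falling into the same (example block, attribute block) cell maps into a single value block, and, if so, returns the smaller table over blocks.

```python
def block_of(partition, item):
    """Return the index of the block of `partition` that contains `item`."""
    for index, block in enumerate(partition):
        if item in block:
            return index
    raise KeyError(item)

def compress(table, example_partition, attribute_partition, value_partition):
    """Return the table over blocks, or None if the triple is inconsistent.

    The triple is treated as consistent when all entries belonging to one
    (example block, attribute block) pair land in the same value block.
    """
    compressed = {}
    for (example, attribute), val in table.items():
        key = (block_of(example_partition, example),
               block_of(attribute_partition, attribute))
        value_block = block_of(value_partition, val)
        # setdefault stores the first value block seen for this cell;
        # any later disagreement means the triple does not preserve structure.
        if compressed.setdefault(key, value_block) != value_block:
            return None
    return compressed

# Hypothetical toy table with four examples, two attributes, four values.
toy_table = {
    ("e1", "a1"): 0, ("e1", "a2"): 1,
    ("e2", "a1"): 1, ("e2", "a2"): 0,
    ("e3", "a1"): 2, ("e3", "a2"): 3,
    ("e4", "a1"): 3, ("e4", "a2"): 2,
}
example_blocks = [{"e1", "e2"}, {"e3", "e4"}]
attribute_blocks = [{"a1", "a2"}]

good = compress(toy_table, example_blocks, attribute_blocks, [{0, 1}, {2, 3}])
bad = compress(toy_table, example_blocks, attribute_blocks, [{0, 2}, {1, 3}])
```

With the first value partition, the eight original entries collapse into a two-entry table over blocks; with the second, values 0 and 1 from the same cell fall into different value blocks, so the function reports the triple as inconsistent by returning `None`.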

Theorems on the underlying algebraic structures of partition triples and of a special type of partition triple, called an MMm triple, are presented. An algorithm to identify all MMm triples of a decision table is developed. In general, the algorithm finds a large number of MMm triples. Heuristics for finding some good MMm triples are discussed. The quality of the rules induced from the compressed decision table is analyzed. Experiments are conducted on a real-life decision table from the breast cancer domain, and the rules induced from the compressed decision tables are found to be simpler than the rules induced from the original decision table.

Details

Title
Compression of input data in machine learning from examples
Author
Than, Soe
Year
1994
Publisher
ProQuest Dissertations Publishing
ISBN
979-8-208-80716-3
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
304140404
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.