ADAPTIVE SOURCE MODELS FOR DATA COMPRESSION
Abstract (summary)
Noiseless compression of finite sequences can be viewed as a two-step process consisting of (i) source modeling and (ii) coding. Efficient coding techniques are now known. Therefore, the major problem in noiseless data compression is one of building a source model of a given complexity, which is well matched to the sequence to be compressed.
In this dissertation, a generalized dependent source model called the Conditioned Source Model is studied. The concepts of splitting and merging of contexts (i.e. conditioning events) are introduced and some simple and useful properties of the model are derived using well known Information Theory inequalities. The problem of selection of a good set of contexts for the Conditioned Source Model is then addressed and an approach to adaptive source modeling with selective context splitting is proposed. This method is more practical with a smaller alphabet size and so a technique for alphabet reduction with statistics separation is discussed next.
Finally, a new data compression algorithm, called CRAM, is presented. CRAM uses the alphabet reduction technique, builds the source model adaptively with selective context splitting, and encodes a given sequence by means of arithmetic coding. It is shown to be effective for a variety of computer files.