Enhanced Word-Based Block-Sorting Text Compression

Isal, R.Y.K., Moffat, A. and Ngai, A.C.H.

    The Block Sorting process of Burrows and Wheeler can be applied to any sequence in which symbols are (or might be) conditioned upon each other. In particular, it is possible to parse text into a stream of words, and then employ block sorting to identify and so exploit any conditioning relationships between words. In this paper we build upon the previous work of two of the authors, describing several further recency rank transformations, and considering also the role of the entropy coder. By combining the best of the new recency transformations with an entropy coder that conditions ranks upon gross characteristics of previous ones, we are able to obtain improved compression on typical text files.
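The pipeline outlined in the abstract (parse text into words, block sort the word stream, then apply a recency rank transformation) might be sketched as follows. This is a minimal illustration and not the authors' implementation. The function names `parse_words`, `block_sort`, and `move_to_front` are hypothetical, the rotation sort is the naive quadratic version, and plain move-to-front stands in for the further recency rank transformations the paper describes. The entropy coding stage is omitted.

```python
import re


def parse_words(text):
    # Assumption: split into alternating word and non-word tokens,
    # so that joining the tokens reproduces the input exactly.
    return re.findall(r"\w+|\W+", text)


def block_sort(tokens):
    # Naive Burrows-Wheeler transform over a token sequence:
    # sort every rotation and keep the last token of each one.
    n = len(tokens)
    order = sorted(range(n), key=lambda i: tokens[i:] + tokens[:i])
    last_column = [tokens[(i - 1) % n] for i in order]
    primary_index = order.index(0)  # row holding the original sequence
    return last_column, primary_index


def move_to_front(symbols):
    # Baseline recency rank transformation: emit each symbol's position
    # in a recency list, then move that symbol to the front.
    # Assumption: an unseen symbol is given rank len(recency); a real
    # coder would also have to transmit the new word itself.
    recency = []
    ranks = []
    for symbol in symbols:
        if symbol in recency:
            rank = recency.index(symbol)
            recency.pop(rank)
        else:
            rank = len(recency)
        ranks.append(rank)
        recency.insert(0, symbol)
    return ranks


if __name__ == "__main__":
    tokens = parse_words("to be or not to be")
    last_column, primary_index = block_sort(tokens)
    print(move_to_front(last_column), primary_index)
```

Words that tend to follow the same context end up adjacent in the block-sorted output, so the rank stream is dominated by small values, which is what a subsequent entropy coder exploits.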
Cite as: Isal, R.Y.K., Moffat, A. and Ngai, A.C.H. (2002). Enhanced Word-Based Block-Sorting Text Compression. In Proc. Twenty-Fifth Australasian Computer Science Conference (ACSC2002), Melbourne, Australia. CRPIT, 4. Oudshoorn, M. J., Ed. ACS. 129-137.