Generalization of Efficient Implementation of Compression by Substring Enumeration

Shumpei Sakuma, Kazuyuki Narisawa, Ayumi Shinohara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)


Compression via Substring Enumeration (CSE) is a lossless universal data compression scheme introduced by Dube and Beaudoin [1]. CSE compresses a target binary string by enumerating the substrings that occur in it and encoding their numbers of occurrences effectively: for each count, it calculates an upper bound and a lower bound from the previously encoded counts. Dube and Beaudoin used a data structure called the Compacted Substring Tree (CST) to count the occurrences. Instead of the CST, Kanai et al. [2] proposed an elegant and efficient implementation of CSE that uses the Burrows-Wheeler Transform (BWT) matrix and several auxiliary arrays. In this paper, we extend that implementation in two ways: (1) to handle explicit phase awareness for byte-oriented sources, and (2) to handle multiple characters for finite-alphabet sources.
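The bounding step described in the abstract can be illustrated with a minimal sketch. It is not the CST or BWT-based implementation from the cited papers: it counts occurrences by brute force, and it assumes the usual CSE setting in which the binary input is read as a circular string. Under that assumption, for a core substring w, the count of 0w0 lies between max(0, C(0w) - C(w1)) and min(C(0w), C(w0)). The function names `circular_count` and `bounds_for_0w0` are hypothetical and chosen only for this illustration.

```python
def circular_count(s: str, w: str) -> int:
    """Count occurrences of w in s, reading s as a circular string.

    Brute force, O(len(s) * len(w)); a stand-in for the CST or
    BWT-based counting structures, not a reproduction of them.
    """
    n = len(s)
    return sum(
        all(s[(i + j) % n] == ch for j, ch in enumerate(w))
        for i in range(n)
    )


def bounds_for_0w0(s: str, w: str) -> tuple[int, int]:
    """Return (lower, upper) bounds on the count of '0' + w + '0'.

    Assumes circular counts, so that C(0w) = C(0w0) + C(0w1) and
    C(w1) = C(0w1) + C(1w1). Only counts of shorter substrings are used.
    """
    c_0w = circular_count(s, "0" + w)
    c_w0 = circular_count(s, w + "0")
    c_w1 = circular_count(s, w + "1")
    lower = max(0, c_0w - c_w1)  # C(0w1) cannot exceed C(w1)
    upper = min(c_0w, c_w0)      # 0w0 contains both 0w and w0
    return lower, upper


if __name__ == "__main__":
    text = "0001"
    lo, hi = bounds_for_0w0(text, "")
    actual = circular_count(text, "00")
    print(lo, actual, hi)
```

Because the decoder can derive the same bounds from counts it already knows, an encoder only needs to signal where the true count falls within the interval; when the two bounds coincide, nothing needs to be transmitted for that count.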

Original language: English
Title of host publication: Proceedings - DCC 2016
Subtitle of host publication: 2016 Data Compression Conference
Editors: Michael W. Marcellin, Ali Bilgin, Joan Serra-Sagrista, James A. Storer
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 1
ISBN (Electronic): 9781509018536
Publication status: Published - 2016 Dec 15
Event: 2016 Data Compression Conference, DCC 2016 - Snowbird, United States
Duration: 2016 Mar 29 → 2016 Apr 1

Publication series

Name: Data Compression Conference Proceedings
ISSN (Print): 1068-0314


Other: 2016 Data Compression Conference, DCC 2016
Country/Territory: United States

ASJC Scopus subject areas

  • Computer Networks and Communications

