TY - JOUR
T1 - Collage system
T2 - A unifying framework for compressed pattern matching
AU - Kida, Takuya
AU - Matsumoto, Tetsuya
AU - Shibata, Yusuke
AU - Takeda, Masayuki
AU - Shinohara, Ayumi
AU - Arikawa, Setsuo
N1 - Funding Information:
Corresponding author. E-mail addresses: kida@i.kyushu-u.ac.jp (T. Kida), t-matsu@i.kyushu-u.ac.jp (T. Matsumoto), yusuke@i.kyushu-u.ac.jp (Y. Shibata), takeda@i.kyushu-u.ac.jp (M. Takeda), ayumi@i.kyushu-u.ac.jp (A. Shinohara), arikawa@i.kyushu-u.ac.jp (S. Arikawa). Research Fellow of the Japan Society for the Promotion of Science (JSPS). Partly supported by Grant-in-Aid for JSPS research fellows (12000410).
PY - 2003/4/4
Y1 - 2003/4/4
N2 - We introduce a general framework which is suitable to capture the essence of compressed pattern matching according to various dictionary-based compressions. It is a formal system to represent a string by a pair of dictionary D and sequence S of phrases in D. The basic operations are concatenation, truncation, and repetition. We also propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), RE-PAIR, SEQUITUR, and the static dictionary-based method. The proposed algorithm runs in O((||D||+|S|)·height(D)+m^2+r) time with O(||D||+m^2) space, where ||D|| is the size of D, |S| is the number of tokens in S, height(D) is the maximum dependency of tokens in D, m is the pattern length, and r is the number of pattern occurrences. For a subclass of the framework that contains no truncation, the time complexity is O(||D||+|S|+m^2+r).
AB - We introduce a general framework which is suitable to capture the essence of compressed pattern matching according to various dictionary-based compressions. It is a formal system to represent a string by a pair of dictionary D and sequence S of phrases in D. The basic operations are concatenation, truncation, and repetition. We also propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), RE-PAIR, SEQUITUR, and the static dictionary-based method. The proposed algorithm runs in O((||D||+|S|)·height(D)+m^2+r) time with O(||D||+m^2) space, where ||D|| is the size of D, |S| is the number of tokens in S, height(D) is the maximum dependency of tokens in D, m is the pattern length, and r is the number of pattern occurrences. For a subclass of the framework that contains no truncation, the time complexity is O(||D||+|S|+m^2+r).
KW - Collage system
KW - Compressed pattern matching
KW - Data compression
KW - String matching
UR - http://www.scopus.com/inward/record.url?scp=0037418753&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0037418753&partnerID=8YFLogxK
U2 - 10.1016/S0304-3975(02)00426-7
DO - 10.1016/S0304-3975(02)00426-7
M3 - Article
AN - SCOPUS:0037418753
VL - 298
SP - 253
EP - 272
JO - Theoretical Computer Science
JF - Theoretical Computer Science
SN - 0304-3975
IS - 1
ER -