TY - GEN

T1 - A boyer-moore type algorithm for compressed pattern matching

AU - Shibata, Yusuke

AU - Matsumoto, Tetsuya

AU - Takeda, Masayuki

AU - Shinohara, Ayumi

AU - Arikawa, Setsuo

PY - 2000

Y1 - 2000

N2 - We apply the Boyer-Moore technique to compressed pat-tern matching for text string described in terms of collage system, which is a formal framework that captures various dictionary-based compres-sion methods. For a subclass of collage systems that contain no trun-cation, our new algorithm runs in O(‖D‖ + n m + m2 + r) time using O(‖D‖ + m2) space, where ‖D‖ is the size of dictionary D, n is the compressed text length, m is the pattern length, and r is the number of pattern occurrences. For a general collage system, the time complexity is O(height(D) (‖D‖+n)+n m+m2+r), where height(D) is the maximum dependency of tokens in D. We showed that the algorithm specialized for the so-called byte pair encoding (BPE) is very fast in practice. In fact it runs about 1:2 ~ 3:0 times faster than the exact match routine of the software package agrep, known as the fastest pattern matching tool.

AB - We apply the Boyer-Moore technique to compressed pat-tern matching for text string described in terms of collage system, which is a formal framework that captures various dictionary-based compres-sion methods. For a subclass of collage systems that contain no trun-cation, our new algorithm runs in O(‖D‖ + n m + m2 + r) time using O(‖D‖ + m2) space, where ‖D‖ is the size of dictionary D, n is the compressed text length, m is the pattern length, and r is the number of pattern occurrences. For a general collage system, the time complexity is O(height(D) (‖D‖+n)+n m+m2+r), where height(D) is the maximum dependency of tokens in D. We showed that the algorithm specialized for the so-called byte pair encoding (BPE) is very fast in practice. In fact it runs about 1:2 ~ 3:0 times faster than the exact match routine of the software package agrep, known as the fastest pattern matching tool.

UR - http://www.scopus.com/inward/record.url?scp=84937407089&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937407089&partnerID=8YFLogxK

U2 - 10.1007/3-540-45123-4_17

DO - 10.1007/3-540-45123-4_17

M3 - Conference contribution

AN - SCOPUS:84937407089

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 181

EP - 194

BT - Combinatorial Pattern Matching - 11th Annual Symposium, CPM 2000, Proceedings

A2 - Giancarlo, Raffaele

A2 - Sankoff, David

PB - Springer Verlag

T2 - 11th Annual Symposium on Combinatorial Pattern Matching, CPM 2000

Y2 - 21 June 2000 through 23 June 2000

ER -