Efficient computation of substring equivalence classes with suffix arrays

Kazuyuki Narisawa, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Research output: Conference contribution

14 Citations (Scopus)

Abstract

This paper considers enumeration of substring equivalence classes introduced by Blumer et al. [1]. They used the equivalence classes to define an index structure called compact directed acyclic word graphs (CDAWGs). In text analysis, considering these equivalence classes is useful since they group together redundant substrings with essentially identical occurrences. In this paper, we present how to enumerate those equivalence classes using suffix arrays. Our algorithm uses rank and lcp arrays for traversing the corresponding suffix trees, but does not need any other additional data structure. The algorithm runs in linear time in the length of the input string. We show experimental results comparing the running times and space consumptions of our algorithm, suffix tree and CDAWG based approaches.
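As a rough illustration of the data structures named in the abstract, the following Python sketch builds a suffix array, its rank (inverse) array and an lcp array, then uses a stack to enumerate the lcp-intervals, which correspond to the internal nodes of the suffix tree. It is not the authors' algorithm: the function names are hypothetical, the suffix array is built by naive sorting rather than in linear time, the lcp array uses the well-known method of Kasai et al., and the sketch stops at suffix tree nodes without grouping them into the equivalence classes of Blumer et al.

```python
def suffix_array(s):
    # Naive construction by sorting suffix start positions (not linear time;
    # chosen here only for brevity).
    return sorted(range(len(s)), key=lambda i: s[i:])


def rank_array(sa):
    # rank[i] = position of the suffix starting at i within the suffix array.
    rank = [0] * len(sa)
    for r, i in enumerate(sa):
        rank[i] = r
    return rank


def lcp_array(s, sa, rank):
    # Kasai et al.: lcp[r] is the length of the longest common prefix of the
    # suffixes sa[r - 1] and sa[r]; lcp[0] is defined as 0.
    n = len(s)
    lcp = [0] * n
    h = 0
    for i in range(n):
        r = rank[i]
        if r == 0:
            h = 0
            continue
        j = sa[r - 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[r] = h
        if h > 0:
            h -= 1
    return lcp


def lcp_intervals(lcp):
    # Enumerate (depth, left, right) triples: suffix array ranges [left, right]
    # whose suffixes share a common prefix of length depth. These ranges
    # correspond to the internal nodes of the suffix tree (root included).
    n = len(lcp)
    out = []
    stack = [(0, 0)]  # (depth, left boundary); the root interval starts at 0
    for r in range(1, n + 1):
        cur = lcp[r] if r < n else -1  # -1 flushes the stack at the end
        left = r - 1
        while stack and stack[-1][0] > cur:
            depth, left = stack.pop()
            out.append((depth, left, r - 1))
        if r < n and stack[-1][0] < cur:
            stack.append((cur, left))
    return out


def repeated_substrings(s):
    # Map each non-root interval to the substring it represents and the
    # sorted list of its occurrence positions in s.
    sa = suffix_array(s)
    lcp = lcp_array(s, sa, rank_array(sa))
    return {
        s[sa[l]:sa[l] + d]: sorted(sa[l:r + 1])
        for d, l, r in lcp_intervals(lcp)
        if d > 0
    }
```

For the input "banana", repeated_substrings returns the three substrings "a", "ana" and "na" together with their occurrence positions. Only the arrays and one stack are kept in memory, which is the kind of traversal the abstract refers to when it says no additional data structure is needed.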

Original language: English
Title of host publication: Combinatorial Pattern Matching - 18th Annual Symposium, CPM 2007, Proceedings
Publisher: Springer Verlag
Pages: 340-351
Number of pages: 12
ISBN (Print): 9783540734369
DOI
Publication status: Published - 2007
Event: 18th Annual Symposium on Combinatorial Pattern Matching, CPM 2007 - London, ON, Canada
Duration: 9 Jul 2007 to 11 Jul 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4580 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 18th Annual Symposium on Combinatorial Pattern Matching, CPM 2007
Country: Canada
City: London, ON
Period: 9 Jul 2007 to 11 Jul 2007

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
