Discovering best variable-length-don’t-care patterns

Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa

Research output: Chapter in Book/Report/Conference proceedingConference contribution

11 Citations (Scopus)

Abstract

A variable-length-don’t-care pattern (VLDC pattern) is an element of set Π = (Σ∪{∗}), where Σ is an alphabet and ∗ is a wildcard matching any string in Σ. Given two sets of strings, we consider the problem of finding the VLDC pattern that is the most common to one, and the least common to the other. We present a practical algorithm to find such best VLDC patterns exactly, powerfully sped up by pruning heuristics. We introduce two versions of our algorithm: one employs a pattern matching machine (PMM) whereas the other does an index structure called the Wildcard Directed Acyclic Word Graph (WDAWG). In addition, we consider a more generalized problem of finding the best pair ‹q, k›, where k is the window size that specifies the length of an occurrence of the VLDC pattern q matching a string w. We present three algorithms solving this problem with pruning heuristics, using the dynamic programming (DP), PMMs and WDAWGs, respectively. Although the two problems are NP-hard, we experimentally show that our algorithms run remarkably fast.

Original languageEnglish
Title of host publicationLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PublisherSpringer Verlag
Pages86-97
Number of pages12
Volume2534
ISBN (Print)3540001883, 9783540001881
Publication statusPublished - 2002
Externally publishedYes
Event5th International Conference on Discovery Science, DS 2002 - Lubeck, Germany
Duration: 2002 Nov 242002 Nov 26

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume2534
ISSN (Print)03029743
ISSN (Electronic)16113349

Other

Other5th International Conference on Discovery Science, DS 2002
CountryGermany
CityLubeck
Period02/11/2402/11/26

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Fingerprint Dive into the research topics of 'Discovering best variable-length-don’t-care patterns'. Together they form a unique fingerprint.

Cite this