Unsupervised spam detection based on string alienness measures

Kazuyuki Narisawa, Hideo Bannai, Kohei Hatano, Masayuki Takeda

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

13 Citations (Scopus)

Abstract

We propose an unsupervised method for detecting spam documents in a given set of documents, based on equivalence relations on strings. We give three measures for quantifying the alienness (i.e., how different they are from the others) of substrings within the documents. A document is then classified as spam if it contains a substring belonging to an equivalence class with a high degree of alienness. The proposed method is unsupervised, language-independent, and scalable. Computational experiments conducted on data collected from Japanese web forums show that the method successfully discovers spam.
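The classification rule in the abstract (group substrings into equivalence classes, score each class by alienness, flag documents that contain a high-scoring class) can be illustrated with a small sketch. The abstract does not define the equivalence relation or the three measures, so everything below is an assumption for illustration only: substrings are grouped by the exact set of documents they occur in, the `alienness` score is a hypothetical stand-in (not one of the paper's measures), and the names `substring_classes`, `alienness`, `detect_spam`, `min_len`, and `threshold` are invented here.

```python
from collections import defaultdict


def substring_classes(docs, min_len=3, max_len=None):
    """Group substrings by the exact set of documents they occur in.

    Simplified stand-in (an assumption) for an equivalence relation on
    strings: two substrings share a class when they occur in exactly the
    same documents. Enumerating every substring produces a quadratic number
    of substrings per document, so this only suits small toy inputs.
    """
    occurrences = defaultdict(set)  # substring -> ids of documents containing it
    for doc_id, text in enumerate(docs):
        n = len(text)
        for start in range(n):
            stop = n if max_len is None else min(n, start + max_len)
            for end in range(start + min_len, stop + 1):
                occurrences[text[start:end]].add(doc_id)
    classes = defaultdict(list)  # frozenset of doc ids -> member substrings
    for substring, doc_ids in occurrences.items():
        classes[frozenset(doc_ids)].append(substring)
    return classes


def alienness(members, doc_ids):
    """Hypothetical alienness score, NOT one of the paper's three measures.

    Ordinary posts rarely share long passages verbatim, so a class whose
    longest member is long and shared by several documents scores high.
    Classes confined to a single document score 0.
    """
    return max(len(s) for s in members) * (len(doc_ids) - 1)


def detect_spam(docs, threshold, min_len=3):
    """Flag every document occurring in a class whose score is >= threshold."""
    flagged = set()
    for doc_ids, members in substring_classes(docs, min_len).items():
        if alienness(members, doc_ids) >= threshold:
            flagged |= doc_ids
    return sorted(flagged)


posts = [
    "buy cheap pills now at example shop",
    "hello how are you",
    "buy cheap pills now at example shop!!",
    "nice weather today",
]
print(detect_spam(posts, threshold=20))  # [0, 2]
```

In this toy run, posts 0 and 2 share a 35-character passage, so their class scores 35 and both are flagged, while the short fragments shared with the other posts stay well below the threshold. A practical implementation would replace the brute-force enumeration with an index structure such as a suffix tree and would use the paper's own measures rather than this placeholder score.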

Original language: English
Title of host publication: Discovery Science - 10th International Conference, DS 2007, Proceedings
Publisher: Springer Verlag
Pages: 161-172
Number of pages: 12
ISBN (Print): 9783540754879
DOIs: https://doi.org/10.1007/978-3-540-75488-6_16
Publication status: Published - 2007
Event: 10th International Conference on Discovery Science, DS 2007 - Sendai, Japan
Duration: 2007 Oct 1 to 2007 Oct 4

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4755 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th International Conference on Discovery Science, DS 2007
Country: Japan
City: Sendai
Period: 07/10/1 to 07/10/4

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)


Cite this

Narisawa, K., Bannai, H., Hatano, K., & Takeda, M. (2007). Unsupervised spam detection based on string alienness measures. In Discovery Science - 10th International Conference, DS 2007, Proceedings (pp. 161-172). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4755 LNAI). Springer Verlag. https://doi.org/10.1007/978-3-540-75488-6_16