Unsupervised spam detection based on string alienness measures

Kazuyuki Narisawa, Hideo Bannai, Kohei Hatano, Masayuki Takeda

Research output: Conference contribution

13 Citations (Scopus)

Abstract

We propose an unsupervised method for detecting spam documents in a given set of documents, based on equivalence relations on strings. We give three measures for quantifying the alienness (i.e., how different they are from the others) of substrings within the documents. A document is then classified as spam if it contains a substring that is in an equivalence class with a high degree of alienness. The proposed method is unsupervised, language independent, and scalable. Computational experiments conducted on data collected from Japanese web forums show that the method successfully discovers spam.

Original language: English
Title of host publication: Discovery Science - 10th International Conference, DS 2007, Proceedings
Publisher: Springer Verlag
Pages: 161-172
Number of pages: 12
ISBN (Print): 9783540754879
DOI: https://doi.org/10.1007/978-3-540-75488-6_16
Publication status: Published - 2007
Event: 10th International Conference on Discovery Science, DS 2007 - Sendai, Japan
Duration: 1 Oct 2007 to 4 Oct 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4755 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th International Conference on Discovery Science, DS 2007
Country: Japan
City: Sendai
Period: 1 Oct 2007 to 4 Oct 2007

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

  • Cite this

    Narisawa, K., Bannai, H., Hatano, K., & Takeda, M. (2007). Unsupervised spam detection based on string alienness measures. In Discovery Science - 10th International Conference, DS 2007, Proceedings (pp. 161-172). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4755 LNAI). Springer Verlag. https://doi.org/10.1007/978-3-540-75488-6_16