Web spam detection by exploring densely connected subgraphs

Yutaka I. Leon-Suematsu, Kentaro Inui, Sadao Kurohashi, Yutaka Kidawara

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

6 Citations (Scopus)

Abstract

In this paper, we present a Web spam detection algorithm that relies on link analysis. The method consists of three steps: (1) decomposition of webgraphs into densely connected subgraphs and calculation of the features for each subgraph; (2) use of SVM classifiers to identify subgraphs composed of Web spam; and (3) propagation of predictions over webgraphs by a biased PageRank algorithm to expand the scope of identification. We performed experiments on a public benchmark. An empirical study of the core structure of webgraphs suggests that highly ranked non-spam hosts can be identified by viewing the coreness of the webgraph elements.
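
Step (3) of the abstract can be illustrated with a minimal sketch of a biased PageRank, in which the random surfer teleports only to a chosen set of seed hosts instead of to every host uniformly. This is not the authors' implementation: the function name `biased_pagerank`, the toy host names, the damping value of 0.85, and the choice to propagate scores along the edges exactly as supplied are all assumptions made for illustration. In practice the seeds would presumably be hosts already predicted as spam by the classifiers of step (2), and the caller would decide whether to follow or reverse the link direction.

```python
def biased_pagerank(out_links, seeds, damping=0.85, iterations=50):
    """Power iteration for PageRank with teleportation restricted to `seeds`.

    out_links: dict mapping each host to a list of hosts it links to.
    seeds:     iterable of hosts that receive all teleportation mass.
    Returns a dict host -> score; the scores sum to 1.
    """
    # Collect every host that appears as a source or as a target.
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    nodes = sorted(nodes)

    seed_set = set(seeds) & set(nodes)
    if not seed_set:
        raise ValueError("at least one seed must be a host in the graph")

    # Teleport vector: uniform over the seeds, zero elsewhere.
    teleport = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            targets = out_links.get(n, [])
            if targets:
                # Split this host's score evenly among its out-links.
                share = damping * rank[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling host: hand its score back to the seeds.
                for s in seed_set:
                    nxt[s] += damping * rank[n] * teleport[s]
        rank = nxt
    return rank


# Hypothetical toy webgraph: "spam1" is the only seed.
toy_graph = {
    "spam1": ["spam2"],
    "spam2": ["spam1", "x"],
    "x": [],
    "good": ["x"],
}
scores = biased_pagerank(toy_graph, seeds=["spam1"])
for host in sorted(scores, key=scores.get, reverse=True):
    print(host, round(scores[host], 4))
```

In this toy graph the host `good` is neither a seed nor reachable from one, so it receives no score, while hosts close to the seed accumulate most of the mass. That locality is what makes a biased PageRank usable for expanding a small set of predictions to neighbouring hosts.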

Original language: English
Title of host publication: Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011
Pages: 124-129
Number of pages: 6
Publication status: Published - 2011 Nov 7
Event: 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 - Lyon, France
Duration: 2011 Aug 22 to 2011 Aug 27

Publication series

Name: Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011
Volume: 1

Other

Other: 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011
Country: France
City: Lyon
Period: 11/8/22 to 11/8/27

Keywords

  • Biased pagerank
  • Dense subgraphs
  • Web spam

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Artificial Intelligence
