A spectral clustering approach to optimally combining numericalvectors with a modular network

Motoki Shiga, Ichigaku Takigawa, Hiroshi Mamitsuka

Research output: Chapter in Book/Report/Conference proceedingConference contribution

76 Citations (Scopus)

Abstract

We address the issue of clustering numerical vectors with a network. The problem setting is basically equivalent to constrained clustering by Wagstaff and Cardie and semi-supervised clustering by Basu et al., but our focus is more on the optimal combination of two heterogeneous data sources. An application of this setting is web pages which can be numerically vectorized by their contents, e.g. term frequencies, and which are hyperlinked to each other, showing a network. Another typical application is genes whose behavior can be numerically measured and a gene network can be given from another data source.We first define a new graph clustering measure which we call normalized network modularity, by balancing the cluster size of the original modularity. We then propose a new clustering method which integrates the cost of clustering numerical vectors with the cost of maximizing the normalized network modularity into a spectral relaxation problem. Our learning algorithm is based on spectral clustering which makes our issue an eigenvalue problem and uses k-means for final cluster assignments. A significant advantage of our method is that we can optimize the weight parameter for balancing the two costs from the given data by choosing the minimum total cost. We evaluated the performance of our proposed method using a variety of datasets including synthetic data as well as real-world data from molecular biology. Experimental results showed that our method is effective enough to have good results for clustering by numerical vectors and a network.

Original languageEnglish
Title of host publicationKDD-2007
Subtitle of host publicationProceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages647-656
Number of pages10
DOIs
Publication statusPublished - 2007
Externally publishedYes
EventKDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - San Jose, CA, United States
Duration: 2007 Aug 122007 Aug 15

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

ConferenceKDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country/TerritoryUnited States
CitySan Jose, CA
Period07/8/1207/8/15

Keywords

  • Eigenvalue problem
  • Heterogeneous data sources
  • K-means
  • Network modularity
  • Spectral clustering

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint

Dive into the research topics of 'A spectral clustering approach to optimally combining numericalvectors with a modular network'. Together they form a unique fingerprint.

Cite this