Micro-clustering by data polishing

Takeaki Uno, Hiroki Maegawa, Takanobu Nakahara, Yukinobu Hamuro, Ryo Yoshinaka, Makoto Tatsuta

Research output: Conference contribution

6 Citations (Scopus)


We address the problem of unsupervised soft clustering, which we call micro-clustering. The aim is to enumerate all groups composed of records that are strongly related to each other, whereas standard clustering methods find boundaries at which records are few. Existing methods have several weak points: they generate intractably many clusters, produce biased size distributions, lack robustness, and so on. We propose a new methodology, data polishing. Data polishing clarifies the cluster structures in the data by perturbing the data according to a feasible hypothesis. More precisely, for graph clustering problems, data polishing replaces dense subgraphs that would correspond to clusters with cliques, and deletes edges not included in any dense subgraph. The clusters are thereby clarified as maximal cliques, so they are easy to find, and the number of maximal cliques is reduced to a tractable level. We also propose an efficient algorithm so that the computation finishes in a few minutes even for large-scale data. Computational experiments demonstrate the efficiency of our formulation and algorithm: the number of solutions is small (e.g., about 1,000), the members of each group are deeply related, and the computation time is short.
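The graph case described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not say how a "dense subgraph" is detected, so this sketch assumes that two vertices belong together when the Jaccard similarity of their closed neighbourhoods reaches a threshold, and it repeats the rewrite until the graph stops changing. The names `from_edges`, `polish`, `maximal_cliques`, and the `threshold` and `max_rounds` parameters are hypothetical, and the all-pairs loop is quadratic, so it is only suitable for toy inputs.

```python
from itertools import combinations


def from_edges(edges):
    """Build an undirected adjacency map {vertex: set_of_neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def polish(adj, threshold=0.5, max_rounds=100):
    """Assumed polishing rule: keep or add edge (u, v) exactly when the
    Jaccard similarity of the closed neighbourhoods N[u], N[v] is at
    least `threshold`; iterate until a fixed point is reached."""
    vertices = list(adj)
    for _ in range(max_rounds):
        closed = {v: adj[v] | {v} for v in vertices}
        new_adj = {v: set() for v in vertices}
        for u, v in combinations(vertices, 2):
            inter = len(closed[u] & closed[v])
            union = len(closed[u] | closed[v])
            if inter / union >= threshold:
                new_adj[u].add(v)
                new_adj[v].add(u)
        if new_adj == adj:  # nothing changed: the graph is polished
            break
        adj = new_adj
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        # Pivot on the vertex with the most neighbours among candidates.
        pivot = max(p | x, key=lambda w: len(adj[w] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


if __name__ == "__main__":
    # Two 4-vertex groups, each missing one edge, joined by a bridge (3, 4).
    g = from_edges([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
                    (4, 5), (4, 6), (5, 6), (5, 7), (6, 7),
                    (3, 4)])
    print(len(maximal_cliques(g)), "maximal cliques before polishing")
    print(sorted(sorted(c) for c in maximal_cliques(polish(g))))
```

On this toy graph the raw input contains several small overlapping maximal cliques, while the polished graph contains one clique per group, which mirrors the reduction in the number of maximal cliques that the abstract reports.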

Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication status: Published - 2017-07-01
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: 2017-12-11 → 2017-12-14


Name: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017


Other: 5th IEEE International Conference on Big Data, Big Data 2017
Country/Territory: United States

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Information Systems and Management
  • Control and Optimization

