Characterization of topic-based online communities by combining network data and user generated content

Mirai Igarashi, Nobuhiko Terui

Research output: Contribution to journal › Article › peer-review

Abstract

This study proposes a model for characterizing online communities by combining two types of data: network data and user-generated content (UGC). Existing models for detecting the community structure of a network use only network information. However, not all people connected in a network share the same interests. For instance, students who belong to the same “school” community may have different hobbies, such as music, books, or sports. Hence, it is more realistic, and more useful for companies, to identify communities according to the interests that members reveal in their communications on social media. In addition, people may belong to multiple communities, such as family, work, and online friends. Our model explores multiple overlapping communities according to their topics, which are identified using the two types of data jointly. In a simulation study that validates the main features of the proposed model, the model correctly identifies a community structure that could not be found without considering both network data and UGC. Furthermore, an empirical analysis of Twitter data shows that the model finds realistic and meaningful community structures in large online networks and has good predictive performance.

Original language: English
Pages (from-to): 1309-1324
Number of pages: 16
Journal: Statistics and Computing
Volume: 30
Issue number: 5
Publication status: Published - 2020 Sep 1

Keywords

  • Bayesian inference
  • Community detection
  • Network analysis
  • Text analysis
  • Topic modeling

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Statistics and Probability
  • Statistics, Probability and Uncertainty
  • Computational Theory and Mathematics
