Development of a large-scale web crawler and search engine infrastructure

Susumu Akamine, Yoshikiyo Kato, Daisuke Kawahara, Keiji Shinzato, Kentaro Inui, Sadao Kurohashi, Yutaka Kidawara

Research output: Conference contribution

6 citations (Scopus)

Abstract

This paper reports the ongoing development of a large-scale Web crawler and search engine infrastructure at the National Institute of Information and Communications Technology. This infrastructure has the following characteristics: (1) It collects one billion Japanese Web pages and keeps them up to date. (2) It selects 100 million pages from among the collected pages and converts them into a standard data format to store the results of morphological analysis, dependency parsing, and synonym augmentation. (3) The selected set of pages is searchable and accessible to users. (4) The system achieves scalability by using a large-scale cluster machine for distributed data processing.

Original language: English
Title of host publication: Proceedings of the 3rd International Universal Communication Symposium, IUCS 2009
Pages: 126-131
Number of pages: 6
DOI
Publication status: Published - Dec 1, 2009
Externally published: Yes
Event: 3rd International Universal Communication Symposium, IUCS 2009 - Tokyo, Japan
Duration: Dec 3, 2009 to Dec 4, 2009

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 3rd International Universal Communication Symposium, IUCS 2009
Country/Territory: Japan
City: Tokyo
Period: Dec 3, 2009 to Dec 4, 2009

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
