Efficient data transfer scheme using word-pair-encoding-based compression for large-scale text-data processing

Hasitha Muthumala Waidyasooriya, Daisuke Ono, Masanori Hariyama, Michitaka Kameyama

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Citations (Scopus)

Abstract

Large-scale data processing is very common in many fields such as data-mining, genome mapping, etc. To accelerate such processing, Graphic Accelerator Units (GPU) and FPGAs (Feild-Programmable Gate-Array) are used. However, the large data transfer time between the accelerator and the host computer is a huge performance bottleneck. In this paper, we use a word-pair-encoding method to compress the data down to 25% of its original size. The encoded data can be decoded from any position without decoding the whole data file. For some algorithms, the encoded data can be processed without decoding. Using Burrows-Wheeler algorithm based text search, we show that the data amount and transfer time can be reduced by over 70%.

Original languageEnglish
Title of host publication2014 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2014
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages639-642
Number of pages4
EditionFebruary
ISBN (Electronic)9781479952304
DOIs
Publication statusPublished - 2015 Feb 5
Event2014 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2014 - Ishigaki Island, Okinawa, Japan
Duration: 2014 Nov 172014 Nov 20

Publication series

NameIEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS
NumberFebruary
Volume2015-February

Other

Other2014 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2014
CountryJapan
CityIshigaki Island, Okinawa
Period14/11/1714/11/20

Keywords

  • Succinct data structures
  • big data
  • data compression

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Fingerprint Dive into the research topics of 'Efficient data transfer scheme using word-pair-encoding-based compression for large-scale text-data processing'. Together they form a unique fingerprint.

Cite this