TY - GEN
T1 - Efficient data transfer scheme using word-pair-encoding-based compression for large-scale text-data processing
AU - Waidyasooriya, Hasitha Muthumala
AU - Ono, Daisuke
AU - Hariyama, Masanori
AU - Kameyama, Michitaka
N1 - Publisher Copyright: © 2014 IEEE. Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2015/2/5
Y1 - 2015/2/5
N2 - Large-scale data processing is common in many fields, such as data mining and genome mapping. To accelerate such processing, graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) are used. However, the long data transfer time between the accelerator and the host computer is a major performance bottleneck. In this paper, we use a word-pair-encoding method to compress the data down to 25% of its original size. The encoded data can be decoded from any position without decoding the whole data file. For some algorithms, the encoded data can be processed without decoding. Using text search based on the Burrows-Wheeler algorithm, we show that the data amount and transfer time can be reduced by over 70%.
AB - Large-scale data processing is common in many fields, such as data mining and genome mapping. To accelerate such processing, graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) are used. However, the long data transfer time between the accelerator and the host computer is a major performance bottleneck. In this paper, we use a word-pair-encoding method to compress the data down to 25% of its original size. The encoded data can be decoded from any position without decoding the whole data file. For some algorithms, the encoded data can be processed without decoding. Using text search based on the Burrows-Wheeler algorithm, we show that the data amount and transfer time can be reduced by over 70%.
KW - Succinct data structures
KW - big data
KW - data compression
UR - http://www.scopus.com/inward/record.url?scp=84937944277&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84937944277&partnerID=8YFLogxK
U2 - 10.1109/APCCAS.2014.7032862
DO - 10.1109/APCCAS.2014.7032862
M3 - Conference contribution
AN - SCOPUS:84937944277
T3 - IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS
SP - 639
EP - 642
BT - 2014 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2014
Y2 - 17 November 2014 through 20 November 2014
ER -