TY - GEN
T1 - Performance tuning and analysis of future vector processors based on the roofline model
AU - Sato, Yoshiei
AU - Nagaoka, Ryuichi
AU - Musa, Akihiro
AU - Egawa, Ryusuke
AU - Takizawa, Hiroyuki
AU - Okabe, Koki
AU - Kobayashi, Hiroaki
PY - 2009/12/1
Y1 - 2009/12/1
AB - Because of a recent steep drop in the ratio of memory bandwidth to computational performance (B/F) of vector processors, their advantage over scalar processors in sustained performance is diminishing. To compensate for the insufficient B/F ratio, an on-chip vector cache is a promising mechanism for vector processors. Although the effectiveness of the vector cache has been evaluated, cache-conscious tuning of vector codes and analysis of the resulting performance have not yet been discussed. The purpose of this paper is therefore to establish a performance-tuning strategy that exploits the potential of a vector processor with a cache. To analyze sustained performance, this paper uses the roofline model. Several optimization techniques are applied to real scientific and engineering applications, and their effects are assessed with the model. We confirm that the model can guide users toward effective tuning that maximizes the performance gain. We also discuss the energy efficiency of the on-chip vector cache.
KW - Energy consumption
KW - Memory system
KW - Performance characterization
KW - Performance model
KW - Performance optimization
KW - Scientific application
KW - Vector cache
KW - Vector processing
UR - http://www.scopus.com/inward/record.url?scp=74549138334&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=74549138334&partnerID=8YFLogxK
U2 - 10.1145/1621960.1621962
DO - 10.1145/1621960.1621962
M3 - Conference contribution
AN - SCOPUS:74549138334
SN - 9781605588308
T3 - ACM International Conference Proceeding Series
SP - 7
EP - 14
BT - Proceedings of the 10th MEDEA Workshop on MEmory Performance
T2 - 10th MEDEA Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture, MEDEA '09, held in conjunction with the Int. Conf. on Parallel Architectures and Compilation Techniques, PACT 2009
Y2 - 13 September 2009 through 13 September 2009
ER -