An on-chip cache design for vector processors

Akihiro Musa, Yoshiei Sato, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

Research output: Conference article

7 citations (Scopus)

Abstract

This paper discusses the potential of an on-chip cache memory for modern vector supercomputers. Vector supercomputers achieve high computational efficiency on compute-intensive scientific applications. The most important factor in their computational performance is high memory bandwidth, which supplies enough data to their many arithmetic units in time; modern vector supercomputers such as the NEC SX-7 and SX-8 provide a ratio of memory bandwidth to floating-point performance of 4 bytes per flop (4B/FLOP). However, the performance gap between memory and processors in high performance computing has widened year by year, so it is becoming harder to maintain 4B/FLOP memory bandwidth in the design of future vector supercomputers. As a promising way to compensate for the insufficient memory bandwidth available to the vector load/store units of future vector supercomputers, we design an on-chip vector cache for the NEC SX vector processor architecture. This paper evaluates the performance of the on-chip cache memory system on the SX-7 system with a memory bandwidth of 2B/FLOP or lower, using two kernel loops and five leading scientific applications. The results for the kernel loops demonstrate that a 2B/FLOP memory system with an on-chip cache whose hit ratio is 50% can achieve performance comparable to that of a 4B/FLOP system without the cache. The results for four of the applications indicate that the on-chip cache improves their sustained performance by 20% to 98%. The results for the remaining application show that loop unrolling conflicts with vector caching, resulting in a poor hit rate. However, when loop unrolling is disabled, its cache hit rate improves, and it achieves sustained performance comparable to that of a 4B/FLOP system without loop unrolling.
In addition, selective caching, in which only the data with high locality of reference are cached, is effective for making efficient use of the limited cache capacity.
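As a rough illustration of why a 50% hit ratio can let a 2B/FLOP memory system behave like a 4B/FLOP one, the sketch below assumes an idealized model that is not taken from the paper: every cache hit is served on chip without consuming off-chip bandwidth, and the cache itself is never the bottleneck. Under that assumption, off-chip memory only has to supply the miss fraction (1 - h) of the requested data, so the effective ratio is B_mem / (1 - h). The function names `effective_bytes_per_flop` and `required_hit_ratio` are hypothetical. The model ignores latency, bank conflicts, and cache-port limits, so it gives an optimistic upper bound rather than the paper's performance model.

```python
def effective_bytes_per_flop(memory_bf: float, hit_ratio: float) -> float:
    """Idealized B/FLOP seen by the vector load/store units (hypothetical model).

    Assumes cache hits consume no off-chip bandwidth, so memory only supplies
    the (1 - hit_ratio) miss fraction of the requested data.
    """
    if not 0.0 <= hit_ratio < 1.0:
        raise ValueError("hit_ratio must be in [0, 1)")
    return memory_bf / (1.0 - hit_ratio)


def required_hit_ratio(memory_bf: float, target_bf: float) -> float:
    """Minimum hit ratio for memory_bf to look like target_bf (same idealized model)."""
    if memory_bf <= 0.0 or target_bf <= 0.0:
        raise ValueError("bandwidth ratios must be positive")
    if memory_bf >= target_bf:
        return 0.0  # memory alone already meets the target
    return 1.0 - memory_bf / target_bf


if __name__ == "__main__":
    # 2B/FLOP memory with a 50% hit ratio -> 4B/FLOP effective (idealized)
    print(effective_bytes_per_flop(2.0, 0.5))
    # Hit ratio a 1B/FLOP system would need to reach 4B/FLOP under this model
    print(required_hit_ratio(1.0, 4.0))
```

The same model also suggests why a poor hit rate (as with loop unrolling in the remaining application) quickly erodes the benefit: the effective ratio falls back toward the raw 2B/FLOP as h approaches zero.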

Original language: English
Pages (from-to): 17-23
Number of pages: 7
Journal: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
DOI
Publication status: Published - 2007 Dec 1
Event: 8th MEDEA Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture, MEDEA '07, Held in Conjunction with the PACT 2007 Conference - Brasov, Romania
Duration: 2007 Sep 16 → 2007 Sep 16

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
