TY - GEN
T1 - Cooperation of neighboring PEs in clustered architectures
AU - Sato, Yukinori
AU - Suzuki, Ken Ichi
AU - Nakamura, Tadao
N1 - Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2005
Y1 - 2005
N2 - Clustered architectures, which aim to process data within a localized PE, are one approach to increasing performance despite the wire delay problem. Their performance depends on the amount of parallel instruction execution and on the amount of inter-PE communication needed to synchronize dependent instructions. In this paper, we propose an arrangement in which each PE cooperates with its adjacent PEs through communication structures added between them, in order to alleviate inter-PE communication and workload imbalance effectively. We evaluate the proposed configurations and compare them with the existing configuration. The results show that the proposed adjacent forwarding network configuration, combined with an instruction steering scheme that considers both register fanout and the number of available free registers, achieves higher instructions per clock (IPC) with a small number of registers per PE than the other configurations.
UR - http://www.scopus.com/inward/record.url?scp=33847231652&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33847231652&partnerID=8YFLogxK
U2 - 10.1109/CAHPC.2005.21
DO - 10.1109/CAHPC.2005.21
M3 - Conference contribution
AN - SCOPUS:33847231652
SN - 076952446X
SN - 9780769524467
T3 - Proceedings - Symposium on Computer Architecture and High Performance Computing
SP - 134
EP - 142
BT - Proceedings - 17th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2005
T2 - 17th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2005
Y2 - 24 October 2005 through 27 October 2005
ER -