Clustered architectures, which aim to process data within a localized PE, are one approach to improving performance in the face of wire-delay problems. The performance of a clustered architecture depends on the instruction steering scheme it implements. Existing steering schemes insert inter-PE communications to achieve load balance among PEs. These insertions delay the execution of dependent instructions and degrade performance. In this paper, we propose a novel instruction steering scheme that gives priority to critical dependencies. Critical dependencies are identified by observing the status of an instruction's source operands. We evaluate the proposed scheme and compare it with existing ones. The results show that the proposed scheme outperforms the existing schemes in terms of instructions per clock (IPC), because it reduces critical inter-PE communications while achieving superior load balance among the PEs.
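
The idea of steering by source-operand status can be illustrated with a minimal sketch. It is not the scheme evaluated in this paper: the `Operand` type, the `steer` function, and the per-PE `load` counters are hypothetical names, and the rule that a not-yet-ready operand marks the critical dependency is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operand:
    """Hypothetical view of one source operand at steering time."""
    producer_pe: int  # PE that holds, or will produce, the value
    ready: bool       # True if the value is already available


def steer(sources: List[Operand], load: List[int]) -> int:
    """Choose a PE for one instruction and update the per-PE load counts.

    Assumption: an operand that is not ready yet is treated as the
    critical dependency, so the instruction follows its producer and
    avoids an inter-PE communication on that path.
    """
    pending = [op for op in sources if not op.ready]
    if pending:
        # Several operands may be pending; pick the least-loaded producer PE.
        target = min((op.producer_pe for op in pending), key=lambda pe: load[pe])
    else:
        # No pending operand, so no critical dependency: balance load freely.
        target = min(range(len(load)), key=lambda pe: load[pe])
    load[target] += 1
    return target
```

Under these assumptions, an instruction with one pending operand produced on PE 0 is steered to PE 0 even when PE 0 is the busiest, whereas an instruction whose operands are all ready goes to the least-loaded PE.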