Multiprocessor systems offer performance beyond that of sequential computers. In building such a system, task scheduling is one of the crucial factors affecting performance. This paper studies task scheduling for a clustered parallel reduction system for the functional language FL. We construct a shared-memory multiprocessor system that performs parallel graph reduction of FL programs. The processing elements (PEs) of the system are divided into several clusters, and the PEs within each cluster are coupled through a local cluster cache. Redexes with independent data are scheduled to different PEs and reduced simultaneously. The most critical problem in this system is that excessive memory accesses may limit the scalability of system performance. To address this problem, we take locality of reference into account so that the contents of a cluster cache remain usable across successive redex evaluation steps. At the same time, we maintain PE utilization while exploiting locality of reference. As a result, both fewer memory accesses and lower PE idle ratios can be expected. We evaluate system performance under the proposed task scheduling strategy by software simulation, and the simulation results illustrate the effectiveness of the strategy.
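The trade-off between cache locality and PE utilization can be illustrated with a minimal sketch. This is not the scheduling algorithm of the paper: the names `Cluster`, `Redex`, and `schedule`, the overlap-based locality score, and the tie-breaking rule are all assumptions made for illustration. The sketch also assumes that the ready redexes are already known to have independent data and that a cluster cache can be modeled as an unbounded set of graph-node identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    idle_pes: int                            # PEs currently free in this cluster
    cache: set = field(default_factory=set)  # graph-node ids held in the cluster cache


@dataclass(frozen=True)
class Redex:
    name: str
    nodes: frozenset                         # graph-node ids the reduction will touch


def schedule(redexes, clusters):
    """Assign each ready redex to a cluster that has an idle PE.

    Hypothetical policy: prefer the largest overlap between the redex's
    nodes and the cluster cache (locality), then the cluster with the
    most idle PEs (utilization). Returns (assignments, deferred).
    """
    assignments, deferred = {}, []
    for redex in redexes:
        candidates = [c for c in clusters if c.idle_pes > 0]
        if not candidates:
            deferred.append(redex.name)      # no free PE: wait for a later step
            continue
        best = max(candidates,
                   key=lambda c: (len(c.cache & redex.nodes), c.idle_pes))
        best.idle_pes -= 1
        best.cache |= redex.nodes            # touched nodes are now cached there
        assignments[redex.name] = best.name
    return assignments, deferred


# Illustrative input only; the node ids and cluster sizes are made up.
clusters = [Cluster("C0", 2, {1, 2, 3}), Cluster("C1", 2, {7, 8})]
redexes = [
    Redex("r1", frozenset({2, 3})),
    Redex("r2", frozenset({8, 9})),
    Redex("r3", frozenset({20})),
    Redex("r4", frozenset({1})),
    Redex("r5", frozenset({9})),
]
assignments, deferred = schedule(redexes, clusters)
print(assignments)  # {'r1': 'C0', 'r2': 'C1', 'r3': 'C0', 'r4': 'C1'}
print(deferred)     # ['r5']
```

In this example, `r4` is placed on `C1` even though its node is cached in `C0`, because `C0` has no idle PE left. Under these assumptions the sketch gives up some locality rather than leave a PE idle, which is one simple way to balance the two goals.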