We have previously proposed the systolic computational-memory (SCM) architecture for high-performance, scalable computation based on finite difference methods. Although the SCM architecture has a completely parallel array structure, building a larger SCM array in practice requires many semiconductor devices, which favors a globally asynchronous, locally synchronous (GALS) design with separate clock domains for system extensibility. This paper presents the local-and-global stall mechanism (LGSM) for an SCM array implemented over multiple FPGAs, which guarantees data synchronization among FPGAs operating on different clocks. A prototype implementation with ALTERA Stratix III FPGAs shows that the proposed design does not incur a large overhead in operating frequency or hardware resource utilization. We also evaluate the scalability of the SCM array over multiple FPGAs, taking actual stall cycles into account.