Field-Programmable Gate Arrays (FPGAs) offer a fairly non-invasive method to specialize custom architectures towards a specific application domain. Recent studies have successfully demonstrated that single-node FPGAs can rival both CPUs and GPUs in performance. Unfortunately, most existing studies limit themselves to a single FPGA device, and the scalability of such designs requires further investigation. In this work, we demonstrate in practice how to scale the important n-body problem across a comparatively large FPGA cluster. Our design, composed of up to 256 processing elements, achieves near-linear strong scaling, with performance levels comparable to those of custom Application-Specific Integrated Circuits (ASICs). We further develop an analytical performance model, which we use to predict the performance of our solution on upcoming Intel Agilex systems. Today, our system reaches up to 47 Giga-Pairs/second, and our performance model predicts that we can reach up to 0.142 Tera-Pairs/second peak performance with next-generation FPGAs.