Quantum annealing (QA) is a probabilistic heuristic algorithm that searches for the globally optimal solution of a combinatorial optimization problem by exploiting quantum tunneling. QA can be simulated on field-programmable gate arrays (FPGAs) using quantum Monte Carlo (QMC) simulation of the transverse Ising model. Because the input data of the QMC simulation increases exponentially with the problem size, the data must be stored in the DRAM of the FPGA board. However, storing the data in DRAM causes two problems: limited data-access bandwidth and limited DRAM capacity. We propose a data-transfer-bottleneck-less FPGA-based accelerator for quantum annealing simulation and apply it to the number partitioning problem, a combinatorial optimization problem. The key idea of our architecture is to compute the large data inside the FPGA kernels rather than store it, which eliminates the data-transfer burden. We implement the proposed architecture on a Stratix 10 FPGA and achieve up to 39.6 times speed-up over CPU-based quantum annealing simulation. Compared with the most recent FPGA implementation, we achieve up to 2.8 times speed-up and support 262,144 spins, a 64-fold increase.
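
The "compute rather than store" idea can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: that number partitioning is written in its standard Ising form with energy E(s) = (sum_i a_i s_i)^2, so that each coupling is J_ij = a_i a_j and can be regenerated from the N input numbers instead of being read from a stored N x N matrix; and, for brevity, that plain classical simulated annealing stands in for the QMC simulation of the transverse Ising model. All function names are hypothetical, and the sketch says nothing about how the FPGA kernels are actually organized.

```python
import math
import random


def coupling(a, i, j):
    """Assumed coupling J_ij = a_i * a_j, recomputed on demand (never stored)."""
    return a[i] * a[j]


def local_field_on_the_fly(a, s, i):
    """h_i = sum over j != i of J_ij * s_j, regenerating every J_ij when needed."""
    return sum(coupling(a, i, j) * s[j] for j in range(len(a)) if j != i)


def local_field_stored(J, s, i):
    """Same local field, but read from a precomputed N x N matrix J."""
    return sum(J[i][j] * s[j] for j in range(len(s)) if j != i)


def energy(a, s):
    """Squared difference between the two partition sums."""
    return sum(x * y for x, y in zip(a, s)) ** 2


def anneal(a, sweeps=200, t_start=10.0, t_end=0.01, seed=0):
    """Classical simulated annealing (a stand-in for QMC) with on-the-fly couplings."""
    rng = random.Random(seed)
    n = len(a)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    e = energy(a, s)
    best, best_e = list(s), e
    for k in range(sweeps):
        # Geometric temperature schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / max(sweeps - 1, 1))
        for i in range(n):
            # Flipping s_i changes the energy by -4 * s_i * h_i.
            d_e = -4 * s[i] * local_field_on_the_fly(a, s, i)
            # Metropolis acceptance rule.
            if d_e <= 0 or rng.random() < math.exp(-d_e / t):
                s[i] = -s[i]
                e += d_e
                if e < best_e:
                    best, best_e = list(s), e
    return best, best_e


if __name__ == "__main__":
    numbers = [3, 1, 1, 2, 2, 1]
    spins, e = anneal(numbers)
    print("partition:", spins, "energy:", e)
```

Under these assumptions the memory trade-off is easy to see: the stored form needs N^2 coupling values, which for 262,144 spins is 2^36 (about 6.9 x 10^10) entries, whereas the on-the-fly form keeps only the N input numbers and pays for it with extra multiplications.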