This paper proposes an efficient concurrent error detection scheme for hardware implementation of the block cipher AES. The proposed scheme does not require an additional arithmetic unit, but simply divides the round function block into two sub-blocks and uses the sub-blocks alternately for encryption (or decryption) and error detection. The number of clock cycles is doubled, but the maximum operating frequency is increased owing to the shortened critical path of the sub-block. Therefore, the proposed scheme has a limited impact on hardware performance with respect to size and speed. AES hardware with the proposed scheme was designed and synthesized using a 90-nm CMOS standard cell library with size and speed optimization options. The compact and high-speed implementations achieved performances of 2.21 Gbps @ 16.1 Kgates and 3.21 Gbps @ 24.1 Kgates, respectively. In contrast, the performances of AES hardware without error detection were 1.66 Gbps @ 12.9 Kgates for the compact version and 4.22 Gbps @ 30.7 Kgates for the high-speed version. There is only a slight difference between the performances with and without error detection. The performance overhead caused by the error detection is evaluated at the optimal balance between size and speed and was estimated to be 14.5% at maximum. Conversely, the AES hardware with the proposed scheme had better performance in some cases. If pipeline operation is allowed, as in the CTR mode, throughputs can easily be boosted by further dividing the sub-blocks. Although the proposed error detection scheme was applied to AES in the present study, it can also be applied to other algorithms efficiently.