This paper presents a vehicle detection algorithm that uses 3-dimensional (3-D) information, together with its FPGA implementation. To acquire 3-D information at high speed, feature-based stereo matching is employed to reduce the search area. The algorithm consists of tasks with a high degree of column-level parallelism. Exploiting this parallelism, we propose an area-efficient VLSI architecture in which data transfer between memory modules and processing elements (PEs) is local. Each image is divided equally into blocks of several columns, each block is allocated to one PE, and all PEs process their blocks in parallel. The proposed architecture is implemented on an FPGA (Altera Stratix EP1S40F1020C7). For an image size of 640 × 480 pixels, a frame rate of 100 frames/sec, and an operating frequency of 100 MHz, only 11,000 logic elements (< 30%) are required for 30 PEs.
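
The column-block allocation and the timing budget implied by the stated specifications can be illustrated with a short sketch. This is not the authors' implementation: the function names `column_blocks` and `cycles_per_pixel_per_pe` are hypothetical, and because 640 columns do not divide evenly by 30, the sketch assumes the leftover columns are spread over the first PEs, which the abstract does not specify. The cycle figure is an idealized upper bound that assumes every clock cycle of every PE is available for pixel processing.

```python
def column_blocks(width, num_pes):
    """Split `width` image columns into `num_pes` contiguous blocks.

    Assumption (not from the paper): when width is not a multiple of
    num_pes, the first `width % num_pes` blocks get one extra column.
    Returns a list of (start, end) column ranges, end exclusive.
    """
    base, extra = divmod(width, num_pes)
    blocks = []
    start = 0
    for pe in range(num_pes):
        size = base + (1 if pe < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def cycles_per_pixel_per_pe(width, height, fps, clock_hz, num_pes):
    """Idealized clock cycles each PE can spend per pixel of its block."""
    pixels_per_second = width * height * fps
    return clock_hz * num_pes / pixels_per_second


if __name__ == "__main__":
    blocks = column_blocks(640, 30)
    print(len(blocks), blocks[0], blocks[-1])
    print(cycles_per_pixel_per_pe(640, 480, 100, 100_000_000, 30))
```

Under these assumptions each PE handles 21 or 22 columns, and the stated 100 MHz clock leaves each PE on the order of a hundred cycles per pixel at 100 frames/sec.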