The usefulness of mammography has been widely recognized, but screening mammography occasionally results in excessive recommendations for subsequent biopsy, causing many women inconvenience and severe anxiety. In particular, unnecessary biopsy recommendations are likely for findings that are difficult to classify as malignant or benign. However, few studies have focused on computer-aided diagnosis (CAD) performance for such difficult cases. To address this problem, we developed a deep-learning-based classification technique to aid these difficult diagnoses. We evaluated 100 benign and malignant breast masses of Breast Imaging Reporting and Data System (BI-RADS) Category 4, which are generally difficult to classify as malignant or benign. Five certified doctors participated in the experiments, in which each doctor first read the 100 images alone and, a week later, read them again with the proposed CAD system. The area under the receiver operating characteristic curve (AUC-ROC) of the CAD system was 0.79, greater than the 0.65 average of the human readers' AUC-ROCs. When the readers used the CAD system, their average AUC-ROC reached the best value of 0.8. These results suggest that the proposed CAD system can not only outperform human readers in classifying the masses but also enhance human performance in this difficult task.
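
For readers unfamiliar with the metric, the AUC-ROC comparison reported above can be sketched as follows. This is a minimal illustration, not the study's evaluation code: the abstract does not state how the AUC-ROC values were computed, the function names `auc_roc` and `mean_reader_auc` are hypothetical, and the scores are toy values invented purely for demonstration. The sketch uses the pairwise (Mann-Whitney) interpretation of AUC: the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case, with ties counted as one half.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison (Mann-Whitney interpretation).

    labels: 1 for malignant, 0 for benign.
    scores: higher values mean "more likely malignant".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # malignant case ranked above benign case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def mean_reader_auc(labels, reader_scores):
    """Average AUC-ROC over several readers' ratings of the same cases."""
    return sum(auc_roc(labels, s) for s in reader_scores) / len(reader_scores)


# Toy data only (hypothetical, NOT from the study).
labels = [1, 1, 1, 0, 0, 0]
cad_scores = [0.9, 0.7, 0.4, 0.5, 0.2, 0.1]   # model output per case
reader_scores = [
    [4, 3, 3, 3, 2, 1],                       # reader 1, ordinal suspicion rating
    [5, 2, 3, 4, 2, 1],                       # reader 2
]

cad_auc = auc_roc(labels, cad_scores)
readers_auc = mean_reader_auc(labels, reader_scores)
print(f"CAD AUC-ROC: {cad_auc:.3f}, mean reader AUC-ROC: {readers_auc:.3f}")
```

The pairwise form is quadratic in the number of cases, which is acceptable for a 100-case reader study; for larger sets a rank-based computation or an established library routine would be the more usual choice.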