We present a new approach to grouping similar scene images. The proposed method characterizes both the global feature layout and the local oriented edge responses of an image, and provides a translation-invariant similarity measure for comparing scene images. By integrating global and local information, it estimates image similarity effectively and is well suited to generating initial clustering results for applications that require extensive local-feature matching on unorganized image collections, such as large-scale 3D reconstruction and scene completion. Experimental evaluations on several image datasets show that our method closely approximates the similarities derived from local-feature matching at a lower computational cost.