This paper introduces a new scheme for automatic image annotation based on cascading multi-level multi-instance classifiers (CMLMI). The scheme extracts visual features hierarchically: the feature set includes features extracted from the whole image at the coarsest level and from overlapping sub-regions at finer levels. Multi-instance learning (MIL) is used to learn "weak classifiers" for these levels in a cascade. The underlying idea is that coarse levels are suited to background labels such as "forest" and "city", while finer levels carry useful information about foreground objects such as "tiger" and "car". The cascade allows the scheme to incorporate "important" negative samples during learning, thereby mitigating the "weak labeling" problem by excluding ambiguous background labels associated with the negative samples. Experiments show that CMLMI achieves significant improvements over baseline methods as well as existing MIL-based methods.
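To make the multi-level hierarchy concrete, the following minimal sketch enumerates the regions from which features could be extracted: the whole image at the coarsest level and overlapping sub-regions at finer levels. The function name `multilevel_regions`, the halving of the window size per level, and the 50% overlap are illustrative assumptions, not details stated in this abstract. In a MIL setting, the regions of one level could serve as the instances of a bag for that level.

```python
def multilevel_regions(height, width, levels=3, overlap=0.5):
    """Return, for each level, a list of (top, left, win_h, win_w) boxes.

    Level 0 is the whole image. Each finer level halves the window side
    (an assumption) and slides it with a stride smaller than the window,
    so neighbouring sub-regions overlap.
    """
    pyramid = []
    for level in range(levels):
        divisor = 2 ** level
        win_h, win_w = height // divisor, width // divisor
        # Stride = window size * (1 - overlap); at least one pixel.
        step_h = max(1, int(win_h * (1 - overlap)))
        step_w = max(1, int(win_w * (1 - overlap)))
        boxes = [
            (top, left, win_h, win_w)
            for top in range(0, height - win_h + 1, step_h)
            for left in range(0, width - win_w + 1, step_w)
        ]
        pyramid.append(boxes)
    return pyramid


if __name__ == "__main__":
    for level, boxes in enumerate(multilevel_regions(64, 64)):
        print(f"level {level}: {len(boxes)} region(s)")
```

Under these assumptions, a 64x64 image yields a single region at level 0 and progressively more, smaller regions at each finer level. A feature extractor of any kind could then be applied to each box.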