Training CNNs with normalized kernels

Mete Ozay, Takayuki Okatani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Several methods of normalizing convolution kernels have been proposed in the literature to train convolutional neural networks (CNNs), and they have shown some success. However, our understanding of these methods has lagged behind their success in application. Many questions remain open, such as why a certain type of kernel normalization is effective and what type of normalization should be employed at each (e.g., higher or lower) layer of a CNN. As a first step towards answering these questions, we propose a framework that enables us to use a variety of kernel normalization methods at any layer of a CNN. A naive integration of kernel normalization with a general optimization method, such as SGD, often causes instability while updating parameters. Thus, existing methods employ ad hoc procedures to assure convergence empirically. In this study, we pose the estimation of convolution kernels under normalization constraints as constraint-free optimization on kernel submanifolds that are identified by the employed constraints. Note that naive application of established optimization methods for matrix manifolds to these problems is not feasible because of the hierarchical nature of CNNs. To this end, we propose an algorithm for optimization on kernel manifolds in CNNs that appropriately scales the space of kernels based on the structure of CNNs and the statistics of the data. We theoretically prove that the proposed algorithm converges almost surely to a solution at a single minimum. Our experimental results show that the proposed method can successfully train popular CNN models using several different types of kernel normalization methods. Moreover, they show that the proposed method improves the classification performance of baseline CNNs and provides state-of-the-art performance on major image classification benchmarks.
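
The idea of turning a normalization constraint into unconstrained optimization on a manifold can be illustrated in its simplest form. The following is a minimal sketch, assuming a single kernel flattened to a vector and constrained to unit L2 norm (the unit sphere), optimized with plain Riemannian SGD: the Euclidean gradient is projected onto the tangent space at the current point, a step is taken, and the result is renormalized (a retraction) back onto the sphere. The toy objective, the function names (`tangent_project`, `sphere_sgd_step`, `fit_unit_kernel`), and the learning rate are hypothetical. The sketch omits the scaling based on CNN structure and data statistics described above, so it is not the authors' algorithm.

```python
import math


def dot(a, b):
    # Euclidean inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    # Rescale v to unit L2 norm (projection onto the unit sphere).
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def tangent_project(w, g):
    # Remove the radial component of g so the step lies in the
    # tangent space of the unit sphere at w (assumes ||w|| = 1).
    c = dot(g, w)
    return [gi - c * wi for gi, wi in zip(g, w)]


def sphere_sgd_step(w, g, lr):
    # Riemannian gradient step followed by a retraction
    # (renormalization) back onto the sphere.
    rg = tangent_project(w, g)
    return normalize([wi - lr * ri for wi, ri in zip(w, rg)])


def fit_unit_kernel(target, steps=200, lr=0.1):
    # Hypothetical toy objective f(w) = 0.5 * ||w - target||^2 with
    # Euclidean gradient w - target. Under the constraint ||w|| = 1,
    # its minimizer is target / ||target||.
    w = normalize([1.0] * len(target))
    for _ in range(steps):
        g = [wi - ti for wi, ti in zip(w, target)]
        w = sphere_sgd_step(w, g, lr)
    return w


if __name__ == "__main__":
    w = fit_unit_kernel([3.0, 4.0, 0.0])
    print([round(x, 6) for x in w])
```

Because every iterate is renormalized, the constraint holds exactly after each step rather than being enforced by an ad hoc correction afterward. Running the example drives `w` toward `[0.6, 0.8, 0.0]`, the unit-norm direction of the target.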

Original language: English
Title of host publication: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Publisher: AAAI Press
Pages: 3884-3891
Number of pages: 8
ISBN (Electronic): 9781577358008
Publication status: Published - 2018
Event: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 - New Orleans, United States
Duration: 2018 Feb 2 to 2018 Feb 7

Publication series

Name: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Other

Other: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Country: United States
City: New Orleans
Period: 18/2/2 to 18/2/7

ASJC Scopus subject areas

  • Artificial Intelligence
