Google researchers debut EfficientNets for CNN model scaling without the tedium

Google researchers have open-sourced EfficientNets, a method for scaling up CNN models that they claim is up to 10 times more efficient than current “state-of-the-art” techniques.

The method is detailed in a paper being presented at next month’s International Conference on Machine Learning, and promises to remove at least some of the “tedious manual tuning” that conventional methods require.

According to Mingxing Tan, Staff Software Engineer, and Quoc V. Le, Principal Scientist, at Google AI, the researchers set out to find a way to scale up a CNN more accurately and efficiently than conventional practice, which is to “arbitrarily increase the CNN depth or width, or to use larger input image resolution for training and evaluation.”

“While these methods do improve accuracy, they usually require tedious manual tuning, and still often yield suboptimal performance,” the team points out.

Their alternative was to use “a simple yet highly effective compound coefficient to scale up CNNs in a more structured manner. Unlike conventional approaches that arbitrarily scale network dimensions, such as width, depth and resolution, our method uniformly scales each dimension with a fixed set of scaling coefficients.”
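
For the curious, the paper pins this down with a single compound coefficient φ: depth scales as α^φ, width as β^φ and input resolution as γ^φ, where α, β and γ are fixed constants. A minimal Python sketch, using the constants the paper reports for the EfficientNet-B0 baseline (the helper function is ours, purely illustrative):

```python
import math

# Compound scaling as described in the EfficientNet paper: one
# user-chosen coefficient phi scales depth, width and resolution
# together. The constants below are those the paper reports for the
# EfficientNet-B0 baseline (found by the grid search described below).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(base_depth, base_width, base_resolution, phi):
    """Scale all three network dimensions uniformly with one coefficient."""
    depth = math.ceil(base_depth * ALPHA ** phi)        # more layers
    width = math.ceil(base_width * BETA ** phi)         # more channels
    resolution = round(base_resolution * GAMMA ** phi)  # larger inputs
    return depth, width, resolution

# Example: scale a toy baseline by phi = 3
print(compound_scale(base_depth=16, base_width=32, base_resolution=224, phi=3))
```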

The approach relies on performing a grid search to find the relationships between “different scaling dimensions of the baseline network under a fixed resource constraint”, which yields an appropriate scaling coefficient for each of the dimensions.
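
FLOPS grow roughly in proportion to depth, and with the square of width and resolution, so the paper constrains α · β² · γ² ≈ 2, meaning each unit increase in the compound coefficient roughly doubles the compute budget. A toy version of that constrained search (purely illustrative: the real search scores each candidate by training and evaluating the scaled network, which the stand-in below does not do):

```python
import itertools

# Enumerate candidate (alpha, beta, gamma) triples and keep those that
# satisfy the fixed-resource constraint alpha * beta^2 * gamma^2 ~= 2.
grid = [1.0 + 0.05 * i for i in range(9)]  # 1.00, 1.05, ..., 1.40
candidates = [
    (a, b, g)
    for a, b, g in itertools.product(grid, repeat=3)
    if abs(a * b ** 2 * g ** 2 - 2.0) < 0.1
]

# In the real method, each surviving candidate is used to scale the
# baseline and the resulting network's accuracy decides the winner.
print(len(candidates), "candidates, e.g.", candidates[0])
```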

“We then apply those coefficients to scale up the baseline network to the desired target model size or computational budget.” This gave improvements in accuracy and efficiency when scaling existing models, for example 1.4 per cent for MobileNet and 0.7 per cent for ResNet.

“The effectiveness of model scaling also relies heavily on the baseline network. So, to further improve performance, we have also developed a new baseline network by performing a neural architecture search using the AutoML MNAS framework, which optimizes both accuracy and efficiency (FLOPS),” they continued.
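
The paper gives that dual objective concretely: candidate models m are ranked by ACC(m) × (FLOPS(m)/T)^w, where T is the target FLOPS budget and w = -0.07 weights the trade-off. In outline (the function name and toy inputs below are ours, not Google’s):

```python
# Multi-objective search reward from the EfficientNet paper:
# accuracy is discounted as a model's FLOPS exceed the target budget.
TARGET_FLOPS = 400e6  # T: the paper targets roughly 400M FLOPS
W = -0.07             # w: trade-off hyperparameter from the paper

def search_reward(accuracy, flops):
    """Rank a candidate architecture by accuracy adjusted for compute."""
    return accuracy * (flops / TARGET_FLOPS) ** W

# Same accuracy, different compute: the leaner model scores higher.
print(search_reward(accuracy=0.76, flops=800e6))  # over budget, penalised
print(search_reward(accuracy=0.76, flops=400e6))  # on budget, reward = 0.76
```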

“The resulting architecture uses mobile inverted bottleneck convolution (MBConv), similar to MobileNetV2 and MnasNet, but is slightly larger due to an increased FLOP budget. We then scale up the baseline network to obtain a family of models, called EfficientNets.”
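
For readers who haven’t met MBConv, the block is a 1×1 expansion convolution, a depthwise convolution, then a 1×1 linear projection, with a residual connection where shapes allow. A stripped-down Keras sketch, which omits the squeeze-and-excitation and stochastic-depth details the released models use:

```python
import tensorflow as tf
from tensorflow.keras import layers

def mbconv_block(x, filters_out, expand_ratio=6, kernel_size=3, strides=1):
    """Simplified mobile inverted bottleneck (MBConv) block.

    The expand -> depthwise -> project pattern follows MobileNetV2;
    squeeze-and-excitation and stochastic depth are omitted here.
    """
    filters_in = x.shape[-1]
    # 1x1 expansion: widen the channel dimension
    h = layers.Conv2D(filters_in * expand_ratio, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)
    # Depthwise convolution: spatial filtering, one filter per channel
    h = layers.DepthwiseConv2D(kernel_size, strides=strides, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)
    # 1x1 linear projection back down (no activation: "inverted bottleneck")
    h = layers.Conv2D(filters_out, 1, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    # Residual connection only when input and output shapes match
    if strides == 1 and filters_in == filters_out:
        h = layers.Add()([x, h])
    return h

# Toy usage: a stem convolution followed by two MBConv blocks
inputs = tf.keras.Input(shape=(224, 224, 3))
x = layers.Conv2D(32, 3, strides=2, padding="same")(inputs)
x = mbconv_block(x, filters_out=32)             # residual applies here
x = mbconv_block(x, filters_out=64, strides=2)  # downsampling block
model = tf.keras.Model(inputs, x)
```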

The team compared the EfficientNets with other CNNs on the ImageNet dataset. “In general, the EfficientNet models achieve both higher accuracy and better efficiency over existing CNNs, reducing parameter size and FLOPS by an order of magnitude.”

The team tried the EfficientNets on eight other datasets, and said they achieved “state-of-the-art” accuracy on five, “with an order of magnitude fewer parameters (up to 21x parameter reduction), suggesting that our EfficientNets also transfer well.”

They concluded that the EfficientNets provided “significant improvements” to model efficiency, and could “potentially serve as a new foundation for future computer vision tasks.”

To put their models where their mouths are, the team have open-sourced all the EfficientNet models. The EfficientNet source code and TPU training scripts can be found here.