Momentum is Everywhere
Aug 1, 2017

This is my TIL for today: momentum is already very good at finding good local minima. If carefully tuned, it can beat adaptive (and more sophisticated) optimisers. Adaptive optimisers are designed to be more robust, so we don’t need good hyperparameters to start with. However, adaptive optimisers are prone to assuming they are heading in the right direction towards a local minimum, so they adjust the learning rate to converge faster. They reduce the learning rate when the minimum seems close (so that they won’t overshoot) and increase it otherwise. As a result, they are more likely to converge to sharp local minima.
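To make "adjusting the learning rate" concrete, here is a minimal sketch of one adaptive update, using the Adam rule on a single scalar parameter. The function names (`adam_step`, `minimise_adam`) and the toy objective in the usage note are my own assumptions for illustration, not taken from any of the papers listed in this post.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (illustrative; t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    # The effective step is lr scaled by the gradient statistics, so it
    # shrinks or grows depending on the recent gradient history.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

def minimise_adam(grad_fn, x0, steps=2000, lr=0.01):
    """Run Adam on a 1-D problem given its gradient function (hypothetical helper)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        x, m, v = adam_step(x, grad_fn(x), m, v, t, lr=lr)
    return x
```

For example, `minimise_adam(lambda x: 2 * (x - 3), 0.0)` walks towards the minimum of (x - 3)^2 at x = 3. Note that the very first step has size close to `lr` no matter how large the gradient is, which is the kind of automatic rescaling described above.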

On the other hand, momentum doesn’t suffer from this bias towards sharp minima.
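For comparison, a minimal sketch of the classical momentum update, again for a single scalar parameter. The names (`momentum_step`, `minimise_momentum`) and the default values of `lr` and `mu` are assumptions chosen for illustration. The point is that the learning rate stays fixed and only a velocity term accumulates past gradients.

```python
def momentum_step(param, grad, velocity, lr=0.01, mu=0.9):
    """One SGD-with-momentum update (illustrative; lr is never rescaled)."""
    velocity = mu * velocity - lr * grad   # decay old velocity, add the new gradient step
    return param + velocity, velocity

def minimise_momentum(grad_fn, x0, steps=500, lr=0.01, mu=0.9):
    """Run momentum on a 1-D problem given its gradient function (hypothetical helper)."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        x, velocity = momentum_step(x, grad_fn(x), velocity, lr=lr, mu=mu)
    return x
```

For example, `minimise_momentum(lambda x: 2 * (x - 3), 0.0)` converges to the minimum of (x - 3)^2 at x = 3. Both `lr` and `mu` have to be chosen by hand, which is the "carefully tuned" part.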

That’s why momentum is used almost everywhere.

| Architecture | Optimiser |
| --- | --- |
| AlexNet [1] | Momentum |
| ZFNet [2] | Momentum |
| VGGNet [3] | Momentum |
| GoogLeNet [4] | Momentum |
| ResNet [5] | Momentum |
| R-CNN [6] | SGD |
| Fast R-CNN [7] | Momentum |
| Faster R-CNN [8] | Momentum |

References

[1] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks.” Advances in neural information processing systems. 2012.

[2] Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional networks.” European conference on computer vision. Springer, Cham, 2014.

[3] Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).

[4] Szegedy, Christian, et al. “Going deeper with convolutions.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.

[5] He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

[6] Girshick, Ross, et al. “Rich feature hierarchies for accurate object detection and semantic segmentation.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.

[7] Girshick, Ross. “Fast R-CNN.” Proceedings of the IEEE international conference on computer vision. 2015.

[8] Ren, Shaoqing, et al. “Faster R-CNN: Towards real-time object detection with region proposal networks.” Advances in neural information processing systems. 2015.



