Introduction
Batch Normalization (BatchNorm) is a simple yet remarkably effective technique that makes training neural networks faster and more stable. Despite its widespread adoption, the theoretical justification for BatchNorm has remained vague and shaky. The belief circulating in the ML community is that BatchNorm improves optimization by reducing internal covariate shift (ICS). As we shall see, ICS has little to no effect on optimization.

This blog post examines the proposed explanations of why BatchNorm works, largely agreeing with the conclusions of How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) [1]. This work joins the effort to make reproducibility and open-source code commonplace in ML by reproducing the results from [1] live in your browser (thanks to TensorFlow.js). To see the results, you will have to train the models from scratch, which is as easy as clicking a button. Parameter initialization is random, so you will see different results every time you train the models. The source code for the models presented here can be found in this GitHub repo.
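As a refresher before the discussion, here is a minimal NumPy sketch of the training-time BatchNorm transform (normalize each feature over the batch, then apply a learned scale and shift). The function name and shapes are illustrative; the in-browser models in this post are built with TensorFlow.js rather than this code.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch-norm forward pass for a 2-D batch of shape (batch_size, features).

    Each feature is normalized to zero mean and unit variance over the batch,
    then scaled by gamma and shifted by beta (both learned parameters).
    """
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta            # scale and shift

# Example: a random batch of 4 samples with 3 features.
x = np.random.randn(4, 3) * 10 + 5
y = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0))  # approximately 0 for each feature
print(y.std(axis=0))   # approximately 1 for each feature
```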