About
Apps
Posts
Portfolio
Testimonials

Borrowed Geometry

A small frozen slice of Gemma 4 31B beats matched-capacity from-scratch transformers by 217×. The compositional structure was already there.

April 27, 2026
Why Does Batch Normalization Work?

Why Does Batch Normalization Work?

Batch Normalization is often explained as a method for reducing “Internal Covariate Shift,” but growing evidence points instead to its ability to smooth the optimization landscape and make training more robust. Through interactive demos and reproducible experiments, this overview shows how Batch Normalization enhances convergence, remains effective even under artificially introduced covariate shifts, and reduces dependence on initialization—ultimately speeding up and stabilizing neural network training.

January 15, 2019