Hello! Today, we will dive into RMSProp (Root Mean Square Propagation). This sophisticated optimization algorithm accelerates convergence by adapting the learning rate for each weight separately, addressing the limitations of previous techniques such as Stochastic Gradient Descent (SGD), Mini-Batch Gradient Descent, and momentum. Our focus today is understanding RMSProp and coding it from scratch in Python to optimize multivariable functions.
Let's begin with a quick recap: SGD and Mini-Batch Gradient Descent can be sensitive to the learning rate and may converge slowly. Even momentum, which mitigates these issues to an extent, has limitations: it still applies a single, uniform learning rate to all parameters, even though parameters with very different gradient magnitudes benefit from different step sizes. This is where RMSProp steps in to offer a solution.
RMSProp is an advanced optimization algorithm that adjusts the gradient descent step for each weight individually, which speeds up training and convergence. It achieves this by keeping track of a running average of the squared gradients and using that average to scale the learning rate for each weight.
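To get a feel for that idea before the formal definition, here is a minimal sketch of a single RMSProp-style update for one weight; the variable names and hyperparameter values (`beta`, `learning_rate`, `epsilon`) are illustrative choices, not fixed by the algorithm.

```python
# One RMSProp-style update step for a single weight of f(w) = w**2.
beta = 0.9            # decay rate of the running average
learning_rate = 0.01
epsilon = 1e-8        # avoids division by zero

weight = 5.0
avg_sq_grad = 0.0     # running average of the squared gradient

grad = 2 * weight     # gradient of f(w) = w**2 at the current weight

# Update the running average of the squared gradient
avg_sq_grad = beta * avg_sq_grad + (1 - beta) * grad ** 2

# Scale the step by the inverse square root of that average
weight -= learning_rate * grad / (avg_sq_grad ** 0.5 + epsilon)
print(weight)  # slightly closer to the minimum at 0
```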
More formally, RMSProp adds another layer to the SGD update rule: each update is scaled by the inverse of the square root of a running average of recent squared gradients. Here, the gradient measures the magnitude and direction of change for each weight. The mathematical expression is:

$$
E[g^2]_t = \beta \, E[g^2]_{t-1} + (1 - \beta) \, g_t^2
$$

$$
w_{t+1} = w_t - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon} \, g_t
$$

where $g_t$ is the gradient at step $t$, $\beta$ is the decay rate of the running average, $\eta$ is the learning rate, and $\epsilon$ is a small constant that prevents division by zero.
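As a preview of what we will build, below is a minimal from-scratch sketch of this update rule in Python, applied to the simple two-variable function $f(x, y) = x^2 + y^2$. The function, hyperparameter values, and helper names (`rmsprop`, `grad_fn`) are illustrative assumptions for this sketch.

```python
import numpy as np

def rmsprop(grad_fn, w_init, learning_rate=0.01, beta=0.9,
            epsilon=1e-8, n_steps=1000):
    """Minimize a function with RMSProp, given a function returning its gradient."""
    w = np.array(w_init, dtype=float)   # current parameter vector (copied)
    avg_sq_grad = np.zeros_like(w)      # E[g^2], one entry per parameter

    for _ in range(n_steps):
        g = grad_fn(w)
        # Running average of squared gradients (element-wise)
        avg_sq_grad = beta * avg_sq_grad + (1 - beta) * g ** 2
        # Per-parameter step, scaled by the inverse square root of that average
        w -= learning_rate * g / (np.sqrt(avg_sq_grad) + epsilon)
    return w

# Example: f(x, y) = x^2 + y^2, whose gradient is (2x, 2y)
def grad_f(w):
    return 2 * w

print(rmsprop(grad_f, w_init=[3.0, -4.0]))   # ends close to [0, 0]
```

Notice that each parameter gets its own entry in `avg_sq_grad`, so a weight with consistently large gradients takes smaller effective steps than one with small gradients; this per-weight scaling is exactly what distinguishes RMSProp from plain SGD.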