Gradient Descent Algorithm in Machine Learning
Machine learning and deep learning commonly use gradient descent (GD) to find a local minimum/maximum of a given function. Gradient descent minimizes a cost/loss function (e.g. in linear regression). In mathematical terms, gradient descent is a way to minimize an objective function J(θ) parameterized by a model’s parameters θ ∈ ℝ^d by updating the parameters in the opposite direction of the gradient of the objective function, ∇θJ(θ), w.r.t. the parameters. The learning rate η determines the size of the steps we take to reach a (local) minimum, giving the update rule θ ← θ − η∇θJ(θ). In other words, we follow the direction of the slope of the surface created by the objective function downhill until we reach a valley.

There are three variants of gradient descent, which differ in how much data we use to compute the gradient of the objective function. The choice of variant is a trade-off between the accuracy of each parameter update and the time it takes to compute it.

Batch Gradient Descent

Batch gradient descent computes the gradient of the cost function w.r.t. the parameters θ for the entire training dataset, as in the sketch below.
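As a concrete illustration, here is a minimal NumPy sketch of the update rule θ ← θ − η∇θJ(θ) applied as batch gradient descent to linear regression with a mean-squared-error cost. The function name, synthetic data, and hyperparameters (eta, n_steps) are illustrative choices, not from the original text.

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.01, n_steps=1000):
    """Minimize J(theta) = (1/2m) * ||X @ theta - y||^2 via batch GD."""
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_steps):
        # Gradient of the cost w.r.t. theta, computed on the ENTIRE training set
        grad = X.T @ (X @ theta - y) / m
        # Step in the opposite direction of the gradient, scaled by the learning rate
        theta = theta - eta * grad
    return theta

# Usage: recover the slope of y = 2x (plus noise) from synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=100)
print(batch_gradient_descent(X, y))  # approximately [2.0]
```

Because every update touches the full dataset, each step is accurate but expensive; this is the cost/accuracy trade-off that motivates the other two variants.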