Gradient descent
Gradient descent minimizes a cost function. For example, MSE (Mean Squared Error) can be expressed as $J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$. More generally, $J(\theta) = \frac{1}{m}\sum_{i=1}^{m} L\left(h_\theta(x^{(i)}), y^{(i)}\right)$ for a per-sample loss $L$. Its gradient can be formulated as $\nabla_\theta J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \nabla_\theta L\left(h_\theta(x^{(i)}), y^{(i)}\right)$.
Computing this gradient requires iterating over all samples and summing their contributions. When the number of samples is very large, every single update becomes very time-consuming.
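To make the cost of a full-batch update concrete, here is a minimal NumPy sketch, assuming a linear model $h_\theta(x) = x^\top \theta$; the names `X`, `y`, and `theta` are illustrative, not from the original text:

```python
import numpy as np

def mse(theta, X, y):
    # J(theta) = (1/m) * sum_i (h(x_i) - y_i)^2 for a linear model h(x) = x @ theta
    errors = X @ theta - y
    return np.mean(errors ** 2)

def full_batch_gradient(theta, X, y):
    # grad J = (2/m) * sum_i (h(x_i) - y_i) * x_i
    # One pass over *all* m samples per update, which is what makes
    # each step expensive on a large dataset.
    m = len(y)
    errors = X @ theta - y
    return (2.0 / m) * (X.T @ errors)
```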
To overcome this problem, we divide the data into smaller batches, feed them to the model one batch at a time, and update the weights of the neural network at the end of every step.
Gradient descent is iterative: at each step the parameters move against the gradient, $\theta \leftarrow \theta - \eta\,\nabla_\theta J(\theta)$, where the learning rate $\eta$ controls the step size. Its main variants differ in how many samples are used per update:
- Batch Gradient Descent: iterates over all samples at once for each update.
- Stochastic Gradient Descent: uses one sample at each iteration.
- Mini-Batch Gradient Descent: a balance between batch gradient descent and stochastic gradient descent. The samples are divided into several batches, and all of the batches together comprise one epoch (as sketched below).
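The following sketch shows mini-batch gradient descent on the same linear-model MSE as above; the function name `minibatch_gd` and the default values of `lr`, `batch_size`, and `epochs` are illustrative assumptions:

```python
import numpy as np

def minibatch_gd(theta, X, y, lr=0.01, batch_size=32, epochs=10):
    # batch_size = m      -> batch gradient descent
    # batch_size = 1      -> stochastic gradient descent
    # 1 < batch_size < m  -> mini-batch gradient descent
    m = len(y)
    for _ in range(epochs):                   # one epoch = every batch seen once
        order = np.random.permutation(m)      # reshuffle samples each epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            errors = X[idx] @ theta - y[idx]
            grad = (2.0 / len(idx)) * (X[idx].T @ errors)
            theta = theta - lr * grad         # step scaled by the learning rate
    return theta
```

Note that the same loop covers all three variants: setting `batch_size` to 1 recovers stochastic gradient descent, and setting it to the dataset size recovers batch gradient descent.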