Large scale machine learning
Learning with large datasets
Stochastic gradient descent
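Batch gradient descent:
Repeat {
  $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$   (for every $j = 0, \dots, n$)
}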
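A minimal sketch of the batch update for linear regression, in plain Python. The hypothesis h_theta(x) = theta . x, the toy data, and the values of alpha and iterations are assumptions chosen for illustration. The point to notice is that every single update loops over all m examples, which is what becomes expensive when m is very large.

```python
def predict(theta, x):
    """Linear hypothesis h_theta(x) = theta . x (x carries a leading 1 for the intercept)."""
    return sum(t * xi for t, xi in zip(theta, x))


def batch_gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Each update uses the gradient summed over ALL m training examples."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [predict(theta, X[i]) - y[i] for i in range(m)]
        grad = [sum(errors[i] * X[i][j] for i in range(m)) / m for j in range(n)]
        # Simultaneous update of every theta_j.
        theta = [theta[j] - alpha * grad[j] for j in range(n)]
    return theta


# Toy data generated from y = 1 + 2*x (hypothetical, for illustration only).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = batch_gradient_descent(X, y)
```

On this toy data theta should end up close to [1, 2].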
Stochastic gradient descent:
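- Randomly shuffle the dataset.
- Repeat { for $i = 1, \dots, m$ { $\theta_j := \theta_j - \alpha (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$   (for every $j = 0, \dots, n$) } }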
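The two steps above can be sketched as follows for linear regression. This is an illustrative sketch, not a reference implementation: the function name, the toy data, and the values of alpha, passes and seed are assumptions. Each update touches one example only, so the parameters start improving long before a full pass over the data is finished.

```python
import random


def stochastic_gradient_descent(X, y, alpha=0.05, passes=500, seed=0):
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    order = list(range(m))
    rng.shuffle(order)                      # randomly shuffle the dataset (once)
    for _ in range(passes):                 # Repeat {
        for i in order:                     #   for i = 1, ..., m {
            error = sum(t * xi for t, xi in zip(theta, X[i])) - y[i]
            # Update every theta_j using this single example only.
            theta = [theta[j] - alpha * error * X[i][j] for j in range(n)]
    return theta


# Toy data generated from y = 1 + 2*x (hypothetical, noise-free).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = stochastic_gradient_descent(X, y)
```

Because the toy data is noise-free, theta settles near [1, 2]; on noisy data with a constant alpha it would instead keep wandering in a region around the minimum.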
Mini-batch gradient descent
Mini-batch gradient descent: Use $b$ examples in each iteration.
$b$ = mini-batch size
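A small sketch of the mini-batch variant for linear regression, where b is the mini-batch size. The function name, toy data, and the values of b, alpha and passes are assumptions for illustration. Setting b = 1 recovers stochastic gradient descent and b = m recovers batch gradient descent.

```python
def mini_batch_gradient_descent(X, y, b=2, alpha=0.1, passes=500):
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(passes):
        for start in range(0, m, b):
            # Indices of the current mini-batch; the last one may hold fewer than b examples.
            batch = range(start, min(start + b, m))
            errors = [sum(t * xi for t, xi in zip(theta, X[i])) - y[i] for i in batch]
            # Average the gradient over the examples in this mini-batch only.
            grad = [sum(e * X[i][j] for e, i in zip(errors, batch)) / len(batch)
                    for j in range(n)]
            theta = [theta[j] - alpha * grad[j] for j in range(n)]
    return theta


# Toy data generated from y = 1 + 2*x (hypothetical, noise-free).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = mini_batch_gradient_descent(X, y)
```

In practice the inner sums are usually written as vectorized operations, which is where mini-batches can gain speed over processing one example at a time.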
Stochastic gradient descent convergence
Checking for convergence:
- Batch gradient descent: plot $J_{train}(\theta)$ as a function of the number of iterations of gradient descent.
- Stochastic gradient descent: every 1000 iterations (say), plot $cost(\theta, (x^{(i)}, y^{(i)}))$ averaged over the last 1000 examples processed by the algorithm.
For stochastic gradient descent: the learning rate $\alpha$ is typically held constant. It can be slowly decreased over time if we want $\theta$ to converge (e.g. $\alpha = \frac{const1}{iterationNumber + const2}$).
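The monitoring idea and a slowly decreasing learning rate can be sketched together. Everything specific here is an assumption for illustration: the decay schedule alpha = const1 / (iteration + const2), the constants, the synthetic data, and a window of 100 instead of 1000 because the toy dataset is small. The cost of each example is recorded before theta is updated with it, and the average over each window is what would be plotted.

```python
import random


def sgd_with_monitoring(X, y, const1=5.0, const2=50.0, window=100, passes=50, seed=0):
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    order = list(range(m))
    rng.shuffle(order)
    averaged_costs, running_cost, iteration = [], 0.0, 0
    for _ in range(passes):
        for i in order:
            iteration += 1
            error = sum(t * xi for t, xi in zip(theta, X[i])) - y[i]
            running_cost += 0.5 * error ** 2          # cost on this example, before the update
            alpha = const1 / (iteration + const2)     # slowly decreasing learning rate
            theta = [theta[j] - alpha * error * X[i][j] for j in range(n)]
            if iteration % window == 0:
                # One point of the convergence plot: average over the last `window` examples.
                averaged_costs.append(running_cost / window)
                running_cost = 0.0
    return theta, averaged_costs


# Hypothetical noisy data scattered around y = 1 + 2*x.
data_rng = random.Random(1)
X = [[1.0, k / 199.0] for k in range(200)]
y = [1.0 + 2.0 * x[1] + data_rng.gauss(0.0, 0.1) for x in X]
theta, averaged_costs = sgd_with_monitoring(X, y)
```

A curve of averaged_costs that trends downward and flattens suggests convergence; a noisy curve can be smoothed by averaging over a larger window, and a rising curve suggests the learning rate is too large.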
Online learning
Process each example once as it arrives: update the parameters with it, then discard it.
Example: predicting CTR (click-through rate).
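A hedged sketch of online learning for CTR prediction with logistic regression. The stream is simulated: the single feature (a relevance score in [0, 1]), true_theta, alpha, and the number of events are all hypothetical. Each (x, y) pair is used for one update and never stored.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_ctr(theta, x):
    """Estimated click probability p(y = 1 | x; theta)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def online_update(theta, x, y, alpha=0.05):
    """One gradient step on a single (x, y) pair, which is then thrown away."""
    error = predict_ctr(theta, x) - y
    return [t - alpha * error * xi for t, xi in zip(theta, x)]


# Simulated stream (hypothetical): x = [1, relevance score], y = 1 if the user clicked.
rng = random.Random(0)
true_theta = [-1.0, 2.0]
theta = [0.0, 0.0]
for _ in range(20000):
    x = [1.0, rng.random()]
    y = 1 if rng.random() < predict_ctr(true_theta, x) else 0
    theta = online_update(theta, x, y)    # learn from the example; nothing is stored

ctr_low = predict_ctr(theta, [1.0, 0.1])
ctr_high = predict_ctr(theta, [1.0, 0.9])
```

Because the model keeps updating on fresh examples, it can follow changes in user behaviour over time without retraining on a stored dataset.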
Map-reduce and data parallelism
Divide the work into many parts and compute them at the same time on different machines.
Map-reduce and summation over the training set:
Many learning algorithms can be expressed as computing sums of functions over the training set.
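To illustrate, the gradient of linear regression is such a sum, so it can be split into partial sums and recombined. The sketch below only simulates the idea on one machine: the function names, the number of machines, and the toy data are assumptions, and the map step runs sequentially here rather than on separate machines.

```python
def partial_gradient_sum(theta, X_chunk, y_chunk):
    """Map step: unnormalised gradient sum over one chunk of the training set."""
    grad = [0.0] * len(theta)
    for x, target in zip(X_chunk, y_chunk):
        error = sum(t * xi for t, xi in zip(theta, x)) - target
        for j, xj in enumerate(x):
            grad[j] += error * xj
    return grad


def split(seq, parts):
    """Cut seq into at most `parts` contiguous chunks of nearly equal size."""
    size = (len(seq) + parts - 1) // parts
    return [seq[k:k + size] for k in range(0, len(seq), size)]


def map_reduce_gradient(theta, X, y, machines=4):
    # Map: each chunk would be sent to its own machine; here they run one after another.
    partials = [partial_gradient_sum(theta, Xc, yc)
                for Xc, yc in zip(split(X, machines), split(y, machines))]
    # Reduce: a central server adds the partial sums and applies the 1/m factor.
    m = len(X)
    return [sum(p[j] for p in partials) / m for j in range(len(theta))]


# Toy data (hypothetical): 8 examples from y = 1 + 2*x.
X = [[1.0, float(k)] for k in range(8)]
y = [1.0 + 2.0 * k for k in range(8)]
theta = [0.5, 0.5]
distributed = map_reduce_gradient(theta, X, y)
single = [g / len(X) for g in partial_gradient_sum(theta, X, y)]
```

The two results agree up to floating-point rounding, which is why the split does not change the learning algorithm, only where the work is done.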
Multi-core machines:
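The same splitting idea can be applied across the cores of a single machine. The sketch below computes the squared-error training cost in chunks handed to a pool of workers. It is an assumption-laden illustration: the function names and data are hypothetical, and a thread pool from the standard library is used only to show the structure. In CPython, pure-Python arithmetic like this is limited by the global interpreter lock, so a real speedup would need a process pool or a numerical library that parallelizes internally.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_cost(args):
    """Sum of squared errors over one chunk of (x, y) pairs."""
    theta, chunk = args
    return sum((sum(t * xi for t, xi in zip(theta, x)) - target) ** 2
               for x, target in chunk)


def parallel_cost(theta, data, workers=4):
    size = (len(data) + workers - 1) // workers
    chunks = [data[k:k + size] for k in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker handles one chunk; results come back in chunk order.
        partials = list(pool.map(partial_cost, [(theta, c) for c in chunks]))
    # Combine the partial sums into J_train = (1 / 2m) * sum of squared errors.
    return sum(partials) / (2 * len(data))


# Toy data (hypothetical): 8 examples from y = 1 + 2*x.
data = [([1.0, float(k)], 1.0 + 2.0 * k) for k in range(8)]
cost_at_optimum = parallel_cost([1.0, 2.0], data)
cost_at_zero = parallel_cost([0.0, 0.0], data)
```

Working on one machine avoids the network transfer that a multi-machine setup has to pay for when sending chunks out and collecting partial sums.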