1
1.1 Notations
| Notation | Description |
|---|---|
| x(i)&lt;t&gt; | The t-th element of the input sequence of the i-th training example |
| y(i)&lt;t&gt; | The t-th element of the output sequence of the i-th training example |
| Tx(i) | The length of the input sequence of the i-th training example |
| Ty(i) | The length of the output sequence of the i-th training example |
- NLP: Natural Language Processing.
- One-hot representation: a column vector that is all zeros except for a 1 in the position corresponding to the word's index in the vocabulary; this vector is called a one-hot vector (see the sketch after this list).
- UNK: Unknown word token, used for words that are not in your vocabulary.
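As an illustration, here is a minimal sketch of building a one-hot vector for a word. The tiny vocabulary and the `word_to_index` mapping are assumptions for the example, not part of the notes:

```python
import numpy as np

# Hypothetical toy vocabulary; a real vocabulary would be much larger
# (e.g., 10,000 words) and would map out-of-vocabulary words to <UNK>.
vocab = ["a", "and", "harry", "potter", "the", "<UNK>"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word, word_to_index, vocab_size):
    """Return a column vector of zeros with a 1 at the word's vocabulary index."""
    index = word_to_index.get(word, word_to_index["<UNK>"])  # unknown words map to <UNK>
    vector = np.zeros((vocab_size, 1))
    vector[index] = 1.0
    return vector

x = one_hot("harry", word_to_index, len(vocab))  # shape (6, 1), with a 1 at index 2
```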
1.2 Recurrent Neural Network
Limitation of the (unidirectional) RNN: the prediction at a given time step uses information from earlier in the sequence, but not information from later in the sequence.
Forward Propagation
In the basic RNN, forward propagation computes a&lt;t&gt; = g(Waa a&lt;t-1&gt; + Wax x&lt;t&gt; + ba) and yhat&lt;t&gt; = g(Wya a&lt;t&gt; + by). We can compress Waa and Wax into a single matrix Wa (stacking them side by side) and stack a&lt;t-1&gt; on top of x&lt;t&gt;, which lets us simplify the expressions.
So the simpler version of forward propagation is:
a&lt;t&gt; = g(Wa [a&lt;t-1&gt;, x&lt;t&gt;] + ba)

yhat&lt;t&gt; = g(Wy a&lt;t&gt; + by)
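A minimal sketch of one forward step of this cell in NumPy, assuming tanh for the hidden activation and softmax for the output; the function name `rnn_cell_forward` and the random shapes are illustrative only:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

def rnn_cell_forward(x_t, a_prev, Wa, ba, Wy, by):
    """One time step: a<t> = tanh(Wa [a<t-1>, x<t>] + ba), yhat<t> = softmax(Wy a<t> + by)."""
    # Stack a<t-1> on top of x<t>, so a single matrix Wa replaces [Waa, Wax].
    concat = np.vstack([a_prev, x_t])          # shape (n_a + n_x, m)
    a_t = np.tanh(Wa @ concat + ba)            # shape (n_a, m)
    y_hat_t = softmax(Wy @ a_t + by)           # shape (n_y, m)
    return a_t, y_hat_t

# Tiny example: n_x inputs, n_a hidden units, n_y outputs, m examples in a batch.
n_x, n_a, n_y, m = 3, 5, 2, 4
rng = np.random.default_rng(0)
x_t = rng.standard_normal((n_x, m))
a_prev = rng.standard_normal((n_a, m))
Wa = rng.standard_normal((n_a, n_a + n_x))
ba = np.zeros((n_a, 1))
Wy = rng.standard_normal((n_y, n_a))
by = np.zeros((n_y, 1))
a_t, y_hat_t = rnn_cell_forward(x_t, a_prev, Wa, ba, Wy, by)
```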
Backward Propagation
Use cross-entropy to define the loss element-wise at each time step, L&lt;t&gt;(yhat&lt;t&gt;, y&lt;t&gt;) = -sum over i of y_i&lt;t&gt; log(yhat_i&lt;t&gt;), and the cost function is the sum over all time steps of these per-step losses between yhat&lt;t&gt; and y&lt;t&gt;.
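A minimal sketch of this cost under the same notation, assuming one-hot targets y&lt;t&gt; and softmax outputs yhat&lt;t&gt; stored column by column:

```python
import numpy as np

def sequence_cost(y_hat, y):
    """Sum over time steps of the cross-entropy loss L<t> = -sum_i y_i<t> * log(yhat_i<t>).

    y_hat, y: arrays of shape (n_y, T) holding yhat<t> and one-hot y<t> as columns.
    """
    eps = 1e-12                                                # avoid log(0)
    per_step_loss = -np.sum(y * np.log(y_hat + eps), axis=0)   # L<t> for each t
    return np.sum(per_step_loss)                               # cost = sum over t of L<t>
```

Backpropagation through time then differentiates this summed cost with respect to the shared parameters, accumulating gradient contributions from every time step.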
1.3 Different Architectures for RNN
- Many-to-many (Tx = Ty): output a prediction at every input time step, e.g. named entity recognition
- Many-to-one: read through the whole sequence and output a single value, e.g. sentiment classification (see the sketch after this list)
- One-to-many: read a single input and keep generating outputs, feeding the activations (and typically the previous output) back in, e.g. music generation
- Many-to-many (Tx != Ty): first read through the whole input sequence, then start generating outputs with only the activations as inputs, e.g. machine translation
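As an illustration of the many-to-one case, here is a minimal sketch that reads a whole sequence and emits a single output from the final activation. It mirrors the cell sketched above; the sigmoid output head and the function name are assumptions for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def many_to_one_forward(x, a0, Wa, ba, Wy, by):
    """Run the RNN over all Tx time steps and output one value from the final activation.

    x: input sequence of shape (n_x, m, Tx); a0: initial activation of shape (n_a, m).
    """
    a_t = a0
    Tx = x.shape[2]
    for t in range(Tx):                       # read through the whole sequence
        concat = np.vstack([a_t, x[:, :, t]])
        a_t = np.tanh(Wa @ concat + ba)       # a<t> depends on a<t-1> and x<t>
    return sigmoid(Wy @ a_t + by)             # single output, e.g. a sentiment score
```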