Transformer Primer: Jay Alammar
Contents
What I learned from the Transformer primer by Jay Alammar
- The basic idea of the Transformer is that it is built around multi-headed attention and positional encoding
- If one pops open an encoding layer in the encoder, it contains the following parts (a minimal sketch follows this list):
- Multi-headed attention layers
- Layer Normalization
- Residual Connections
- Fully Connected layers
- If one pops open a decoding layer in the decoder, it contains the following parts (also sketched after this list):
- Multi-headed attention
- Encoder-Decoder attention
- Residual Connection
- Layer Normalization
- Fully Connected Layers
- \( Z = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{d_k}}\right) V \) gives the self-attention in matrix form (written out in code after this list)
- Positional encoding using sines and cosines (a small code sketch follows the list)
- Beam Search - I had learnt about it long ago in Andrew Ng's course; relearnt it from the nice explanation in this blog post (a toy sketch follows the list)
- Visualization of positional encoding using sines and cosines
- Need to explore the PyTorch implementation of Transformers (a starting point with nn.Transformer is sketched below)
- Visuals that capture the encoding and decoding phases
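
Below are a few minimal sketches of the pieces above. All of them assume PyTorch; the class and function names, dimensions, and toy inputs are my own illustrative choices, not something from the primer. First, one encoder layer: multi-headed self-attention, a residual connection and layer normalization, then fully connected layers with another residual and norm.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder layer: self-attention -> add & norm -> feed-forward -> add & norm."""
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Multi-headed self-attention, then a residual connection and layer normalization.
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Position-wise fully connected layers, again with residual + layer norm.
        return self.norm2(x + self.ff(x))

x = torch.randn(2, 10, 512)        # (batch, sequence length, d_model)
print(EncoderLayer()(x).shape)     # torch.Size([2, 10, 512])
```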
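
A decoder layer adds encoder-decoder attention between the self-attention and the fully connected layers; same caveats as above.

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """One decoder layer: self-attention, encoder-decoder attention, feed-forward."""
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, tgt, memory, tgt_mask=None):
        # Self-attention over the target tokens (masked during training), residual + norm.
        sa, _ = self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)
        tgt = self.norm1(tgt + sa)
        # Encoder-decoder attention: queries from the decoder, keys/values from the encoder output.
        ca, _ = self.cross_attn(tgt, memory, memory)
        tgt = self.norm2(tgt + ca)
        # Fully connected layers with the final residual connection and layer norm.
        return self.norm3(tgt + self.ff(tgt))

memory = torch.randn(2, 10, 512)   # stand-in for the encoder output
tgt = torch.randn(2, 7, 512)
print(DecoderLayer()(tgt, memory).shape)   # torch.Size([2, 7, 512])
```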
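
The matrix form of self-attention, written out directly. In the real model Q, K, and V come from learned projections of the embeddings; here they are just random tensors of plausible shape.

```python
import torch
import torch.nn.functional as F

def self_attention(Q, K, V):
    """Z = softmax(Q K^T / sqrt(d_k)) V, with the softmax taken row by row."""
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # (seq, seq) similarity scores
    weights = F.softmax(scores, dim=-1)             # each row sums to 1
    return weights @ V                              # weighted sum of the value vectors

Q, K, V = (torch.randn(5, 64) for _ in range(3))    # 5 tokens, d_k = 64
print(self_attention(Q, K, V).shape)                # torch.Size([5, 64])
```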
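
A small sketch of the sinusoidal positional encoding (sines on even dimensions, cosines on odd ones), assuming an even d_model; plotting the result gives a visualization similar to the one in the post.

```python
import torch

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding: sin on even dimensions, cos on odd ones."""
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)            # (max_len, 1)
    div = torch.pow(10000.0, torch.arange(0, d_model, 2).float() / d_model)  # one frequency per dimension pair
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos / div)   # even dimensions: sine
    pe[:, 1::2] = torch.cos(pos / div)   # odd dimensions: cosine
    return pe

pe = positional_encoding(max_len=50, d_model=128)
print(pe.shape)   # torch.Size([50, 128]); e.g. plt.imshow(pe) shows the striped pattern
```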
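
A toy beam search, just to pin down the idea: instead of greedily taking the single best token at each step, keep the beam_width highest-scoring partial sequences. The step_log_probs hook is hypothetical; in a real decoder it would return next-token log-probabilities from the model.

```python
import math

def beam_search(step_log_probs, vocab, beam_width=3, max_len=5, eos="<eos>"):
    """Keep the beam_width highest-scoring partial sequences at every step."""
    beams = [([], 0.0)]                                   # (sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:                    # finished hypotheses carry over unchanged
                candidates.append((seq, score))
                continue
            log_probs = step_log_probs(seq)               # hypothetical model hook
            for tok in vocab:
                candidates.append((seq + [tok], score + log_probs[tok]))
        # Prune back down to the beam_width best partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

# Toy distribution, fixed at every step, just to exercise the search.
vocab = ["a", "b", "<eos>"]
dist = {"a": math.log(0.5), "b": math.log(0.3), "<eos>": math.log(0.2)}
print(beam_search(lambda prefix: dist, vocab, beam_width=2, max_len=3))
```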
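
For the PyTorch exploration noted above, torch.nn.Transformer bundles the encoder and decoder stacks (nn.TransformerEncoderLayer and nn.TransformerDecoderLayer expose individual layers); a minimal call looks like this.

```python
import torch
import torch.nn as nn

# nn.Transformer operates on embeddings; token embedding and the final
# output projection / softmax are supplied separately by the user.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6,
                       batch_first=True)
src = torch.randn(2, 10, 512)   # (batch, source length, d_model)
tgt = torch.randn(2, 7, 512)    # (batch, target length, d_model)
out = model(src, tgt)
print(out.shape)                # torch.Size([2, 7, 512])
```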