BERT (Bidirectional Encoder Representations from Transformers) is a variant of the Transformer architecture built from the encoder block only. Unlike models that read text sequentially (left to right), BERT analyzes text bidirectionally, drawing context from both the left and the right of each word.
Context:
- Mechanism: It is trained to predict masked ("hidden") words in a sentence (e.g., "This is an [MASK] of how LLMs can perform"), a technique known as masked language modeling; see the sketch after this list.
- Strength: Because it reads the sentence from both directions, it is particularly good at capturing nuance and at tasks such as sentiment analysis (a second sketch follows below).
- Difference from GPT: BERT uses only the encoder, whereas GPT uses only the decoder and generates text left to right.
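
A minimal sketch of masked-word prediction, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint (neither is named in the text above; they are just a convenient way to illustrate the mechanism):

```python
from transformers import pipeline

# The fill-mask pipeline loads BERT and predicts the token hidden behind [MASK].
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT scores candidate words using context from BOTH sides of the mask.
predictions = unmasker("This is an [MASK] of how a language model can perform.")

for p in predictions:
    # Each prediction carries the filled-in token and a confidence score.
    print(f"{p['token_str']:>12}  {p['score']:.3f}")
```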

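And a sketch of the sentiment-analysis use case, again assuming the Hugging Face `transformers` library; its default sentiment checkpoint is a DistilBERT model fine-tuned on SST-2, i.e., a BERT-style encoder adapted for classification:

```python
from transformers import pipeline

# Encoder-only classifier: the model reads the whole sentence bidirectionally
# before assigning a sentiment label.
classifier = pipeline("sentiment-analysis")

# Prints a label (e.g., POSITIVE or NEGATIVE) with a confidence score.
print(classifier("The acting was subtle, but it made the film worth watching."))
```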