INFO
Measures how well a language model predicts a sample; lower values indicate better predictions.
How It Works
Perplexity quantifies how “surprised” a language model is by a sequence of tokens.
For a sequence of tokens $x_1, x_2, \dots, x_N$:

$$\mathrm{PPL}(X) = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i \mid x_{<i})\right)$$

- $p(x_i \mid x_{<i})$: Probability of token $x_i$ given its preceding context
- $N$: Number of tokens in the sequence
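The definition above can be sketched directly in code: given the per-token conditional probabilities a model assigns to a sequence, perplexity is the exponential of the average negative log-probability. The function name and example probabilities below are illustrative, not from any particular library.

```python
import math

def perplexity(token_probs):
    """Perplexity from per-token conditional probabilities p(x_i | x_<i).

    Computes exp of the average negative log-probability over the sequence.
    """
    n = len(token_probs)
    avg_nll = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_nll)

# A model that assigns probability 0.25 to every token is, on average,
# as uncertain as a uniform choice among 4 options: perplexity ≈ 4.
print(perplexity([0.25, 0.25, 0.25]))
```

In practice the per-token probabilities come from the language model itself (e.g., the softmax output of a causal LM at each position), and the average is often computed over log-probabilities returned directly by the model rather than raw probabilities, for numerical stability.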
What to Look For
- Lower perplexity = the model assigns higher probability to the text, which generally tracks fluency and grammatical structure
- Sensitive to vocabulary and tokenization, so scores are not directly comparable across models with different tokenizers
- Depends on the evaluation model used (e.g., a GPT-style causal LM, BERT via pseudo-perplexity, or an n-gram model)