
Measures how well a language model predicts a sample; lower values indicate better predictions.

How It Works

Perplexity quantifies how “surprised” a language model is by a sequence of tokens.
For a sequence of tokens w_1, w_2, …, w_N, perplexity is the exponentiated average negative log-likelihood:

  PPL(W) = exp( -(1/N) · Σ_{i=1}^{N} log p(w_i | w_1, …, w_{i-1}) )

  • p(w_i | w_1, …, w_{i-1}): Probability of token w_i given its preceding context
  • N: Number of tokens in the sequence
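The definition above can be computed directly from per-token conditional probabilities; a minimal sketch (the function name and inputs are illustrative, not part of any particular library):

```python
import math

def perplexity(token_probs):
    """Perplexity from per-token conditional probabilities p(w_i | context)."""
    n = len(token_probs)
    # exp of the average negative log-probability over the sequence
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A model that assigns probability 0.25 to every token has perplexity 4:
# it is, on average, as "surprised" as if choosing among 4 equally likely tokens.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # → 4.0
```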

What to Look For

  • Lower perplexity means the model assigns higher probability to the text, which typically correlates with fluency and grammatical structure
  • Sensitive to vocabulary and tokenization, so scores are not directly comparable across models with different tokenizers
  • Depends on the evaluation model used (e.g., GPT, BERT, n-gram)
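The last point — that the score depends on the evaluation model — can be illustrated with toy unigram models; the corpora and names below are hypothetical, and a real evaluation would use a trained language model:

```python
import math
from collections import Counter

def unigram_perplexity(tokens, counts, total):
    # Perplexity under a unigram model: exp of the average negative log p(w)
    return math.exp(-sum(math.log(counts[t] / total) for t in tokens) / len(tokens))

# Two toy "evaluation models" built from different corpora (made-up data)
counts_a = Counter("the cat sat on the mat".split())
counts_b = Counter("the dog ran in the park the cat sat".split())

text = "the cat sat".split()
ppl_a = unigram_perplexity(text, counts_a, sum(counts_a.values()))
ppl_b = unigram_perplexity(text, counts_b, sum(counts_b.values()))
print(round(ppl_a, 3), round(ppl_b, 3))  # same text, two different scores
```

The same sentence receives different perplexities under the two models, which is why perplexity comparisons are only meaningful when the evaluation model (and its tokenizer) is held fixed.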

Applicable Models