Chapter 8

When Poisson Breaks Down

Overdispersion and when to use alternatives

When Poisson Breaks Down: Overdispersion

A Poisson model assumes Var(X) = E[X] = λ: the variance equals the mean. Real sports data are often overdispersed, meaning the sample variance is larger than the sample mean. This usually happens when:

  • Outcomes cluster because the rate isn't truly constant (game script)
  • Opportunities are not independent
  • The player is genuinely boom-or-bust

Key Insight

If your data is overdispersed, Poisson will underestimate the chance of extreme outcomes—exactly where many prop markets live.

The Variance-to-Mean Ratio (VMR)

A simple diagnostic is the variance-to-mean ratio (VMR):

Variance-to-Mean Ratio

VMR = Sample Variance / Sample Mean
Excel: =VAR.S(range)/AVERAGE(range)

VMR Interpretation Guide

VMR Value | Interpretation | Recommended Model
VMR ≈ 1 | Poisson is a reasonable fit | Poisson
VMR > 1.3-1.5 | Overdispersed | Negative Binomial (Chapter 9)
VMR < 0.8 | Underdispersed (rare in sports) | May indicate dependencies

Calculating VMR in Excel

For game-by-game counts in a range (e.g., A1:A20):

Mean (λ̂):     =AVERAGE(A1:A20)
Sample Var:   =VAR.S(A1:A20)
VMR:          =VAR.S(A1:A20)/AVERAGE(A1:A20)

Note

Use VAR.S (sample variance, divides by n-1) rather than VAR.P (population variance). For betting, you're using historical games as a sample to estimate the process, so VAR.S is correct.
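
For readers working outside Excel, the same diagnostic can be sketched in Python. This is a minimal illustration, not a prescribed workflow: the function names `vmr` and `classify_vmr` are made up here, the sample counts are hypothetical, and treating everything between 0.8 and 1.3 as "roughly Poisson" is an assumption that fills the gap between the thresholds in the guide above.

```python
from statistics import mean, variance


def vmr(counts):
    """Variance-to-mean ratio using the sample variance (n - 1 denominator),
    which matches Excel's VAR.S rather than VAR.P."""
    m = mean(counts)
    if m == 0:
        raise ValueError("VMR is undefined when the mean is zero")
    return variance(counts) / m


def classify_vmr(ratio, low=0.8, high=1.3):
    """Map a VMR onto the interpretation guide.
    The 0.8 and 1.3 cut-offs come from the guide; the middle band is an assumption."""
    if ratio < low:
        return "underdispersed"
    if ratio > high:
        return "overdispersed"
    return "roughly Poisson"


# Hypothetical game-by-game counts, made up purely for illustration.
sample_counts = [2, 0, 1, 3, 1, 0, 2, 1]
ratio = vmr(sample_counts)
print(f"VMR = {ratio:.2f} -> {classify_vmr(ratio)}")
```

`statistics.variance` divides by n - 1, so it plays the role of VAR.S; `statistics.pvariance` would be the VAR.P equivalent.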

Why Overdispersion Matters for Betting

When Poisson underestimates tail probabilities, you systematically misprice extreme outcomes:

Scenario | Poisson Predicts | Overdispersed Reality
0 events (Under 0.5) | Lower probability | Higher probability
3+ events (Over 2.5) | Lower probability | Higher probability
Middle outcomes (1-2) | Higher probability | Lower probability

Translation: Overdispersed players are more boom-or-bust than Poisson suggests. They have more 0-event games AND more 3+ event games.
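
To make the direction of the mispricing concrete, the sketch below compares a Poisson model with a Negative Binomial model that has the same mean but a larger variance. The mean of 0.9 and the target VMR of 1.8 are assumed values chosen for illustration, and the mean/dispersion parameterization of the Negative Binomial (variance = μ + μ²/r) is one common convention, not necessarily the one used in Chapter 9.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def negbin_pmf(k, mu, r):
    """P(X = k) for a Negative Binomial with mean mu and dispersion r.
    Under this parameterization the variance is mu + mu**2 / r."""
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def summarize(pmf):
    """Return (P(0), P(1 or 2), P(3+)) for a pmf given as a function of k."""
    p0 = pmf(0)
    middle = pmf(1) + pmf(2)
    return p0, middle, 1 - p0 - middle


mu = 0.9          # assumed mean events per game (illustrative)
target_vmr = 1.8  # assumed overdispersion level (illustrative)
r = mu / (target_vmr - 1)  # because VMR = 1 + mu / r for this parameterization

pois = summarize(lambda k: poisson_pmf(k, mu))
nb = summarize(lambda k: negbin_pmf(k, mu, r))

for label, p, q in zip(("P(0)", "P(1-2)", "P(3+)"), pois, nb):
    print(f"{label:7s} Poisson {p:.3f}   NegBin {q:.3f}")
```

With these assumed inputs the Negative Binomial puts more probability on both 0 and 3+ events and less on the middle outcomes, which is the pattern the table describes.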

Example: Detecting Overdispersion

Player A: Consistent Receiver
Last 15 games, receiving TDs: 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1

Mean = 0.67 TDs/game
Sample Variance = 0.24
VMR = 0.24 / 0.67 = 0.36

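VMR = 0.36 < 0.8 → Underdispersed (very consistent). Here Poisson errs in the opposite direction from overdispersion: with λ = 0.67 it predicts about 51% zero-TD games and about 14% games with 2+ TDs, while this sample shows 33% and 0%. Treat a Poisson fit for this player with caution.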


Player B: Boom-or-Bust Back
Last 15 games, rushing TDs: 0, 0, 3, 0, 2, 0, 0, 0, 3, 1, 0, 2, 0, 0, 3

Mean = 0.93 TDs/game
Sample Variance = 1.64
VMR = 1.64 / 0.93 = 1.76

VMR = 1.76 > 1.5 → Overdispersed. This player is boom-or-bust. Consider Negative Binomial (Chapter 9).

Warning

Using Poisson for Player B would underestimate both P(0 TDs) and P(3+ TDs), potentially causing you to misprice anytime TD props and tail bets.
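
The size of that gap can be checked directly against Player B's game log. The sketch below fits λ as the sample mean and compares Poisson-predicted probabilities with the observed frequencies; the variable names are illustrative, and with only 15 games the observed frequencies are themselves noisy.

```python
import math
from statistics import mean

# Player B's rushing TDs from the example above.
player_b = [0, 0, 3, 0, 2, 0, 0, 0, 3, 1, 0, 2, 0, 0, 3]


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = mean(player_b)  # sample mean used as the Poisson rate

# What Poisson predicts for the two tails.
pred_p0 = poisson_pmf(0, lam)
pred_p3_plus = 1 - sum(poisson_pmf(k, lam) for k in range(3))

# What the 15-game sample actually shows.
obs_p0 = sum(1 for x in player_b if x == 0) / len(player_b)
obs_p3_plus = sum(1 for x in player_b if x >= 3) / len(player_b)

print(f"P(0 TDs):  Poisson {pred_p0:.1%}   observed {obs_p0:.1%}")
print(f"P(3+ TDs): Poisson {pred_p3_plus:.1%}   observed {obs_p3_plus:.1%}")
```

Poisson predicts roughly 39% zero-TD games and 7% games with 3+ TDs, against 60% and 20% in the sample.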

Zero-Inflation: A Related Problem

Overdispersion isn't the only common issue with count data in sports. For props with especially low λ (e.g., home runs or anytime touchdowns), you might see even more zeros than Poisson expects—not just from variance, but from structural factors like a player being scripted out of the game entirely.

This is called zero-inflation, and it's addressed by specialized models:

  • Zero-Inflated Poisson (ZIP)
  • Hurdle distributions

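Both give zeros their own probability component. ZIP mixes a point mass of structural zeros with an ordinary Poisson count, while a hurdle model handles "zero versus positive" and "how many, given positive" as two separate stages.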
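
As a rough sketch of the ZIP idea, the probability mass function can be written in a few lines. The parameter name `pi` (the share of structural zeros) and the values λ = 1.0 and π = 0.2 are assumptions for illustration; fitting these parameters from data is a separate step not shown here.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-Inflated Poisson: with probability pi the outcome is a structural
    zero; otherwise the count is drawn from Poisson(lam)."""
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


lam, pi = 1.0, 0.2             # assumed, illustrative parameters
zip_mean = (1 - pi) * lam      # mean of the ZIP distribution

# Compare zeros under ZIP with a plain Poisson that has the same mean.
print(f"ZIP P(0)     = {zip_pmf(0, lam, pi):.3f}")
print(f"Poisson P(0) = {poisson_pmf(0, zip_mean):.3f}  (same mean, no inflation)")
```

Even with the mean held equal, the ZIP model assigns more probability to zero than a plain Poisson does.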

A Simple Check for Zero-Inflation

Compare your observed P(X=0) to the Poisson-predicted e^(-λ):

Observed P(0) vs. Predicted e^(-λ) | Interpretation
Within 5-10% | Poisson is fine
10-20% higher observed zeros | Consider ZIP/Hurdle models
20%+ higher observed zeros | Strong zero-inflation present
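
A minimal sketch of this check, using Player B's game log from earlier in the chapter. The helper name `zero_excess` is made up for illustration. The table does not say whether its percentages are percentage-point gaps or relative gaps, so the sketch reports both rather than assuming one.

```python
import math


def zero_excess(counts):
    """Compare the observed share of zeros with the Poisson prediction e^(-lam),
    where lam is estimated as the sample mean."""
    n = len(counts)
    lam = sum(counts) / n
    observed = counts.count(0) / n
    predicted = math.exp(-lam)
    return {
        "observed": observed,
        "predicted": predicted,
        "gap_points": observed - predicted,        # absolute gap (percentage points)
        "gap_relative": observed / predicted - 1,  # gap relative to the prediction
    }


# Player B's rushing TDs from the overdispersion example.
player_b = [0, 0, 3, 0, 2, 0, 0, 0, 3, 1, 0, 2, 0, 0, 3]
result = zero_excess(player_b)

print(f"Observed P(0):  {result['observed']:.1%}")
print(f"Predicted P(0): {result['predicted']:.1%}")
print(f"Gap: {result['gap_points']:.1%} points, {result['gap_relative']:.0%} relative")
```

Excess zeros alone do not separate zero-inflation from ordinary overdispersion; Player B also has a VMR of 1.76, so both explanations remain on the table.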

Decision Tree: Which Model to Use?

                    Calculate VMR
                         │
            ┌────────────┼────────────┐
            │            │            │
       VMR < 0.8    VMR ≈ 1.0    VMR > 1.3
            │            │            │
       Investigate   Poisson is    Consider
       for patterns    fine     Negative Binomial
                         │
                         ↓
              Check for excess zeros
                         │
              ┌──────────┴──────────┐
              │                     │
        Observed P(0)          Observed P(0)
        matches e^(-λ)         >> e^(-λ)
              │                     │
         Use Poisson          Consider ZIP
                              (Chapter 11)
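
The decision tree can also be expressed as a small function. This is a sketch under stated assumptions: the name `choose_model` is hypothetical, the 0.8 and 1.3 cut-offs are taken from the tree, and reading "Observed P(0) >> e^(-λ)" as a gap of more than 10 percentage points is an assumption, since the tree itself gives no number.

```python
import math


def choose_model(vmr, observed_p0, lam, vmr_low=0.8, vmr_high=1.3, zero_gap=0.10):
    """Walk the decision tree: check dispersion first, then excess zeros.
    zero_gap is an assumed threshold (in probability points), not from the text."""
    if vmr < vmr_low:
        return "investigate for patterns (underdispersed)"
    if vmr > vmr_high:
        return "consider Negative Binomial"
    # VMR is close to 1: Poisson is plausible, so check the zeros.
    if observed_p0 - math.exp(-lam) > zero_gap:
        return "consider ZIP"
    return "use Poisson"


# Player B from the worked example: VMR 1.76, 60% zeros, mean 0.93.
print(choose_model(vmr=1.76, observed_p0=0.60, lam=0.93))

# Player A from the worked example: VMR 0.36, 33% zeros, mean 0.67.
print(choose_model(vmr=0.36, observed_p0=0.33, lam=0.67))
```

The order of the checks mirrors the diagram: the excess-zero test is only reached on the VMR ≈ 1 branch.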

Practical Implications for Betting

When to Stick with Poisson

  • VMR is close to 1.0
  • Player has consistent role and usage
  • Game script doesn't dramatically affect opportunity
  • No structural reasons for excess zeros

When to Consider Alternatives

  • VMR > 1.3 → Negative Binomial (Chapter 9)
  • Excess zeros from structural factors → ZIP/Hurdle (Chapter 11)
  • Very high λ (> 20) → Normal approximation
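
For the high-λ case, a quick sketch shows how close the Normal approximation gets. It uses a Normal with mean λ and variance λ plus a continuity correction, which is the standard textbook form; λ = 25 and the threshold of 30 are assumed values for illustration.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson distribution with rate lam."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def normal_approx_cdf(k, lam):
    """Approximate P(X <= k) with Normal(mean=lam, variance=lam),
    using a continuity correction of +0.5."""
    z = (k + 0.5 - lam) / math.sqrt(lam)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


lam, k = 25, 30  # assumed, illustrative values
exact = poisson_cdf(k, lam)
approx = normal_approx_cdf(k, lam)

print(f"Exact Poisson P(X <= {k}):  {exact:.4f}")
print(f"Normal approximation:      {approx:.4f}")
```

At low λ, where most TD props live, the same approximation is noticeably worse, which is why it only appears in this list for large rates.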

Tip

If you're unsure, Poisson is still a reasonable starting point. Just be aware that for boom-or-bust players, you may be underestimating tail probabilities. Demand a larger edge before betting if you suspect overdispersion.


📝 Exercise

Instructions

Practice identifying when Poisson is appropriate and when alternatives are needed.

A player's last 10 games show touchdown counts of: 0, 1, 0, 1, 1, 0, 1, 0, 1, 1. The mean is 0.6 and variance is 0.27. What's the VMR and is Poisson appropriate?

A running back has λ = 0.7 TDs/game. Poisson predicts P(X=0) = e^(-0.7) = 49.7%. You observe he scored 0 TDs in 65% of his games. What does this suggest?

When Poisson underestimates variance (overdispersion), which probabilities are typically underestimated?