When Poisson Breaks Down: Overdispersion

Poisson assumes Var(X) = λ—the variance equals the mean. In real sports data, you often see overdispersion: the variance is larger than the mean. This usually happens when:

Outcomes cluster because the rate isn't truly constant (game script)
Opportunities are not independent
The player is genuinely boom-or-bust

Key Insight

If your data is overdispersed, Poisson will underestimate the chance of extreme outcomes—exactly where many prop markets live.

The Variance-to-Mean Ratio (VMR)

A simple diagnostic is the variance-to-mean ratio (VMR):

Variance-to-Mean Ratio

VMR = Sample Variance / Sample Mean

Excel: =VAR.S(range)/AVERAGE(range)
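If you work outside Excel, the same ratio takes a few lines of Python. The sketch below uses the standard-library statistics module, whose variance() divides by n - 1 like VAR.S. The function name vmr and the sample counts are illustrative only; they don't come from any particular betting package.

```python
from statistics import mean, variance

def vmr(counts):
    """Variance-to-mean ratio using the sample variance (n - 1 denominator),
    i.e. the same quantity as =VAR.S(range)/AVERAGE(range) in Excel."""
    m = mean(counts)
    if m == 0:
        raise ValueError("VMR is undefined when the mean is 0")
    return variance(counts) / m

games = [0, 2, 1, 3, 0, 1]  # hypothetical game-by-game counts
print(round(vmr(games), 2))  # 1.17
```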

VMR Interpretation Guide

VMR Value	Interpretation	Recommended Model
VMR ≈ 1 (roughly 0.8 to 1.3)	Poisson is a reasonable fit	Poisson
VMR > 1.3 to 1.5	Overdispersed	Negative Binomial (Chapter 9)
VMR < 0.8	Underdispersed (rare in sports)	Investigate; may indicate dependencies

Calculating VMR in Excel

For game-by-game counts in a range (e.g., A1:A20):

Mean (λ̂):     =AVERAGE(A1:A20)
Sample Var:   =VAR.S(A1:A20)
VMR:          =VAR.S(A1:A20)/AVERAGE(A1:A20)

Note

Use VAR.S (sample variance, divides by n-1) rather than VAR.P (population variance, divides by n). For betting, you're using historical games as a sample to estimate the underlying process, so VAR.S is correct.
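The n - 1 versus n distinction is easy to see in Python's statistics module, where variance() corresponds to VAR.S and pvariance() to VAR.P. The five-game sample below is made up; the point is that the two estimates differ by exactly the factor n / (n - 1), which matters most for short game logs.

```python
from statistics import pvariance, variance

games = [0, 1, 3, 0, 1]  # hypothetical 5-game sample of counts

sample_var = variance(games)        # divides by n - 1, like Excel VAR.S
population_var = pvariance(games)   # divides by n, like Excel VAR.P

n = len(games)
# The ratio of the two estimates equals n / (n - 1)
print(sample_var, population_var)
print(sample_var / population_var, n / (n - 1))
```

With a 15-game log the factor is 15/14, so using VAR.P would shrink the VMR by roughly 7%.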

Why Overdispersion Matters for Betting

When Poisson underestimates tail probabilities, you systematically misprice extreme outcomes:

Scenario	Poisson Predicts	Overdispersed Reality
0 events (Under 0.5)	Lower probability	Higher probability
3+ events (Over 2.5)	Lower probability	Higher probability
Middle outcomes (1-2)	Higher probability	Lower probability

Translation: Overdispersed players are more boom-or-bust than Poisson suggests. They have more 0-event games AND more 3+ event games.
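To put numbers on the table, the sketch below compares a Poisson with a negative binomial that has the same mean but more variance. It assumes the common mean/size parameterization, Var = μ + μ²/r, and picks μ = 1 and r = 2 (so VMR = 1.5) purely for illustration; Chapter 9 may parameterize the negative binomial differently.

```python
from math import exp, factorial, lgamma, log

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def negbin_pmf(k, mu, r):
    """Negative binomial with mean mu and size r (Var = mu + mu**2 / r)."""
    log_p = (lgamma(k + r) - lgamma(r) - lgamma(k + 1)
             + r * log(r / (r + mu)) + k * log(mu / (r + mu)))
    return exp(log_p)

mu, r = 1.0, 2.0  # assumed example: same mean, VMR = 1 + mu / r = 1.5

def buckets(pmf):
    """Split probability into 0 events, 1-2 events, and 3+ events."""
    p0 = pmf(0)
    middle = pmf(1) + pmf(2)
    return p0, middle, 1 - p0 - middle

pois = buckets(lambda k: poisson_pmf(k, mu))
nb = buckets(lambda k: negbin_pmf(k, mu, r))
for label, a, b in zip(["0 events", "1-2 events", "3+ events"], pois, nb):
    print(f"{label:>10}: Poisson {a:.3f}  NegBin {b:.3f}")
```

With these assumed parameters the negative binomial puts about 0.444 on zero events versus 0.368 for Poisson, about 0.111 on 3+ versus 0.080, and correspondingly less on 1-2 events (0.444 versus 0.552).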

Example: Detecting Overdispersion

Player A: Consistent Receiver

Last 15 games, receiving TDs: 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1

Mean = 0.67 TDs/game
Sample Variance = 0.24
VMR = 0.24 / 0.67 = 0.36

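VMR = 0.36 < 0.8 → Underdispersed (very consistent). Poisson errs in the opposite direction here: it assumes more spread than this player shows, so it overstates both P(0 TDs) and P(2+ TDs).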

Player B: Boom-or-Bust Back

Last 15 games, rushing TDs: 0, 0, 3, 0, 2, 0, 0, 0, 3, 1, 0, 2, 0, 0, 3

Mean = 0.93 TDs/game  
Sample Variance = 1.64
VMR = 1.64 / 0.93 = 1.76

VMR = 1.76 > 1.5 → Overdispersed. This player is boom-or-bust. Consider Negative Binomial (Chapter 9).
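Both VMR calculations can be reproduced in a few lines of Python, using statistics.variance (n - 1 denominator, matching VAR.S):

```python
from statistics import mean, variance

player_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # receiving TDs
player_b = [0, 0, 3, 0, 2, 0, 0, 0, 3, 1, 0, 2, 0, 0, 3]  # rushing TDs

for name, tds in [("Player A", player_a), ("Player B", player_b)]:
    m, v = mean(tds), variance(tds)  # variance() uses n - 1, like VAR.S
    print(f"{name}: mean={m:.2f}  var={v:.2f}  VMR={v / m:.2f}")
# Player A: mean=0.67  var=0.24  VMR=0.36
# Player B: mean=0.93  var=1.64  VMR=1.76
```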

Warning

Using Poisson for Player B would underestimate both P(0 TDs) and P(3+ TDs), potentially causing you to misprice anytime TD props and tail bets.
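To make the warning concrete, the sketch below compares Player B's observed rates with a Poisson fitted to his 15-game mean (λ̂ = 14/15 ≈ 0.93). It is a plain empirical check, not a fitted overdispersed model.

```python
from math import exp, factorial

player_b = [0, 0, 3, 0, 2, 0, 0, 0, 3, 1, 0, 2, 0, 0, 3]
lam = sum(player_b) / len(player_b)  # λ̂ = 14/15

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# Poisson-implied probabilities
pois_p0 = poisson_pmf(0, lam)
pois_p3_plus = 1 - sum(poisson_pmf(k, lam) for k in range(3))

# Observed frequencies over the 15 games
obs_p0 = sum(1 for x in player_b if x == 0) / len(player_b)
obs_p3_plus = sum(1 for x in player_b if x >= 3) / len(player_b)

print(f"P(0 TDs):  Poisson {pois_p0:.3f}  observed {obs_p0:.3f}")
print(f"P(3+ TDs): Poisson {pois_p3_plus:.3f}  observed {obs_p3_plus:.3f}")
```

Poisson gives roughly 39% for a scoreless game and about 7% for 3+ TDs, while Player B actually posted 0 TDs in 60% of these games and 3+ in 20%. Fifteen games is a small sample, so the observed rates are themselves noisy.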

Zero-Inflation: A Related Problem

Overdispersion isn't the only common issue with count data in sports. For props with especially low λ (e.g., home runs or anytime touchdowns), you may see even more zeros than Poisson expects. The cause is not only extra variance but structural factors, such as a player being scripted out of the game entirely.

This is called zero-inflation, and it's addressed by specialized models:

Zero-Inflated Poisson (ZIP)
Hurdle distributions

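These models treat zeros specially. A ZIP model mixes a point mass at zero with a Poisson, so a zero can come from either source; a hurdle model first decides zero versus positive, then models the positive counts separately.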
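A minimal sketch of the ZIP probability mass function, assuming the standard two-part form: with probability π a game is a structural zero, otherwise the count is Poisson(λ). The parameter values are hypothetical, and the comparison is against a plain Poisson with the same overall mean, (1 - π)λ.

```python
from math import exp, factorial

def zip_pmf(k, lam, pi):
    """Zero-Inflated Poisson: with probability pi the game is a structural zero;
    otherwise the count is drawn from Poisson(lam)."""
    poisson = exp(-lam) * lam**k / factorial(k)
    if k == 0:
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson

# Hypothetical: 20% structural zeros, λ = 0.8 when the player is involved
lam, pi = 0.8, 0.2
overall_mean = (1 - pi) * lam
print(f"ZIP P(0) = {zip_pmf(0, lam, pi):.3f}")
print(f"Poisson P(0) with the same mean = {exp(-overall_mean):.3f}")
```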

A Simple Check for Zero-Inflation

Compare your observed P(X=0) to the Poisson-predicted e^(-λ), where λ is the sample mean:

Observed P(0) minus predicted e^(-λ)	Interpretation
Within about 5-10 percentage points	Poisson is fine
10-20 percentage points higher	Consider ZIP/Hurdle models
20+ percentage points higher	Strong zero-inflation present
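The zero-inflation check can be written as a small function. This is a sketch under two assumptions: λ is estimated by the sample mean, and the gap is measured in percentage points using the cutoffs from the table. The function name and the 10-game log are hypothetical.

```python
from math import exp

def zero_inflation_check(counts):
    """Compare the observed share of zero games with the Poisson prediction e^(-λ̂).
    Assumption: gap in percentage points, cutoffs of 10 and 20 from the table."""
    n = len(counts)
    lam = sum(counts) / n
    predicted = exp(-lam)
    observed = counts.count(0) / n
    gap = (observed - predicted) * 100  # percentage points
    if gap >= 20:
        verdict = "Strong zero-inflation present"
    elif gap >= 10:
        verdict = "Consider ZIP/Hurdle models"
    else:
        verdict = "Poisson is fine"
    return observed, predicted, gap, verdict

log_10_games = [0, 0, 0, 2, 0, 0, 3, 0, 1, 0]  # hypothetical game log
obs, pred, gap, verdict = zero_inflation_check(log_10_games)
print(f"observed {obs:.2f}, predicted {pred:.2f}, gap {gap:.1f} pts: {verdict}")
```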

Decision Tree: Which Model to Use?

                    Calculate VMR
                         │
            ┌────────────┼────────────┐
            │            │            │
       VMR < 0.8    VMR ≈ 1.0    VMR > 1.3
            │            │            │
       Investigate   Poisson is    Consider
       for patterns    fine     Negative Binomial
                         │
                         ↓
              Check for excess zeros
                         │
              ┌──────────┴──────────┐
              │                     │
        Observed P(0)          Observed P(0)
        matches e^(-λ)         >> e^(-λ)
              │                     │
         Use Poisson          Consider ZIP
                              (Chapter 11)
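The decision tree above can be sketched as a single function. The thresholds (0.8 and 1.3 for VMR, and a 10-point zero gap standing in for ">>") are assumptions read off this section's tables, and the function expects at least two games with a nonzero mean.

```python
from math import exp
from statistics import mean, variance

def choose_model(counts, low=0.8, high=1.3, zero_gap=0.10):
    """Follow the VMR decision tree, then check for excess zeros."""
    lam = mean(counts)
    ratio = variance(counts) / lam  # VMR with the n - 1 sample variance
    if ratio < low:
        return "Underdispersed: investigate for patterns"
    if ratio > high:
        return "Overdispersed: consider Negative Binomial"
    observed_p0 = counts.count(0) / len(counts)
    if observed_p0 - exp(-lam) >= zero_gap:
        return "Excess zeros: consider ZIP"
    return "Poisson"

player_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
player_b = [0, 0, 3, 0, 2, 0, 0, 0, 3, 1, 0, 2, 0, 0, 3]
print(choose_model(player_a))  # Underdispersed: investigate for patterns
print(choose_model(player_b))  # Overdispersed: consider Negative Binomial
```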

Practical Implications for Betting

When to Stick with Poisson

VMR is close to 1.0
Player has consistent role and usage
Game script doesn't dramatically affect opportunity
No structural reasons for excess zeros

When to Consider Alternatives

VMR > 1.3 → Negative Binomial (Chapter 9)
Excess zeros from structural factors → ZIP/Hurdle (Chapter 11)
Very high λ (> 20) → Normal approximation
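For large λ the normal approximation uses mean λ and standard deviation √λ. The sketch below compares it with the exact Poisson CDF at a hypothetical λ = 25; the 0.5 continuity correction is a common convention assumed here, not something this section specifies.

```python
from math import erf, exp, lgamma, log, sqrt

def poisson_cdf(k, lam):
    # Sum Poisson probabilities in log space to avoid overflow for large lam
    return sum(exp(i * log(lam) - lam - lgamma(i + 1)) for i in range(k + 1))

def normal_approx_cdf(k, lam):
    # Normal(mean=lam, sd=sqrt(lam)) evaluated at k + 0.5 (continuity correction)
    z = (k + 0.5 - lam) / sqrt(lam)
    return 0.5 * (1 + erf(z / sqrt(2)))

lam = 25.0  # hypothetical high-volume stat
for line in (20.5, 24.5, 29.5):
    k = int(line)  # "Under 24.5" means X <= 24
    exact, approx = poisson_cdf(k, lam), normal_approx_cdf(k, lam)
    print(f"P(Under {line}): exact {exact:.3f}  normal {approx:.3f}")
```

At this λ the two agree to within about 1-2 percentage points across these lines, which is why the approximation becomes reasonable once λ is large.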

Tip

If you're unsure, Poisson is still a reasonable starting point. Just be aware that for boom-or-bust players, you may be underestimating tail probabilities. Demand a larger edge before betting if you suspect overdispersion.

📝 Exercise

Instructions

Practice identifying when Poisson is appropriate and when alternatives are needed.

A player's last 10 games show touchdown counts of: 0, 1, 0, 1, 1, 0, 1, 0, 1, 1. The mean is 0.6 and the sample variance (VAR.S) is 0.27. What is the VMR, and is Poisson appropriate?

A running back has λ = 0.7 TDs/game. Poisson predicts P(X=0) = e^(-0.7) = 49.7%. You observe he scored 0 TDs in 65% of his games. What does this suggest?

When Poisson underestimates variance (overdispersion), which probabilities are typically underestimated?