When Poisson Breaks Down: Overdispersion
Poisson assumes Var(X) = λ—the variance equals the mean. In real sports data, you often see overdispersion: the variance is larger than the mean. This usually happens when:
- Outcomes cluster because the rate isn't truly constant (game script)
- Opportunities are not independent
- The player is genuinely boom-or-bust
Key Insight
If your data is overdispersed, Poisson will underestimate the chance of extreme outcomes—exactly where many prop markets live.
The Variance-to-Mean Ratio (VMR)
A simple diagnostic is the variance-to-mean ratio (VMR):
Variance-to-Mean Ratio
VMR = Sample Variance / Sample Mean
=VAR.S(range)/AVERAGE(range)
VMR Interpretation Guide
| VMR Value | Interpretation | Recommended Model |
|---|---|---|
| VMR ≈ 1 | Poisson is a reasonable fit | Poisson |
| VMR above roughly 1.3 to 1.5 | Overdispersed | Negative Binomial (Chapter 9) |
| VMR < 0.8 | Underdispersed (rare in sports) | May indicate dependencies |
Calculating VMR in Excel
For game-by-game counts in a range (e.g., A1:A20):
Mean (λ̂): =AVERAGE(A1:A20)
Sample Var: =VAR.S(A1:A20)
VMR: =VAR.S(A1:A20)/AVERAGE(A1:A20)
Note
Use VAR.S (sample variance, divides by n-1) rather than VAR.P (population variance). For betting, you're using historical games as a sample to estimate the process, so VAR.S is correct.
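If you keep your game logs outside Excel, the same calculation can be sketched in Python. This is a minimal sketch, not the only way to do it: the helper name `vmr` is made up for illustration, and the two lists are the Player A and Player B counts from the worked example in this section. `statistics.variance` divides by n-1, so it plays the role of VAR.S (`statistics.pvariance` would be the VAR.P equivalent).

```python
from statistics import mean, variance  # variance() divides by n-1, like VAR.S


def vmr(counts):
    """Variance-to-mean ratio using the sample variance (n-1 denominator)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("VMR is undefined when the mean is zero")
    return variance(counts) / m


# Game-by-game TD counts taken from the worked example in this section
player_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
player_b = [0, 0, 3, 0, 2, 0, 0, 0, 3, 1, 0, 2, 0, 0, 3]

for name, counts in (("Player A", player_a), ("Player B", player_b)):
    print(f"{name}: mean={mean(counts):.2f} "
          f"var={variance(counts):.2f} VMR={vmr(counts):.2f}")
```

With only 15 games per player, treat the resulting ratio as a rough signal rather than a precise estimate.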
Why Overdispersion Matters for Betting
When Poisson underestimates tail probabilities, you systematically misprice extreme outcomes:
| Scenario | Poisson Predicts | Overdispersed Reality |
|---|---|---|
| 0 events (Under 0.5) | Lower probability | Higher probability |
| 3+ events (Over 2.5) | Lower probability | Higher probability |
| Middle outcomes (1-2) | Higher probability | Lower probability |
Translation: Overdispersed players are more boom-or-bust than Poisson suggests. They have more 0-event games AND more 3+ event games.
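To see the direction of the mispricing in numbers, here is a rough sketch that compares Poisson with one possible overdispersed alternative: a negative binomial matched to the same mean and VMR. The mean of 0.93 and VMR of 1.76 are illustrative inputs (they match Player B in the example that follows), and the mean/VMR parameterization and function names are assumptions made for this sketch, not necessarily how Chapter 9 sets the model up.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def negbin_pmf(k, mean, vmr):
    """Negative binomial pmf parameterized by mean and VMR (requires vmr > 1)."""
    r = mean / (vmr - 1.0)  # size parameter, chosen so the mean matches
    p = 1.0 / vmr           # success probability, so variance = mean * vmr
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def buckets(pmf):
    """Collapse a pmf into the three rows of the table: 0, 1-2, and 3+."""
    p0 = pmf(0)
    mid = pmf(1) + pmf(2)
    return {"0": p0, "1-2": mid, "3+": 1.0 - p0 - mid}


mean_rate, ratio = 0.93, 1.76  # illustrative inputs, see the lead-in
pois = buckets(lambda k: poisson_pmf(k, mean_rate))
nb = buckets(lambda k: negbin_pmf(k, mean_rate, ratio))

for name in pois:
    print(f"{name:>3}: Poisson {pois[name]:.3f}   NegBin {nb[name]:.3f}")
```

For these inputs the negative binomial puts more probability on 0 and on 3+ and less on 1-2, which is the pattern the table describes.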
Example: Detecting Overdispersion
Player A: Consistent Receiver
Last 15 games, receiving TDs: 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1
Mean = 0.67 TDs/game
Sample Variance = 0.24
VMR = 0.24 / 0.67 = 0.36
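VMR = 0.36 < 0.8 → Underdispersed (very consistent). Poisson errs in the opposite direction here: it overstates, rather than understates, the chance of 0-TD and multi-TD games, so the tail risk described above does not apply.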
Player B: Boom-or-Bust Back
Last 15 games, rushing TDs: 0, 0, 3, 0, 2, 0, 0, 0, 3, 1, 0, 2, 0, 0, 3
Mean = 0.93 TDs/game
Sample Variance = 1.64
VMR = 1.64 / 0.93 = 1.76
VMR = 1.76 > 1.5 → Overdispersed. This player is boom-or-bust. Consider Negative Binomial (Chapter 9).
Warning
Using Poisson for Player B would underestimate both P(0 TDs) and P(3+ TDs), potentially causing you to misprice anytime TD props and tail bets.
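The warning can be checked directly against Player B's game log. The sketch below sets λ to his sample mean and lines up how often each outcome actually occurred against what Poisson would predict. The variable names are illustrative, and with only 15 games the observed shares are noisy, so read this as a sanity check rather than a fitted model.

```python
import math
from collections import Counter

player_b = [0, 0, 3, 0, 2, 0, 0, 0, 3, 1, 0, 2, 0, 0, 3]
n_games = len(player_b)
lam = sum(player_b) / n_games  # sample mean used as the Poisson rate


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


# Observed game counts, with everything at 3 or above grouped into bucket 3
observed = Counter(min(x, 3) for x in player_b)

# Poisson probabilities for 0, 1, 2, then the remainder as "3 or more"
expected = {k: poisson_pmf(k, lam) for k in range(3)}
expected[3] = 1.0 - sum(expected.values())

for k in range(4):
    label = "3+" if k == 3 else str(k)
    print(f"{label:>2} TDs: observed {observed[k] / n_games:.1%}   "
          f"Poisson {expected[k]:.1%}")
```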
Zero-Inflation: A Related Problem
Overdispersion isn't the only common issue with count data in sports. For props with especially low λ (e.g., home runs or anytime touchdowns), you might see even more zeros than Poisson expects—not just from variance, but from structural factors like a player being scripted out of the game entirely.
This is called zero-inflation, and it's addressed by specialized models:
- Zero-Inflated Poisson (ZIP)
- Hurdle distributions
These models handle zero-inflation by separately modeling the probability of a zero versus positive counts.
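As a rough illustration of the "separate zero" idea, here is the standard Zero-Inflated Poisson probability function: with probability `pi_zero` the game is a structural zero, and otherwise the count follows an ordinary Poisson. The parameter values below (λ = 0.9, `pi_zero` = 0.2) are hypothetical and chosen only to show the effect; Chapter 11 may parameterize or fit the model differently.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k, lam, pi_zero):
    """Zero-Inflated Poisson: a structural zero with probability pi_zero,
    otherwise an ordinary Poisson(lam) count."""
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base


lam, pi_zero = 0.9, 0.2  # hypothetical values, for illustration only

print(f"P(0) plain Poisson: {poisson_pmf(0, lam):.3f}")
print(f"P(0) ZIP:           {zip_pmf(0, lam, pi_zero):.3f}")
print(f"ZIP mean:           {(1.0 - pi_zero) * lam:.2f}")  # (1 - pi_zero) * lam
```

Note that the structural zeros also pull the overall mean down to (1 - `pi_zero`) × λ, so λ in a ZIP model is not the same number as the player's raw per-game average.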
A Simple Check for Zero-Inflation
Compare your observed P(X=0) to the Poisson-predicted e^(-λ):
| Observed P(0) vs. Predicted e^(-λ) | Interpretation |
|---|---|
| Within 5-10% | Poisson is fine |
| 10-20% higher observed zeros | Consider ZIP/Hurdle models |
| 20%+ higher observed zeros | Strong zero-inflation present |
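Here is one way this check might look in code, using the Player A and Player B game logs from the earlier example. The table does not say whether its percentages are percentage points or relative differences, so this sketch reports both and leaves the reading to you; the function name and dictionary keys are made up for illustration.

```python
import math


def zero_check(counts):
    """Compare the observed share of zero games with the Poisson prediction."""
    n = len(counts)
    lam = sum(counts) / n
    observed = sum(1 for x in counts if x == 0) / n
    predicted = math.exp(-lam)  # Poisson P(X = 0)
    return {
        "observed_p0": observed,
        "predicted_p0": predicted,
        "gap_points": observed - predicted,          # absolute difference
        "gap_relative": observed / predicted - 1.0,  # relative difference
    }


player_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
player_b = [0, 0, 3, 0, 2, 0, 0, 0, 3, 1, 0, 2, 0, 0, 3]

for name, counts in (("Player A", player_a), ("Player B", player_b)):
    result = zero_check(counts)
    print(name, {key: round(value, 3) for key, value in result.items()})
```

A surplus of zeros on its own does not tell you whether the cause is structural zero-inflation or ordinary overdispersion, so it helps to read this check alongside the VMR.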
Decision Tree: Which Model to Use?
                    Calculate VMR
                          │
         ┌────────────────┼────────────────┐
         │                │                │
     VMR < 0.8        VMR ≈ 1.0        VMR > 1.3
         │                │                │
    Investigate      Poisson is        Consider
   for patterns         fine       Negative Binomial
                          │
                          ↓
               Check for excess zeros
                          │
               ┌──────────┴──────────┐
               │                     │
         Observed P(0)         Observed P(0)
        matches e^(-λ)           >> e^(-λ)
               │                     │
          Use Poisson          Consider ZIP
                               (Chapter 11)
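The tree can also be written as a small function. This is a sketch under a few assumptions: the excess-zero check is read as following the "VMR ≈ 1.0" branch, anything between 0.8 and 1.3 counts as "≈ 1.0", and "observed P(0) >> e^(-λ)" is taken to mean a gap of more than 10 percentage points. The function name, the `zero_gap` parameter, and the `steady` sample are hypothetical.

```python
import math
from statistics import mean, variance


def suggest_model(counts, low=0.8, high=1.3, zero_gap=0.10):
    """Walk the decision tree: VMR first, then the excess-zero check."""
    m = mean(counts)
    if m == 0:
        raise ValueError("cannot compute VMR when the mean is zero")
    ratio = variance(counts) / m  # sample variance, as with VAR.S
    if ratio < low:
        return "investigate for patterns"
    if ratio > high:
        return "consider Negative Binomial"
    # VMR is roughly 1: compare the share of zero games with e^(-lambda)
    observed_p0 = sum(1 for x in counts if x == 0) / len(counts)
    if observed_p0 - math.exp(-m) > zero_gap:  # assumed threshold
        return "consider ZIP"
    return "use Poisson"


player_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
player_b = [0, 0, 3, 0, 2, 0, 0, 0, 3, 1, 0, 2, 0, 0, 3]
steady = [0, 1, 2, 0, 1, 1, 3, 0, 1, 2, 1, 0, 2, 1, 0]  # hypothetical sample

for name, counts in (("Player A", player_a), ("Player B", player_b),
                     ("Steady", steady)):
    print(f"{name}: {suggest_model(counts)}")
```

Keeping the thresholds as parameters makes it easy to tighten or loosen them as your sample size changes.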
Practical Implications for Betting
When to Stick with Poisson
- VMR is close to 1.0
- Player has consistent role and usage
- Game script doesn't dramatically affect opportunity
- No structural reasons for excess zeros
When to Consider Alternatives
- VMR > 1.3 → Negative Binomial (Chapter 9)
- Excess zeros from structural factors → ZIP/Hurdle (Chapter 11)
- Very high λ (> 20) → Normal approximation
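For the high-λ case, the following sketch compares an exact Poisson tail with a Normal approximation that uses mean λ, standard deviation √λ, and a continuity correction. The values λ = 25 and a line of 30.5 (so "over" means 31 or more) are hypothetical, and the λ = 0.9 comparison is included only to show how the approximation degrades at the low rates typical of TD props.

```python
import math


def poisson_sf(k, lam):
    """Exact P(X >= k) for X ~ Poisson(lam), summing the pmf below k."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


def normal_sf(k, lam):
    """Normal approximation to P(X >= k) with a continuity correction."""
    z = (k - 0.5 - lam) / math.sqrt(lam)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical inputs: (lambda, smallest count that cashes the over)
for lam, k in ((25.0, 31), (0.9, 3)):
    exact, approx = poisson_sf(k, lam), normal_sf(k, lam)
    print(f"lambda={lam}: P(X >= {k}) exact {exact:.4f}, normal {approx:.4f}")
```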
Tip
If you're unsure, Poisson is still a reasonable starting point. Just be aware that for boom-or-bust players, you may be underestimating tail probabilities. Demand a larger edge before betting if you suspect overdispersion.
📝 Exercise
Instructions
Practice identifying when Poisson is appropriate and when alternatives are needed.
A player's last 10 games show touchdown counts of: 0, 1, 0, 1, 1, 0, 1, 0, 1, 1. The mean is 0.6 and variance is 0.27. What's the VMR and is Poisson appropriate?
A running back has λ = 0.7 TDs/game. Poisson predicts P(X=0) = e^(-0.7) = 49.7%. You observe he scored 0 TDs in 65% of his games. What does this suggest?
When Poisson underestimates variance (overdispersion), which probabilities are typically underestimated?