Chapter 10

Zero-Inflated Poisson (ZIP)

Handling excess zeros in count data

Zero-Inflated Poisson (ZIP) Models

When a player's zeros come from two fundamentally different sources—"no opportunity" and "had opportunity but produced zero"—standard Poisson fails. Zero-Inflated Poisson (ZIP) is a mixture model designed specifically for this reality.

The Two Types of Zeros

Consider a pocket quarterback's rushing yards:

  • Structural Zero (No Opportunity): The game plan never called for scrambles. He stayed in the pocket all game and handed off or threw on every play.
  • Count Zero (Had Opportunity): He scrambled twice but was tackled for losses or no gain.

These zeros look identical in the box score, but they come from completely different processes. ZIP models both.

Key Insight

ZIP explicitly models two ways to end at zero: a structural zero state (no meaningful opportunity) with probability π, and a count state with probability (1-π), where counts follow a Poisson(λ) distribution.

The ZIP Model Structure

Parameters:

  • π (pi): Probability of the "structural zero" state (no opportunity)
  • 1 - π: Probability of entering the "count process"
  • λ (lambda): Average of the Poisson count process (when opportunity exists)

ZIP Probabilities:

ZIP: Probability of Zero

P(Y = 0) = π + (1 - π) × e^(-λ)
Excel: =C1+(1-C1)*EXP(-D1)

For positive integers k ≥ 1:

ZIP: Probability of k (for k ≥ 1)

P(Y = k) = (1 - π) × [λ^k × e^(-λ)] / k!
Excel: =(1-C1)*POISSON.DIST(A1,D1,FALSE)

Interpretation:

  • With probability π, the player is in the "no-opportunity" state → outcome is automatically 0
  • With probability (1-π), the player enters the count process → outcome follows Poisson(λ)
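
The two probability formulas above can be collected into one function. This is a minimal Python sketch, not a reference implementation: the name `zip_pmf` and the example values of π and λ are illustrative assumptions, not taken from the text.

```python
import math

def zip_pmf(k: int, pi: float, lam: float) -> float:
    """P(Y = k) for a Zero-Inflated Poisson with zero-state probability pi and Poisson mean lam."""
    if k < 0:
        return 0.0
    poisson_k = lam ** k * math.exp(-lam) / math.factorial(k)
    if k == 0:
        # Structural zeros plus zeros produced by the count process
        return pi + (1 - pi) * poisson_k
    # Positive outcomes can only come from the count process
    return (1 - pi) * poisson_k

# Illustrative values only (assumed for this sketch)
pi, lam = 0.30, 2.0
probs = [zip_pmf(k, pi, lam) for k in range(60)]
total = sum(probs)  # a valid distribution should sum to (almost exactly) 1
print(f"P(Y=0) = {probs[0]:.4f}, sum over k = {total:.6f}")
```

Summing the probabilities is a quick sanity check that the zero and non-zero branches are weighted consistently.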

The ZIP Expected Value

The expected value under ZIP combines both states:

ZIP Expected Value

E[Y] = (1 - π) × λ
Excel: =(1-C1)*D1

This identity is what lets you match your mean projection: once π is fixed, λ is pinned down by the mean, and the model is still free to place the right amount of probability at zero.
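
The expected value formula follows from conditioning on the two states. Written out with the same symbols (the structural state always yields 0, and the count state has mean λ):

```latex
\[
\begin{aligned}
E[Y] &= \pi \cdot 0 + (1 - \pi)\, E[\mathrm{Poisson}(\lambda)] \\
     &= (1 - \pi)\,\lambda
\end{aligned}
\]
```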

Case Study: Stafford Rushing Yards (ZIP Approach)

Let's apply ZIP to Matthew Stafford's rushing yards prop:

Given:

  • Projected mean: E[Y] = 1.41 yards
  • Structural zero rate: π ≈ 0.647, estimated from the observed share of games with ≤ 0 yards (about 65%)

Step 1: Calculate λ from the mean constraint

Since E[Y] = (1 - π) × λ, we can solve:

λ = E[Y] / (1 - π)
λ = 1.41 / (1 - 0.647)
λ = 1.41 / 0.353
λ ≈ 4.0

What this means in plain English:

  • Stafford is in the count state only about 35% of the time (1 - π = 0.353)
  • When he is in the count state, the Poisson process averages ~4 yards (λ ≈ 4.0)
  • This combination gives us the overall mean of 1.41 yards
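
Step 1 can be reproduced in a few lines of Python. This is a sketch under the case-study inputs; the helper name `solve_lambda` is an assumption introduced for illustration.

```python
def solve_lambda(projected_mean: float, pi: float) -> float:
    """Rearranged mean constraint: lambda = E[Y] / (1 - pi)."""
    if not 0 <= pi < 1:
        raise ValueError("pi must be in [0, 1) for the count state to be reachable")
    return projected_mean / (1 - pi)

projected_mean = 1.41  # E[Y] from the case study
pi = 0.647             # structural zero probability from the case study

lam = solve_lambda(projected_mean, pi)

# Round trip: plugging lambda back in should reproduce the projected mean
implied_mean = (1 - pi) * lam

print(f"lambda = {lam:.3f}")
print(f"implied mean = {implied_mean:.2f}")
```

The unrounded value is slightly below 4; the text rounds it to 4.0 for the remaining steps.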

Step 2: Calculate P(Y ≥ 1)

P(Y = 0) = π + (1 - π) × e^(-λ)
P(Y = 0) = 0.647 + 0.353 × e^(-4.0)
P(Y = 0) = 0.647 + 0.353 × 0.018
P(Y = 0) = 0.647 + 0.006
P(Y = 0) ≈ 0.653 (65.3%)

Therefore:

P(Y ≥ 1) = 1 - P(Y = 0) = 1 - 0.653 = 0.347 (34.7%)
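
The same calculation in Python, with the zero probability split into its two sources. A sketch using the case-study values; the variable names are illustrative.

```python
import math

pi, lam = 0.647, 4.0  # case-study values (lambda rounded to 4.0, as in the text)

p_count_zero = (1 - pi) * math.exp(-lam)  # zeros generated by the count process
p_zero = pi + p_count_zero                # all zeros: structural + count
p_over = 1 - p_zero                       # P(Y >= 1)

# Share of all zeros that are structural rather than count zeros
structural_share = pi / p_zero

print(f"P(Y = 0)  = {p_zero:.3f}")
print(f"P(Y >= 1) = {p_over:.3f}")
print(f"structural share of zeros = {structural_share:.3f}")
```

With λ this large, nearly all of the zero mass is structural, which is why P(Y = 0) ends up so close to π.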

Step 3: Calculate the Bet's Expected Value

At +134 odds (risk $100 to win $134):

EV = P(Over) × profit - P(Under) × stake
EV = 0.347 × 134 - 0.653 × 100
EV = 46.50 - 65.30
EV = -$18.80
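
The EV arithmetic generalizes to any American odds. The following sketch assumes a two-outcome bet with no push (true for a 0.5 line); the function names are hypothetical helpers, not part of any library.

```python
def american_profit(odds: int, stake: float = 100.0) -> float:
    """Profit on a winning bet of `stake` at American odds."""
    if odds == 0:
        raise ValueError("American odds cannot be 0")
    if odds > 0:
        return stake * odds / 100
    return stake * 100 / abs(odds)

def bet_ev(p_win: float, odds: int, stake: float = 100.0) -> float:
    """EV = P(win) * profit - P(lose) * stake, assuming no push."""
    return p_win * american_profit(odds, stake) - (1 - p_win) * stake

def break_even_prob(odds: int) -> float:
    """Win probability at which the bet has zero EV."""
    profit_per_unit = american_profit(odds, 1.0)
    return 1 / (1 + profit_per_unit)

ev = bet_ev(0.347, 134)         # ZIP probability of the Over at +134
needed = break_even_prob(134)   # probability required to break even at +134

print(f"EV per $100 = {ev:.2f}")
print(f"break-even probability = {needed:.3f}")
```

Comparing the model probability with the break-even probability is an equivalent way to read the sign of the EV.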

Warning

The ZIP model says the Over is negative EV (-$18.80 per $100), even though the mean projection is 1.41 yards—above the 0.5 line. This is because most probability mass is trapped in the structural non-positive state.

Comparing ZIP to Naïve Poisson

Let's see why the naïve Poisson approach fails:

Naïve Poisson (λ = 1.41):

P(Y = 0) = e^(-1.41) ≈ 0.244 (24.4%)
P(Y ≥ 1) = 1 - 0.244 = 0.756 (75.6%)
EV = 0.756 × 134 - 0.244 × 100 = +$76.90

Model            P(Y = 0)   P(Y ≥ 1)   EV on Over
Naïve Poisson    24.4%      75.6%      +$76.90
ZIP              65.3%      34.7%      -$18.80

The difference: Poisson predicts 24% zeros; reality shows 65% zeros. That 41-percentage-point gap completely flips the bet.
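
To see how the gap opens up, the sketch below holds the mean at 1.41 and sweeps π from 0 (plain Poisson) up to the case-study value. The intermediate π values are arbitrary illustration points, and the function names are assumptions.

```python
import math

def over_prob_poisson(mean: float) -> float:
    """P(Y >= 1) under a plain Poisson with the given mean."""
    return 1 - math.exp(-mean)

def over_prob_zip(mean: float, pi: float) -> float:
    """P(Y >= 1) under ZIP, with lambda chosen so that E[Y] equals `mean`."""
    lam = mean / (1 - pi)
    return (1 - pi) * (1 - math.exp(-lam))

def ev_over(p_over: float, profit: float = 134.0, stake: float = 100.0) -> float:
    """EV of the Over at +134 per $100 staked (no push on a 0.5 line)."""
    return p_over * profit - (1 - p_over) * stake

mean = 1.41
rows = []
for pi in (0.0, 0.2, 0.4, 0.647):  # pi = 0 reduces ZIP to plain Poisson
    p_over = over_prob_zip(mean, pi)
    rows.append((pi, p_over, ev_over(p_over)))
    print(f"pi = {pi:.3f}  P(Over) = {p_over:.3f}  EV = {ev_over(p_over):+.2f}")
```

Every row has the same mean; only the share of structural zeros changes, and P(Over) falls steadily as π rises.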

Key Insight

ZIP can turn the same mean (1.41) into a low P(Over) if most probability mass is trapped in a structural non-positive state. The mean doesn't tell you where the mass is located around zero.

When to Use ZIP

Use ZIP when zeros arise from two distinct sources:

  1. "No opportunity" games (role, game plan, injury limitations)
  2. "Had opportunity but got zero" games (normal count variance)

Best Applications for ZIP

Sport   Prop Type                       Why ZIP Works
NFL     Backup RB receptions            Route participation varies; sometimes blocker-only
NFL     Pocket QB rushing yards         Scramble frequency varies by game plan
MLB     Stolen bases                    Some games player never reaches base
NBA     3-pointers made (role player)   May be benched in certain matchups

Tip

In the Stafford example, ZIP corresponds to: "Most games he simply won't scramble into positive yards at all, but when he does, he may pick up a few."

Estimating ZIP Parameters from Data

Estimating π (Structural Zero Probability)

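The simplest approach: use the observed frequency of zeros (or non-positive outcomes):

π ≈ (Games with outcome ≤ 0) / (Total games)

This slightly overstates π, because the observed zeros also include count zeros. The gap is (1 - π) × e^(-λ), which is small when λ is large (about 0.006 in the Stafford example).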

Estimating λ (Conditional Mean)

Once you have π, solve for λ to match your projected mean:

λ = E[Y] / (1 - π)

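Or approximate it directly from positive games:

λ ≈ Average(outcomes | outcome > 0)

The average of the positive outcomes is really λ / (1 - e^(-λ)), which runs slightly above λ (about 4.07 when λ = 4.0), so treat this as a quick approximation.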
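
If you want estimates that are consistent with both the sample mean and the sample zero rate, one option is a method-of-moments fit. The sketch below is an assumption-laden illustration, not the chapter's method: `fit_zip_moments` is a hypothetical helper, the game log is made-up data, non-positive outcomes are treated as zeros, and maximum likelihood is a common alternative.

```python
import math

def fit_zip_moments(outcomes):
    """Match the sample mean and zero rate of a ZIP; returns (pi, lam)."""
    n = len(outcomes)
    positives = [y for y in outcomes if y > 0]  # outcomes <= 0 count as zeros
    if not positives:
        raise ValueError("need at least one positive outcome")
    mean_all = sum(positives) / n
    mean_pos = sum(positives) / len(positives)
    if mean_pos <= 1:
        raise ValueError("mean of positive outcomes must exceed 1")

    # Solve lam / (1 - exp(-lam)) = mean_pos by bisection.
    # The left side is increasing in lam, and the root lies below mean_pos.
    lo, hi = 1e-9, mean_pos
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < mean_pos:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2

    # pi from the mean constraint; clamp at 0 if the data show no excess zeros
    pi = max(0.0, 1 - mean_all / lam)
    return pi, lam

# Hypothetical 10-game log (made-up data for illustration)
games = [0, 0, 0, 0, 0, 0, 3, 5, 4, 2]
pi_hat, lam_hat = fit_zip_moments(games)
print(f"pi = {pi_hat:.3f}, lambda = {lam_hat:.3f}")
```

On data with excess zeros, the fitted π comes out below the raw zero rate and the fitted λ below the raw average of positive games, which is the direction of the two approximation errors noted above.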

Excel Implementation

Cell C1: π (structural zero probability)
Cell D1: λ (Poisson mean for count process)
Cell A1: k (outcome to calculate probability for)

P(Y = 0):     =C1+(1-C1)*EXP(-D1)
P(Y ≥ 1):     =1-(C1+(1-C1)*EXP(-D1))
              OR: =(1-C1)*(1-EXP(-D1))
P(Y = k):     =(1-C1)*POISSON.DIST(A1,D1,FALSE)
E[Y]:         =(1-C1)*D1

📝 Exercise

Exercise: Build a ZIP Model

A backup tight end has the following profile:

  • Last 20 games: 0 receptions in 12 games (60%)
  • Mean receptions across all games: 1.6
  • Line: Over 0.5 receptions at -150

Build a ZIP model and determine if the Over is +EV.

Step 1: What is π (structural zero probability)?

Step 2: Given E[Y] = 1.6 and π = 0.60, what is λ (the Poisson mean of the count process)?

Step 3: What is P(Y = 0) under the ZIP model?

Step 4: At -150 odds (break-even 60%), is Over 0.5 receptions +EV?