Understanding Covariance and Correlation
Now that you understand why correlation matters for SGPs, let's learn how to measure it. This lesson covers the only two statistical concepts you need: covariance and correlation coefficient (r).
Covariance: Direction of Movement
Covariance tells you whether two stats tend to be above (or below) their averages in the same games.
| Covariance | Meaning |
|---|---|
| Positive | They move together (both above or both below average) |
| Negative | They trade off (when one is up, the other is down) |
| Zero | No linear relationship |
The Problem with Covariance
Covariance is useful for direction, but it has a fatal flaw: the number depends on the scale of your data.
- Covariance between passing yards (200-400 range) and receiving TDs (0-2 range) will be small
- Covariance between two yardage stats will be large
This makes raw covariance hard to interpret. Enter correlation.
Correlation (r): Direction + Strength on a Clean Scale
Correlation rescales covariance to a unit-free number between -1 and +1.
Correlation Coefficient
r = Cov(X,Y) / (σₓ × σᵧ)=CORREL(range1, range2)Interpreting r Values
| r Value | Interpretation | SGP Implication |
|---|---|---|
| +1.0 | Perfect positive | Legs always hit together |
| +0.4 to +0.7 | Strong positive | Book shortens payout significantly |
| +0.1 to +0.3 | Weak positive | Small adjustment to independence |
| 0 | No relationship | Independence formula works |
| -0.1 to -0.3 | Weak negative | Small boost to payout |
| -0.4 to -0.7 | Strong negative | Book lengthens payout significantly |
| -1.0 | Perfect negative | Legs never hit together |
Key Insight
In prop betting, you almost never see ±1. What matters is whether dependence is weak enough to ignore or strong enough that the payout will be aggressively shortened.
The Excel Formulas You Need
You don't need to compute covariance and correlation by hand. Excel does it for you.
Primary Formula: Correlation
=CORREL(range1, range2)
This is your default tool. It's the fastest and least error-prone way to measure relationships from paired game logs.
Supporting Formulas
| Purpose | Excel Formula |
|---|---|
| Correlation | =CORREL(A2:A20, B2:B20) |
| Sample Covariance | =COVARIANCE.S(A2:A20, B2:B20) |
| Sample Std Dev | =STDEV.S(A2:A20) |
Tip
Use =CORREL() as your default. It handles the covariance-to-correlation conversion automatically.
Stats Correlation vs. Leg Correlation (The Trap)
Here's a crucial distinction that trips up many bettors:
Raw-Stat Correlation
- Correlates actual numbers (e.g., passing TD counts and receiving TD counts)
- Answers: Do these stats move together?
Leg Correlation
- Correlates binary events (Over/Under hit or miss)
- Answers: Do these events hit together?
They are related but not identical.
Warning
When pricing SGPs, you're betting on events (hit/miss). If you use r, make sure it matches the event definition (Over/Under, Yes/No), not just the underlying stat.
Example: The Difference Matters
Suppose you correlate:
- Williams passing TDs (0, 1, 2, 3, 4...)
- Loveland receiving TDs (0, 1, 2...)
You get r = +0.47 (strong positive).
But the SGP legs are:
- Williams Over 1.5 TDs (binary: hit or miss)
- Loveland Over 0.5 TDs (binary: hit or miss)
The correlation between the binary indicators could be different from +0.47.
Best practice: Use conditional probability from game logs when possible (covered in Lesson 3).
Worked Example: Calculating Correlation in Excel
Let's calculate correlation for the Williams-Loveland pair using real game data.
Step 1: Set Up Your Data
| Game | Williams Pass TDs | Loveland Rec TDs |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 4 | 0 |
| 4 | 1 | 0 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
| 7 | 3 | 2 |
| 8 | 1 | 0 |
| 9 | 0 | 0 |
| 10 | 3 | 1 |
Step 2: Calculate Correlation
In Excel, with Williams TDs in cells A2:A11 and Loveland TDs in B2:B11:
=CORREL(A2:A11, B2:B11)
Result: r ≈ +0.72 (strong positive correlation)
Step 3: Interpret the Result
A correlation of +0.72 means:
- When Williams throws more TDs, Loveland tends to score more TDs
- The relationship is strong
- The book will significantly shorten SGP payouts for this pair
Understanding Correlation Strength for Betting
Here's a practical guide for how correlation affects SGP pricing:
| Correlation Range | Effect on SGP | Your Action |
|---|---|---|
| r > 0.4 | Book shortens payout heavily | Need strong positive r assumption to find value |
| r = 0.1 to 0.4 | Moderate shortening | May find edges if book overestimates r |
| r ≈ 0 | Near independence pricing | Standard parlay math applies |
| r = -0.1 to -0.4 | Book lengthens payout | May find edges if book underestimates negative r |
| r < -0.4 | Significant payout boost | Rare—look for mispriced negative correlation |
Key Insight
Correlation is the scoreboard. It doesn't tell you why the relationship exists, but it tells you how strongly the game environment ties the outcomes together.
Practice Exercise
📝 Exercise
Instructions
You have 10 games of data for two props:
Prop A (QB passing TDs): [2, 1, 3, 2, 0, 2, 3, 1, 2, 4]
Prop B (WR receiving TDs): [1, 0, 1, 1, 0, 0, 2, 0, 1, 1]
- Enter the data into Excel
- Compute r with
=CORREL(A2:A11, B2:B11) - Classify the relationship as weak/moderate/strong
Key Takeaways
- Covariance tells you direction; correlation tells you direction AND strength
- Use
=CORREL()in Excel—it's your primary tool - Correlation ranges from -1 to +1, with values above ±0.4 being "strong"
- Watch out for the stats vs. legs trap—SGPs bet on binary events, not raw numbers
- Strong positive correlation (r > 0.4) means books shorten payouts significantly
Note
Coming Up Next: Now that you can measure correlation, we'll learn how to actually price 2-leg SGPs using conditional probability from game logs.