David & Goliath
This article records a classic battle in the mold of David vs Goliath. If David and Goliath each made quantitative predictions about a soccer tournament. And if Goliath's model was WAY better. And if Goliath didn't even realize David was competing.
In case the metaphor wasn't clear, VividNumeral (this website) = David & FiveThirtyEight (a subsidiary of ESPN) = Goliath. VividNumeral (me) posted group stage World Cup predictions on June 8th. FiveThirtyEight (Nate Silver) posted complete World Cup predictions on June 9th. VividNumeral+1 for being early. FiveThirtyEight+100 for pretty much everything else.
I love Nate Silver's World Cup model. He goes into a fuller explanation on FiveThirtyEight.com, but here are the highlights:
- Very selective about which matches it counts
- Splits offense and defense ratings
- Uses individual players' statistics from major club leagues (England, Spain, Germany, Italy, and France) and tournaments
- Includes circumstantial variables like home field advantage (strong in soccer) and travel distance
My model uses FIFA ranks and game outcomes from the past three World Cups, relates them to game point differential using kernel regression techniques, and uses those results to run simulations of the World Cup using current FIFA ranks. But FIFA ranks are pretty flawed compared to the SPI measure from ESPN that Nate Silver's method uses. In addition to several other flaws (see the end of this article), FIFA ranks particularly miss out on the club statistics that could be a very powerful variable. In short, I'm totally using SPI in 2018.
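For the curious, here is a rough sketch of that kind of pipeline in plain Python. It is not my actual code: the Gaussian kernel, the bandwidth, the noise level, the tie-break, and every number in `HISTORY` are placeholder assumptions made up for illustration, and `kernel_regression` and `simulate_group` are hypothetical names. The idea is just to show kernel regression turning a FIFA rank gap into an expected margin, then a Monte Carlo loop turning margins into group stage survival probabilities.

```python
import math
import random
from itertools import combinations

# Placeholder history, NOT real data: (rank of team A minus rank of team B,
# margin of the game from A's point of view). Lower rank number = better team.
HISTORY = [(-40, 2), (-25, 1), (-10, 1), (-5, 0), (0, 0),
           (5, 0), (10, -1), (25, -1), (40, -2)]


def kernel_regression(x_train, y_train, x_query, bandwidth=10.0):
    """Nadaraya-Watson estimate of y at x_query using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_train]
    total = sum(weights)
    if total == 0.0:
        # Query is so far from the data that every weight underflowed.
        return sum(y_train) / len(y_train)
    return sum(w * y for w, y in zip(weights, y_train)) / total


def simulate_group(ranks, n_sims=10000, noise_sd=1.2, seed=0):
    """Estimate each team's chance of finishing in the top two of its group.

    ranks maps team name -> current FIFA rank. noise_sd is an assumed spread
    of game margins around the kernel regression estimate.
    """
    rng = random.Random(seed)
    xs = [diff for diff, _ in HISTORY]
    ys = [margin for _, margin in HISTORY]
    teams = list(ranks)
    # Expected margin for every pairing, from the first team's point of view.
    expected = {
        (a, b): kernel_regression(xs, ys, ranks[a] - ranks[b])
        for a, b in combinations(teams, 2)
    }
    advanced = {t: 0 for t in teams}
    for _ in range(n_sims):
        points = {t: 0 for t in teams}
        goal_diff = {t: 0 for t in teams}
        for (a, b), mu in expected.items():
            margin = round(rng.gauss(mu, noise_sd))
            goal_diff[a] += margin
            goal_diff[b] -= margin
            if margin > 0:
                points[a] += 3
            elif margin < 0:
                points[b] += 3
            else:
                points[a] += 1
                points[b] += 1
        # Sort by points, then goal difference, then a coin flip.
        order = sorted(
            teams,
            key=lambda t: (points[t], goal_diff[t], rng.random()),
            reverse=True,
        )
        for t in order[:2]:
            advanced[t] += 1
    return {t: advanced[t] / n_sims for t in teams}


if __name__ == "__main__":
    # Made-up ranks for a made-up group.
    print(simulate_group({"A": 3, "B": 15, "C": 30, "D": 50}, n_sims=2000))
```

Swapping FIFA ranks for SPI ratings would only change what goes into `ranks` and `HISTORY`; the regression and simulation steps stay the same.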
Nate Silver Hates America
The table below shows the difference in every team's group stage survival probability between FiveThirtyEight and VividNumeral. A high positive number means FiveThirtyEight forecasts a higher probability of advancing.
I seriously considered calling this article "Nate Silver Hates America," mostly to entice haters to stumble on my site after seething Google searches and bump up my site visit statistics, but partially because FiveThirtyEight has an 8% lower probability of the US escaping the group stage. In the SPI-based model, Germany massively benefits at the expense of the US and Portugal. In Nate Silver's defense, he hates Iran even more than the US - so he at least falls on the right side of the Axis of Evil.
Compared to my predictions, FiveThirtyEight's favorite teams are France (seriously, SPI might hate America), Brazil (home field advantage), and South Korea (Korea Republic in the table). I have no idea why FiveThirtyEight's model likes South Korea so much, but at least it's not North Korea, right?
Compared to FiveThirtyEight, my predictions mostly prefer Greece, Algeria, and Switzerland. Not sure there's much to read into this, other than maybe my model prefers former vassals of the Roman Empire. The love for Algeria (Group H) and Switzerland (E) is just the flip side of FiveThirtyEight's preference for South Korea (H) and France (E).
The only forecast from FiveThirtyEight I'm having trouble swallowing is a greater than 99% chance of Brazil getting out of the group stage. I understand that any ref who doesn't want to die is going to rule in favor of Brazil, but in a sport with such low-scoring games, anything can happen over a three-game sample.
After the die is cast, I plan on regressing actual outcomes on each set of probabilities and comparing the R-squared and coefficients of each model. I fully expect David to end up as a speck on Goliath's shoe. Without Goliath ever noticing.
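Here is roughly what that scorecard could look like, sketched in plain Python. I'm assuming a simple one-variable least squares fit of a 0/1 "advanced" outcome on the forecast probability; `ols_fit` is a hypothetical helper, and the probabilities and outcomes below are invented placeholders, not either site's real forecasts. A well-calibrated model should land near a slope of 1 and an intercept of 0, and the better model should show the higher R-squared.

```python
def ols_fit(probs, outcomes):
    """Least squares fit of outcomes (0 or 1) on forecast probabilities.

    Returns (slope, intercept, r_squared).
    """
    n = len(probs)
    mean_x = sum(probs) / n
    mean_y = sum(outcomes) / n
    sxx = sum((x - mean_x) ** 2 for x in probs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(probs, outcomes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(probs, outcomes))
    ss_tot = sum((y - mean_y) ** 2 for y in outcomes)
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Placeholder numbers for one imaginary group; 1 means the team advanced.
    advanced = [1, 1, 0, 0]
    model_a = [0.80, 0.60, 0.40, 0.20]  # hypothetical forecasts
    model_b = [0.55, 0.45, 0.60, 0.40]  # hypothetical forecasts
    for name, probs in (("model_a", model_a), ("model_b", model_b)):
        slope, intercept, r2 = ols_fit(probs, advanced)
        print(f"{name}: slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.2f}")
```

With a yes/no outcome, a logistic regression or a Brier score would arguably be the more standard yardstick, but the linear fit matches the plan as stated.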