I had written a simple blog post about the English Premier League, using simple analytics to uncover some patterns, and was planning to publish it. Then I thought I would check whether the same patterns applied at the World Cup. But have you seen how many World Cup predictions there were, and how spectacularly wrong they turned out? Hey, I can’t say anything about whether the octopus or the guinea pig can predict the winner, but I can certainly tell when the tools, methods or data used for a prediction are unlikely to suit the problem.
Let’s start with UBS (1)...
The technique used was simulation (Neymar’d be happy, so I guess Brazil would have an edge...). The first weird thing is that they decided to include Italy, a team that didn’t even qualify... Italy, again, a team that did not qualify, is ranked 12th...
This is where it gets interesting: what did they simulate? Obviously not the tournament, since they have a phantom team. They fed various variables, including the Elo rating (2) of each team (claimed to be an objective measure of how good it is), into a statistical model and ran a series of Monte Carlo simulations.
The fun thing is that Elo ratings score a team’s performance game by game, weighting each result by, among other things, the importance of the game (of course a win in a tournament matters more than a win in a friendly). But interestingly, according to Wikipedia, “there is no single nor any official Elo ranking for football teams”.
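To make “weighting the result by the importance of the game” concrete: in the scheme described on the World Football Elo Ratings page (2), a rating moves by K × G × (W - W_e), where W is the actual result, W_e the expected result, G a goal-difference multiplier and K a weight that grows with the importance of the match. A sketch under that scheme, with home advantage left out; treat the K values as illustrative rather than authoritative.

```python
# Importance weights (K) by match type, following the World Football Elo Ratings
# scheme as summarised on Wikipedia (2); treat the exact values as illustrative.
K_BY_MATCH_TYPE = {
    "world_cup_finals": 60,
    "continental_finals": 50,
    "qualifier_or_major_tournament": 40,
    "other_tournament": 30,
    "friendly": 20,
}


def expected_result(rating_a: float, rating_b: float) -> float:
    """Expected result for A (1 = win, 0.5 = draw, 0 = loss) from the rating gap."""
    return 1.0 / (10 ** (-(rating_a - rating_b) / 400.0) + 1.0)


def goal_difference_multiplier(goal_diff: int) -> float:
    """Bigger wins move ratings more: 1 for a draw or one-goal win, 1.5 for two goals."""
    n = abs(goal_diff)
    if n <= 1:
        return 1.0
    if n == 2:
        return 1.5
    return (11 + n) / 8


def update_ratings(rating_a, rating_b, goals_a, goals_b, match_type):
    """Return (new_a, new_b) after one match; home advantage is ignored here."""
    if goals_a > goals_b:
        result = 1.0
    elif goals_a == goals_b:
        result = 0.5
    else:
        result = 0.0
    k = K_BY_MATCH_TYPE[match_type] * goal_difference_multiplier(goals_a - goals_b)
    change = k * (result - expected_result(rating_a, rating_b))
    # Points gained by one side are lost by the other.
    return rating_a + change, rating_b - change


if __name__ == "__main__":
    # The same 1-0 win between two equally rated (hypothetical) teams:
    print(update_ratings(2000, 2000, 1, 0, "friendly"))          # (2010.0, 1990.0)
    print(update_ratings(2000, 2000, 1, 0, "world_cup_finals"))  # (2030.0, 1970.0)
```

The same scoreline is worth three times as many rating points at the World Cup as in a friendly, which is exactly the kind of design choice that makes any one Elo table a matter of opinion.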
It gets hazier still when these simulations are presented as econometric methods (3).
Well, if my aim were to come up with an accurate prediction, I would try to make the best prediction possible: not use numbers that are not objective (even if I claimed they were), and not include teams just to honour them.
So I would classify this prediction as a PR smoke-and-mirrors exercise, with the smoke coloured in team colours to make you believe you are at a stadium. Don’t worry, there was very little violence at the World Cup; kudos to the Russian security.
Next, let’s go to Goldman Sachs (4)...
Now, no old-fashioned stuff like econometrics and Monte Carlo here; that is far too old school. Goldman Sachs went for everyone’s favourite: AI!
They used 200,000 models and simulated 1 million variations. Impressive. They used player and team attributes to forecast specific match scores, and then simulated the whole tournament. Now that makes more sense than having a phantom team...
Where it gets interesting is that they say “Brazil is expected to win its sixth World Cup title, defeating Germany in the final by an unrounded score of 1.70 to 1.41”.
Mmm... do they watch football? 1.70 to 1.41? Is that 1-1 and it goes to extra time, or 2-1 and Brazil wins? Do you round up or down? Do they actually know there are extra time and penalty shootouts?
Hmm... looks like some domain knowledge is lacking...
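One common way to read a score like 1.70 to 1.41 is as expected goals: the average number of goals each team scores, with each team’s goal count often modelled as an independent Poisson variable. That reading is my assumption; the sources quoted here do not say how Goldman Sachs produced the number. Under it, the decimals turn into probabilities for each 90-minute outcome:

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k goals when goals are Poisson with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def outcome_probabilities(mean_a: float, mean_b: float, max_goals: int = 10):
    """P(A wins), P(draw), P(B wins) after 90 minutes, assuming independent Poisson goals.

    Scores above max_goals are ignored; their probability is negligible for means near 1.5.
    """
    win = draw = loss = 0.0
    for goals_a in range(max_goals + 1):
        for goals_b in range(max_goals + 1):
            p = poisson_pmf(goals_a, mean_a) * poisson_pmf(goals_b, mean_b)
            if goals_a > goals_b:
                win += p
            elif goals_a == goals_b:
                draw += p
            else:
                loss += p
    return win, draw, loss


def most_likely_score(mean_a: float, mean_b: float, max_goals: int = 10):
    """The single most probable scoreline under the same assumptions."""
    scores = [(a, b) for a in range(max_goals + 1) for b in range(max_goals + 1)]
    return max(scores, key=lambda s: poisson_pmf(s[0], mean_a) * poisson_pmf(s[1], mean_b))


if __name__ == "__main__":
    # 1.70 and 1.41: the "unrounded" Brazil vs Germany score quoted above.
    win, draw, loss = outcome_probabilities(1.70, 1.41)
    print(f"Brazil {win:.1%}, draw {draw:.1%}, Germany {loss:.1%}")
    print("Most likely score:", most_likely_score(1.70, 1.41))  # (1, 1)
```

Under this reading the single most likely scoreline is 1-1, a draw that in a final goes to extra time and possibly penalties, even though Brazil wins more often than Germany over 90 minutes. A forecast that stops at 90 minutes never answers who lifts the trophy.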
Fret not, Goldman Sachs was at it again (5). As the tournament progressed, they revised their predictions. After England beat Panama, they revised the prediction again: now the final would be Brazil 1.59, England 1.17. Hmm, still these pesky decimals... (I also guess it means their initial model predicted England would not beat Panama...)
It still lacks an understanding of the tournament: it looks like every game is assumed to start with a fresh squad... (or at least equally fresh squads: no extra time).
But Goldman Sachs was not done yet! (6) After Belgium beat Brazil, they updated their prediction and made Belgium the favourite (a 32.6% chance of winning the World Cup). Note that these predictions were made at the semi-final stage; Goldman Sachs predicted the final would be Belgium vs England. Well, they got it 100% wrong...
I would say Goldman has the right idea (simulate the whole tournament, use player and team stats...), but not understanding football, or at least the tournament format, is not a good thing...
Any other methods?
Well, someone tried using graph theory (7).
I recommend this article: it is fun, easy to read and pretty.
But it eventually boils down to this: the country whose players play in the leagues where most other World Cup players also play is the most likely to win. That’s what eigenvector centrality boils down to in this case. (Oops, sorry, I try to avoid technical terms, but sometimes they slip out.)
This is a nice idea. If the players at the World Cup are the better ones, then the club teams where many of them play should be of higher quality. Therefore, the national teams with more players in high-quality clubs are likely to be of higher quality themselves, and therefore likely winners.
Makes sense, right? Especially if you take into account the fact that World Cup squads are balanced (if you have 10 brilliant goalkeepers in the toughest league, it won’t matter much here, as every squad can only have 3 goalkeepers). However, it does ignore that football is a team sport, and although your squad matters very much in a tournament, you need some balance. If you have amazing talent in midfield but no strikers, who will score for you?
I like this approach, and it could be made better with some football thinking. After all, football is a team sport: a team has to be more than the sum of its individuals, the tactics matter, and so does the manager, who seems to have been ignored in all this.
So you may ask, while I spit in everyone’s soup, is there a soup I would drink? The answer is yes! It may sound like a joke, but to me the most realistic prediction of the World Cup that I have seen is a simulation of the whole tournament, with data on players, managers and tactics, from Football Manager (8).
Why?
It’s very simple: the data inside the game is of very high quality. The attributes of the players, their preferences, the style they play, how often they get injured... are all measured and included in the game; the same goes for managers.
Furthermore, the whole tournament is simulated, in this case 1,000 times, and the results compiled.
And the likely winner is France.
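The sketch below is nothing like Football Manager’s match engine; it only illustrates the “simulate the whole tournament by its rules, many times” part. It assumes a hypothetical eight-team knockout bracket with made-up strengths, Poisson-distributed goals, extra time played at a third of the 90-minute scoring rate, and penalties treated as a coin flip; the group stage is left out.

```python
import math
import random
from collections import Counter


def sample_poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson-distributed goal count (Knuth's method, fine for small means)."""
    threshold = math.exp(-mean)
    goals, product = 0, rng.random()
    while product > threshold:
        goals += 1
        product *= rng.random()
    return goals


def knockout_match(rng, team_a, team_b, strength, goals_per_90=2.6):
    """Winner of one knockout tie: 90 minutes, then extra time, then penalties."""
    share_a = strength[team_a] / (strength[team_a] + strength[team_b])
    mean_a, mean_b = goals_per_90 * share_a, goals_per_90 * (1 - share_a)
    goals_a, goals_b = sample_poisson(rng, mean_a), sample_poisson(rng, mean_b)
    if goals_a == goals_b:
        # Extra time is 30 minutes, so a third of the 90-minute scoring rate.
        goals_a += sample_poisson(rng, mean_a / 3)
        goals_b += sample_poisson(rng, mean_b / 3)
    if goals_a == goals_b:
        # Penalty shootout modelled as a coin flip (a simplifying assumption).
        return team_a if rng.random() < 0.5 else team_b
    return team_a if goals_a > goals_b else team_b


def simulate_bracket(rng, bracket, strength):
    """Play a single-elimination bracket (length must be a power of two); return the champion."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [knockout_match(rng, teams[i], teams[i + 1], strength)
                 for i in range(0, len(teams), 2)]
    return teams[0]


def title_odds(bracket, strength, n_sims=1000, seed=2018):
    """Simulate the whole bracket n_sims times and return each team's share of titles."""
    rng = random.Random(seed)
    champions = Counter(simulate_bracket(rng, bracket, strength) for _ in range(n_sims))
    return {team: champions[team] / n_sims for team in bracket}


if __name__ == "__main__":
    # Hypothetical quarter-final bracket and strengths, for illustration only.
    bracket = ["A", "B", "C", "D", "E", "F", "G", "H"]
    strength = {"A": 3.0, "B": 1.0, "C": 1.5, "D": 1.0, "E": 2.0, "F": 1.0, "G": 1.2, "H": 1.0}
    for team, share in sorted(title_odds(bracket, strength).items(), key=lambda kv: -kv[1]):
        print(f"{team}: {share:.1%}")
```

Even this toy version still treats every match as starting with a fresh squad, the same limitation noted above for the Goldman Sachs model; carrying the fatigue of extra time into the next round would be a natural next step.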
To sum up, choosing the right data for your problem (player and manager statistics, team statistics) and the right algorithm (simulating the whole tournament by the rules of the tournament) gives us a good chance of getting the outcome we want. Hence France is likely to win the World Cup, or so says the best analytical model I have seen.
But personally, I will probably be rooting for Croatia, otherwise my friend Genti may not forgive me ;)
And in case you are wondering, if you have seen my LinkedIn comments about the tournament (9): Tonton Zola Moukoko is a player FM got wrong. Not everything is about data, and sometimes life takes a turn (10).
- (1) UBS predictions for the World Cup: https://www.businessinsider.sg/who-will-win-the-world-cup-2018-2018-5/?r=US&IR=T
- (2) World Football Elo Ratings (Wikipedia): https://en.wikipedia.org/wiki/World_Football_Elo_Ratings
- (3) CNBC on the UBS World Cup prediction: https://www.cnbc.com/2018/05/17/world-cup-winner-predicted-by-ubs.html
- (4) Goldman Sachs predictions for the World Cup: https://www.businessinsider.sg/world-cup-predictions-pick-to-win-it-all-goldman-sachs-ai-model-2018-6/?r=UK&IR=T
- (5) Goldman Sachs predictions for the World Cup, again: https://www.ft.com/content/804a21be-7915-11e8-bc55-50daf11b720d
- (6) Goldman Sachs predictions for the World Cup, again and again: https://www.businessinsider.sg/world-cup-predictions-goldman-sachs-ai-model-belgium-england-final/?r=US&IR=T
- (7) Graph theory World Cup winner prediction (Cambridge Intelligence): https://cambridge-intelligence.com/graph-theory-world-cup-winner-prediction/
- (8) Football Manager World Cup simulation (YouTube): https://www.youtube.com/watch?v=OxX_tdzFpgk
- (9) My LinkedIn comments about the tournament: https://www.linkedin.com/feed/update/urn:li:activity:6422819043379646464/
- (10) Championship Manager and Tonton Zola Moukoko (The Offside Rule podcast): https://offsiderulepodcast.com/2017/07/21/championship-manager-tonton-zola-moukoko/