Friday, 13 July 2018

The World Cup predictions prove: you need to use the right algo with the right data to have a chance at the right outcome.



I had written a simple blog post about the English Premier League, using simple analytics to uncover some patterns, and was planning to publish it. Then I thought I would see if the same patterns applied in the World Cup. But have you seen how many World Cup predictions there were, and how spectacularly wrong they turned out? Hey, I can’t say anything about whether the octopus or the guinea pig can predict the winner, but I can certainly see when the tools, methods and data used for a prediction are unlikely to suit.

Let’s start with UBS (1)...

The technique used was simulation (Neymar’d be happy, so I guess Brazil would have an edge...). The first weird thing is that they decided to include Italy, a team that didn’t even qualify... Italy, again, a team that did not qualify, is ranked 12th...

This is where it gets interesting: what did they simulate? Obviously not the tournament, since they have a phantom team. They fed various variables, including the Elo rating (2) of each team (claimed to be an objective measure of how good it is), into a statistical model and ran a series of Monte Carlo simulations.
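To make "Elo ratings plus Monte Carlo" a bit more concrete, here is a minimal sketch in Python. It is not the UBS model (which is not published in the article), only the general idea: turn a rating gap into a win probability with the standard Elo formula, then replay the match many times. The ratings below are hypothetical, and draws are ignored for simplicity.

```python
import random


def elo_win_probability(rating_a, rating_b):
    """Expected score of team A against team B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def simulate_match(rating_a, rating_b, rng):
    """Return True if team A wins one simulated match (draws ignored: an assumption)."""
    return rng.random() < elo_win_probability(rating_a, rating_b)


def monte_carlo_win_rate(rating_a, rating_b, n_runs=100_000, seed=42):
    """Share of simulated matches won by team A."""
    rng = random.Random(seed)
    wins = sum(simulate_match(rating_a, rating_b, rng) for _ in range(n_runs))
    return wins / n_runs


# Hypothetical ratings, for illustration only
strong, weak = 2100, 1900
print("analytic :", round(elo_win_probability(strong, weak), 3))
print("simulated:", round(monte_carlo_win_rate(strong, weak), 3))
```

With a single match the simulation only reproduces what the formula already says; simulating becomes useful when matches are chained into a bracket, which is exactly what a phantom team makes impossible.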

The fun thing is that Elo ratings do rank the performance of a team in a game, weighing the result by, among other things, the importance of the game (of course a win in a tournament is more important than in a friendly). But interestingly, according to Wikipedia, “there is no single nor any official Elo ranking for football teams”.
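As a rough illustration of "weighing the result by the importance of the game", here is how a generic Elo update behaves when the weight K changes. The two K values are assumptions chosen for the example, and the goal-difference and home-advantage adjustments that football Elo variants also use are left out.

```python
def elo_update(rating, opponent_rating, result, k):
    """Return the new rating after one match.

    result: 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k: match-importance weight (the bigger the game, the bigger the swing).
    """
    expected = 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))
    return rating + k * (result - expected)


# Illustrative weights only (assumed, not taken from the article)
K_FRIENDLY = 20
K_WORLD_CUP = 60

# Same win against an equally rated opponent, two very different rewards
print(elo_update(1800, 1800, 1.0, K_FRIENDLY))
print(elo_update(1800, 1800, 1.0, K_WORLD_CUP))
```

Whoever picks the weights (and the starting ratings) shapes the ranking, which is why "objective" deserves the quotation marks.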

It becomes hazier when these are claimed to be econometric methods (3).

Oh and by the way Italy was added to honour the nation...

Well, if my aim was to come up with an accurate prediction, I would try to make the best prediction possible; I would not use numbers that are not objective (although I may say they are), nor include teams just to honour them.


So, I would classify this prediction as a PR smoke-and-mirrors exercise, with some of the smoke coloured in team colours to make you believe you are at a stadium. Don’t worry, there was very little violence at the World Cup, kudos to the Russian security.

Next, let’s go to Goldman Sachs...(4)

Now, no such old-fashioned stuff as econometrics and Monte Carlo, far too old school. Goldman Sachs went for everyone’s favourite: AI!

They used 200,000 models and simulated 1 million variations. Impressive. They used player and team attributes to forecast specific match scores, and then simulated the whole tournament. Now that makes more sense than having a phantom team...

Where it gets interesting is that they say “Brazil is expected to win its sixth World Cup title, defeating Germany in the final by an unrounded score of 1.70 to 1.41”. Hmm... do they watch football? 1.70 to 1.41? Is that 1-1 and it goes to extra time, or 2-1 and Brazil wins? Do you round up or down? Do they actually know there is extra time and penalty kicks?
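One way to make sense of decimal scores is to read them as expected goals and convert them into probabilities of a win, a draw or a loss after 90 minutes. The sketch below assumes each side's goals follow an independent Poisson distribution; that is a common textbook simplification and my assumption, not something Goldman Sachs say they did.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number of goals is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def outcome_probabilities(mean_a, mean_b, max_goals=15):
    """Return (P(A wins), P(draw), P(B wins)) after 90 minutes.

    Assumes the two teams' goal counts are independent Poisson variables
    and truncates scorelines at max_goals per team.
    """
    win_a = draw = win_b = 0.0
    for goals_a in range(max_goals + 1):
        for goals_b in range(max_goals + 1):
            p = poisson_pmf(goals_a, mean_a) * poisson_pmf(goals_b, mean_b)
            if goals_a > goals_b:
                win_a += p
            elif goals_a == goals_b:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b


# The "unrounded score" quoted for the predicted final
brazil, draw, germany = outcome_probabilities(1.70, 1.41)
print(f"Brazil {brazil:.2f}  draw {draw:.2f}  Germany {germany:.2f}")
```

Read this way, 1.70 to 1.41 is not a scoreline at all: it is a slight edge for Brazil with a sizeable chance of a draw, and a draw in a final means extra time and possibly penalties.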

Hmm... Looks like there is some domain knowledge lacking...




Fret not, Goldman Sachs was at it again (5). As the tournament progressed, they revised their predictions. After England beat Panama, the final would now be Brazil 1.59 England 1.17. Hmmm, still these pesky decimals... (also, I guess it meant their initial model predicted England would not beat Panama...)

It still lacks understanding of the tournament: it looks like all games are assumed to start with a fresh squad... (or at least equally fresh squads: no extra time)


But Goldman Sachs were not done yet! (6) After Belgium beat Brazil, they updated their prediction and made Belgium the favourite (32.6% chance of winning the World Cup). Note that these predictions were made at the semi-final stage; Goldman Sachs predicted the final would be Belgium v England. Well, they got it 100% wrong...






I would say Goldman has the right idea: simulate the whole tournament, use player and team stats... but not understanding football, or at least how the tournament works, is not a good thing...

Any other method?

Well someone tried using graph theory (7).

I recommend this article: it is fun, easy to read and pretty.

But it eventually boils down to this: the country whose players play in the leagues where most players also play is the most likely to win. That’s what eigenvector centrality boils down to in this case. (Ooops, sorry, I try to avoid technical terms, but sometimes they come out.)



This is a nice idea. If the players at the World Cup are the better ones, then the teams where many of them play would be of higher quality. Therefore, the national sides with more players in high-quality teams are likely to be of higher quality themselves, and therefore likely winners.
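For the curious, eigenvector centrality can be sketched in a few lines of plain Python using power iteration. The graph below is a made-up toy (four national teams, four clubs, an edge when a squad has a player at that club); it is not the data or the exact graph from the article in (7).

```python
def eigenvector_centrality(edges, iterations=200):
    """Eigenvector centrality of an undirected graph by power iteration."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    score = {node: 1.0 for node in neighbours}
    for _ in range(iterations):
        # Each node keeps its own score and adds its neighbours' scores.
        # This iterates with (A + I), which has the same eigenvectors as A
        # but does not oscillate on two-sided (team/club) graphs.
        new = {n: score[n] + sum(score[m] for m in neighbours[n])
               for n in neighbours}
        norm = sum(v * v for v in new.values()) ** 0.5
        score = {n: v / norm for n, v in new.items()}
    return score


# Hypothetical squads and clubs, for illustration only
edges = [
    ("Team A", "Club X"), ("Team A", "Club Y"), ("Team A", "Club Z"),
    ("Team B", "Club X"), ("Team B", "Club Y"),
    ("Team C", "Club Z"), ("Team C", "Club W"),
]

scores = eigenvector_centrality(edges)
for team in ("Team A", "Team B", "Team C"):
    print(team, round(scores[team], 3))
```

Team B and Team C both have players at two clubs, but Team B scores higher because its clubs are the ones the other well-connected squads also use. That is the "playing where most players also play" effect.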


Makes sense, right? Especially if you take into account the fact that World Cup teams are balanced (if you have 10 brilliant goalkeepers in the toughest league, it won’t matter much here, as every team can only have 3 goalkeepers). However, it does ignore that football is a team sport, and although your squad matters very much in a tournament, you need some balance. If you have amazing talent in midfield but no strikers, who will score for you?

I would say I like this approach, but it can be made better with some football thinking; after all, football is a team sport: a team has to be more than the sum of its individuals, the tactics matter, and so does the manager, who seems to have been ignored in all this.

So you may ask, while I spit in everyone’s soup, is there a soup I would drink? The answer is yes! It may sound like a joke but, to me, the most realistic prediction of the World Cup that I have seen is a simulation of the whole tournament, with data on players, managers and tactics, from Football Manager (8).

Why?

It’s very simple: the data inside the game is of very high quality. The attributes of the players, their preferences, the style they play, how often they get injured... it is all measured and included in the game; same for managers.

Furthermore, the whole tournament is simulated; in fact it has been simulated 1,000 times, and the results compiled.
And the likely winner is France.
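To show what "simulating the whole tournament by its rules" can look like, here is a toy knockout bracket in Python, replayed 1,000 times. Everything in it is an assumption for illustration: the four teams, their scoring rates, Poisson goals, and penalties as a coin toss. It has nothing like the player, manager and tactics data inside Football Manager, and it does not model fatigue.

```python
import math
import random
from collections import Counter


def sample_goals(mean, rng):
    """Draw a Poisson-distributed goal count (Knuth's multiplication method)."""
    limit, goals, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals


def play_knockout(team_a, team_b, strength, rng):
    """One knockout tie: 90 minutes, then extra time, then penalties."""
    goals_a = sample_goals(strength[team_a], rng)
    goals_b = sample_goals(strength[team_b], rng)
    if goals_a == goals_b:
        # Extra time is 30 minutes, so use a third of the 90-minute scoring rate
        goals_a += sample_goals(strength[team_a] / 3, rng)
        goals_b += sample_goals(strength[team_b] / 3, rng)
    if goals_a == goals_b:
        # Penalty shoot-out treated as a coin toss (an assumption)
        return rng.choice([team_a, team_b])
    return team_a if goals_a > goals_b else team_b


def play_bracket(teams, strength, rng):
    """Play successive rounds until one team is left; neighbours in the list meet."""
    while len(teams) > 1:
        teams = [play_knockout(teams[i], teams[i + 1], strength, rng)
                 for i in range(0, len(teams), 2)]
    return teams[0]


def simulate_tournament(teams, strength, n_runs=1000, seed=2018):
    """Count how often each team wins the bracket over n_runs simulations."""
    rng = random.Random(seed)
    return Counter(play_bracket(list(teams), strength, rng) for _ in range(n_runs))


# Hypothetical teams and expected goals per 90 minutes
strength = {"Team A": 1.8, "Team B": 1.5, "Team C": 1.2, "Team D": 0.9}
teams = ["Team A", "Team D", "Team B", "Team C"]  # semi-final pairings

results = simulate_tournament(teams, strength)
for team, wins in results.most_common():
    print(team, wins / 1000)
```

The point is the structure, not the numbers: every tie produces a real winner, draws go to extra time and penalties, and the answer is a share of simulated tournaments won rather than a decimal scoreline.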

To sum up, choosing the correct data to suit your problem (player and manager statistics, team statistics) and the correct algorithm (simulating the whole tournament by the rules of the tournament) gives us a good chance of getting the right outcome.

Hence France is likely to win the World Cup, or so says the best analytical model I have seen.

But personally, I am likely to be rooting for Croatia, else my friend Genti may not forgive me ;)

And in case you are wondering, if you have seen my LinkedIn comments about the tournament (9): Tonton Zola Moukoko is a player where FM got it wrong; not everything is about data, and sometimes life takes a turn (10).

  1. UBS predictions for the world cup https://www.businessinsider.sg/who-will-win-the-world-cup-2018-2018-5/?r=US&IR=T
  2. https://en.wikipedia.org/wiki/World_Football_Elo_Ratings
  3. https://www.cnbc.com/2018/05/17/world-cup-winner-predicted-by-ubs.html
  4. Goldman Sachs predictions for the world cup https://www.businessinsider.sg/world-cup-predictions-pick-to-win-it-all-goldman-sachs-ai-model-2018-6/?r=UK&IR=T
  5. Goldman Sachs predictions for the world cup again https://www.ft.com/content/804a21be-7915-11e8-bc55-50daf11b720d
  6. Goldman Sachs predictions for the world cup again and again https://www.businessinsider.sg/world-cup-predictions-goldman-sachs-ai-model-belgium-england-final/?r=US&IR=T
  7. https://cambridge-intelligence.com/graph-theory-world-cup-winner-prediction/
  8. https://www.youtube.com/watch?v=OxX_tdzFpgk
  9. https://www.linkedin.com/feed/update/urn:li:activity:6422819043379646464/
  10. https://offsiderulepodcast.com/2017/07/21/championship-manager-tonton-zola-moukoko/

