Monday 10 December 2018

Lies, Damned lies, and football statistics (with a sprinkling of fake news)


“People who don’t understand football analyse with statistics”(1), so said Jose Mourinho: 4 times world’s best coach, 2 times Champions League winner, 3 times English champion, 1 time English FA Cup winner, 4 times English League Cup winner, 3 times Spanish champion, 3 times Spanish Cup winner, 2 times Spanish Super Cup winner, 2 times Italian champion, 1 time Italian Cup winner, 1 time Italian Super Cup winner, 2 times Portuguese champion, 2 times Portuguese Cup winner, 1 time UEFA Cup winner, 1 time Europa League winner, 1 time UEFA Super Cup winner, 2 times Portuguese Super Cup winner and 2 times English Super Cup winner (2).

On the other hand, Pep Guardiola kind of referred to statistics when arguing that his team is not dirty: “Normally when a team has 65 or 70 per cent of the ball we cannot kick the opponent. We can kick each other, okay, but we have the ball. Normally when for every 10 minutes you have the ball for seven of them there is less option to make fouls. I don't think we're a team that make a lot of fouls in games.”(3).

So Mr Guardiola, 2 times world’s best coach, 2 times Champions League winner, 1 time English champion, 1 time English League Cup winner, 3 times Spanish champion, 2 times Spanish Cup winner, 3 times Spanish Super Cup winner, 3 times German champion, 2 times German Cup winner, 3 times FIFA Club World Cup winner, 3 times UEFA Super Cup winner and 1 time English Super Cup winner (4), does use statistics (sort of) when assessing his team.

And then there is the adage my friend Ramesh reminded me of: “Lies, damned lies, and statistics”.

Given his past record with respect to lies (5), let us examine Mr Guardiola’s argument. Taking data from WhoScored (6), I focused on the number of fouls committed per game, distinguishing between home games and away games.

In the chart above, the teams that commit more fouls per game are on the outside, whereas those that commit fewer fouls are closer to the origin. It is easy to see that Mr Guardiola was right: Manchester City is one of the teams that commit the fewest fouls. It is an undeniable fact.

So is Mr Mourinho wrong then?

This is where context and subject matter expertise are important. I am very fond of the Drew Conway data science diagram (), which emphasises the need for subject matter expertise.

What subject matter expertise is needed here?

Well, enough to understand that Manchester City play a possession-based strategy: they keep the ball for a huge chunk of each game. This element provides context.

This is football (or, as Americans call it, soccer), and players are not allowed to tackle opponents who do not have the ball. Your opponents are therefore much more likely to challenge you when you have the ball; when you do not have the ball, you are unlikely to be challenged. The more time the ball spends in your possession, the less likely you are to commit a foul.

Hence, what matters is not fouls per game, but fouls per number of minutes the opposition has the ball.
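To make the adjustment concrete, here is a minimal Python sketch. It assumes a 90-minute match (ignoring stoppage time and time the ball is out of play), and the two teams, their figures and the function name `fouls_per_10_min_without_ball` are all hypothetical, chosen only to show how a ranking by raw fouls can flip once possession is taken into account.

```python
MATCH_MINUTES = 90  # assumption: ignores stoppage time and time the ball is out of play


def fouls_per_10_min_without_ball(fouls: float, own_possession: float) -> float:
    """Fouls per 10 minutes during which the opponent has the ball."""
    if not 0 <= own_possession < 1:
        raise ValueError("own_possession must be in [0, 1)")
    minutes_without_ball = MATCH_MINUTES * (1 - own_possession)
    return 10 * fouls / minutes_without_ball


# Hypothetical teams, for illustration only (not real Premier League data):
# (fouls per game, share of possession)
teams = {
    "possession side": (10, 0.70),
    "counter-attacking side": (12, 0.40),
}

for name, (fouls, possession) in teams.items():
    rate = fouls_per_10_min_without_ball(fouls, possession)
    print(f"{name}: {fouls} fouls per game, {rate:.2f} per 10 minutes without the ball")
```

With these invented figures, the possession side commits fewer fouls per game (10 against 12) but more fouls per 10 minutes without the ball (about 3.70 against 2.22).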

Now the situation looks totally different, doesn’t it? Manchester City is not among the teams that commit the fewest fouls per minute out of possession; they commit more than their fair share of fouls when the opposition has the ball. In fact, if you look only at home games, they commit more possession-adjusted fouls than any other team in the Premier League.

Interestingly, Chelsea and Liverpool also foul consistently. This makes sense: if your tactics are built around overloading opponents and the opponent gets the ball (by beating your press), you will be heavily outnumbered in turn, hence the tactical foul that allows you to regroup and rebalance the situation.

In fact, that was what Gary Neville was referring to when he said that Manchester City is a cynical team (and that he likes that).

What I actually find interesting is that Manchester City’s triangle is very asymmetric: they foul much more at home than away. They are also much more aggressive at home, having scored twice as many goals there as away, whereas Chelsea and Liverpool are much more balanced.

Anyway, I can happily disagree with Mr Guardiola. After all, you wouldn’t expect a coach to admit that he asks his players to commit fouls, but I would have expected him to keep quiet rather than manipulate the data in his favour. I thought the temple of “fake news” was located at the White House; apparently it has a branch at the Etihad (and I am not even commenting on the FFP and other allegations by Der Spiegel (7), such as “We do what we want”).

Was this a case of “lies, damned lies, and statistics”? “Fake news”, yes; deliberate misdirection or a white lie, maybe. But it is not the fault of statistics: the fault lies with the person who chose the metric (number of fouls per game rather than number of fouls adjusted for possession), not with the metric itself.

So was it an illustration of what Mr Mourinho was saying, that “people who do not understand football analyse with statistics”?

Well, since I am currently spending a lot of my energy trying to get an organisation to increase its adoption and use of statistics in decision making (not a football club though; any takers?), I would neither agree nor disagree, and hide, as usual, behind “it depends”.

It depends on what Mr Mourinho actually meant. Saying that people who do not understand football analyse with statistics is not the same as saying that people who understand football do not analyse with statistics. Please note that since we are dealing with absolutes, intersections are shown as changes in the colour of the affected regions (for example, red overlapping yellow makes orange, and red overlapping blue makes purple).


The world is made up of people who understand football and those who don’t.
Now let’s add the people who analyse football with statistics. Jose Mourinho’s words can then be seen as:
There is a perfect overlap between people who do not understand football, and those who use statistics to analyse football.
But his statement is equally valid if:
In this case there are people who understand football and do use statistics to analyse it; presumably Mr Mourinho has at least one such analyst in his team.
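The difference between the two readings is a difference between set relations, which Python’s built-in sets can check directly. In this sketch the four people, their labels and the helper names are hypothetical; Mourinho’s statement is read as “everyone who does not understand football uses statistics” (a subset relation), and the stronger claim as “nobody who understands football uses statistics” (disjoint sets).

```python
# Hypothetical population of four people (illustration only, not real data).
POPULATION = {"Ana", "Ben", "Carla", "Dev"}
UNDERSTANDS_FOOTBALL = {"Ana", "Ben"}
DOES_NOT_UNDERSTAND = POPULATION - UNDERSTANDS_FOOTBALL  # {"Carla", "Dev"}


def mourinho_statement_holds(uses_statistics: set) -> bool:
    """'People who don't understand football analyse with statistics':
    every non-understander belongs to the statistics set."""
    return DOES_NOT_UNDERSTAND <= uses_statistics


def no_expert_uses_statistics(uses_statistics: set) -> bool:
    """The different, stronger claim: nobody who understands football
    analyses it with statistics."""
    return UNDERSTANDS_FOOTBALL.isdisjoint(uses_statistics)


scenarios = {
    # Statistics users are exactly the people who don't understand football.
    "perfect overlap": {"Carla", "Dev"},
    # Same as above, plus one person who understands football (an analyst).
    "with an analyst": {"Ben", "Carla", "Dev"},
}

for name, uses_statistics in scenarios.items():
    print(
        f"{name}: statement holds={mourinho_statement_holds(uses_statistics)}, "
        f"no expert uses statistics={no_expert_uses_statistics(uses_statistics)}"
    )
```

In both scenarios the statement holds, but only in the “perfect overlap” case is it also true that no one who understands football uses statistics.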
So what am I saying?
I deliberately started this blog with a seemingly controversial statement by Mr Mourinho, someone with many detractors. Some readers may have interpreted his words as the orange and blue diagram above simply because they perceive Mr Mourinho negatively, and so may not have read his statement as the last diagram above, which is not very controversial.
On the other hand, if you search for Guardiola and Gary Neville, you will find many more articles on Mr Guardiola’s response than on Mr Neville’s original statement. Again, Mr Guardiola has a better image and people tend not to analyse his statements as critically, whereas Mr Neville can be polarising, hence the focus on the rebuttal of his statement.
But as you can see, at least in this case, Mr Guardiola was dealing in “fake news”.
Conclusion(s)
While the source of any data should be looked at, personal feelings towards the person delivering the message should not enter the picture. Data is data and should be analysed without prejudice.
That being said, if the person doing the analysis has “mis-spoken” frequently in the past, then it makes sense to review their data a little more closely; after all, frequency of mis-speaking is a characteristic…
One of the simplest ways of analysing data is to put it in a proper context, and this takes some understanding of the data, the process of data creation, some subject matter expertise, and an open but critical mind.

P.S. I wrote this blog last week, and this weekend Chelsea played and beat Manchester City. Chelsea committed 12 fouls and Manchester City 11, with possession split 39% to 61%; adjusted for possession, it was therefore Manchester City who committed more fouls (11 in the 39% of the game when Chelsea had the ball, against Chelsea’s 12 in the 61% when City had it). Chelsea also disrupted Manchester City, restricting them to only 4 shots on target against their average of 6.1.
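The possession adjustment for this match can be checked in a few lines. This minimal sketch assumes a 90-minute match (ignoring stoppage time and time the ball is out of play) and assumes that 39% is Chelsea’s share, since the text lists Chelsea first; the function name is hypothetical.

```python
MATCH_MINUTES = 90  # assumption: ignores stoppage time and time the ball is out of play


def fouls_per_10_min_without_ball(fouls: int, own_possession: float) -> float:
    """Fouls per 10 minutes during which the opponent has the ball."""
    minutes_without_ball = MATCH_MINUTES * (1 - own_possession)
    return 10 * fouls / minutes_without_ball


# Fouls and possession as quoted in the post; 39% assumed to be Chelsea's share.
match = {
    "Chelsea": {"fouls": 12, "possession": 0.39},
    "Manchester City": {"fouls": 11, "possession": 0.61},
}

adjusted = {
    team: fouls_per_10_min_without_ball(stats["fouls"], stats["possession"])
    for team, stats in match.items()
}

for team, rate in adjusted.items():
    print(f"{team}: {match[team]['fouls']} fouls, {rate:.2f} per 10 minutes without the ball")
```

With these figures, Chelsea commit about 2.19 fouls per 10 minutes without the ball and Manchester City about 3.13, so the raw count (12 against 11) and the possession-adjusted rate point in opposite directions.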

  5. Pep Guardiola’s changing defence against his convictions for taking performance-enhancing drugs, and the power of unstable urine in all 4 tests as proposed by his still close collaborator, Mr Manuel Estiarte: http://www.sportingintelligence.com/2017/04/25/sharapova-guardiola-doping-darkness-and-light-250401/
