Sunday, 27 January 2019

A true "data scientist"



All around the internet, in many wine bars and high-end cigar joints but fewer kopitiams, you may hear people discussing “data science” and how it can change the world. You have seen many diagrams telling you what a “data scientist” is, or should be, and I have been just as guilty, promoting ideas such as: there is no such thing as a “data scientist”; the role and impact of a “data scientist” are more likely to be achieved by a “data science team”.

Recently I came across someone who has successfully built “data science teams” and not only that, but managed to get real life results out of them while leading an organisation – imagine a “head of data science team” as “COO”.

A man, yup it is a male human, who has shown how to use huge volumes of unstructured data to derive extremely detailed insights, and implement strategies to take advantage of these insights in highly competitive, real life, multi-million dollar arenas across the world.

To recap, what is a “data science team”?



To me “data science” involves all the steps from collecting the data, preparing it so it can be analysed, analysing the data, engaging in discovery/predictions based on the data, and importantly acting on the discoveries/predictions and implementing a feedback loop.

In most organisations, this effort may even be the product of more than one department/team, with the “data science team” not responsible for directly managing the implementation but only for providing the insights and measuring the impact.

To me, discovery and prediction are important because they are what differentiates reporting/business intelligence from analytics/“data science”. The tools used for discovery/prediction are less important as long as these are based on data.

And to add, one thing that would make an analysis truly a “big data” issue is the use of unstructured data: photos, movies, sound files…

Personality traits

Someone recently commented that in order to be called a “data scientist” you need to be really mad about analysis. And to a large extent I agree.

I believe in passion, and extreme passion can be seen as a form of madness.

Interestingly, last week I attended a management retreat kind of do, where they used the Myers-Briggs 16 personalities. The idea was to teach us how to relate to each other, to understand how different people work, and to help make communication easier.

During the high-level explanation, the coach/moderator explained to us that people who are “sensing”, as opposed to people who are “intuitive”, are more likely to be moved by data, and prefer the use of data to make decisions. On the other hand, “intuitive” people are more likely to be interested in the big picture, rather than the nitty-gritty of data.

Since according to the test I am more “intuitive” than “sensing”, I brought this question up over drinks. And the answer, which seemed reasonable at the time (and still does) was that deeper analysis tends to be an art where “intuitive” people do well. (FYI I did prefer to be called a “data artist” rather than a “data scientist” and do have 2 or 3 pieces in the “Art of Analytics”(1))…

So to me people in the “data science” field should be passionate about what they are doing, especially about getting results from their work. If there is no outcome, it is just academic work, not real life.
So to sum up, a “data scientist” in my view should be passionate about analysing data and about applying it in the real world.

Ok, enough beating around the bush…

Who is this “data scientist”, this leader of a “data science team”?

First let me tell you some of what that team does (it is a huge team of 20, by the way):
  • Collects all information available from public sources, including unstructured data
  • Supplements it with specialist provided data
  • Breaks down each dataset into identifiable and manageable chunks, clusters, aspects
  • Classifies every aspect of the data
  • Learns the patterns that hide in the data for each aspect/chunk
  • Predicts the most likely result of a pattern and devises the best response
  • Translates the analysis into visuals, charts…
  • Makes each action plan digestible by the people who actually act

Hence in terms of the data science process:

Data Collection/Pipelines
  • Collects all information available from public sources, including unstructured data
  • Supplements it with specialist provided data

Preparation/Storage/Transformation
  • Breaks down each dataset into identifiable and manageable chunks, clusters, aspects
  • Classifies every aspect of the data

Data Analysis
  • Learns the patterns that hide in the data for each aspect/chunk
  • Predicts the most likely result of a pattern and devises the best response

Visualisation/Applications
  • Translates the analysis into visuals, charts…
  • Makes each action plan digestible by the people who actually act


The data science team covers all areas in the “data science” process; and they do more.

Not only does the leader of this “data science team” propose the plan that is predicted to bring success, but he also manages the implementation on the ground, provides live feedback and thereby allows his team to update the analysis in real time.

And after the fact, the team conducts post-mortem analysis.

So who is this leader of a “data science team” that I am using as example?

Marcelo Bielsa

If you have not read the transcript of his 70 minute presentation on how he prepares for each game, including friendlies, please do so (2).

Let me just list some highlights of his obsession with analysing data and extracting insights:
  • Of each opponent, we watched all of the games from the 2017-2018 season… the 51 games of Derby County, we watched them. The analysis of each game takes 4 hours of work. Why did we do that? Because we think this is professional behaviour
  • The other thing we do is point out the chances to score, the half chances to score and which team dominates every 5 minutes… That’s why it takes 4 hours to analyse each game.
  • We also do video analysis where we register goals, chances, and that’s why it takes four hours to analyse the game.
  • It’s a process, a way to get to know the opponent.
  • We can express all these parameters with this graphic


He then talks about understanding the structure and the patterns (patterns within context, sound familiar?)
  • They played 31 games. 49% they used a 4-3-3 system with #8 on the right… In 22% a 4-3-3 but with the #8 on the left. Same for 4-2-1-3, with #8 on the right and left. They also used other structures, 3%, 2%, but they are not significant.
  • These structures give you fixed priorities and we here have the minutes and the games to understand why Derby changes the system and when.
  • I would explain that all this information, I don’t memorise it. All the information that this report has corresponds to the questions I have asked myself over the last 30 years
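For the curious, here is a minimal sketch of how such a breakdown of structures could be tallied. The game labels below are hypothetical placeholders, not Bielsa's data, and `structure_shares` is a name I made up for illustration; his team's actual method is not described in the transcript.

```python
from collections import Counter

# Hypothetical per-game labels (formation, side of the #8).
# Illustrative only: these are NOT the real Derby County figures.
games = [
    ("4-3-3", "right"),
    ("4-3-3", "right"),
    ("4-3-3", "left"),
    ("4-2-1-3", "right"),
    ("4-3-3", "right"),
    ("4-2-1-3", "left"),
    ("4-3-3", "left"),
    ("4-3-3", "right"),
]


def structure_shares(labels):
    """Return (label, percentage of games) pairs, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, 100 * n / total) for label, n in counts.most_common()]


shares = structure_shares(games)
for (formation, side), pct in shares:
    print(f"{formation} with #8 on the {side}: {pct:.0f}%")
```

The counting is the easy part; the 30 years of questions that decide what to label in the first place are not in the code.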

A true SME, putting all his knowledge to work: defining the metrics, parameters and analytical methods, then executing the analysis over 51 football games of 90 minutes each, with each analysis taking 4 hours (51 × 4, or roughly 204 hours of work for a single opponent).

He even looks at various forms of data
  • This is the tactical analysis. This is the document. Have a look. The analysis of chances, goals, domination… We see in each segment of 5 minutes, who dominates and if they create chances.
  • Now let’s look at the video of this game. We know what this player will do when he raises both hands…
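As a rough illustration of the 5-minute segmentation he describes, the sketch below buckets a made-up event log into segments and counts chances per team. The events, the team names in them and the function `chances_by_segment` are assumptions for the example; how his analysts actually judge "domination" is not specified, so only chances are counted here.

```python
from collections import defaultdict

# Hypothetical event log: (minute, team, kind). Not taken from the real reports.
events = [
    (3, "Leeds", "chance"),
    (4, "Derby", "half_chance"),
    (12, "Leeds", "chance"),
    (13, "Leeds", "half_chance"),
    (47, "Derby", "chance"),
]

SEGMENT_MINUTES = 5


def chances_by_segment(event_log, segment=SEGMENT_MINUTES):
    """Count chances and half chances per team in each fixed-length segment."""
    table = defaultdict(lambda: defaultdict(int))
    for minute, team, _kind in event_log:
        start = (minute // segment) * segment  # minute 13 falls in the 10-15 segment
        table[start][team] += 1
    # Plain dicts, ordered by segment start, are easier to print and compare.
    return {start: dict(teams) for start, teams in sorted(table.items())}


segments = chances_by_segment(events)
for start, teams in segments.items():
    print(f"{start:02d}-{start + SEGMENT_MINUTES:02d} min: {teams}")
```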

And by classifying the different phases, he can predict the patterns
  • As you can see, you have 40 minutes of offensive action from Derby from 31 games. When you see 41 minutes of attack, you see what is the path for the opponent to attack.
  • And if you do the same thing analysing what were the chances conceded by Derby County, you see the defensive weaknesses.
  • We know if they made 184 corners, 33 were dangerous; we know qualitative data. We try to find weaknesses of the goalkeeper, where we can press. The players know about the opponent.
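To put the corner figures in proportion, here is a tiny worked example using the two numbers quoted above (184 corners, 33 dangerous). The helper `danger_rate` is just an illustrative name, not anything from the presentation.

```python
def danger_rate(dangerous, total):
    """Share of set pieces judged dangerous; 0.0 when there are none to judge."""
    if total == 0:
        return 0.0
    return dangerous / total


# Figures quoted in the press conference: 33 dangerous corners out of 184.
rate = danger_rate(33, 184)
print(f"{rate:.1%}")  # prints 17.9%
```

So fewer than one corner in five was worth worrying about, which is exactly the kind of qualitative filter that turns a count into an insight.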

And creates further visualisations so the consumers of his analyses can get the most out of them
  • Our goal is to sum up in 7 or 8 minutes to show to the player how Derby attack…
  • We also make a 7 or 8 minutes video showing their defensive weakness
  • The goal of this analysis is to allow our players to have an idea of the opponent in 15 minutes.

And drills further into individual players
  • To sum up, these are the players and we have the position which they played… we gather information for the systems, we analyse to see which player goes in which system.

Finally, the passion bordering on obsession. There are stories (3) of how, even when coaching at a university, he watched 3,000 players to pick 20, or how he systematically divided his country (Argentina) into 70 sections and arranged trials in each, travelling to each in his Fiat 147 so he could catch all the talent.
  • Here we have the analysis for all teams. This is for this season. This is for last season. This is for friendlies. We went to watch the opponent (he has only been on the job this season)
  • I don’t need to watch a training session to know where they play. Why do I go? Because it’s not forbidden… and even though watching an opponent is not useful, it allows me to keep my anxiety low

And the understanding of the limits of analysis
  • I was very sad to lose this game. When the game finished I sent it (the analysis of that day’s opponents) to Guardiola (the winning coach), this analysis expressing my admiration for him. He told me “you know more about Barcelona than me”. But it was useless because they scored three goals against us. I do this to feel well, I see that this information does not allow you to win games.
Ok, possibly this last trait may disqualify him; humble “data scientist”? mmm…


But how about tools you ask!?!

Is he using the cloud? Is it Azure, AWS, GCP, Alibaba, Huawei…?

Is all the data on Hadoop and is he using TensorFlow?

For sure he must be using neural networks, right?

Yes, you are right.

They have 20 extremely complex neural networks with some interaction between them. The most trained neural network has around 30 years of data. This is the brain of Marcelo Bielsa.


Where is MY blue bucket?

  1. https://www.yumpu.com/en/document/read/54918036/the-art-of-analytics/18
  2. https://talksport.com/football/efl/475976/marcelo-bielsa-leeds-full-transcript-incredible-press-conference-spygate/
  3. https://www.theguardian.com/football/2019/jan/17/marceo-bielsa-spygate-powerpoint-leeds-united


1 comment:

  1. Jajaja you did it again. It’s a pity you don’t know more experienced and data driven ARG football managers such as Bilardo or Sabella (shame on me they both belong to Estudiantes de la Plata). About the famous Estudiantes vs Barcelona final match of 2009 there you go the analysis of Sabella... plenty of Data Science passion. https://www.youtube.com/watch?v=AnUkSnicSaQ
