All around the
internet, in many wine bars and high-end cigar joints but fewer kopitiams, you
may hear people discussing “data science” and how it can change the
world. You have seen many diagrams telling you what a “data scientist” is, or
should be, and I have been just as guilty of promoting ideas, such as that there is no such
thing as a “data scientist” and that the roles and impact of a “data scientist” are
more likely to be achieved by a “data science team”.
Recently I came
across someone who has successfully built “data science teams” and not only
that, but managed to get real life results out of them while leading an
organisation – imagine a “head of data science team” as “COO”.
A man, yup it is
a male human, who has shown how to use huge volumes of unstructured data to
derive extremely detailed insights, and implement strategies to take advantage
of these insights in highly competitive, real life, multi-million dollar arenas
across the world.
To recap,
what is a “data science team”?
To me “data
science” involves all the steps from collecting the data, preparing it so it
can be analysed, analysing the data, engaging in discovery/predictions based on
the data, and importantly acting on the discoveries/predictions and
implementing a feedback loop.
In most
organisations, this effort may even be the product of more than one
department/team, with the “data science team” not responsible for directly
managing the implementation but only for providing the insights and measuring
the impact.
To me, discovery
and prediction are important because they are what differentiate
reporting/business intelligence from analytics/“data science”. The tools used
for discovery/prediction are less important as long as these are based on data.
And to add, one
item that would make an analysis truly a “big data” issue is the use of
unstructured data: photos, movies, sound files…
Personality
traits
Someone recently
commented that in order to be called a “data scientist” you need to be really
mad about analysis. And to a large extent I agree.
I believe in
passion, and extreme passion can be seen as a form of madness.
Interestingly,
last week I attended a management retreat kind of do. It was interesting; they
used the Myers-Briggs 16 personalities. The idea was to teach us how to relate
to each other, to understand how different people work, and help make
communication easier.
During the high
level explanation, the coach/moderator explained to us that people who are “sensing”
as opposed to people who are “intuitive” are more likely to be moved by data,
and prefer the use of data to make decisions. On the other hand, “intuitive”
people are more likely to be interested in the big picture, rather than the
nitty gritty of data.
Since according
to the test I am more “intuitive” than “sensing”, I brought this question up
over drinks. And the answer, which seemed reasonable at the time (and still
does) was that deeper analysis tends to be an art where “intuitive” people do
well. (FYI I did prefer to be called a “data artist” rather than a “data
scientist” and do have 2 or 3 pieces in the “Art of Analytics”(1))…
So to me people
in the “data science” field should be passionate about what they are doing,
especially getting results from their work. If there is no outcome, it is
just academic work, not real life.
So to sum up, a
“data scientist” in my view should be passionate about analysing data and about
applying it in the real world.
Ok, enough
beating around the bush…
Who is this
“data scientist”, this leader of a “data science team”?
First let me
tell you some of what that team does. It is a huge team of 20, by the way.
- Collects all information available from public sources, including unstructured data
- Supplements it with specialist provided data
- Breaks down each dataset into identifiable and manageable chunks, clusters, aspects
- Classifies every aspect of the data
- Learns the patterns that hide in the data for each aspect/chunk
- Predicts the most likely result of a pattern and devises the best response
- Translates the analysis into visuals, charts…
- Makes each action plan digestible by the people who actually act
Hence in terms
of the data science process:
| Data Collection/Pipelines | Preparation/Storage/Transformation | Data Analysis | Visualisation/Applications |
| --- | --- | --- | --- |
| Collects all information available from public sources, including unstructured data | Breaks down each dataset into identifiable and manageable chunks, clusters, aspects | Learns the patterns that hide in the data for each aspect/chunk | Translates the analysis into visuals, charts… |
| Supplements it with specialist provided data | Classifies every aspect of the data | Predicts the most likely result of a pattern and devises the best response | Makes each action plan digestible by the people who actually act |
The data science
team covers all areas in the “data science” process; and they do more.
Not only does
the leader of this “data science team” propose the plan that is predicted to
bring success, but he also manages the implementation on the ground, provides
live feedback and thereby allows his team to update the analysis in real time.
And after the
fact, the team conducts post-mortem analysis.
So who is
this leader of a “data science team” that I am using as an example?
Marcelo Bielsa
If you have not
read the transcript of his 70 minute presentation on how he prepares for each
game, including friendlies, please do so (2).
Let me just list some highlights of his obsession with analysing data and extracting insights:
- Of each opponent, we watched all of the games from the 2017-2018 season… the 51 games of Derby County, we watched them. The analysis of each game takes 4 hours of work. Why did we do that? Because we think this is professional behaviour
- The other thing we do is point out the chances to score, the half chances to score and which team dominates every 5 minutes… That’s why it takes 4 hours to analyse each game.
- We also do video analysis where we register goals, chances, and that’s why it takes four hours to analyse the game.
- It’s a process, a way to get to know the opponent.
- We can express all these parameters with this graphic
He then talks
about understanding the structure and the patterns (patterns within context,
sound familiar?)
- They played 31 games. 49% they used a 4-3-3 system with #8 on the right… In 22% a 4-3-3 but with the #8 on the left. Same for 4-2-1-3, with #8 on the right and left. They also used structures, 3%, 2% but they are not significant.
- These structures give you fixed priorities and we here have the minutes and the games to understand why Derby changes the system and when.
- I would explain that all this information, I don’t memorise it. All the information that this report has corresponds to the questions I have asked myself over the last 30 years
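For the curious, the formation breakdown he describes is, at its simplest, a frequency count over labelled games. Here is a minimal sketch in Python, assuming each game has already been tagged by hand with a formation label. The function name `formation_shares` and the ten sample labels are made up for illustration; they are not the real Derby numbers.

```python
from collections import Counter


def formation_shares(formations):
    """Return (formation, percentage of games) pairs, most common first."""
    counts = Counter(formations)
    total = sum(counts.values())
    return [(name, round(100 * n / total)) for name, n in counts.most_common()]


# Hypothetical labels for 10 games, NOT the real Derby County data.
games = (
    ["4-3-3 (8 right)"] * 5
    + ["4-3-3 (8 left)"] * 2
    + ["4-2-1-3 (8 right)"] * 2
    + ["3-5-2"]
)

shares = formation_shares(games)
for name, pct in shares:
    print(f"{name}: {pct}%")
```

The counting is the easy part. The hard part, and the part that takes 30 years, is knowing which labels are worth recording in the first place.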
A true SME,
putting all his knowledge to work, defining the metrics, parameters and analytical methods,
then executing the analysis over 51 football games of 90 minutes each, with
each analysis taking 4 hours (about 204 hours of work for a single opponent).
He even looks at
various forms of data
- This is the tactical analysis. This is the document. Have a look. The analysis of chances, goals, domination… We see in each segment of 5 minutes, who dominates and if they create chances.
- Now let’s look at the video of this game. We know what this player will do when he raises both hands…
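The 5 minute segments can be sketched as a simple bucketing exercise. The snippet below is only an illustration under my own assumptions: chances are logged by hand as (minute, team) pairs, and a segment is just the minute rounded down to a multiple of 5. The transcript does not say how "domination" is scored, so this only counts chances per team per segment. The name `chances_by_segment` and the sample events are hypothetical.

```python
from collections import defaultdict


def chances_by_segment(events, segment_minutes=5):
    """Count chances per team for each fixed-length segment of a game.

    events: iterable of (minute, team) pairs.
    Returns {segment_start_minute: {team: chance_count}}, sorted by segment.
    """
    table = defaultdict(lambda: defaultdict(int))
    for minute, team in events:
        start = (minute // segment_minutes) * segment_minutes
        table[start][team] += 1
    return {start: dict(teams) for start, teams in sorted(table.items())}


# Hypothetical chance log for one game, not taken from the real analysis.
events = [(3, "Leeds"), (4, "Derby"), (7, "Leeds"), (8, "Leeds"), (44, "Derby")]

segments = chances_by_segment(events)
for start, teams in segments.items():
    print(f"{start:02d}-{start + 5:02d} min: {teams}")
```

Segments with no chances simply do not appear, which is probably fine for a quick look but would need filling in before charting a full 90 minutes.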
And by
classifying the different phases, he can predict the patterns
- As you can see, you have 40 minutes of offensive action from Derby from 31 games. When you see 41 minutes of attack, you see what is the path for the opponent to attack.
- And if you do the same thing analysing what were the chances conceded by Derby County, you see the defensive weaknesses.
- We know if they made 184 corners, 33 were dangerous; we know qualitative data. We try to find weaknesses of the goalkeeper, where we can press. The players know about the opponent.
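To put the corner figure in perspective, 33 dangerous corners out of 184 is a bit under one in five. A tiny sketch of that arithmetic, using only the two numbers quoted above (the helper name `share` is mine):

```python
def share(part, whole):
    """Return part as a percentage of whole."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return 100 * part / whole


# Figures quoted in the press conference: 184 corners, 33 of them dangerous.
dangerous_pct = share(33, 184)
print(f"{dangerous_pct:.1f}% of corners were dangerous")  # 17.9%
```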
And creates
further visualisations so the consumers of his analyses can get the most out of
them
- Our goal is to sum up in 7 or 8 minutes to show to the player how Derby attack…
- We also make a 7 or 8 minute video showing their defensive weakness
- The goal of this analysis is to allow our players to have an idea of the opponent in 15 minutes.
And drills
further into individual players
- To sum up, these are the players and we have the position which they played… we gather information for the systems, we analyse to see which player goes in which system.
Finally the
passion bordering on obsession. There are stories (3) where even when coaching
at a University, he watched 3,000 players to pick 20, or how he systematically
divided his country (Argentina) into 70 sections and arranged trials in each,
travelling to each in his Fiat 147 so he could catch all the talent.
- Here we have the analysis for all teams. This is for this season. This is for last season. This is for friendlies. We went to watch the opponent (he has only been on the job this season)
- I don’t need to watch a training session to know where they play. Why do I go? Because it’s not forbidden… and even though watching an opponent is not useful, it allows me to keep my anxiety low
And the
understanding of the limits of analysis
- I was very sad to lose this game. When the game finished I sent it (the analysis of that day’s opponents) to Guardiola (the winning coach), this analysis expressing my admiration for him. He told me “you know more about Barcelona than me”. But it was useless because they scored three goals against us. I do this to feel well, I see that this information does not allow you to win games.
But how about
tools you ask!?!
Is he using the
cloud? Is it Azure, AWS, GCP, Alibaba, Huawei…?
Is all the data
on Hadoop and is he using TensorFlow?
For sure he must
be using neural networks, right?
Yes you are
right.
They have 20
extremely complex neural networks with some interaction between them. The most
trained neural network has around 30 years of data. This is the brain of
Marcelo Bielsa.
Where is MY blue bucket?
(1) https://www.yumpu.com/en/document/read/54918036/the-art-of-analytics/18
(2) https://talksport.com/football/efl/475976/marcelo-bielsa-leeds-full-transcript-incredible-press-conference-spygate/
(3) https://www.theguardian.com/football/2019/jan/17/marceo-bielsa-spygate-powerpoint-leeds-united