Sunday, 27 January 2019

A true "data scientist"



All around the internet, in many wine bars and high-end cigar joints but fewer kopitiams, you may hear people discussing “data science” and how it can change the world. You have seen many diagrams telling you what a “data scientist” is, or should be, and I have been as guilty as anyone, promoting ideas such as that there is no such thing as a “data scientist” and that the roles and impact of a “data scientist” are more likely to be achieved by a “data science team”.

Recently I came across someone who has successfully built “data science teams” and not only that, but managed to get real life results out of them while leading an organisation – imagine a “head of data science team” as “COO”.

A man, yup it is a male human, who has shown how to use huge volumes of unstructured data to derive extremely detailed insights, and implement strategies to take advantage of these insights in highly competitive, real life, multi-million dollar arenas across the world.

To recap, what is a “data science team”?



To me “data science” involves all the steps from collecting the data, preparing it so it can be analysed, analysing the data, engaging in discovery/predictions based on the data, and importantly acting on the discoveries/predictions and implementing a feedback loop.

In most organisations, this effort may even be the product of more than one department/team, with the “data science team” not responsible for directly managing the implementation but only for providing the insights and measuring the impact.

To me, discovery and prediction are important because they are what differentiates reporting/business intelligence from analytics/“data science”. The tools used for discovery/prediction are less important as long as these are based on data.

And to add, one item that would make an analysis truly a “big data” issue is the use of unstructured data: photos, movies, sound files…

Personality traits

Someone recently commented that in order to be called a “data scientist” you need to be really mad about analysis. And to a large extent I agree.

I believe in passion, and extreme passion can be seen as a form of madness.

Last week I attended a management retreat kind of do. It was interesting; they used the Myers-Briggs 16 personalities. The idea was to teach us how to relate to each other, to understand how different people work, and to help make communication easier.

During the high level explanation, the coach/moderator explained to us that people who are “sensing”, as opposed to people who are “intuitive”, are more likely to be moved by data, and prefer the use of data to make decisions. On the other hand, “intuitive” people are more likely to be interested in the big picture, rather than the nitty gritty of data.

Since according to the test I am more “intuitive” than “sensing”, I brought this question up over drinks. And the answer, which seemed reasonable at the time (and still does) was that deeper analysis tends to be an art where “intuitive” people do well. (FYI I did prefer to be called a “data artist” rather than a “data scientist” and do have 2 or 3 pieces in the “Art of Analytics”(1))…

So to me people in the “data science” field should be passionate about what they are doing, especially getting results from their work. If there is no outcome, it is just academic work, not real life.
So to sum up, a “data scientist” in my view should be passionate about analysing data and about applying it in the real world.

Ok, enough beating around the bush…

Who is this “data scientist”, this leader of a “data science team”?

First let me tell you some of what that team does (it is a huge team of 20, by the way):
  • Collects all information available from public sources, including unstructured data
  • Supplements it with specialist provided data
  • Breaks down each dataset into identifiable and manageable chunks, clusters, aspects
  • Classifies every aspect of the data
  • Learns the patterns that hide in the data for each aspect/chunk
  • Predicts the most likely result of a pattern and devises the best response
  • Translates the analysis into visuals, charts…
  • Makes each action plan digestible by the people who actually act

Hence in terms of the data science process:

Data Collection/Pipelines
  • Collects all information available from public sources, including unstructured data
  • Supplements it with specialist provided data

Preparation/Storage/Transformation
  • Breaks down each dataset into identifiable and manageable chunks, clusters, aspects
  • Classifies every aspect of the data

Data Analysis
  • Learns the patterns that hide in the data for each aspect/chunk
  • Predicts the most likely result of a pattern and devises the best response

Visualisation/Applications
  • Translates the analysis into visuals, charts…
  • Makes each action plan digestible by the people who actually act


The data science team covers all areas in the “data science” process; and they do more.

Not only does the leader of this “data science team” propose the plan that is predicted to bring success, but he also manages the implementation on the ground, provides live feedback and thereby allows his team to update the analysis in real time.

And after the fact, the team conducts post-mortem analysis.

So who is this leader of a “data science team” that I am using as example?

Marcelo Bielsa

If you have not read the transcript of his 70 minute presentation on how he prepares for each game, including friendlies, please do so (2).

Let me just list some highlights of his obsession with analysing data and extracting insights:
  • Of each opponent, we watched all of the games from the 2017-2018 season… the 51 games of Derby County, we watched them. The analysis of each game takes 4 hours of work. Why did we do that? Because we think this is professional behaviour
  • The other thing we do is point out the chances to score, the half chances to score and which team dominates every 5 minutes… That’s why it takes 4 hours to analyse each game.
  • We also do video analysis where we register goals, chances, and that’s why it takes four hours to analyse the game.
  • It’s a process, a way to get to know the opponent.
  • We can express all these parameters with this graphic


He then talks about understanding the structure and the patterns (patterns within context, sound familiar?)
  • They played 31 games. In 49% they used a 4-3-3 system with #8 on the right… In 22% a 4-3-3 but with the #8 on the left. Same for 4-2-1-3, with #8 on the right and left. They also used other structures, 3%, 2%, but they are not significant.
  • These structures give you fixed priorities and we here have the minutes and the games to understand why Derby changes the system and when.
  • I would explain that all this information, I don’t memorise it. All the information that this report has corresponds to the questions I have asked myself over the last 30 years
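
The structure breakdown above is, at heart, a frequency count. Here is a minimal sketch in Python of how such shares could be computed, assuming each game has already been tagged by hand with a formation and the side of the #8. The field names `formation` and `number_8_side` and the sample records are my own inventions for illustration; they are not Derby's real games nor Mr Bielsa's actual method.

```python
from collections import Counter


def formation_shares(games):
    """Return each (formation, side of the #8) pair as a percentage of games."""
    counts = Counter((g["formation"], g["number_8_side"]) for g in games)
    total = sum(counts.values())
    # most_common() orders the structures from most to least used
    return {key: round(100 * n / total, 1) for key, n in counts.most_common()}


# Hypothetical records, invented for illustration only.
games = [
    {"formation": "4-3-3", "number_8_side": "right"},
    {"formation": "4-3-3", "number_8_side": "right"},
    {"formation": "4-3-3", "number_8_side": "left"},
    {"formation": "4-2-1-3", "number_8_side": "right"},
]

shares = formation_shares(games)
for (formation, side), pct in shares.items():
    print(f"{formation} with #8 on the {side}: {pct}%")
```

The counting is the easy part; the 4 hours per game go into producing the tags.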

A true SME, applying all his knowledge to define the metrics, parameters and analytical methods, then executing the analysis over 51 football games of 90 minutes each, with each analysis taking 4 hours.

He even looks at various forms of data
  • This is the tactical analysis. This is the document. Have a look. The analysis of chances, goals, domination… We see in each segment of 5 minutes, who dominates and if they create chances.
  • Now let’s look at the video of this game. We know what this player will do when he raises both hands…
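
The 5-minute segments can be sketched in a few lines. This is only my guess at the mechanics, assuming someone has logged each chance or attacking action as a (minute, team) pair while watching the game, and that "dominates" simply means more logged actions in the segment. The events below are invented for illustration.

```python
from collections import defaultdict

SEGMENT_MINUTES = 5


def dominance_by_segment(events, segment_minutes=SEGMENT_MINUTES):
    """Map each segment's start minute to the team with more logged events.

    A segment where the top two teams are level is mapped to None.
    """
    tallies = defaultdict(lambda: defaultdict(int))
    for minute, team in events:
        start = (minute // segment_minutes) * segment_minutes
        tallies[start][team] += 1

    result = {}
    for start in sorted(tallies):
        ranked = sorted(tallies[start].items(), key=lambda kv: kv[1], reverse=True)
        tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        result[start] = None if tied else ranked[0][0]
    return result


# Hypothetical hand-logged events: (minute of the game, team on the attack).
events = [
    (2, "Leeds"), (3, "Leeds"), (4, "Derby"),
    (7, "Derby"),
    (12, "Leeds"), (13, "Derby"),
]

segments = dominance_by_segment(events)
for start, team in segments.items():
    print(f"{start:02d}-{start + SEGMENT_MINUTES:02d} min: {team or 'even'}")
```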

And by classifying the different phases, he can predict the patterns
  • As you can see, you have 40 minutes of offensive action from Derby from 31 games. When you see 41 minutes of attack, you see what is the path for the opponent to attack.
  • And if you do the same thing analysing what were the chances conceded by Derby County, you see the defensive weaknesses.
  • We know if they made 184 corners, 33 were dangerous; we know qualitative data. We try to find weaknesses of the goalkeeper, where we can press. The players know about the opponent.

And creates further visualisations so the consumers of his analyses can get the most out of them
  • Our goal is to sum up in 7 or 8 minutes to show to the player how Derby attack…
  • We also make a 7 or 8 minute video showing their defensive weakness
  • The goal of this analysis is to allow our players to have an idea of the opponent in 15 minutes.

And drills further into individual players
  • To sum up, these are the players and we have the position which they played… we gather information for the systems, we analyse to see which player goes in which system.

Finally, the passion bordering on obsession. There are stories (3) of how, even when coaching at a university, he watched 3,000 players to pick 20, or how he systematically divided his country (Argentina) into 70 sections and arranged trials in each, travelling to each in his Fiat 147 so he could catch all the talent.
  • Here we have the analysis for all teams. This is for this season. This is for last season. This is for friendlies. We went to watch the opponent (he has only been on the job this season)
  • I don’t need to watch a training session to know where they play. Why do I go? Because it’s not forbidden… and even though watching an opponent is not useful, it allows me to keep my anxiety low

And the understanding of the limits of analysis
  • I was very sad to lose this game. When the game finished I sent it (the analysis of that day’s opponents) to Guardiola (the winning coach), this analysis expressing my admiration for him. He told me “you know more about Barcelona than me”. But it was useless because they scored three goals against us. I do this to feel well, I see that this information does not allow you to win games.

Ok, possibly this last trait may disqualify him; a humble “data scientist”? mmm…


But how about tools you ask!?!

Is he using the cloud? Is it Azure, AWS, GCP, Alibaba, Huawei…?

Is all the data on Hadoop and is he using TensorFlow?

For sure he must be using neural networks, right?

Yes you are right.

They have 20 extremely complex neural networks with some interaction between them. The most trained neural network has around 30 years of data. This is the brain of Marcelo Bielsa.


Where is MY blue bucket?

  1. https://www.yumpu.com/en/document/read/54918036/the-art-of-analytics/18
  2. https://talksport.com/football/efl/475976/marcelo-bielsa-leeds-full-transcript-incredible-press-conference-spygate/
  3. https://www.theguardian.com/football/2019/jan/17/marceo-bielsa-spygate-powerpoint-leeds-united


Tuesday, 15 January 2019

Great Chariots of Fire! Marcelo Bielsa to the Red Devils (please)


Moneyball (1) has been used by many “data science” people to justify why they should get into professional sports, including football. 

Today most large football clubs employ “sports analysts” to understand the potential performance of players and to monitor fatigue and risk of injury – the famed red zone (2). PhD theses have been written about the factors that can be used to monitor and predict performance (3)…

A whole industry has developed around monitoring the physical fitness of players.

For example, STATSports is one of the leading purveyors of sensors (4). At the heart, they do GPS tracking of players, to which they have attached a heart rate monitor, so you can track the location and, as long as your location is accurate, easily derive the speed and acceleration. They also provide an app to visualise the data, a vest to hold the sensors in place (5)...
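
To show what deriving speed and acceleration from location means in practice, here is a minimal sketch in Python. It assumes the sensor gives timestamped latitude/longitude fixes with strictly increasing timestamps, and it uses the haversine great-circle distance between consecutive fixes. This is my own illustration, not STATSports' algorithm, and the coordinates are invented.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_and_accelerations(fixes):
    """fixes: list of (t_seconds, lat, lon) with strictly increasing t_seconds.

    Returns (speeds, accels) as lists of (t_seconds, value), in m/s and m/s^2,
    each stamped with the time at the end of the interval it covers.
    """
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(fixes, fixes[1:]):
        speeds.append((t1, haversine_m(lat0, lon0, lat1, lon1) / (t1 - t0)))
    accels = []
    for (t0, v0), (t1, v1) in zip(speeds, speeds[1:]):
        accels.append((t1, (v1 - v0) / (t1 - t0)))
    return speeds, accels


# Hypothetical one-second fixes along the equator, for illustration only.
fixes = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.00005), (2.0, 0.0, 0.00011)]
speeds, accels = speeds_and_accelerations(fixes)
for t, v in speeds:
    print(f"t={t:.0f}s speed={v:.2f} m/s")
for t, a in accels:
    print(f"t={t:.0f}s acceleration={a:.2f} m/s^2")
```

Note how sensitive this is to location accuracy: a GPS error of a metre or two on one-second fixes is of the same order as the distance a player actually covers, so real devices would need some smoothing on top of this.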

Another one of their offerings transmits the data to the cloud using ultra-wideband and even has on-board memory to retain data in case the device is not able to connect to the cloud.

Amazing huh?

Would cost you a few hundred bucks.

Or you can build one yourself by sourcing different components; some of the guys who worked with us last year on the cricket thing would recognise this. Of course industrially made ones are more compact, but it depends on whether you are willing to pay the price. For a professional club, it certainly seems to make sense.



What does that have to do with Bielsa, the chariots and the Devils?

Basically I am making the point that in professional sports today, using data is very common, and if you add to this the controls on diet and activities that go with the tracking (6), then you can see it can be quite intrusive for the player. 

The chariots come in when I mention professionalism.

One of the sub-stories in the movie is the fact that Abrahams hired a professional coach, whereas the Olympics were restricted to amateurs at that point in time. (Some sports, such as boxing, still follow this rule.) What is also left unsaid is that the coach tried to get his athletes to take less than common pills, including cocaine lozenges (7).

What does professionalism mean in sports?

Does it mean doing as much as you can, within the limits of legality, to get an advantage? We have seen complaints in many areas, from the length of running blades in the Paralympics (8), to taking drugs which are usually only taken for a very short period of time (9), to banning female athletes (or worse, demanding mutilation from such athletes such as “cutting out gonads and partially removing their clitorises”) if their testosterone levels are thought too high (10).

Bottom line, professional sports is serious business for individual sportspeople.

A quick look at Wikipedia will show why it is so serious (11): across sports, being part of the elite can mean access to amazing amounts of money, whether directly through earnings or via endorsements. What is also interesting is that the top earners come from individual as well as collective/team sports.

Rank  Name                  Sport                 Nation               Total          Salary/Winnings  Endorsements
1     Floyd Mayweather Jr.  Boxing                United States        $285 million   $275 million     $10 million
2     Lionel Messi          Association football  Argentina            $111 million   $84 million      $27 million
3     Cristiano Ronaldo     Association football  Portugal             $108 million   $61 million      $47 million
4     Conor McGregor        Mixed martial arts    Republic of Ireland  $99 million    $85 million      $14 million
5     Neymar                Association football  Brazil               $90 million    $73 million      $17 million
6     LeBron James          Basketball            United States        $85.5 million  $33.5 million    $52 million
7     Roger Federer         Tennis                Switzerland          $77.2 million  $12.2 million    $65 million
8     Stephen Curry         Basketball            United States        $76.9 million  $34.9 million    $42 million
9     Matt Ryan             American football     United States        $67.3 million  $62.3 million    $5 million
10    Matthew Stafford      American football     United States        $59.5 million  $57.5 million    $2 million

If you look at teams, from Forbes (12):
Rank  Team                   Current Value ($mil)  2017 Rank  Sport
1     Dallas Cowboys         4,800                 1          NFL
2     Manchester United      4,123                 3          Soccer
3     Real Madrid            4,088                 5          Soccer
4     Barcelona              4,064                 4          Soccer
5     New York Yankees       4,000                 2          MLB
6     New England Patriots   3,700                 6          NFL
7     New York Knicks        3,600                 7          NBA
8     Los Angeles Lakers     3,300                 9          NBA
8     New York Giants        3,300                 8          NFL
10    Golden State Warriors  3,100                 20         NBA
10    Washington Redskins    3,100                 11         NFL

Ok, you get the picture, professional sports is big business.

So again, why Bielsa? And what does that have to do with data?

Bielsa has been reminded by his team, Leeds United, of “integrity and honesty” (13). What was his crime? Did he get people to attack the opponent’s bus (14)? Did any of his players get caught for doping (15)? Did he ask people to hack into the servers or break into the premises of opponents? No, there was no breach; all that happened was that an employee of his was seen, in a public area, watching opponents train.

Note, the employee was watching from a public area.

My question is “so what?”



This is an ethical question; in the words of Mr Bielsa himself: “I understand (Derby manager) Frank Lampard is angry because he thinks I'm someone who is cheating… I understand he draws this conclusion. But I don't feel I cheated because my goal was not to get an illegal advantage… I can explain my behaviour but my intention is not to be understood or to justify it. I have to respect the norms in the country where I work.”

This is definitely a question of ethics. Mr Bielsa thinks he has done no wrong, and I support his view.

Mr Lampard thinks it was wrong for Mr Bielsa to ask someone to gather information about Mr Lampard’s team. Mr Lampard even went so far as to say that he’d rather not coach than spy (16). Maybe, but it does seem hypocritical given that as a player, he benefited from such tactics; André Villas-Boas said that this was part of his role for Chelsea, then managed by Jose Mourinho, with a certain Frank Lampard as midfielder (17).

Compare this with the reaction of Hoffenheim coach Nagelsmann when a drone was found recording his team’s training, “I'm not really angry at the analyst doing his job. It's commendable if they're doing everything they can, trying to spy on the opposition” (18). The best part is that it can be said that a drone is invading private space, but the Leeds United staff member was in public…

A few nice stories show that this practice is not uncommon (19).
 

Mr Bielsa has produced fantastic results for Leeds United this season; they currently lead the Championship with 54 points from 27 games played (20), or 2.0 points per game, compared to last year when they ended up with 60 points from 46 games played, about 1.3 points per game.


If the management team at Leeds United thinks he is causing trouble or doesn’t get along, then please, let history repeat itself: unwanted personnel at Leeds United have a way of finally reaching their true potential with the Red Devils (21).


To summarise
Professional sports is serious business and should be treated as such. I do not see many ethical questions around gathering information that is available to the public. It is actually a legitimate way of gaining an advantage over the opponents. A lot of the reaction has been overreaction or hypocrisy; and if Leeds United does not appreciate the efforts of Mr Bielsa, then he should be the next member of the Leeds team to join the Red Devils, and I can see him having the same impact as the one who made the same journey in November 1992 (22).




  1. https://www.imdb.com/title/tt1210166/
  2. https://www.theguardian.com/football/2014/dec/05/arsene-wenger-arsenal-alexis-sanchez
  3. https://bjsm.bmj.com/content/52/22/1473
  4. https://statsports.com/
  5. https://statsports.com/apex-athlete-series/
  6. https://www.skysports.com/football/news/11095/10502787/statsports-the-training-technology-used-by-arsenal-man-city-barcelona-and-more
  7. https://www.telegraph.co.uk/culture/tvandradio/9365703/Chariots-of-Fire-all-hail-the-real-Olympic-heroes.html
  8. https://www.bbc.co.uk/newsround/19473292
  9. https://www.dailytelegraph.com.au/sport/tennis/maria-sharapovas-excuse-for-a-failed-drug-test-treats-fans-as-dopes/news-story/43a6776f1f09cf43b8b810edd9c1b961
  10. http://thegatesofbabylon.blogspot.com/2016/08/olympics-some-thoughts-on-whether.html
  11. https://en.wikipedia.org/wiki/Forbes%27_list_of_the_world%27s_highest-paid_athletes#2018_list
  12. https://www.forbes.com/sites/forbespr/2018/07/18/forbes-releases-2018-list-of-the-worlds-most-valuable-sports-teams/#1dac46ea75ff
  13. https://www.bbc.com/sport/football/46850155
  14. https://www.businessinsider.com/liverpool-fc-fans-attack-manchester-city-bus-at-champions-league-game-2018-4
  15. https://www.telegraph.co.uk/sport/football/10424370/Flashback-Pep-Guardiolas-failed-drug-test-in-2001.html
  16. https://www.theguardian.com/football/2019/jan/11/leeds-derby-championship-match-report
  17. https://twitter.com/awinehouse1/status/1083864793495334913/photo/1
  18. https://twitter.com/honigstein/status/1083885208565399553?ref_src=twsrc%5Etfw
  19. https://www.planetfootball.com/quick-reads/10-times-teams-spied-on-their-opposition-leeds-utd-chelsea-man-utd/
  20. https://www.bbc.com/sport/football/championship/table
  21. https://www.dailymail.co.uk/sport/football/article-1065931/Leeds-sold-United-fall-boss-Wilkinson-reveals-Cantona.html
  22. https://www.sportsjoe.ie/football/man-united-eric-cantona-leeds-135832