Sunday 4 June 2017

The death of the “Data Scientist” or the return of the programmer




The “Data Scientist” is dead; long live the programmer! The world will soon have no need for “data scientists”; instead, programmers will rule the world.

This brutal assessment comes from the World Economic Forum (WEF), and I must say that their arguments make a lot of sense.

In this age of fake news, credentials are important; so what is the WEF? Founded in 1971, the WEF describes itself as the International Organization for Public-Private Cooperation, committed to improving the state of the world (1). Its board of trustees includes world leaders such as Jim Yong Kim (President of the World Bank) and Christine Lagarde (Managing Director of the IMF), as well as Jack Ma (founder of Alibaba Group) and Mukesh Ambani (Chairman of Reliance Group), among others (2).

Furthermore, the “Centre for the Fourth Industrial Revolution” is one of its major undertakings (3), and one of that centre’s focus areas is artificial intelligence and machine learning (4).

So this is not fake news, neither is it flimsy covfefe gossamer; sad!

In an article entitled “You’ve heard about it, but do you understand? Everything you need to know about Machine Learning” (5), the WEF gives a convincing, easy-to-understand description of what machine learning is, how it can be used, and what the way forward is.

The article covers the history of machine learning and its broad classes: supervised (the machine is told what the different groups are), unsupervised (the machine is left alone to find the different groups) and reinforcement learning (after finding the groups, the machine is told whether it is right or wrong, and this feeds back into the learning cycle).

In illustrating the concept of reinforcement ML, the WEF shows a clip in which a machine learns to play Atari Breakout, a game where the player controls a horizontally moving paddle at the bottom of the screen and aims to bounce a ball back up to destroy a wall of bricks at the top of the screen. The results are amazing (6).

But even more astounding is the commentary:
“The most important thing to know is that all the agent is given is sensory input (what you see on the screen) and it was ordered to maximise the score on the screen”
“No domain knowledge is involved! This means that the algorithm doesn’t know the concept of a ball or what the controls exactly do”

After 10 minutes, the machine is clumsy. After 120, it is playing like an expert. After 240, it has developed its own strategy!
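To make the “score only, no domain knowledge” idea concrete, here is a minimal sketch of reward-driven learning. It is not the method used in the clip: it is plain tabular Q-learning on a made-up five-cell corridor, and the environment, the function names (`step`, `train`, `greedy_policy`) and all parameter values are my own assumptions for illustration. The agent only ever sees a state number, two opaque action ids and a reward.

```python
import random

N_STATES = 5      # hypothetical corridor: positions 0..4, reaching 4 ends the episode
ACTIONS = (0, 1)  # opaque action ids; the agent is never told that 0 = left, 1 = right


def step(state, action):
    """Toy environment (an assumption for illustration): returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: the only feedback the agent gets is the reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore at random sometimes, and whenever the two estimates are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Move the estimate towards reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Best-looking action in each non-terminal position."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

After training, the greedy policy picks action 1 in every position, even though nothing in the code told the agent that action 1 means “move towards the goal”.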


Isn’t this amazing?

So you will ask, what does that have to do with the death of the “data scientist”?

Well, you may think, as I do, that Drew Conway was on to something when he came up with the basic “Data Scientist” skill diagram (7).

In this diagram, data science requires substantive expertise, or “domain knowledge”, alongside hacking skills and maths and statistics knowledge.

What the WEF is agreeing with is that machine learning does not require “domain knowledge”. Furthermore, there is no mention of the “data scientist” in the WEF viewpoint; rather, the article highlights “the programmer”: ML “needs a programmer to tell it what to do when it is fed with data.” Not a “data scientist”, a “programmer”.

But does that mean the death of the “data scientist”?

Yes it does.

Think about it: if a problem can be solved without “domain knowledge”, then ML is all you need.

And what does ML need? Not the pompous “data scientist”, but the humble “programmer” who had been cast aside and is finally making his/her triumphant return.

There is a wave washing the veneer off the “data scientist” and restoring the standing of the programmer. One example is an article arguing that coding is not fun, but that it is technically and ethically complex (8).
What I find most interesting is the acknowledgement of the complexity of coding (I am sure most people would agree that it takes serious skill to develop industrial-grade code), but also the introduction of ethics into the mix.

To quote from the article: “As well as being highly analytical and creative, software developers need almost superhuman focus to manage the complexity of their tasks. Manic attention to detail is a must; slovenliness is verboten. Attaining this level of concentration requires a state of mind called being ‘in the flow,’ a quasi-symbiotic relationship between human and machine that improves performance and motivation.”

To push the idea further, the latest triumph of machine over man was AlphaGo. AlphaGo was built by the same organisation that developed the machine playing Atari Breakout above. And the same principle was used: “no domain knowledge”, but lots of Monte Carlo simulation. Basically, from the current position the machine randomly tries different plays and evaluates the end result, assigns a value to each play, picks the one that gives the highest chance of winning (or the lowest chance of losing), and repeats the process at every turn.
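The “try random plays and keep the move that wins most often” idea can be sketched in a few lines. This is a deliberately simplified, flat Monte Carlo move chooser for a toy game of Nim (take 1 to 3 stones; whoever takes the last stone wins), not AlphaGo’s actual implementation. The game, the function names (`legal_moves`, `random_playout`, `choose_move`) and the number of playouts are assumptions made purely for illustration.

```python
import random


def legal_moves(stones):
    """In this toy Nim, a player may take 1, 2 or 3 stones."""
    return [n for n in (1, 2, 3) if n <= stones]


def random_playout(stones, my_turn, rng):
    """Play random moves to the end; return True if 'we' take the last stone."""
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return my_turn  # whoever just moved took the last stone
        my_turn = not my_turn
    # No stones left on entry: the previous mover already took the last one.
    return not my_turn


def choose_move(stones, playouts=2000, seed=0):
    """Estimate each move's win rate from random playouts and pick the best."""
    rng = random.Random(seed)
    best_move, best_rate = None, -1.0
    for move in legal_moves(stones):
        # After our move it is the opponent's turn, hence my_turn=False.
        wins = sum(random_playout(stones - move, False, rng) for _ in range(playouts))
        rate = wins / playouts
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move, best_rate


if __name__ == "__main__":
    print(choose_move(3))
    print(choose_move(5))
```

With 3 stones left, taking all 3 wins every playout, so that move is chosen with an estimated win rate of 1.0. Nothing in the code encodes Nim strategy; the preference comes only from counting simulated wins.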

No domain knowledge

No data scientist

One of the greatest triumphs of machine over man was enabled by programmers, not data scientists (9).


The “data scientist” is dead, long live the programmer!
 

Further reading/references:

  1. https://www.weforum.org/about/world-economic-forum
  2. https://www.weforum.org/about/leadership-and-governance
  3. https://www.weforum.org/center-for-the-fourth-industrial-revolution/
  4. https://www.weforum.org/center-for-the-fourth-industrial-revolution/areas-of-focus
  5. https://www.weforum.org/agenda/2017/05/what-is-machine-learning
  6. https://www.youtube.com/watch?v=V1eYniJ0Rnk
  7. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  8. https://qz.com/987170/coding-is-not-fun-its-technically-and-ethically-complex/
  9. https://www.tastehit.com/blog/google-deepmind-alphago-how-it-works/