Wednesday, 29 June 2016

Brexit, Google (car (self-driving)) and Machine Learning





A conscious choice by voters in Great Britain has led to ‘Brexit’, a non-binding result that pushes Great Britain closer to exiting the European Union.

What makes the result thought-provoking for me is captured in this chart from the Guardian, to which I have added an apparent trend line:

[Chart from the Guardian: ‘leave’/‘remain’ vote share plotted against median age by area, with a trend line added]
Basically, areas where the median age is higher tended to vote ‘leave’, while areas with a lower median age tended to vote ‘remain’; older people preferred ‘leave’ whereas younger people preferred ‘remain’. What this means is that the people who have the least time left to live with the consequences of this choice have overwhelmed the choice of those who will have to live with it the longest. Is this fair?

One way to make things seem less unfair would be to weight votes by the expected number of years each voter will live with the consequences of their vote. According to the World Bank (http://data.worldbank.org/indicator/SP.DYN.LE00.IN), life expectancy in the UK is 81, so someone aged 60 would have a weight of 21 (1 for every year he/she is expected to live) and someone aged 20 would get a weight of 61. Of course, if the consequences of the choice are only felt for, say, 30 years, then the younger person’s weight is capped at 30.
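
As a toy illustration only, here is a minimal sketch of that weighting in Python. Nothing below comes from the sources cited: the life expectancy and the 30-year cap are taken from the paragraph above, while the ballots, ages and function names are invented.

LIFE_EXPECTANCY = 81   # World Bank figure for the UK cited above
HORIZON = 30           # assumed number of years the consequences are felt

def vote_weight(age, life_expectancy=LIFE_EXPECTANCY, horizon=HORIZON):
    """Weight a vote by the years the voter is expected to live with the outcome."""
    remaining_years = max(life_expectancy - age, 1)   # never weight a vote at zero
    return min(remaining_years, horizon)

# A 60-year-old gets weight 21; a 20-year-old would get 61, capped here at 30.
ballots = [(60, "leave"), (20, "remain"), (45, "leave")]   # hypothetical ballots
tally = {}
for age, choice in ballots:
    tally[choice] = tally.get(choice, 0) + vote_weight(age)
print(tally)   # {'leave': 51, 'remain': 30}

The cap simply reflects the idea that nobody’s vote should carry more weight than the number of years the decision actually binds.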

Sounds reasonable, maybe?

But the thing is, right from the start, everyone knew each vote had equal weight (sounds fair too, doesn’t it?). Since everyone went into the game (referendum) knowing the rules exactly, there is no point ranting and raving.

So why am I?

That has to do with self-driving cars and what people want. A recent survey set out to understand the rules that should be built into self-driving cars (https://www.theguardian.com/technology/2016/jun/23/self-driving-car-safety-study-pedestrian-crashes).

One of the questions posed was: should a self-driving car crash itself (risking injury to its occupant) in order to prevent a collision with pedestrians? Most people agreed that it should; however, they would choose not to travel in a vehicle with this rule.

If you are driving the vehicle, the choice is yours. If you are in a self-driving vehicle, the choice is not yours; only the consequences are.

The key is to understand that this choice, and many others, will have to be built into self-driving cars. The providers of a self-driving car may publish a huge list of the rules built in, but chances are it would be like today’s terms of service, which most people accept without reading: very long and in very fine print.

Different providers could highlight different rules. Volvo, for instance, is known for protecting its passengers, but it does not do so at the expense of others, so Volvo drivers are not seen differently from drivers of other cars.

What I am getting at is that self-driving cars will force you to be subject to choices/rules you are unlikely to be aware of.

So what does that have to do with Google, apart from Google being one of the contenders in self-driving cars?

It has to do with this being a conscious choice by Google across the board: choices will be made for the human, probably all in order to make your experience easier. Something will choose what it thinks you might need or want, so why should you make the effort of choosing? Google is investing heavily in this technology (https://backchannel.com/how-google-is-remaking-itself-as-a-machine-learning-first-company-ada63defcb70#.t47q4x9ie).

CEO Sundar Pichai set the direction: “Machine learning is a core, transformative way by which we’re rethinking how we’re doing everything. We are thoughtfully applying it across all our products, be it search, ads, YouTube, or Play. And we’re in early days, but you will see us — in a systematic way — apply machine learning in all these areas.”

Basically, instead of a series of explicit rules (that could potentially be listed and ‘voted’ upon like Brexit), the behaviour of self-driving cars (and more) would be driven by machine learning rather than human-determined rules, and these would not be visible.

And choosing not to take a self-driving car will not exempt you from the choices made by the machine: unless all reality becomes virtual or you become a hermit, you will still have to be a pedestrian at the very least.

Whether we realize it or not, we have a choice to make; leave it for too long and the matter may be out of our hands.

I believe machine learning is a tool, just like the many other tools people in my line of work have at their disposal to solve problems, but I wouldn’t want to put blind faith in any single tool, and I place a lot of value on conscious decisions.

After all, whether or not you are happy with the Brexit referendum result, at least the people who had a right to vote had a chance to, and the rules were clear. You can rant and rave at older people (http://www.vox.com/2016/6/24/12023544/brexit-uk-young-voters), or try to get enough people to change their minds for a second referendum (http://www.reuters.com/article/us-britain-eu-labour-referendum-idUSKCN0ZE2L4), but you would not be able to do any of this if the rules were hidden and a machine were pulling the strings based on what it had learnt.



Tuesday, 7 June 2016

“Data Science” process: Taking straight lines might solve a puzzle but not get you out of the box




Microsoft recently published a paper outlining the “data science process” they are proposing for Azure: “A linear method for non-linear work”.


The diagram below is straight from that article:

[Diagram: Microsoft’s proposed “data science process” for Azure]

From what I saw of Azure at Strata, it looks like they have integrated Revolution Analytics well and put together a comprehensive offering, a nice toolkit. However, how you use the tools is important in determining the outcome.

Despite its title, Microsoft’s proposed “data science process” is not totally linear; it has two loops: one between identifying data sources and exploring them (which allows adding more data sources as necessary prior to analysis), and a second between machine learning and the analytics data set (which allows the machine to optimize given the data set and, presumably, to choose between multiple machine learning algorithms).
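
To make that structure concrete, here is a rough sketch of where the two loops sit. This is my own reading of the diagram, not Microsoft’s code; every function, stub and threshold below is hypothetical.

def identify_data_sources(known):
    # Stub: pretend each pass turns up one more usable source.
    return known + ["source_%d" % (len(known) + 1)]

def explore(sources):
    # Stub data profile standing in for the exploration step.
    return {"sources": sources, "rows": 1000 * len(sources)}

def train_and_score(dataset):
    # Stub "machine learning" step returning a model quality score.
    return min(0.5 + 0.1 * dataset["rows"] / 1000, 0.95)

sources, profile = [], {"rows": 0}
while profile["rows"] < 3000:        # loop 1: identify data sources <-> explore them
    sources = identify_data_sources(sources)
    profile = explore(sources)

dataset, score = dict(profile), 0.0
while score < 0.9:                   # loop 2: machine learning <-> analytics data set
    dataset["rows"] += 1000          # the machine refines the data set it trains on
    score = train_and_score(dataset)

print(len(sources), "sources, model score", round(score, 2))

The point of the sketch is simply that the only feedback built into the process is machine-driven; the human appears before the first loop and after the second.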

My question is: where is ‘human learning’ in all this? It seems that the process is devoid of human participation (except at the penultimate step, and maybe at the very beginning).

I think that the answer lies in this diagram on the skill-sets of a data scientist:

[Venn diagram: the skill-sets of a data scientist]

In that diagram, machine learning is the combination of hacking/computer science skills and maths/stats skills, and that is what Microsoft’s process relies on. There is no place in it for domain knowledge.

If “data science” is seen as a science (after all, computer science and maths/stats are scientific), then it can be seen as unequivocal. Making the process linear is a natural extension of that view.

However, I believe that the role I play is not a pure ‘science’. Sure, there is a ‘scientific’ rigour to it: understanding the algorithms used, knowing which to apply in different circumstances, and knowing what care needs to be taken to ensure the chosen algorithm is applicable. But the real-life context, the application, the implementation and the story-telling are not.

A simple illustration of the process I kind of follow is:

[Diagram: my process, with multiple loops and interaction with clients and SMEs]

This process has multiple loops and a lot of interaction/communication with clients and SMEs. I do not mention the class of algorithms used because I believe in ‘horses for courses’: use the algorithm, or combination of algorithms, that suits the issue rather than the other way round.

What I am trying to say is that the approach, the process, and maybe even the tool set you adopt to solve an issue using “data science” will vary based on how you define “data science”.

If you see “data science” as ‘throw the data into the machine and get the answers out of it’, then a linear approach is more likely to be the one you prefer.

But to me, a linear approach without the input of domain knowledge is unlikely to deliver the best results during implementation.

The process I follow may be loopy, but, from experience, it is flexible and delivers more than adequate results.