Thursday 15 December 2016

Why returns on “data science” have been very low and what you can do about it (part 1 – unreasonably LOW expectations)



I have had a fair bit of time over the last few months to read up, play with some of the ‘new’ technology, and talk to people from the IT, business, and “data science” fields, and have decided to write down my thoughts. I have arranged them along three broad themes and will be sharing them from now till the end of the year.

They address why “data science” hasn’t brought the expected returns, and what organisations can do about it.

My theme for today is: unreasonably low expectations.



People who call themselves “data scientists” come with a dizzying array of skills. For a “data science project” to be successful, there needs to be collaboration among different skill sets, and one large part of clearing the path to returns is assembling these key skills, which are likely to reside in two or more people working truly collaboratively. 

“Data Scientists” are supposed to have the sexiest jobs of the 21st century [1], and many “data scientists” work hard to substantiate this claim, cultivating the mystique around the role. 
Many organisations are under the spell of the “data scientist” and expect that one specialist (or a group of similar specialists) will have all the answers, and so they follow that specialist in setting expectations, which are most often much lower than what could really be achieved; it’s a game with asymmetric information.

The business wants high returns but doesn’t know what is achievable; the “data scientist” wants a reasonably low promised return but can control, to some degree, how large the actual returns are. And usually in such games, the one with less information ends up getting sub-optimal results – the one holding the aces wins, irrespective of whose deck of cards it is.

(Furthermore, the interests of the “data scientist” and those of the business may not be aligned, with the business focusing on the results and the “data scientist” on the process. In this case too, the “data scientist” holds the aces – the principal-agent problem – but that’s a topic for another day.)

Not so long ago, I attended a best-practice sharing session where a colleague from Europe shared his success: a model/algorithm whose lift was 1.2. This means the “data scientist” (and presumably the client) were satisfied with being only 20% better than randomly picking customers from the database (worse still, that was a theoretical lift, not the result of tracking an experiment). And in case you were wondering, this was not for high-margin, high-ticket products; it was plan upgrades in telco.

In a previous role, my manager was about to accept the work of a team of consultants whose model performed at 1.28x the one already in place. Clients trust “data scientists”; it is up to us not to take advantage of that trust.

I have a rule of thumb: any model/algorithm I build has to have a lift of 2 in the top group. Of course, that means I won’t take on just any project; if I believe there is little prospect of my creating a model/algorithm that will generate 2x better results than what my client already has, I will work with the client to find a situation where he/she will get a proper Return on Investment (RoI). This may take a lot of discussions, but it is worth the effort even if it delays the commencement of what is usually considered the core of a “data science” project.
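To make the metric concrete, here is a minimal sketch of how top-decile lift can be computed from a scored customer list (this is my own illustration, not from any of the sessions or articles mentioned, and the data below is synthetic, purely for demonstration):

    import numpy as np

    def top_decile_lift(scores, outcomes):
        """Response rate in the top 10% of customers ranked by model score,
        divided by the overall response rate (a lift of 1.0 = random picking)."""
        scores, outcomes = np.asarray(scores), np.asarray(outcomes)
        order = np.argsort(scores)[::-1]                 # best-scored customers first
        top = outcomes[order][: max(1, len(order) // 10)]
        return top.mean() / outcomes.mean()

    # Synthetic example: responders tend to receive higher scores than non-responders
    rng = np.random.RandomState(0)
    outcomes = rng.binomial(1, 0.05, size=10000)         # 5% base response rate
    scores = np.where(outcomes == 1,
                      rng.uniform(0.5, 1.0, 10000),      # responders score 0.5 to 1.0
                      rng.uniform(0.0, 0.9, 10000))      # non-responders score 0.0 to 0.9
    print(top_decile_lift(scores, outcomes))             # comfortably above 1.0

Seen in these terms, a lift of 1.2 simply means the top group responds at 1.2 times the base rate – a thin margin to hang a campaign on.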

As you can probably tell, I wasn’t always very popular with my bosses; but to me there has to be a commitment to a certain level of quality, and to RoI, in any “data science” project.

The fact that “data science” is in general not delivering the returns it could has caused many people to think about how this can be remedied. HBR recently came up with two articles offering two possible solutions.
“Better questions to ask your data scientist” [2] posits that there is a communication gap between the “data scientist” and the “business”. I agree. Drew Conway’s “data science Venn diagram” [3] helps show why:

Most people who are hired as “data scientists” are lacking in domain knowledge. The HBR article attempts to bridge that gap by advising the “business” on the type of questions they should be asking the “data scientist” – in other words, making the “business” speak the language of the “data scientist”.

I think that’s absolutely the wrong approach; at best it is a stop-gap measure, and it will not generate the RoI that the proper application of “data science” can generate. It’s like expecting a patient to speak in medical terms at the doctor’s. To me, the doctor should be the one who speaks the language of the patient.

Would you trust your health to some medical lingo you picked up in an article, or would you prefer a doctor who can speak in your language? 

Then why would you do that to your business by trying to pick up “data science” lingo?

Another approach, again suggested by HBR in “why you are not getting value from your data science” [4], is to do a different kind of “data science”: make it simpler. Again, in principle, I agree. This, it is argued, is the way to make “data science” faster, to keep up with business needs, and hence to increase the RoI.

One of the most stunning descriptions in the article is “machine learning experts (data scientists focused on training and testing predictive models)”. This illustrates perfectly how “data scientists” are seen, even in the corporate/commercial world. To me, if a “data scientist” isn’t engaging in predictions, what is he/she doing – looking good in a lab coat? It’s no wonder you don’t get good RoI from him/her.

The article goes on to state that the solution is to use simple models, built on samples of the data rather than on the whole lot, and potentially to disregard technological advances such as parallel processing (Hadoop, Rev-R, Aster...). To me, if you follow these recommendations, you are simply going backwards in time.

Parallel processing (and its ugly cousin, in-memory processing), cheap storage, and the development of tools that allow you to easily run a whole suite of algorithms on ‘all the data’ and get results quickly, enabling rapid iteration, are the key drivers of “data science”. It would be a disservice to one’s organisation to ignore them.
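To illustrate what that tooling buys you, here is a minimal sketch of my own (using scikit-learn and synthetic data; in practice X and y would come from your customer base) of iterating over a whole suite of algorithms in one pass:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    # Synthetic stand-in for a real customer dataset
    X, y = make_classification(n_samples=10000, n_features=20, random_state=0)

    candidates = {
        "logistic regression": LogisticRegression(),
        "random forest": RandomForestClassifier(n_estimators=200),
        "gradient boosting": GradientBoostingClassifier(),
    }

    for name, model in candidates.items():
        # n_jobs=-1 spreads the cross-validation folds across all available cores
        scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc", n_jobs=-1)
        print("{}: mean AUC = {:.3f}".format(name, scores.mean()))

The specific library does not matter; the point is that trying many candidate models on all the data is now cheap, so settling for one simple model on a sample is a choice, not a constraint.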

There is a balance to be struck between the depth of analysis and the number of iterations on one hand, and the speed to market on the other. Taking advantage of the key drivers of “data science” shifts that old equation, allowing more to be done in the same amount of time, or the same level of analysis in a much shorter time.

In the past, once I submitted my programs, I would have the time to go out for a cup of kopi before the results were ready. Now I barely have the time to go to the toilet.

That should be a good thing; it potentially enables organisations to quickly execute experiments on the basis of the output from a “data scientist”, and that is an integral part of deriving RoI from “data science”.

To summarise:

  • Low RoI from “data science” is a reality most organisations are experiencing.
  • One of the reasons for this is unreasonably low expectations from the business, together with asymmetry of information, and to some degree a potential principal-agent issue.
  • Good collaboration between the business, which is supposed to drive the RoI, and the “data scientist”, who is supposed to enable it, is one of the easiest ways to resolve this issue.
  • In this case it is very important that the two parties speak a common language, and it is preferable for the “data scientist” to have enough domain expertise to speak the language of the business (this will also help in doing better “data science”, but again, that’s a topic for another day).
  • Obviously, since “data science” is driven by technology, any “data science” effort should take advantage of technology.
  • This should speed up experiments and enable organisations to take full advantage of “data science” to generate proper RoI.
  • However, the organisation itself has to be geared to do that; and in order to be geared in such a way, it is important to be able to measure the impact of “data science”, which will be the topic of my next blog post.



 
