Sunday, 2 September 2018

What is Value? Or: how much should a “data science” project cost, or a “data scientist” be paid?


Singapore has been rocked by the SingHealth hack (1), by the government downplaying it on the grounds that the stolen data was not valuable, with no state secrets involved (2), and by the ex-PM’s description of people who do not make a million dollars a year as “very very mediocre” (3). My question is “so what?” (4)

What is this thing called value? What is the value of the data? Is someone earning less than a million dollars a year “very very mediocre”?

In this blog post, I’ll take a stab at “value”.

SingHealth

First, let’s take the case of SingHealth. You must understand that Singapore takes defence and security very seriously: the concept of Total Defence (5) includes cybersecurity. Some people were also amused that the CEO of SingHealth is the wife of the Minister for Defence (6). But that’s not the point.
So what is the value of the data, and how do you measure it? To me this is a very subjective area. However, I assume everyone will agree that the accuracy of data matters, even (or especially) for AI/ML.

How accurate is SingHealth data?

There was another story recently (7): SingHealth tagged a woman as HIV positive when she was not. So my question is: how many such mistakes are there in the data? Wrongly saying someone is HIV positive is a huge mistake. In fact, the article states that, since the ‘victim’ was pregnant, her husband talked about divorce and abortion...

I am not slagging off SingHealth specifically, but the value of data is tightly tied to its accuracy, and there are some doubts over the accuracy of SingHealth data. After all, how many of us have looked at what is in our files at the doctor/hospital/clinic (especially those of us who are not medically trained)?
However, the government stated that only basic data such as name, NRIC number, age and gender was compromised. But the NRIC number is the key to most (if not all) databases in Singapore. In fact, some shops want to use the NRIC as their loyalty card number, and I refuse to give mine up. With this detail, someone can apply for a loan and make you responsible for it.

SingHealth is aware of the issue and has taken immediate mitigating measures, which rather shows the depth of the problem (8)(9).

So what is the value of this data?

To me, there are a couple of other criteria that determine the value of data: the use it will be put to and the skills of the person using it. So it depends on who you are talking about. And this brings us to very very mediocre people.

The ex-PM basically described people who do not make S$1m a year as very very mediocre. Needless to say, this created some noise in cyberspace, but was it warranted? Many Singapore ministers come from three backgrounds: lawyers, doctors and army officers.



The charts above, from payscale.com (10), show the distribution of yearly salaries for these job roles in Singapore. Note that the army numbers are highly skewed since they include people undergoing national service, and the data has some quality issues (e.g. a salary of $72 for a doctor), but the medians look reasonable.

Basically, it is quite clear that people who make S$1m or more are, without doubt, in the top 10% of earners in their domain.
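For readers who want to check this kind of claim on their own data, here is a minimal Python sketch. It assumes you have a plain list of yearly salaries (the numbers in the usage example are made up for illustration, not the payscale.com figures), and it drops implausibly low entries, such as the $72 doctor, using a hypothetical floor of S$1,000 before computing the median and the 90th percentile.

```python
import statistics


def salary_summary(salaries, floor=1_000):
    """Median and 90th percentile after dropping implausible entries.

    `floor` is a hypothetical cut-off: anything below it (e.g. a $72 yearly
    salary) is treated as a data-entry error and excluded.
    """
    clean = [s for s in salaries if s >= floor]
    # quantiles(n=10) returns the 9 decile cut points; the last one is the 90th percentile
    deciles = statistics.quantiles(clean, n=10)
    return {
        "median": statistics.median(clean),
        "p90": deciles[-1],
        "dropped": len(salaries) - len(clean),
    }


def share_at_or_above(salaries, threshold, floor=1_000):
    """Fraction of (cleaned) salaries at or above `threshold`."""
    clean = [s for s in salaries if s >= floor]
    return sum(s >= threshold for s in clean) / len(clean)


# Made-up sample, for illustration only
sample = [72, 60_000, 80_000, 100_000, 120_000, 150_000,
          200_000, 250_000, 300_000, 400_000, 1_200_000]
print(salary_summary(sample))
print(share_at_or_above(sample, 1_000_000))
```

Note that `statistics.quantiles` defaults to the "exclusive" interpolation method; other tools may compute percentiles slightly differently, so small discrepancies with published charts are to be expected.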

The ex-PM also argued (3) that ministers’ salaries need to compensate them for the salaries they are giving up. Given that Singapore ministers’ salaries are quite high (11), it does make sense, if you aim to compensate people for the salaries they are giving up, to look only at people making S$1m a year.

I am specifically avoiding questions and discussions around whether this is a good way to remunerate people serving the public. But if the aim is to entice people at the top of their domains, as measured by salary drawn, and to compensate them similarly, and given that ministers’ salaries are close to seven figures, then saying that people earning below S$1m are not who they would be looking at, and hence calling them “very very mediocre”, is acceptable in this context. (You wouldn’t consider an 18-year-old doing his national service as a ministerial candidate, for example; the ‘segment’ you go for is not the whole spectrum but a small segment at the right of the distribution.)

Surprised that I think so?

I believe that people should be paid based on the value they generate.

If the salary the minister-to-be earns reflects his/her value in his/her domain, and if that value transfers to how much he/she contributes as a minister, then it is perfectly alright for the salary they receive as a minister to be similar.

But the real question is how do we measure the value that someone generates?

That is precisely the great thing about being in the Analytics/“Data Science” space: the value you bring to a project can, and should, be easily measured.

When I took up my first contract more than a decade ago, my business sponsor had a choice between employing a new salesperson or spending the money on an analytics guy, so my contract had targets just like a salesperson’s (but, unfortunately, no variable income). Hence the value I brought to the organisation, indeed that of my fellow analytics guys, was tracked and measured, with the methods of measurement and the metrics all discussed and agreed.

These were extremely exciting times for me. In fact, when I resigned less than 6 months into a new contract, I told my colleagues to tell my replacement that he/she could rest easy if he/she was paid around the same as me, since I had already justified my existence for the year, and therefore his/hers.

This is why, whenever there is a “data science”/analytics project, I insist on having metrics that reflect the impact of the work on the organisation it is being done for: savings (for churn, say, the decline in the number of churners against past trends, or even that decline monetised by spend, although this adds an extra dimension and gives more room to play), revenue increase, market share increase, or whatever the KPI of the project sponsor is; measurable and measured.
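As an illustration of the churn example, here is a minimal Python sketch. It assumes the baseline is simply the churn rate observed in past periods applied to the current customer base; the function name, parameters and figures are hypothetical, and a real project may need a more careful baseline (seasonality, trends and so on).

```python
def churn_impact(customers, baseline_churn_rate, actual_churners, avg_annual_spend=None):
    """Estimate churners saved against a past-trend baseline (hypothetical helper).

    customers           -- number of customers at the start of the period
    baseline_churn_rate -- churn rate observed in past periods, e.g. 0.02 for 2%
    actual_churners     -- churners actually observed after the project
    avg_annual_spend    -- optional; monetises the result (the extra dimension)
    """
    # Counterfactual: how many would have churned had past trends continued
    expected_churners = customers * baseline_churn_rate
    churners_saved = expected_churners - actual_churners
    result = {"expected_churners": expected_churners, "churners_saved": churners_saved}
    if avg_annual_spend is not None:
        # Monetising by spend adds a second assumption, hence more room for debate
        result["revenue_retained"] = churners_saved * avg_annual_spend
    return result


# Hypothetical figures: 100,000 customers, 2% historical churn, 1,500 churners this period
print(churn_impact(100_000, 0.02, 1_500, avg_annual_spend=600))
```

Keeping the count of churners saved separate from its monetised version makes it easy to agree on the first metric even if the second is debated.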

When I started in analytics more than a decade ago, we had to prove ourselves to a sceptical business, so we ‘manually’ tracked our impact to justify our existence and gain trust. I spent almost 4 years in that organisation, and we eventually set up proper campaign tracking. Imagine my shock when I went back and found out that the organisation had stopped tracking campaigns. They ran campaigns simply because they had the budget (“use it or lose it”), without caring whether there were better ways of “using it”.

Some people may like this environment, where you get to experiment without risk; but how would you know if the risk paid off, or how good your ideas/hypotheses/skills are, if you do not measure the outcome? How do you know the value you bring to an organisation? How would you know your own value?
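One common way to measure a campaign’s outcome is to compare the targeted group against a randomly held-out control group that did not receive the campaign. The Python sketch below assumes that set-up; it is not necessarily how the organisation above tracked its campaigns, and the function name and figures are hypothetical.

```python
def campaign_uplift(target_size, target_responders, control_size, control_responders):
    """Incremental effect of a campaign versus a random hold-out control group."""
    target_rate = target_responders / target_size
    control_rate = control_responders / control_size
    # Uplift: the extra response rate attributable to the campaign
    uplift = target_rate - control_rate
    # Responders the campaign generated beyond what would have happened anyway
    incremental_responders = uplift * target_size
    return {
        "target_rate": target_rate,
        "control_rate": control_rate,
        "uplift": uplift,
        "incremental_responders": incremental_responders,
    }


# Hypothetical campaign: 50,000 targeted (2,500 responded), 5,000 held out (150 responded)
print(campaign_uplift(50_000, 2_500, 5_000, 150))
```

Without the control group, all 2,500 responders might be credited to the campaign, whereas under these assumptions only around 1,000 of them are incremental.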

Value is not a measurement of input, but of output.

Once you have an idea of how much you will be able to contribute to the organisation, you can apply RoI/break-even rules to determine how much it would be acceptable for you to charge, thereby delivering a win-win situation.

The value of a project is a proportion of the value of the benefits it generates for the client, and that proportion is usually based on the typical RoI or break-even period of the projects the client undertakes.

This means that the same effort may generate less value (in $ terms) for an SME than for an MNC, hence the value you’d bring to an SME is lower than the value you’d bring to an MNC.
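The RoI and break-even rules can be turned into a ceiling on the fee. The Python sketch below assumes RoI is defined as (benefit - fee) / fee over one year and that the break-even period is expressed in months; the function names, target figures and benefits are hypothetical. It also shows why the same effort is worth less, in $ terms, to an SME than to an MNC.

```python
def max_fee_from_roi(annual_benefit, target_roi):
    """Highest fee that still gives the client its target RoI.

    RoI = (benefit - fee) / fee >= target_roi  =>  fee <= benefit / (1 + target_roi)
    """
    return annual_benefit / (1 + target_roi)


def max_fee_from_break_even(monthly_benefit, break_even_months):
    """Highest fee the client recovers within its usual break-even period."""
    return monthly_benefit * break_even_months


# Same piece of work, hypothetical benefits: a larger customer base gives a larger benefit
for client, annual_benefit in [("MNC", 1_200_000), ("SME", 120_000)]:
    roi_cap = max_fee_from_roi(annual_benefit, target_roi=2.0)  # client expects 200% RoI
    be_cap = max_fee_from_break_even(annual_benefit / 12, break_even_months=6)
    print(client, round(roi_cap), round(be_cap))
```

Under these assumptions the MNC could accept a fee of up to about S$400,000 on the RoI rule, while the SME’s ceiling is about S$40,000 for the same effort; the proportion 1 / (1 + target RoI), a third here, plays the role of the “proportion of the value of the benefits”.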

I recently had this discussion with someone who works closely with SMEs and helps bring innovation to them. I think SMEs have similar problems to MNCs, albeit on a smaller scale. While it is true that SMEs are less likely to have a full set of data to start work on, the analytical methods for solving the problems are similar. Furthermore, analytics is not as expensive as many people think.

I think SMEs have one advantage over MNCs: they are more flexible. Hence arrangements where the client pays a low base fee plus a proportion of the value generated by a project/analytical piece of work can be made with SMEs, whereas MNCs may not have that flexibility (nor would large consultancies, which would have to account for revenue recognition risk and so on).
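A minimal Python sketch of such a low-base-plus-share arrangement, assuming the value generated has already been measured and agreed with the client; the function name, base fee and share below are hypothetical parameters, not a recommendation.

```python
def hybrid_fee(base_fee, measured_value, share):
    """Low base fee plus an agreed share of the measured value generated.

    A negative measured value (the work did not pay off) earns only the base fee.
    """
    if not 0 <= share <= 1:
        raise ValueError("share must be between 0 and 1")
    return base_fee + share * max(measured_value, 0)


# Hypothetical deal: S$10,000 base plus 20% of the measured value
print(hybrid_fee(10_000, 150_000, 0.2))  # project delivered S$150,000 of value
print(hybrid_fee(10_000, -5_000, 0.2))   # project did not pay off: base fee only
```

The design choice is that the analyst shares the risk with the client: if the measured value is low, so is the fee, which only works when the metrics are agreed up front.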

Basically, to me it is very simple: tie what you are paid to the value you deliver to your customer. This is an easy way of creating win-win situations. And it all starts with knowing the value you bring, which is based on measuring the impact of your work.

Similarly, if you are considering paying for the services of a “data scientist” or “data science team”, you should base the payment on the expected returns from the services received, and that starts with looking at the impact they delivered in the past.

P.S. As I was writing this blog post, a new ruling from the PDPC (Personal Data Protection Commission) recognised the value of the NRIC and restricted its unwarranted use (12).



  1. https://www.businesstimes.com.sg/government-economy/singhealth-hacked-records-of-15m-patients-including-pm-lee-hsien-loong-stolen
  2. https://www.straitstimes.com/singapore/singhealth-cyber-attack-pm-lee-says-nothing-alarming-in-his-data-that-was-stolen-no-dark
  3. https://sg.news.yahoo.com/ministers-not-paid-enough-says-goh-chok-tong-reports-043024792.html
  4. https://www.youtube.com/watch?v=FJfFZqTlWrQ
  5. https://www.scdf.gov.sg/home/community-volunteers/community-preparedness/total-defence
  6. https://www.theonlinecitizen.com/2018/07/26/singhealth-hack-exposing-the-cracks-of-elitism-and-entitlement/
  7. https://www.straitstimes.com/singapore/health/singhealth-apologises-after-polyclinic-doctor-mistakenly-marks-woman-as-hiv
  8. https://twitter.com/mrbrown/status/1021332747204218880
  9. https://twitter.com/mrbrown/status/1020223366341394433
  10. https://www.payscale.com/research/SG/Job=Attorney_%2F_Lawyer/Salary, https://www.payscale.com/research/SG/Job=Physician_%2F_Doctor%2C_Cardiologist/Salary/67bbbe3e/Singapore, https://www.payscale.com/research/SG/Job=Army_Officer/Salary
  11. https://en.wikipedia.org/wiki/Cabinet_of_Singapore
  12. https://www.straitstimes.com/singapore/stricter-rules-to-protect-nric-data-from-next-sept