Wednesday 25 April 2018

Yes, Facebook has taken liberties with the data it collects about you, but how safe is your DNA?




A while ago, I wrote about a new insurance product launched in Singapore that required you to submit your DNA as part of the deal: you got ‘personalised’ advice in exchange. The ad, ridiculously, showed two identical-looking twins receiving different advice (identical twins share the same DNA...) (1). In that blog post, I mentioned that the insurance company was at pains to stress that it had no access to the DNA, but I raised the prospect of someone buying the company that collects the DNA and not being bound by the same rules. Unfortunately, this prospect is very real.

Let’s take a step back: am I talking about DNA or Facebook?

These few weeks have been exciting for people interested in data and “Big Data”, since the extent of the data collected by Cambridge Analytica via Facebook, very often without the subjects being aware, came to light (2). I have been going on about the need for us to own our data, but this really takes the cake: you were not only giving away your own data but that of your connections too. 53 Australians took the test (and possibly gained something), but the data of 311,127 people was harvested. Similarly, 10 New Zealanders did so, and data from 63,724 people was harvested. I am not saying the harvesting followed national boundaries, but these numbers give an idea of how far it spread.
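To put those figures in proportion, here is a minimal sketch of the arithmetic. It assumes the reported counts are accurate and simply divides profiles harvested by quiz takers; the variable and function names are my own, purely for illustration.

```python
# Figures quoted in the paragraph above: (quiz takers, profiles harvested).
harvest = {
    "Australia": (53, 311_127),
    "New Zealand": (10, 63_724),
}


def profiles_per_taker(takers: int, harvested: int) -> float:
    """Average number of profiles harvested per person who took the quiz."""
    return harvested / takers


ratios = {
    country: profiles_per_taker(*counts) for country, counts in harvest.items()
}

for country, ratio in ratios.items():
    print(f"{country}: about {ratio:,.0f} profiles harvested per quiz taker")
```

In other words, each person who took the quiz exposed, on average, several thousand other people. This is only an average: it says nothing about how the harvested profiles were distributed among individual quiz takers.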

OK, so people’s surfing habits, likes, comments and the photos they posted in public were accessed and used, but what use can be made of this data? As the Time magazine article (1) mentioned, one use was for Mr Trump’s presidential campaign. And as this article shows, the efforts started in 2014 (4) and were very effective, as confirmed by Mr Trump himself (5):
But they had this expression ‘drain the swamp.’ And I hated it, I thought it was so hokey. I said, ‘that is the hokiest, give me a break, I am embarrassed to say it.’ And I was in Florida where 25,000 people were going wild, and I said, ‘and we will drain the swamp’ — the place went crazy. I couldn’t believe it. And then the next speech I said it again and they went even crazier. ‘We will drain the swamp… we will drain the swamp,’ and every time I said it I got the biggest applause

So we can say that the data Facebook ‘allowed’ Cambridge Analytica to harvest from the subjects was, at the very least, ‘useful’.

So what does that have to do with DNA?

Basically, if you think it is bad that someone can get their hands on your surfing history and use it for their own purposes without your consent, what if they get their hands on your DNA?

The organisation that holds the DNA for myDNA from Prudential is Prenetics Limited (7). Recently I read that Alibaba and Ping An Insurance are the major investors in Prenetics (8). On one hand, I find it amusing that Ping An possibly has access to data that Prudential helped collect. On the other, I find it scary that the data of these people (of course I did not purchase myDNA) is now in the hands of another insurer.

Anyway, Prenetics claims that the DNA of over 200,000 people across South East Asia, China and Hong Kong was in its hands as early as October 2017 (9).

But, I am sure some nice people will say, there is a legitimate reason to do research into DNA; hospitals and universities have been doing so to the benefit of mankind for years. Yes, but I would argue that the CEOs of hospitals and universities have rather different backgrounds from that of the CEO of Prenetics (Mr Danny Yeung), and that may affect how the data is used:
“Prenetics started out as ‘Multigene’ in 2009 when it span out from Hong Kong’s City University. Yeung joined the firm as CEO in 2014, after leaving Groupon following its acquisition of his Hong Kong startup uBuyiBuy, and it has been in startup mode since then. Prenetics has raised over $52 million from investors which, aside from Alibaba, include 500 Startups, Venturra Capital and Chinese insurance giant Ping An.”

This, I will admit, is pure speculation on my part. For all I know, Prenetics really wants to help mankind and bless everyone whose DNA they hold with better health and lower health care costs (prevention rather than cure). But I have other reasons to be sceptical.

Basically, even if humans had ‘decoded’ the whole DNA sequence (which hasn’t been achieved yet (10)), and even if you have inherited a predisposition to a condition, nobody can tell whether you will actually be affected by it:
“Genetic testing can provide only limited information about an inherited condition. The test often can't determine if a person will show symptoms of a disorder, how severe the symptoms will be, or whether the disorder will progress over time.” (11)

And to make things more interesting, the pieces of the genetic code that have not been sequenced were considered useless, or too hard to analyse given technological limitations, but are now being re-evaluated. Does that sound familiar? For people in the “Big Data” space (especially proponents of the “Data Lake”), it should.

One of the arguments for the “Data Lake” is that we do not know in advance what data will be useful; even if we cannot extract it and use it now, we might as well keep it since it might be useful later.
When I first started in this line of work, the kind of conversations I would have would be along these lines:
Q: “What data do you need?”
A: “Just give me what you have and I’ll analyse”
Q: “That is impossible, tell me what data do you need?”
A: “Ok, can I have the list of pieces of data that you have?”
Q: “That is impossible, tell me what you want and I will see if I have it...” ad nauseam

Now both the technology and the acceptance of the usefulness of data have advanced, and it is possible to “keep all the data” in a “Data Lake”, or “Data Swamp” as some friends call it (Drain it! Drain it! Sorry, I got carried away for a moment).

Pieces of data that we would have had trouble analysing a few years ago, such as weblogs, pictures or voice recordings, can now be analysed relatively easily. Yet these same pieces of data were routinely considered useless.

It is the same thing with DNA data. And to make it worse, there is the uncertain link between being at risk of some condition as per your DNA profile and actually getting that condition.

Basically, far too much data would be needed to transform this ‘risk’ into something that can be measured with ‘enough accuracy’. That is what insurance companies try to do when they ask questions about your lifestyle, smoking, drinking... but these questions are very crude.

So is it fair that you could be penalised because of a feature of your DNA make-up? Are we slaves of our DNA?

What I am getting at is not the importance of DNA data, but rather the care that must be taken when conclusions are drawn and people are penalised for things they may not even be aware of.

To make things more fun, not only is Prenetics in China, Hong Kong and South East Asia, but it has recently acquired DNAFit (12). This changes Prenetics in two ways. Firstly, geographically: DNAFit’s market presence is mainly in Europe and is expanding to the USA. Secondly, DNAFit goes direct to the consumer, whereas Prenetics tended to reach the consumer via insurance or medical companies. (In fact, even LinkedIn is one of DNAFit’s customers.)

The impact of direct-to-consumer DNA kits is debatable (13), but “a little learning is a dangerous thing” (14). Add to this the emotional weight of ‘learning’ not necessarily pleasant things about your own self...

So what I am saying is:
  1. As individuals we should have control over the data we produce by living (web/call/messaging behaviour, surveillance footage...).
  2. But we should also have control over data we produce by existing (DNA).

I think there are many gaps between the general public (who have no issues with being Facebook’s product in exchange for a quiz (15)) and those who have some idea of what can be done with such data; the same goes for DNA. It is critical for people to be educated, or to educate themselves, on this. As long as there is such an asymmetry of information, together with major issues with how people/machines use the data (people/machines, not technology or data itself), the cost of exploitation can be very high.

I would like to end this post with the poem by Alexander Pope (14):

A little learning is a dangerous thing ;
Drink deep, or taste not the Pierian spring :
There shallow draughts intoxicate the brain,
And drinking largely sobers us again.
Fired at first sight with what the Muse imparts,
In fearless youth we tempt the heights of Arts ;
While from the bounded level of our mind
Short views we take, nor see the lengths behind,
But, more advanced, behold with strange surprise
New distant scenes of endless science rise !
So pleased at first the towering Alps we try,
Mount o’er the vales, and seem to tread the sky ;
The eternal snows appear already past,
And the first clouds and mountains seem the last ;
But those attained, we tremble to survey
The growing labours of the lengthened way ;
The increasing prospect tires our wandering eyes,
Hills peep o’er hills, and Alps on Alps arise !


(7) https://www.prudential.com.sg/en/prumydna/mydnapromotnc/ (see point g: “‘myDNA report’ means the personalised report that Eligible Customers receive from Prenetics Limited”)
