Sunday, 15 August 2021

Treat data with respect

Analytics, data science, smart nation: these terms are heavily hyped in Singapore. Still, I feel that data is not treated with the respect it deserves here. As long as this continues, the potential of data, of analytics/data science, and of a smart nation as anything beyond a system of surveillance will remain unrealised.

This may sound a bit overblown, but let me present some facts, starting with the minister for health suggesting that the data being collected for the purpose of contact tracing and managing Covid-19 is “not comprehensive enough”. Think about that. As I argued recently (1), I believe that the technological solution proposed by GovTech (TraceTogether and SafeEntry) is one of the better ones around. Where Singapore is falling short is in the other success factors for an analytics/data science project, and respect for data is one of them.

1 In Singapore, blue-collar workers, people with “dubious behaviour” and foreigners get Covid

If you have been reading the news in Singapore, you would have realised that we have had an increase in Covid cases over the last few weeks, which has resulted in a re-tightening of rules. But what is most interesting is the identity of the people at whom fingers are being pointed:

- A Vietnamese lady who came to Singapore via familial ties and likely worked at a KTV (2)

- A stall assistant at a satay stall at a hawker centre who patronised a KTV (3)

- A canteen worker (4) or a cleaner at a school (5)

- An anonymous Indonesian fisherman (6)

Does anyone actually think that Covid specifically targets foreigners and blue-collar workers? I don’t think so. You may argue that the role you play and the number of people you interact with have some impact, and I agree. However, what I am highlighting is that the press (The Straits Times is not the only one, I just chose them because they might take umbrage if I did not highlight their editorial integrity) gives juicy details of selected cases. These details make the people involved relatively easy to identify.

No respect for the data of some individuals.

Why would you want to allow your data to be captured when you cannot trust the data to be kept confidential and used only for specific purposes?

 

2 StarHub data leak

This was just announced (7), almost a month after the leak was noticed. StarHub, however, is not the only telco. StarHub and Singtel leaked information last year, in 2020 (8). Singtel also had a leak earlier this year, in 2021 (9), and another in 2017 (10), and M1 had one seven years ago (11). And I am just looking at telcos; telcos probably hold most of the behavioural data that you generate. You likely carry your mobile phone wherever you go, right?

Yet it would seem that telcos can’t protect identifiable data properly: the recent leaks specifically included customers’ NRIC numbers. In Singapore, the NRIC is your key to many, many things.

You can easily guess that nothing much has been done to stem the tide, since all it takes, as the StarHub CEO showed, is an apology and possibly a small fine.

I am not saying that people will not hack or misuse the data. But we are now in 2021; segregating your data and anonymising it so that it cannot be used to identify people is not rocket science. It can be done relatively easily, for all sorts of data types. There simply is no will to do so.
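To make the "not rocket science" point concrete, here is a minimal sketch of segregation plus pseudonymisation in Python. Everything in it is an assumption made for illustration: the field names, the sample record and the key are hypothetical, and it is not a description of how any telco actually stores its data. It replaces the NRIC with a keyed hash and keeps the direct identifiers in a separate table from the behavioural data.

```python
import hashlib
import hmac

# Hypothetical secret key. In practice it would be kept in a key vault,
# away from both of the tables produced below.
SECRET_KEY = b"replace-with-a-key-kept-outside-the-analytics-environment"

# Assumed names of the fields that identify a person directly.
DIRECT_IDENTIFIERS = ("nric", "name", "phone")


def pseudonym(nric: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash of the NRIC: stable enough for joins, not reversible without the key."""
    return hmac.new(key, nric.encode("utf-8"), hashlib.sha256).hexdigest()


def segregate(record: dict) -> tuple:
    """Split one customer record into an identity part and a behavioural part."""
    token = pseudonym(record["nric"])
    identity = {"token": token}
    behaviour = {"token": token}
    for field, value in record.items():
        target = identity if field in DIRECT_IDENTIFIERS else behaviour
        target[field] = value
    return identity, behaviour


# Made-up record, for illustration only.
record = {
    "nric": "S0000000X",
    "name": "Example Person",
    "phone": "00000000",
    "plan": "prepaid",
    "data_gb": 12.5,
}
identity, behaviour = segregate(record)
print(sorted(behaviour))  # fields an analyst would see: no NRIC, name or phone
```

A keyed hash (HMAC) is used rather than a plain hash because the space of possible NRIC numbers is small enough to brute-force an unkeyed hash. Strictly speaking this is pseudonymisation, not full anonymisation: unusual combinations of behavioural fields can still single people out, so it is a first step, not the whole job.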

After all, why would you spend money to fix a leak when all you need to do is wheel someone in front of the media, say sorry, pay a paltry fine, and get on with life?

In the meantime, people whose data has been leaked get, at the very best, blasted with random offers despite being on the national Do Not Call (DNC) registry; and again there is no visible action after complaints, including complaints to the police.

No respect for data of individuals.

 

3 But the government leaks data too (12)(13), and so do hospitals and health establishments

I group the government and hospitals together because these are areas where members of the public reveal the most detailed information voluntarily.

The elephant in the room is the SingHealth data leak, and the hippo is the AIDS database leak I wrote about earlier (14). Data privacy has been a pet peeve for me for a while, but now I can see clearly how it is negatively impacting the potential of analytics/data science on a large scale.

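A small recap: the SingHealth data leak involved the records of 1.5 million people, including the PM. And do you know what the outcome was? An apology and a fine of less than $1 per record exposed (15) (16): $1 million in total ($750,000 for IHiS and $250,000 for SingHealth), or about 67 cents per record.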

Lack of respect for data.

I could go on, but you get the idea. And this is a problem because:

  1. people’s privacy is being invaded; in fact, there is a culture of exposing people, as seen with Covid
  2. penalties for data breaches are smaller than peanuts in many cases, so there is no real incentive to do better
  3. when people do not trust the systems to hold and protect their data, they will contribute less data, or contribute it more selectively; this biases the data captured, and models/algos go cranky

To me, it is that third point that may have the most disastrous consequences.
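A minimal sketch of how the third point plays out, in Python. All the numbers are made up purely to illustrate the mechanism: one hypothetical group trusts the system and mostly shares its data, the other fears exposure and mostly opts out, and the second group happens to have the higher true case rate.

```python
def prevalence(groups, observed_only):
    """Share of positive cases, either in the whole population or only among
    the people whose data was actually captured.

    Each group is a tuple (size, true_rate, participation_rate).
    """
    cases = 0.0
    people = 0.0
    for size, true_rate, participation in groups:
        weight = participation if observed_only else 1.0
        cases += size * weight * true_rate
        people += size * weight
    return cases / people


# Hypothetical numbers, chosen only to illustrate the mechanism.
groups = [
    (7000, 0.02, 0.9),  # trusts the system: most of them share their data
    (3000, 0.10, 0.3),  # fears exposure: most of them opt out or dodge
]

true_rate = prevalence(groups, observed_only=False)
seen_rate = prevalence(groups, observed_only=True)
print(f"true rate: {true_rate:.1%}, rate seen in the data: {seen_rate:.1%}")
# prints "true rate: 4.4%, rate seen in the data: 3.0%"
```

With these assumed figures, the captured data understates the true rate by about a third, and any model trained on it inherits that gap without any warning sign in the data itself.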



 


So what?

AI is receiving a great push from the Singapore government; there is even a national AI strategy (17). Getting rid of bias in AI is a field that is still being explored (18)(19), but first you need to recognise the bias and search for it.

To me, if a system does not respect data, then it is not concerned about the bias that may come in with the data collected, or about the resulting bias in implementation. When AI is applied on a large scale, it will certainly impact most of our lives. If AI is biased, the consequences can be catastrophic for some individuals.

 

PS:

As I was writing this blog, GovTech announced an enhancement to the TraceTogether app. It now has a function that prevents people from using past screenshots (20). Imagine that: people trust the way their data is treated so much that they go to the effort of taking screenshots, storing them, and using them at a later date. And in response, GovTech spent resources coming up with this feature… Not only is the data collected biased, but extra effort is spent combating a phenomenon that would likely largely disappear if people’s data were treated with the respect it deserves, which would increase trust.


PPS:

As further proof, people have actually been trying to scam the system... (21)


  1. https://www.linkedin.com/posts/kailashpurang_using-data-to-control-covid19-sg-how-to-activity-6825233731436072960-O_rs
  2. https://www.straitstimes.com/singapore/vietnamese-woman-who-is-first-case-of-ktv-cluster-came-here-in-feb-via-familial-ties-lane
  3. https://www.straitstimes.com/singapore/health/toa-payoh-hawker-centre-closed-for-deep-cleaning-after-stall-assistant-who
  4. https://www.straitstimes.com/singapore/health/cluster-linked-to-cleaner-at-punggol-primary-school-grows-to-12-cases
  5. https://www.straitstimes.com/singapore/parenting-education/years-1-4-students-at-raffles-institution-primary-2-pupils-at
  6. https://www.straitstimes.com/singapore/health/jurong-fishery-port-covid-19-cluster-likely-spread-from-indonesian-or-other-fishing
  7. https://www.straitstimes.com/tech/more-than-57000-starhub-customers-personal-data-leaked
  8. https://www.channelnewsasia.com/singapore/3-men-charged-leak-starhub-singtel-subscriber-information-482756
  9. https://www.channelnewsasia.com/singapore/singtel-data-breach-customer-information-stolen-nric-356131
  10. https://www.straitstimes.com/singapore/singtel-fined-25k-for-data-breach-involving-app
  11. https://coconuts.co/singapore/news/online-data-breaches-both-m1-and-k-box-leak-massive-amounts-personal-data-public/
  12. https://www.straitstimes.com/tech/tech-news/public-sector-data-leaks-total-108-last-year-up-from-75-cases-in-2019
  13. https://www.todayonline.com/singapore/2-firms-fined-s43000-total-over-personal-data-breaches-affecting-mindef-saf-personnel
  14. http://thegatesofbabylon.blogspot.com/2019/02/history-of-and-thoughts-on-14200-hiv.html
  15. https://www.straitstimes.com/singapore/personal-info-of-15m-singhealth-patients-including-pm-lee-stolen-in-singapores-most
  16. https://www.straitstimes.com/singapore/singapores-privacy-watchdog-fines-ihis-750000-singhealth-250000-for-data-breach
  17. https://www.smartnation.gov.sg/why-Smart-Nation/NationalAIStrategy
  18. https://www.mckinsey.com/featured-insights/artificial-intelligence/tackling-bias-in-artificial-intelligence-and-in-humans
  19. https://www.forbes.com/sites/forbestechcouncil/2021/02/04/the-role-of-bias-in-artificial-intelligence/?sh=79c28d76579d
  20. https://mustsharenews.com/tracetogether-screenshot/
  21. https://mothership.sg/2021/08/dine-in-eateries-vaccination-screenshot/