Friday, 12 February 2021

Myanmar 2021

 


Some of you may know I worked for 1 year in Myanmar, trying to help make a traditional insurer more data-driven. I am no longer in Myanmar, but I keep an eye on the happenings in this amazing country.

To me, Myanmar is a country with enormous potential; as a foreigner, you only have to open your eyes and leave your preconceptions at the door.


Attitude

One of the amazing things about Myanmar is the attitude of the people. There is no hiding that the country deals with some issues in terms of development. What surprised me most when I went to the office was the lack of people of what would be called mid-career age. There were a bunch of young people (more on them later) and a bunch of more seasoned people (50s and above). But do not underestimate Myanmar’s ‘older’ generation.

A simple example will explain what I mean. I thought it would be good, as part of spreading the word about the use of data, to have basic statistics and analysis classes for my colleagues. This was open to all staff. Again I had a bunch of eager young students and a bunch of older ones.

One of my students was in his 60s, and regularly attended. He told me “I enjoy your classes. I always want to learn, and these classes open my mind and I learn new things”.

Many of us working in data have to deal with people who refuse to change their ways, and this is one of the major obstacles to organisations on their journey to becoming data-driven. But to hear someone tell you this is very encouraging. And this is not an isolated case.

 

Potential

When I arrived, after taking stock of things, I realised processes were very manual. I enquired with the CIO (an expat) and he said that it was impossible to get talent, and that he had to search high and low for qualified people; hence, despite being there for quite a while, the best he could do with the resources was maintain the systems. The skills he was talking about were basic database skills: SQL.

I found that to be totally wrong.

I found quite a good team of talented people in Myanmar (one from the CIO’s office who wanted a change). There are very talented people in the latest technologies in Myanmar, from local universities. We developed and built forms that could be used to send applications instantly to the back-office for processing, and the back-office underwriters were willing and able to use the new methods. We revamped the auto claims systems to more than halve the time to pay, and this involved a lot of adjustments within the claims team, not only among new and young staff, but also among the more experienced staff. We even trialled telematics for insurance (I hope that went well after I left).

I will state things simply:

  • From my personal experience, Myanmar has very talented young people who are very eager to learn and grow.
  • So in technical terms, in the world of data, Myanmar has the skills.
  • Myanmar also has people with great attitude, eager to learn, and able to put theory into reality.

My hope

My basic hope for Myanmar is that the situation is resolved without bloodshed, as most people would agree.

But equally important is for the potential that Myanmar has to be realised.

The young people in Myanmar today should be provided a good framework where they can engage in intellectual pursuits, grow and learn, and be allowed to contribute back to their country in terms of leap-frogging into the 21st century.

I have seen first-hand the passion these young people have to grow and contribute back to the country; what I hope is that they are allowed to do so.

Furthermore, this should happen soon, so the more experienced people in Myanmar can work side by side with the young.

It would be horrible if Myanmar were to miss another generation.

I have always been against country-wide economic sanctions because it is the ‘normal’ people who suffer, and any change is driven by that suffering pushing people to the brink so they have no choice but to force change. To me, there is too much suffering in this approach.

I believe that ASEAN should take the lead in interfacing with the power in Myanmar. Many people in the region recognise the alignment between the military who now hold power in Myanmar and economic power. Instead of “managing” the economic conditions of the whole country, the economic interests of the people in power are what should be understood, and the incentives that are likely to have driven them to take power in the way they did should be managed. And this is one place where “big data”/“data science” can help.

 

P.S. Since I started this blog post, 3 new events have occurred:

1 Someone at the protests was wounded by live ammunition. I hope the least other countries can do is put pressure on the powers that be in Myanmar not to use excessive force against the civilian population, their own people.

2 I have been reading that some people are divesting from Myanmar, for example a personality from Razer (1). I believe that, as with economic sanctions, people should be careful. If the partner in Myanmar is from the military or is military-linked, then yes, this can help the military change their minds, and put a higher cost on their recent actions. But then, it may be worth asking why it took recent events to cause divestment; after all, 30 years ago Myanmar was under military control. But if the partner is not associated with the military, then it may cause more harm (making some people lose their jobs) than good.

3 Someone has made a good start on understanding pressure points that may work without much damage to the people of Myanmar (2). It is up to us as individuals to act. Using big data at a granular level can uncover so much more. If there is a will…


  1.  https://mothership.sg/2021/02/razer-co-founder-myanmar-military/amp/
  2.  https://coconuts.co/singapore/news/burmese-expat-goes-to-police-over-singapore-companies-ties-to-myanmar-military/

Sunday, 29 November 2020

If it looks too good to be true, it probably is

Be wary of analysis where all the indicators seem to point in the same direction: “If it looks too good to be true, it probably is”.

Again

306 – 232

Is the number everyone is focused on.

Yes Biden/Harris seems to have a clear mandate, with almost 5 million more votes than Trump/Pence.

The numbers who voted in this election are unprecedented – Trump/Pence lost with the highest number of votes for any incumbent presidential ticket.

The efforts of the Democrats to push people, especially minorities and disenfranchised people, to vote paid off.

Well…

President Trump missed his cue

Not in the way you may think. According to CNN exit polls (1):


Trump/Pence did better in 2020 among all non-white races (vs 2016)

and

Biden/Harris (2020) did worse among all non-white races compared with Clinton/Kaine (2016).

Some of you will have noticed that the changes do not add up to zero (the Democrats’ loss is smaller than the gain for Trump/Pence). This is because people were more candid about their choices this round:



Fewer people voted for a third party or refused to reveal their preference.

So, how did Biden/Harris apparently win more votes then?

The Democrat campaign was focused on re-capturing the people who voted for Obama/Biden in 2012 and who switched to Trump/Pence in 2016, especially in the rust-belt (2).

And this strategy seems to have delivered the White House (3)(4).



Data can show much more if you have an open mind and look deeper.

 

 

1 https://edition.cnn.com/election/2020/exit-polls/president/national-results

2 https://www.reuters.com/article/us-usa-election-biden-insight/bidens-winning-strategy-flip-rust-belt-trump-states-and-hold-on-tight-idUSKBN27N0OC

3 https://slate.com/news-and-politics/2020/11/midwest-rust-belt-georgia-will-decide-presidential-election.html

4 https://dyn.realclearpolitics.com/elections/live_results/2020/president/

 


Monday, 9 November 2020

Data Literacy is more important than ever

Data has been sitting in servers and databases for the last 50 years.

Artificial Intelligence has been around for at least that long too.

But data, on its own, does not inform you, does not help you make better decisions, it needs to be interpreted, modelled…

A simple example:

290 – 214

This is the current number of votes from the electoral college that Biden-Harris and Trump-Pence are projected to receive. Since 270 votes is the minimum needed to be declared the winner, Mr Biden finds himself called president-elect.

But there is more to this:


The fact remains that close to 8 million more people voted for Trump-Pence in 2020 (as of now) compared to 2016. The Trump-Pence ticket has grown in votes.

Which number you choose to focus on depends on the story you want to tell. A truer picture should use both sets of numbers.

Analytics is not about the data alone; interpreting and modelling the data, and being aware of the weaknesses of modelling techniques, are critical.

Data literacy is more important than ever.


Sunday, 19 July 2020

What is the analytical maturity of the Singapore Government? Recent evidence of 2 critical public events


As someone who works in Analytics/“data science”, one of the things I need to be able to judge is whether an organisation is analytically mature or not. This is critical and determines what I think I could offer to the organisation to prove the value that analytics can bring.

Singapore is going through the Covid-19 situation like every other country, and has recently gone through a general election (in the middle of the pandemic). These 2 very public events show how analytically mature the government is.

But what do I mean by analytical maturity?

To me, an organisation being mature in analytics means all or most parts of the organisation use analysis of data to make decisions; this implies production and consumption of analytics. To be truly data-driven, the analytics should feed into the decision making as part of BAU. In analytically mature organisations, the challenge for people like me is to bring skills the organisation may not have internally. This is a different challenge from say the case where the organisation wants to consume analytics, but is unable to produce consistently.

How does the Covid19 situation rate in terms of obtaining and using analysis?

In my view, Singapore started dealing with Covid-19 really well.

Firstly, using the information available, the government reached out to people and educated everyone about the virus and the measures to deal with the threat by using data. Whether that data was in the form of WHO advice at that time – wear masks only when not feeling well – or, as some leaked audio revealed, lack of PPE (Personal Protective Equipment) meaning the priority was to ensure front-line workers had PPE, is not really the point. So, to me, the government did a good job at the beginning.

However, things went south quickly; in April, the numbers of infected people started exploding in Singapore. Basically, the government uncovered a hidden infected zone, the foreign workers dormitories.

I was shocked. One of the first people to be infected, case 42 from the construction at Seletar Aerospace Heights, went to Mustafa centre (1) in early February. When I heard that news, my first reaction was horror; if someone who works at a construction site caught the virus, then, given the conditions in the dormitories, infection would spread like wildfire.

However, the dormitory situation only blew up in April as seen below – original data as used by Johns Hopkins (2).



So what happened in these 2 months? Do you think the government just did not make the link between tight spaces and the virus? Luckily, you would have thought, the minister in charge is an expert in tight spaces (3).


Apparently, the government was indeed watching the Covid-19 situation in the dorms, even since January, it seems (4). The question remains: if this segment of the population was being monitored since January, how did it explode in April?

To me, it is either:
  • Monitoring was done properly but data was not collected (I assume no tests were done)
  • Or data was collected, but no action was taken on it (which I find less likely; I don’t see the government willfully allowing the virus to spread)


So, to me, the ministry in charge simply did not do a good job using (or collecting) data. And if that was left to dormitory operators, it’s also not a brilliant idea given their track record, half of them breaching rules every year (5)

To say that nobody heard of asymptomatic transmission at that point in time is odd. The whole point is to test.

However, according to Ambassador Chan Heng Chee: “we test, we track and we quarantined them. But later it just exploded” (6). And she added, praising Singapore’s testing capabilities: “In the region, you find that testing capabilities are different, so our numbers look much higher than others.”

But as I pointed out in my previous blog, (7) it has been said by Dr Dale Fisher, chair of infection control at the National University Hospital, that there are cases where testing is not needed anymore, you simply can assume everyone has been infected (8)

So something went really wrong there, if indeed testing took place on a significant scale or was done with a view to learn rather than simply react.

Furthermore, no, Minister, a lack of demand for an apology is not a metric for a job well done. The wrong metric will lead to wrong analysis and wrong action, if any (9)(10).

Ok, so the ability to produce good useful analytics seems missing here.

But, to me, the Singapore PM is doing right by foreign workers. In fact he specified “to our migrant workers, let me emphasise again: we will care for you, just like we care for Singaporeans…” (11). Top management has the right desire; execution leaves much to be desired. Sounds familiar?

How does the Election GE2020 situation rate in terms of obtaining and using analysis?

The basic function of the ELD, in my simple view, is to ensure elections run smoothly:
  • Everyone who has a right to vote is given the opportunity to do so safely
  • The voting and vote count are done transparently and with all parties who are allowed to witness the count in place
  • All valid votes cast, and only these, are counted.



I am not getting into other functions such as setting the electoral boundaries and so on. While this can and should be done using data, there just isn’t any publicly available information for any determination about data use to be made. And the objective of the exercise is also not available publicly.

So how did the ELD ensure everyone who is eligible to vote did so?

Very poorly.

The most ridiculous thing that happened is that voting hours were extended ‘at the last minute’ because it took longer for people to cast their votes compared to what was expected.

Due to the Covid-19 situation, extra measures were put in place. Each voter was given a time window to cast their vote. People were provided with self-inking pens, were asked to sanitise their hands, wear the gloves provided, dispose of the gloves after voting…

The ELD claimed that, because of these extra measures, they had to extend the voting hours.

This is proof of not using data. In one of my roles, I was looking into operational efficiency. One of the first things my team did was to look at the processes and time them. Look at enough of them to form a little sample. Now, since our objective was to make the operations of the organisation more efficient, all we did was observe our staff. We did not have to ask a sample of people to do the tasks, for example people with issues with their fingers due to age/disability, people who are not clear about the processes and have to ask (and who to ask). But still, this is easily done.

To claim they got the amount of time wrong is a clear indication of not using data.
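As a rough sketch of the timing exercise described above: take a small sample of observed times per voter and work out how many hours a polling station would need. Every number here (the timings, the number of voters and booths) is invented for illustration and is not ELD data; the function name is also my own.

```python
from statistics import mean

# Hypothetical timings (seconds per voter) from a small observed sample,
# including slower cases such as voters who need help or have to ask questions.
sample_seconds = [95, 110, 120, 130, 150, 160, 180, 240, 300, 315]

def hours_needed(seconds_per_voter, voters, booths):
    """Hours a station must stay open if each booth serves one voter at a time."""
    total_seconds = seconds_per_voter * voters / booths
    return total_seconds / 3600

avg = mean(sample_seconds)           # the typical voter in the sample
slow = sorted(sample_seconds)[-2]    # a pessimistic case: second-slowest observation

voters, booths = 2400, 10            # assumed station size, for illustration only
print(f"average case:     {hours_needed(avg, voters, booths):.1f} hours")
print(f"pessimistic case: {hours_needed(slow, voters, booths):.1f} hours")
```

The point is not the specific figures but that the gap between the average and the pessimistic case is visible before polling day, from a sample small enough to collect in an afternoon.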

ELD, you get an F.

How did ELD ensure that the people who were allowed to be at the voting centres had the opportunity to do so?

Again, very poorly.

I will admit I only have 1 source for this information, but the person is prominent, made the statements on national media (CNA) and, as far as I know, has not been looked at from a POFMA (12) point of view. Dr Paul Tambyah, in his reaction to the election results (13), said: “we’ve seen a number of events that occurred today, with the fiasco about the gloves, about the PPE at the end of the day where polling agents had to leave the polling stations”. The gloves bit was addressed above; the PPE piece is not.

Polling agents are the people, from the various parties who have the right to oversee the voting process (14) to witness the sealing of boxes before voting starts, to observe the process of voting throughout the day, and finally to witness the sealing of boxes at the end of the voting period and the transport to counting centres.

According to Dr Tambyah, the polling agents were asked to leave their stations towards the end of the day and this is linked to PPE.

Now, the voting was scheduled so that, at the end of the day, people who had been quarantined would go cast their votes; that was the design. I am not going to get into whether polling agents need PPE apart from face masks, which are compulsory anyway. But if any extra equipment was required, say face shields, this should have been made clear and provided. The ELD knows the maximum number of polling agents at every point, and should have made the necessary provision. Obviously they did not, despite what they said (15): “By law, they can still vote during this time. The necessary precautions have been taken at all polling stations to ensure the safety of voters during the special voting hour,”. The precautions were not taken properly, therefore polling agents had to leave their posts.

Again, ELD, you get a very big F.

How about the casting of valid votes?

First you would expect anyone who turns up with the proper documentation at a voting centre to be able to cast his/her vote. There has been at least 1 case of someone being told she had already voted (16).

Secondly, while people under quarantine and those who are covid19 positive have been barred from voting (17), some people overseas were denied their vote (18) due to a glitch in the ICA system.
Thirdly, how about those Singaporeans who came home and decided to endure stay at home notice? Well, here again, despite knowing their identities and numbers, ELD messed up (19), their names were “missed out”.

Again, ELD, you get another big F.

On top of this, these are only the cases I came across; there probably are many more. I hope that ELD doesn’t say, well, it was only 115 cases (1 vote not counted, 13 people not put on list, 101 overseas not on list either); I won’t be holding my breath.

This shows a pattern of being incapable of collecting and using data effectively.

Is it all that bad?

No, of course not.

Singapore has quite a few successes using data and technology. The TraceTogether app does what it is advertised to do and the code has been open-sourced. It is good enough for other countries to consider adopting it (20); well done GovTech.
Take a look at the websites of some government bodies, they are beautiful:





For example, the Singstat data on trade is illustrated with a ship, dolphins and seagulls; it is even animated! (21)

The STB page on tourist arrivals is pretty too (22):



Services are extremely efficient; for example, you can apply for and get your passport online (23).

What is the conclusion then?

I am using corporate standards so will use corporate terms.

To me the Singapore government top management has the desire to consume analytics and has the right directions.

This has been translated into efficiency and use of very basic analytics to make operations work well; operations that are of large volume have been studied and made efficient; and are tracked for improvement.

Some flagship analytical projects, such as the TraceTogether app, have shown the isolated ability to produce good analytics.

But the Covid19 response and the ELD show that middle management is not pulling in the same direction.

This is something that anyone trying to “sell” analytics has come across. Beautiful picture from top management, great at doing high volume repetitive stuff, but horrible at the middle layer. The Singapore government is thus like a typical behemoth that is trying to get into the digital age. The lack of clear competency in the middle to senior management is responsible for the very public deficiencies.


So, to me, the Singapore government is not analytically mature; the use of data is restricted, there are huge pockets of resistance within the organisation, and this can only change if the top management weighs heavily on the middle blockers and people challenge the status quo rather than just go along with what middle management says.

  1. https://coconuts.co/singapore/news/covid-19-heres-every-novel-coronavirus-infection-in-singapore-on-a-map/
  2. https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
  3. https://www.straitstimes.com/singapore/ministers-rejoinder-to-no-flat-no-child-belief
  4. https://www.onlinecitizenasia.com/2020/04/24/netizens-unimpressed-with-josephine-teos-aggressive-and-defensive-response-to-questions-on-migrant-workers-dormitories/
  5. https://www.straitstimes.com/singapore/manpower/nearly-half-of-large-dorms-breach-rules-each-year-minister
  6. https://mothership.sg/2020/04/chan-heng-chee-covid-19/
  7. http://thegatesofbabylon.blogspot.com/2020/07/covid-19-green-lanes-bubbles-singapore.html
  8. https://www.straitstimes.com/singapore/health/coronavirus-dip-in-local-cases-a-good-sign-but-too-early-to-say-singapore-has
  9. https://www.youtube.com/watch?v=uP28mlqgZYk
  10. https://www.onlinecitizenasia.com/2020/05/06/even-if-josephine-teo-doesnt-want-to-apologise-to-migrant-workers-she-should-apologise-to-singaporeans/
  11. https://www.mfa.gov.sg/Overseas-Mission/Pretoria/Mission-Updates/2020/04/PM-LEE-ON-THE-COVID-19-SITUATION-IN-SINGAPORE-21-APR-2020
  12. https://singaporelegaladvice.com/law-articles/singapore-fake-news-protection-online-falsehoods-manipulation/
  13. https://www.youtube.com/watch?v=OcNeWz0Y7pU the relevant piece starts at 2 minutes
  14. https://www.eld.gov.sg/pdf/GE2020/Guide_for_Polling_Agents_for_General_Election_2020.pdf
  15. https://www.channelnewsasia.com/news/singapore/ge2020-covid-19-patients-quarantined-cannot-vote-special-voting-12889490
  16. https://www.asiaone.com/singapore/ge2020-eld-admits-mistake-after-officials-told-woman-she-couldnt-vote-polling-day
  17. https://www.channelnewsasia.com/news/singapore/ge2020-covid-19-patients-quarantined-cannot-vote-special-voting-12889490
  18. https://www.channelnewsasia.com/news/singapore/ge2020-101-singaporeans-overseas-unable-vote-ica-glitch-eld-12901284
  19. https://www.youtube.com/watch?v=LwAchzbVLLY
  20. https://thekopi.co/2020/05/15/tracetogether-explainer/
  21. https://www.singstat.gov.sg/modules/infographics/singapore-international-trade
  22. https://stan.stb.gov.sg/public/sense/app/254dd6c2-eaf7-46c4-bf7a-39b5df6ff847/sheet/3101ecdd-af88-4d5d-be49-6c7f90277948/state/analysis
  23. https://www.ica.gov.sg/singapore-citizen/singapore-passport/apply-for-a-passport


Tuesday, 7 July 2020

Covid-19, Green lanes, bubbles… Singapore left out! Data and interpretation, story telling, doing things right


Tourism and trans-border travel are still very important in today’s world. Many countries are opening up to varying degrees and this is a trend that is likely to continue as countries try to find ways of allowing the lifeblood of foreign spending to re-enter their veins.

For example, the EU has a list of 15 countries outside of the EU where travel is allowed: Algeria, Australia, Canada, Georgia, Japan, Montenegro, Morocco, New Zealand, Rwanda, Serbia, South Korea, Thailand, Tunisia and Uruguay; China’s status depends on reciprocity.
The EU was kind enough to disclose their official criteria (1):
  • Ensuring that the Covid-19 infection rate in the country was low enough (where nations had fewer than 16 in every 100,000 infected)
  • That there was a downward trend of cases
  • That social distancing measures were at "a sufficient level"

Many people in Singapore were surprised that Singapore was not included. And, in my view, this illustrates perfectly how data is used. Data is data, but how information is created and communicated is probably more important than the data itself.

The first difference is that numbers reported in Singapore make a clear distinction between 3 groups of cases in Singapore (2).
  • Imported cases, people who have returned to Singapore recently
  • Cases residing in dormitories, the description is self-explanatory
  • Cases in the community, this is the rest.

Many people in Singapore, including some of my friends, focus on the “cases in the community”, and don’t bother much about the cases residing in dormitories. I would argue that this is a deliberate communication choice by the authorities, and the purpose is to reassure the ‘average’ person in Singapore: when you step out, you are not that much at risk, so with basic precautions, life can resume.

Therefore, it is not surprising to see that many people are surprised at the stance of the EU for example, excluding Singapore from the ‘green zone’.

What I always found interesting was this way of segmenting the population: imported, in-community, residing in dormitories. Anyone would know that workers residing in dormitories are mainly from Bangladesh, India region, so splitting the numbers that way is highly correlated to country of origin/race.

However, this simply amplifies the feelings expressed by people:


There is an undercurrent of racism in the coronavirus situation in Singapore. Making the distinction between “in community” and “residing in dormitories” which strongly correlates with splitting along nationality/race (and even more strongly along nationality/race + earning) does not help with this.

Note though that I am not saying the government subscribes to this racist view; on the contrary. The Prime Minister gave a speech specifically mentioning that “to our migrant workers, let me emphasise again: we will care for you, just like we care for Singaporeans…” (3).

What puzzles me the most is that the foreign workers living in dorms are not in the community, but prisoners are. The number of cases in prisons in Singapore is added to the numbers “in the community” (4). It is an interesting thought: people living in Singapore outside of the dorms are closer to prisoners than to people living in the dorms…

In any case, the government has achieved its aim: Singapore residents are reassured; however, the international community just sees the total number of people affected and does not interpret the numbers in the same way.

Who is right, who is wrong?

This is a fundamental question for anyone remotely into analysis/analytics/”data science”.
Is it possible that two parties look at a piece of data and come to opposite conclusions (“Singapore is safe enough”, “Singapore is not safe enough”)? Must one be right and the other wrong?

Come on, who is right?

In my view, they have their reasons for interpreting the data as they do, but both are wrong.

How could the Singaporean interpretation be right?

If we measure risk of infection by the number of people who are getting infected on a daily basis, then it makes sense to look at the number of people, not in dorms, who are infected. This is because the people who were in the dorms have basically been isolated from the rest of the population.
This interpretation is the one needed to achieve the purpose of reassuring the population.

How could the EU interpretation be right?

It doesn’t make sense to look at a segmented view of any population, especially given the view that the covid-19 virus is more airborne than previously estimated, since it is impossible to physically split the population totally. Add to this that, when looking at data from different countries, making data comparable is an arduous task, so it may be more practical to use high-level numbers without going into specifics (unless specifically requested to do so).

Hence the interpretation by the EU suits its purposes.

Why are they both wrong?

Judging risk by the numbers found to be infected on a daily basis needs to be qualified; risk is a rate, a percentage, not an integer. The EU uses a 16/100,000 infection rate, 16 people infected out of every 100,000 population. The simple solution to this is, as President Trump said: “if we stop testing right now, we’d have very few cases” (5).
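To make the difference between a count and a rate concrete, here is a minimal sketch. Only the threshold of 16 per 100,000 comes from the EU criteria quoted earlier; the two countries and their figures are made up for illustration.

```python
EU_THRESHOLD_PER_100K = 16  # from the EU criteria quoted earlier

def cases_per_100k(cases, population):
    """Convert a raw case count into a rate per 100,000 population."""
    return cases * 100_000 / population

# Two hypothetical countries reporting the same number of cases.
small_country = cases_per_100k(cases=1_000, population=5_000_000)
large_country = cases_per_100k(cases=1_000, population=50_000_000)

for name, rate in [("small", small_country), ("large", large_country)]:
    verdict = "above" if rate > EU_THRESHOLD_PER_100K else "below"
    print(f"{name} country: {rate:.1f} per 100,000 ({verdict} the threshold)")
```

The same headline count lands on opposite sides of the threshold once it is expressed as a rate, which is why the integer on its own says little about risk.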

Risk is the number of people infected divided by the number of people tested.

PLUS

The tests would have to be random.

Covid-19 is known to sometimes be asymptomatic; estimates for the percentage of asymptomatic cases vary from 5% to 80% (6). Hence, focusing tests on people who display symptoms or who are linked to people who are known to have been infected is likely to seriously underestimate the true risk.

Furthermore, there are cases where people are simply assumed to have been infected and tests are not conducted. This was highlighted in Singapore in an interview on Channel News Asia by Dr Dale Fisher, chair of infection control at the National University Hospital: “The numbers are not really coming down. It’s a function of the tests. In some dormitories, the infection rate or the positivity rate of the tests is so high, you get to the point where you don’t need to test anymore” (7).

Needless to say, not testing people who are likely to be infected reduces the number and percentage of people infected in the test results.
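A small worked example of this effect, with entirely hypothetical figures for two groups (they are not Singapore data): if the high-positivity group is no longer tested, both the count and the percentage of positives in the results drop, even though nothing has changed on the ground.

```python
def positivity(positives, tested):
    """Share of tests that came back positive."""
    return positives / tested

# Hypothetical groups: (people tested, of whom positive)
dormitory = (1_000, 400)   # high positivity
community = (9_000, 90)    # low positivity

# Scenario A: both groups are tested.
tested_all = dormitory[0] + community[0]
positive_all = dormitory[1] + community[1]
rate_all = positivity(positive_all, tested_all)

# Scenario B: the dormitory group is assumed infected and no longer tested.
positive_skip = community[1]
rate_skip = positivity(positive_skip, community[0])

print(f"testing everyone:   {positive_all} positives, {rate_all:.1%} positivity")
print(f"skipping high-risk: {positive_skip} positives, {rate_skip:.1%} positivity")
```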

Basically, this goes back to why you are undertaking an analysis. 

To me, in every case,
  • doing an analysis to prove a point is not the right way of doing things. To a hammer everything looks like a nail
  • there may be practical considerations when you analyse data, you do need to take into account how the analysis will be implemented


Conclusions:

  1. “Lies, Damned lies and statistics”, there are many ways to interpret data, or any bunch of data may be transformed into different actionable items, some more valid than others. Hence the process of deriving the actionable items and the skill of the interpreter both matter.
  2. Analysis of data is supposed to be as objective as possible. It is bad practice to start an analysis with a view to provide evidence for a point of view.
  3. In real life, how the results of the analysis will be used does impact the analysis itself. Analysis for the sake of analysis without being implemented is useless.

P.S.
Actually, you could re-look at the problem the analysis is being used for. What the EU is basically trying to do is manage the risk, with respect to Covid-19, of allowing in people from outside the EU; specifically, they are focusing on minimising the risk of people coming into the EU bringing the virus with them. Using country-wide (or even state-wide, if that applies to larger countries) rules is quite blunt; it ignores individual circumstances.

I am sure countries will lobby the EU, for example Singapore could explain that the numbers are mainly due to "foreign workers in dormitories" whereas "in-community" infections are low, to allow their residents to travel. A further step would be for the EU to, at a minimum, overlay some data that each individual provides/allows the EU to collect so that the EU can make a better individual decision, and this must be something that can be done at scale.

In other words, ladies and gentlemen of the EU (and other countries), this is a case where analytics (in its larger sense) and really help make a difference. I say analytics in its larger sense because this would require data collection, processing, dynamic scoring... involve infrastructure, architecture... not just AI-jockeying; but with cloud solutions, this lessens the runway to a solution.

In sum, as always, analytics should be as unbiased as possible, and take into account implementation to obtain a workable solution and help resolve a problem. Deciding who to allow in as the covid-19 situation across the world evolves is a case where proper analytics can make a real difference.

  1. https://www.bbc.com/news/world-europe-53222356
  2. https://www.moh.gov.sg/news-highlights/details/324-more-cases-discharged-136-new-cases-of-covid-19-infection-confirmed
  3. https://www.mfa.gov.sg/Overseas-Mission/Pretoria/Mission-Updates/2020/04/PM-LEE-ON-THE-COVID-19-SITUATION-IN-SINGAPORE-21-APR-2020
  4. https://www.channelnewsasia.com/news/singapore/covid-19-cases-singapore-jun-14-community-moh-imported-12833548
  5. https://www.businessinsider.com/trump-stop-coronavirus-testing-right-now-have-very-few-cases-2020-6
  6. https://www.cebm.net/covid-19/covid-19-what-proportion-are-asymptomatic/
  7. https://www.straitstimes.com/singapore/health/coronavirus-dip-in-local-cases-a-good-sign-but-too-early-to-say-singapore-has (I have not managed to find the original interview; if someone does, please add to comments)


Wednesday, 24 June 2020

Covid-19, how safe is your DNA


I am into sports, football mainly with some recent interest in indoor cricket, and I do use some data analysis to illustrate some points on data (1)(2)(3). So, I was surprised to see who the EPL is partnering with to come up with a covid19-passport that would allow people back in stadia: Prenetics (4)(5). And for those who are not aware, Prenetics is behind circledna, offered all over at Watsons (6).
I blogged about Prenetics two years ago (7), when they collaborated with Prudential to offer an insurance product giving personalised advice based on your DNA. While Prudential assured users they did not keep the DNA, guess who did… Prenetics.

So what does that have to do with Covid-19?

I am running a small poll on LinkedIn. Some countries are relaxing their “lock-downs” by introducing/further enforcing contact-tracing. What this means is that some data created by people will be made available, usually to authorities. The 2 major methods are via location (GPS) or proximity (Bluetooth), and the poll is about which of these 2 people are less uncomfortable with (8). The third idea is a health passport.

GPS tracing

Your location is captured 24/7. In order for the system to be effective, the data is transferred back to a central database. So technically, if someone is tested positive, the list of people who were at the same place and time with that person can be extracted and contacted.
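To make the mechanics concrete, here is a minimal sketch of the "same place, same time" extraction. Everything in it is an assumption for illustration: the `Ping` record, the place labels, the 15-minute window and the sample data are made up, and a real system would work on coordinates and far larger volumes.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Ping(NamedTuple):
    person: str
    place: str          # hypothetical: a venue id or coarse grid cell
    seen_at: datetime


def co_located(pings, case, window=timedelta(minutes=15)):
    """Return the people seen at the same place as `case` within `window`."""
    case_pings = [p for p in pings if p.person == case]
    contacts = set()
    for p in pings:
        if p.person == case:
            continue
        for c in case_pings:
            if p.place == c.place and abs(p.seen_at - c.seen_at) <= window:
                contacts.add(p.person)
                break
    return contacts


# Made-up rows standing in for the central database.
t0 = datetime(2020, 6, 24, 12, 0)
central_db = [
    Ping("alice", "cafe", t0),
    Ping("bob", "cafe", t0 + timedelta(minutes=5)),
    Ping("carol", "cafe", t0 + timedelta(hours=3)),
    Ping("dave", "gym", t0),
]
print(co_located(central_db, "alice"))  # {'bob'}
```

Note that the query only works because everybody's pings sit in one place, which is exactly the privacy trade-off of this method.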

Bluetooth tracing

Your device captures data from people close to you (and your data is captured by the devices of people close to you). If someone tests positive, the device is surrendered to the authorities and the people whose data was captured are contacted.
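A rough sketch of the difference from the GPS approach: each device keeps its own list of who came close, and nothing is read until a device is surrendered. The `Device` class and `contacts_to_notify` function are hypothetical names for illustration, not how any particular app is built.

```python
class Device:
    """A phone or token that only remembers which other devices came close."""

    def __init__(self, owner):
        self.owner = owner
        self.heard = set()   # stays on the device; no location is stored

    def meet(self, other):
        # Both devices record each other when they are in proximity.
        self.heard.add(other.owner)
        other.heard.add(self.owner)


def contacts_to_notify(surrendered):
    """Called only once the owner tests positive and hands the device over."""
    return sorted(surrendered.heard)


alice, bob, carol = Device("alice"), Device("bob"), Device("carol")
alice.meet(bob)
bob.meet(carol)
print(contacts_to_notify(bob))  # ['alice', 'carol']
```

Here the authorities learn who Bob was near, but not where, and they learn nothing at all about Alice's other contacts unless her device is surrendered too.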

Health Passport

A third idea, which is what the premier league is using, is the idea of health passports. “According to Lasarow, the web-based system would require fans to scan their health passport information, by way of a QR code, upon access to a venue in order to prove their Covid-19 test is valid and has also produced a negative result.” (4)

First of all, as I pointed out (9), the tests are designed to test whether there is enough evidence that someone is covid-19 positive; if there is not enough evidence, the person is not deemed covid-19 positive; not positive does not mean negative. It simply means there is not enough evidence to say the person has been infected with the virus.

Secondly, a test is valid at a point in time. You extract samples from me now for the test. Let’s assume I am isolated until the results come out. The results will indicate whether, when the test was carried out, there was enough evidence to say I was covid-19 positive. This test is valid for a point in time in the past. Since I have isolated myself, then chances are the same status is valid since I isolated myself (unless for example I was at too early a stage to be detected, and even if I am not further exposed, the virus replicates in my body and becomes detectable).

Now I carry this result on my “health passport”, and go to the stadium and “prove” I can be safely allowed in. The key points are
  • how much time has passed from the test to me entering the stadium
  • what have I been up to, where have I been, who have I been in close proximity with in the time between the test and me entering the stadium.



This is not as risk-free as many would like. All the passport says is that: at a point in time in the past (I am sure there will be a time-based validity), my test did not indicate that I was covid-19 positive. And this applies to everyone else in the stadium.
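As a sketch of what the gate check can and cannot know, assume the passport carries only the test time and the result. The 72-hour validity below is my assumption for illustration; the articles do not state the actual window, and `passport_status` is a hypothetical function.

```python
from datetime import datetime, timedelta


def passport_status(tested_at, result_positive, now,
                    validity=timedelta(hours=72)):   # assumed window
    """Decide entry using only what the passport records."""
    if result_positive:
        return "refuse: positive test"
    if now - tested_at > validity:
        return "refuse: test expired"
    # Nothing recorded about where the holder has been since the test.
    return "admit: not positive at test time, exposure since then unknown"


tested = datetime(2020, 6, 20, 9, 0)
print(passport_status(tested, False, datetime(2020, 6, 21, 15, 0)))
print(passport_status(tested, False, datetime(2020, 6, 25, 15, 0)))
```

The function has no input for what happened between `tested_at` and `now`, which is the gap the two key points above describe.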

To me, the health passport, used in isolation, is insufficient, especially since we know so little about the virus, the incubation period, the contagious period, what factors affect these (diet, temperature, activities, behaviour…).

A health passport would be good if it indicated immunity to covid-19. At this point, it does not.

So, the EPL’s current health passport offers some cover, anywhere from a blanket that leaves your feet uncovered to a fig leaf. This is because your activities, the places you have been and the people you have been in proximity to are not recorded by this “health passport”. That’s precisely why there are trac(k)ing approaches.

Now, if you add to that the Peltzman effect (10), that is people who now think they are safe tend to take more risks than earlier, this makes going to stadia to watch the EPL a bit scary (fortunately I am a plastic fan from Singapore).

However this is not what this blog is actually about.

What truly scares me is that it is Prenetics behind the initiative.



The covid19 tests are not done by Prenetics, they are done by a third-party lab, the doctors’ laboratory (11). Prenetics simply allows the person to confirm his/her identity, matching the person who took the test to the person entering the stadium. Basically, that’s an IT integration job, not that of a company that deals in DNA.


So what does Prenetics gain from this? If it was only the US$4.8m deal with the EPL (5), I wouldn’t be bothered; what bothers me is what is left over. Prenetics is in the business of collecting DNA samples. Once the tests are done, who owns the samples? What happens to them? Is the DNA extracted? What is done with it?

It is not unheard of for medical samples to be used for purposes other than the main one they were collected for. In fact, donated blood that is not used (blood doesn’t have that long a shelf-life) may be used for research purposes (12).

My fear is that our data (remember, the way the health passport works is that your identity is tied to the sample and to the results) would be used to extract DNA and that this can be used somehow. For example, insurance companies would love to get their hands on your DNA, and learn your genetic predisposition to some illnesses even when you don’t know it yourself. When someone gives a sample for covid-19 testing, I would assume that’s all they’d like the labs to do with the sample.

Ok, so if the health passport is so bad, what is the solution?

There are many solutions, many countries have their own tracing apps, apple/google have their own, a flexible and useful one is goPassport (different from the health passport used by the EPL). GoPassport works across international borders and combines a few methods, including interfacing with local apps, and provides a comprehensive risk assessment from various sources such as tests, other measurements, movements…

If you want to know more about goPassport, please contact Francesca.goh@alphazetta.ai or Alec.gardner@alphazetta.ai , do mention me so I can claim a few drinks from them if they get a deal out of it 😊


  1. http://thegatesofbabylon.blogspot.com/2018/12/people-who-dont-understand-football.html
  2. http://thegatesofbabylon.blogspot.com/2019/01/great-chariots-of-fire-marcelo-bielsa_15.html
  3. http://thegatesofbabylon.blogspot.com/2019/01/a-true-data-scientist.html
  4. https://www.straitstimes.com/sport/football/ticket-please-passport-too
  5. https://www.sportspromedia.com/news/premier-league-digital-health-passport-prenetics-testing-covid-19
  6. https://www.watsons.com.sg/all-brands/b/230155/circledna
  7. http://thegatesofbabylon.blogspot.com/2018/04/yes-facebook-has-taken-liberties-with.html
  8. https://www.linkedin.com/posts/kailashpurang_contacttracing-bluetooth-gpstracking-activity-6679573622664892416-gGlE
  9. http://thegatesofbabylon.blogspot.com/2020/03/stats-may-help-you-understand-more.html
  10. https://en.wikipedia.org/wiki/Risk_compensation
  11. https://www.tdlpathology.com/covid-19/
  12. https://medium.com/dose/what-happens-to-unused-blood-after-its-been-donated-fa2df960de11


Wednesday, 10 June 2020

The Singapore wearable trac(k)er debate, ahum…



A few days ago, I saw the news that Singapore was developing a wearable contact tracing device; this would make it easier to inform people if they were in proximity to someone who turned out to be covid19 positive (1).

The main reasons stated for a wearable are that the current app (tracetogether) does not play well with Apple devices (the Bluetooth on which the app relies cannot run in the background, so the app has to stay open, blocking the user from other uses of the phone), and the battery consumption of the app (your Bluetooth is on ‘all the time’).

Note that, today, downloading the app is not compulsory, unless you are someone on a work permit, living in a dorm; for these guys who are sadly bearing the brunt of the infection, the government has made it compulsory, 24/7 trac(k)ing.

2 days ago, the minister for smart nation declared contact tracing “absolutely essential” (2). Mr Balakrishnan specifically highlighted:
  • This is not a tracking device because it has no GPS component
  • There is no internet connection hence the data cannot be uploaded
  • The “data” never leaves the device unless you are found to be covid19 positive
  • “only a very limited restricted team of contact tracers” would have access to the data

Within the next couple of days, a petition was created on change.org (3) “Singapore says 'No' to wearable devices for Covid-19 contact tracing“. Over 40,000 people have signed.

The author of this petition, Mr Wilson Goh wrote a lengthy explanation that I will try to summarise below:
  • The device cannot be switched off and the user will have no choice.
    • “This will be done regardless of whether the person has a phone or not; regardless whether their phone is switched off or on; whether that person is within reception of a cell tower or not; and regardless whether their phone has wifi or Bluetooth switched off or on.”
  • Having a permanent ‘tracker’ is the final step to the police state
  • Tracing/tracking infringes on the rights, privacy and freedom of movement of people of Singapore
    • “We - as free, independent, and lawful members of the public of Singapore - condemn the device's implementation as blatant infringements upon our rights to privacy, personal space, and freedom of movement.”



(4)
“Are you pondering what I am pondering?” Do you think I agree more with the minister or the petitioner?

My answer is:


This is a red herring dragged by a horse that has bolted before someone can close the door.

The question is not whether the phone should be used, or a wearable device used. You most likely are already showing a lot of people your location and more 24/7 via a device you hold dear, so why the fuss now?

That device is called a mobile phone. Many organisations have access to your location. Do you use some map? Is your GPS on? How do you think you can be connected so quickly when calling a mobile phone, do you think the telco searches for the recipient of your call on demand, or do they roughly know where to look (which tower(s))? How do you think you get “relevant” advertising, sometimes even locally (contextual messages)?

If you don’t believe me, while on your phone, just try clicking on myactivity.google.com . I could never afford an iPhone so I don’t really know if there is an equivalent.

(4)

Hmmm, confused? Am I not supposed to be someone who values privacy and who believes that each individual’s data (s)he produces should belong to him/her?

At this stage, we are in a crisis, or trying to manage one, as a society. There may be a call to balance individual privacy against everyone’s safety.

The government is arguing that the data on ‘your’ wearable will only be read if you test positive. Then anyone in the list of people in close proximity to you will be contacted. On top of the proximity, there is a time limit, a maximum data retention period of 25 days.
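The retention rule is simple enough to sketch. This is only an illustration of a 25-day rolling window; the `prune` function and the sample records are hypothetical, and how the actual device enforces retention is not described in the sources.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=25)   # the retention period quoted above


def prune(encounters, today):
    """Keep only (contact, day) records no older than the retention period."""
    return [(who, day) for who, day in encounters if today - day <= RETENTION]


log = [
    ("device-a", date(2020, 5, 10)),   # 31 days old on 10 June: dropped
    ("device-b", date(2020, 5, 20)),   # 21 days old: kept
    ("device-c", date(2020, 6, 9)),    # 1 day old: kept
]
print(prune(log, date(2020, 6, 10)))
```

The point of trust is that this pruning happens on the device itself, where the user cannot easily verify it.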

Sounds reasonable, right?

But what about accusations of police state?

One of the keys here is compulsion, when people are compelled to do anything, they are likely to question. Now the tracetogether app is voluntary (around 20% of people have downloaded it), but it is compulsory for people living in dorms, leading to comparisons to animals being microchipped. Will the wearable device be compulsory?

A second question is that of enforcement. Even if the device is made compulsory, how will the government check if I am wearing one? Will there be Bluetooth scanners and you will be approached if you don’t have one? Will people be stopped and asked to show their thing?

A third question is: for how long would people be required to carry the device? At the moment there doesn’t seem to be an end point, and the worry is that, even if a covid19 treatment is found, the tracing will continue, or that the only way out will be to take a potential vaccine.

A fourth question is, does that mean that all other systems such as logging visitors to a supermarket for example will be stopped? Or is this a supplementary measure? While the device may not keep location data, if it is added to location specific data (entries to buildings, onto modes of transport…) the journey of people can easily be reconstituted.

The stand of the organisation supposed to protect individual privacy on the safe-entry app is enlightening “In the event of a COVID-19 case, relevant personal data can be collected, used and disclosed without consent during this period to carry out contact tracing and other response measures, as this is necessary to respond to an emergency that threatens the life, health or safety of other individuals.” And “Collection of personal data for Government’s contact tracing purposes should only be done through the use of SafeEntry. The data collected will only be stored in Government’s servers”(5) (highlights are mine)

While the wearable itself may not be what the petitioners deem a police state, adding it to the safe entry app, where all log-ins are captured, is likely to be.

A fifth question is what happens to the data captured. The proliferation of databases today has increased the chances of loss of privacy. A single database may not have enough data to identify people, but if matched with another database, the combined data may be sufficient to identify people. Furthermore, I believe that when people consent to give their data, it is done for a purpose, and data should be used purely for that purpose. This should apply in all cases.
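Here is a small sketch of that matching. All the data is invented, and the `link` function is a hypothetical helper: one table has no names but holds something sensitive, the other has names but nothing sensitive, and they share a few innocuous-looking fields.

```python
def link(records_a, records_b, keys):
    """Join two tables on shared fields; keep rows with exactly one match."""
    index = {}
    for row in records_b:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    linked = []
    for row in records_a:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:          # a unique match re-identifies the row
            linked.append({**row, **matches[0]})
    return linked


# Made-up "anonymous" health records: no names.
health = [
    {"postcode": "120345", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "560112", "birth_year": 1988, "sex": "M", "diagnosis": "diabetes"},
]
# Made-up membership list: names, nothing sensitive.
members = [
    {"postcode": "120345", "birth_year": 1975, "sex": "F", "name": "A. Tan"},
    {"postcode": "560112", "birth_year": 1990, "sex": "M", "name": "B. Lim"},
]
for row in link(health, members, ("postcode", "birth_year", "sex")):
    print(row["name"], "->", row["diagnosis"])   # A. Tan -> asthma
```

Neither table identifies a person's diagnosis on its own; the combination does.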

The thing is, despite the PDPA and the PDPC (Personal Data Protection Act and Commission respectively), data is being used for purposes other than what it was collected for. In fact, for the Covid19 case, the PDPC is vague “In the event of a COVID-19 case, relevant personal data can be collected, used and disclosed without consent during this period to carry out contact tracing and other response measures” (5) (highlights are mine)

In the same line of thought, OCBC bank, while arguing that data sharing is safe as long as you are in charge of your data (6) (full business times article (7)) actually shows an example of how data from telcos is being used to plan transport(8); unless telcos are now in the transportation business, this looks like misuse of data to me, PDPC or not. This practice has to stop, especially when extremely granular data is being captured centrally.

A sixth question is, how secure is the data on the device? While the government stresses that only in cases of positive tests would individuals be asked to give the data captured, can individuals look into the data captured by their devices? Even if I can’t tell which device and its related information belongs to you when I am in a crowd, if I bump into you often enough (maybe 4 times), I could easily figure out your identity and the related captured information.
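The "bump into you 4 times" idea is just a set intersection. The sketch below assumes each device shows the same identifier at every encounter and that I can read my own device's log; neither is confirmed by the sources, and if identifiers change over time this simple version would not work. The ids and the `narrow_down` helper are made up.

```python
def narrow_down(sightings):
    """Intersect the ids seen at each encounter where the target was present."""
    candidates = set(sightings[0])
    for seen in sightings[1:]:
        candidates &= set(seen)
    return candidates


# Ids my device logged on four occasions when I know you were nearby.
encounters = [
    {"id7", "id2", "id9", "id4"},   # crowded train
    {"id7", "id3", "id9"},          # lift lobby
    {"id7", "id5", "id9"},          # canteen
    {"id7", "id6"},                 # corridor
]
print(narrow_down(encounters))  # {'id7'}
```

After the third encounter two candidates remain; the fourth leaves only one.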

(4)

So, what is my conclusion?

Basically, if you are worried about the wearable, it is perfectly understandable. However, you need to realise that you have been leaking this information, or that this information has already been culled from you, for a long while.

I actually think that, if it is made compulsory to have such a piece of software, I would prefer a standalone device and no other tracing mechanism. However the questions above (and probably more) should be addressed to build trust. At the same time, since the government is looking into this area, it would be fantastic if they gave data ownership in all cases to the people who generate it, and required other parties who want access to the data to obtain permission, even retroactively, from its owners/creators, including stating how the data will be used.

I think that would be a nice compromise that could meet the aims of most people.

Only after we as individuals are in control of the data we generate, thereby having the right to choose who, for what purpose, and for how long to share it with, and the right to have the data deleted, will there possibly be enough trust to move away from the “police state” idea, and from abuses by corporations (‘cambridge analytica’ is a classic case of data being misused (9)).

  1. https://www.channelnewsasia.com/news/singapore/covid-19-contact-tracing-device-trace-together-app-12806842
  2. https://www.channelnewsasia.com/news/singapore/covid-19-contact-tracing-wearable-devices-trace-together-12815796
  3. https://www.change.org/p/singapore-government-singapore-says-no-to-wearable-devices-for-covid-19-contact-tracing
  4. https://en.wikipedia.org/wiki/Pinky_and_the_Brain
  5. https://www.pdpc.gov.sg/Help-and-Resources/2020/03/Advisory-on-Collection-of-Personal-Data-for-COVID-19-Contact-Tracing
  6. https://www.linkedin.com/posts/ken-wong-ab35842_data-sharing-is-safe-as-long-as-you-are-activity-6671646060701736960-uV1b/
  7. https://www.businesstimes.com.sg/opinion/data-sharing-is-safe-as-long-as-you-are-in-charge-of-your-data
  8. https://www.linkedin.com/feed/update/urn:li:activity:6671646060701736960?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A6671646060701736960%2C6671658524692615168%29
  9. https://en.wikipedia.org/wiki/Facebook%E2%80%93Cambridge_Analytica_data_scandal