Tuesday 7 July 2020

Covid-19, Green lanes, bubbles… Singapore left out! Data and interpretation, story telling, doing things right


Tourism and trans-border travel are still very important in today’s world. Many countries are opening up to varying degrees and this is a trend that is likely to continue as countries try to find ways of allowing the lifeblood of foreign spending to re-enter their veins.

For example, the EU has a list of 15 countries outside the EU from which travel is allowed: Algeria, Australia, Canada, Georgia, Japan, Montenegro, Morocco, New Zealand, Rwanda, Serbia, South Korea, Thailand, Tunisia and Uruguay, plus China, whose status depends on reciprocity.
The EU was kind enough to disclose its official criteria (1):
  • Ensuring that the Covid-19 infection rate in the country was low enough (where nations had fewer than 16 in every 100,000 infected)
  • That there was a downward trend of cases
  • That social distancing measures were at "a sufficient level"

Many people in Singapore were surprised that Singapore was not included. And, in my view, this illustrates perfectly how data is used. Data is data, but how information is created and communicated is probably more important than the data itself.

The first difference is that numbers reported in Singapore make a clear distinction between 3 groups of cases in Singapore (2).
  • Imported cases: people who have returned to Singapore recently
  • Cases residing in dormitories: the description is self-explanatory
  • Cases in the community: this is the rest

Many people in Singapore, including some of my friends, focus on the “cases in the community”, and don’t bother much about the cases residing in dormitories. I would argue that this is a deliberate communication choice by the authorities, and the purpose is to reassure the ‘average’ person in Singapore: when you step out, you are not that much at risk, so with basic precautions, life can resume.

Therefore, it is not surprising that many people are taken aback by the stance of the EU, for example, in excluding Singapore from the ‘green zone’.

What I always found interesting was this way of segmenting the population: imported, in-community, residing in dormitories. Anyone would know that workers residing in dormitories are mainly from the Bangladesh and India region, so splitting the numbers that way is highly correlated with country of origin/race.

However, this simply amplifies the feelings expressed by people:


There is an undercurrent of racism in the coronavirus situation in Singapore. Making the distinction between “in community” and “residing in dormitories” which strongly correlates with splitting along nationality/race (and even more strongly along nationality/race + earning) does not help with this.

Note though that I am not saying the government subscribes to this racist view; on the contrary. The Prime Minister gave a speech in which he specifically said: “to our migrant workers, let me emphasise again: we will care for you, just like we care for Singaporeans…” (3).

What puzzles me the most is that the foreign workers living in dorms are not in the community, but prisoners are. The number of cases in prisons in Singapore is added to the numbers “in the community” (4). It is an interesting thought: people living in Singapore outside of the dorms are closer to prisoners than to people living in the dorms…

In any case, the government has achieved its aim: Singapore residents are reassured; however, the international community just sees the total number of people affected and does not interpret the numbers in the same way.

Who is right, who is wrong?

This is a fundamental question for anyone remotely into analysis/analytics/”data science”.
Is it possible that two parties look at a piece of data and come to opposite conclusions (“Singapore is safe enough”, “Singapore is not safe enough”)? Must one be right and the other wrong?

Come on, who is right?

In my view, they each have their reasons for interpreting the data as they do, but both are wrong.

How could the Singaporean interpretation be right?

If we measure risk of infection by the number of people who are getting infected on a daily basis, then it makes sense to look at the number of people, not in dorms, who are infected. This is because the people who were in the dorms have basically been isolated from the rest of the population.
The interpretation is the one needed to achieve the purpose of reassuring the population.

How could the EU interpretation be right?

It doesn’t make sense to look at a segmented view of any population, especially if the covid-19 virus is more airborne than previously estimated, since it is impossible to physically split the population completely. In addition, when looking at data from different countries, making the data comparable is an arduous task, so it may be more practical to use high-level numbers without going into specifics (unless specifically requested to do so).

Hence the interpretation by the EU suits its purposes.

Why are they both wrong?

Judging risk by the number of people found to be infected on a daily basis needs to be qualified; risk is a rate, a percentage, not an integer. The EU uses a 16/100,000 infection rate: 16 people infected out of every 100,000 population. The simple way to get under such a threshold is, as President Trump said: “if we stop testing right now, we’d have very few cases” (5).

Risk is the number of people infected divided by the number of people tested.

PLUS

The tests would have to be random.
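To make the difference between the two rates concrete, here is a minimal sketch in Python. All figures are hypothetical and chosen only for illustration; they are not real data for Singapore or any other country. The function names are my own. The only number taken from the text is the EU threshold of 16 per 100,000.

```python
# Hypothetical figures for illustration only; not real data for any country.
POPULATION = 5_000_000
EU_THRESHOLD_PER_100K = 16  # the threshold quoted in the EU criteria


def cases_per_100k(confirmed_cases: int, population: int) -> float:
    """Confirmed cases per 100,000 population (the EU-style measure)."""
    return confirmed_cases * 100_000 / population


def positivity_rate(confirmed_cases: int, tests_done: int) -> float:
    """Share of tests that came back positive (infected / tested)."""
    return confirmed_cases / tests_done


# Two testing efforts with the same share of positive tests (2%).
scenarios = {
    "wide testing": {"tests": 100_000, "positives": 2_000},
    "narrow testing": {"tests": 25_000, "positives": 500},
}

for name, s in scenarios.items():
    per_100k = cases_per_100k(s["positives"], POPULATION)
    positivity = positivity_rate(s["positives"], s["tests"])
    verdict = "below" if per_100k < EU_THRESHOLD_PER_100K else "not below"
    print(
        f"{name}: {per_100k:.1f} cases per 100,000 ({verdict} the threshold), "
        f"positivity {positivity:.1%}"
    )
```

In this made-up example, cutting the number of tests moves the country from above the threshold to below it, while the share of positive tests does not change at all.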

Covid-19 is known to sometimes be asymptomatic; estimates for the percentage of asymptomatic cases vary from 5% to 80% (6). Hence, focusing tests on people who display symptoms or who are linked to people who are known to have been infected is likely to seriously underestimate the true risk.

Furthermore, there are cases where people are simply assumed to have been infected and tests are not conducted. This was highlighted in Singapore in an interview on Channel News Asia by Dr Dale Fisher, chair of infection control at the National University Hospital: “The numbers are not really coming down. It’s a function of the tests. In some dormitories, the infection rate or the positivity rate of the tests is so high, you get to the point where you don’t need to test anymore” (7).

Needless to say, not testing people who are likely to be infected reduces the number and percentage of people infected in the test results.
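As a rough illustration of how much targeted testing can hide, here is a small Python sketch. It rests on deliberately simplistic assumptions of my own: only people with symptoms get tested, every one of them is tested, and the tests are perfectly accurate. The true infection figure is hypothetical; only the 5% to 80% range for asymptomatic cases comes from the estimates cited above.

```python
# Assumed true number of infected people per 100,000 population (hypothetical).
TRUE_INFECTED_PER_100K = 50


def reported_per_100k(true_per_100k: float, asymptomatic_share: float) -> float:
    """Cases found per 100,000 if only symptomatic people are ever tested.

    Simplifying assumptions: every symptomatic infected person is tested,
    no asymptomatic person is tested, and tests make no errors.
    """
    return true_per_100k * (1.0 - asymptomatic_share)


# Low end, a middle value, and high end of the asymptomatic estimates.
for share in (0.05, 0.40, 0.80):
    reported = reported_per_100k(TRUE_INFECTED_PER_100K, share)
    missed = TRUE_INFECTED_PER_100K - reported
    print(
        f"asymptomatic share {share:.0%}: "
        f"reported {reported:.1f}, missed {missed:.1f} per 100,000"
    )
```

At the high end of the asymptomatic estimates, most infections never show up in the reported figures under these assumptions, which is why the choice of who gets tested matters as much as the count itself.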

Basically, this goes back to why you are undertaking an analysis. 

To me, in every case,
  • doing an analysis to prove a point is not the right way of doing things. To a hammer everything looks like a nail
  • there may be practical considerations when you analyse data, you do need to take into account how the analysis will be implemented


Conclusions:

  1. “Lies, Damned lies and statistics”, there are many ways to interpret data, or any bunch of data may be transformed into different actionable items, some more valid than others. Hence the process of deriving the actionable items and the skill of the interpreter both matter.
  2. Analysis of data is supposed to be as objective as possible. It is bad practice to start an analysis with a view to providing evidence for a point of view.
  3. In real life, how the results of the analysis will be used does impact the analysis itself. Analysis for the sake of analysis without being implemented is useless.

P.S.
Actually, you could re-look at the problem the analysis is being used for. What the EU is basically trying to do is manage the Covid-19 risk of allowing in people from outside the EU; specifically, it is focusing on minimising the risk of people coming into the EU bringing the virus with them. Using country-wide (or even state-wide, for larger countries) rules is quite blunt; it ignores individual circumstances.

I am sure countries will lobby the EU, for example Singapore could explain that the numbers are mainly due to "foreign workers in dormitories" whereas "in-community" infections are low, to allow their residents to travel. A further step would be for the EU to, at a minimum, overlay some data that each individual provides/allows the EU to collect so that the EU can make a better individual decision, and this must be something that can be done at scale.

In other words, ladies and gentlemen of the EU (and other countries), this is a case where analytics (in its larger sense) can really help make a difference. I say analytics in its larger sense because this would require data collection, processing, dynamic scoring... involve infrastructure, architecture... not just AI-jockeying; but with cloud solutions, the runway to a solution is shorter.

In sum, as always, analytics should be as unbiased as possible, and take into account implementation to obtain a workable solution and help resolve a problem. And deciding who to allow in as the covid-19 situation across the world evolves is a case where proper analytics can make a real difference.

  1. https://www.bbc.com/news/world-europe-53222356
  2. https://www.moh.gov.sg/news-highlights/details/324-more-cases-discharged-136-new-cases-of-covid-19-infection-confirmed
  3. https://www.mfa.gov.sg/Overseas-Mission/Pretoria/Mission-Updates/2020/04/PM-LEE-ON-THE-COVID-19-SITUATION-IN-SINGAPORE-21-APR-2020
  4. https://www.channelnewsasia.com/news/singapore/covid-19-cases-singapore-jun-14-community-moh-imported-12833548
  5. https://www.businessinsider.com/trump-stop-coronavirus-testing-right-now-have-very-few-cases-2020-6
  6. https://www.cebm.net/covid-19/covid-19-what-proportion-are-asymptomatic/
  7. https://www.straitstimes.com/singapore/health/coronavirus-dip-in-local-cases-a-good-sign-but-too-early-to-say-singapore-has (I have not managed to find the original interview; if someone does, please add it to the comments.)

