Tourism and trans-border travel
are still very important in today’s world. Many countries are opening up to
varying degrees and this is a trend that is likely to continue as countries try
to find ways of allowing the lifeblood of foreign spending to re-enter their
veins.
For example, the EU has a list of
15 countries outside of the EU from which travel is allowed: Algeria, Australia,
Canada, Georgia, Japan, Montenegro, Morocco, New Zealand, Rwanda, Serbia, South
Korea, Thailand, Tunisia and Uruguay, plus China, whose status depends on reciprocity.
The EU was kind enough to disclose
its official criteria (1):
- Ensuring that the Covid-19 infection rate in the country was low enough (where nations had fewer than 16 in every 100,000 infected)
- That there was a downward trend of cases
- That social distancing measures were at "a sufficient level"
Many people in Singapore were
surprised that Singapore was not included. And, in my view, this illustrates perfectly
how data is used. Data is data, but how information is created and communicated
is probably more important than the data itself.
The first difference is that
numbers reported in Singapore make a clear distinction between 3 groups of cases
in Singapore (2).
- Imported cases: people who have returned to Singapore recently
- Cases residing in dormitories: the description is self-explanatory
- Cases in the community: everyone else
Many people in Singapore,
including some of my friends, focus on the “cases in the community”, and don’t
bother much about the cases residing in dormitories. I would argue that this is
a deliberate communication choice by the authorities, and the purpose is to
reassure the ‘average’ person in Singapore: when you step out, you are not that
much at risk, so with basic precautions, life can resume.
Therefore, it is not surprising
that many people are taken aback by the stance of the EU, for example,
in excluding Singapore from the ‘green zone’.
What I always found interesting
was this way of segmenting the population: imported, in-community, residing in
dormitories. It is well known that workers residing in dormitories are mainly
from the Bangladesh and India region, so splitting the numbers that way is highly
correlated with country of origin/race.
However, this simply amplifies
the feelings expressed by people:
There is an undercurrent of racism
in the coronavirus situation in Singapore. Making the distinction between “in
community” and “residing in dormitories” which strongly correlates with
splitting along nationality/race (and even more strongly along nationality/race
+ earning) does not help with this.
Note though that I am not saying
the government subscribes to this racist view; on the contrary,
the Prime Minister gave a speech specifically saying: “to our migrant
workers, let me emphasise again: we will care for you, just like we care for
Singaporeans…” (3).
What puzzles me the most is that
the foreign workers living in dorms are not in the community, but prisoners are.
The number of cases in prisons in Singapore is added to the numbers “in the
community” (4). It is an interesting thought: people living in Singapore
outside of the dorms are closer to prisoners than to people living in the dorms…
In any case, the government has
achieved its aim: Singapore residents are reassured; however, the international
community just sees the total number of people affected and does not interpret
the numbers in the same way.
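To make the two readings concrete, here is a minimal Python sketch. All the figures in it are invented for illustration (they are not Singapore's actual numbers), and the `Segment` class and `per_100k` helper are hypothetical names chosen for this example. It simply computes, from the same data, a rate for the whole country and a rate for the community segment alone.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    new_cases: int   # hypothetical case count over the reporting window
    population: int  # hypothetical number of people in the segment


def per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 people."""
    return cases / population * 100_000


# All figures below are made up purely for illustration.
segments = [
    Segment("imported", 10, 50_000),
    Segment("dormitories", 2_000, 300_000),
    Segment("community", 100, 5_000_000),
]

total_cases = sum(s.new_cases for s in segments)
total_population = sum(s.population for s in segments)

# The "outside" reading: one number for the whole country.
headline_rate = per_100k(total_cases, total_population)

# The "inside" reading: only the community segment.
community = next(s for s in segments if s.name == "community")
community_rate = per_100k(community.new_cases, community.population)

print(f"whole country:  {headline_rate:.1f} per 100,000")
print(f"community only: {community_rate:.1f} per 100,000")
```

With these made-up inputs the whole-country figure lands above the EU's 16 per 100,000 line while the community-only figure lands well below it. Same data, two different messages.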
Who is right, who is wrong?
This is a fundamental question
for anyone remotely into analysis/analytics/“data science”.
Is it possible that two parties
look at a piece of data and come to opposite conclusions (“Singapore is safe
enough”, “Singapore is not safe enough”)? Must one be right and the other
wrong?
Come on, who is right?
In my view, they each have their reasons
for interpreting the data as they do, but both are wrong.
How could the Singaporean
interpretation be right?
If we measure risk of infection
by the number of people who are getting infected on a daily basis, then it
makes sense to look at the number of people, not in dorms, who are infected.
This is because the people in the dorms have basically been isolated
from the rest of the population.
This interpretation is the one
needed to achieve the purpose of reassuring the population.
How could the EU interpretation be right?
It doesn’t make sense to look at
a segmented view of any population, especially if the covid-19
virus is more airborne than previously estimated, since it is impossible to
physically split the population completely. In addition, when looking at data from
different countries, making data comparable is an arduous task, so it may be more
practical to use high-level numbers without going into specifics (unless
specifically requested to do so).
Hence the interpretation by the
EU suits its purposes.
Why are they both wrong?
Judging risk by the number of people found
to be infected on a daily basis needs to be qualified; risk is a rate, a
percentage, not an integer. The EU uses an infection rate of 16/100,000: 16 people
infected out of every 100,000 population. The simple “solution” to this is, as
President Trump said: “if we stop testing right now, we’d have very few cases”
(5).
Risk is the number of people
infected divided by the number of people tested.
PLUS
The tests would have to be
random.
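A small sketch of why the denominator matters, assuming tests are handed out at random so that the expected share of positives equals the true prevalence. The population, prevalence and test volumes are hypothetical, and `report` is just a helper name made up for this example.

```python
POPULATION = 1_000_000   # hypothetical country size
TRUE_PREVALENCE = 0.002  # hypothetical: 0.2% of people are currently infected


def report(tests: int) -> tuple[float, float]:
    """Return (detected cases per 100,000 population, positives per test).

    Assumes random testing, so expected positives = tests * true prevalence.
    """
    expected_positives = tests * TRUE_PREVALENCE
    cases_per_100k = expected_positives / POPULATION * 100_000
    positivity = expected_positives / tests
    return cases_per_100k, positivity


for tests in (10_000, 50_000, 200_000):
    cases_per_100k, positivity = report(tests)
    print(
        f"{tests:>7} tests: {cases_per_100k:5.1f} detected per 100,000, "
        f"positivity {positivity:.2%}"
    )
```

Under these assumptions, the detected cases per 100,000 move from 2 to 40 purely because more tests were run, while positives divided by tests stays put. A country can sit on either side of a threshold like 16 per 100,000 just by changing how much it tests.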
Covid-19 is known to sometimes be
asymptomatic; estimates for the percentage of asymptomatic cases vary from 5%
to 80% (6). Hence, focusing tests on people who display symptoms or who are
linked to people who are known to have been infected is likely to seriously
underestimate the true risk.
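As a rough back-of-the-envelope sketch: if only people with symptoms were tested, and assuming (unrealistically) that every symptomatic case is tested and every test is accurate, the share of infections that shows up in the numbers is simply one minus the asymptomatic share. The function name and the 1,000 true infections are hypothetical; the 5% and 80% bounds are the ones quoted above.

```python
def detected_share(asymptomatic_share: float) -> float:
    """Share of infections found when only symptomatic people are tested.

    Simplifying assumptions: every symptomatic case gets tested,
    no asymptomatic case does, and tests are perfectly accurate.
    """
    if not 0.0 <= asymptomatic_share <= 1.0:
        raise ValueError("asymptomatic_share must be between 0 and 1")
    return 1.0 - asymptomatic_share


TRUE_INFECTIONS = 1_000  # hypothetical number of people actually infected

for share in (0.05, 0.80):  # the two ends of the quoted range
    found = TRUE_INFECTIONS * detected_share(share)
    print(f"asymptomatic share {share:.0%}: about {found:.0f} cases reported")
```

At one end of the range the reported count is close to the truth; at the other it misses four infections out of five, which is why the way tests are targeted cannot be separated from the number that gets published.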
Furthermore, there are cases
where people are simply assumed to have been infected and tests are not conducted.
This was highlighted in Singapore in an interview on Channel News Asia by Dr
Dale Fisher, chair of infection control at the National University Hospital: “The
numbers are not really coming down. It’s a function of the tests. In some
dormitories, the infection rate or the positivity rate of the tests is so high,
you get to the point where you don’t need to test anymore” (7).
Needless to say, not testing
people who are likely to be infected reduces the number and percentage of
people infected in the test results.
Basically, this goes back to why
you are undertaking an analysis.
To me, in every case,
- doing an analysis to prove a point is not the right way of doing things. To a hammer, everything looks like a nail
- there may be practical considerations when you analyse data; you do need to take into account how the analysis will be implemented
Conclusions:
- “Lies, damned lies and statistics”: there are many ways to interpret data, and any bunch of data may be transformed into different actionable items, some more valid than others. Hence the process of deriving the actionable items and the skill of the interpreter both matter.
- Analysis of data is supposed to be as objective as possible. It is bad practice to start an analysis with a view to providing evidence for a point of view.
- In real life, how the results of the analysis will be used does impact the analysis itself. Analysis for the sake of analysis without being implemented is useless.
P.S.
You could actually re-look at the problem the analysis is being used for. What the EU is basically trying to do is manage the Covid-19 risk of allowing in people from outside the EU; specifically, they are focusing on minimising the risk of people coming into the EU bringing the virus with them. Using country-wide (or even state-wide, if that applies to larger countries) rules is quite blunt; it ignores individual circumstances.
I am sure countries will lobby the EU to allow their residents to travel; for example, Singapore could explain that the numbers are mainly due to "foreign workers in dormitories" whereas "in-community" infections are low. A further step would be for the EU to, at a minimum, overlay some data that each individual provides/allows the EU to collect so that the EU can make a better individual decision, and this must be something that can be done at scale.
In other words, ladies and gentlemen of the EU (and other countries), this is a case where analytics (in its larger sense) can really help make a difference. I say analytics in its larger sense because this would require data collection, processing, dynamic scoring... involve infrastructure, architecture... not just AI-jockeying; but with cloud solutions, the runway to a solution is shorter.
In sum, as always, analytics should be as unbiased as possible, and take into account implementation to obtain a workable solution and help resolve a problem. Deciding who to allow in as the covid-19 situation across the world evolves is a case where proper analytics can make a real difference.
(1) https://www.bbc.com/news/world-europe-53222356
(2) https://www.moh.gov.sg/news-highlights/details/324-more-cases-discharged-136-new-cases-of-covid-19-infection-confirmed
(3) https://www.mfa.gov.sg/Overseas-Mission/Pretoria/Mission-Updates/2020/04/PM-LEE-ON-THE-COVID-19-SITUATION-IN-SINGAPORE-21-APR-2020
(4) https://www.channelnewsasia.com/news/singapore/covid-19-cases-singapore-jun-14-community-moh-imported-12833548
(5) https://www.businessinsider.com/trump-stop-coronavirus-testing-right-now-have-very-few-cases-2020-6
(6) https://www.cebm.net/covid-19/covid-19-what-proportion-are-asymptomatic/
(7) https://www.straitstimes.com/singapore/health/coronavirus-dip-in-local-cases-a-good-sign-but-too-early-to-say-singapore-has (I have not managed to find the original interview; if someone does, please add it to the comments)