Sunday, 19 July 2020

What is the analytical maturity of the Singapore Government? Recent evidence of 2 critical public events


As someone who works in analytics/“data science”, one of the things I need to be able to judge is whether an organisation is analytically mature or not. This is critical: it determines what I think I can offer the organisation and how I can prove the value that analytics can bring.

Singapore is going through the Covid-19 situation like every other country, and has recently gone through a general election (in the middle of the pandemic). These 2 very public events show how analytically mature the government is.

But what do I mean by analytical maturity?

To me, an organisation being mature in analytics means all or most parts of the organisation use analysis of data to make decisions; this implies production and consumption of analytics. To be truly data-driven, the analytics should feed into the decision making as part of BAU. In analytically mature organisations, the challenge for people like me is to bring skills the organisation may not have internally. This is a different challenge from say the case where the organisation wants to consume analytics, but is unable to produce consistently.

How does the Covid-19 situation rate in terms of obtaining and using analysis?

In my view, Singapore started dealing with Covid-19 really well.

Firstly, using the information available, the government reached out to people and used data to educate everyone about the virus and the measures to deal with the threat. Whether that data was in the form of WHO advice at that time (wear masks only when not feeling well) or, as some leaked audio revealed, a shortage of PPE (Personal Protective Equipment) that meant ensuring front-line workers had PPE first, is not really the point. So, to me, the government did a good job at the beginning.

However, things went south quickly; in April, the number of infected people started exploding in Singapore. Basically, the government uncovered a hidden infected zone: the foreign workers’ dormitories.

I was shocked. One of the first people to be infected, case 42 from the construction site at Seletar Aerospace Heights, went to Mustafa Centre (1) in early February. When I heard that news, my first reaction was horror; if someone who works at a construction site caught the virus, then, given the conditions in the dormitories, infection would spread like wildfire.

However, the dormitory situation only blew up in April, as seen below; the original data is as used by Johns Hopkins (2).



So what happened in these 2 months? Do you think the government just did not make the link between tight spaces and the virus? Luckily, you would have thought, the minister in charge is an expert in tight spaces (3).


Apparently, the government was indeed watching the Covid-19 situation in the dorms, even since January, it seems (4). The question remains: if this segment of the population was being monitored since January, how did it explode in April?

To me, it is either:
  • Monitoring was done, but data was not collected (I assume no tests were done)
  • Or data was collected, but no action was taken on it (which I find less likely; I don’t see the government wilfully allowing the virus to spread)


So, to me, the ministry in charge simply did not do a good job of using (or collecting) data. And if that was left to dormitory operators, it was also not a brilliant idea given their track record, with half of them breaching rules every year (5).

To say that nobody heard of asymptomatic transmission at that point in time is odd. The whole point is to test.

However, according to Ambassador Chan Heng Chee: “we test, we track and we quarantined them. But later it just exploded” (6). She added, praising Singapore’s testing capabilities: “In the region, you find that testing capabilities are different, so our numbers look much higher than others.”

But as I pointed out in my previous blog post (7), Dr Dale Fisher, chair of infection control at the National University Hospital, has said that there are cases where testing is not needed anymore: you can simply assume everyone has been infected (8).

So something went really wrong there, if indeed testing took place on a significant scale or was done with a view to learn rather than simply react.

Furthermore, no, Minister, a lack of demands for an apology is not a metric for a job well done. The wrong metric will lead to the wrong analysis and the wrong action, if any (9)(10).

Ok, so the ability to produce good useful analytics seems missing here.

But, to me, the Singapore PM is doing right by foreign workers. In fact, he specified: “to our migrant workers, let me emphasise again: we will care for you, just like we care for Singaporeans…” (11). Top management has the right desire, but execution leaves much to be desired. Sounds familiar?

How does the Election GE2020 situation rate in terms of obtaining and using analysis?

The basic function of the ELD, in my simple view, is to ensure elections run smoothly:
  • Everyone who has a right to vote is given the opportunity to do so safely
  • The voting and vote count are done transparently and with all parties who are allowed to witness the count in place
  • All valid votes cast, and only these, are counted.



I am not getting into other functions such as setting the electoral boundaries and so on. While this can and should be done using data, there just isn’t any publicly available information for any determination about data use to be made. And the objective of the exercise is also not available publicly.

So how did the ELD ensure everyone who is eligible to vote did so?

Very poorly.

The most ridiculous thing that happened is that voting hours were extended ‘at the last minute’ because it took longer for people to cast their votes compared to what was expected.

Due to the Covid-19 situation, extra measures were put in place. Each voter was given a time window to cast their vote. People were provided with self-inking pens and were asked to sanitise their hands, wear the gloves provided, dispose of the gloves after voting…

The ELD claimed that, because of these extra measures, they had to extend the voting hours.

This is proof of not using data. In one of my roles, I was looking into operational efficiency. One of the first things my team did was to look at the processes and time them. Look at enough of them to form a little sample. Now, since our objective was to make the operations of the organisation more efficient, all we did was observe our staff. We did not have to ask a sample of people to do the tasks, for example people with issues with their fingers due to age/disability, people who are not clear about the processes and have to ask (and who to ask). But still, this is easily done.
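To make this concrete, here is a minimal sketch of the kind of timing exercise I mean. All the numbers are made up for illustration (they are not ELD data), and the helper `required_hours`, the 25 extra seconds per voter, the 3,000 voters and the 4 lanes are my own assumptions. It also ignores queues building up at peak times, so a real plan would need more slack than this.

```python
import math
import statistics


def required_hours(sample_seconds, voters, lanes, z=1.96):
    """Estimate the polling hours needed from a small sample of timed voters.

    Uses the upper end of an approximate 95% confidence interval for the
    mean time per voter, so the plan errs on the side of more time.
    """
    mean = statistics.mean(sample_seconds)
    sd = statistics.stdev(sample_seconds)
    upper_mean = mean + z * sd / math.sqrt(len(sample_seconds))
    total_seconds = upper_mean * voters / lanes  # lanes work in parallel
    return total_seconds / 3600


# Hypothetical timings in seconds per voter (assumption, not real data).
baseline = [40, 45, 50, 38, 42, 47, 44, 41, 46, 43]
# Assume sanitising hands and putting on gloves adds about 25 s per voter.
with_extra_measures = [t + 25 for t in baseline]

hours_before = required_hours(baseline, voters=3000, lanes=4)
hours_after = required_hours(with_extra_measures, voters=3000, lanes=4)
print(f"baseline: {hours_before:.1f} h, with extra measures: {hours_after:.1f} h")
```

Even with a sample this small, the gap between the two estimates is visible before polling day rather than on it.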

To claim they got the amount of time wrong is a clear indication of not using data.

ELD, you get an F.

How did ELD ensure that the people who were allowed to be at the voting centres had the opportunity to do so?

Again, very poorly.

I will admit I only have 1 source for this information. But the person is prominent, made the statements on national media (CNA) and, as far as I know, has not been looked at from a POFMA (12) point of view, so I take the statements at face value. Dr Paul Tambyah said, in his reaction to the election results (13): “we’ve seen a number of events that occurred today, with the fiasco about the gloves, about the PPE at the end of the day where polling agents had to leave the polling stations”. The gloves bit was addressed above; the PPE piece was not.

Polling agents are the people from the various parties who have the right to oversee the voting process (14): to witness the sealing of boxes before voting starts, to observe the process of voting throughout the day, and finally to witness the sealing of boxes at the end of the voting period and their transport to counting centres.

According to Dr Tambyah, the polling agents were asked to leave their stations towards the end of the day and this is linked to PPE.

Now, the voting was scheduled so that, at the end of the day, people who have been quarantined would go and cast their votes; that was the design. I am not going to get into whether polling agents need PPE apart from face masks, which are compulsory anyway. But if any extra equipment was required, say face shields, this should have been made clear and the equipment provided. The ELD knows the maximum number of polling agents at every point, and should have made the necessary provision. Obviously they did not, despite what they said (15): “By law, they can still vote during this time. The necessary precautions have been taken at all polling stations to ensure the safety of voters during the special voting hour,”. The precautions were not taken properly; therefore polling agents had to leave their posts.

Again, ELD, you get a very big F.

How about the casting of valid votes?

First you would expect anyone who turns up with the proper documentation at a voting centre to be able to cast his/her vote. There has been at least 1 case of someone being told she had already voted (16).

Secondly, while people under quarantine and those who are Covid-19 positive have been barred from voting (17), some people overseas were denied their vote (18) due to a glitch in the ICA system.

Thirdly, how about those Singaporeans who came home and decided to endure a stay-at-home notice? Well, here again, despite knowing their identities and numbers, ELD messed up (19): their names were “missed out”.

Again, ELD, you get another big F.

On top of this, these are only the cases I came across; there probably are many more. I hope that ELD doesn’t say, well, it was only 115 cases (1 vote not counted, 13 people not put on the list, 101 overseas not on the list either), but I won’t be holding my breath.

This shows a pattern of being incapable of collecting and using data effectively.

Is it all that bad?

No, of course not.

Singapore has quite a few successes using data and technology. The TraceTogether app does what it is advertised to do and the code has been open-sourced. It is good enough for other countries to consider adopting it (20); well done GovTech.

Take a look at the websites of some government bodies; they are beautiful:





For example, the Singstat data on trade is illustrated with a ship, dolphins and seagulls; it is even animated! (21)

The STB page on tourist arrivals is pretty too (22):



Services are extremely efficient; for example, you can apply for and get your passport online (23).

What is the conclusion then?

I am using corporate standards so will use corporate terms.

To me, the Singapore government’s top management has the desire to consume analytics and is setting the right direction.

This has been translated into efficiency and use of very basic analytics to make operations work well; operations that are of large volume have been studied and made efficient; and are tracked for improvement.

Some flagship analytical projects, such as the TraceTogether app, have shown an isolated ability to produce good analytics.

But the Covid-19 response and the ELD’s performance show that middle management is not pulling in the same direction.

This is something that anyone trying to “sell” analytics has come across. Beautiful picture from top management, great at doing high volume repetitive stuff, but horrible at the middle layer. The Singapore government is thus like a typical behemoth that is trying to get into the digital age. The lack of clear competency in the middle to senior management is responsible for the very public deficiencies.


So, to me, the Singapore government is not analytically mature; the use of data is restricted, there are huge pockets of resistance within the organisation, and this can only change if top management weighs heavily on the middle blockers and people challenge the status quo rather than just go along with what middle management says.

  1. https://coconuts.co/singapore/news/covid-19-heres-every-novel-coronavirus-infection-in-singapore-on-a-map/
  2. https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
  3. https://www.straitstimes.com/singapore/ministers-rejoinder-to-no-flat-no-child-belief
  4. https://www.onlinecitizenasia.com/2020/04/24/netizens-unimpressed-with-josephine-teos-aggressive-and-defensive-response-to-questions-on-migrant-workers-dormitories/
  5. https://www.straitstimes.com/singapore/manpower/nearly-half-of-large-dorms-breach-rules-each-year-minister
  6. https://mothership.sg/2020/04/chan-heng-chee-covid-19/
  7. http://thegatesofbabylon.blogspot.com/2020/07/covid-19-green-lanes-bubbles-singapore.html
  8. https://www.straitstimes.com/singapore/health/coronavirus-dip-in-local-cases-a-good-sign-but-too-early-to-say-singapore-has
  9. https://www.youtube.com/watch?v=uP28mlqgZYk
  10. https://www.onlinecitizenasia.com/2020/05/06/even-if-josephine-teo-doesnt-want-to-apologise-to-migrant-workers-she-should-apologise-to-singaporeans/
  11. https://www.mfa.gov.sg/Overseas-Mission/Pretoria/Mission-Updates/2020/04/PM-LEE-ON-THE-COVID-19-SITUATION-IN-SINGAPORE-21-APR-2020
  12. https://singaporelegaladvice.com/law-articles/singapore-fake-news-protection-online-falsehoods-manipulation/
  13. https://www.youtube.com/watch?v=OcNeWz0Y7pU the relevant piece starts at 2 minutes
  14. https://www.eld.gov.sg/pdf/GE2020/Guide_for_Polling_Agents_for_General_Election_2020.pdf
  15. https://www.channelnewsasia.com/news/singapore/ge2020-covid-19-patients-quarantined-cannot-vote-special-voting-12889490
  16. https://www.asiaone.com/singapore/ge2020-eld-admits-mistake-after-officials-told-woman-she-couldnt-vote-polling-day
  17. https://www.channelnewsasia.com/news/singapore/ge2020-covid-19-patients-quarantined-cannot-vote-special-voting-12889490
  18. https://www.channelnewsasia.com/news/singapore/ge2020-101-singaporeans-overseas-unable-vote-ica-glitch-eld-12901284
  19. https://www.youtube.com/watch?v=LwAchzbVLLY
  20. https://thekopi.co/2020/05/15/tracetogether-explainer/
  21. https://www.singstat.gov.sg/modules/infographics/singapore-international-trade
  22. https://stan.stb.gov.sg/public/sense/app/254dd6c2-eaf7-46c4-bf7a-39b5df6ff847/sheet/3101ecdd-af88-4d5d-be49-6c7f90277948/state/analysis
  23. https://www.ica.gov.sg/singapore-citizen/singapore-passport/apply-for-a-passport


Tuesday, 7 July 2020

Covid-19, Green lanes, bubbles… Singapore left out! Data and interpretation, story telling, doing things right


Tourism and trans-border travel are still very important in today’s world. Many countries are opening up to varying degrees and this is a trend that is likely to continue as countries try to find ways of allowing the lifeblood of foreign spending to re-enter their veins.

For example, the EU has a list of 15 countries outside of the EU where travel is allowed: Algeria, Australia, Canada, Georgia, Japan, Montenegro, Morocco, New Zealand, Rwanda, Serbia, South Korea, Thailand, Tunisia and Uruguay, plus China, whose status depends on reciprocity.

The EU was kind enough to disclose their official criteria (1):
  • Ensuring that the Covid-19 infection rate in the country was low enough (where nations had fewer than 16 in every 100,000 infected)
  • That there was a downward trend of cases
  • That social distancing measures were at "a sufficient level"

Many people in Singapore were surprised that Singapore was not included. And, in my view, this illustrates perfectly how data is used. Data is data, but how information is created and communicated is probably more important than the data itself.

The first difference is that the numbers reported in Singapore make a clear distinction between 3 groups of cases (2):
  • Imported cases, people who have returned to Singapore recently
  • Cases residing in dormitories, the description is self-explanatory
  • Cases in the community, this is the rest.

Many people in Singapore, including some of my friends, focus on the “cases in the community”, and don’t bother much about the cases residing in dormitories. I would argue that this is a deliberate communication choice by the authorities, and the purpose is to reassure the ‘average’ person in Singapore: when you step out, you are not that much at risk, so with basic precautions, life can resume.

Therefore, it is not surprising to see that many people are surprised at the stance of the EU for example, excluding Singapore from the ‘green zone’.

What I always found interesting was this way of segmenting the population: imported, in-community, residing in dormitories. Anyone would know that workers residing in dormitories are mainly from the Bangladesh/India region, so splitting the numbers that way is highly correlated with country of origin/race.

However, this simply amplifies the feelings expressed by people:


There is an undercurrent of racism in the coronavirus situation in Singapore. Making the distinction between “in community” and “residing in dormitories” which strongly correlates with splitting along nationality/race (and even more strongly along nationality/race + earning) does not help with this.

Note though that I am not saying the government subscribes to this racist view. On the contrary, the Prime Minister gave a speech specifically stating: “to our migrant workers, let me emphasise again: we will care for you, just like we care for Singaporeans…” (3).

What puzzles me the most is that the foreign workers living in dorms are not in the community, but prisoners are. The number of cases in prisons in Singapore is added to the numbers “in the community” (4). It is an interesting thought: people living in Singapore outside of the dorms are closer to prisoners than to people living in the dorms…

In any case, the government has achieved its aim: Singapore residents are reassured; however, the international community just sees the total number of people affected and does not interpret the numbers in the same way.

Who is right, who is wrong?

This is a fundamental question for anyone remotely into analysis/analytics/“data science”.
Is it possible that two parties look at a piece of data and come to opposite conclusions (“Singapore is safe enough”, “Singapore is not safe enough”)? Must one be right and the other wrong?

Come on, who is right?

In my view, they have their reasons for interpreting the data as they do, but both are wrong.

How could the Singaporean interpretation be right?

If we measure risk of infection by the number of people who are getting infected on a daily basis, then it makes sense to look at the number of people, not in dorms, who are infected. This is because the people who were in the dorms have basically been isolated from the rest of the population.
The interpretation is the one needed to achieve the purpose of reassuring the population.

How could the EU interpretation be right?

It doesn’t make sense to look at a segmented view of any population, since it is impossible to physically split the population totally; this is especially so if the Covid-19 virus is more airborne than previously estimated. Add to this that, when looking at data from different countries, making data comparable is an arduous task, so it may be more practical to use high-level numbers without going into specifics (unless specifically requested to do so).

Hence the interpretation by the EU suits its purposes.

Why are they both wrong?

Judging risk by the numbers found to be infected on a daily basis needs to be qualified; risk is a rate, a percentage, not an integer. The EU uses a 16/100,000 infection rate: 16 people infected out of every 100,000 population. The simple solution to this is, as President Trump said: “if we stop testing right now, we’d have very few cases” (5).

Risk is the number of people infected divided by the number of people tested.

PLUS

The tests would have to be random.

Covid-19 is known to sometimes be asymptomatic; estimates for the percentage of asymptomatic cases vary from 5% to 80% (6). Hence, focusing tests on people who display symptoms or who are linked to people who are known to have been infected is likely to seriously underestimate the true risk.

Furthermore, there are cases where people are simply assumed to have been infected and tests are not conducted. This was highlighted in Singapore in an interview on Channel News Asia by Dr Dale Fisher, chair of infection control at the National University Hospital: “The numbers are not really coming down. It’s a function of the tests. In some dormitories, the infection rate or the positivity rate of the tests is so high, you get to the point where you don’t need to test anymore” (7).

Needless to say, not testing people who are likely to be infected reduces the number and percentage of people infected in the test results.
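A small worked example shows how much the reported rate depends on who gets tested. The figures below are purely hypothetical (a population of 100,000, of whom 2,000 are truly infected, half of them asymptomatic, and a group of 500 that is assumed infected and not tested); the helper `per_100k` is mine, and the random-sample scenario uses the expected number of positives rather than simulating the draw.

```python
def per_100k(cases, population):
    """Express a case count as a rate per 100,000 people."""
    return cases * 100_000 / population


# Hypothetical figures, chosen only for illustration.
population = 100_000
truly_infected = 2_000
asymptomatic_share = 0.5

true_rate = per_100k(truly_infected, population)

# Scenario 1: only people with symptoms are tested, so asymptomatic
# cases are never counted.
symptomatic_cases = truly_infected * (1 - asymptomatic_share)
reported_symptomatic_only = per_100k(symptomatic_cases, population)

# Scenario 2: a random sample of 5,000 people is tested; on average the
# share of positives in the sample matches the share in the population.
sample_size = 5_000
expected_positives = sample_size * truly_infected / population
estimated_from_random = per_100k(expected_positives, sample_size)

# Scenario 3: on top of scenario 1, a group of 500 symptomatic people is
# assumed to be infected and not tested, so it drops out of the count.
assumed_not_tested = 500
reported_after_skipping = per_100k(
    symptomatic_cases - assumed_not_tested, population
)

print(f"true rate:               {true_rate:.0f} per 100,000")
print(f"symptomatic-only tests:  {reported_symptomatic_only:.0f} per 100,000")
print(f"random-sample estimate:  {estimated_from_random:.0f} per 100,000")
print(f"after skipping a group:  {reported_after_skipping:.0f} per 100,000")
```

Under these assumptions only the random sample recovers the true rate; each of the other two choices pushes the reported figure further down without a single infection being prevented.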

Basically, this goes back to why you are undertaking an analysis. 

To me, in every case,
  • doing an analysis to prove a point is not the right way of doing things. To a hammer everything looks like a nail
  • there may be practical considerations when you analyse data, you do need to take into account how the analysis will be implemented


Conclusions:

  1. “Lies, Damned lies and statistics”, there are many ways to interpret data, or any bunch of data may be transformed into different actionable items, some more valid than others. Hence the process of deriving the actionable items and the skill of the interpreter both matter.
  2. Analysis of data is supposed to be as objective as possible. It is bad practice to start an analysis with a view to provide evidence for a point of view.
  3. In real life, how the results of the analysis will be used does impact the analysis itself. Analysis for the sake of analysis without being implemented is useless.

P.S.
Actually, you could re-look at the problem the analysis is being used for. What the EU is basically trying to do is manage the Covid-19 risk of allowing in people from outside the EU; specifically, they are focusing on minimising the risk of people coming into the EU bringing the virus with them. Using country-wide (or even state-wide, if that applies to larger countries) rules is quite blunt; it ignores individual circumstances.

I am sure countries will lobby the EU, for example Singapore could explain that the numbers are mainly due to "foreign workers in dormitories" whereas "in-community" infections are low, to allow their residents to travel. A further step would be for the EU to, at a minimum, overlay some data that each individual provides/allows the EU to collect so that the EU can make a better individual decision, and this must be something that can be done at scale.

In other words, ladies and gentlemen of the EU (and other countries), this is a case where analytics (in its larger sense) can really help make a difference. I say analytics in its larger sense because this would require data collection, processing, dynamic scoring... involve infrastructure, architecture... not just AI-jockeying; but with cloud solutions, this lessens the runway to a solution.

In sum, as always, analytics should be as unbiased as possible, and take into account implementation to obtain a workable solution and help resolve a problem. And deciding who to allow in as the Covid-19 situation across the world evolves is a case where proper analytics can make a real difference.

  1. https://www.bbc.com/news/world-europe-53222356
  2. https://www.moh.gov.sg/news-highlights/details/324-more-cases-discharged-136-new-cases-of-covid-19-infection-confirmed
  3. https://www.mfa.gov.sg/Overseas-Mission/Pretoria/Mission-Updates/2020/04/PM-LEE-ON-THE-COVID-19-SITUATION-IN-SINGAPORE-21-APR-2020
  4. https://www.channelnewsasia.com/news/singapore/covid-19-cases-singapore-jun-14-community-moh-imported-12833548
  5. https://www.businessinsider.com/trump-stop-coronavirus-testing-right-now-have-very-few-cases-2020-6
  6. https://www.cebm.net/covid-19/covid-19-what-proportion-are-asymptomatic/
  7. https://www.straitstimes.com/singapore/health/coronavirus-dip-in-local-cases-a-good-sign-but-too-early-to-say-singapore-has I have not managed to find the original interview, if someone does, please add to comments