Wednesday, 4 March 2020

Stats may help you understand more about Covid19


There has been a lot of fear around Covid19, and much of it comes from not understanding what is going on. Most of what is going on can be understood by anyone with a basic statistics background and some common sense about medical issues. Since I am one of the millions who satisfy these criteria, and I have heard some things that surprised me (I was so relieved when my colleague on the same floor tested negative!), I decided to write this blog post to show how basic statistical knowledge helps in understanding what is happening in the Covid19 situation. It will not prevent anyone from catching the virus, but I hope it can lessen the anxiety caused by the various reports that have been floating around.



A Why you need to understand sampling

How would you know if someone has Covid19 or not? Covid19 is caused by a virus, and there are four main ways to test for a virus (1). But let me first explain what a virus does, and you will see that the testing methods are easy to understand.

A virus is a small infectious agent that replicates inside living cells (2). So in order to thrive, a virus has to find a healthy living cell, get into it, and replicate there. The infection damages the healthy cell, which can in turn weaken the immune system. While doing that, the virus takes over the cell’s machinery (the cells no longer do what they are supposed to do, or not as well), and it can cause inflammation that may damage organs.

So how can you detect a virus?

1 Viral Culture
This is how most people would imagine a test, especially since we have all seen the picture of a coronavirus by now. Basically, you take tissue from the individual, create optimal conditions for the virus to proliferate, and then check whether the virus is present after giving it enough time to do so.

For example, when one of my “monsters” had an ear infection, the vet swabbed the inside of the ear with cotton buds and sent the swabs to the lab, where they grew whatever was on them, identified it (in this case it was bacteria) and recommended appropriate treatment (Thank you Dr Au!).

2 Antibody test
Your body is designed to fight any intruders, viruses included. When the immune system detects an invader, it will pick a weapon from its stockpiles, and increase production of that weapon in order to fight the invader. While there are generic weapons, more specialized weapons are antibodies to specific threats. This is one way vaccines can be developed but I won’t go there here. The antibodies are to be found in the blood stream, so they can travel all over the body and deal with the specific invaders.

For example, this is also what is tested to see whether you need a tetanus shot. People like me often get into trouble with animals and inanimate objects, so cuts and wounds are par for the course. It is therefore important that I am protected against tetanus. The test for whether I am protected measures the level of antibodies I have, to see whether I have enough to fight off an infection.

3 Viral antigen detection test
If a cell is infected with a virus, the cell itself changes. The idea is that the surface of an infected cell gets coated with an antigen specific to the virus. The trick to knowing whether someone has been infected is to detect whether there are cells coated with that antigen. One way of doing this is to add to the sample a special chemical that will attach itself to that antigen and be visible using some equipment. So if the antigen is present, the chemical will attach itself to the infected cells and be visible.

4 Viral DNA/RNA detection test
In this test, some body fluid (blood, spinal fluid…) is taken from the person and searched for the DNA/RNA of the virus. Since this test looks for the DNA/RNA itself, it can pinpoint exactly which virus is infecting the person.

So what has sampling got to do with this?

Let me put it simply: if you are looking for something in someone’s blood, it is impractical to look at all the blood. You have to take some of the blood, a sample. Samples are important in statistics, specifically the size of a sample.
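To make the point about sample size concrete, here is a minimal sketch in Python. It rests on my own simplifying assumption (not something from the references) that virus particles are scattered randomly through the fluid, so the number caught in a sample follows a Poisson distribution. The concentration figure and the function name are made up for illustration.

```python
import math

def chance_sample_contains_virus(copies_per_ml, sample_ml):
    """Probability that a sample holds at least one virus particle.

    Assumes particles are spread randomly through the fluid (Poisson model),
    which is a simplification made for this illustration.
    """
    expected_copies = copies_per_ml * sample_ml
    return 1 - math.exp(-expected_copies)

# Hypothetical early infection: on average 0.5 virus particles per ml of fluid
small = chance_sample_contains_virus(copies_per_ml=0.5, sample_ml=1)
large = chance_sample_contains_virus(copies_per_ml=0.5, sample_ml=5)
print(f"1 ml sample: {small:.0%} chance of containing any virus")
print(f"5 ml sample: {large:.0%} chance of containing any virus")
```

With these made-up numbers, the small sample contains at least one particle only about 39% of the time, against about 92% for the larger one. The person is infected in both cases; the sample just may not show it.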

How big does your sample need to be? Well it obviously depends on a few factors, the main ones being:
  • How homogeneous is the population you are studying? For example, if you are trying to find the average height of a group of people, you would need a smaller sample size if you are looking at a class/age/grade, rather than looking at the whole school. The class would have less variability as compared to the school which has many age groups.
  • How much do you want to risk getting things wrong? If the reason you were trying to find the average height was to select taller people to try out for the newly formed basketball team, then getting it wrong matters less (people of all heights can play basketball well) than if you were looking for really tall people who may have some condition you would like to treat.
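The two factors above can be turned into numbers with the textbook formula for the sample size needed to estimate an average: n = (z × standard deviation / margin of error)², rounded up. The sketch below uses it; the height spreads (5 cm for one class, 15 cm for the whole school) and the 2 cm margin of error are figures I made up for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(std_dev, margin_of_error, confidence=0.95):
    """Smallest n for which the confidence interval around the average
    is no wider than +/- margin_of_error (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * std_dev / margin_of_error) ** 2)

# Factor 1: a more varied population needs a bigger sample
class_n = sample_size_for_mean(std_dev=5, margin_of_error=2)
school_n = sample_size_for_mean(std_dev=15, margin_of_error=2)

# Factor 2: accepting less risk of being wrong also needs a bigger sample
careful_n = sample_size_for_mean(std_dev=15, margin_of_error=2, confidence=0.99)

print(class_n, school_n, careful_n)
```

Under these assumptions the single class needs 25 people, the whole school 217, and the whole school at 99% confidence 374.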

Before we go further, any idea what type of test is being used at the moment?

It may be surprising to some people, but there is no universally used test. Let me just give a list of a few tests used:

RT-PCR (3)
The US CDC (Centers for Disease Control and Prevention) came up with tests for Covid19 and started shipping them in early February (4). They used RT-PCR, which basically hunts for the DNA/RNA of the virus (4). What do they test? Samples taken from the lower and upper respiratory systems. Basically, since the virus affects the lungs, you should be able to find evidence of it in the respiratory tract. The CDC even specifies the equipment and software versions that should be used in the test. Note though that the CDC developed their own test.

Since the process of DNA/RNA testing is not a quick one, and as mentioned by the CDC only a limited number of labs qualify, this poses some issues regarding how universally usable this test is.
For those who are interested in the process, Public Health Ontario has a very informative page (5) that even explains how to collect the samples:
  • Nasopharyngeal swab (through the nose to the upper part of the throat) or throat swab for the upper respiratory tract
  • Bronchial wash, pleural fluid, lung tissue sample for lower respiratory tract (which all involve some tubes down the throat – bronchial is from the lungs, pleural is around the lungs, lung tissue is self-evident)

So sample collection is not a walk in the park for the person being tested.

To summarise, the RT-PCR test is a bit tough on the people undergoing it and the labs need to be properly equipped, so doing the tests on a large scale is not straightforward. Plus it takes time.

To make things worse, the CDC admitted that the test kits they shipped were defective (6)(7), giving too many inconclusive results and limiting testing even further. For comparison, China can test 1.6m people a week and South Korea has tested 35,000 people in a few weeks, while the USA had tested 429 (8), although as of Feb 29 the number of people tested in the USA had risen to 472 (9).

Clinical Test/CT Scan
China recently had a huge rise in the number of people suspected of having been infected; this is because they changed the way they tested (10).

There has been some support in the scientific community for the use of CT scans (11). Unfortunately, CT scans are expected to show only cases where there is quite a bit of ‘damage’ to the lungs, and would therefore only be useful to detect cases where the person is at a relatively advanced stage of infection.

Antibody test
A third avenue being explored is the antibody test. Duke-NUS came up with such a test a few days ago (12) (13). But again, note that this test tells us more about the virus only after someone has been infected and is trying to fight it off.

The best summary I have found on the various tests is on the WHO website (14).

So what is the best test? How can tests get things wrong?

You may have heard of a dog catching the virus from a human (15), of people who do not test positive with one kit although an earlier sample tested positive with a different kit, or of people who test positive after being given the all clear (16), and of the debates around this (17).

This brings us to the second piece of statistics that is useful, test error.

B Why you need to understand that tests have errors and there are Type I and Type II errors

Since the vast majority of tests involve sampling, it is possible to make mistakes in determining whether someone has been infected or not. So what kinds of errors are possible?
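A simple table will illustrate this:

                      Person is infected      Person is not infected
  Test is positive    True Positive           False Positive
  Test is negative    False Negative          True Negative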

In an ideal world, tests would be perfect and you would obtain only True Positives (correctly identify infected people) and True Negatives (correctly identify people who are not infected).

However, in real life, there will be cases where a test wrongly clears someone from being infected (false negative) and there are cases when a test wrongly classifies someone as infected when that person is not (false positive).
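To see how the four outcomes turn into error rates, here is a small sketch. The counts are invented for illustration (1,000 people tested, 100 of them truly infected) and the function name is hypothetical.

```python
def error_rates(tp, fp, fn, tn):
    """Summarise a test from counts of the four possible outcomes."""
    infected = tp + fn          # everyone who truly is infected
    not_infected = tn + fp      # everyone who truly is not
    return {
        "sensitivity": tp / infected,              # infected people correctly caught
        "specificity": tn / not_infected,          # non-infected people correctly cleared
        "false_negative_rate": fn / infected,      # infected people wrongly cleared
        "false_positive_rate": fp / not_infected,  # non-infected people wrongly flagged
    }

# Made-up counts: 90 true positives, 45 false positives,
# 10 false negatives, 855 true negatives
rates = error_rates(tp=90, fp=45, fn=10, tn=855)
for name, value in rates.items():
    print(f"{name}: {value:.0%}")
```

In this invented example the test misses 10% of infected people (false negatives) and wrongly flags 5% of non-infected people (false positives).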

The question is: what is worse, or which error do you want to minimize? Given a few tests, which one would you prefer, the one that catches the most infected people even if you also put some non-infected people in quarantine (low false negative) or the one where you ensure that few non-infected people are identified as being infected (low false positive)?

My wild guess is that quarantining non-infected people is preferable to letting infected and potentially contagious people roam free.

Now, let’s switch the question to people who engage in petty theft. Would you rather let a guilty person go free (high false negative) or put an innocent person in prison (high false positive)?

I think, given the scare around Covid19, most people would prefer to have false negatives as close to zero as possible, that is, to try to catch all cases of infected people, even though it means declaring some people who are not infected as likely infected and quarantining them. So I would prefer a test that minimizes false negatives.
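The price of pushing false negatives towards zero can be sketched with a toy model. Suppose, purely as an assumption for illustration, that a test reads some marker level, that readings for non-infected people are roughly normal around 10 and readings for infected people roughly normal around 20, and that the test calls anyone above a cut-off “positive”. None of these numbers come from a real Covid19 test.

```python
from statistics import NormalDist

# Hypothetical marker readings (arbitrary units)
healthy = NormalDist(mu=10, sigma=3)
infected = NormalDist(mu=20, sigma=5)

def rates_at(threshold):
    """False negative and false positive rates for a given cut-off."""
    false_negative = infected.cdf(threshold)     # infected, but reading below cut-off
    false_positive = 1 - healthy.cdf(threshold)  # not infected, but reading above cut-off
    return false_negative, false_positive

for threshold in (10, 13, 16, 19):
    fn, fp = rates_at(threshold)
    print(f"cut-off {threshold}: false negatives {fn:.1%}, false positives {fp:.1%}")
```

Lowering the cut-off catches more infected people, but it also sends more non-infected people into quarantine. In this toy model you cannot reduce one error without increasing the other; you can only choose which one you would rather live with.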

To reiterate, tests are not perfect. There usually is a trade-off between how well a test performs and the cost of the test (whether in terms of time, technology, size/type of sample used for testing…).
Let me go a level deeper; let us look at the hypothesis.

C Hypothesis testing

In the table above, I have implicitly assumed the hypothesis we are testing is “the person being tested is infected”; that is what I am trying to find out. The baseline case is “the person is not infected” and I am running tests to find out whether I can say, with enough certainty, that “No, that person is actually likely to be infected”.

It may sound pedantic, but this has huge implications.

The baseline case is called the null hypothesis (H0). Usually the null hypothesis represents the normal case that we are trying to show does not hold.

The hypothesis we are testing, the alternative hypothesis (H1), is the hypothesis we want to verify.

The key is that if we are not sufficiently sure that H1 is correct, then we will fail to reject H0.

What this means is simple: if a test is designed to detect whether someone has been infected with a virus, then the test can only say “yes, chances are this person is infected, I have enough evidence to support this” or “no, there is not enough evidence to state that the person is infected”.

Therefore, it is not right to say that the person has tested negative; all that has happened is that the person has not tested positive.
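The same logic can be written as a tiny decision rule. This is only a sketch under assumptions of my own: the test reads a marker level, readings for non-infected people (H0) are roughly normal around 10 with a spread of 3, and we accept a 5% risk of wrongly rejecting H0. The names and numbers are hypothetical.

```python
from statistics import NormalDist

# Hypothetical distribution of readings when H0 ("not infected") is true
null_model = NormalDist(mu=10, sigma=3)
ALPHA = 0.05  # accepted risk of rejecting H0 when it is actually true

def decide(reading, alpha=ALPHA):
    """One-sided test: is the reading too high to be explained by H0?"""
    p_value = 1 - null_model.cdf(reading)
    if p_value < alpha:
        return "reject H0: enough evidence that the person is infected"
    return "fail to reject H0: not enough evidence that the person is infected"

print(decide(18))  # reading far above what H0 would usually produce
print(decide(12))  # reading that H0 can easily explain
```

Note that the function has no branch that returns “the person is not infected”. The second outcome only says the evidence was not strong enough.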

In the table above, the person has tested negative. If the person is truly not infected, then it is a true negative, but if the person is indeed infected (maybe the % of the markers is too low, or the infection has not yet developed enough to produce antibodies, or the viral RNA has not spread yet), then we have a case of false negative.

So to me, saying someone tested negative is fake news. 
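To put a number on how much doubt a “negative” result leaves, here is a sketch using Bayes’ rule. The sensitivity, specificity and share of infected people among those tested are all made-up figures, not properties of any actual Covid19 test, and the function name is hypothetical.

```python
def chance_infected_despite_negative(prevalence, sensitivity, specificity):
    """Probability that a person is infected even though the test did not
    come back positive (Bayes' rule)."""
    missed = (1 - sensitivity) * prevalence    # infected and not detected
    cleared = specificity * (1 - prevalence)   # not infected and correctly cleared
    return missed / (missed + cleared)

# Hypothetical test: catches 70% of infected people, clears 95% of non-infected people
low_risk = chance_infected_despite_negative(prevalence=0.10, sensitivity=0.70, specificity=0.95)
high_risk = chance_infected_despite_negative(prevalence=0.50, sensitivity=0.70, specificity=0.95)

print(f"10% of people tested are infected: {low_risk:.1%} of 'negatives' are infected")
print(f"50% of people tested are infected: {high_risk:.1%} of 'negatives' are infected")
```

With these invented numbers, about 3.4% of people who do not test positive are nonetheless infected when one in ten of those tested carries the virus, and 24% when half of them do (for example, close contacts of a known case).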

D Contact Tracing

Ok this is not basic statistics, but I just wanted to make a little note.

I am happily in Singapore at the moment, and while I did get caught in the orange-alert-day mess at a supermarket (to me this was due to a lack of communication/underestimation of public reaction), I think the management of the issues here has been quite good. The PM’s message was great, and the provision of masks to households, to me, had the desired effect of calming people. Add to this the reinforcement in newspapers and local media about how to use masks and basic hygiene, and all is going well.

What I don’t get is why contact tracing is such a difficult topic, and why we have to rely on the word of people. It has been shown that people are not as self-sacrificing/civic-minded as we would have hoped. For example, one lady in South Korea is known to have infected many people because she refused to believe she could have been infected (18) and, being very religious, continued attending church multiple times, spreading the virus. In Singapore, an individual repeatedly violated the quarantine order, a couple provided false information (19), and, more interestingly, some people hid the onset of symptoms (20), like this lady: quarantine order on Feb 26, reporting no recent illness, who later admitted having had symptoms since Feb 20 and even having visited a doctor on Feb 25.

What I really don't get is why we have to rely on the say-so of people to trace their movements in this day and age. It is very easy to trace people especially in Singapore, and find potential clusters by using “big data”. I really don’t understand why this is not taking place. Or maybe it is and the people are not told so as to keep the illusion of privacy; tin foil hat on.


