There has been a lot of fear around Covid19, and much of it comes from not understanding what is going on. A lot of what is going on should be easy to understand for anyone with a basic statistics background and some common sense about medical issues. Since I am one of the millions who satisfy these criteria, and I have heard some things that surprised me (I was so relieved when my colleague on the same floor tested negative!), I decided to write this blog post, showing how basic statistical knowledge can help us understand what is happening in the Covid19 situation. It will not prevent anyone from catching the virus, but I hope it can lessen anxiety about the various reports that have been floating around.
A Why you need to understand sampling
How would you know whether someone has Covid19 or not? Covid19 is caused by a virus, and there are four main ways to test for a virus (1). But let me first explain what a virus does, and you will see that the testing methods are easy to understand.
A virus is a small infectious agent that replicates inside living cells (2). So in order to thrive, a virus has to get into a living cell, where it can try to replicate: find a healthy cell, infect it, and replicate. While doing that, the virus hijacks the cell's machinery to copy its own genetic material, so the cell no longer does what it was supposed to do (or not as well). The infection damages the cell, can in turn damage the immune system, and can cause inflammation that may damage organs.
So how can you detect a virus?
1 Viral Culture
This is how most people would imagine a test, especially since we have all seen pictures of the coronavirus by now. Basically, you take tissue from the individual, create optimal conditions for the virus to proliferate, and then, after allowing it enough time to do so, check whether the virus is present.
For example, when one of my “monsters” had an ear infection, the vet swabbed inside the ear with cotton buds and sent the swabs to a lab, which grew whatever was inside, identified it (in this case it was bacteria) and recommended an appropriate treatment (thank you, Dr Au!).
2 Antibody test
Your body is designed to fight any intruder, viruses included. When the immune system detects an invader, it picks a weapon from its stockpile and increases production of that weapon to fight the invader. While there are generic weapons, the more specialized weapons are antibodies against specific threats. This is one way vaccines can be developed, but I won’t go there here. The antibodies are found in the bloodstream, so they can travel all over the body and deal with the specific invaders. An antibody test looks for these antibodies in a blood sample.
For example, this is also what is tested to see whether you need a tetanus shot. People like me often get into trouble with animals and inanimate objects, so cuts and wounds are par for the course, and it is important that I am protected against tetanus. The test for whether I am protected measures my antibody level, to see whether I have enough to fight off an infection.
3 Viral antigen detection test
If a cell is infected with a virus, the cell itself changes: the surface of an infected cell gets coated with a specific marker from the virus, an antigen. The trick to knowing whether someone has been infected is to detect whether there are cells coated with that specific antigen. One way of doing this is to add to the sample a special chemical (typically a labelled antibody) that attaches itself to that antigen and can be seen using suitable equipment. So if the antigen is present, the chemical attaches itself to the infected cells and becomes visible.
4 Viral DNA/RNA detection test
In this test, some body fluid (blood, spinal fluid…) is taken from the person and searched for the virus’s DNA/RNA. Since this test looks for the genetic material itself, it can pinpoint exactly which virus is infecting the person.
So what has sampling got to do
with this?
Let me put it simply: if you are looking for something in someone’s blood, it is impractical to look at all of their blood. You have to take some of it, a sample. Samples are central to statistics, and so is the size of a sample.
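A minimal sketch of why the size of the sample matters, assuming virus particles are scattered randomly and evenly through the fluid, so the number caught in a sample follows a Poisson distribution. The concentration and volumes are made-up illustrative numbers, not clinical values, and the function name is hypothetical.

```python
import math

def detection_probability(particles_per_ml: float, sample_ml: float) -> float:
    """Chance that a sample contains at least one virus particle.

    Assumes particles are randomly and evenly spread (Poisson model),
    so P(at least one) = 1 - exp(-expected number of particles in the sample).
    """
    expected = particles_per_ml * sample_ml
    return 1.0 - math.exp(-expected)

# Hypothetical low concentration: 0.5 particles per ml.
for volume in (0.5, 1.0, 2.0, 5.0):
    p = detection_probability(0.5, volume)
    print(f"{volume:>4} ml sample -> {p:.0%} chance of containing the virus")
```

With these made-up numbers, a 0.5 ml sample contains the virus only about 22% of the time, while a 5 ml sample does about 92% of the time: a small sample can easily miss a low-level infection.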
How big does your sample need to be? Well, it depends on a few factors, the main ones being:
- How homogeneous is the population you are studying? For example, if you are trying to find the average height of a group of people, you would need a smaller sample if you are looking at a single class/age/grade rather than the whole school. The class would have less variability than the school, which has many age groups.
- How much risk of getting things wrong can you accept? If you were finding the average height only to select taller people to try out for the newly formed basketball team, getting it wrong is not that important (people of all heights can play basketball well), compared with looking for really tall people who may have a condition you would like to treat.
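Both factors show up in the textbook formula for the sample size needed to estimate an average within a margin of error E: n = (z × σ / E)², where σ (the standard deviation) measures how spread out, i.e. how non-homogeneous, the population is, and z grows with the confidence you want, i.e. the less risk of being wrong you accept. A minimal Python sketch, assuming a roughly normal population with known σ; the height spreads and margin are invented for illustration.

```python
import math
from statistics import NormalDist

def sample_size(std_dev: float, margin: float, confidence: float) -> int:
    """Sample size needed to estimate an average within +/- margin.

    Uses the textbook formula n = (z * sigma / E)^2, which assumes a roughly
    normal population with known standard deviation sigma.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return math.ceil((z * std_dev / margin) ** 2)

# Hypothetical heights: spread of 5 cm within one class, 15 cm across the school.
print(sample_size(std_dev=5, margin=2, confidence=0.90))   # one class, relaxed
print(sample_size(std_dev=15, margin=2, confidence=0.90))  # whole school, relaxed
print(sample_size(std_dev=15, margin=2, confidence=0.99))  # whole school, strict
```

With these invented numbers, one class needs about 17 people, the whole school about 153, and the whole school at 99% confidence about 374.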
Before we go further, any idea
what type of test is being used at the moment?
It may surprise some people, but there is no universally used test. Here are a few of the tests in use:
RT-PCR (3)
The US CDC (Centers for Disease Control and Prevention) developed and started shipping tests for Covid19 in early February (4). They used RT-PCR, which basically hunts for the virus’s genetic material, its RNA (4). What do they test? Samples taken from the upper and lower respiratory tract. Since the virus affects the lungs, you should be able to find evidence of it in the respiratory tract. The CDC even specifies the equipment and software versions that should be used in the test. Note, though, that the CDC developed its own test.
Since DNA/RNA testing is not quick and, as the CDC mentions, only a limited number of labs qualify, it is questionable how widely this test can be used.
For those who are interested in the process, Public Health Ontario has a very informative page (5) that even explains how to collect the samples:
- Nasopharyngeal swab (through the nose to the back of the throat) and throat swab, for the upper respiratory tract
- Bronchial wash, pleural fluid or lung tissue sample, for the lower respiratory tract (all invasive: a bronchial wash involves a tube down the airway into the lungs, pleural fluid comes from around the lungs, and lung tissue is self-evident)
So sample collection is not a
walk in the park for the person being tested.
To summarise: the RT-PCR test is a bit tough on the people undergoing it, and the labs need to be properly equipped, so doing the test on a large scale is not straightforward. Plus, it takes time.
To make things worse, the CDC admitted that the test kits it shipped were defective (6)(7), giving inconclusive results, which further limited testing on top of the constraints explained above. China can test 1.6m people a week and South Korea has tested 35,000 people in a few weeks, while the USA had tested 429 (8), although as of Feb 29 the number of people tested in the USA had risen to 472 (9).
Clinical Test/CT Scan
Recently, China saw a huge rise in the number of people suspected of having been infected; this is because it changed the way it tested (10).
There has been some support in the scientific community for the use of CT scans (11), but unfortunately, CT scans are expected to show only cases with quite a bit of ‘damage’ to the lungs, and would therefore only be useful for detecting people at a relatively advanced stage of infection.
Antibody test
A third avenue being explored is the antibody test. Duke-NUS came up with a test a few days ago (12) (13). But again, note that this only tells us about the virus after someone has been infected and their body is trying to fight it off.
The best summary I have found of the various tests is on the WHO website (14).
So what is the best test? How can
tests get things wrong?
You may have heard of a dog catching the virus from a human (15), of people who did not test positive but then tested positive on an earlier sample using a different kit, or of people who tested positive after being given the all-clear (16), and of the debates around this (17).
This brings us to the second useful piece of statistics: test error.
B Why you need to understand that tests have errors and there are Type I and Type II errors
Since the vast majority of tests involve sampling, it is possible to make mistakes in determining whether, say, someone has been infected or not. So what kinds of errors are possible?
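A simple table will illustrate this (test result against the person’s true status):
- Test positive, person infected: True Positive
- Test positive, person not infected: False Positive (Type I error)
- Test negative, person infected: False Negative (Type II error)
- Test negative, person not infected: True Negative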
In an ideal world, tests would be
perfect and you would obtain only True Positives (correctly identify infected people) and True Negatives (correctly identify
people who are not infected).
However, in real life, there will be cases where a test wrongly clears someone who is infected (false negative) and cases where a test wrongly classifies someone as infected when that person is not (false positive).
The question is: which is worse, or which error do you want to minimize? Given a few tests, which one would you prefer: the one that catches the most infected people, even if you also put some non-infected people in quarantine (few false negatives), or the one that ensures few non-infected people are identified as infected (few false positives)?
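To put numbers on this question, here is a minimal sketch that computes the expected count of each of the four outcomes (true/false positives and negatives), using the standard terms sensitivity (share of infected people a test flags) and specificity (share of non-infected people it clears). The population, infection rate and the two tests’ figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Outcomes:
    true_positive: float
    false_positive: float
    false_negative: float
    true_negative: float

def expected_outcomes(population: int, prevalence: float,
                      sensitivity: float, specificity: float) -> Outcomes:
    """Expected number of people in each outcome for one test."""
    infected = population * prevalence
    healthy = population - infected
    return Outcomes(
        true_positive=infected * sensitivity,
        false_negative=infected * (1 - sensitivity),   # infected but cleared
        false_positive=healthy * (1 - specificity),    # healthy but flagged
        true_negative=healthy * specificity,
    )

# Two hypothetical tests applied to 10,000 people, 1% of whom are infected.
for name, sens, spec in (("A", 0.99, 0.90), ("B", 0.80, 0.99)):
    o = expected_outcomes(10_000, 0.01, sensitivity=sens, specificity=spec)
    print(f"Test {name}: missed {o.false_negative:.0f} infected, "
          f"wrongly flagged {o.false_positive:.0f} healthy")
```

With these made-up figures, test A misses about 1 infected person but wrongly flags about 990 healthy people, while test B wrongly flags only about 99 but misses about 20 infected people. Which is better depends on which error you want to minimize.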
My wild guess is that quarantining non-infected people is preferable to letting infected and potentially contagious people roam free.
Now, let’s switch the question to people who engage in petty theft. Would you rather let a guilty person go free (a false negative) or put an innocent person in prison (a false positive)?
I think, given the scare around Covid19, most people would prefer false negatives to be as close to zero as possible, that is, to try to catch every infected person, even if it means declaring some non-infected people as likely infected and quarantining them. So I would prefer a test that minimizes false negatives.
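Many tests measure something (for example, the amount of a viral marker) and call the result positive above a cutoff; moving that cutoff is one way to trade false negatives for false positives. A minimal sketch, assuming hypothetical normally distributed marker levels for infected and non-infected people; the distributions and cutoffs are invented, not taken from any real Covid19 test.

```python
from statistics import NormalDist

# Hypothetical marker levels (arbitrary units), purely for illustration.
not_infected = NormalDist(mu=10, sigma=2)
infected = NormalDist(mu=16, sigma=3)

def error_rates(cutoff: float) -> tuple[float, float]:
    """Return (false negative rate, false positive rate) when the test
    calls everyone at or above `cutoff` positive."""
    false_negative = infected.cdf(cutoff)           # infected, but below cutoff
    false_positive = 1 - not_infected.cdf(cutoff)   # not infected, but above cutoff
    return false_negative, false_positive

for cutoff in (15, 13, 11):
    fn, fp = error_rates(cutoff)
    print(f"cutoff {cutoff}: false negatives {fn:.1%}, false positives {fp:.1%}")
```

Under these assumptions, lowering the cutoff from 15 to 11 cuts false negatives from about 37% to about 5%, but raises false positives from under 1% to about 31%.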
To reiterate, tests are not perfect. There is usually a trade-off between how well a test performs and its cost (in terms of time, technology, size/type of sample used for testing…).
Let me go a level deeper and look at the hypothesis being tested.
C Hypothesis testing
In the table above, I have
implicitly assumed the hypothesis we are testing is “the person being tested is
infected”; that is what I am trying to find out. The baseline case is “the
person is not infected” and I am running tests to find out whether I can say,
with enough certainty, that “No, that person is actually likely to be
infected”.
It may sound pedantic, but this
has huge implications.
The baseline case is called the null hypothesis (H0). Usually the null hypothesis represents the normal case, which we are trying to show does not hold.
The hypothesis we are testing, the alternative hypothesis (H1), is the one we want to verify.
The key is
that if we are not sufficiently sure that H1 is correct, then we will fail to
reject H0.
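A minimal sketch of that decision rule as a one-sided test, assuming (hypothetically) that marker levels in non-infected people are normally distributed with a known mean and standard deviation. alpha is the chance of a false positive (a Type I error) we are willing to accept; note that the function never concludes that the person is not infected.

```python
from statistics import NormalDist

def decide(marker_level: float, healthy_mean: float, healthy_sd: float,
           alpha: float = 0.05) -> str:
    """One-sided test. H0: the person is not infected.

    Reject H0 only if a level this high would be rare (probability < alpha)
    among non-infected people; otherwise we merely fail to reject H0.
    """
    p_value = 1 - NormalDist(healthy_mean, healthy_sd).cdf(marker_level)
    if p_value < alpha:
        return "reject H0: likely infected"
    return "fail to reject H0: not enough evidence of infection"

print(decide(16.0, healthy_mean=10, healthy_sd=2))  # p-value about 0.0013
print(decide(12.0, healthy_mean=10, healthy_sd=2))  # p-value about 0.16
```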
What this means is simple: if a
test is designed to detect whether someone has been infected with a virus, then
the test can only say “yes, chances are this person is infected, I have enough
evidence to support this” or “no, there is not enough evidence to state that
the person is infected”.
Therefore, it is not right to
say that the person has tested negative; all that has happened is that the
person has not tested positive.
Suppose, in the table above, the person has tested negative. If the person is truly not infected, it is a true negative; but if the person is in fact infected (maybe the level of markers is too low, the infection has not yet developed enough to produce antibodies, or the viral RNA has not spread yet), then we have a false negative.
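How likely is it that someone who has not tested positive is infected anyway? A minimal sketch using Bayes’ rule, with a hypothetical test that catches 70% of infections (sensitivity) and correctly clears 99% of non-infected people (specificity); these figures and the prevalence values are assumptions for illustration, not properties of any real Covid19 test.

```python
def prob_infected_given_negative(prevalence: float, sensitivity: float,
                                 specificity: float) -> float:
    """Bayes' rule: P(infected | test negative)."""
    missed = prevalence * (1 - sensitivity)      # infected but tested negative
    cleared = (1 - prevalence) * specificity     # not infected, tested negative
    return missed / (missed + cleared)

# Hypothetical test: catches 70% of infections, clears 99% of non-infected people.
for prevalence in (0.01, 0.2):
    p = prob_infected_given_negative(prevalence, sensitivity=0.7, specificity=0.99)
    print(f"prevalence {prevalence:.0%}: {p:.1%} of negatives are actually infected")
```

With these assumed numbers, about 0.3% of negatives are infected when 1% of the tested group is infected, but about 7% when 20% are, so a negative result is weaker evidence for people with a high chance of exposure.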
So to me, saying someone tested
negative is fake news.
D Contact Tracing
OK, this is not basic statistics, but I just wanted to make a little note.
I am happily in Singapore at the moment, and while I did get caught in the orange-alert-day mess at a supermarket (to me this was due to a lack of communication and an underestimation of public reaction), I think the issues here have been managed quite well. The PM’s message was great, and the provision of masks to households, to me, had the desired effect of calming people. Add to this the reinforcement in newspapers and local media about how to use masks and about basic hygiene, and all is going well.
What I don’t get is why contact tracing is such a difficult topic, and why we have to rely on what people tell us. It has been shown that people are not as self-sacrificing or civic-minded as we would have hoped. For example, one lady in South Korea is known to have infected many people because she refused to believe she could have been infected (18) and, being very religious, kept attending church services, spreading the virus. In Singapore, an individual repeatedly violated the quarantine order, a couple provided false information (19), and, more interestingly, some people hid the onset of symptoms (20), like this lady: she was served a quarantine order on Feb 26 and reported no recent illness, but later admitted to having had symptoms since Feb 20 and had even visited a doctor on Feb 25.
What I really don't get is why, in this day and age, we have to rely on people’s say-so to trace their movements. It is very easy to trace people, especially in Singapore, and to find potential clusters using “big data”. I really don’t understand why this is not taking place. Or maybe it is, and people are not told, so as to keep the illusion of privacy; tin foil hat on.
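As an illustration of how such data could be used, a minimal sketch that finds everyone who was at the same place, at the same time, as a known case. It assumes hypothetical location records (person, place, arrival and departure times); the record format, names and data are all made up.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Visit:
    person: str
    place: str
    arrived: datetime
    left: datetime

def find_contacts(visits: list[Visit], case: str) -> set[str]:
    """People who were at the same place at an overlapping time as `case`."""
    case_visits = [v for v in visits if v.person == case]
    contacts = set()
    for cv in case_visits:
        for v in visits:
            # Two time intervals overlap if each starts before the other ends.
            overlaps = v.arrived < cv.left and cv.arrived < v.left
            if v.person != case and v.place == cv.place and overlaps:
                contacts.add(v.person)
    return contacts

# Hypothetical records.
visits = [
    Visit("case-1", "supermarket", datetime(2020, 2, 20, 10), datetime(2020, 2, 20, 11)),
    Visit("anna", "supermarket", datetime(2020, 2, 20, 10, 30), datetime(2020, 2, 20, 12)),
    Visit("ben", "supermarket", datetime(2020, 2, 20, 14), datetime(2020, 2, 20, 15)),
    Visit("chen", "clinic", datetime(2020, 2, 20, 10), datetime(2020, 2, 20, 11)),
]
print(find_contacts(visits, "case-1"))  # {'anna'}
```

Only "anna" overlapped with the case at the same place; "ben" came later and "chen" was elsewhere.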
(13) https://www.channelnewsasia.com/news/singapore/covid19-coronavirus-duke-nus-antibody-tests-12469184