The coronavirus (COVID-19) that is spreading through China is a tragedy, and I’d like to send my best wishes to those that are sick and my love to those who have passed on.
However, as yesterday’s quote noted, sometimes we need to temper our warm hearts with cold heads in order to understand social phenomena and discover truths that lead to the improvement of society. If we are interested in understanding the risks to a country associated with coronavirus, we need good information on how the virus spreads and on the fatality rate, in order to set appropriate policy.
In that spirit I’d like to talk about selection bias in the coronavirus figures: how it may overstate the death rate and understate the number of people catching the virus, and how the cruise ship, as well as being an incubator, may be (in part) a natural experiment which allows us to correct for this bias.
As my economics PhD thesis considers health-related questions, this is an issue I like to think through, and one which I think is valuable.
Looking at the gross numbers alone there appear to be a number of “facts” (as at 5pm NZT 19/02/2020):
- 75,216 people are or have been infected.
- 16,645 closed cases – with a gross death rate of 12%
- Of the remaining cases 21% are serious or critical.
- Of those that have died 10.5% have cardiovascular issues.
Assuming the death numbers are accurate, this is a very high death rate. However, the WHO states that they estimate a fatality rate of 2%. This issue also generates hot debate.
But there is a good reason why the true fatality rate may be so much lower than the currently measured rate, which has nothing to do with a lack of preparation or it initially occurring in a poor community. In my view the number of cases is understated!
This would be due to sample selection bias and the missing individuals with no or limited symptoms.
We need to think about how this sample was selected in order to understand whether these measures are representative of the whole population.
The key issue is that someone is only noted as having the virus after being checked by a doctor. This tells us that:
- This is an individual showing symptoms,
- This is more likely to be an individual that is disproportionately impacted by the symptoms
As a result, we would expect the following biases in the numbers:
- People with no symptoms or with limited symptoms that didn’t lead them to seek out a doctor will not have been counted – so the number of infections could be much higher. Furthermore the spread rate (R0) may be higher.
- If the vast majority of those with major symptoms do seek help, and so are measured, then most deaths are counted while many mild infections are not. This indicates that the true death rate would be much lower than the measured one.
- Furthermore, the serious and critical rates would also be lower.
- Interestingly, if a given condition is more likely to lead to symptoms being serious (such as those with pre-existing cardiovascular conditions) then we will relatively overestimate how many people with those conditions catch the disease.
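To make the last point concrete, here is a minimal sketch with made-up numbers. The 10% share of infected people with a pre-existing condition, and the 60% and 20% chances of being counted, are purely hypothetical assumptions chosen for illustration, not estimates from any data.

```python
def share_among_counted(share_with_condition, p_counted_with, p_counted_without):
    """Share of *counted* cases that have the condition, when people with
    the condition are more likely to show symptoms and so be counted."""
    counted_with = share_with_condition * p_counted_with
    counted_without = (1 - share_with_condition) * p_counted_without
    return counted_with / (counted_with + counted_without)


# Hypothetical inputs (assumptions, not data):
true_share = 0.10          # share of all infected people with the condition
p_counted_with = 0.60      # chance of being counted if you have the condition
p_counted_without = 0.20   # chance of being counted if you do not

observed_share = share_among_counted(true_share, p_counted_with, p_counted_without)
print(f"True share with condition:    {true_share:.0%}")
print(f"Share among counted cases:    {observed_share:.0%}")
```

Under these assumed inputs the condition looks two and a half times as common among counted cases as it really is among all infected people, purely because of who ends up in the sample.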
As we know, the Diamond Princess cruise ship has been quarantined in Japan due to the coronavirus outbreak. This quarantine is now ending, with countries getting their citizens home.
On February 18 worldometers noted the following:
“88 new cases on the Diamond Princess cruise ship in Japan were confirmed as a result of 681 people being tested (13% infection rate). Of these, 65 people (74%) have no symptoms. So far, a total of 542 infected people were found among 2,404 passengers and crew members tested (23% infection rate) out of 3,711 total people on the ship”
This isn’t the same as testing the “population of the ship”, or a random sample, as they will have tested everyone who was showing symptoms and then some number of people who were not. If the additional people were selected on the basis of their closeness with those with symptoms, rather than randomly, that would generate further selection bias.
However, 65% of everyone on board (passengers and crew) was tested, with a large number of these individuals not showing symptoms. As a result, although selection bias still exists, it is not as strong.
If the last day’s data were representative of the rest of the sample, this means that 74% of people who have tested positive for the virus showed no symptoms. [Note: It may not be representative of the sample, especially if those showing symptoms were tested first – I am just using this for the sake of argument].
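As a quick check on the percentages, the following sketch recomputes them from the counts in the worldometers quote. The variable names and the `pct` helper are just illustrative choices.

```python
# Counts taken from the worldometers quote (18 February 2020).
new_cases, new_tests = 88, 681
new_cases_no_symptoms = 65
total_cases, total_tested, total_on_board = 542, 2404, 3711


def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


rates = {
    "positive rate, latest batch": pct(new_cases, new_tests),
    "no symptoms, latest batch": pct(new_cases_no_symptoms, new_cases),
    "positive rate, all tested": pct(total_cases, total_tested),
    "share of ship tested": pct(total_tested, total_on_board),
}

for label, value in rates.items():
    print(f"{label}: {value}%")
```

These come out at roughly 13%, 74%, 23% and 65% respectively, matching the rounded figures used here.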
There are three biases in estimating population statistics given current sampling around the world – a) a downward bias in the % of people not showing symptoms relative to total infected, b) a downward bias in the number of people infected (so in the infection rate), and c) an upward bias in the infection rate among those tested.
Given selection bias we would expect the true % of people with the virus who show no symptoms to be LARGER than the measured figure, as the sample includes everyone showing symptoms (100% of that group) plus a random selection of other individuals, some of whom are infected. As some of the unsampled individuals not showing symptoms will also be infected, the percentage of the total infected who show no symptoms is larger.
In addition, the worldwide case count is biased downwards as a population estimate of the number infected. As the worldwide sample is based on only testing those who have symptoms, people who have the virus but show no symptoms are excluded from the sample.
Furthermore, the infection rate among those tested is biased upwards, as your chance of being selected for sampling is related to having the disease.
For the sake of argument let’s pretend that they did test everyone on the boat to get a 75% figure, and assume that the people on the boat were a random selection of the global population (which cruise ship passengers aren’t, e.g. they tend to be older and so are probably more likely to show symptoms). Furthermore, assume that the only testing that has occurred around the rest of the world is on people showing symptoms.
This implies that we have a potential correction to current world-wide estimates based on the “natural experiment” on the boat, where there is no selection bias. Specifically, the current case number is equal to only 25% of the total, so around 300,000 people are infected.
There are two ways to read this figure:
- We may think that many of these infections just haven’t manifested in symptoms YET and as a result this is very scary.
- Seeing how long the boat was quarantined we may view this as saying that 75% of individuals that catch the virus don’t ever have symptoms. As a result, the true fatality rate is lower.
Let’s assume that the gross death rate for the above figures is going to be 12% – so around 9,000 poor souls pass away based on the 75,000 who are recorded to have caught the virus. If the real number of cases is 300,000 then the fatality rate is instead 3%, much closer to the WHO estimate.
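The correction can be written out as a short calculation. This is only a sketch of the for-the-sake-of-argument case above: it assumes that 75% of infections never show symptoms, that only people with symptoms are counted worldwide, and that every death is among the counted cases.

```python
# Inputs from the figures quoted above.
recorded_cases = 75_216
gross_death_rate = 0.12      # deaths as a share of recorded cases

# Assumption for the sake of argument: share of infections with no symptoms.
share_no_symptoms = 0.75

# If only symptomatic people are counted, recorded cases are 25% of the total.
estimated_infections = recorded_cases / (1 - share_no_symptoms)

# Deaths implied by the gross death rate, assumed to all be among counted cases.
expected_deaths = recorded_cases * gross_death_rate

adjusted_fatality_rate = expected_deaths / estimated_infections

print(f"Estimated infections:   {estimated_infections:,.0f}")
print(f"Expected deaths:        {expected_deaths:,.0f}")
print(f"Adjusted fatality rate: {adjusted_fatality_rate:.1%}")
```

Note that the adjusted rate is just the gross rate multiplied by the share of infections that are counted (12% × 25% = 3%), so the result is only as good as the 75% assumption.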