Flawed statistics used for lockdown decision?


“There are three kinds of lies: lies, damn lies and statistics.” – Mark Twain

SO, the lockdown continues as the numbers have not gone down to below 4,000 cases. This is robust but way off track.

I think many people disagree with such a simplistic approach in the decision-making taken by Putrajaya.

Ever since the Covid-19 pandemic started, there have been many numbers or, to be precise, statistics that have been thrown about or regularly published and referred to. This is fine so long as we know how to interpret them and use them wisely. But are we really doing that?

A key element in dealing with raw data is the method of analysis, which produces the meaningful statistics that are crucial to drawing conclusions and to the subsequent decision-making process.

In the case of Covid-19 data, we are presented with the number of positive cases on a daily basis. We have to assume that the testing methods are correct and take the positive cases at face value.

The main questions that we should be asking, then, are:

• What is the total number of people who are being tested daily?

• Are these numbers recorded on a daily basis before a portion is classified as positive?

• Are there any repeat tests undertaken on the same positive patients?

For instance, if there are 3,000 positive cases, what is the total number that tested negative? Is it 27,000 people, which means 30,000 people in total were tested?

Or is it 3,000 cases against 50,000 people or 3,000 positives from a total of 100,000 people?

It is important to know the number of people tested and the breakdown between positive and negative cases.

Table A: A daily data set presented in a declining percentage format.


In Table A, without the second column (the number of people tested), the positive figures (third column) recorded over three days give a misleading impression of seriousness and should not be presented on their own.

Even though the daily positive number is increasing, the rate of infection is actually declining (fourth column, by percentage).

The figures could also be presented in a ratio context, shown in the last column, which is also declining.

A total of 3,000 positive cases measured against a test size of 30,000 gives an infection rate of 10%, whereas against a test size of 50,000, it is only 6%.

If 3,000 positive cases come from a test size of 100,000, the rate drops further to only 3%. Different implications could be drawn from such a statistical analysis.
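The arithmetic behind these three scenarios can be sketched in a few lines of Python. This is only an illustration of the calculation described above; the function name `positivity_rate` is made up for this example, and the inputs are the hypothetical figures already used in this article.

```python
def positivity_rate(positives: int, tested: int) -> float:
    """Return positive cases as a percentage of the people tested."""
    if tested <= 0:
        raise ValueError("tested must be greater than zero")
    return 100.0 * positives / tested


# The same 3,000 positives measured against three different test sizes.
positives = 3_000
for tested in (30_000, 50_000, 100_000):
    rate = positivity_rate(positives, tested)
    print(f"{positives:,} positives out of {tested:,} tested = {rate:.0f}%")
```

The same headline count of 3,000 gives 10%, 6% or 3%, depending entirely on a denominator that the daily announcement may not include.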

Increasing or declining?

So, are we testing more people each day or are we testing fewer people? How many of these are repeatedly tested?

The daily figures, in this second sample of 3,000 to 5,000, have to be measured against the total number of people tested, whatever number has been recorded for that day. Only then would they represent meaningful data.

In the case of our daily published data, the number of people tested is often not revealed. This is indeed a major flaw in the presentation of the statistics.

Let us take a look at another hypothetical table, Table B, below.

Table B: A daily data set presented in whole numbers.

The illustration indicates that the number of people tested positive is increasing rapidly from 3,000 to 10,000 within three days.

On their own, these numbers are alarming. But when measured against the total number of people tested, in percentage format, they are not so shocking. In fact, the rate is fairly constant at about 10% for three days in a row.

When the data on the total number of people being tested is revealed, a different story appears. Perhaps it is best that this sort of data be publicised, if such data are available.

So, which of the two data sets did the prime minister refer to when he made his decision today?
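To see how a rising count and a flat rate can coexist, here is a small Python sketch. Only the rise from 3,000 to 10,000 positives and the rate of about 10% come from the scenario above; the middle day and all the numbers of people tested are assumptions invented purely for illustration.

```python
# Hypothetical three-day series: (label, people tested, positives).
# Only the 3,000 and 10,000 positive counts and the ~10% rate come from
# the article's scenario; every other figure is an invented assumption.
days = [
    ("Day 1", 30_000, 3_000),
    ("Day 2", 60_000, 6_000),
    ("Day 3", 100_000, 10_000),
]

# Positives as a percentage of people tested, day by day.
rates = [100.0 * positives / tested for _, tested, positives in days]

# How much the raw count grew versus how much the rate moved.
growth_in_positives = days[-1][2] / days[0][2]
spread_in_rate = max(rates) - min(rates)

for (label, tested, positives), rate in zip(days, rates):
    print(f"{label}: {positives:,} positives / {tested:,} tested = {rate:.1f}%")
print(f"Positives grew {growth_in_positives:.1f}x; "
      f"the rate moved by {spread_in_rate:.1f} percentage points")
```

Under these assumed figures, the headline count more than triples while the share of tests that come back positive does not move at all.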

Biased data

It has also been mentioned that our authorities do not undertake random checks and testing. Therefore, a true picture based on scientific analysis of a random sample cannot be obtained here.

Perhaps, too, random sampling is too expensive for the country to bear.

It has been reported also that the number of those who tested positive came from the suspected cases of people who were exposed to certain clusters. They were, therefore, tested purely on the basis of contact or suspicion due to their association with certain clusters.

Also, the same people are tested several times over a number of days in order to determine whether they are still positive. So, these same positive people are counted again and again, resulting in double or triple counting.

As such, the data presented are no longer reliable or valid.
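If repeat tests are indeed being counted as fresh cases, the effect is easy to demonstrate. The Python sketch below uses an entirely invented test log (the person IDs, days and results are all hypothetical) and compares counting positive tests with counting positive people.

```python
# Hypothetical test log: (person_id, day, result). All values are invented
# for illustration; this is not how any official data set is structured.
test_log = [
    ("P001", 1, "positive"),
    ("P001", 3, "positive"),  # repeat test on the same patient
    ("P001", 5, "positive"),  # another repeat
    ("P002", 1, "negative"),
    ("P003", 2, "positive"),
    ("P003", 4, "negative"),  # follow-up test
    ("P004", 2, "negative"),
]

# Counting every positive test result, repeats included.
positive_tests = sum(1 for _, _, result in test_log if result == "positive")

# Counting each person at most once, however often they were tested.
positive_people = len({pid for pid, _, result in test_log if result == "positive"})
people_tested = len({pid for pid, _, _ in test_log})

print(f"Positive test results: {positive_tests}")
print(f"Distinct positive people: {positive_people} of {people_tested} tested")
```

In this made-up log, four positive results belong to only two people, so a tally of test results would report twice as many cases as a tally of patients.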

In that sort of scenario, it is expected that positive cases would tend to be high. In statistical analysis, such data are referred to as biased and unreliable.

Therefore, for decision-making purposes, such data should not be relied upon and should be rejected outright.

One therefore wonders whether this is the sort of data fed to the prime minister’s office for him to decide on continuing the lockdown.

Stage 1, 2, 3 or 4?

In fact, data captured from testing in an identified zone, area or district, over a given sample, should not be used at all, as they do not represent the true state of the population.

We also know now that Covid-19 tests classify patients into four categories – stage 1, 2, 3 and 4. Shouldn’t we be presented with the breakdown of these categories, too?

We know now that those in stages 1 and 2 are less dangerous and they could be isolated at home.

We should not lump the data all together without differentiating them before using them to make important decisions.

If certain areas are high in numbers, especially for stages 3 and 4, then it would be wiser to go for targeted lockdown, as a matter of public health strategy.

It has been mentioned that clusters are now either factory-based or work-based. So, if such factories and offices are known, then why not lock them down instead of imposing a lockdown for the whole state or the whole country?

Targeted lockdown has proved to be more effective in the past. This is specific to certain areas or a particular site for a period of time only.

Lazy strategy

As mentioned by a Johor Umno leader today, we must not be too lazy to go down to the ground, identify a particular site and zoom in on a particular factory or office for a targeted lockdown.

We must not impose a blanket lockdown on all areas of the country simply because it is easier to do so.

This is what has been referred to as a lazy strategy. Punishing the whole population for a few delinquents is not exactly a beneficial strategy. It can be interpreted as politically motivated.

That is why, despite a continued lockdown over the last four weeks, the absence of a targeted lockdown strategy has led to the spread of the virus over a much larger area. This is a clear manifestation of the authorities’ level of competency in analysing and using data, which led to this poor decision-making.

Now we know the thinking behind the use of these data and the subsequent decision-making, which is glaringly flawed.

This is precisely the danger of published data that are poorly understood and come with no analysis. The subsequent interpretation of the raw data quoted does not inspire confidence, let alone amount to a decision well supported by clear empirical evidence.

Data from different areas or districts are lumped together into the state-based data, thus we have Selangor, for example, reporting more than 1,500 cases daily.

The data obtained from several industrial areas are not even representative of the entire state.

For instance, Sabak Bernam or Sepang districts should not be punished for the rise in new cases in a few factories in Shah Alam.

Similarly, Perlis or Terengganu should not be locked down because of a constant number recorded in Selangor. This is so obvious.

There is a definite flaw in using such data to represent a district, state or geographical location that does not possess similar basic characteristics, one of which is the presence of cluster zones.
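A short Python sketch shows how a state-level total can hide very different pictures at district level. The district labels and every figure below are invented for illustration; only the idea of a state reporting more than 1,500 cases a day comes from this article.

```python
# Hypothetical daily cases by district within one state.
# Every label and figure here is an invented assumption.
district_cases = {
    "District with factory clusters": 1_200,
    "District B": 40,
    "District C": 10,
    "All other districts": 300,
}

# The single number that a state-level report would show.
state_total = sum(district_cases.values())

# Each district's share of that total, as a percentage.
shares = {name: 100.0 * cases / state_total
          for name, cases in district_cases.items()}

print(f"State total: {state_total:,} cases")
for name, cases in district_cases.items():
    print(f"  {name}: {cases:,} cases ({shares[name]:.1f}% of state total)")
```

With these assumed numbers, one district accounts for the bulk of the state total, while others contribute a small fraction of it yet would fall under the same statewide restrictions.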

Sad to say, poor decisions are often made from inadequate analysis of statistics. – June 27, 2021.

* Rosli Khan reads The Malaysian Insight.

* This is the opinion of the writer or publication and does not necessarily represent the views of The Malaysian Insight. Article may be edited for brevity and clarity.

