
Misleading with Statistics

Numbers, and by extension statistics, have scared more students than perhaps any other subject. Physics scares students too, but reluctance toward physics tends to come naturally, while resistance to numbers and statistics often grows out of repeated failed attempts to master the art of crunching them. It is perhaps due to this fear that most of us show a propensity to rely on subject-matter experts instead. Today's expert on numbers is called a statistician. Their numbers and explanations are trusted with such confidence that one barely questions the reports they produce. But are those reports always objective? Even when the numbers themselves are intact, the meanings inferred from them are seldom concordant. That leads one to wonder: are statisticians truly capable of perpetually calibrating truth objectively? Aren't they, just like everyone else, susceptible to human error and bias? Isn't it too much to expect them to override natural human disposition?

To illustrate: in 2007, the Advertising Standards Authority (ASA) of the UK received a complaint against one of Colgate's most famous ad campaigns: 'More than 80% of dentists recommend Colgate.' It is a tagline most of us can recall when we think of Colgate. The ASA investigated the survey behind the claim. In that survey, dentists were asked to recommend toothpastes, and most of them did recommend Colgate; however, Colgate was only one among several brands they recommended. Moreover, not all dentists placed Colgate first on their list, even if they mentioned it at some point. This is a classic example of misleading with statistics rather than of false statistics: the survey was actually conducted, and Colgate was indeed recommended. But the advertisement makes it seem that 80% of dentists prefer Colgate over all other brands, which is what makes it misleading.

This article suggests ways to catch misleading data even if one is not well acquainted with numbers. The following sections list markers crucial for interpreting statistical data and outline ways in which people are commonly misled. Before beginning, it is essential to understand the three stages of statistical work: collecting the data, interpreting it, and representing it. There is scope for misleading at all three stages.

Unsurprisingly, the most common and rudimentary source of misleading is sampling. Sampling is the method of picking a subset of a larger population to gather the required data. A famous sampling error led to the wrong prediction of the United States presidential election in 1936. Literary Digest, a popular weekly magazine in the US, conducted a survey to predict the winner of that election. The magazine was famous for such predictions, which had proved accurate most times. This time was different: it predicted that Alfred Landon, not Franklin Roosevelt, would win by a landslide. The result declared Roosevelt the president, with almost double Landon's votes, and the failed prediction subsequently contributed to the magazine's demise. Upon much scrutiny from academics and statisticians, two reasons for the wrong prediction emerged. First, the sample was drawn from telephone books and automobile registration lists, so the haves of society were represented far more than the have-nots. Second, the response rate was higher among Landon supporters than among Roosevelt supporters. Both led to selection bias. Therefore, the type of sample, its size, and the response rates of different categories within the sample are of crucial importance.
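The Literary Digest failure can be reproduced in miniature. The sketch below is a toy simulation in Python; all of the numbers are invented for illustration, not taken from the 1936 electorate. It draws one fair sample and one sample from a frame that over-represents supporters of candidate A, then compares the two estimates against the true share:

```python
import random

random.seed(0)

# Hypothetical population: 40% support candidate A, 60% support candidate B.
population = ["A"] * 40_000 + ["B"] * 60_000

# Unbiased simple random sample.
fair_sample = random.sample(population, 1_000)

# Biased sampling frame: supporters of A (the wealthier group in this toy
# model) are three times as likely to appear in the frame at all.
biased_frame = [v for v in population
                if random.random() < (0.9 if v == "A" else 0.3)]
biased_sample = random.sample(biased_frame, 1_000)

def share_of_a(sample):
    """Estimated support for candidate A in a sample."""
    return sample.count("A") / len(sample)

print("true share of A:  0.40")
print(f"fair sample:      {share_of_a(fair_sample):.2f}")
print(f"biased sample:    {share_of_a(biased_sample):.2f}")
```

The biased sample reports candidate A well ahead even though B leads the population, which is exactly the kind of error a skewed sampling frame produces regardless of how large the sample is.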

Then comes interpreting the data obtained from the sample: finding meaning in the results. For example, in June 2018 Instagram reached one billion users. A consumer-behaviour analyst might interpret this as 'in the 21st century, advertising on Instagram is more profitable.' Data interpretation thus adds context to the findings and points to their possible applications. The most common error in interpretation is reading correlation as causation. Correlation between two variables suggests a relationship, such as the nature of their proportionality, direct or inverse. For example, a study in New York found a correlation between the number of ice creams sold and the number of homicides in the city. Does this mean ice cream causes homicides? No; the coincidence has been explained by rising temperature, as both ice cream sales and homicides tend to increase when it gets hotter. Correlation can neither establish nor predict causality. It is easy to spot the problem in an absurd correlation like the one above, but what if you see a more plausible claim, such as 'children who read more are more intelligent, since reading and intelligence are positively correlated'? In reality, reading does not cause intelligence despite the positive correlation. So always keep in mind that correlation does not imply causation, no matter how reasonable it looks.
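The ice-cream example can be sketched numerically. In the toy model below (all coefficients invented for illustration), temperature drives both ice cream sales and the homicide count, while neither variable affects the other. The two variables still come out strongly correlated:

```python
import random

random.seed(1)

# Hidden common cause: daily temperature over a year.
temps = [random.uniform(0, 35) for _ in range(365)]

# Both variables depend on temperature plus noise; neither depends
# on the other (coefficients are made up for this illustration).
ice_cream = [10 + 3.0 * t + random.gauss(0, 5) for t in temps]
homicides = [1 + 0.1 * t + random.gauss(0, 0.5) for t in temps]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(f"correlation(ice cream, homicides) = {pearson(ice_cream, homicides):.2f}")
```

The correlation is large and positive purely because of the shared confounder (temperature), which is the statistical shape of every 'correlation mistaken for causation' story.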

The third and last means of misleading with statistics is data representation. The data obtained and interpreted is finally presented in understandable, often catchy formats. Although the objective of representing data is quick, clear and easy understanding, representation is also an important means of misleading. The most widely used form of representation is the graph, followed by pie charts and tables. The following images illustrate misleading with graphs:

Fig. 1: Google statistics of COVID-19 cases in Andhra Pradesh on 23rd May (figure 1.png)
Fig. 2: Google statistics of COVID-19 cases in a second state on the same date (figure 2.png)

Although it is true that case numbers may not be reported accurately, for the sake of this example let's assume they are. A quick comparison shows that the number of cases is rising in both states. With such a reading we seldom compare the specifics of the rise. Focus on the Y-axis, for instance, and you will see that it is not the same in the two figures: although cases are rising in both states, the rates of rise differ. The axis metrics carry profound insights, yet we seldom take notice of them. Comparing the two states on how well they are handling COVID-19 from these graphs alone is therefore dubious at best.
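The same point can be made with numbers instead of pictures. In the sketch below (case counts invented for illustration, not real data for either state), both series 'rise', yet their average daily growth rates differ by nearly an order of magnitude; this is exactly the difference a mismatched Y-axis hides:

```python
# Hypothetical daily case counts for two states (invented numbers).
state_1 = [100, 120, 145, 175, 210]   # roughly +20% per day
state_2 = [100, 103, 106, 109, 112]   # roughly +3% per day

def daily_growth_rates(cases):
    """Relative day-over-day change for a series of counts."""
    return [(b - a) / a for a, b in zip(cases, cases[1:])]

for name, cases in [("state 1", state_1), ("state 2", state_2)]:
    rates = daily_growth_rates(cases)
    avg = sum(rates) / len(rates)
    print(f"{name}: average daily growth {avg:.0%}")
```

Plotted on separate axes scaled to fill the chart, both series would look like similar upward slopes; the growth rates reveal how different they really are.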

Following is a checklist for assessing an advertisement one may come across on TV, in pamphlets, or in social media posts:

  • Looking into the sample on which the study was carried out and checking whether it is representative of the population.

  • Remembering that correlation does not imply causation. A found relationship often only shows that the variables 'might' be related.

  • Never overlooking the metrics on the axes, and taking their scale (linear or logarithmic) into account.

  • Besides the technical observations, observing who is funding the research. This is crucial because many errors in sampling, interpretation and representation can be the result of funders' motives!

Happy debunking advertisements in the future!

Rajitha Panditharadyula
