When conducting statistical analysis, researchers try to reach conclusions based on key factors, metrics, and rigorous statistical indicators. Nevertheless, many statistical analyses have been proven wrong over the years.
For example, the famous 1998 study led by physician Andrew Wakefield claimed that the MMR vaccine caused autism. The study stirred fear, led many parents to hesitate over giving their children proper vaccination, and had fatal consequences. The study was later shown to be statistically biased: specifically, it was conducted on only 12 children who were susceptible to autism.
The study was fully retracted in 2010, twelve years after it was published, but by then the consequences were irreversible.
Reasons behind vaccination hesitancy
Another example is the meta-analysis of dietary fatty acids and risk of coronary heart disease, which concluded that eating fatty foods leads to coronary heart disease. This study contained multiple errors, flawed logical implications, and statistical bias. It had a huge impact on consumers’ choices and on dietary guidelines, and led a noticeable share of the world’s population to shift from natural fatty foods to fat-free processed food, resulting in increased rates of obesity and disease.
Obesity rates for different age groups
What incorrect studies tend to have in common is that they were not conducted by professionals, or that they may serve a larger financial purpose. As we saw, the studies mentioned above had a negative impact on the general population.
To learn how to spot incorrect studies, we will conduct a mock study, work out what went wrong with it, and finally derive key points and questions to ask when trying to distinguish correct studies from incorrect ones.
In this study, we look at the effect of immigration on the unemployment rate, specifically in Barcelona, Spain. The study covers data from 2015 to 2017 for multiple districts in Barcelona and is represented in the visuals below.
For more details, please refer to the link below:
https://app.powerbi.com/groups/me/reports/26839ccf-c9c6-480c-971c-721928c474a6/ReportSection
More concisely, we can see the unemployment rates for the different districts in Barcelona:
Unemployment rates for different districts in Barcelona
After conducting our study, we concluded that when the immigration rate increases, the unemployment rate also increases, with a clear linear relationship.
Relationship between immigrant rates and unemployment rates
On the other hand, a simple root cause analysis reveals an inconsistency in our findings. For example, the monotonic relationship between unemployment rate and immigration rate does not hold for all districts. This can be seen by comparing the sizes of the variables in images 2 and 6: the districts inside the bubbles have changed. This tells us that other factors may play a role in the unemployment rate, such as a district’s economic conditions or candidates being underqualified for a job. A further look at images 1 and 2 confirms this: we can see that the relationship between the number of unemployed people and the unemployment rate does not hold, which means that there are districts with a larger population but fewer opportunities.
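The gap between the number of unemployed people and the unemployment rate can be illustrated with a minimal sketch. The district names and figures below are hypothetical and are not taken from the Barcelona data; they only show how a ranking by count can differ from a ranking by rate.

```python
# Hypothetical figures for illustration only; not taken from the Barcelona data.
districts = {
    "District A": {"labor_force": 100_000, "unemployed": 8_000},
    "District B": {"labor_force": 40_000, "unemployed": 5_000},
    "District C": {"labor_force": 20_000, "unemployed": 3_000},
}


def unemployment_rate(unemployed, labor_force):
    """Share of the labor force that is unemployed (0.0 to 1.0)."""
    return unemployed / labor_force


# Rank districts by the absolute number of unemployed people.
by_count = sorted(
    districts, key=lambda d: districts[d]["unemployed"], reverse=True
)

# Rank the same districts by unemployment rate.
by_rate = sorted(
    districts, key=lambda d: unemployment_rate(**districts[d]), reverse=True
)

print("Ranked by count:", by_count)
print("Ranked by rate: ", by_rate)
```

In this made-up example the largest district has the most unemployed people but the lowest rate, which is why counts and rates should not be read interchangeably.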
What went wrong with this study?
Not all factors were taken into consideration. As seen in our conclusion, several other factors can drive unemployment rates, among them economic factors, underqualified candidates, a lack of opportunities, and unwillingness to work a specific job.
The counter logic was not tested. Since multiple factors can determine unemployment rates and there was no data for these factors, an isolation test should be made. For example, we need to at least show that if the immigration rate decreases, the unemployment rate also decreases.
The study was conducted on Barcelona’s population only. This is not enough to prove the hypothesis that immigration rates affect unemployment rates.
Data from different years was aggregated into a single point-in-time study.
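To see why pooling several years into one snapshot can mislead, here is a minimal sketch on synthetic numbers. The data is invented for illustration and is not the Barcelona data, and the `pearson` helper is a plain hand-written Pearson correlation rather than a library call. Within each year the two rates move in opposite directions, yet the pooled data shows a positive correlation because both rates drift upward from year to year.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic (immigration rate %, unemployment rate %) pairs per year.
# Invented for illustration only.
data = {
    2015: [(10, 12), (12, 11), (14, 10)],
    2016: [(15, 15), (17, 14), (19, 13)],
    2017: [(20, 18), (22, 17), (24, 16)],
}

# Correlation computed separately for each year.
per_year = {year: pearson(*zip(*points)) for year, points in data.items()}

# Correlation computed after pooling all years together.
all_points = [point for points in data.values() for point in points]
pooled = pearson(*zip(*all_points))

for year, r in per_year.items():
    print(year, "correlation:", round(r, 2))
print("pooled correlation:", round(pooled, 2))
```

The per-year correlations are negative while the pooled one is positive, so a study that aggregates years should at least report the per-year picture alongside the pooled one.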
What questions should I ask to determine if a study is correct?
Was this study conducted by a professional?
Such studies should be conducted by professional, reputable statisticians or data scientists.
Is there a greater purpose behind this study?
No study is conducted for free. We should look for a greater motive, whether it is financial or political.
Were all factors taken into consideration in the study?
As in our study, where we missed multiple factors that could determine the unemployment rate, other studies may omit such factors on purpose or due to a lack of data.
Was the population under consideration large enough to avoid statistical biases?
Studies should draw on a wide range of data to avoid statistical biases and coincidences, and to dilute the effect of factors that are specific to small samples.
Does the counter logic also hold?
For example, if the presence of factor A can lead to factor B, will the absence of factor A eliminate factor B?
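A first, rough way to probe the counter logic is to check whether year-over-year changes in the two rates point in the same direction. The sketch below uses hypothetical yearly rates for a single district (not the Barcelona data), and the helper names `yearly_changes` and `same_direction` are made up for this example. It is only a sign check, not a causal test: a real isolation test would also need to control for the other factors listed above.

```python
# Hypothetical yearly rates (percent) for one district.
# Invented for illustration only; not taken from the Barcelona data.
immigration = {2015: 14.0, 2016: 15.5, 2017: 15.0}
unemployment = {2015: 12.0, 2016: 11.0, 2017: 10.5}


def yearly_changes(series):
    """Change from each year to the next, keyed by the later year."""
    years = sorted(series)
    return {
        later: series[later] - series[earlier]
        for earlier, later in zip(years, years[1:])
    }


def same_direction(a, b):
    """True if both changes are increases or both are decreases."""
    return (a > 0 and b > 0) or (a < 0 and b < 0)


imm_changes = yearly_changes(immigration)
unemp_changes = yearly_changes(unemployment)

# For each year, did the two rates move in the same direction?
agreement = {
    year: same_direction(imm_changes[year], unemp_changes[year])
    for year in imm_changes
}

for year, agrees in agreement.items():
    print(year, "moved together" if agrees else "moved in opposite directions")
```

If the claimed relationship were real, we would expect the rates to move together in most years and districts; frequent disagreement is a signal that other factors are at work.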