Why Are Sample Sizes and Quality Control So Important?
When doing research, we want to know how things affect certain people. For example, a researcher may want to know how treatment X (a supplement or a drug) affects people with condition A, or people who live in area B.
Unfortunately, we can't study all the people with condition A or all the people who live in area B. It would take forever. So rather than do this, we take a random sample of participants from a population, we study the effects of a treatment on them, and then we generalize those results to the population. This saves us time and money.
This is how most scientific studies are designed: if the study can reliably show a cause-and-effect relationship in the sample, and if the sample is representative of the population, then you can infer that the same cause-and-effect relationship applies to the population.
In other words, most studies examine a sample of people from a population, see what kind of relationship shows up in that sample, and extrapolate it to the overall population.
Unfortunately, this is not how things always turn out.
Sample Sizes, Precision, and Noisy Data
Let’s use a coin to demonstrate.
If you had a fair coin, there’s a 50% probability of getting heads and a 50% probability of getting tails on any flip. Now, let’s flip it ten times. You might get five heads and five tails. You might also get ten heads and zero tails, or zero heads and ten tails. There are quite a few possibilities. The chance of getting exactly five heads and five tails is not that high (about 25%). You're only flipping it ten times. You might keep getting heads and wonder what the hell is going on.
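To put a number on "not that high", the exact chances can be worked out from the binomial distribution. This is a small sketch in Python (not the R used for the figures in this post), and the helper name prob_heads is made up for illustration.

```python
from math import comb

def prob_heads(k, n, p=0.5):
    """Probability of exactly k heads in n flips of a coin that lands heads with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

exactly_five = prob_heads(5, 10)   # 252 / 1024
all_heads = prob_heads(10, 10)     # 1 / 1024

print(f"Exactly 5 heads in 10 flips: {exactly_five:.3f}")
print(f"10 heads in 10 flips:        {all_heads:.5f}")
```

So a perfect 5/5 split shows up in only about one out of four sets of ten flips, and ten heads in a row in roughly one out of a thousand.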
However, let’s say you flipped the coin 1,000 times or 1,000,000 times. Now the ratio of heads to tails is a lot more likely to be close to 1:1. It might still vary slightly, but the ratio will be very close.
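If you'd rather see this than take it on faith, here is a rough Python simulation of the coin flips. The function name proportion_heads and the seeds are arbitrary choices for illustration, and your exact numbers will differ if you change the seed.

```python
import random

def proportion_heads(n_flips, p_heads=0.5, seed=0):
    """Flip a simulated coin n_flips times and return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips

for n in (10, 1_000, 1_000_000):
    share = proportion_heads(n, seed=n)  # a different arbitrary seed per run
    print(f"{n:>9,} flips: {share:.4f} heads")
```

With ten flips the share of heads can land almost anywhere, while with a million flips it should sit within a fraction of a percentage point of 0.5.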
This is the same with scientific studies. When you have small sample sizes, there’s a lot of random error that can influence the results you get.
An Example Using Drugs
Let’s use an example with simulated data from the statistical package R.
Below are two groups: Group A got the placebo, and Group B got Drug Z, a treatment that improves mood. Each dot is one individual's score in response to their treatment. Higher score = improvement in mood. The black horizontal lines represent the average of each group. The top graph has 10 participants in each group. The bottom graph has 80 participants in each group.
Let's say the TRUE effect of Drug Z on mood is an average increase of 0.6 points. This is the TRUE effect in the population (of course we don’t know this; we’re using studies to figure out what the real effect is!). So the true difference between Group A and Group B should be 0.6, because the placebo has no actual effect (hypothetically, in this scenario), while Drug Z has an effect (which we're trying to figure out).
Notice any difference? Yup. Having 80 participants in each group led to Group B having an average increase of 0.6 points, AKA the TRUE effect of Drug Z, whereas having 10 participants led to Group B having a whopping average of 2! It overestimated... by A LOT. Meaning if we only used 10 participants in each group, we'd think that Drug Z is waaaaay more useful than it is, based on a tiny sample that overestimated the effect because a few people responded unusually well. These are the effects of random error.
With larger sample sizes, just like with more coin flips, the group average begins to balance out because unusually high scores are offset by unusually low ones, and since the TRUE effect is 0.6 in this scenario, the average settles close to that value.
In summary, we can reduce the effects of random error on our averages with larger sample sizes, therefore increasing the precision of our overall data.
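One way to check this claim is to rerun the Drug Z study many times on a computer and look at how much the estimated effect bounces around. The sketch below is in Python rather than R, and it rests on assumptions the scenario doesn't spell out: scores are normally distributed, the placebo group averages 0, and individual scores have a standard deviation of 2 (SCORE_SD). The function names are made up for illustration.

```python
import random
import statistics

TRUE_EFFECT = 0.6   # from the scenario above
SCORE_SD = 2.0      # assumption: the post does not state the spread of scores

def estimated_effect(n_per_group, rng):
    """Run one simulated trial and return mean(Drug Z) - mean(placebo)."""
    placebo = [rng.gauss(0.0, SCORE_SD) for _ in range(n_per_group)]
    drug = [rng.gauss(TRUE_EFFECT, SCORE_SD) for _ in range(n_per_group)]
    return statistics.fmean(drug) - statistics.fmean(placebo)

def spread_of_estimates(n_per_group, n_trials=2000, seed=1):
    """Repeat the trial many times; return the mean and SD of the estimated effects."""
    rng = random.Random(seed)
    estimates = [estimated_effect(n_per_group, rng) for _ in range(n_trials)]
    return statistics.fmean(estimates), statistics.stdev(estimates)

for n in (10, 80):
    mean_est, sd_est = spread_of_estimates(n)
    print(f"n = {n:>2} per group: average estimate {mean_est:.2f}, spread (SD) {sd_est:.2f}")
```

Under these assumptions the theoretical standard error of the difference is SCORE_SD * sqrt(2 / n), which is about 0.89 with 10 per group and about 0.32 with 80 per group. Both sample sizes are right on average, but a single small study can easily miss by a point or more, while a single larger study rarely does.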
Larger sample sizes fix RANDOM ERROR but not SYSTEMATIC ERROR, which is referred to as bias in science.
Accuracy, Quality Control, and Risk of Bias
Let’s go back to our coin flip example. Say our coin wasn’t a fair coin: it was shaped all weird, so that overall it led to more heads than tails. We might flip this coin ten times and get several heads, maybe even several tails. Now, some people may think that they can get a proper balance of heads to tails if they start flipping the coin 1,000 times or 1,000,000 times.
Here's the reality though: flipping this unfair coin A MILLION times won’t solve this issue. We’ll still get noticeably more heads than tails in the long run, because this error (shaping the coin all weird) has changed the underlying probability.
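Here is the same kind of coin simulation with a lopsided coin, sketched in Python. The 60% chance of heads (P_HEADS) is a number picked purely for illustration, since no figure is given for how unfair the coin is, and share_of_heads is a made-up helper name.

```python
import random

P_HEADS = 0.6  # assumption: an illustrative value for the weird-shaped coin

def share_of_heads(n_flips, p_heads, seed=0):
    """Flip a simulated (possibly unfair) coin and return the fraction of heads."""
    rng = random.Random(seed)
    return sum(rng.random() < p_heads for _ in range(n_flips)) / n_flips

for n in (10, 1_000, 1_000_000):
    share = share_of_heads(n, P_HEADS, seed=n)
    print(f"{n:>9,} flips: {share:.4f} heads (a fair coin would give about 0.5)")
```

More flips make the result steadier, but it steadies around 0.6 rather than 0.5. The extra flips buy precision, not accuracy.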
The same applies to scientific studies. If there is no QUALITY control and there’s a high risk of bias in a study, large sample sizes will NEVER fix this. They will lead to results that may be precise but are not accurate, meaning they will be off from the TRUE result. This is demonstrated below.
Bias has to be addressed with quality control and cannot be fixed with larger sample sizes.
Sources of bias include things like improper randomization, lack of blinding, and measurement and sampling error (which are also influenced by random error). Luckily, with good quality control, you can address sources of bias (systematic errors) and reduce their effects in the first place.
So, if a meta-analysis (which combines several scientific studies, AKA combines sample sizes to increase precision) pools together several small studies with poor quality control, it's just adding more clutter to the scientific literature. It is NOT helping anyone.
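To make the pooling problem concrete, here is a hedged Python sketch that pools simulated small studies with a standard fixed-effect, inverse-variance average. Everything specific in it is an assumption for illustration: 50 studies of 10 people per group, normally distributed scores with a standard deviation of 2, and a shared systematic error (BIAS) that inflates the Drug Z group's scores by 0.5 in every study. Real meta-analyses involve more than this (for example, heterogeneity checks and random-effects models).

```python
import math
import random
import statistics

TRUE_EFFECT = 0.6   # from the Drug Z scenario
BIAS = 0.5          # assumption: every study inflates the drug group's scores by this much
SCORE_SD = 2.0      # assumption: spread of individual scores

def small_biased_study(n_per_group, rng):
    """Return (estimated effect, standard error) for one small, biased trial."""
    placebo = [rng.gauss(0.0, SCORE_SD) for _ in range(n_per_group)]
    drug = [rng.gauss(TRUE_EFFECT + BIAS, SCORE_SD) for _ in range(n_per_group)]
    estimate = statistics.fmean(drug) - statistics.fmean(placebo)
    se = math.sqrt(statistics.variance(drug) / n_per_group
                   + statistics.variance(placebo) / n_per_group)
    return estimate, se

def pool_fixed_effect(studies):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se**2 for _, se in studies]
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

rng = random.Random(42)
studies = [small_biased_study(10, rng) for _ in range(50)]
pooled, pooled_se = pool_fixed_effect(studies)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled estimate: {pooled:.2f} (95% CI {low:.2f} to {high:.2f}); true effect: {TRUE_EFFECT}")
```

With these made-up settings, the pooled estimate tends to land near 1.1 (the true effect plus the bias), and its confidence interval is narrow enough that it typically sits entirely above the TRUE effect of 0.6. Pooling shrank the random error and left the systematic error untouched.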