Hide Boring Results Using Within-Group Analyses

When we’re testing an intervention, we want to see whether it’s better than something we know doesn’t work (a placebo control) or something we know works (an active control). We want to know not only whether it works better than these controls, but also how much better.

These are known as between-group analyses because you’re comparing the averages between groups (treatment and control) as a function of time.

However, there’s another common analysis, the within-group analysis, and it can trick people into thinking that a treatment works when it’s actually no better than the control.

Note: This isn’t to say that within-group analyses are useless; they’re genuinely useful in several contexts. But I want to discuss how they can be narrated in a way that hides null findings.

Let’s Use Cholesterol

Imagine that you’re testing a new drug that might lower total cholesterol. You want to test it against something that doesn’t work (a placebo) or something that already works (a current known treatment for high cholesterol).

You gather 100 people with high cholesterol and randomize them into Group A (n = 50), which gets the new drug, or Group B (n = 50), which receives the control.

At the beginning of the study, you calculate the average total cholesterol in both groups. Both averages are incredibly high (400 points) and identical. Perfect. Now you give Group A the treatment and Group B the placebo, which they must take every day, and they come back in 20 weeks. You calculate each group’s average again after the 20 weeks and find that it’s now 350 points in both groups. So, there is no difference between the averages.

You would report this in a paper by saying “there was no significant difference in total cholesterol over 20 weeks between Groups A and B.” This tells you that the new treatment isn’t better than placebo.
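To make this concrete, here is a minimal Python sketch of the between-group comparison. All the numbers (baseline spread, size of the drop, the use of an independent-samples t-test on change scores) are assumptions chosen for illustration, not data from any real trial.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50

# Both groups start around 400 points; over 20 weeks both drop by roughly
# 50 points through the placebo effect and regression towards the mean alone.
baseline_a = rng.normal(400, 30, n)   # new drug group
baseline_b = rng.normal(400, 30, n)   # placebo group
followup_a = baseline_a - rng.normal(50, 20, n)
followup_b = baseline_b - rng.normal(50, 20, n)

# Between-group analysis: compare the two groups' change scores directly.
change_a = followup_a - baseline_a
change_b = followup_b - baseline_b
t, p = stats.ttest_ind(change_a, change_b)
print(f"Between-group test on change scores: t = {t:.2f}, p = {p:.3f}")
# p is typically well above 0.05 here: no evidence the drug beats placebo.
```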

But here’s the thing: the averages for each group did get lower over time. Group A’s average total cholesterol went down 50 points. So did Group B’s. There was no difference between them at the end, both sat at 350 points, but both had decreased from 400 points.

The Importance of Control Groups

This improvement could be attributed to the placebo effect, but also to a phenomenon known as regression towards the mean: extreme values tend to become more average on remeasurement simply due to randomness. People who look incredibly unhealthy at one point in time will, on average, look closer to normal (towards the mean) when measured again, even without any treatment.
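Here is a small, purely hypothetical simulation of regression towards the mean. The population values, the measurement noise, and the 300-point enrolment cut-off are all made-up numbers, but they show how a group selected for extreme screening values drifts back towards the average on remeasurement with no treatment at all.

```python
import numpy as np

rng = np.random.default_rng(1)

true_level = rng.normal(220, 30, 100_000)                     # stable underlying cholesterol
screening = true_level + rng.normal(0, 25, true_level.size)   # noisy first measurement
followup = true_level + rng.normal(0, 25, true_level.size)    # second measurement, no treatment

enrolled = screening > 300                                    # enrol only extreme screening values
print(f"Screening mean of enrolled: {screening[enrolled].mean():.0f}")
print(f"Follow-up mean of enrolled: {followup[enrolled].mean():.0f}")
# The follow-up mean is noticeably lower than the screening mean, purely
# because part of each extreme screening value was random noise that
# doesn't repeat itself on the second measurement.
```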

Thus, both groups were bound to improve over time anyway. The purpose of the study was to see whether, even knowing both groups would improve, the new treatment would produce additional improvement on top of the placebo effect and regression towards the mean.

And it didn’t. There was no between-group difference in averages; both groups landed at 350. But again, they landed at 350, not 400. So researchers will compare the average from the beginning of the study (400 points) to the average at the end of the study (350 points) with a statistical test. Obviously, there’s a 50-point within-group difference in both groups, including the new treatment group, and from this data they will report,

“There was a significant improvement in total cholesterol over time for Group A,” and then conclude the paper with “the drug works,” even though it was no different from placebo. In fact, they may choose not to focus on the between-group differences at all (remember, there were none), but rather on the within-group differences over time.
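And here is the same kind of simulated trial reanalysed the within-group way. Again, this is a sketch with assumed numbers, and the paired t-test stands in for whatever pre/post test a given paper might use; the point is that both groups’ within-group changes come out “significant” even though the between-group comparison was null.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50

baseline_a = rng.normal(400, 30, n)   # new drug group
baseline_b = rng.normal(400, 30, n)   # placebo group
followup_a = baseline_a - rng.normal(50, 20, n)
followup_b = baseline_b - rng.normal(50, 20, n)

# Within-group analysis: compare each group only against its own baseline.
for name, pre, post in [("Group A (drug)", baseline_a, followup_a),
                        ("Group B (placebo)", baseline_b, followup_b)]:
    t, p = stats.ttest_rel(pre, post)
    print(f"{name}: mean change = {(post - pre).mean():+.0f} points, p = {p:.2g}")

# Both paired tests are highly "significant", because both groups improved by
# ~50 points. Reporting only Group A's p-value hides that the placebo group
# improved just as much.
```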

And that is one way to spin boring results into publishable findings.

Again, the entire point of having a control group in the first place was to see whether the intervention would produce improvements beyond what could be attributed to the placebo effect and regression towards the mean. When you focus only on improvements over time without comparing them to a control, you lose the ability to distinguish real treatment effects from noise and placebo.

Here’s a nice paper discussing this issue in greater detail.

3 thoughts on “Hide Boring Results Using Within-Group Analyses”

  1. What about the case where a between-subjects test shows no difference between groups at baseline and no significant difference between groups at post-test, but a within-subjects test is significant for the intervention and not for the control? In other words, the control did not change at all, while the intervention changed when compared against itself, yet still sits within the noise band of the variation around the control group. Is this still worth discussing?
