Note: Much of this article builds on my previous blog post on fixed-effects and random-effects models, so I would highly recommend reading that post if you don’t know the difference between the two models or are a bit shaky on it. Most of this post is based on the work of Borenstein and colleagues (Borenstein, Hedges, Higgins, & Rothstein, 2011).

Most people don’t consider the impact of statistical power on meta-analyses, but it’s worth considering because a meta-analysis **can be underpowered**, and it may not be for the reason you think. It sounds bizarre, doesn’t it? An underpowered meta-analysis? Doesn’t that defeat one of its purposes?

If you’re not sure what I mean, let’s brush up on why we do meta-analyses in the first place. A meta-analysis combines the treatment effects from several studies to get an overall effect. Here are some reasons why we conduct meta-analyses:

- To increase precision and estimate the true effect or average of true effects
- To increase statistical power (the ability to detect an effect when there is one)
- To see how robust an effect is across the literature
- To explore sources of true dispersion across effects (heterogeneity)
- To also assess things like risk of bias, study quality, publication bias, etc.

**Understanding Power Conceptually**

Since this post is about statistical power, I think it’s essential to understand what it is conceptually, and the next few sections are worth reading even if you have a full grasp of power, just to better follow the article. There’s an excellent analogy that Motulsky (2014) uses to describe statistical power. It goes something like this:

Suppose you have a key (effect) in your room and you are looking for it. Now imagine two scenarios and how likely you are to find the key in both scenarios.

**Scenario 1:**

- The key is small (small effect)
- You are the only person looking for it
- There’s a lot of clutter in the room (lots of variance)

**Scenario 2:**

- The key is large (large effect)
- There are several people looking for it
- The room is organized (not too much variance)

In which scenario are you more likely to find the key? If you’re blessed with extrasensory perception (Bem, 2011), probably scenario 1, and I’d also wonder why you’re reading this blog post in the first place. But, if you’re a mere mortal, probably scenario 2.

**Understanding Power Mathematically**

Now that I’m 95% confident that I’ve driven the concept of statistical power home conceptually, I want to talk about it a bit more mathematically. For example, **why** do larger sample sizes increase statistical power and why do larger effect sizes also do this?

We can use a test statistic to understand. The equation below calculates a Z score from our mean effect (M) divided by the standard error of the mean (SE):

$$Z = \frac{M}{SE}$$

We compare this Z score against the standard normal distribution to see whether our result is significant. So, if we set our significance level to .05 (two-tailed) and consider every p value below .05 to be significant, then a Z score above 1.96 (or below −1.96) is statistically significant. If our Z score falls between −1.96 and 1.96, it’s not significant.

We can get larger Z scores if the effect size (M) increases or if the standard error (SE) decreases (e.g., through a larger sample size). And if we choose a higher alpha level, say .10, then we need a smaller Z score (1.645 rather than 1.96) to achieve statistical significance.

So we can establish that the following increase statistical power (ability to find a significant effect):

- Larger effect sizes (M)
- Larger sample sizes, which lower standard error (SE) and **increase precision**
- Larger alpha levels, which lower the Z score threshold for significance
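To make these three levers concrete, here’s a minimal sketch that computes power for a two-sided Z test under the normal approximation. The specific numbers (M = 0.5, SE = 0.2, etc.) are hypothetical values chosen for illustration, not figures from the post:

```python
from math import erf, sqrt

def normal_cdf(x):
    # Standard normal CDF, computed via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_sided(M, SE, alpha_z=1.96):
    # Expected Z score given the true effect M and standard error SE
    lam = M / SE
    # Power = probability the observed Z lands beyond either critical value
    return (1 - normal_cdf(alpha_z - lam)) + normal_cdf(-alpha_z - lam)

print(power_two_sided(0.5, 0.2))               # larger effect: higher power
print(power_two_sided(0.2, 0.2))               # smaller effect: lower power
print(power_two_sided(0.2, 0.2, alpha_z=1.645))  # alpha = .10: power rises again
```

Running it shows exactly the pattern above: a bigger M, a smaller SE, or a laxer critical value each push power upward.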

OK, now we can talk about power in meta-analyses.

**The Effects of Meta-Analysis Models on Power**

Generally, in a meta-analysis, the number of participants will affect the statistical power. The reason is that larger sample sizes **lower standard error**, and as we saw above, lower standard error = higher Z statistic = more likely to be significant = higher power. But while statistical power is largely influenced by the number of participants, **the meta-analysis model** we choose will also affect it.

**Statistical Power in Fixed-Effects Models**

In a fixed-effects meta-analysis, where you’re assuming there is **one true effect** that is the same for all studies, your statistical power is primarily influenced by the total number of participants in your meta-analysis. Remember, the number of participants influences standard error. Here’s the general equation (more conceptual than applied) for the standard error in a fixed-effects model, where k is the number of studies, σ is the standard deviation, and n is the number of participants per study:

$$SE_M = \sqrt{\frac{\sigma^2}{kn}}$$

Let’s say you hold the standard deviation constant. Whether you include two studies with 500 participants each or ten studies with 100 participants each, you’ll get a total of 1,000 participants in both scenarios, and the same standard error in both scenarios (again, if the standard deviation is held constant), hence the same power. Try it! Plug these numbers into the equation above. You’ll get the same bottom term (1,000), which will give you the same standard error.
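As a quick sketch, we can plug both scenarios into the conceptual fixed-effects formula SE = √(σ²/(k·n)), holding σ at an arbitrary 1.0:

```python
from math import sqrt

def se_fixed(k, n, sigma=1.0):
    # Conceptual fixed-effects standard error: sqrt(sigma^2 / (k * n))
    # k = number of studies, n = participants per study
    return sqrt(sigma**2 / (k * n))

# Scenario A: 2 studies of 500 participants; Scenario B: 10 studies of 100
print(se_fixed(k=2, n=500))   # 1,000 participants total
print(se_fixed(k=10, n=100))  # 1,000 participants total; identical SE
```

Both calls divide by the same bottom term (1,000), so the standard errors, and therefore the power, come out identical.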

**Statistical Power in Random-Effects Models**

In a random-effects model, the number of participants influences your statistical power, just as in the fixed-effects model, but so does the number of studies. So, let’s reuse the numbers from above as an example.

**Scenario 1:** You include 2 studies with 500 participants each = 1,000

**Scenario 2:** You include 10 studies with 100 participants each = 1,000

Which do you think has more statistical power in a random-effects model? Is one higher than the other, or are they equal?

It’s scenario two because you have more studies. Even though both situations give you 1,000 participants in total, you have more studies in scenario two, which provides you with a lot more statistical power.

Let’s look at this mathematically. Here’s the (again, conceptual) equation for the standard error in a random-effects model, where τ² is the between-study variance:

$$SE_M = \sqrt{\frac{\sigma^2}{kn} + \frac{\tau^2}{k}}$$

Compare it to the one above for the fixed-effects model. Notice any difference? Yeah. There’s a new part of the equation: the **between-study variance** (τ²) divided by **the number of studies** (k). The between-study variance is divided not by the number of participants but by the number of studies you have. If you don’t have much variance, and you have many studies, the second term will be **small**, and it won’t add much to the standard error. But if you have lots of between-study variance and very few studies, the second term will add more to your standard error, and it will be higher even if the number of participants is large. The only way to make the second term negligible is to have an **infinite number of studies**.
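We can sketch the same two scenarios under the conceptual random-effects formula SE = √(σ²/(k·n) + τ²/k), with an arbitrary σ = 1.0 and an assumed between-study variance τ² = 0.1:

```python
from math import sqrt

def se_random(k, n, sigma=1.0, tau2=0.1):
    # Conceptual random-effects SE: within-study term plus tau^2 / k,
    # where tau2 is the between-study variance and k the number of studies
    return sqrt(sigma**2 / (k * n) + tau2 / k)

# Same 1,000 participants in total, different numbers of studies
print(se_random(k=2, n=500))   # few studies: tau^2 / k adds a lot
print(se_random(k=10, n=100))  # many studies: tau^2 / k shrinks
```

Despite equal participant totals, the ten-study scenario yields a clearly smaller standard error, because τ² is spread over more studies.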

The conceptual equations above aren’t the ones actually used in a meta-analysis. In practice, the standard error of the summary effect is the square root of the inverse of the summed study weights, $SE_M = \sqrt{1/\sum W_i}$, where each study’s weight is the inverse of its variance: $W_i = 1/V_i$ in a fixed-effects model, or $W_i = 1/(V_i + \tau^2)$ in a random-effects model. So the weights are calculated differently depending on the model that’s chosen, but the conceptual equations I used above to get my point across are still valid.

So what have we learned so far? In a fixed-effects model, the standard error is primarily influenced by the number of participants, while in a random-effects model, it’s influenced by the number of participants AND the between-study variance/the number of studies.

A random-effects model makes more sense in the real world because there are very few instances where a fixed-effects model is appropriate. Again, see my previous blog post on the assumptions behind these models.

**Underpowered Meta-Analyses**

While most meta-analyses are likely to have reasonably good power, some may not. Reasons include:

- A lack of studies (mainly affects random effects) and/or participants (affects both)
- Lots of true variance (heterogeneity)
- Using dichotomous outcomes
- Looking at outcomes that are difficult to detect because they are rarer (e.g., side effects)

So, why would an underpowered meta-analysis be a problem? Imagine that your meta-analysis has the following characteristics:

- You’re using odds ratios to pool the number of people who experienced adverse events
- You’re using a random-effects model (based on your assumptions)
- You haven’t found many studies
- There’s a lot of between-study variance

What’s the result?

Low power. And then you’ll falsely conclude that XX drug doesn’t result in a statistically significant increase in adverse events. What’s the solution to this? Doing a power analysis before starting a meta-analysis, which will help you figure out how many studies/participants you’d need to achieve a certain amount of power. Post-hoc power analyses aren’t very useful unless you report confidence intervals, which give you more information than “our meta-analysis had low power, which is why we couldn’t detect an effect.”
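As a rough illustration of what an a priori power analysis looks like (this is a sketch under the conceptual formulas from earlier, not the actual procedure any published tool uses, and all the input numbers are hypothetical):

```python
from math import erf, sqrt

def normal_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def meta_power(k, n, delta, tau2=0.0, sigma=1.0, alpha_z=1.96):
    # Two-sided power under the conceptual random-effects standard error
    se = sqrt(sigma**2 / (k * n) + tau2 / k)
    lam = delta / se
    return (1 - normal_cdf(alpha_z - lam)) + normal_cdf(-alpha_z - lam)

def studies_needed(n, delta, target=0.8, tau2=0.0):
    # Smallest number of studies reaching the target power (capped at 200)
    for k in range(1, 201):
        if meta_power(k, n, delta, tau2) >= target:
            return k
    return None

# Hypothetical inputs: true effect 0.3, 50 participants per study,
# between-study variance tau^2 = 0.05, aiming for 80% power
print(studies_needed(n=50, delta=0.3, tau2=0.05))
```

The point of running something like this before the meta-analysis is that it turns “we need enough studies” into a concrete number of studies for the effect size and heterogeneity you expect.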

The calculations by hand are somewhat complicated and tedious, but there is a reasonably straightforward R script that folks like Dan Quintana have developed to do this. I can’t vouch for the R script because I haven’t used it myself, but I thought it would be worth sharing.

**References**

- Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. *Introduction to Meta-Analysis*. John Wiley & Sons; 2011.
- Motulsky H. *Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking*. Oxford University Press; 2014.
- Bem DJ. Feeling the future: experimental evidence for anomalous retroactive influences on cognition and affect. *J Pers Soc Psychol*. 2011;100(3):407-425.

Great

RE: “To also assess things like risk of bias, study quality, publication bias, etc.” My understanding was that meta-analyses are conducted using a sample of published studies. Doesn’t this subject them to publication bias? Or has that method of sampling changed?

It does. My article before this one was on publication bias and some methods of detecting it. Our current best way to deal with publication bias is to extensively search the published literature, the gray literature, and unpublished literature. Only a few research groups like Cochrane and Campbell do this, which is why their systematic reviews are of much higher quality than the vast majority that are published (which only add noise).

And it is possible to obtain unpublished research. For example, in 2008, Irving Kirsch published a meta-analysis of antidepressants using both published and unpublished research. He got his hands on the unpublished research by invoking the Freedom of Information Act and pressing the FDA and pharmaceutical companies to hand over registered study data that had been put in a file drawer (preregistering with the FDA is a requirement for pharmaceutical trials).