The Bradford Hill Criteria Don’t Hold Up

In 1965, the epidemiologist Austin Bradford Hill, who helped link smoking to lung cancer, gave a lecture in which he presented his viewpoints (often called criteria) on how we can get from correlation to causation.

This lecture was a bit of a game changer at the time, given that the tobacco industry was employing statisticians, medical doctors, and even popular science writers to push the idea that the relationship between smoking and lung cancer was merely correlational, not causal.

The tobacco industry using physicians to market its products

Both the tobacco industry and empiricists argued that the existing data were not very convincing because there were no human experiments showing that smoking causes lung cancer. Of course, this was a sticky situation, because it was neither ethical nor practical to randomize people, force some of them to smoke cigarettes, and compare their rates of lung cancer to those of a control group.

Ronald Fisher, the pioneer of modern statistics and randomization, smoking from his pipe

Austin Bradford Hill and his co-investigator, Richard Doll, used compelling data from the British Doctors’ Study, combined with other kinds of data (molecular, cellular) and a set of criteria, to argue that smoking cigarettes does indeed cause lung cancer.

Since then, these criteria have been used as a sort of checklist in several papers and by several authors to argue causality when randomized trials weren’t possible. For example, Science-Based Medicine (a blog that I am mostly a fan of) often refers to these criteria and recently discussed their use in a book about hormonal therapy.

In this post, drawing heavily from Rothman, Greenland & Lash, 2008 (from here on, RGL), I want to discuss these viewpoints and why many of them often don’t hold up.

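Hill’s nine viewpoints were the following: strength, consistency, specificity, temporality, biological gradient, plausibility, coherence, experiment, and analogy.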


Strength

Stronger associations, according to Hill, were more compelling evidence of a causal relationship than weaker associations, because unmeasured confounding could more easily produce a weak association between two phenomena. He uses the example of smoking and lung cancer and compares it to coronary thrombosis in smokers,

“...prospective inquiries into smoking have shown that the death rate from cancer of the lung in cigarette smokers is nine to ten times the rate in non-smokers and the rate in heavy cigarette smokers is twenty to thirty times as great. On the other hand the death rate from coronary thrombosis in smokers is no more than twice, possibly less, the death rate in non-smokers.”

It does indeed sound compelling. Science-Based Medicine uses the same argument for alternative medicine,

“If acupuncture or homeopathy were 400 times superior to placebo, there would be no discussion of its validity. Many medical therapies are not 400 times as effective as placebo, but the strength of the association between cause and effect is well above background noise.”

Why It Doesn’t Hold Up

Several causal relationships that we know of today, such as those between smoking and cardiovascular disease and between environmental tobacco smoke and lung cancer, involve weak associations. However, Hill didn’t discount weak associations, as seen here,

“In thus putting emphasis upon the strength of an association we must, nevertheless, look at the obverse of the coin. We must not be too ready to dismiss a cause and effect hypothesis merely on the grounds that the observed association appears to be slight. There are many occasions in medicine when this is in truth so.”

Okay, but really strong associations are more likely to be causal, right? Not necessarily. This criterion falls apart when we consider relationships that are strongly associated but noncausal. For example, there is a very strong association between Down syndrome and birth rank. However, this association is confounded by maternal age, which is related to both Down syndrome and birth rank. Once we adjust for this confounder, the association between Down syndrome and birth rank wanes.
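To make the confounding argument concrete, here is a minimal sketch in Python. The counts are hypothetical and invented purely for illustration (they are not real Down syndrome data), and the names `strata`, `risk_ratio`, and `pooled` are my own. Within each maternal-age stratum the risk is identical for high and low birth rank, yet pooling the strata produces a strong crude association.

```python
# Hypothetical counts, invented for illustration only (not real data).
# Each entry is (cases, births) for one maternal-age stratum and birth-rank group.
strata = {
    "younger mothers": {"high_rank": (1, 1000), "low_rank": (9, 9000)},
    "older mothers": {"high_rank": (90, 9000), "low_rank": (10, 1000)},
}


def risk_ratio(exposed, unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    (cases_1, n_1), (cases_0, n_0) = exposed, unexposed
    return (cases_1 / n_1) / (cases_0 / n_0)


def pooled(group):
    """Add up cases and births for one birth-rank group, ignoring maternal age."""
    cases = sum(s[group][0] for s in strata.values())
    births = sum(s[group][1] for s in strata.values())
    return cases, births


# Crude comparison: maternal age is ignored.
crude_rr = risk_ratio(pooled("high_rank"), pooled("low_rank"))

# Stratified comparison: birth rank is compared within each maternal-age group.
stratum_rrs = {
    name: risk_ratio(s["high_rank"], s["low_rank"]) for name, s in strata.items()
}

print(f"crude risk ratio: {crude_rr:.2f}")
for name, rr in stratum_rrs.items():
    print(f"risk ratio among {name}: {rr:.2f}")
```

With these made-up numbers the crude risk ratio is roughly 4.8, while the risk ratio inside each maternal-age stratum is 1. The whole crude association comes from older mothers being both more likely to have high-birth-rank children and at higher risk.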

There is no reason to believe that this wouldn’t apply to many associations that happen to be very strong. It’s possible that many strong associations that are not due to chance are instead the result of several unmeasured confounders and strong bias. We must be comfortable with the likelihood that most effects studied in the real world are small and that few relationships of interest resemble the one between smoking and lung cancer. As we can see, the strength of an association doesn’t necessarily help determine causality.


Consistency

Hill defines something as being consistent if it has been observed in different people, in different places, and under different circumstances. Hill writes,

“Some of them on the customary tests of statistical significance will appear to be unlikely to be due to chance. Nevertheless whether chance is the explanation or whether a true hazard has been revealed may sometimes be answered only by a repetition of the circumstances and the observations.”

Science-Based Medicine writes,

“Almost every study should support the association for there to be causation.”

Why It Doesn’t Hold Up

Several causal relationships may be circumstantial or unique to a particular population and may not occur in other scenarios; however, this does not rule out causality. RGL provide a great example of this: blood transfusions can lead to the recipient being infected with HIV, but this *only* occurs if the virus is present in the donor’s blood. If there is no virus, then the transfusion cannot cause HIV infection. The fact that blood transfusions don’t cause HIV infection in many scenarios doesn’t mean they don’t cause it in particular scenarios (those with the virus present).


Specificity

Hill described specificity in two components: that a specific cause leads to a single effect, and that an effect has one cause. He writes,

“One reason, needless to say, is the specificity of the association, the third characteristic which invariably we must consider. If as here, the association is limited to specific workers and to particular sites and types of disease and there is no association between the work and other modes of dying, then clearly that is a strong argument in favor of causation.”

Why It Doesn’t Hold Up

This one makes little sense given that many exposures (such as smoking) are known to contribute to several diseases and outcomes. However, Hill also acknowledged this in the same lecture,

“We must also keep in mind that diseases may have more than one cause. It has always been possible to acquire a cancer of the scrotum without sweeping chimneys or taking to mulespinning in Lancashire. One-to-one relationships are not frequent. Indeed I believe that multi-causation is generally more likely than single causation though possibly if we knew all the answers we might get back to a single factor.”


Temporality

Hill argued that for a relationship to be causal, the cause needs to precede the effect,

“My fourth characteristic is the temporal relationship of the association – which is the cart and which is the horse? This is a question which might be particularly relevant with diseases of slow development. Does a particular diet lead to disease or do the early stages of the disease lead to those particular dietetic habits?

Does a particular occupation or occupational environment promote infection by the tubercle bacillus or are the men and women who select that kind of work more liable to contract tuberculosis whatever the environment – or, indeed, have they already contracted it? This temporal problem may not arise often, but it certainly needs to be remembered, particularly with selective factors at work in the industry.”

Why It Holds Up

As RGL point out, this is inarguable,

“This criterion is inarguable, insofar as any claimed observation of causation must involve the putative cause C preceding the putative effect D. It does not, however, follow that a reverse time order is evidence against the hypothesis that C can cause D. Rather, observations in which C followed D merely show that C could not have caused D in these instances, they provide no evidence for or against the hypothesis that C can cause D in those instances in which it precedes D. Only if it is found that C cannot precede D can we dispense with the causal hypothesis that C could cause D.”

Thus, this criterion does hold up and is a necessity.

Biological Gradient

Hill’s fifth criterion refers to a dose-response relationship. For example, the more cigarettes one smokes, the higher the likelihood of developing lung cancer.

“Fifthly, if the association is one which can reveal a biological gradient, or dose-response curve, then we should look most carefully for such evidence. For instance, the fact that the death rate from cancer of the lung rises linearly with the number of cigarettes smoked daily, adds a very great deal to the simpler evidence that cigarette smokers have a higher death rate than non-smokers.

The comparison would be weakened, though not necessarily destroyed, if it depended upon, say, a much heavier death rate in light smokers and a lower rate in heavier smokers. We should then need to envisage some much more complex relationship to satisfy the cause and effect hypothesis. The clear dose-response curve admits of a simple explanation and obviously puts the case in a clearer light.”

Dose-response curve for three pathogens

Why It Doesn’t Hold Up

RGL mention several problems with this criterion.

First, although Hill mentions a “linear” relationship explicitly, he does not specify on which scale a relationship should be linear. Linear gradients on scales such as risk can easily become nonlinear on other scales such as log risk, odds, or log odds.
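The scale problem is easy to check with a few lines of Python. This is only a sketch: the risks below are hypothetical values I chose so that risk rises by exactly 0.10 per unit of dose, and the names `risks`, `steps`, and `scales` are my own.

```python
import math

# Hypothetical risks at doses 0, 1, 2, 3, chosen to be exactly linear in dose.
risks = [0.10, 0.20, 0.30, 0.40]


def steps(values):
    """Change in the value from one dose level to the next."""
    return [b - a for a, b in zip(values, values[1:])]


# The same four risks expressed on four different scales.
scales = {
    "risk": risks,
    "log risk": [math.log(r) for r in risks],
    "odds": [r / (1 - r) for r in risks],
    "log odds": [math.log(r / (1 - r)) for r in risks],
}

# A straight line has equal steps; unequal steps mean the gradient is curved.
for name, values in scales.items():
    print(f"{name:>8}: {[round(s, 3) for s in steps(values)]}")
```

The steps are constant only on the risk scale. On the log risk scale they shrink (about 0.69, 0.41, 0.29), and on the odds scale they grow, so the same data look linear or curved depending on the scale one picks.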

Second, although it is plausible that more carcinogenic exposure would lead to more tissue damage and a higher risk of developing lung cancer, there are causal relationships that are not monotonic dose-response relationships. For example, the relationship between DES and adenocarcinoma of the vagina is causal. However, it has been observed to be a threshold effect rather than a monotonic one.

Third, many monotonic dose-response relationships are not causal and are sometimes a result of confounding, in that the confounder itself is the cause of the monotonicity (maybe because the confounder itself follows a biological gradient).
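A small sketch in Python shows how a confounder alone can manufacture a gradient. All numbers are hypothetical and the variable names are mine. The outcome risk depends only on the confounder, never on the exposure, but the confounder becomes more common at higher exposure levels.

```python
# Hypothetical numbers, invented for illustration only.
# Share of people who carry the confounder at each exposure level.
confounder_share = {0: 0.1, 1: 0.3, 2: 0.6, 3: 0.9}

# Outcome risk depends only on the confounder, not on the exposure.
risk_without, risk_with = 0.02, 0.10

# Crude risk at each exposure level is a weighted mix of the two risks.
crude_risk = {
    level: share * risk_with + (1 - share) * risk_without
    for level, share in confounder_share.items()
}

for level, risk in crude_risk.items():
    print(f"exposure level {level}: crude risk {risk:.3f}")
```

The crude risk climbs steadily with exposure level even though, by construction, the exposure does nothing. Anyone who ignored the confounder would see a clean dose-response curve.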

As RGL summarize,

“These issues imply that the existence of a monotonic association is neither necessary nor sufficient for a causal relation.”


Plausibility

Hill argued that if a relationship seems biologically plausible, then that is further evidence in favor of a causal relationship. However, he also acknowledged that judgments of plausibility depend on the knowledge available at the time.

“It will be helpful if the causation we suspect is biologically plausible. But this is a feature I am convinced we cannot demand. What is biologically plausible depends upon the biological knowledge of the day.”

Why It Doesn’t Hold Up

As Hill pointed out, this criterion depends heavily on the knowledge of the observer and their prior beliefs. What may seem biologically plausible to one researcher may seem entirely impossible to another, even when the same research is available to both of them. And many times, beliefs about plausibility can be flat out wrong,

RGL: “... Cheever in 1861, who had been commenting on the etiology of typhus before its mode of transmission (via body lice) was known:

It could be no more ridiculous for the stranger who passed the night in the steerage of an emigrant ship to ascribe the typhus, which he there contracted, to the vermin with which bodies of the sick might be infested. An adequate cause, one reasonable in itself, must correct the coincidences of simple experience.

What was to Cheever an implausible explanation turned out to be the correct explanation, because it was indeed the vermin that caused the typhus infection. Such is the problem with plausibility: It is too often based not on logic or data, but only on prior beliefs. This is not to say that biologic knowledge should be discounted when a new hypothesis is being evaluated, but only to point out the difficulty in applying that knowledge.”


Coherence

Hill describes coherence as,

“On the other hand the cause-and-effect interpretation of our data should not seriously conflict with the generally known facts of the natural history and biology of the disease – in the expression of the Advisory Committee to the Surgeon-General it should have coherence.”

This viewpoint is a bit difficult to discuss without repeating some of the arguments made against the plausibility criterion and the consistency criterion. However, Hill does elaborate a bit more on this criterion,

“Nevertheless, while such laboratory evidence can enormously strengthen the hypothesis and, indeed, may determine the actual causative agents, the lack of such evidence cannot nullify the epidemiological associations in man. Arsenic can undoubtedly cause cancer of the skin in man but it has never been possible to demonstrate such an effect on any other animal.”

Thus, according to Hill, the absence of coherent information cannot be taken as evidence against a causal relationship, but the presence of conflicting information can be considered as such evidence.

Experimental Evidence

As RGL point out, many assume the experiment criterion refers to animal or human studies,

“To different observers, experimental evidence can refer to clinical trials, to laboratory experiments with rodents or other nonhuman organisms, or to both.”

Science-Based Medicine’s take on this criterion confirms what RGL claim,

“Always nice. Written in 1965, before the massive increase in biomedical research funding, experiments were not as vital in understanding diseases and treatments as they are today… There were no consistent studies discussing subluxation in the animal model.”

However, this doesn’t seem to be what Hill was referring to,

“Occasionally it is possible to appeal to experimental, or semi-experimental, evidence. For example, because of an observed association some preventive action is taken. Does it in fact prevent? The dust in the workshop is reduced, lubricating oils are changed, persons stop smoking cigarettes. Is the frequency of the associated events affected? Here the strongest support for the causation hypothesis may be revealed.”

To Hill, the most persuasive evidence for causation is removing a potentially harmful exposure and observing whether the frequency of the disease declines. This seems to be what he meant by experiment.

Why It Doesn’t Hold Up

RGL: “It can be faulty, however, as the “semi-experimental” approach is nothing more than a “before-and-after” time trend analysis, which can be confounded or otherwise biased by a host of concomitant secular changes. Moreover, even if the removal of exposure does causally reduce the frequency of disease, it might not be for the etiologic reason hypothesized.

The draining of a swamp near a city, for instance, would predictably and causally reduce the rate of yellow fever or malaria in that city the following summer. But it would be a mistake to call this observation the strongest possible evidence of a causal role of miasmas.”


Analogy

Here, Hill presents a criterion under which finding a similar, plausible relationship to the one being studied may provide evidence for a causal relationship.

“In some circumstances it would be fair to judge by analogy. With the effects of thalidomide and rubella before us we would surely be ready to accept slighter but similar evidence with another drug or another viral disease in pregnancy.”

Why It Doesn’t Hold Up

The problem with the analogy criterion is simply that it is easy for any researcher to find an analogy to build a stronger case for a causal relationship. The presence of such analogies may reflect an actual signal or mere imagination, while their absence may reflect a lack of vision or experience.

An analogy between the Bohr model of the atom and the solar system.

Concluding Thoughts

As we can see from both the original lecture by Hill himself and from RGL, these criteria have several limitations and exceptions. Many continue to use them as some gold-standard checklist to see whether they can argue causality, even though Hill himself argued against this form of usage,

“Here then are nine different viewpoints from all of which we should study association before we cry causation. What I do not believe – and this has been suggested – is that we can usefully lay down some hard-and-fast rules of evidence that must be obeyed before we can accept cause and effect.

None of my nine viewpoints can bring indisputable evidence for or against the cause-and-effect hypothesis and none can be required as a sine qua non. What they can do, with greater or less strength, is to help us to make up our minds on the fundamental question – is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect?”

Because I am a pessimist, I believe that these criteria will continue to be used in this way despite several writings discouraging such use.

Science-Based Medicine,

“They stress that correlation is not causation. So how can we approach the mosaic of findings with breast cancer? They show how each of the Bradford Hill criteria for causation supports the finding that smoking causes lung cancer. Then they show that the hypothesis that estrogen causes breast cancer fails to meet any of the Bradford Hill criteria.”

Skeptical Raptor,

“English statistician Sir Austin Bradford Hill was interested in developing a set of objective criteria that could be used to provide epidemiological evidence of causality between a cause and effect. It serves as a sort of checklist for scientists who can take data that establishes correlation and then logically determine if that supports causality.

He used his criteria to establish that smoking was linked to lung cancer (and other diseases). He essentially went through each point of his criteria to show how smoking and cancer were linked.”

No, no, no!
