The point I was trying to make (and will elaborate here) is that the usual mantra, "Correlation does not imply causation," is true only in a trivial sense, so we need to think about it more carefully. And as regular readers might expect, I'll take a Bayesian approach.

It is true that correlation doesn't imply causation in the mathematical sense of "imply"; that is, finding a correlation between A and B does not prove that A causes B. However, it does provide evidence that A causes B. It also provides evidence that B causes A, and if there is a hypothetical C that might cause A and B, the correlation is evidence for that hypothesis, too.

In Bayesian terms, a dataset, D, is evidence for a hypothesis, H, if the probability of H is higher after seeing D. That is, if P(H|D) > P(H).

For any two variables, A and B, we should consider 4 hypotheses:

A: A causes B

B: B causes A

C: C causes A and B

N: there are no causal relationships among A, B, and C

And there might be multiple versions of C, for different hypothetical factors. If I have no prior evidence of any causal relationships among these variables, I would assign a high probability (in the sense of a subjective degree of belief) to the null hypothesis, N, and low probabilities to the others. If I have background information that makes A, B, or C more plausible, I might assign prior probabilities accordingly. Otherwise I would assign them equal priors.

Now suppose I find a correlation between A and B, with p-value=0.01. I would compute the likelihood of this result under each hypothesis:

L(D|A) ≈ 1: If A causes B, the chance of finding a correlation is probably high, depending on the noisiness of the relationship and the size of the dataset.

L(D|B) ≈ 1, for the same reason.

L(D|C) ≈ 1, or possibly a bit lower than the previous likelihoods, because any noise in the two causal relationships would be additive.

L(D|N) = 0.01. The probability of seeing a correlation with the observed strength, or more, under the null hypothesis, is the computed p-value, 0.01.

When we multiply the prior probabilities by the likelihoods, the probability assigned to N drops by a factor of 100; the other probabilities are almost unchanged. When we renormalize, the other probabilities go up.

In other words, the update takes most of the probability mass away from N and redistributes it to the other hypotheses. The result of the redistribution depends on the priors, but for all of the alternative hypotheses, the posterior is greater than the prior. That is:

P(A|D) > P(A)

P(B|D) > P(B)

P(C|D) > P(C)

Thus, the correlation is evidence in favor of A, B and C. In this example, the Bayes factor for all three is about 100:1, maybe a bit lower for C. So the correlation alone does not discriminate much, if at all, between the alternative hypotheses.
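Here is a minimal sketch of that update in Python. The prior values are my own illustrative assumption (a skeptical 0.7 on N and 0.1 on each alternative); the likelihoods are the rough values from above, and the function name `bayes_update` is just for this example.

```python
def bayes_update(priors, likelihoods):
    """Multiply each prior by its likelihood, then renormalize."""
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

# Assumed priors: skeptical, with most of the mass on the null hypothesis.
priors = {"A": 0.1, "B": 0.1, "C": 0.1, "N": 0.7}

# Likelihoods from the text: about 1 for the causal hypotheses,
# and the p-value, 0.01, for the null hypothesis.
likelihoods = {"A": 1.0, "B": 1.0, "C": 1.0, "N": 0.01}

posteriors = bayes_update(priors, likelihoods)

# Bayes factor of each alternative relative to N (a ratio of likelihoods).
bayes_factors = {h: likelihoods[h] / likelihoods["N"] for h in "ABC"}

for h in priors:
    print(f"{h}: prior={priors[h]:.3f}  posterior={posteriors[h]:.3f}")
print(bayes_factors)
```

With these assumed priors, each alternative goes from 0.1 to about 0.33, and N falls from 0.7 to about 0.02. Different priors would give different posteriors, but the direction of the update is the same.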

If there is a good reason to think that A is more plausible than B and C, that would be reflected in the priors. In that case the posterior probability might be substantially higher for A than for B and C.
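To illustrate, here is a sketch with priors that favor A. The numbers are hypothetical, chosen only to show the effect; the likelihoods are the approximate values used earlier.

```python
# Hypothetical priors: background knowledge makes A six times as
# plausible as B or C, but N is still considered most likely.
priors = {"A": 0.30, "B": 0.05, "C": 0.05, "N": 0.60}

# Approximate likelihoods of the observed correlation (p-value 0.01).
likelihoods = {"A": 1.0, "B": 1.0, "C": 1.0, "N": 0.01}

# Bayesian update: multiply priors by likelihoods and renormalize.
unnorm = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnorm.values())
posteriors = {h: p / total for h, p in unnorm.items()}

for h in priors:
    print(f"{h}: prior={priors[h]:.3f}  posterior={posteriors[h]:.3f}")

# The odds of A relative to B are the same before and after the update,
# because the data are (by assumption) equally likely under both.
prior_odds = priors["A"] / priors["B"]
posterior_odds = posteriors["A"] / posteriors["B"]
print(prior_odds, posterior_odds)
```

Under these assumptions the posterior for A is about 0.74, while B and C end up near 0.12 each. The correlation moves mass away from N; the priors determine how that mass is divided among the alternatives.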

And if the resulting posterior, P(A|D), were sufficiently high, I would be willing to say that the observed correlation implies causation, with the qualification that I am using "imply" in the sense of strong empirical evidence, not a mathematical proof.

People who have internalized the mantra that correlation does not imply causation might be surprised by my casual (not causal) blasphemy. But I am not alone. This article from Slate makes a similar point, but without the Bayesian mumbo-jumbo.

And the Wikipedia page on "Correlation does not imply causation" includes this discussion of correlation as scientific evidence:

Much of scientific evidence is based upon a correlation of variables – they are observed to occur together. Scientists are careful to point out that correlation does not necessarily mean causation. The assumption that A causes B simply because A correlates with B is often not accepted as a legitimate form of argument. However, sometimes people commit the opposite fallacy – dismissing correlation entirely, as if it does not suggest causation. This would dismiss a large swath of important scientific evidence.

In conclusion, correlation is a valuable type of scientific evidence in fields such as medicine, psychology, and sociology. But first correlations must be confirmed as real, and then every possible causative relationship must be systematically explored. In the end correlation can be used as powerful evidence for a cause-and-effect relationship between a treatment and benefit, a risk factor and a disease, or a social or economic factor and various outcomes. But it is also one of the most abused types of evidence, because it is easy and even tempting to come to premature conclusions based upon the preliminary appearance of a correlation.

I think this is a reasonable conclusion, and hopefully not too shocking to my colleagues in the back of the room.

UPDATE February 21, 2014: There is a varied and lively discussion of this article on reddit/r/statistics.

One of the objections raised there is that I treat the hypotheses A, B, C, and N as mutually exclusive, when in fact they are not. For example, it's possible that A causes B *and* B causes A. This is a valid objection, but we can address it by adding additional hypotheses for A&B, B&C, A&C, etc. The rest of my argument still holds. Finding a correlation between A and B is evidence for all of these hypotheses, and evidence against N.
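As a rough check, here is a sketch of the update with the combined hypotheses included. The priors are illustrative assumptions, and I am assuming that the likelihood of the correlation is about 1 under every hypothesis that includes some causal link between A and B.

```python
# Expanded hypothesis set: single causes, pairwise combinations, and the null.
alternatives = ["A", "B", "C", "A&B", "A&C", "B&C"]

# Assumed priors: 0.1 for each alternative, the remainder on N.
priors = {h: 0.1 for h in alternatives}
priors["N"] = 1.0 - sum(priors.values())

# Assumed likelihoods: about 1 for every causal hypothesis,
# and the p-value, 0.01, under the null hypothesis.
likelihoods = {h: 1.0 for h in alternatives}
likelihoods["N"] = 0.01

# Bayesian update: multiply priors by likelihoods and renormalize.
unnorm = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnorm.values())
posteriors = {h: p / total for h, p in unnorm.items()}

for h in priors:
    direction = "up" if posteriors[h] > priors[h] else "down"
    print(f"{h}: {priors[h]:.3f} -> {posteriors[h]:.3f} ({direction})")
```

With these numbers every alternative, single or combined, ends up more probable than it started, and N ends up less probable.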

One of my anonymous correspondents on reddit added this comment, which gives examples where correlation alone might be used, in the absence of better evidence, to guide practical decisions:

This [meaning my article] is not too different from the standard view in medicine, though usually phrased in more of a discrete "levels of evidence" sense than a Bayesian sense. While direct causal evidence is the gold standard in medicine, correlational studies are still taken as providing some evidence that is sometimes worth acting on, in the absence of better evidence. For example, correlations with negative health outcomes are sometimes taken as reasons to issue recommendation to avoid certain behaviors/drugs/foods (pending further data), and unexpected correlations are often taken as good justification for funding further studies into a relationship.

In general, one of the nice things about Bayesian analysis is that it provides useful inputs for decision analysis, especially when we have to make decisions in the absence of conclusive evidence.