Friday, 24 October 2014

Should we use Bayes' Theorem to do History?

It's simpler than it looks!  Image from Richard Carrier's website.

I think so.

I'm going to try my hand at writing this discussion up as a dialogue...

What is Bayes' Theorem?

Bayes' Theorem is a formula for calculating logically how well a theory is supported by the evidence.  It works by multiplying and dividing probabilities.  It is explained here and here.  You can see how I used it in all seriousness to analyse the verdict in the case of the Lockerbie Bombing or for fun in the case of Twelve Angry Men.  The Bayesian method is also used by Richard Carrier in his proof (I consider it proven!) of the non-existence of Jesus: I got the method from him and my discussion is indebted to his writing.

The basic idea is simple:
  • Look at the historical evidence that has (and has not) been discovered.
  • Consider: how likely was it that all this evidence we see would be the outcome of historical events if your theory about what happened is true?
  • Now also consider:  how likely was it that all this evidence we see would be the outcome of historical events if your theory about what happened is not true?
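  • Now put those two probabilities into a ratio like 2:1 and you have the odds that your theory about what happened is true (2:1 in favour is a probability of two in three), provided neither theory was more likely than the other to begin with.  This is called the 'conditional probability' of your theory being true.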
So fundamentally, Bayes' Theorem is useful for working out how well the evidence supports a theory in comparison with other theories.  Those theories are competing explanations of what caused the evidence to exist.  The explicit comparison of theories helps you to avoid a common mistake in historical reasoning, i.e. seeking evidence that seems to confirm your pet theory while not giving alternative theories adequate consideration.
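
The steps above can be sketched in a few lines of Python.  This is only an illustration: the two input numbers are invented, the function name is hypothetical, and the theory and its negation are assumed to start out equally likely.

```python
def probability_from_likelihoods(p_evidence_if_true, p_evidence_if_false):
    """Probability of the theory given the evidence, assuming a 50:50 start."""
    return p_evidence_if_true / (p_evidence_if_true + p_evidence_if_false)

# Illustrative estimates only: the evidence is twice as likely if the theory is true.
p_if_true = 0.6
p_if_false = 0.3

ratio = p_if_true / p_if_false  # odds in favour of the theory, here 2:1
probability = probability_from_likelihoods(p_if_true, p_if_false)

print(f"ratio {ratio:.1f}:1, probability {probability:.3f}")
```

Odds of 2:1 work out to a probability of two in three, so a favourable ratio is support for a theory, not certainty.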

What about the more complex version?

There's another key factor to explain: 'prior probability'.

This is the probability we estimate of a theory being true, even before considering the specific, detailed evidence.  That is, how probable we think it is based just on what sort of thing it proposes.  Prior probability is the reason why 'extraordinary claims require extraordinary evidence'.  In other words, the less inherently probable a claim or theory, the stronger is the specific evidence required to overcome this inherent improbability.  The existence of magic, for example, would require extremely strong evidence to overcome the initial improbability of phenomena existing that violate known laws of physics.

Thinking in historical terms, a theory that proposes that Martin Luther wrote friendly letters to the Pope has a low prior probability, in other words is inherently unlikely, because it goes against everything we expect based on our general knowledge about Luther.  Historians would demand extremely strong evidence before accepting this theory.  On the other hand, a theory that proposes that Luther sometimes caught a cold has a very high prior probability, since it is just the sort of thing that we know tends to happen to people, Luther included, based on our general knowledge.  We would not need much evidence at all to accept this theory.

So any theory has to be given both a prior probability based on general knowledge, and a conditional probability based on the specific evidence of the case in hand.
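
To see why 'extraordinary claims require extraordinary evidence', here is a rough Python sketch that gives the same evidence to a claim with a high prior and a claim with a low prior.  All the numbers are made up for illustration and the function name is an assumption.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' Theorem for a theory versus its negation."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# Invented evidence: ten times likelier if the theory is true than if it is false.
evidence = (0.5, 0.05)

likely_claim = posterior(0.9, *evidence)      # a claim like 'Luther caught a cold'
unlikely_claim = posterior(0.001, *evidence)  # a claim like 'friendly letters to the Pope'

print(f"high prior: {likely_claim:.3f}, low prior: {unlikely_claim:.3f}")
```

With these inputs the high-prior claim ends up almost certain, while the low-prior claim stays below one in a hundred even though the evidence favours it ten to one.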

Can you show me how the formula represents this method of reasoning?

I sure can:

I tweaked this image from first publication.
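
Written out in symbols, a standard form of the theorem for a theory and its negation looks like this.  The letters are an assumed labelling: h for the theory, e for the specific evidence, and b for background (general) knowledge.

```latex
P(h \mid e, b) =
  \frac{P(h \mid b)\, P(e \mid h, b)}
       {P(h \mid b)\, P(e \mid h, b) + P(\neg h \mid b)\, P(e \mid \neg h, b)}
```

Here P(h | b) is the prior probability of the theory, P(e | h, b) is how likely the evidence was if the theory is true, and the second term in the denominator is the same calculation for the theory being false.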

Why should I concern myself with how likely the evidence was to be the outcome of events if my theory was not true?

You need to be aware that the evidence that makes sense on your theory might also make sense on different theories too.  For example, the evidence of a broken window on your house might suggest your house has been burgled.  The evidence makes sense on your theory.  It matches.  But, of course, it might just be that some kids kicked a football against the window.  So just because your theory explains the evidence, doesn't mean it is the only or the best explanation.  That's why Bayes' Theorem considers how likely the evidence was, even if your theory was wrong.
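
The broken window can be turned into a toy calculation.  In this Python sketch every number is invented, and the three theories are assumed to be mutually exclusive and to cover all the possibilities between them.

```python
# All numbers are invented for illustration.
theories = {
    "burglary": {"prior": 0.02, "p_broken_window": 0.7},
    "football": {"prior": 0.10, "p_broken_window": 0.5},
    "neither": {"prior": 0.88, "p_broken_window": 0.001},
}

# Weight each theory by how well it predicts the evidence, then normalise
# so that the probabilities of the three theories add up to 1.
weights = {name: t["prior"] * t["p_broken_window"] for name, t in theories.items()}
total = sum(weights.values())
posteriors = {name: w / total for name, w in weights.items()}

for name, p in sorted(posteriors.items(), key=lambda item: -item[1]):
    print(f"{name}: {p:.2f}")
```

With these made-up inputs the football comes out ahead of the burglary, even though a burglary explains a broken window slightly better, because stray footballs were assumed to be more common than burglars.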

How do you take into account all the different pieces of evidence?

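We estimate the probability of each piece of evidence existing on a given theory, then multiply all those probabilities together for an overview of the probability of all the evidence existing on that theory.  Strictly, each estimate should take into account the pieces of evidence already counted, so that closely related pieces are not counted twice.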
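
As a small illustration in Python, with invented estimates for three pieces of evidence, the multiplication might look like this.

```python
import math

# Invented estimates of P(piece of evidence | theory) for three pieces of evidence.
p_evidence_if_true = [0.8, 0.5, 0.9]
p_evidence_if_false = [0.4, 0.5, 0.3]

# Multiplying assumes each estimate already allows for the pieces counted before it.
combined_if_true = math.prod(p_evidence_if_true)
combined_if_false = math.prod(p_evidence_if_false)

print(f"overall ratio: {combined_if_true / combined_if_false:.1f}:1")
```

Note that the second piece of evidence is equally likely on both theories (0.5 against 0.5), so it makes no difference to the overall ratio.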

Don't historians already have a logical method for considering the merits of different theories in that manner: Argument to the Best Explanation (ABE)?

They do.  It's laid out here.  The thing is, when you analyse it, this method is completely represented by Bayes' Theorem—and improved too!

Let's see how this is so:

  • 'The statement, together with other statements already held to be true, must imply yet other statements describing present, observable data.'
    This is what Bayes' Theorem is all about: using evidence to assess theories.
  • 'The hypothesis must be of greater explanatory scope than any other incompatible hypothesis about the same subject; that is, it must imply a greater variety of observation statements.'
    The more evidence that is explained only by your theory, the higher its ratio of probability will turn out.
  • 'The hypothesis must be of greater explanatory power than any other incompatible hypothesis about the same subject; that is, it must make the observation statements it implies more probable than any other.'
    The more probable the evidence is on your theory than on any rival theory, the higher your theory's probability will turn out.
  • 'The hypothesis must be more plausible than any other incompatible hypothesis about the same subject; that is, it must be implied to some degree by a greater variety of accepted truths than any other, and be implied more strongly than any other; and its probable negation must be implied by fewer beliefs, and implied less strongly than any other.'
    This is synonymous with requiring a high prior probability.
  • 'The hypothesis must be less ad hoc than any other incompatible hypothesis about the same subject; that is, it must include fewer new suppositions about the past which are not already implied to some extent by existing beliefs.'
    The more ad hoc assumptions you have to make to keep your theory alive, the less probable it will turn out, because each uncertain assumption you add to your theory reduces the theory's probability.  It's just the 'and' rule of multiplying probabilities, and is accounted for by a reduced prior probability.
  • 'It must be disconfirmed by fewer accepted beliefs than any other incompatible hypothesis about the same subject; that is, when conjoined with accepted truths it must imply fewer observation statements and other statements which are believed to be false.'
    This will also be accounted for by the prior probability, or plausibility, of your theory.
  • 'It must exceed other incompatible hypotheses about the same subject by so much, in characteristics 2 to 6, that there is little chance of an incompatible hypothesis, after further investigation, soon exceeding it in these respects.'
    This is represented by the relative consequent probabilities of the various theories: is your theory far and away the most likely explanation of the evidence?
So, you see, historians who use a logical method of historical reasoning already use Bayesian logic without realising.  The only bit they don't do is the arithmetic.
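
The point about ad hoc assumptions can be shown with the 'and' rule in Python.  This is a sketch with invented numbers, and it assumes the extra assumptions are independent of one another and of the base theory; the function name is hypothetical.

```python
import math

def prior_with_assumptions(base_prior, assumption_probabilities):
    """Apply the 'and' rule: the theory AND every extra assumption must hold."""
    return base_prior * math.prod(assumption_probabilities)

# Invented numbers: a theory with a 0.5 prior, propped up by three ad hoc
# assumptions that are each 80% likely to be true.
plain = prior_with_assumptions(0.5, [])
propped_up = prior_with_assumptions(0.5, [0.8, 0.8, 0.8])

print(f"without assumptions: {plain:.3f}, with three: {propped_up:.3f}")
```

Three fairly plausible assumptions are enough to cut the prior roughly in half.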

If ABE logically reduces to Bayes' Theorem, then why bother using the theorem?

If you do the verbal reasoning logically, but don't assign probabilities quantitatively and do the maths, then you risk failing to combine logically the results of your consideration of each separate piece of evidence.  You might be biased by the tendency of the evidence you considered first.  Or you might allow one piece of evidence to overrule another when their relative strengths do not justify this.  You might just fail to take all the evidence into account, especially weak pieces of evidence that nevertheless multiply up to strong evidence when taken together.
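
Here is a sketch in Python of how weak evidence multiplies up.  The ten pieces of evidence and their strength are invented, and they are assumed to be independent of one another.

```python
# Invented example: ten independent pieces of weak evidence, each only
# 1.5 times as likely if the theory is true as if it is false.
likelihood_ratios = [1.5] * 10

combined_ratio = 1.0
for ratio in likelihood_ratios:
    combined_ratio *= ratio

# Starting from even odds (1:1), convert the final odds to a probability.
probability = combined_ratio / (1 + combined_ratio)

print(f"combined ratio {combined_ratio:.1f}:1, probability {probability:.3f}")
```

No single piece here would convince anyone, yet together they favour the theory by more than fifty to one.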

Since you are dealing with probabilities anyway, you need to use the logic of probabilities.  In the words of this article defending the use of Bayes' Theorem in court:
'Bayes theorem is a basic rule, akin to any other proven maths theorem, for updating the probability of a hypothesis given evidence. Probabilities are either combined by this rule, or they are combined wrongly.'
So to refuse to use Bayesian reasoning is a refusal to think logically.

Aren't your prior and conditional probability estimates just subjective opinions?

They may indeed be.  If you do not have lots of objective data for making probability estimates, then your conclusions will unavoidably be unreliable and unscientific.  But this is not the fault of Bayes' Theorem or Bayesian reasoning.  Subjectivity and uncertainty will mar the results of Plain English verbal reasoning just as badly.  Plus, verbal reasoning has the added disadvantage of forgoing a logical method for combining your consideration of all the evidence.

How do you make probability estimates about historical events?

Good question!  It's a lot harder than estimating the probability of getting a 6 off the roll of a die!

Let's say you want to know the probability that a certain general rose through the ranks of the Spartan army.  You could consider all the Spartan generals whose biographies you know, and calculate what proportion of them rose through the ranks.  That might give you a first approximation of the prior probability of this having happened to the general you are interested in.
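
In Python, that first approximation is just a proportion.  The counts below are invented (no real tally of Spartan generals is implied), and the second estimate uses Laplace's rule of succession, one common adjustment for small samples that is offered here as an option, not as part of the method described above.

```python
# Hypothetical counts, for illustration only.
generals_with_known_biographies = 20
rose_through_ranks = 6

# Simple proportion of known cases.
raw_estimate = rose_through_ranks / generals_with_known_biographies

# Laplace's rule of succession: (successes + 1) / (cases + 2).
# It pulls the estimate towards 0.5 when there are few cases to go on.
smoothed_estimate = (rose_through_ranks + 1) / (generals_with_known_biographies + 2)

print(f"raw: {raw_estimate:.3f}, smoothed: {smoothed_estimate:.3f}")
```

The fewer biographies you have, the further apart the two estimates will be, which is itself a warning about how much weight the prior can bear.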

If you are unable to come up with a convincing, objective estimate using such methods, even when you consider everything you know and all the evidence, then you will just have to accept that you will never have an objective, scientific estimate of the strength of your theory that your general did or did not rise through the ranks.

My point is, it isn't a valid objection to using Bayes' Theorem to say the data aren't scientific.  Junk in, junk out!

How can I responsibly use Bayes' Theorem if I don't have good data?

It would be a problem if people looked at your use of numbers and assumed your results were scientific.  Bayes' Theorem might look 'sciencey' to people who are unfamiliar with it.  So you need to warn them that your results are only as objective and scientific as the data you used to make your estimates.

If you are unable to come up with defensible estimates of exact probabilities, you might at least be able to come up with reasonable maximum and/or minimum values.  This is what Richard Carrier does in his book on Jesus: he uses maximum values that allow Jesus' existence to be as likely as he thinks is reasonably feasible—then still finds his existence highly unlikely.  That way, his argument is proven true a fortiori.  In other words, his theory is at least as likely as he estimates it minimally to be.
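
Working with maximum and minimum values can be sketched like this in Python.  The ranges are invented and are not taken from Carrier's book; the function name is an assumption.

```python
def posterior(prior, p_e_if_true, p_e_if_false):
    """Bayes' Theorem for a theory versus its negation."""
    numerator = prior * p_e_if_true
    return numerator / (numerator + (1 - prior) * p_e_if_false)

# Invented (low, high) ranges for each input.
prior_range = (0.2, 0.4)
p_e_if_true_range = (0.3, 0.6)
p_e_if_false_range = (0.1, 0.2)

# The result rises with the prior and with P(e | true), and falls with
# P(e | false), so the extremes come from pairing the estimates like this.
lowest = posterior(prior_range[0], p_e_if_true_range[0], p_e_if_false_range[1])
highest = posterior(prior_range[1], p_e_if_true_range[1], p_e_if_false_range[0])

print(f"the theory's probability lies between {lowest:.2f} and {highest:.2f}")
```

If even the highest value is small, the theory is improbable a fortiori; if even the lowest is large, it is probable a fortiori.  A wide range like the one here shows that the estimates are too loose to settle the question.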

Doesn't multiplying probabilities together spoil the historian's sense of how the evidence fits together as a whole?

Multiplying probabilities is the logical way to combine the evidence.  The part where you get to form a sense of the evidence as a whole is when you devise your theory for explaining it all.  Then you get to test your theory against the evidence using a logically valid method.

You're boring me.  Sum up.

So, historians thinking logically are already using Bayesian reasoning.  They are already considering how well different theories explain the evidence.  What they are not doing is using maths to combine the evidence together logically.

Assigning probability values is often subjective and unscientific.  But forcing yourself to try to assign such values will be helpful in exposing to view just how subjective and unscientific your assumptions are.  It will help you to see what data you need to search for to make your premises more objective.  It should also push you to use maximum and minimum estimates so that your conclusions show the range of likelihoods that your theory might have.

As long as you remember that the validity of your results still depends on the objectivity of your data and the logic of your reasoning about probabilities, and that a 'sciencey' formula won't do the hard work for you, you can only make your conclusions more logical by using Bayes' Theorem!
