Yesterday, I posed these questions:
Here I have a couple of urns. The one on the left contains 70 red balls and 30 black. The one on the right contains 30 red and 70 black.
While you weren’t looking, I reached into one of these urns and randomly drew out a dozen balls…4 of them were red and 8 were black.
1. If you had to guess, which urn would you guess I drew from?
2. What’s your estimate of the odds that you’re right?
3. Do you think you’re right beyond a reasonable doubt?
I stole this problem from the decision theorist Howard Raiffa, with some minor changes (he used bags instead of urns, and green and white balls instead of red and black — and he drew his twelve balls with replacement, rather than all at once, which has only a tiny effect on the probability). Here, with appropriate minor wording changes, is what Raiffa had to say:
At a cocktail party a few years ago, I asked a group of lawyers, who were discussing the interpretation of probabilistic evidence, what they would answer…
First of all, they wanted to know whether there was any malice aforethought on the part of the experimenter. I assured them of the neutrality of the experimenter, and told them it would be appropriate to assign a .5 chance to each urn.
“In this case”, one lawyer exclaimed after thinking awhile, “I would bet you drew from the left-hand urn”.
“No, you don’t understand”, one of his colleagues retorted. “The drawing was eight blacks and four reds, not the other way around”.
“Yes, I understand, but in my experience at the bar, life is just plain perverse, so I would bet on the left-hand urn! But I am not really a betting man.”
The other lawyers all agreed that this was not a very rational thing to do — that the evidence was in favor of the right-hand urn.
“But by how much?” I persisted. After a while a consensus emerged: The evidence is meager; the odds might go up from 50-50 to 55-45, but “…as lawyers we are trained to be skeptical, so we would slant our best judgments downward and act as if the odds were still roughly 50-50”.
The correct answer is about 98%. Yes, the balls were drawn from the right-hand urn beyond a reasonable doubt. This story points out the fact that most subjects vastly underestimate the power of a small sample. The lawyers described above had an extreme reaction, but even my statistics students clustered their guesses around .70.
Now the audience here at The Big Questions is substantially more sophisticated than most lawyers and even most statistics students, and therefore quite a few correctly calculated the probability at 98%. (In Raiffa’s experiment, where the balls were drawn with replacement, the answer is 96.7%, which I changed to 98% in the above quote.) Several commenters also worried, as did Raiffa’s lawyers, that I might not have chosen the urn according to the equivalent of a coin flip. Fair enough, though I did indeed mean for you to make this natural default assumption.
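For readers who want to check both numbers, here is a minimal sketch of the Bayes calculation. It assumes the 50-50 prior over the two urns described above. The function name `posterior_right` is just an illustrative choice. Without replacement the likelihoods are hypergeometric. With replacement, as in Raiffa's version, each draw is an independent 70% or 30% chance of red.

```python
from fractions import Fraction
from math import comb

def posterior_right(red, black, with_replacement=False):
    """Probability that the sample came from the right-hand urn (30 red, 70 black),
    assuming a 50-50 prior over the two urns (an assumption, per the text)."""
    if with_replacement:
        # Independent draws: P(red) is 7/10 for the left urn, 3/10 for the right.
        like_left = Fraction(7, 10) ** red * Fraction(3, 10) ** black
        like_right = Fraction(3, 10) ** red * Fraction(7, 10) ** black
    else:
        # Hypergeometric likelihoods; the shared C(100, n) denominator cancels.
        like_left = comb(70, red) * comb(30, black)
        like_right = comb(30, red) * comb(70, black)
    # Equal priors cancel as well, so Bayes' rule reduces to a likelihood ratio.
    return Fraction(like_right) / (like_left + like_right)

print(f"{float(posterior_right(4, 8)):.3f}")        # all at once: 0.980
print(f"{float(posterior_right(4, 8, True)):.3f}")  # with replacement: 0.967
```

With replacement, the answer simplifies to (7/3)^4 / (1 + (7/3)^4) = 2401/2482, which is where the 96.7% figure comes from.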
To make this result a little more graphic, suppose you had the opportunity, on the first of every month, to place a bet that’s as close to a sure thing as this one is. You’d lose about one bet in 50, which works out to roughly once every four years.
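The arithmetic behind "once every four years" can be checked with a short sketch. It assumes the monthly bets are independent and uses the rounded 98% figure. Under those assumptions, the wait until the first loss is geometric with mean 1 / P(lose). The function name is hypothetical.

```python
def expected_months_between_losses(p_win):
    # Independent monthly bets: the wait until a loss is geometric,
    # so its mean is 1 / P(lose).
    return 1 / (1 - p_win)

months = expected_months_between_losses(0.98)  # rounded 98% figure
print(f"{months:.0f} months, about {months / 12:.1f} years")
```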
Is Raiffa right that 98% is “beyond a reasonable doubt”? Given a reasonable interpretation of what “reasonable” means, I think the answer is pretty clearly yes. There’s not much in life that we can be more than 98% sure of.
If I were on trial for the crime of drawing from the right urn, I hope this evidence would be strong enough to convict me. If you’re unwilling to convict on this evidence, then you’re ipso facto willing to free 49 guilty men before you’ll convict a single innocent. According to the frequently cited Blackstone Standard, “it is better that ten guilty men escape than that one innocent suffer”. To let 49 guilty men escape is to go far above and beyond this standard.
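The link between a certainty threshold and a "guilty men freed per innocent convicted" ratio is just odds. Refusing to convict someone who is guilty with probability p means accepting p guilty acquittals for every 1 − p innocent convictions avoided. Here is a minimal sketch of that conversion in both directions. The function names are illustrative.

```python
from fractions import Fraction

def implied_ratio(threshold):
    # Acquitting at P(guilty) = p frees p guilty per (1 - p) innocents spared.
    p = Fraction(threshold)
    return p / (1 - p)

def implied_threshold(ratio):
    # "Better that r guilty escape than one innocent" <=> convict above r / (r + 1).
    return Fraction(ratio) / (ratio + 1)

print(implied_ratio("0.98"))                   # 49
print(f"{float(implied_threshold(10)):.3f}")   # Blackstone: 0.909
```

On this reading, the Blackstone ratio of ten corresponds to a threshold just above 90% certainty.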
A few more remarks:
- While 10 guilty men might indeed be the industry standard, the legal scholar Sasha Volokh has documented a long tradition that encompasses a wide range of numbers, some as high as 100 or more. It is difficult for me to believe that the largest of those numbers were ever meant to be taken seriously.
- Note carefully the wording of the Blackstone Standard: “It is better that ten guilty men escape than that one innocent suffer.” That is not at all the same thing as saying “It is better that ten guilty men escape than that one innocent be convicted”. A false conviction is indeed a form of suffering, but so is victimization at the hands of an acquitted criminal (or a criminal emboldened by the difficulty of obtaining convictions). So even by the Blackstone Standard, in order to minimize the suffering of innocents, it’s quite plausible you’d want to convict on substantially less than 90% certainty.
- Indeed, I learned from yesterday’s comments that as an empirical matter, potential jurors appear to set their cutoff for conviction at something like 70-74% certainty.
- 70-74% certainty sounds like roughly the right standard to me in a world where the police can be counted on not to take advantage of that standard by falsifying evidence against people they don’t like. Given the real possibility that they would, I prefer something a little tougher, but not as tough as 98%. I addressed this point in considerably more detail in my book More Sex is Safer Sex.
- One commenter suggested that if we adopted a “98%” standard, then 1 out of every 50 people on death row would be innocent. That’s not true, because under that standard we’d convict not only everyone whose probability of guilt is 98%, but also everyone whose probability is 99%, 99.5%, and so forth. So among the entire convicted population, the fraction of innocents would likely be well below 2%.
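To illustrate the last point, here is a minimal sketch using a purely hypothetical convicted population. These are made-up guilt probabilities, not data. Everyone clears the 98% bar, but most clear it by a margin. Each convict is innocent with probability 1 − p, so the expected share of innocents is the average of 1 − p.

```python
from fractions import Fraction

def expected_innocent_fraction(guilt_probs):
    # Each convict is innocent with probability 1 - p; average over the group.
    probs = [Fraction(p) for p in guilt_probs]
    return sum(1 - p for p in probs) / len(probs)

# Hypothetical guilt probabilities, all at or above the 98% threshold.
convicted = ["0.98", "0.99", "0.995", "0.999"]
print(f"{float(expected_innocent_fraction(convicted)):.1%}")  # 0.9%
```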