Joe P. was accused of sexual misconduct. Sheila awoke to find the accused masturbating over her. Her shirt had what appeared to be semen on it. The shirt and a blood sample from Joe were sent to the lab for testing, and the DNA profile from the sample and the shirt matched at 5 RFLP loci. In the resulting trial, a statistician was asked to calculate the statistical and population genetic significance of this evidence.
As quickly as the role of DNA evidence has risen to the top of the forensic evidence ladder, so has the role of the statistical expert. Practicing experts should utilize Bayes' Theorem, expressed as
Posterior Odds = Likelihood Ratio x Prior Odds
The only aspect of a case the statistical expert should comment on is the likelihood ratio (LR). The LR tells the jury how likely the evidence is under each of the two competing explanations. The question of guilt or innocence does not arise, because the scientist is commenting only on the probability of the evidence if the prosecution explanation were true and if the defense explanation were true. The prior odds, which represent pre-existing belief in the explanations given by the prosecution and defense, are left for the jury to decide.
In a case with a single stain from a single contributor, it has been common to present a profile frequency. That is, a typical case report might state "The frequency of this profile is one person in P in the Caucasian population." There is a simple relationship between the LR and the profile frequency in this case: the LR is the reciprocal of the frequency, so a frequency of one person in P gives LR = P. For example, if the case report states "The frequency of this profile is one person in 1 million in the Caucasian population," then the LR is one million. That is, the evidence is 1 million times more likely if the prosecution hypothesis (the defendant is the true contributor) is true than if the defense hypothesis (someone other than the defendant is the true contributor) is true. A large LR provides strong support for the prosecution explanation, whereas a small LR (less than 1) provides support for the defense explanation.
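The arithmetic behind Bayes' Theorem in odds form, and behind the single-stain relationship between profile frequency and LR, can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, and the prior odds of 1 to 10,000 are a purely hypothetical figure chosen to show the multiplication. In court, the prior odds remain a matter for the jury.

```python
from fractions import Fraction

def likelihood_ratio_from_frequency(one_in_n):
    """Single stain, single contributor: LR is the reciprocal of the profile frequency."""
    profile_frequency = Fraction(1, one_in_n)  # e.g. one person in 1 million
    return 1 / profile_frequency

def posterior_odds(likelihood_ratio, prior_odds):
    """Bayes' Theorem in odds form: posterior odds = LR x prior odds."""
    return likelihood_ratio * prior_odds

def odds_to_probability(odds):
    """Convert odds in favour of a proposition into a probability."""
    return odds / (1 + odds)

# Profile frequency from the example in the text: one person in 1 million.
lr = likelihood_ratio_from_frequency(1_000_000)

# HYPOTHETICAL prior odds, for illustration only; not something the expert supplies.
hypothetical_prior = Fraction(1, 10_000)

post = posterior_odds(lr, hypothetical_prior)
print(lr)    # 1000000
print(post)  # 100
print(float(odds_to_probability(post)))
```

The sketch shows why the expert can stop at the LR: the same LR of one million yields very different posterior odds depending on the prior odds that the jury brings to the evidence.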
Based on the evidence in Joe's case and on Bayes' Theorem, the statistician reported:
"The evidence is 2 billion times more likely that the defendant was the one who left the stain rather than that someone unrelated to the defendant left the stain."
Sampling Error
The numbers presented by the statistician are not carved in stone. They are based on a population genetic model and on DNA databases that consist of a small sample of individuals (typically fewer than 500). Any statistics presented should reflect the size of the database. You can think of this as the equivalent of reporting the margin of error in a survey. Typically, the lower bound on an LR will be somewhere between one half and one third of the reported value. For example, the 99% lower bound on our case LR of 2 billion is 950 million. This is reported as
"On average, 99% of all cases, involving the same profiles, would have a LR larger than 950 million. Therefore, the evidence in this case is at least 950 million times more likely that the defendant was the one who left the stain rather than that someone unrelated to the defendant left the stain."
Relationships Matter
Close relationships, such as brother, father, or cousin, can have very large effects on the LR. If it is possible that a sibling or other close relative could have been the contributor, ask the statisticians whether their calculations reflect that. These calculations can be done without additional typings on the relatives.
Low-level relatedness exists amongst individuals in the same subpopulation, and this can and should be incorporated into our case. Using a conservative coancestry coefficient of 3%, we find that our LR changes from 2 billion to 233 million. When we couple this with sampling error, the LR reduces further to 132 million, and so the report should read
"The evidence in this case is at least 132 million times more likely that the defendant was the one who left the stain rather than that someone unrelated to the defendant left the stain."
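To give a feel for how a coancestry coefficient enters the calculation, here is a minimal Python sketch of one widely used adjustment, the Balding-Nichols form of the match probability. The text does not say which formula the statistician in this case used, so treat this as an assumption. The allele frequencies are invented for illustration and are not taken from the case, all five loci are assumed heterozygous, and the loci are assumed independent.

```python
def match_prob_heterozygote(p_i, p_j, theta):
    """Probability that an unknown person shares a heterozygous genotype,
    given that the defendant has it, with coancestry coefficient theta."""
    numerator = 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j)
    return numerator / ((1 + theta) * (1 + 2 * theta))

def match_prob_homozygote(p_i, theta):
    """Same quantity for a homozygous genotype (shown for completeness)."""
    numerator = (2 * theta + (1 - theta) * p_i) * (3 * theta + (1 - theta) * p_i)
    return numerator / ((1 + theta) * (1 + 2 * theta))

def likelihood_ratio(heterozygous_loci, theta):
    """Multiply per-locus match probabilities (independence assumed), then invert."""
    match_prob = 1.0
    for p_i, p_j in heterozygous_loci:
        match_prob *= match_prob_heterozygote(p_i, p_j, theta)
    return 1.0 / match_prob

# HYPOTHETICAL allele frequencies at five loci, invented for illustration only.
hypothetical_loci = [(0.05, 0.08), (0.10, 0.04), (0.06, 0.09), (0.03, 0.12), (0.07, 0.05)]

lr_no_coancestry = likelihood_ratio(hypothetical_loci, theta=0.0)
lr_with_coancestry = likelihood_ratio(hypothetical_loci, theta=0.03)
print(f"LR with theta = 0:    {lr_no_coancestry:,.0f}")
print(f"LR with theta = 0.03: {lr_with_coancestry:,.0f}")
```

With theta set to zero the functions reduce to the familiar products 2 x p_i x p_j and p_i squared. Any positive theta raises the match probability and therefore lowers the LR, which is the direction of the change described above.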
The inclusion of sampling error and relatedness alone in these calculations has resulted in a 15-fold decrease in the LR presented to the court. This lower number is still large, but at least it is now statistically correct.
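The size of each reduction can be checked directly from the figures quoted in this case. The short Python sketch below uses only those four numbers. The variable and function names are invented for the illustration.

```python
# Figures quoted in the text for this case.
lr_reported = 2_000_000_000          # point estimate
lr_sampling_only = 950_000_000       # 99% lower bound, sampling error only
lr_coancestry_only = 233_000_000     # 3% coancestry coefficient, no sampling error
lr_final = 132_000_000               # coancestry and sampling error together

def fold_reduction(before, after):
    """How many times smaller 'after' is than 'before'."""
    return before / after

print(f"{fold_reduction(lr_reported, lr_sampling_only):.1f}")     # 2.1
print(f"{fold_reduction(lr_reported, lr_coancestry_only):.1f}")   # 8.6
print(f"{fold_reduction(lr_coancestry_only, lr_final):.1f}")      # 1.8
print(f"{fold_reduction(lr_reported, lr_final):.1f}")             # 15.2
```

On these figures, most of the overall reduction of roughly 15-fold comes from the coancestry adjustment, with sampling error accounting for a further factor of about two.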
Issues a Statistician Shouldn't Comment Upon
With very large numbers involved in many cases, statistical experts are often invited to make a statement of identity of source, e.g. "this stain came from the defendant." While we accept that eventually the evidence will be so overwhelming that statistics will not matter, we don't believe this point has been reached yet. Object strenuously to such statements, especially if no account of relatedness has been made.
The statistical expert is in court for one reason alone: to provide the jury with a scientifically sound method for weighing the strength of the DNA evidence presented to them by the forensic scientists.