Inclusion probability for DNA mixtures is a subjective one-sided match statistic unrelated to identification information

Perlin, M.W. Inclusion probability for DNA mixtures is a subjective one-sided match statistic unrelated to identification information. Journal of Pathology Informatics, 6(1):59, 2015.


Journal of Pathology Informatics


Background: DNA mixtures of two or more people are a common type of forensic crime scene evidence. A match statistic that connects the evidence to a criminal defendant is usually needed for court. Jurors rely on this strength of match to help decide guilt or innocence. However, the reliability of unsophisticated match statistics for DNA mixtures has been questioned.

Materials and Methods: The most prevalent match statistic for DNA mixtures is the combined probability of inclusion (CPI), used by crime labs for over 15 years. When testing 13 short tandem repeat (STR) genetic loci, the 1/CPI value is typically around a million, regardless of DNA mixture composition. However, actual identification information, as measured by a likelihood ratio (LR), spans a much broader range. This study examined probability of inclusion (PI) mixture statistics for 517 locus experiments drawn from 16 reported cases and compared them with LR locus information calculated independently on the same data. The log(1/PI) values were examined and compared with corresponding log(LR) values.
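The CPI arithmetic described above can be sketched in a few lines of Python. At each locus, PI is conventionally the square of the summed population frequencies of the alleles seen in the mixture, and CPI is the product of the per-locus PI values. The allele frequencies below are hypothetical illustration values, not data from the study, and the function names are assumptions made for this sketch.

```python
import math

def locus_pi(allele_freqs):
    """Probability of inclusion at one locus.

    allele_freqs: population frequencies of the alleles observed in the
    mixture at this locus. PI is the squared sum of those frequencies.
    """
    return sum(allele_freqs) ** 2

def combined_pi(loci):
    """Combined probability of inclusion: product of per-locus PI values."""
    cpi = 1.0
    for freqs in loci:
        cpi *= locus_pi(freqs)
    return cpi

# Hypothetical mixture: 13 STR loci, each showing four alleles whose
# frequencies sum to 0.6 (illustrative values only, not case data).
example_loci = [[0.2, 0.15, 0.15, 0.1]] * 13

cpi = combined_pi(example_loci)
log_inv_cpi = -math.log10(cpi)  # log10(1/CPI)
print(f"1/CPI = {1 / cpi:,.0f}  (log10 = {log_inv_cpi:.2f})")
```

With these assumed frequencies, 1/CPI comes out in the hundreds of thousands (log10 roughly 5.8), the same order of magnitude as the "around a million" figure quoted above. Note that no suspect genotype enters the calculation; only the alleles present in the mixture do.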

Results: The LR and CPI methods were compared in case examples of false inclusion, false exclusion, a homicide, and criminal justice outcomes. Statistical analysis of crime laboratory STR data shows that inclusion match statistics exhibit a truncated normal distribution having zero center, with little correlation to actual identification information. By the law of large numbers (LLN), 1/CPI increases with the number of tested genetic loci, regardless of DNA mixture composition or match information. These statistical findings explain why CPI is relatively constant, with implications for DNA policy, criminal justice, cost of crime, and crime prevention.
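The law of large numbers argument can be illustrated with a small simulation. Because CPI is a product over loci, log(1/CPI) is a sum of per-locus log(1/PI) terms, so it grows roughly in proportion to the number of loci. The sketch below draws each per-locus term from a zero-centered normal distribution truncated at zero (a half-normal). The spread `sigma = 0.58` is a hypothetical value chosen only so that 13 loci total about 6; it is not a parameter reported in the study, and the function names are assumptions.

```python
import math
import random

def half_normal_mean(sigma):
    """Expected value of |X| for X ~ Normal(0, sigma^2)."""
    return sigma * math.sqrt(2 / math.pi)

def simulate_log_inv_cpi(n_loci, sigma=0.58, seed=0):
    """Sum n_loci simulated per-locus log10(1/PI) values.

    Each term is drawn from a zero-centered normal truncated at zero,
    i.e. the absolute value of a Normal(0, sigma^2) draw. sigma is a
    hypothetical illustration value, not an estimate from case data.
    """
    rng = random.Random(seed)
    return sum(abs(rng.gauss(0.0, sigma)) for _ in range(n_loci))

# The total tracks n_loci * mean regardless of any match information,
# because nothing about a suspect enters the simulation.
for n in (5, 13, 20):
    total = simulate_log_inv_cpi(n)
    expected = n * half_normal_mean(0.58)
    print(f"{n:2d} loci: simulated {total:.2f}, expected about {expected:.2f}")
```

Under these assumptions the expected total for 13 loci is about 6, which corresponds to a 1/CPI near a million; adding loci raises the total whether or not the mixture carries more identification information.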

Conclusions: Forensic crime laboratories have generated CPI statistics on hundreds of thousands of DNA mixture evidence items. However, this commonly used match statistic behaves like a random generator of inclusionary values, following the LLN rather than measuring identification information. A quantitative CPI number adds little meaningful information beyond the analyst’s initial qualitative assessment that a person’s DNA is included in a mixture. Statistical methods for reporting on DNA mixture evidence should be scientifically validated before the criminal justice system relies on them.