Validating TrueAllele® interpretation of DNA mixtures containing up to ten unknown contributors

N. Butt, D. Bauer and M.W. Perlin, "Validating TrueAllele® interpretation of DNA mixtures containing up to ten unknown contributors", American Academy of Forensic Sciences 70th Annual Meeting, Seattle, WA, 23-Feb-2018.

Talk

PowerPoint presentation of Dr. Butt and Dr. Bauer's talk.


Abstract

After attending this presentation, attendees will understand how low-level DNA mixtures containing many people can be interpreted reliably. Fully Bayesian computer methods can accurately extract identification information from complex DNA evidence.

This presentation will impact the forensic and legal communities by providing empirical support for sophisticated scientific interpretation of DNA evidence often found in criminal justice applications.

Mixed samples of known genotype composition were prepared following a randomized experimental design that included very low contributor amounts. The mixture samples contained from two to ten contributors. The Cuyahoga County Regional Forensic Science Laboratory (Cuyahoga) amplified the samples using a PowerPlex® Fusion STR kit. Cybergenetics and Cuyahoga independently conducted TrueAllele® Casework testing to interpret these DNA mixture data.

TrueAllele provides fully Bayesian analysis. The system examines all STR data without human intervention and without knowing the comparison genotype; this data objectivity eliminates contextual bias. The computer extracts all information from the evidence data, inferring genotypes and other parameters; the model completely eliminates unneeded calibration. Inferred evidence genotypes are compared with references only after genotypes have been separated from DNA mixture data; this two-phase likelihood ratio (LR) approach enforces objectivity.

The study examined sensitivity, specificity and reproducibility of TrueAllele match information. Fewer contributors in a mixture generally improved these metrics. With more contributors, longer Markov chain Monte Carlo (MCMC) statistical sampling increased accuracy. Assuming more contributors than were actually present did not affect match information for true contributors, but it added superfluous genotypes containing little information.

Sensitivity. As contributor number increased from 2 to 6, average match strength decreased from 24 to 5 LR log units ("ban"). With up to four contributors, only positive log(LR) values were seen.
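For readers unfamiliar with the unit, one ban is one base-10 logarithm unit, so a log(LR) of 5 ban corresponds to an LR of one hundred thousand. The conversion can be sketched as follows; the function names are illustrative and are not part of TrueAllele.

```python
import math


def lr_to_ban(lr: float) -> float:
    """Convert a likelihood ratio to base-10 log units (ban)."""
    if lr <= 0:
        raise ValueError("a likelihood ratio must be positive")
    return math.log10(lr)


def ban_to_lr(ban: float) -> float:
    """Convert a log(LR) in ban back to a likelihood ratio."""
    return 10.0 ** ban


# An LR of one hundred thousand corresponds to about 5 ban.
print(lr_to_ban(1e5))
# An average of 24 ban corresponds to an LR of about 10**24.
print(ban_to_lr(24))
```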

Specificity. As contributor number increased from 2 to 6, the average non-contributor log(LR) rose from -33 to -13 ban, indicating reduced specificity. When comparing with non-contributors, an LR of over a thousand had a false positive probability of under 0.0001. With an LR of over a hundred thousand, the probability decreased to under 0.000001.
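A false positive probability of this kind can be estimated empirically as the fraction of non-contributor comparisons whose log(LR) exceeds a threshold. The sketch below uses made-up values, not study data, and the function names are hypothetical. For context it also computes the general upper bound of 1/x on the chance that a non-contributor yields an LR of at least x, which follows from Markov's inequality under the assumption of a well-calibrated LR; the reported probabilities fall below that bound.

```python
def false_positive_rate(noncontributor_bans, threshold_ban):
    """Fraction of non-contributor log10(LR) values above the threshold."""
    if not noncontributor_bans:
        raise ValueError("need at least one non-contributor comparison")
    hits = sum(1 for ban in noncontributor_bans if ban > threshold_ban)
    return hits / len(noncontributor_bans)


def markov_bound(threshold_ban):
    """Upper bound 1/LR on P(LR >= threshold) for a non-contributor.

    Assumes a well-calibrated LR whose expected value under the
    non-contributor hypothesis is at most 1.
    """
    return 10.0 ** (-threshold_ban)


# Hypothetical non-contributor log(LR) values in ban (illustration only).
example_bans = [-33.0, -20.5, -13.2, -6.1, -1.0, 0.4, 3.2, -8.7]

print(false_positive_rate(example_bans, 3.0))  # one of the eight values exceeds 3 ban
print(markov_bound(3.0))                       # bound for an LR of one thousand
print(markov_bound(5.0))                       # bound for an LR of one hundred thousand
```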

Reproducibility. Fewer contributors conferred greater reproducibility. The within-group standard deviation for 2 contributors was 0.17 ban, and for 6 contributors it was 1.00 ban.
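The abstract does not state which estimator was used for the within-group standard deviation. One common choice, shown here as an assumption, is the pooled standard deviation across groups of replicate log(LR) values, where each group holds repeated computer runs of the same comparison. The replicate values below are invented for illustration.

```python
import math


def within_group_sd(groups):
    """Pooled within-group standard deviation of replicate log(LR) values.

    Each element of `groups` is a list of replicate values (in ban) for
    one comparison. Groups with fewer than two replicates are skipped.
    """
    sum_squares = 0.0
    degrees_of_freedom = 0
    for values in groups:
        if len(values) < 2:
            continue
        mean = sum(values) / len(values)
        sum_squares += sum((v - mean) ** 2 for v in values)
        degrees_of_freedom += len(values) - 1
    if degrees_of_freedom == 0:
        raise ValueError("need at least one group with two or more replicates")
    return math.sqrt(sum_squares / degrees_of_freedom)


# Hypothetical replicate runs for two comparisons (illustration only).
replicates = [[10.0, 10.2], [5.0, 5.4, 5.2]]
print(round(within_group_sd(replicates), 2))
```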

DNA amount. Match information depended on a contributor's DNA amount, regardless of how many contributors were in the mixture. (More contributors mean less DNA per contributor.) Contributors comprising under 10% of the mixture showed an average log(LR) of 3 ban. However, with fewer contributors the match statistics were higher. Major contributors (over 50%) averaged 28 ban. Contributors over 20% gave positive log(LR) values, regardless of how many contributors were present.

Contributor number. When given the number of contributors a user observed in the data, the computer's mixture solutions were better than when given the expected number known from the study design. This result demonstrated application robustness, as practiced by forensic analysts on DNA mixtures.

Assumed genotypes. With low-level minor contributors, providing the computer with known (i.e., previously matched) references reduced uncertainty in genotype inference. For a 15% minor in a ten-person mixture, the log(LR) value increased from 3 to 8 ban when assuming previously matched references as known genotypes. The log(LR) value for a 2% minor in a ten-person mixture increased from 1 to 4 ban. Assuming known genotypes also improved specificity: the non-contributor log(LR) average shifted from -3 to -6 ban.

MCMC sampling. More MCMC sampling increased sensitivity. With 6 contributors, MCMC sampling at 5 thousand cycles gave an average log(LR) of 2.4 ban for true contributors; this increased at 100 thousand cycles to 5.2 ban. Specificity was essentially unchanged by further MCMC sampling, so faster run times did not increase false positives. Regardless of contributor number, reproducibility at 100 thousand cycles showed a between-run variability of under 1 ban.

Independent testing. Cybergenetics' average match statistic for true contributors was 8.52 ban, while Cuyahoga's average was 8.24 ban. On average, the log(LR) values for independent comparisons conducted at the two sites were within 1.15 ban.
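One plausible reading of the between-site figure, offered as an assumption because the abstract does not define it, is the mean absolute difference between paired log(LR) values computed for the same comparison at each site. A minimal sketch with invented values:

```python
def mean_abs_difference(site_a, site_b):
    """Mean absolute difference between paired log(LR) values (in ban)."""
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(a - b) for a, b in zip(site_a, site_b))
    return total / len(site_a)


# Hypothetical paired results for the same four comparisons (illustration only).
lab_a = [8.0, 12.5, 3.0, 20.0]
lab_b = [7.5, 13.5, 4.0, 19.5]
print(mean_abs_difference(lab_a, lab_b))
```

Note that two sites can have nearly equal averages, as reported here, while individual paired comparisons still differ by about a ban, so the two summaries answer different questions.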

The authors will present a validation study showing reliable interpretation of complex DNA evidence.
