Using exact LR distributions for probabilistic genotyping software validation

J. Bracamontes, E. Estus, M. W. Perlin, "Using exact LR distributions for probabilistic genotyping software validation", American Academy of Forensic Sciences 75th Annual Scientific Conference, Orlando, FL, 15-Feb-2023.

Abstract

After attending this presentation, attendees will understand a fast and easy approach to constructing exact likelihood ratio (LR) distributions that help calculate accurate sensitivity and specificity error rates when validating probabilistic genotyping (PG) software.

This presentation will impact the forensic science community by showing a method for calculating precise LR distributions for sensitivity and specificity error rates. These distributions consider every possible reference genotype. They are helpful for PG software validation, and for establishing scientific reliability in the courtroom.

Testing DNA PG interpretation software is important in forensic science. Empirical testing ensures a method works as expected. Validation studies test the PG method on representative data sets, reporting likelihood ratio (LR) match statistics. These studies typically include sensitivity and specificity error rates. Sensitivity evaluates the inclusionary strength of true contributors to DNA. Specificity examines the ability of DNA evidence to statistically exclude non-contributors. Both are measured on the log(LR) scale. From log(LR) distributions, false exclusion and false inclusion error rates can be calculated directly.
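As a minimal sketch of how error rates can be read off a pair of log(LR) distributions, the Python below sums the probability mass on the wrong side of a decision threshold. The function name `error_rates`, the toy probabilities, and the threshold of log(LR) = 0 are assumptions made for illustration; they are not taken from the presentation.

```python
import numpy as np

def error_rates(log_lr, contributor_pmf, noncontributor_pmf, threshold=0.0):
    """Return (false_exclusion, false_inclusion) from discrete log(LR) distributions.

    log_lr             : grid of log10(LR) values
    contributor_pmf    : probability mass at each grid value for true contributors
    noncontributor_pmf : probability mass at each grid value for non-contributors
    threshold          : assumed decision point (log(LR) = 0 here, an assumption)
    """
    log_lr = np.asarray(log_lr, dtype=float)
    contributor_pmf = np.asarray(contributor_pmf, dtype=float)
    noncontributor_pmf = np.asarray(noncontributor_pmf, dtype=float)
    # False exclusion: a true contributor scores below the threshold.
    false_exclusion = contributor_pmf[log_lr < threshold].sum()
    # False inclusion: a non-contributor scores above the threshold.
    false_inclusion = noncontributor_pmf[log_lr > threshold].sum()
    return false_exclusion, false_inclusion

# Toy distributions on a five-point grid (invented numbers).
grid = [-2, -1, 0, 1, 2]
fe, fi = error_rates(grid,
                     [0.0, 0.1, 0.1, 0.3, 0.5],
                     [0.6, 0.3, 0.05, 0.04, 0.01])
print(fe, fi)  # roughly 0.10 and 0.05 for this toy data
```

Other thresholds can be passed through the `threshold` argument to report error rates at any chosen log(LR) level.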

To use PG software for DNA interpretation, applicable validation standards require sensitivity and specificity studies, as well as error rate determination. Legal admissibility standards encourage PG software validation and error rate calculation – both of which are Daubert prongs.

LR distributions for examining system sensitivity and specificity can be developed either by limited sampling or by exact convolution. Both calculation methods produce distributions of log(LR) statistics. The sampling method approximates exact log(LR) distributions by comparing a set of evidence genotypes with a set of randomly sampled reference genotypes. Sampling is incomplete, testing only a thousand (10³) or so references, which is a minuscule fraction of the possible genotypes. Calculating by sampling is tedious in validation; comparing a thousand (10³) evidence genotypes with a thousand references entails a million (10⁶) match statistic calculations.
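The sampling approach can be sketched in Python as a Monte Carlo draw of random reference genotypes. This is an illustration under stated assumptions, not the validated procedure: loci are treated as independent, the per-genotype LR is taken as posterior probability divided by population prior probability, every posterior probability is nonzero, and the function name and toy probabilities are hypothetical.

```python
import numpy as np

def sample_noncontributor_log_lr(loci, n_samples, seed=0):
    """Approximate a non-contributor log10(LR) distribution by random sampling.

    loci : list of (posterior, prior) probability arrays, one pair per locus,
           both indexed by the same list of candidate genotypes.
    Assumes LR(g) = posterior(g) / prior(g) and independent loci (assumptions).
    """
    rng = np.random.default_rng(seed)
    total = np.zeros(n_samples)
    for posterior, prior in loci:
        posterior = np.asarray(posterior, dtype=float)
        prior = np.asarray(prior, dtype=float)
        log_lr = np.log10(posterior / prior)
        # Draw random reference genotypes according to population frequency.
        draws = rng.choice(len(prior), size=n_samples, p=prior)
        total += log_lr[draws]  # log(LR) adds across independent loci
    return total

# Toy two-locus evidence item (invented probabilities).
toy_loci = [([0.7, 0.2, 0.1], [0.1, 0.3, 0.6]),
            ([0.5, 0.5], [0.2, 0.8])]
samples = sample_noncontributor_log_lr(toy_loci, n_samples=1000)

# Cost of a sampled validation study, as counted in the text.
n_evidence, n_reference = 10**3, 10**3
comparisons = n_evidence * n_reference  # 10**6 match statistic calculations
```

A histogram of `samples` only approximates the underlying distribution, and rare high log(LR) values may never be drawn at this sample size.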

The exact method accurately calculates log(LR) distributions for evidence genotypes. The requisite convolution can have any preset numerical resolution. The convolution approach is complete, with one distribution accounting for all (e.g., 10²⁴) possible reference genotypes [1]. The calculation is fast; a hundred genotype distributions can be constructed in one second. Many evidence genotype distributions can be averaged to represent a set of genotypes in one composite distribution. This composite feature is highly useful for validation studies.
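One way to picture the convolution construction is the Python sketch below. It assumes independent loci, so that log(LR) values add across loci and the distribution of the sum is the convolution of the per-locus distributions. It also assumes that the per-genotype LR is posterior probability divided by population prior probability, that every posterior probability is nonzero, and that a composite is a simple unweighted average. All function names and toy probabilities are hypothetical, and this is not presented as the authors' implementation.

```python
import numpy as np

def locus_pmf(posterior, prior, step, weighting):
    """Histogram one locus's log10(LR) values on a grid of width `step`.

    Returns (offset, pmf), where pmf[k] is the mass at log10(LR) = (offset + k) * step.
    """
    posterior = np.asarray(posterior, dtype=float)
    prior = np.asarray(prior, dtype=float)
    idx = np.rint(np.log10(posterior / prior) / step).astype(int)
    # Prior weights give a non-contributor distribution, posterior weights a contributor one.
    weights = prior if weighting == "noncontributor" else posterior
    offset = int(idx.min())
    pmf = np.zeros(int(idx.max()) - offset + 1)
    np.add.at(pmf, idx - offset, weights)  # accumulate mass per grid bin
    return offset, pmf

def exact_distribution(loci, step=0.01, weighting="noncontributor"):
    """Convolve per-locus pmfs so every multi-locus genotype combination is covered."""
    offset, pmf = 0, np.array([1.0])
    for posterior, prior in loci:
        lo, locus = locus_pmf(posterior, prior, step, weighting)
        offset += lo
        pmf = np.convolve(pmf, locus)
    return offset, pmf

def composite(distributions):
    """Average several (offset, pmf) distributions on one shared grid."""
    lo = min(off for off, _ in distributions)
    hi = max(off + len(p) for off, p in distributions)
    total = np.zeros(hi - lo)
    for off, p in distributions:
        total[off - lo: off - lo + len(p)] += p
    return lo, total / len(distributions)

# Two toy evidence items, each with two loci (invented probabilities).
item_a = [([0.7, 0.2, 0.1], [0.1, 0.3, 0.6]), ([0.5, 0.5], [0.2, 0.8])]
item_b = [([0.6, 0.4], [0.3, 0.7]), ([0.9, 0.1], [0.25, 0.75])]
step = 0.001  # preset numerical resolution in log10 units
dist_a = exact_distribution(item_a, step)
dist_b = exact_distribution(item_b, step)
comp_offset, comp_pmf = composite([dist_a, dist_b])
comp_grid = (comp_offset + np.arange(len(comp_pmf))) * step
```

The `step` argument plays the role of the preset numerical resolution: a smaller value gives a finer grid at the cost of longer arrays. The work grows with the number of loci and grid bins rather than with the number of multi-locus genotypes.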

We assessed both the sampling and convolution methods on the same DNA laboratory's mixture validation data set. We constructed contributor (posterior evidence probability weighted) and non-contributor (prior population probability weighted) genotype log(LR) distributions. We calculated error rates from these distributions to measure sensitivity and specificity. The data came from single source and DNA mixture samples.

Sampled contributor distributions were limited to the provided matching references, which severely undersampled the reference genotypes and yielded only limited false exclusion rates. In contrast, the exact distributions spanned the entire range of expected log(LR) match values and provided accurate false exclusion probabilities for the tested data sets.

Non-contributor distributions were calculated by limited sampling and exact convolution. The distributions from both methods appeared qualitatively similar. But more random reference sampling – and time – was needed to better approximate the true distribution. Building exact convolved distributions was far faster than using sampling.

Using exact convolution, sensitivity and specificity could be calculated rapidly from the log(LR) distributions on multiple data sets, which sped up PG validation relative to sampling methods. Human operator time was significantly reduced. User interfaces for non-contributor, contributor, and composite distributions simplified PG validation.

Calculating exact composite log(LR) distributions by convolution – and determining associated error rates on genotype subsets – improves on LR sampling methods. Convolution construction is easy, fast, complete, and accurate. The method lets forensic scientists readily determine error rates for PG methods of interpreting complex DNA evidence. Moreover, the exact convolution LR distribution construction approach has applicability to other forensic subdisciplines, providing accurate error rate determination for reliable scientific validation and reporting.

Links

American Academy of Forensic Sciences 75th Annual Scientific Conference - Program