Automated STR Data Analysis: Validation Studies

Perlin, M.W., Coffman, D., Crouse, C.A., Konotop, F. and Ban, J.D. Automated STR data analysis: validation studies. In: Proceedings of Promega's Twelfth International Symposium on Human Identification. Biloxi, MS, 2001.

Abstract

STR technology has enabled the rapid generation of highly informative DNA data for use in human identification. However, these data must be carefully analyzed. With database samples, there is now an acute shortage of skilled data reviewers. With casework samples (including mixtures), much information is not extracted from the data, despite considerable examiner effort. We are rapidly developing novel computational, mathematical and statistical methods that help overcome these limitations. This report focuses on the collaborative validation of these methods.

Convicted offender DNA databases must be accurate. To minimize error, the original STR data are carefully reviewed by two or more people. This review also serves a troubleshooting role, helping to maintain high-quality lab data. But there are not enough skilled personnel for this arduous, repetitive task. To alleviate this critical labor shortage, we developed the TrueAllele® expert system. The computer program automates virtually every human review function, and provides consistent quality assessment and allele designation.

The TrueAllele validation began with the original data from 50,000 CODIS genotypes. System parameters were adapted to the instruments (ABI/310, ABI/3700, Hitachi/FMbio) and panels (ProfilerPlus, Cofiler, PowerPlex 1.2) used to generate the data. Computer processing was then performed, with automated scoring of the high-quality data, followed by limited human review. The expert system results were compared against manually scored results. We report here on the relative accuracy and efficiency of the automated approach.
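As an illustration only, the comparison of computer-scored and manually scored results can be thought of as a concordance count over the genotypes that both methods called. The sketch below is not the TrueAllele procedure; the function name `concordance`, the `(sample, locus)` keys, and the example allele values are all hypothetical.

```python
def concordance(auto_calls, manual_calls):
    """Compare two sets of genotype calls keyed by (sample, locus).

    Each value is a pair of allele designations. Allele order within a
    pair is ignored. Returns (n_compared, n_agree, agreement_rate).
    Hypothetical helper for illustration; not part of any real system.
    """
    # Only compare genotypes that both reviewers (computer and human) scored.
    shared = auto_calls.keys() & manual_calls.keys()
    n_agree = sum(
        1 for key in shared
        if sorted(auto_calls[key]) == sorted(manual_calls[key])
    )
    rate = n_agree / len(shared) if shared else 0.0
    return len(shared), n_agree, rate


# Made-up example data: three genotypes, one discordant call at vWA.
auto = {
    ("S1", "D3S1358"): (15, 16),
    ("S1", "vWA"): (17, 18),
    ("S2", "D3S1358"): (14, 14),
}
manual = {
    ("S1", "D3S1358"): (16, 15),
    ("S1", "vWA"): (17, 19),
    ("S2", "D3S1358"): (14, 14),
}
n_compared, n_agree, rate = concordance(auto, manual)
```

In a real validation the discordant calls would then be examined individually, since a disagreement may reflect an error in either the automated or the manual result.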

In casework, DNA mixtures are analyzed to assess candidate suspects. When inferred profiles are matched against a convicted offender database, useful leads are generated. When matched against a known suspect, the mixture data can help convict or exonerate. However, data uncertainty leads to inherently complex and ambiguous analysis. We have developed a new technology, Linear Mixture Analysis (LMA), which uses multilocus quantitative data to automatically eliminate this complexity. LMA objectively resolves mixtures into candidate profiles, and provides highly informative statistical measures.
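To make the idea of a linear treatment of quantitative mixture data concrete, here is a minimal sketch under stated assumptions; the abstract does not spell out the LMA computation, so this should not be read as the authors' method. It assumes a two-contributor mixture in which the normalized peak quantities `d` at the alleles of a locus are approximately `w * a + (1 - w) * b`, where `a` and `b` are allele-count vectors for two candidate genotypes and `w` is the mixing weight. The function names and the example numbers are hypothetical.

```python
def estimate_weight(d, a, b):
    """Least-squares estimate of w in the model d ~ w*a + (1 - w)*b.

    d: observed peak quantities, normalized so they sum to 2 per locus
    a, b: allele-count vectors of two candidate contributor genotypes
    Vectors from several loci can be concatenated to pool the estimate.
    """
    num = sum((di - bi) * (ai - bi) for di, ai, bi in zip(d, a, b))
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    if den == 0:
        raise ValueError("genotypes are identical; weight is not identifiable")
    return num / den


def squared_error(d, a, b, w):
    """Sum of squared deviations between the data and the fitted mixture."""
    return sum((di - (w * ai + (1 - w) * bi)) ** 2
               for di, ai, bi in zip(d, a, b))


# Made-up single-locus example with three alleles.
a = [1, 1, 0]          # candidate genotype for contributor A
b = [0, 1, 1]          # candidate genotype for contributor B
d = [0.7, 1.0, 0.3]    # normalized peak quantities
w = estimate_weight(d, a, b)
err = squared_error(d, a, b, w)
```

Under this assumed model, a small squared error indicates that the candidate genotype pair explains the quantitative data well, while a large one argues against that pair.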

The LMA validation involves both synthetic mixtures and actual casework profiles derived from diverse panels and instruments. After quantitative peak analysis (using TrueAllele) on the original data, we apply LMA to automatically determine contributor profiles. Database search validation can be done by assessing the error rates of matching these profiles against existing DNA databases. Casework validation can be done by examining the LMA statistics relative to known suspect profiles. We report here our initial studies on LMA's accuracy and informativeness.

Our presentation describes novel computer-based methods for assuring data quality, automating DNA database review, and analyzing the mixed DNA profiles found in casework. We present here the objective results of our ongoing validation studies, and demonstrate the feasibility of practical automated analysis. Our primary objective is the rapid introduction of validated intelligent data analysis systems for eliminating tedious human STR analysis. This contribution may help free up valuable DNA examiner time for serving justice through forensic science.