An Expert System for Scoring DNA Database Profiles

M.W. Perlin, "An expert system for scoring DNA database profiles", Promega's Eleventh International Symposium on Human Identification, Biloxi, MS, 13-Oct-2000.


PowerPoint presentation and handout for the International Symposium on Human Identification 2000 talk.

Forensic DNA databases are becoming an increasingly valuable law enforcement tool for convicting repeat offenders and exonerating the innocent. However, constructing such databases is quite laborious. After generating STR profiles in the lab, people expend even greater effort visually reviewing the data before it enters the database. All artifacts must be detected, and no error can be tolerated. With millions of forensic samples to be analyzed, this manual review has become a formidable bottleneck.

We have developed software analysis methods that can automate this data review and potentially eliminate 90% of the work. Our fully automated TrueAllele system inputs raw fluorescent DNA sequencer (gel or capillary) files, processes the gel image (separating colors, tracking and sizing lanes), and analyzes the STR experiments (quantitating and sizing peaks, comparing with ladder peaks, calling alleles). For each allele call, TrueAllele assigns a quality score and applies artifact detection rules. These quality checks enable a user to focus on just the 5%-10% of suspect data, thereby eliminating most of the review effort.
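As a rough illustration of the "comparing with ladder peaks, calling alleles" step, the sketch below assigns a sized peak to the nearest allelic ladder peak and rejects the call when the offset is too large. This is not TrueAllele's actual algorithm; the function name `call_allele`, the ladder sizes, and the 0.5 bp window are all assumptions chosen for the example.

```python
def call_allele(peak_size_bp, ladder, max_offset_bp=0.5):
    """Match a sized peak to the nearest ladder allele.

    ladder maps allele labels to ladder peak sizes in base pairs.
    Returns (label, offset), with label set to None when the peak lies
    farther than max_offset_bp (an assumed window) from every ladder peak.
    """
    # Pick the ladder allele whose size is closest to the observed peak.
    label, ladder_size = min(
        ladder.items(), key=lambda item: abs(item[1] - peak_size_bp)
    )
    offset = peak_size_bp - ladder_size
    if abs(offset) > max_offset_bp:
        return None, offset  # off-ladder: leave for human review
    return label, offset


# Hypothetical ladder for one tetranucleotide locus (sizes are invented).
ladder = {"12": 160.0, "13": 164.0, "14": 168.0}

on_ladder_label, on_ladder_offset = call_allele(164.2, ladder)
off_ladder_label, _ = call_allele(166.0, ladder)
```

Here the first peak falls within the window of the "13" ladder peak and is called, while the second sits midway between two ladder peaks and is left uncalled, which is the kind of case a quality rule would route to a reviewer.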

TrueAllele models every step of the STR data generation process. By computing hundreds of variables for each genotype, the system can compare the observed data against expected behavior. Large deviations between expected and observed enable the software to flag potentially problematic data for human review.
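The idea of flagging large deviations between expected and observed values can be sketched as follows. This is a minimal illustration, not the system's actual scoring method: the variable names, expected values, tolerances, and the threshold of 3.0 are hypothetical, and a simple scaled absolute difference stands in for whatever comparison the software really performs.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One computed variable for a genotype (all fields are illustrative)."""
    name: str
    observed: float
    expected: float
    tolerance: float  # assumed scale of normal variation around `expected`


def flag_deviations(features, threshold=3.0):
    """Return the names of features whose scaled deviation exceeds threshold."""
    flagged = []
    for f in features:
        # Deviation measured in units of the feature's own tolerance.
        deviation = abs(f.observed - f.expected) / f.tolerance
        if deviation > threshold:
            flagged.append(f.name)
    return flagged


# Invented values for a single genotype.
features = [
    Feature("peak_height_ratio", observed=0.45, expected=0.85, tolerance=0.10),
    Feature("size_offset_bp", observed=0.12, expected=0.0, tolerance=0.25),
    Feature("stutter_ratio", observed=0.08, expected=0.07, tolerance=0.03),
]

needs_review = flag_deviations(features)
```

Only the peak height ratio deviates strongly from its expected value in this example, so only that feature is flagged; a genotype with no flagged features would pass without human review.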

TrueAllele automatically extracts the information it needs from the raw data (e.g., ABI/377 collection files). After automated image processing, the software provides a user quality assurance review for assessing data runs (lane tracking, gel quality, control lanes, etc.). Automated data processing continues with peak quantitation, allele designation, and quality scoring. In the allele-based quality assurance review, the user focuses on those designations which TrueAllele has flagged as having specific problems. After the user has reviewed (and possibly edited) this small subset of data, TrueAllele generates files for submission to the DNA database (e.g., CODIS).

TrueAllele operates independently of DNA sequencer manufacturer or technology, and runs on all major computer platforms (Macintosh, Windows, and UNIX). The program can analyze any panel of STR loci, allelic ladders, or internal size standards. Evaluation software can be downloaded from "".

The expert system is designed to provide an automated computer-based "second scorer" for STR database profiles. The British Forensic Science Service (FSS) has selected TrueAllele automated scoring for scaling up the UK National DNA Database. We anticipate that TrueAllele will have a role in significantly reducing the human review of forensic STR data, assessing the quality of STR data and providing laboratory feedback, and fully automating the forensic STR data scoring process.