Method and system for DNA mixture analysis
The present invention pertains to a process for analyzing mixtures of DNA molecules. More specifically, the present invention is related to performing experiments that produce quantitative data, and then analyzing these data to characterize a DNA component of the mixture. The invention also pertains to systems related to this DNA mixture information.
CLAIMS: What is claimed is:
1. A method of analyzing a DNA sample that contains genetic material from at least two individuals to determine a probability distribution of genotype likelihood or weight in the sample, comprising the steps:
- (a) amplifying the DNA sample to produce an amplification product comprising DNA fragments, wherein each allele at a locus is amplified to generate relative amounts of DNA fragments of the alleles that are proportional to the relative amounts of template DNA from the alleles in the DNA sample, and wherein the amplification product produces a signal comprising signal peaks from each allele the amounts of which are proportional to the relative amounts of the alleles;
- (b) detecting signal peak amounts in the signal and quantifying the amounts using quantifying means that include a computing device with memory to produce DNA length and concentration estimates from the sample;
- (c) resolving the estimates into one or more component genotypes using automated resolving means, said resolution into one or more genotypes including solving the coupled linear equations d = G·w + e for the relevant loci (i), individuals (j) and alleles (k), in which d is a column vector which describes the peak quantitation data of a DNA sample from the signal, G is a matrix that represents the genotypes in the DNA sample, with a column j giving the alleles for individual j, w is a weight column vector that represents relative proportions of template DNA in the sample and e is an error vector, wherein the solution includes calculation of data variance σ² from the linear model d = G·w + e together with the global minimal solution Pd = G·w₀, where Pd is the perpendicular projection point which is the closest point to d in mixture space C(G) and w₀ is the minimum weight vector, using linear regression methods, and calculating a probability distribution of the data assuming a normal distribution and that the error is unbiased, so that E(e) = 0, but has a dispersion D[e] = σ²V in which V is the covariance matrix of the data; and
- (d) determining, using the probability distribution of the data, a probability distribution of genotype likelihood or weight in the DNA sample.
2. A method as claimed in claim 1, further including determining a statistical confidence in the information about the composition of the DNA mixture.
3. A method as claimed in claim 1, further including recording a genotype likelihood or probability of an individual in a report.
4. A method as claimed in claim 1, further including using a computing device to generate a visualization that shows the genotype matrix and the weight vector.
5. A method as claimed in claim 1, further including comparing the genotype information with a set of suspect genotypes to identify a likely suspect.
6. A method as claimed in claim 5, wherein the set of suspect genotypes contains a genotype of a convicted offender individual.
7. A method as claimed in claim 1, wherein the speed of the method is such as to process at least one DNA sample per hour.
8. A method as claimed in claim 1, wherein the speed of the resolving step is such as to determine a genotype using less than an hour of human interaction time.
9. A method as claimed in claim 1, which includes iteratively determining the genotype matrix based on the weight vector, and the weight vector based on the genotype matrix.
10. A method as claimed in any preceding claim, wherein the locus is a short tandem repeat (STR).
11. A method as claimed in any one of claims 1 to 9, wherein the locus is a single nucleotide polymorphism (SNP).
12. A method as claimed in claim 1, wherein the genotype matrix contains entries that are not integer values.
13. A method as claimed in claim 12, wherein the noninteger values represent an efficiency of the amplification.
14. A method as claimed in any preceding claim, wherein the DNA sample contains genetic material from more than two individuals.
15. A method as claimed in any preceding claim, further including computing a genotype of an individual by subtracting from the signal the genotypes of other individuals in the DNA sample in proportion to the weight vector w.
16. A method as claimed in claim 15, wherein the subtracted genotypes are all the genotypes of all other individuals in the DNA sample.
17. A method as claimed in claim 1, wherein the analysis includes comparing with a candidate genotype selected from a database of known genotypes.
18. A method as claimed in claim 17, wherein the analysis produces a set of ranked genotypes that are in the database and match the signal.
19. A method as claimed in any preceding claim, wherein the amplifying step uses polymerase chain reaction (PCR) amplification.
20. A method as claimed in claim 19, wherein the PCR amplification uses a low copy number PCR protocol.
21. A method as claimed in claim 19 or claim 20, wherein the method includes removal of the effects of PCR amplification artifacts prior to the quantitative part of the analysis.
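
For illustration only, the linear model of claim 1(c)-(d) and the subtraction of claim 15 might be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: it treats a single locus, takes V as the identity matrix, uses unconstrained ordinary least squares, assumes the weights sum to 1, and places a uniform prior over a short list of candidate genotype matrices. The function names (fit_mixture, genotype_weights, subtract_known) and all numeric values are hypothetical.

```python
import numpy as np


def fit_mixture(d, G):
    """Least-squares fit of d = G w + e, taking V = I (a simplifying assumption)."""
    w0, _, rank, _ = np.linalg.lstsq(G, d, rcond=None)
    Pd = G @ w0                      # perpendicular projection of d onto C(G)
    resid = d - Pd
    dof = max(len(d) - rank, 1)      # residual degrees of freedom
    sigma2 = float(resid @ resid) / dof
    return w0, Pd, sigma2


def genotype_weights(d, candidates, sigma2):
    """Normalized Gaussian likelihoods over candidate genotype matrices.

    Assumes a uniform prior over the candidates and a known variance sigma2.
    """
    sq = np.array([np.sum((d - fit_mixture(d, G)[1]) ** 2) for G in candidates])
    log_lik = -0.5 * sq / sigma2
    log_lik -= log_lik.max()         # stabilize the exponentials
    p = np.exp(log_lik)
    return p / p.sum()


def subtract_known(d, g_known, w_known):
    """Remove a known contributor's genotype in proportion to its weight,
    then rescale by the remaining weight (two-person case)."""
    return (d - w_known * g_known) / (1.0 - w_known)


# Hypothetical three-allele locus: column j of G holds individual j's allele counts.
G_true = np.array([[1.0, 0.0],
                   [1.0, 1.0],
                   [0.0, 1.0]])
G_alt = np.array([[1.0, 1.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])
d_obs = np.array([0.72, 0.98, 0.30])  # made-up normalized peak quantities

w0, Pd, sigma2 = fit_mixture(d_obs, G_true)
probs = genotype_weights(d_obs, [G_true, G_alt], sigma2=0.05 ** 2)
print("weights:", w0, "variance:", sigma2, "candidate probabilities:", probs)
```

In this sketch the residual d − Pd is orthogonal to the columns of G, which is what makes Pd the closest point to d in C(G). A fuller treatment would stack the equations across loci, constrain w to be non-negative, and use a general covariance matrix V.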