Building statistical foundation for next-gen forensic DNA profiling

That extra data might not be needed because in most cases, STR-based profiles contain more than enough information to reliably identify a suspect. However, if the evidence contains only a minute amount of DNA, or if the DNA has been exposed to the elements and has begun to break down, then the analyst might only get a partial profile, which may not be enough to identify a suspect. In those cases, the extra data in an NGS-based profile might help solve the case.

In addition, evidence that contains a mixture of DNA from several people can be difficult to interpret. The extra data in NGS-based profiles can help in those cases as well.

Calculating match statistics
DNA analysts are able to calculate match statistics for STR-based profiles because scientists have measured how frequently different versions of the markers occur in the population. With those frequencies, you can calculate the chances of randomly encountering a particular DNA profile, just as you can calculate the chances of picking all the right numbers in a lottery.
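To make the lottery comparison concrete, here is a minimal sketch of how per-marker frequencies can be combined into a match statistic. It assumes the standard simplifications of Hardy-Weinberg proportions within each marker and independence between markers, so the per-marker genotype frequencies can simply be multiplied. The allele frequencies and the three-marker profile are invented for illustration and are not NIST data, and the function names are hypothetical. Real casework calculations also apply corrections that this sketch leaves out.

```python
from math import prod

# Hypothetical allele frequencies, for illustration only (not NIST data).
ALLELE_FREQS = {
    "D3S1358": {"15": 0.25, "16": 0.25},
    "vWA": {"17": 0.20},
    "FGA": {"21": 0.10, "24": 0.20},
}


def genotype_frequency(p, q=None):
    """Expected genotype frequency, assuming Hardy-Weinberg proportions.

    Homozygote (one allele frequency given): p * p
    Heterozygote (two allele frequencies given): 2 * p * q
    """
    if q is None:
        return p * p
    return 2 * p * q


def random_match_probability(profile, freqs):
    """Multiply per-marker genotype frequencies, assuming independent markers."""
    per_marker = []
    for marker, (a, b) in profile.items():
        if a == b:
            per_marker.append(genotype_frequency(freqs[marker][a]))
        else:
            per_marker.append(genotype_frequency(freqs[marker][a], freqs[marker][b]))
    return prod(per_marker)


# A made-up three-marker profile: two alleles per marker, one from each parent.
profile = {
    "D3S1358": ("15", "16"),
    "vWA": ("17", "17"),
    "FGA": ("21", "24"),
}

rmp = random_match_probability(profile, ALLELE_FREQS)
print(f"Random match probability: about 1 in {1 / rmp:,.0f}")
```

With these invented numbers, the three markers contribute 0.125, 0.04 and 0.04, for a combined probability of about 1 in 5,000. Each additional marker multiplies in another small number, which is why a full profile becomes so rare.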

NIST measured those STR gene frequencies years ago using a library of DNA samples from 1,036 individuals. To calculate gene frequencies for NGS-based profiles, Gettings and her co-authors cracked open the freezer that contained the original samples, which were anonymized and donated by people who consented to their DNA being used for research. The scientists generated NGS-based profiles for them by sequencing 27 markers—the core set of 20 included in most DNA profiles in the U.S. plus seven others. They then calculated the frequencies for the various genetic sequences found at each marker.
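The frequency calculation described above amounts to counting. As a rough sketch, assuming each sample contributes two alleles per marker, the frequency of an allele is the number of times it was observed divided by the total number of alleles observed at that marker. The four genotypes below are invented for illustration, and the function name is hypothetical. For an NGS-based profile the allele labels would be the actual sequences rather than length-based names.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one marker by direct counting.

    `genotypes` is a list of (allele, allele) pairs, one pair per sample.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())  # two alleles per sample
    return {allele: n / total for allele, n in counts.items()}


# Made-up genotypes at a single marker for four samples (not NIST data).
samples = [("15", "16"), ("15", "15"), ("16", "17"), ("15", "17")]

freqs = allele_frequencies(samples)
for allele, freq in sorted(freqs.items()):
    print(allele, freq)
```

Here allele 15 appears four times among eight observed alleles, giving a frequency of 0.5, while alleles 16 and 17 each appear twice, giving 0.25 apiece.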

It might be surprising that scientists can estimate gene frequencies from such a small library of samples. However, the NIST team was measuring frequencies not for the full profiles, but for the individual markers. Since they sequenced 27 markers, with each marker occurring twice per sample, the number of marker copies tested wasn’t 1,036, but more than 55,000 (1,036 samples × 27 markers × 2 copies = 55,944).

Although NIST has now published the data needed to generate match statistics for NGS-based profiles, other hurdles must still be cleared before the new technology sees widespread use in forensics. For instance, labs will have to develop ways to manage the greater amounts of data produced by NGS. They will also have to implement operating procedures and quality controls for the new technology. Still, while much work remains, said Peter Vallone, the research chemist who leads NIST’s forensic genetics research, “We’re laying the foundation for the future.”

— Read more in K. B. Gettings et al., “U.S. Population Sequence Data for 27 Autosomal STR Loci,” Forensic Science International: Genetics (19 July 2018) (DOI: 10.1016/j.fsigen.2018.07.013)