A multivariate statistical approach for the estimation of the ethnic origin of unknown genetic profiles in forensic genetics

Ornella Semino
2020-01-01

Abstract

DNA typing and the interpretation of genetic profile data are among the most relevant topics in forensic science. Among other applications, the capability of genetic profiles to reveal biogeographic information about population groups, subgroups and affiliations has been widely explored in the last decade. For investigative and intelligence purposes, it is extremely useful to identify subjects and estimate their biogeographic origins by examining the DNA profiles recovered from evidence at a crime scene. Current approaches to Biogeographic Ancestry (BGA) estimation from STR profiles are usually based on Bayesian methods, which quantify the evidence as a likelihood ratio that supports or does not support the hypothesis that a given profile belongs to a specific ethnic group. The present study provides an alternative to the likelihood ratio method, based on multivariate data analysis strategies for classifying profiles among multiple populations. Starting from the well-known NIST US autosomal STR dataset of African-American, Asian and Caucasian individuals, and moving on to more geographically restricted populations (such as Northern Africans vs sub-Saharan Africans, Afghans vs Iraqis and Italians vs Romanians), multivariate techniques such as Sparse and Logistic Principal Component Analysis (SL-PCA), Sparse Partial Least Squares-Discriminant Analysis (sPLS-DA) and Support Vector Machines (SVM) were employed and their discriminating power was compared. Both sPLS-DA and SVM provided robust classifications, yielding models with high sensitivity and specificity that are capable of discriminating populations on an ethnic basis. This approach may represent a powerful and dynamic tool for law enforcement agencies whenever a standard autosomal STR profile is obtained from biological evidence collected at a crime scene or recovered during mass-disaster and missing-person investigations.
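
To make the likelihood ratio baseline mentioned in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes Hardy-Weinberg proportions within each population and independence between loci, which is a common simplification. The population labels, the allele frequencies and the function names are hypothetical and were invented for illustration only; they are not taken from the NIST dataset or from the study.

```python
from math import prod

# Hypothetical allele frequencies for two STR loci in two populations.
# These numbers are invented for illustration, not taken from any dataset.
FREQS = {
    "pop_A": {"D3S1358": {15: 0.30, 16: 0.25}, "TH01": {6: 0.20, 9.3: 0.30}},
    "pop_B": {"D3S1358": {15: 0.10, 16: 0.40}, "TH01": {6: 0.10, 9.3: 0.05}},
}


def genotype_probability(profile, freqs):
    """Probability of a profile given one population's allele frequencies.

    Assumes Hardy-Weinberg proportions at each locus (p^2 for a homozygote,
    2pq for a heterozygote) and independence between loci.
    """
    per_locus = []
    for locus, (a, b) in profile.items():
        p, q = freqs[locus][a], freqs[locus][b]
        per_locus.append(p * p if a == b else 2 * p * q)
    return prod(per_locus)


def likelihood_ratio(profile, pop_1, pop_2):
    """Ratio P(profile | pop_1) / P(profile | pop_2)."""
    return (genotype_probability(profile, FREQS[pop_1])
            / genotype_probability(profile, FREQS[pop_2]))


# One profile: heterozygous (15, 16) at D3S1358, homozygous (9.3, 9.3) at TH01.
profile = {"D3S1358": (15, 16), "TH01": (9.3, 9.3)}
lr = likelihood_ratio(profile, "pop_A", "pop_B")
print(f"LR (pop_A vs pop_B): {lr:.1f}")
```

A ratio above 1 supports the first population over the second, and a ratio below 1 supports the second. The multivariate classifiers named in the abstract address the same question by learning a decision rule from labelled profiles instead of comparing two hypotheses at a time.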

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1288926