PREDICTIVE METHOD FOR DETERMINING THE PATHOGENICITY OF COMBINATIONS OF DIGENIC OR OLIGOGENIC VARIANTS
I. Limongelli; S. Zucca; F. De Paoli; E. Rizzo; P. Magni
2021-01-01
Abstract
A computer-implemented method for determining the pathogenicity of combinations of digenic or oligogenic variants, in relation to a disease, is described. The method first includes defining a set of variants whose pathogenicity is to be determined. Such variants are mutations present in one or both alleles of a respective gene of at least two genes, each gene being associated with two respective alleles. The method then comprises the step of determining the situations that can occur regarding the presence or absence of the aforesaid variants in the alleles of the at least two genes considered. Each situation is associated with a respective combination in which each variant is present in a respective subset of alleles, among all the possible subsets of alleles of all the genes considered, or in which the variant is present in all the alleles of all the genes considered. For each of the defined situations, i.e., for each combination and for each gene, the method then includes calculating a pathogenicity index or score, adapted to estimate how much the one or more respective variants modify the functioning of the respective gene. The method further comprises the steps of describing phenotypic traits of a patient by standardized phenotypic terms, i.e., standardized information adapted to describe phenotypic abnormalities found in the patient, and calculating or preparing input information for the pathogenicity determination.
The input information comprises the following four types of features: (i) gene-phenotype association features, calculated individually for each of the genes considered and adapted to measure how closely the aforesaid phenotypic traits of the patient overlap with phenotypes already known to be associated with the single gene; (ii) digenicity or oligogenicity features, calculated for each of the aforesaid gene combinations and adapted to capture the interaction between the genes forming each combination; (iii) a priori property features of the genes, calculated for each of the genes considered; and (iv) variant-related features, calculated for each gene considered, based on the aforesaid pathogenicity indices or scores calculated in relation to all the combinations considered. The method further includes providing said input information to at least one trained algorithm and processing it by the at least one trained algorithm. The trained algorithm is an algorithm trained by artificial intelligence and/or machine learning techniques. It is trained in a preliminary training step on a training dataset of known cases: the aforesaid input information, calculated for each known case, is provided to the algorithm to be trained, and the algorithm is trained based on the known pathogenicity or benignity of the respective cases. Finally, the method comprises the step of obtaining output information from the trained algorithm, representing the pathogenicity of each of the combinations of variants or mutations considered.
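The step of determining the situations that can occur may be easier to follow with a small sketch. The Python below is only an illustration under stated assumptions, not the patented implementation: it assumes one variant per gene, treats a situation as a choice, for every gene, of a non-empty subset of that gene's two alleles, and uses hypothetical names (`ALLELES`, `allele_subsets`, `enumerate_situations`, `GENE_A`, `GENE_B`) that do not appear in the abstract.

```python
from itertools import combinations, product

# Assumption: each gene is associated with two alleles, as stated in the abstract.
ALLELES = ("allele_1", "allele_2")


def allele_subsets(alleles=ALLELES):
    """Return every non-empty subset of a gene's alleles, smallest subsets first."""
    return [
        frozenset(subset)
        for size in range(1, len(alleles) + 1)
        for subset in combinations(alleles, size)
    ]


def enumerate_situations(genes):
    """Yield one dict per situation, mapping each gene to the alleles carrying its variant.

    Assumption: the subsets are chosen independently for each gene, so the
    situations are the Cartesian product of the per-gene subsets.
    """
    per_gene = allele_subsets()
    for choice in product(per_gene, repeat=len(genes)):
        yield dict(zip(genes, choice))


# Hypothetical gene names, used only for illustration.
situations = list(enumerate_situations(["GENE_A", "GENE_B"]))
for situation in situations:
    print({gene: sorted(alleles) for gene, alleles in situation.items()})
```

Under these assumptions a digenic case yields 3 × 3 = 9 situations, and the last one enumerated is the case in which the variant is present in all the alleles of all the genes considered. A pathogenicity score would then be computed for each gene in each situation.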
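The assembly of the four feature groups and the preliminary training step can be sketched as follows. This is a minimal illustration under explicit assumptions: the abstract does not name the learning algorithm, so a plain logistic regression written with the standard library stands in for "at least one trained algorithm"; the class `CombinationFeatures`, the function names, and all numeric values are hypothetical, and the four "known cases" are synthetic placeholders rather than real data.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class CombinationFeatures:
    """The four feature groups named in the abstract, for one gene combination."""
    gene_phenotype: List[float]  # gene-phenotype association, one value per gene
    oligogenicity: List[float]   # digenicity/oligogenicity, per gene combination
    gene_priors: List[float]     # a priori gene properties, one value per gene
    variant_scores: List[float]  # derived from the per-situation pathogenicity scores

    def to_vector(self) -> List[float]:
        """Concatenate the four groups into one flat input vector."""
        return (self.gene_phenotype + self.oligogenicity
                + self.gene_priors + self.variant_scores)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic regression by batch gradient descent (stand-in algorithm)."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(X, y):
            error = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


def predict_pathogenicity(features: CombinationFeatures,
                          weights: Sequence[float], bias: float) -> float:
    """Return a value in (0, 1); higher means more likely pathogenic."""
    z = sum(w * x for w, x in zip(weights, features.to_vector())) + bias
    return sigmoid(z)


# Synthetic placeholder cases (NOT real data): label 1 = pathogenic, 0 = benign.
known_cases = [
    CombinationFeatures([0.9, 0.8], [0.8], [0.7, 0.8], [0.9]),
    CombinationFeatures([0.8, 0.9], [0.9], [0.8, 0.7], [0.8]),
    CombinationFeatures([0.1, 0.2], [0.2], [0.3, 0.2], [0.1]),
    CombinationFeatures([0.2, 0.1], [0.1], [0.2, 0.3], [0.2]),
]
labels = [1, 1, 0, 0]

weights, bias = train_logistic([case.to_vector() for case in known_cases], labels)
for case, label in zip(known_cases, labels):
    print(label, round(predict_pathogenicity(case, weights, bias), 3))
```

Keeping the four groups as separate fields makes it explicit which values are computed per gene and which per combination before they are flattened into the vector passed to the algorithm. Any other supervised classifier could take the place of the logistic regression here.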