Multicenter next-generation sequencing studies between theory and practice: harmonization of data analysis using real world myelodysplastic syndrome data

Sandmann, Sarah; De Graaf, Aniek O; Tobiasson, Magnus; Kosmider, Olivier; Abáigar, María; Clappier, Emmanuelle; Gallì, Anna; Van Der Reijden, Bert A; Malcovati, Luca; Fenaux, Pierre; Díez-Campelo, María; Fontenay, Michaela; Hellström-Lindberg, Eva; Jansen, Joop H; Dugas, Martin

doi:10.1016/j.jmoldx.2020.12.001

In the age of personalized medicine, genetic testing by means of targeted sequencing has taken on a key role. However, different sets of targeted sequencing data are often characterized by a considerable lack of harmonization: laboratories follow their own best practices and analyze their own target regions. The question of how best to integrate data from different sites remains unanswered. Using myelodysplastic syndromes (MDS) as an example, we analyzed 11 targeted sequencing sets collected from 6 different centers (n=831). We identified an intersecting target region of 43,076 bp (30 genes), while the original target regions covered up to 499,097 bp (117 genes). For a region of interest in the context of MDS, we identified a target region of 55,969 bp (31 genes). For each gene, coverage and sequencing data quality were evaluated by calculating a sequencing score. The analyses revealed large differences both between datasets and between genes. Analyzing the relation between sequencing score and mutation frequency in MDS, we observed that the majority of genes with a high mutation frequency in MDS could be sequenced without expecting low coverage or quality. Still, no gene appeared consistently unproblematic across all datasets.
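
The idea of an intersecting target region can be illustrated with a small interval computation. The following Python sketch is only an illustration under stated assumptions and is not the pipeline used in the study: the function names (`merge`, `intersect_two`, `intersect_targets`, `total_bp`), the half-open coordinate convention (as in BED files), and the toy panel coordinates are all hypothetical.

```python
def merge(intervals):
    """Merge overlapping or adjacent half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect_two(a, b):
    """Intersect two sorted, non-overlapping interval lists."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def intersect_targets(panels):
    """Return the region covered by every panel.

    panels: list of dicts mapping chromosome -> list of (start, end).
    """
    common = {chrom: merge(ivs) for chrom, ivs in panels[0].items()}
    for panel in panels[1:]:
        common = {
            chrom: intersect_two(ivs, merge(panel.get(chrom, [])))
            for chrom, ivs in common.items()
        }
    # Drop chromosomes where nothing is shared by all panels.
    return {chrom: ivs for chrom, ivs in common.items() if ivs}


def total_bp(regions):
    """Total number of base pairs covered by a set of regions."""
    return sum(end - start for ivs in regions.values() for start, end in ivs)


# Toy target regions for three hypothetical panels (not real study data).
panel_a = {"chr1": [(100, 200), (150, 300)], "chr2": [(10, 50)]}
panel_b = {"chr1": [(180, 250), (290, 400)], "chr2": [(60, 90)]}
panel_c = {"chr1": [(0, 220)]}

shared = intersect_targets([panel_a, panel_b, panel_c])
print(shared, total_bp(shared))
```

In this toy example only a small part of chromosome 1 is covered by all three panels, which mirrors how the shared region in a multicenter setting can be much smaller than any single panel's target region.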