Predicting the onset of psychosis in individuals at-risk is based on robust prognostic model building methods including a priori clinical knowledge (also termed clinical-learning) to preselect predictors or machine-learning methods to select predictors automatically. To date, there is no empirical research comparing the prognostic accuracy of these two methods for the prediction of psychosis onset. In a first experiment, no improved performance was observed when machine-learning methods (LASSO and RIDGE) were applied—using the same predictors—to an individualised, transdiagnostic, clinically based, risk calculator previously developed on the basis of clinical-learning (predictors: age, gender, age by gender, ethnicity, ICD-10 diagnostic spectrum), and externally validated twice. In a second experiment, two refined versions of the published model which expanded the granularity of the ICD-10 diagnosis were introduced: ICD-10 diagnostic categories and ICD-10 diagnostic subdivisions. Although these refined versions showed an increase in apparent performance, their external performance was similar to the original model. In a third experiment, the three refined models were analysed under machine-learning and clinical-learning with a variable event per variable ratio (EPV). The best performing model under low EPVs was obtained through machine-learning approaches. The development of prognostic models on the basis of a priori clinical knowledge, large samples and adequate events per variable is a robust clinical prediction method to forecast psychosis onset in patients at-risk, and is comparable to machine-learning methods, which are more difficult to interpret and implement. Machine-learning methods should be preferred for high dimensional data when no a priori knowledge is available.
Clinical-learning versus machine-learning for transdiagnostic prediction of psychosis onset in individuals at-risk
Fusar-Poli P.;Rutigliano G.;Bonoldi I.;De Micheli A.;
2019-01-01
Abstract
Predicting the onset of psychosis in individuals at-risk is based on robust prognostic model building methods including a priori clinical knowledge (also termed clinical-learning) to preselect predictors or machine-learning methods to select predictors automatically. To date, there is no empirical research comparing the prognostic accuracy of these two methods for the prediction of psychosis onset. In a first experiment, no improved performance was observed when machine-learning methods (LASSO and RIDGE) were applied—using the same predictors—to an individualised, transdiagnostic, clinically based, risk calculator previously developed on the basis of clinical-learning (predictors: age, gender, age by gender, ethnicity, ICD-10 diagnostic spectrum), and externally validated twice. In a second experiment, two refined versions of the published model which expanded the granularity of the ICD-10 diagnosis were introduced: ICD-10 diagnostic categories and ICD-10 diagnostic subdivisions. Although these refined versions showed an increase in apparent performance, their external performance was similar to the original model. In a third experiment, the three refined models were analysed under machine-learning and clinical-learning with a variable event per variable ratio (EPV). The best performing model under low EPVs was obtained through machine-learning approaches. The development of prognostic models on the basis of a priori clinical knowledge, large samples and adequate events per variable is a robust clinical prediction method to forecast psychosis onset in patients at-risk, and is comparable to machine-learning methods, which are more difficult to interpret and implement. Machine-learning methods should be preferred for high dimensional data when no a priori knowledge is available.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.