Predicting and Explaining Risk of Disease Worsening Using Temporal Features in Multiple Sclerosis

Buonocore T. M.;Bosoni P.;Nicora G.;Vazifehdan M.;Bellazzi R.;Parimbelli E.;Dagliati A.
2023-01-01

Abstract

We present an evaluation study of two post-hoc, model-agnostic XAI methods, SHAP and AraucanaXAI, used to provide insights into the most predictive factors of worsening in multiple sclerosis (MS) patients, based on clinical observations collected over a period of 2.5 years. We pre-processed the temporal features with a Latent Class Mixed Modelling (LCMM) approach to discover and extract temporal trajectories as an additional informative feature. The XAI approaches are compared on four quantitative evaluation metrics: identity, fidelity, separability, and time to compute an explanation. Furthermore, a qualitative comparison of the post-hoc generated explanations is carried out on specific scenarios where the ML model predicted the outcome incorrectly, in an effort to debug potentially problematic model behaviour. The combination of the qualitative and quantitative results forms the basis for a critical discussion of the properties of XAI methods and desiderata for healthcare applications at large, advocating for more meaningful and extensive XAI evaluation studies involving human experts.
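To make the evaluation metrics concrete, the sketch below illustrates two of them, fidelity and identity, for a simplified local-surrogate explainer in the spirit of AraucanaXAI (which fits a local decision tree around the instance being explained). All data, neighbourhood sizes, and the surrogate design here are illustrative assumptions, not the paper's actual implementation; fidelity is taken as the agreement between surrogate and black-box predictions on the local neighbourhood, and identity as the requirement that explaining the same instance twice yields the same explanation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the patient dataset (hypothetical features, not MS data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

def explain_with_surrogate(x, X_ref, model, k=50, seed=0):
    """Fit a local decision-tree surrogate around instance x
    (a simplified analogue of a local-surrogate explainer)."""
    # Neighbourhood: the k nearest reference points by Euclidean distance.
    idx = np.argsort(np.linalg.norm(X_ref - x, axis=1))[:k]
    X_loc = X_ref[idx]
    y_loc = model.predict(X_loc)  # surrogate mimics the black box, not the labels
    tree = DecisionTreeClassifier(max_depth=3, random_state=seed).fit(X_loc, y_loc)
    return tree, X_loc

def fidelity(tree, X_loc, model):
    """Fraction of neighbourhood points where surrogate and model agree."""
    return float(np.mean(tree.predict(X_loc) == model.predict(X_loc)))

def identity(x, X_ref, model):
    """Identical instances should receive identical explanations."""
    t1, _ = explain_with_surrogate(x, X_ref, model)
    t2, _ = explain_with_surrogate(x, X_ref, model)
    return bool(np.allclose(t1.feature_importances_, t2.feature_importances_))

tree, X_loc = explain_with_surrogate(X[0], X, model)
print(fidelity(tree, X_loc, model))  # typically close to 1.0 on this toy data
print(identity(X[0], X, model))      # True for a deterministic explainer
```

Separability (distinct instances receive distinct explanations) and time-to-explain could be measured analogously by comparing `feature_importances_` across different instances and timing `explain_with_surrogate`.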
Files in this product:
There are no files associated with this product.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1487683
Citations
  • PMC: n/a
  • Scopus: 0
  • Web of Science: n/a