Disease subtyping, which helps to develop personalized treatments, remains a challenge in data analysis because of the many different ways to group patients based upon their data. However, if we can identify subclasses of disease, then it will help to develop better models that are more specific to individuals and should therefore improve prediction and understanding of the underlying characteristics of the disease in question. This paper proposes a new algorithm that integrates consensus clustering methods with classification in order to overcome issues with sample bias. The new algorithm combines K-means with consensus clustering in order build cohort-specific decision trees that improve classification as well as aid the understanding of the underlying differences of the discovered groups. The methods are tested on a real-world freely available breast cancer dataset and data from a London hospital on systemic sclerosis, a rare potentially fatal condition. Results show that "nearest consensus clustering classification" improves the accuracy and the prediction significantly when this algorithm has been compared with competitive similar methods.

Nearest Consensus Clustering Classification to Identify Subclasses and Predict Disease

BOSONI, PIETRO
Software
;
Bellazzi, Riccardo
Supervision
;
2018-01-01

Abstract

Disease subtyping, which helps to develop personalized treatments, remains a challenge in data analysis because of the many different ways to group patients based upon their data. However, if we can identify subclasses of disease, then it will help to develop better models that are more specific to individuals and should therefore improve prediction and understanding of the underlying characteristics of the disease in question. This paper proposes a new algorithm that integrates consensus clustering methods with classification in order to overcome issues with sample bias. The new algorithm combines K-means with consensus clustering in order build cohort-specific decision trees that improve classification as well as aid the understanding of the underlying differences of the discovered groups. The methods are tested on a real-world freely available breast cancer dataset and data from a London hospital on systemic sclerosis, a rare potentially fatal condition. Results show that "nearest consensus clustering classification" improves the accuracy and the prediction significantly when this algorithm has been compared with competitive similar methods.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11571/1260186
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 20
  • ???jsp.display-item.citation.isi??? ND
social impact