Improving Keyword-Based Topic Classification in Cancer Patient Forums with Multilingual Transformers

Buonocore T. M.;Parimbelli E.;Sacchi L.;Bellazzi R.;Quaglini S.
2022-01-01

Abstract

Online forums play an important role in connecting people who have crossed paths with cancer. These communities create networks of mutual support that cover a range of cancer-related topics and contain a large amount of heterogeneous information that can be mined for useful insights. This work presents a case study in which users' posts from an Italian cancer patient community were classified by combining count-based and prediction-based representations to identify discussion topics, with the aim of improving message reviewing and filtering. We demonstrate that pairing simple bag-of-words representations based on keyword matching with pre-trained contextual embeddings significantly improves the overall quality of the predictions and allows the model to handle ambiguities and misspellings. By using non-English real-world data, we also investigate the reusability of pre-trained multilingual models such as BERT in low-data settings, such as those found in many local medical institutions.
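The abstract does not say how the count-based and prediction-based representations are paired. One common option is simple concatenation of the two feature vectors, and the sketch below assumes that choice. The keyword lists, the function names, and `toy_embed` are hypothetical and not taken from the paper; `toy_embed` is only a deterministic stand-in for a pre-trained contextual encoder such as multilingual BERT.

```python
import re
from collections import Counter

# Hypothetical topic keyword lists (illustrative only, not from the paper).
TOPIC_KEYWORDS = {
    "therapy": ["chemioterapia", "radioterapia"],
    "nutrition": ["dieta", "alimentazione"],
}


def keyword_counts(text, topic_keywords=TOPIC_KEYWORDS):
    """Count-based features: one keyword-match count per topic."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    return [float(sum(counts[k] for k in kws)) for kws in topic_keywords.values()]


def toy_embed(text, dim=4):
    """Stand-in for a pre-trained contextual encoder (assumption).

    A real pipeline would return a sentence embedding from a model such as
    multilingual BERT; this toy version only produces a fixed-size vector.
    """
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def combine_features(text, embed=toy_embed):
    """Concatenate keyword counts with the dense embedding (assumed pairing)."""
    return keyword_counts(text) + list(embed(text))


post = "Ho iniziato la chemioterapia e cambiato dieta"
features = combine_features(post)
print(len(features), features[:2])
```

The combined vector could then be passed to any standard classifier. Concatenation keeps the keyword signal explicit while letting the dense part cover spelling variants that exact matching misses.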
2022
Studies in Health Technology and Informatics
English
18th World Congress on Medical and Health Informatics: One World, One Health - Global Partnership for Digital Innovation, MEDINFO 2021
2021
290
597
601
5
9781643682648
9781643682655
IOS Press BV
Classification; Community Health Services; Natural Language Processing
no
none
Buonocore, T. M.; Parimbelli, E.; Sacchi, L.; Bellazzi, R.; Del Campo, L.; Quaglini, S.
273
info:eu-repo/semantics/conferenceObject
6
4 Contribution in Conference Proceedings (Proceeding)::4.1 Contribution in conference proceedings
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1477663