slr-kit: A semi-supervised machine learning framework for systematic literature reviews

Facchinetti T.; Benetti G.; Giuffrida D.; Nocera A.
2022-01-01

Abstract

Conducting a Systematic Literature Review (SLR) is a challenging task because of the large number of papers that typically make up the scientific literature on the topic under review. Recently, considerable research effort has been devoted to automating, at least partially, the stages of an SLR. This paper proposes the design and implementation of a workflow and a set of tools, called slr-kit, to support key tasks in an SLR. The proposed approach follows a semi-supervised strategy: time-consuming processes are carried out by automatic tools, while manual tasks are streamlined by carefully designed support tools that reduce the overall effort required. Important parts of the workflow include the extraction of key terms directly from the abstracts of the papers to survey, and the subsequent topic modeling, which enables a thematic clustering of the corpus of papers. In the proposed workflow, the former task is carried out with a novel tool called FAst WOrd Classifier (FAWOC). The latter is designed to be carried out automatically by an ad hoc solution based on the Latent Dirichlet Allocation (LDA) algorithm. The result of the process is a set of statistics on the relationships among papers, topics, and their publication trends in journals and conference proceedings. The validity of the method is demonstrated by applying it to a dataset from the scientific field of NLP, while its accuracy is assessed through manual examination of the results by domain experts.
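To make the topic-modeling step more concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python. It is not the slr-kit implementation, which the abstract does not detail. The function name `lda_gibbs`, the hyperparameter values, and the toy term lists (standing in for key terms a reviewer might have kept after a FAWOC-style classification) are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch only).

    docs: list of documents, each a list of term strings.
    Returns (doc_topic, topic_word, vocab), where doc_topic[d][k] counts the
    tokens of document d assigned to topic k, and topic_word[k][v] counts the
    assignments of vocabulary term v to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({term for doc in docs for term in doc})
    term_id = {term: i for i, term in enumerate(vocab)}
    n_terms = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_terms for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for term in doc:
            k = rng.randrange(n_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][term_id[term]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, term in enumerate(doc):
                v = term_id[term]
                k = assignments[d][i]
                # Take the current token out of the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][v] + beta)
                    / (topic_total[j] + n_terms * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, vocab


# Hypothetical key terms kept for four abstracts (toy data, not from the paper).
abstract_terms = [
    ["topic", "model", "lda", "corpus"],
    ["lda", "topic", "clustering", "corpus"],
    ["parser", "syntax", "grammar", "treebank"],
    ["grammar", "parser", "syntax", "dependency"],
]

doc_topic, topic_word, vocab = lda_gibbs(abstract_terms, n_topics=2)

# Assign each paper to the topic that holds most of its terms.
dominant = [max(range(2), key=lambda k: row[k]) for row in doc_topic]
for d, k in enumerate(dominant):
    print(f"paper {d}: dominant topic {k}, counts {doc_topic[d]}")
```

Grouping papers by their dominant topic gives the kind of thematic clustering the abstract refers to. Counting papers per topic and per publication year would be one way to derive trend statistics, although the exact statistics computed by slr-kit are not specified here.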
Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1463585