Mining Git based Software Repositories

ROVEDA, GIANLUCA
2018-03-01

Abstract

This thesis examines methods for analyzing Git-based software repositories, focusing on mining repositories hosted on GitHub. The introduction summarizes the history and usage of version control systems (VCS), with particular attention to the interactions between users, Git, and GitHub. The “related works” chapter presents the methods that other researchers have used to mine Git data. Four datasets are used in this thesis: the historical MSR14 dataset and three additional datasets collected ad hoc. The datasets are analyzed and compared, both to gain a deeper understanding of the data and to validate the three “new” datasets against the already studied MSR14. The findings presented include considerations on how to identify the author of a commit. The main objective of the thesis is to present a graphical approach for analyzing the interactions between different repositories through their users. The results are shown to the researcher as an animated network graph in Gephi. Several examples are used to investigate the performance and capabilities of the approach, and the results are compared with expert knowledge of the repositories.
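The core idea of linking repositories through the users they share, then viewing the result as an animated network in Gephi, can be illustrated with a minimal sketch. This is not the author's implementation. It assumes hypothetical input records of the form (repository, user, date), for example one per commit. Under that assumption, it connects two repositories when some user has been active in both and weights each edge by the number of shared users. It then builds a dynamic GEXF document, the XML format Gephi imports, in which each node and edge carries a `start` date so that Gephi's timeline can animate how the network grows. The function names `shared_user_edges` and `to_gexf` are illustrative only.

```python
"""Minimal sketch (hypothetical, not the thesis implementation): connect
repositories that share contributors and export a dynamic GEXF graph for Gephi.

Assumed input: (repository, user, date) activity records, e.g. one per commit,
with ISO "YYYY-MM-DD" dates so string comparison matches chronological order.
"""
from collections import defaultdict
from itertools import combinations
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def shared_user_edges(records):
    """Map (repo_a, repo_b) -> (weight, start) for every pair sharing a user.

    weight: number of distinct users active in both repositories.
    start:  earliest date on which some user had been active in both.
    """
    # Earliest activity date of each user in each repository.
    first_seen = defaultdict(dict)  # user -> {repo: date}
    for repo, user, date in records:
        prev = first_seen[user].get(repo)
        if prev is None or date < prev:
            first_seen[user][repo] = date

    edges = {}
    for repos in first_seen.values():
        for a, b in combinations(sorted(repos), 2):
            # This user links a and b only once active in both of them.
            linked_on = max(repos[a], repos[b])
            weight, start = edges.get((a, b), (0, linked_on))
            edges[(a, b)] = (weight + 1, min(start, linked_on))
    return edges


def to_gexf(records, edges):
    """Build an undirected, dynamic GEXF 1.2 document (ElementTree)."""
    root = ET.Element("gexf", xmlns=GEXF_NS, version="1.2")
    graph = ET.SubElement(root, "graph", mode="dynamic",
                          defaultedgetype="undirected", timeformat="date")

    # A repository node appears on its first recorded activity.
    repo_start = {}
    for repo, _user, date in records:
        repo_start[repo] = min(date, repo_start.get(repo, date))

    nodes = ET.SubElement(graph, "nodes")
    for repo in sorted(repo_start):
        ET.SubElement(nodes, "node", id=repo, label=repo,
                      start=repo_start[repo])

    # An edge appears when the two repositories first share a user.
    xml_edges = ET.SubElement(graph, "edges")
    for i, ((a, b), (weight, start)) in enumerate(sorted(edges.items())):
        ET.SubElement(xml_edges, "edge", id=str(i), source=a, target=b,
                      weight=str(weight), start=start)
    return ET.ElementTree(root)


if __name__ == "__main__":
    activity = [
        ("repoA", "alice", "2013-01-10"),
        ("repoB", "alice", "2013-03-02"),
        ("repoB", "bob", "2013-02-01"),
        ("repoC", "bob", "2013-05-20"),
        ("repoA", "bob", "2013-06-01"),
    ]
    tree = to_gexf(activity, shared_user_edges(activity))
    # To open in Gephi: tree.write("repos.gexf", encoding="utf-8", xml_declaration=True)
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

Treating a user's first activity in a repository as the moment the user joins it is only one simple choice. Counting commits instead, or keying users by Git author identity rather than GitHub login, would produce a different graph. The choice of user key is where the identification of a commit author, discussed in the thesis, becomes relevant.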
Files in this item:
PhD thesis Roveda_revised.pdf (doctoral thesis, open access, Adobe PDF, 6.24 MB)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1214877