Computer-based genealogy reconstruction in founder populations.

BELLAZZI, RICCARDO; LARIZZA, CRISTIANA
2011-01-01

Abstract

This paper describes a software tool that reconstructs entire genealogies from data collected from heterogeneous sources, including municipal and parish records archived over centuries. The tool uses a record linkage algorithm based on rule-based data matching, and applies a general strategy for managing the ambiguities caused by missing, imprecise or erroneous input data. The process is iterative: it combines automatic pedigree reconstruction with software-assisted human data revision to improve the quality and accuracy of the results and to optimize the matching rules. The paper discusses the results obtained by reconstructing the entire genealogy of the population of the Val Borbera, a geographically isolated valley in Northern Italy, from data going back as far as the 16th century. The resulting pedigree includes 75,994 trios, 58.9% of which belong to a single large family reconstructed over 13 generations.
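The abstract does not spell out the matching rules, so the following is only a minimal sketch of what rule-based record linkage for trio (child, father, mother) reconstruction might look like. Everything in it is an assumption made for illustration: the `Person` fields, the name normalization, the 15 to 70 year parental age window, and the policy of sending homonym conflicts to a human review queue are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional
import unicodedata


@dataclass(frozen=True)
class Person:
    """Hypothetical record: one person as found in a birth or baptism register."""
    pid: int
    given: str
    surname: str
    birth_year: int
    father_name: Optional[str] = None  # parent names as written in the record
    mother_name: Optional[str] = None


def normalize(name: str) -> str:
    """Lower-case, strip accents and extra spaces so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def candidates(full_name, child_year, people, min_age=15, max_age=70):
    """Assumed rule: same normalized full name and a plausible age at the child's birth."""
    target = normalize(full_name)
    return [
        p for p in people
        if normalize(f"{p.given} {p.surname}") == target
        and min_age <= child_year - p.birth_year <= max_age
    ]


def link_parent(full_name, child, people):
    """Return (parent id or None, status) for one parent name of one child."""
    if full_name is None:
        return None, "missing"
    found = candidates(full_name, child.birth_year, people)
    if len(found) == 1:
        return found[0].pid, "linked"
    return None, ("ambiguous" if found else "unmatched")


def build_trios(people):
    """Build (child, father, mother) trios and list children needing human revision."""
    trios, review = [], []
    for child in people:
        father, f_status = link_parent(child.father_name, child, people)
        mother, m_status = link_parent(child.mother_name, child, people)
        trios.append((child.pid, father, mother))
        if "ambiguous" in (f_status, m_status):
            review.append(child.pid)
    return trios, review


# Invented example data, not taken from the Val Borbera records.
people = [
    Person(1, "Giovanni", "Rossi", 1700),
    Person(2, "Maria", "Bianchi", 1705),
    Person(3, "Pietro", "Rossi", 1730, "GIOVANNI  Rossi", "Maria Bianchi"),
    Person(4, "Antonio", "Rossi", 1760, "Pietro Rossi", "Lucia Ferrari"),
    Person(5, "Pietro", "Rossi", 1728),  # homonym of person 3
]

trios, review = build_trios(people)
print(trios)
print(review)
```

In this sketch a parent is linked only when exactly one candidate satisfies the rules. A child with two equally plausible fathers (person 4 here) is left unlinked and queued for revision, which loosely mirrors the iterative combination of automatic reconstruction and human data revision mentioned in the abstract.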
2011
Computer Science & Engineering includes resources on computer hardware and architecture, computer software, software engineering and design, computer graphics, programming languages, theoretical computing, computing methodologies, broad computing topics, and interdisciplinary computer applications.
Molecular Biology & Genetics considers all aspects of basic and applied genetics, including molecular genetics, prokaryotic and eukaryotic gene expression, mechanisms of mutagenesis, structure, function and regulation of genetic material. Also included are resources concerned with clinical genetics, patterns of inheritance, genetic cause, and screening and treatment of disease. Resources dealing specifically with developmentally regulated gene expression, or with signal transduction pathways that modulate gene expression at the cellular level are excluded and are covered in the Cell and Developmental Biology category.
Yes, but type not specified
English
International
PRINT
Volume 44, issue 6, pages 997-1003 (7 pages)
Computer-based genealogies; Biomedical informatics; Algorithms
Number of authors: 9
info:eu-repo/semantics/article
262
Milani, G; Masciullo, C; Sala, C; Bellazzi, Riccardo; Buetti, I; Pistis, G; Traglia, M; Toniolo, D; Larizza, Cristiana
1 Journal contribution::1.1 Journal article
none
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/273519
Citations
  • PubMed Central 3
  • Scopus 8
  • Web of Science (ISI) 4