Global ETD Search

Return to search

Genealogy Extraction and Tree Generation from Free Form Text

Genealogical records play a crucial role in helping people to discover their lineage and to understand where they come from. They provide a way for people to celebrate their heritage and to possibly reconnect with family they had never considered. However, genealogical records are hard to come by for ordinary people since their information is not always well established in known databases. There often is free form text that describes a person’s life, but this must be manually read in order to extract the relevant genealogical information. In addition, multiple texts may have to be read in order to create an extensive tree. This thesis proposes a novel three part system which can automatically interpret free form text to extract relationships and produce a family tree compliant with GED- COM formatting. The first subsystem builds an extendable database of genealogical records that are systematically extracted from free form text. This corpus provides the tagged data for the second subsystem, which trains a Naı̈ve Bayes classifier to predict relationships from free form text by examining the types of relationships for pairs of entities and their associated feature vectors. The last subsystem accumulates extracted relationships into family trees. When a multiclass Naı̈ve Bayes classifier is used, the proposed system achieves an accuracy of 54%. When binary Naı̈ve Bayes classifiers are used, the proposed system achieves accuracies of 69% for the child to parent relationship classifier, 75% for the spousal relationship classifier, and 73% for the sibling relationship classifier.

Genealogy Extraction

Relation Extraction

Information Extraction

Wikipedia

Machine Learning

Natural Language Processing

Computer Engineering

Identifer	oai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-3086
Date	01 December 2017
Creators	Chu, Timothy Sui-Tim
Publisher	DigitalCommons@CalPoly
Source Sets	California Polytechnic State University
Detected Language	English
Type	text
Format	application/pdf
Source	Master's Theses

Page generated in 0.0022 seconds

Genealogy Extraction and Tree Generation from Free Form Text

Description

Links & Downloads

Tags

Additional Fields