1 |
Disambiguating Multiple Links in Historical Record Linkage
Richards, Laura 30 August 2013
Historians and social scientists are very interested in longitudinal data created from historical sources, because such data creates opportunities for studying people’s lives over time. However, generating it is challenging because historical sources do not have personal identifiers. At the University of Guelph, the People-in-Motion group has constructed a record linkage system to link the 1871 Canadian census to the 1881 Canadian census. In this thesis, we discuss one aspect of linking historical census data: the problem of disambiguating multiple links that are created at the linkage step. We show that the disambiguating techniques explored in this thesis improve upon the linkage rate of the People-in-Motion system, while maintaining a false positive rate no greater than 5%.
|
2 |
Combination of a Probabilistic-Based and a Rule-Based Approach for Genealogical Record Linkage
Shah, Pooja P. 01 March 2015
Record linkage is the task of identifying records within one or multiple databases that refer to the same entity. Many different approaches to record linkage exist; some rely on heuristic rules, others on mathematical models, Markov models, or machine learning. This thesis focuses on the application of record linkage to genealogical records within family trees. Today, large collections of genealogical records are stored in databases, which may contain multiple records that refer to a single individual. Resolving duplicate genealogical records can extend our knowledge of who has lived, and more complete information can be constructed by combining all information referring to an individual. Simple string matching is not a feasible option for identifying duplicate records due to inconsistencies such as typographical errors, data entry errors, and missing data.
Record linkage algorithms can be classified under two broad categories: rule-based (heuristic) approaches and probabilistic approaches. The Cocktail Approach, presented by Shirley Ong Ai Pei, combines a probabilistic-based approach with a rule-based approach for record linkage. This thesis discusses a re-implementation and adoption of the Cocktail Approach to genealogical records.
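As a small illustration of why exact string matching falls short on records with spelling variants, the sketch below compares two name strings with Python's standard-library `difflib.SequenceMatcher`. It is not the Cocktail Approach or any method from the thesis; the example names and the 0.8 threshold are assumptions chosen only for demonstration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] after trimming and lower-casing."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_probable_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag two strings as a candidate duplicate (threshold is hypothetical)."""
    return similarity(a, b) >= threshold


# Two made-up spellings of what could be the same person's name.
name_a = "Catherine Johnson"
name_b = "Katherine Johnsen"

exact_match = name_a == name_b                      # exact comparison fails
fuzzy_match = is_probable_duplicate(name_a, name_b)  # similarity comparison succeeds
```

In practice a similarity score like this would be only one input to a rule-based or probabilistic decision, not the decision itself.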
|
3 |
Maximizing the use of blocking in record linkage: theory and simulation
Khan, Mahmudul Huq January 1991
Thesis (Ph. D.)--University of Hawaii at Manoa, 1991. / Includes bibliographical references (leaves 128-132) / Microfiche. / xiii, 132 leaves, bound ill. 29 cm
|
4 |
Generování rodokmenů z matričních záznamů / Family Trees Making from Parish Records
Tušimová, Lucia January 2020
This work discusses the field of genealogy, the different types of records, and the data they contain. The thesis describes the topic of data comparison and record linkage. It also discusses the design and implementation of the resulting system. The developed system connects people from parish records into larger pedigrees, which are then stored in a graph database. The success of the record interconnection was tested on the provided data sets.
|
5 |
Computation of Weights for Probabilistic Record Linkage Using the EM Algorithm
Bauman, G. John 29 June 2006
Record linkage is the process of combining information about a single individual from two or more records. Probabilistic record linkage gives weights to each field that is compared. The decision of whether the records should be linked is then determined by the sum of the weights, or “Score”, over all fields compared. Using methods similar to the simple versus simple most powerful test, an optimal record linkage decision rule can be established to minimize the number of unlinked records when the probabilities of false positive and false negative errors are specified. Computing the weights needed for probabilistic record linkage normally requires linking a “training” subset of records by hand. This is not practical in many settings, as hand matching requires a considerable time investment. In 1989, Matthew A. Jaro demonstrated how the Expectation-Maximization, or EM, algorithm could be used to compute the needed weights when fields have Binomial matching possibilities. This project applies this method of using the EM algorithm to calculate weights for head-of-household records from the 1910 and 1920 Censuses for Ascension Parish of Louisiana and Church and County Records from Perquimans County, North Carolina. This project also expands Jaro's EM algorithm to a Multinomial framework. The performance of the EM algorithm for calculating weights will be assessed by comparing the computed weights to weights computed by clerical matching. Simulations will also be conducted to investigate the sensitivity of the algorithm to the total number of record pairs, the number of fields with missing entries, the starting values of estimated probabilities, and the convergence epsilon value.
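The idea of estimating field weights with EM can be sketched for the binary (agree/disagree) case. The code below is a minimal illustration, not the project's implementation: it assumes fields are conditionally independent given match status, uses made-up agreement patterns, and the function name `em_weights`, the starting values, and the tolerance are all hypothetical. For each field it estimates m (probability of agreement among true matches) and u (probability of agreement among non-matches), then forms the usual log2 agreement and disagreement weights whose sum is the score.

```python
import math


def em_weights(patterns, n_iter=200, p=0.1, m_init=0.9, u_init=0.1, tol=1e-8):
    """Estimate m/u probabilities and log2 weights from 0/1 agreement patterns."""
    n, n_fields = len(patterns), len(patterns[0])
    m, u = [m_init] * n_fields, [u_init] * n_fields

    def clamp(x):
        # Keep probabilities away from 0 and 1 so the logs stay finite.
        return min(max(x, 1e-6), 1 - 1e-6)

    for _ in range(n_iter):
        # E-step: posterior probability that each record pair is a match.
        g = []
        for gamma in patterns:
            pm, pu = p, 1.0 - p
            for k, agree in enumerate(gamma):
                pm *= m[k] if agree else 1.0 - m[k]
                pu *= u[k] if agree else 1.0 - u[k]
            g.append(pm / (pm + pu))
        # M-step: re-estimate m, u and the match proportion p.
        g_sum = sum(g)
        new_m = [clamp(sum(gi * gam[k] for gi, gam in zip(g, patterns)) / g_sum)
                 for k in range(n_fields)]
        new_u = [clamp(sum((1 - gi) * gam[k] for gi, gam in zip(g, patterns)) / (n - g_sum))
                 for k in range(n_fields)]
        change = max(abs(a - b) for a, b in zip(new_m + new_u, m + u))
        m, u, p = new_m, new_u, clamp(g_sum / n)
        if change < tol:  # convergence epsilon
            break

    agree_w = [math.log2(mk / uk) for mk, uk in zip(m, u)]
    disagree_w = [math.log2((1 - mk) / (1 - uk)) for mk, uk in zip(m, u)]
    return m, u, agree_w, disagree_w


def score(gamma, agree_w, disagree_w):
    """Sum the per-field weights for one agreement pattern."""
    return sum(a if g else d for g, a, d in zip(gamma, agree_w, disagree_w))


# Made-up agreement patterns for three compared fields (1 = fields agree).
patterns = [(1, 1, 1)] * 20 + [(1, 1, 0)] * 5 + [(0, 0, 1)] * 5 + [(0, 0, 0)] * 70
m, u, agree_w, disagree_w = em_weights(patterns)
full_agreement = score((1, 1, 1), agree_w, disagree_w)
full_disagreement = score((0, 0, 0), agree_w, disagree_w)
```

A pair would then be linked when its score exceeds an upper threshold chosen from the tolerated error rates; the thresholds themselves are outside this sketch.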
|
6 |
Private Record Linkage: A Comparison of Selected Techniques for Name Matching
Grzebala, Pawel B. 06 May 2016
No description available.
|
7 |
Estimation of a lower bound for the cumulative incidence of failure of female surgical sterilisation in NSW: a population-based study.
Churches, Timothy January 2007
MPhilPH / Female tubal sterilisation, often referred to as "tubal ligation" but more often performed these days using laparoscopically-applied metal clips, remains a popular form of contraception in women who have completed their families. A review of the literature on the incidence of failure of tubal sterilisation found many reports of case-series and small clinic-based studies, but only a few larger studies with good epidemiological designs, most recently the US CREST study conducted during the 1980s and early 1990s. The CREST study reported a conditional (life-table) cumulative incidence of failure of 0.55, 0.84, 1.18 and 1.85 per 100 women at 1, 2, 4 and 10 years of follow-up respectively. The study described here estimated a lower bound for the incidence of tubal sterilisation failure in NSW by probabilistically linking routinely-collected hospital admission records for women undergoing sterilisation surgery to hospital admission records for the same women which were indicative of subsequent conception or which represented censoring events such as hysterectomy or death in hospital. Data for the period July 1992 to June 2000 were used. Kaplan-Meier and proportional-hazards survival analyses were performed on the resulting linked data set. The conditional cumulative incidence per 100 women at 1, 2, 4 and 8 years of follow-up was estimated to be 0.74 (95% CI 0.68-0.81), 1.05 (0.97-1.13), 1.33 (1.23-1.42) and 1.51 (1.39-1.62) respectively. Forty percent of failures ended in abortion and 14% presented as ectopic pregnancies. Age, private health insurance status and sterilisation in a smaller hospital were all found to be associated with lower rates of failure. Strong evidence of time-limited excess numbers of failures in women undergoing surgery in particular hospitals was also found.
The study demonstrates the feasibility of using linked, routinely-collected health data to evaluate relatively rare, long-term outcomes such as sterilisation failure on a population-wide basis.
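For readers unfamiliar with the Kaplan-Meier method mentioned above, the following minimal sketch shows how a cumulative incidence of failure can be derived from follow-up times with censoring, as one minus the Kaplan-Meier survival estimate. The follow-up data are invented for illustration and the function name is hypothetical; this is not the study's analysis code.

```python
def kaplan_meier_cumulative_incidence(observations):
    """Return [(time, cumulative incidence)] at each distinct failure time.

    observations: iterable of (time, failed) pairs, where failed is True for a
    failure event and False for a censored observation (e.g. end of follow-up).
    """
    observations = sorted(observations)
    survival = 1.0
    curve = []
    event_times = sorted({t for t, failed in observations if failed})
    for t in event_times:
        at_risk = sum(1 for time, _ in observations if time >= t)
        failures = sum(1 for time, failed in observations if time == t and failed)
        survival *= 1.0 - failures / at_risk  # product-limit step
        curve.append((t, 1.0 - survival))
    return curve


# Invented follow-up data: (years of follow-up, failure observed?)
follow_up = [(1, True), (2, False), (3, True), (4, False)]
curve = kaplan_meier_cumulative_incidence(follow_up)
```

With these four invented observations, one failure among four women at risk at year 1 and one among two at year 3 give the two steps of the curve.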
|
9 |
Kolektivní propojování entit pro aplikaci ClueMaker / Collective Entity Matching Solution for ClueMaker Application
Jaroschy, Petr January 2021
ClueMaker (CM) is a Java desktop application used for data visualisation (via graphs) by organisations such as insurance companies (to uncover fraudulent activity) and the Czech organisation Hlídač Státu (to identify connections between subjects), among many others. The application currently merges entities from different data sources naively, by matching one field with an exact string match. The goal of this thesis is to analyse, create and integrate into CM a solution that allows entities to be merged based on entity similarity, and to integrate that solution into the CM GUI. The solution should allow the user to merge two graph entities, show the user potentially identical or very similar entities, and allow a global scan of the graph for potential merges. Furthermore, the solution should make use of data relationships within CM in addition to the attributes of entities.
|
10 |
Improving Record Linkage Through Pedigrees
Pixton, Burdette N. 10 July 2006
Record linkage, in a genealogical context, is the process of identifying individuals from multiple sources which refer to the same real-world entity. Current solutions focus on the individuals in question and on complex rules developed by human experts. Genealogical databases are highly-structured with relationships existing between the individuals and other instances. These relationships can be utilized and human involvement greatly minimized by using a filtered structured neural network. These neural networks, using traditional back-propagation methods, are biased in a way to make the network human readable. The results show an increase in precision and recall when pedigree data is available and used.
|