This document explains the metrics used in record linkage. There are two forms of record linkage: deterministic and probabilistic. We focus on probabilistic record linkage as used in merging and updating two databases. Record pairs are compared using character-based and phonetic-based similarity metrics to determine how closely they match. Performance measures are then calculated from these comparisons, and Receiver Operating Characteristic (ROC) curves are constructed. Finally, an economic model is applied that returns the optimal tolerance level that two databases should use to declare a record pair a match in order to maximize profit.
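The pipeline described above (score record pairs, sweep a tolerance level, trace ROC points, then pick the most profitable tolerance) can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the thesis's own method: normalized Levenshtein similarity stands in for the character-based metrics (no phonetic metric is shown), and the profit function is a hypothetical linear one in which `gain_tp` is the value of each correct match and `cost_fp` is the cost of each false match.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def confusion_counts(scored_pairs, threshold):
    """Count true and false positives when score >= threshold is a match.

    scored_pairs is a list of (score, is_true_match) tuples.
    """
    tp = sum(1 for s, m in scored_pairs if s >= threshold and m)
    fp = sum(1 for s, m in scored_pairs if s >= threshold and not m)
    return tp, fp


def roc_points(scored_pairs, thresholds):
    """Return (threshold, false_positive_rate, true_positive_rate) tuples."""
    positives = sum(1 for _, m in scored_pairs if m)
    negatives = len(scored_pairs) - positives
    points = []
    for t in thresholds:
        tp, fp = confusion_counts(scored_pairs, t)
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        points.append((t, fpr, tpr))
    return points


def best_threshold(scored_pairs, thresholds, gain_tp, cost_fp):
    """Pick the threshold maximizing a hypothetical linear profit:
    gain_tp * (true positives) - cost_fp * (false positives)."""
    def profit(t):
        tp, fp = confusion_counts(scored_pairs, t)
        return gain_tp * tp - cost_fp * fp
    return max(thresholds, key=profit)


# Hypothetical labeled record pairs: (name in database A, name in database B, truly the same?)
labeled = [
    ("Jonathan Smith", "Jonathon Smith", True),
    ("Katherine Lee", "Catherine Lee", True),
    ("Mark Jones", "Mary Johnson", False),
    ("Ann Brown", "Stan Bown", False),
]
scored = [(similarity(a, b), match) for a, b, match in labeled]
thresholds = [i / 10 for i in range(1, 10)]
curve = roc_points(scored, thresholds)
chosen = best_threshold(scored, thresholds, gain_tp=1.0, cost_fp=2.0)
```

Raising `cost_fp` relative to `gain_tp` pushes the chosen tolerance upward, which mirrors the trade-off an ROC curve makes visible: a stricter tolerance admits fewer false matches at the price of missing some true ones.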
Identifier | oai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-4832 |
Date | 11 December 2013 |
Creators | Larsen, Stasha Ann Bown |
Publisher | BYU ScholarsArchive |
Source Sets | Brigham Young University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Theses and Dissertations |
Rights | http://lib.byu.edu/about/copyright/ |