
Přibližná shoda znakových řetězců a její aplikace na ztotožňování metadat vědeckých publikací / Approximate equality of character strings and its application to record linkage in metadata of scientific publications

The thesis explores the application of approximate string matching to record linkage for scientific publications. It provides an introduction to record matching and describes five commonly used string distance metrics: the Levenshtein, Jaro, Jaro-Winkler and cosine distances and the Jaccard coefficient. These metrics are applied to publication metadata from V3S, the current research information system of the Czech Technical University in Prague. Based on the findings, optimal thresholds with respect to the F1, F2 and F3 measures are determined for each metric.
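
As an illustration only, the kind of threshold selection described in the abstract could be sketched as follows. This is not code from the thesis: the function names, the normalisation of the Levenshtein distance to a similarity between 0 and 1, and the sample pairs are all assumptions made for the example.

```python
from typing import Iterable, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def f_beta(tp: int, fp: int, fn: int, beta: float) -> float:
    """F-beta measure; beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return 0.0 if denom == 0 else (1 + b2) * tp / denom


def best_threshold(scored_pairs: Iterable[Tuple[float, bool]],
                   beta: float,
                   candidates: Sequence[float]) -> Tuple[float, float]:
    """Return the (threshold, F-beta) pair with the highest F-beta.

    scored_pairs holds (similarity, is_true_match) for labelled record pairs;
    a pair is predicted to match when its similarity reaches the threshold.
    """
    pairs = list(scored_pairs)
    best = (candidates[0], -1.0)
    for t in candidates:
        tp = sum(1 for s, m in pairs if s >= t and m)
        fp = sum(1 for s, m in pairs if s >= t and not m)
        fn = sum(1 for s, m in pairs if s < t and m)
        score = f_beta(tp, fp, fn, beta)
        if score > best[1]:
            best = (t, score)
    return best


# Hypothetical labelled pairs: (similarity score, known to be the same record)
sample = [(0.9, True), (0.8, True), (0.6, False), (0.3, False)]
threshold, score = best_threshold(sample, beta=1.0, candidates=[0.5, 0.7, 0.95])
```

Running the same search with beta set to 2 or 3 gives thresholds for the F2 and F3 measures, which favour recall and so tend to accept lower similarity scores.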

Identifier oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:415121
Date January 2020
Creators Dobiášovský, Jan
Contributors Dvořák, Jan; Ivánek, Jiří
Source Sets Czech ETDs
Language Czech
Detected Language English
Type info:eu-repo/semantics/masterThesis
Rights info:eu-repo/semantics/restrictedAccess
