
String Matching Techniques: An Empirical Assessment Based on Statistics Austria's Business Register

The maintenance and updating of Statistics Austria's business register
requires regular matching of the register against other data sources;
one of them is the register of tax units of the Austrian Federal Ministry of
Finance. The matching process is based on string comparison via bigrams of
enterprise names and addresses, and a quality class approach assigning pairs
of register units into classes of different compliance (i.e., matching quality)
based on bigram similarity values and the comparison of other matching variables,
like the NACE code or the year of foundation.
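
The bigram comparison described above can be illustrated with a minimal sketch. The abstract does not state which similarity formula or which name normalization the register uses, so the Dice coefficient on bigram multisets and the lowercase/whitespace normalization below are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace (assumed normalization)."""
    return " ".join(text.lower().split())


def bigrams(text: str) -> Counter:
    """Return the multiset of adjacent character pairs of the normalized text."""
    s = normalize(text)
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient on bigram multisets: 2 * shared / total (an assumption,
    not necessarily the measure used in the register)."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        # Both strings are shorter than two characters: fall back to equality.
        return 1.0 if normalize(a) == normalize(b) else 0.0
    shared = sum((ba & bb).values())  # multiset intersection
    return 2.0 * shared / total


print(bigram_similarity("night", "nacht"))  # → 0.25 (one shared bigram, "ht")
```

A value of 1.0 means the two normalized strings have identical bigram multisets; in a quality class approach, thresholds on such values would be combined with the other matching variables.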
Based on methodological research concerning matching techniques carried
out in the DIECOFIS project, an empirical comparison of the bigram method
and other string matching techniques was conducted: the edit distance, the
Jaro algorithm and the Jaro-Winkler algorithm, the longest common subsequence
and the maximal match were selected as appropriate alternatives and
evaluated in the study.
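
Two of the alternatives named above, the edit distance and the longest common subsequence, can be sketched with their textbook dynamic-programming definitions. This is an illustration only: the exact variants, normalizations, and parameters evaluated in the study are not given in the abstract, and the rescaling to a similarity in [0, 1] is an assumption. The Jaro, Jaro-Winkler, and maximal match measures are omitted here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (characters need not be adjacent)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Rescale the edit distance to [0, 1] by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


print(edit_distance("kitten", "sitting"))  # → 3
print(lcs_length("ABCBDAB", "BDCABA"))     # → 4
```

Rescaling to a common [0, 1] range is one way to make such measures comparable with a bigram similarity value when assigning pairs to quality classes.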
This paper briefly introduces Statistics Austria's business register and the corresponding
maintenance process and reports on the results of the empirical
study.

Identifier oai:union.ndltd.org:VIENNA/oai:epub.wu-wien.ac.at:5630
Date January 2005
Creators Denk, Michaela; Hackl, Peter; Rainer, Norbert
Publisher Austrian Statistical Society, c/o Bundesanstalt Statistik Austria
Source Sets Wirtschaftsuniversität Wien
Language English
Detected Language English
Type Article, PeerReviewed
Format application/pdf
Rights Creative Commons: Attribution 4.0 International (CC BY 4.0)
Relation http://www.ajs.or.at/index.php/ajs/article/view/vol34%2C%20no3%20-%201, http://www.ajs.or.at/index.php/ajs, http://www.ajs.or.at/index.php/ajs/about/editorialPolicies#openAccessPolicy, http://eeyore.wu-wien.ac.at/stat4/hackl/home.html, http://epub.wu.ac.at/5630/
