Global ETD Search


Automatic Language Identification for Metadata Records: Measuring the Effectiveness of Various Approaches

Automatic language identification has been applied to short texts such as queries in information retrieval, but it has not yet been applied to metadata records. Applying this technology to metadata records, particularly their title elements, would give creators of metadata records a value for the language element, which is often left blank due to a lack of linguistic expertise. It would also allow a language value to be added to existing metadata records that currently lack one. Titles make language identification particularly challenging, mainly because they are short, which makes a language harder to identify accurately. This study implemented four proven approaches to language identification, as well as one open-source approach, on a collection of multilingual titles of books and movies. Of the five approaches considered, a reduced N-gram frequency profile and distance measure approach outperformed all others, accurately identifying over 83% of all titles in the collection. Future plans are to offer this technology to curators of digital collections.
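As a rough illustration of what an N-gram frequency profile with a distance measure can look like, the sketch below ranks character N-grams by frequency and compares a title's ranking to per-language rankings with an "out-of-place" distance. It is a minimal sketch of the general technique, not the implementation evaluated in the thesis: the function names, the `top_k` cutoff, the word-boundary padding, and the toy training sentences are all assumptions made for illustration, and the "reduced" profile described above may differ in its details.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Rank character n-grams (lengths 1..n_max) by descending frequency.

    Hypothetical helper: top_k and the "_" word padding are illustrative choices.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Break frequency ties alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def identify(title, lang_profiles):
    """Return the language whose profile is closest to the title's profile."""
    doc = ngram_profile(title)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


# Toy training data (assumed for illustration; real profiles need far more text).
profiles = {
    "en": ngram_profile("the quick brown fox jumps over the lazy dog"),
    "de": ngram_profile("der schnelle braune fuchs springt über den faulen hund"),
}
guess = identify("the lazy dog", profiles)
```

With profiles built from this little text the guess is unreliable; in practice each language profile would be built from a much larger sample, and very short titles give the distance measure few N-grams to work with, which is the difficulty the abstract describes.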

language identification

metadata

digital collections

Computational linguistics.

Machine-readable bibliographic data.

Metadata.

Identifier	oai:union.ndltd.org:unt.edu/info:ark/67531/metadc801895
Date	05 1900
Creators	Knudson, Ryan Charles
Contributors	Chen, Jiangping; Mihalcea, Rada, 1974-; O'Connor, Brian Clark; Ross, John Robert, 1938-
Publisher	University of North Texas
Source Sets	University of North Texas
Language	English
Detected Language	English
Type	Thesis or Dissertation
Format	v, 92 pages : illustrations (some color), Text
Rights	Public, Knudson, Ryan Charles, Copyright, Copyright is held by the author, unless otherwise noted. All rights Reserved.
