Understanding and Exploiting Language Diversity

Languages are well known to be diverse on all structural levels, from the smallest (phonemic) to the broadest (pragmatic). We propose a set of formal, quantitative measures for the language diversity of linguistic phenomena, resource incompleteness, and resource incorrectness. We apply these measures to lexical semantics, where we show how evidence of a high degree of universality within a given language set can be used to extend lexico-semantic resources in a precise, diversity-aware manner. We demonstrate our approach on several case studies. The first concerns polysemes and homographs among cases of lexical ambiguity. In contrast to past research that focused solely on exploiting systematic polysemy, the notion of universality provides us with an automated method that is also capable of predicting irregular polysemes. The second is the automatic identification of cognates in existing lexical resources across the different orthographies of genetically unrelated languages. In contrast to past research that focused on detecting cognates among the 225 concepts of the Swadesh list, we captured 3.1 million cognate pairs across 40 different orthographies and 335 languages by exploiting existing wordnet-like lexical resources.

https://hdl.handle.net/11572/368635

Identifier	oai:union.ndltd.org:unitn.it/oai:iris.unitn.it:11572/368635
Date	January 2018
Creators	Batsuren, Khuyagbaatar
Contributors	Batsuren, Khuyagbaatar, Giunchiglia, Fausto
Publisher	Università degli studi di Trento, place:TRENTO
Source Sets	Università di Trento
Language	English
Detected Language	English
Type	info:eu-repo/semantics/doctoralThesis
Rights	info:eu-repo/semantics/closedAccess
Relation	firstpage:1, lastpage:95, numberofpages:95
