Global ETD Search

Return to search

Exploiting Linguistic Knowledge to Infer Properties of Neologisms

Neologisms, or newly-coined words, pose problems for natural language processing (NLP) systems. Due to the recency of their coinage, neologisms are typically not listed in computational lexicons---dictionary-like resources that many NLP applications depend
on. Therefore when a neologism is encountered in a text being processed, the performance of an NLP system will likely suffer due to the missing word-level information. Identifying and documenting the
usage of neologisms is also a challenge in lexicography, the making of dictionaries. The traditional approach to these tasks has been to manually read a lot of text. However, due to the vast quantities of text being produced nowadays, particularly in electronic media such as blogs, it is no longer possible to manually analyze it all in search of neologisms. Methods for automatically identifying and inferring
syntactic and semantic properties of neologisms would therefore address problems encountered in both natural language processing and lexicography. Because neologisms are typically infrequent due to
their recent addition to the language, approaches to automatically learning word-level information relying on statistical distributional information are in many cases inappropriate. Moreover, neologisms occur in many domains and genres, and therefore approaches relying on domain-specific resources are also inappropriate. The hypothesis of this thesis is that knowledge about etymology---including word formation processes and types of semantic change---can be exploited for the acquisition of aspects of the syntax and semantics of neologisms. Evidence supporting this hypothesis is found in three case studies: lexical blends (e.g., "webisode" a blend of "web" and "episode"), text messaging forms (e.g.,
"any1" for "anyone"), and ameliorations and pejorations (e.g., the use of "sick" to mean `excellent', an amelioration).
Moreover, this thesis presents the first computational work on lexical blends and ameliorations and pejorations, and the first unsupervised approach to text message normalization.

http://hdl.handle.net/1807/26140

Computer science

Computational linguistics

Natural language processing

Lexical acquisition

Neologisms

0984

Identifer	oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OTU.1807/26140
Date	14 February 2011
Creators	Cook, C. Paul
Contributors	Stevenson, Suzanne
Source Sets	Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language	en_ca
Detected Language	English
Type	Thesis

Page generated in 0.002 seconds

Exploiting Linguistic Knowledge to Infer Properties of Neologisms

Description

Links & Downloads

Tags

Additional Fields