The role of the lexicon has been ignored or minimized in most work on computational stylistics. This research is an effort to fill that gap, demonstrating the key role that the lexicon plays in stylistic variation. In doing so, I bring together a number of diverse perspectives, including aesthetic, functional, and sociological aspects of style.
The first major contribution of the thesis is the creation of aesthetic stylistic lexical resources from large mixed-register corpora, adapting statistical techniques from approaches to topic and sentiment analysis. A key novelty of the work is that I consider multiple correlated styles in a single model. Next, I consider a variety of tasks that are relevant to style, in particular tasks relevant to genre and demographic variables, showing that the use of lexical resources compares well to more traditional approaches, in some cases offering information that is simply not available to a system based on surface features. Finally, I focus on a single stylistic task, Native Language Identification (NLI), offering a novel method for deriving lexical information from native language texts, and using a cross-corpus supervised approach to show definitively that lexical features are key to high performance on this task.
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OTU.1807/44095 |
Date | 20 March 2014 |
Creators | Brooke, Julian |
Contributors | Hirst, Graeme |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | en_ca |
Detected Language | English |
Type | Thesis |