11 |
A Semi-Supervised Approach to the Construction of Semantic Lexicons
Ahmadi, Mohamad Hasan, 14 March 2012
A growing number of applications require dictionaries of words belonging to semantic classes present in specialized domains. Manually constructed knowledge bases often do not provide sufficient coverage of specialized vocabulary and require substantial effort to build and keep up-to-date. In this thesis, we propose a semi-supervised approach to the construction of domain-specific semantic lexicons based on the distributional similarity hypothesis. Our method starts with a small set of seed words representing the target class and an unannotated text corpus. It locates instances of seed words in the text and generates lexical patterns from their contexts; these patterns in turn extract more words/phrases that belong to the semantic category in an iterative manner. This bootstrapping process can be continued until the output lexicon reaches the desired size.
We explore techniques such as learning lexicons for multiple semantic classes simultaneously and using feedback from competing lexicons to increase learning precision. Evaluated on the extraction of dish names and subjective adjectives from a corpus of restaurant reviews, our approach proves flexible in learning different word classes and improves on state-of-the-art bootstrapping and distributional similarity techniques for extracting semantically similar words. Its shallow lexical patterns also prove superior to syntactic patterns in capturing the semantic class of words.
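The bootstrapping loop described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the pattern definition (one word of left and right context), the scoring by pattern frequency, and the names `contexts`, `bootstrap_lexicon` and `per_round` are all assumptions made for illustration, and it handles single words only, not phrases.

```python
from collections import Counter


def contexts(tokens):
    """Yield (word, (left_neighbour, right_neighbour)) for each token."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    for i in range(1, len(padded) - 1):
        yield padded[i], (padded[i - 1], padded[i + 1])


def bootstrap_lexicon(sentences, seeds, target_size, per_round=2):
    """Grow a lexicon from seed words using shallow context patterns."""
    lexicon = set(seeds)
    while len(lexicon) < target_size:
        # Step 1: collect the contexts in which known lexicon words occur.
        patterns = Counter(
            pattern
            for tokens in sentences
            for word, pattern in contexts(tokens)
            if word in lexicon
        )
        # Step 2: score unseen words by the patterns they share with the lexicon.
        candidates = Counter()
        for tokens in sentences:
            for word, pattern in contexts(tokens):
                if word not in lexicon and pattern in patterns:
                    candidates[word] += patterns[pattern]
        if not candidates:
            break  # nothing new can be extracted from this corpus
        # Step 3: add the best candidates without exceeding the target size.
        room = target_size - len(lexicon)
        for word, _ in candidates.most_common(min(per_round, room)):
            lexicon.add(word)
    return lexicon


# Hypothetical toy corpus of tokenized restaurant-review sentences.
reviews = [
    ["the", "pasta", "was", "great"],
    ["the", "pizza", "was", "great"],
    ["the", "soup", "was", "cold"],
    ["we", "liked", "it"],
]
dishes = bootstrap_lexicon(reviews, seeds={"pasta"}, target_size=3)
```

Starting from the single seed `pasta`, the pattern `(the, was)` pulls in the other two dish names. A real system would need pattern and candidate filtering to limit semantic drift, which is where the feedback from competing lexicons mentioned above would come in.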
|
12 |
Phonological Trends in the Lexicon: The Role of Constraints
Becker, Michael, 01 February 2009
This dissertation shows that the generalizations that speakers project from the lexical exceptions of their language are biased to be natural and output-oriented, and it offers a model of the grammar that derives these biases by encoding lexical exceptions in terms of lexically-specific rankings of universal constraints in Optimality Theory (Prince & Smolensky 1993/2004). In this model, lexical trends, i.e. the trends created by the phonological patterning of lexical exceptions, are incorporated into a grammar that applies deterministically to known items, and the same grammar applies stochastically to novel items. The model is based on the Recursive Constraint Demotion algorithm (Tesar & Smolensky 1998, 2000; Tesar 1998; Prince 2002), augmented with a mechanism of constraint cloning (Pater 2006, 2008b). Chapter 2 presents a study of Turkish voicing alternations, showing that speakers replicate the effects that place of articulation and phonological size have on the distribution of voicing alternations in the lexicon, yet speakers ignore the effects of vowel height and backness. This behavior is tied to the absence of regular effects of vowel quality on obstruent voicing cross-linguistically, arguing for a model that derives regular phonology and irregular phonology from the same universal set of OT constraints. Chapter 3 presents a study of Hebrew allomorph selection, where there is a trend for preferring the plural suffix [-ot] with stems that have [o] in them, which is analyzed as a markedness pressure. The analysis of the trend in terms of markedness, i.e. constraints on output forms, predicts that speakers look to the plural stem vowel in their choice of the plural suffix, and ignore the singular stem. Since real Hebrew stems that have [o] in the plural also have [o] in the singular, Hebrew speakers were taught artificial languages that paired the suffix [-ot] with stems that have [o] only in the singular or only in the plural. 
As predicted, speakers preferred the pairing of [-ot] with stems that have [o] in the plural, i.e. speakers prefer the surface-based, output-oriented generalization. Chapter 4 develops the formal theory of cloning and its general application to lexical trends, and explores its fit with the typologically available data. One necessary aspect of the theory is the "inside out" analysis of paradigms (Hayes 1999), where the underlying representations of roots are always taken to be identical to their surface base form, and abstract underlying representations are limited to affixes. An algorithm for learning the proposed underlying representations is presented in a general form and is applied to a range of test cases.
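For readers unfamiliar with Recursive Constraint Demotion, the following Python sketch shows the basic algorithm on winner-loser pairs. It is a simplified illustration, not the dissertation's model: the data encoding (`"W"`, `"L"`, `"e"` marks in a dictionary), the constraint names and the function name are assumptions, and constraint cloning is not implemented. The sketch only reports the inconsistency that cloning is meant to resolve.

```python
def recursive_constraint_demotion(constraints, pairs):
    """Rank constraints into strata from winner-loser pairs.

    Each pair maps a constraint name to "W" (prefers the winner),
    "L" (prefers the loser) or "e" (no preference; also the default).
    """
    remaining = list(constraints)
    pending = list(pairs)
    strata = []
    while remaining:
        # Install every constraint that prefers no loser in the pending pairs.
        stratum = [
            c for c in remaining
            if all(pair.get(c, "e") != "L" for pair in pending)
        ]
        if not stratum:
            # No consistent ranking exists; in the cloning approach this is
            # the point where a constraint would be cloned.
            raise ValueError("inconsistent winner-loser pairs")
        strata.append(stratum)
        # Pairs explained by a newly installed constraint are removed.
        pending = [
            pair for pair in pending
            if not any(pair.get(c, "e") == "W" for c in stratum)
        ]
        remaining = [c for c in remaining if c not in stratum]
    if pending:
        raise ValueError("some pairs are not explained by any constraint")
    return strata


# Hypothetical constraints A, B, C and two winner-loser pairs.
ranking = recursive_constraint_demotion(
    ["A", "B", "C"],
    [{"A": "W", "B": "L"}, {"B": "W", "C": "L"}],
)
```

Here `ranking` comes out as three strata, A above B above C. Two pairs that demand opposite rankings of the same constraints raise the error instead, which corresponds to the lexical-exception situation the abstract describes.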
|
13 |
Social Dialect Features of Military Speech: A Sociolinguistic Study of Fargo Veterans
Albright, Anthony J., January 2020
This mixed-methods study examines the potential existence of a military dialect separate from the regional or social dialects experienced by civilians. In particular, how similar is the military-related storytelling lexicon of veterans in the Fargo-Moorhead area to the lexicon set forth in training bases and training manuals used by the U.S. military? The lexicon used by veterans in storytelling can sometimes seem opaque to an audience. It is typically dense with meaning borne by a few coded words. These words carry a contextual burden that can be better understood by an appeal to the dialect from which they arose.
In order to disentangle the veteran way of speaking from other overlapping and intersecting social and regional dialects that make up a subject’s typical speech, guided conversation and word-matching exercises were used to isolate the lexicon typical of the military experience. The resulting interview transcripts were compared with military training manuals to arrive at a percentage of military-specific terms used in the guided conversation and a percentage of general-knowledge military terms retained in the word-matching measure.
The resulting figures, 1.85% military-specific terms and phrases in participants’ guided conversations and 61% retention of military-specific term knowledge, were used to show that the military dialect not only exists but persists in the repertoire of veteran participants. As the majority of those who work with veterans are not veterans themselves, these percentages represent a significant barrier to understanding veteran storytelling. This barrier hinders the successful reintegration and mental health of veterans who return to their communities without knowing how to meaningfully express their stories in their existing support networks.
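A percentage of this kind can be computed roughly as follows. This Python sketch is only an assumption about the arithmetic, not the study's procedure: the tokenization, the term list and the transcript are hypothetical, and multi-word phrases are not matched.

```python
import re


def military_term_share(transcript, military_terms):
    """Return the share of transcript tokens found in the reference term list."""
    tokens = re.findall(r"[a-z0-9']+", transcript.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in military_terms)
    return hits / len(tokens)


# Hypothetical reference terms and transcript excerpt, for illustration only.
terms = {"mos", "chow", "deployment"}
excerpt = "After deployment we went to chow and then back home"
share = military_term_share(excerpt, terms)  # 2 of 10 tokens match
print(f"{share:.2%}")
```

A low overall share, such as the 1.85% reported above, is compatible with the abstract's point that a few coded words carry much of the meaning.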
|
14 |
AXEL: a framework to deal with ambiguity in three-noun compounds
Martinez, Jorge Matadamas, January 2010
Cognitive Linguistics has been widely used to deal with the ambiguity generated by words in combination. Although this domain offers many solutions to this challenge, not all of them can be implemented in a computational environment. The Dynamic Construal of Meaning framework is argued to have this ability because it describes an intrinsic degree of association of meanings, which in turn can be translated into computer programs. A limitation for a computational approach, however, has been the lack of syntactic parameters. This research argues that this limitation could be overcome with the aid of the Generative Lexicon Theory (GLT). Specifically, this dissertation formulated possible means of marrying the GLT and Cognitive Linguistics in a novel rapprochement between the two. This bond between opposing theories provided the means to design a computational template (the AXEL system) by realising syntax and semantics at the software level. An instance of the AXEL system was created using a Design Research approach. Planned iterations were built into the development to improve artefact performance. These iterations improved the system's performance in accounting for the degree of association of meanings in three-noun compounds. This dissertation delivered three major contributions on the brink of a so-called turning point in Computational Linguistics (CL). First, the AXEL system was used to disclose hidden lexical patterns of ambiguity. These patterns are difficult, if not impossible, to identify without automatic techniques. This research claimed that these patterns can help linguists review lexical knowledge from a software-based viewpoint. Second, building on this linguistic awareness, the research advocated the adoption of improved resources that reduce the electronic storage space required by Sense Enumerative Lexicons (SELs). The AXEL system generated interpretations “at the moment of use”, optimising the space needed for lexical storage. Finally, this research introduced a subsystem of metrics to characterise the ambiguous degree of association in three-noun compounds, enabling ranking methods. Weighting methods provided mechanisms for classifying meanings for Word Sense Disambiguation (WSD). Overall, these results attempted to tackle difficulties in understanding studies of Lexical Semantics via software tools.
|
15 |
Die Konstruktion des allgemeinen Wissens in Zedlers "Universal-Lexicon" / The construction of general knowledge in Zedler's "Universal-Lexicon"
Schneider, Ulrich Johannes, 17 July 2014
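Das "Universal-Lexicon", das ab 1732 von Johann Heinrich Zedler herausgegeben wurde und bis 1754 auf 68 Folianten und damit zum größten Lexikon des 18. Jahrhunderts anwuchs, ist ein Lexikon ohne Programm. Das macht moderne Leser ratlos im Hinblick auf die verfolgten Ziele. Man sucht ergebnislos eine Ideologie wie bei der französischen "Encyclopedie", ein Bekenntnis zum Wie und Warum, das im bürgerlichen 18. Jahrhundert ein Datum darstellte. Das "Universal-Lexicon" wirkt ohne Programm schwach und scheint verteidigt werden zu müssen, etwa wie ein Zedler-Forscher 1969 formulierte: „Das Universallexikon blieb allein ein alphabetisches Nachschlagewerk. Aber auch so wurde es dem Anspruch, der Wissenschaft zu dienen, gerecht." Welcher Wissenschaft hat das "Universal-Lexicon" gedient? Und vor allem: wie eigentlich? Das sind bis heute offene Fragen. / The "Universal-Lexicon", which Johann Heinrich Zedler began publishing in 1732 and which by 1754 had grown to 68 folio volumes, making it the largest encyclopedia of the 18th century, is an encyclopedia without a program. This leaves modern readers at a loss as to the aims it pursued. One searches in vain for an ideology like that of the French "Encyclopédie", a declaration of the how and why that would have marked a milestone in the bourgeois 18th century. Without a program, the "Universal-Lexicon" appears weak and seems to need defending, as when a Zedler scholar wrote in 1969: "The Universallexikon remained merely an alphabetical reference work. Yet even so it lived up to its claim to serve scholarship." Which scholarship did the "Universal-Lexicon" serve? And above all: how, exactly? These questions remain open to this day.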
|
16 |
An investigation into near-nativeness at the syntax-lexicon interface: evidence from Dutch learners of English
Schutter, John-Sebastian, January 2013
This thesis investigates whether there are differences in language comprehension and language production between highly advanced/near-native adult learners of a second language (late L2ers) and native speakers (L1ers), and if so, how they should be characterised. In previous literature (Sorace & Filiaci 2006, Sorace 2011 inter alia), non-convergence of the near-native grammar with the native grammar has been identified as most likely to occur at the interface between syntax and another cognitive domain. This thesis focuses on grammatical and ungrammatical representations at the syntax-lexicon interface between very advanced/near-native Dutch learners of English and native speakers of English. We tested differences in syntactic knowledge representations and real-time processing through eight experiments. By syntactic knowledge representations we mean the explicit knowledge of grammar (specifically the dependence of word order on lexical semantics) that a language user exhibits in their language comprehension and production, and by real-time processing we mean the language user’s ability to access implicit and explicit knowledge of grammar under time and/or memory constraints in their language comprehension and production. To test for systematic differences at the syntax-lexicon interface we examined linguistic structures in English that differ minimally in word order from Dutch depending on the presence or absence of certain lexical items and their characteristics; these were possessive structures with animate and inanimate possessors and possessums in either a prenominal or postnominal construction, preposed adverbials of location (locative inversions) followed by either unergative or unaccusative verbs, and preposed adverbials of manner containing a negative polarity item (negative inversions) or positive polarity item followed by either V2 or V3 word order.
We used Magnitude Estimation Tasks and Speeded Grammaticality Judgement Tasks to test comprehension, and Syntactic Priming (with/without extra memory load) and Speeded Sentence Completion Tasks to test production. We found evidence for differences in comprehension and production between very advanced, near-native Dutch L2ers and native speakers of English, and that these differences appear to be associated with processing rather than with competence. Dutch L2ers differed from English L1ers with respect to preferences in word order of possessive structures and after preposed adverbials of manner. However, these groups did not differ in production and comprehension with respect to transitivity in locative inversions. We conclude that even among highly advanced to near-native late learners of a second language there may be non-convergence of the L2 grammar. Such non-convergence need not coincide with the L1 grammar but may rather be a result of over-applying linguistic L2 knowledge. Thus, very advanced to near-native L2ers still have access to limited (meta)linguistic resources that under time and memory constraints may result in ungrammatical language comprehension and/or production at the syntax-lexicon interface. In sum, in explaining interface phenomena, the results of this study provide evidence for a processing account over a representational account, i.e. Dutch L2ers showed they possess grammatical knowledge of the specific L2 linguistic structures in comprehension and production, but over-applied this knowledge in exceptional cases under time and/or memory pressure. We suggest that current bilingual production models should focus more on working memory, by including a separate memory component in such models and by conducting empirical research to test its influence on L2 production and comprehension.
|
17 |
Catégorisation lexicale en Muinane : Amazonie Colombienne / Lexical categorization in Muinane: Colombian Amazon
De Vengoechea, Consuelo, 10 September 2012
Cette thèse cherche à approfondir la culture des Muinanes à travers leur histoire et leur langue. Nous décrivons, en premier lieu, certains aspects ethnographiques et historiques du groupe muinane. En second lieu, et en ce qui concerne la langue, nous abordons le problème de la catégorisation lexicale et établissons des comparaisons entre les caractéristiques du muinane et celles des langues apparentées bora et miraña. En d’autres termes, dans une perspective typologique, notre but est de définir les classes de catégories lexicales du muinane et de déterminer des critères phonologiques, morphosyntaxiques et discursifs à utiliser pour la définition des catégories. Nous abordons aussi la question de la présence ou de l’absence d’une classe adjectivale. Nous décrivons les outils employés par les locuteurs de la langue pour exprimer l’attribution et la qualification et finalement nous proposons un rapport entre l’absence d’une vraie classe adjectivale et le système saillant de classification nominale, dans un ensemble de langues de la région amazonienne appartenant aux familles bora, tukano orientale, uitoto et andoke. / The objective of this doctoral thesis is to approach the culture of the Muinane people through their history and language. We first describe some ethnographic and historical aspects of the indigenous Muinane group living in the Colombian Amazon. We then turn to their language, in particular to lexical categorization, and compare Muinane with the related languages Bora and Miraña, all classified as belonging to the Bora language family. In other words, our goal is to define the classes of lexical categories of Muinane from a typological perspective and to determine the phonological, morphosyntactic and discursive criteria that allow us to define this categorization. We also discuss the question of the presence or absence of an adjectival category and describe the strategies speakers use to express qualification and attribution. Finally, we propose a link between the absence of a true adjectival class and the rich, salient system of nominal classification in a set of languages of the Amazon region belonging to the Bora, Eastern Tukanoan, Uitoto and Andoke families.
|
18 |
Creating and validating an aroma and flavor lexicon for the evaluation of sparkling wines
Le Barbé, Eric, January 1900
Master of Science / Department of Food Science / Edgar Chambers IV
Sparkling wines represent an important part of the overall wine category. Currently, no lexicon exists that covers aroma, flavor, and mouthfeel for sparkling wine. The objectives of this research were to: 1) develop an aroma, flavor, taste and mouthfeel lexicon for sparkling wines, 2) train a panel to use this lexicon on white sparkling wines, which represent the majority of sparkling wines, and 3) validate the panel’s performance with white sparkling wines. For lexicon development, 25 sparkling wines were selected from 132 by a team of sensory professionals and winemakers. The lexicon developed included 13 mouthfeel and taste, 48 aroma, and 48 flavor (aromatic) attributes (109 attributes in total). For lexicon training, 22 experienced wine panelists participated in ten 3-hour sessions over two weeks. After training was complete, panel performance was validated with a practice phase and two studies. Analysis of panel discrimination (i.e. sample p-value) and within-panel reproducibility (i.e. correlation of panelist with panel intensity) indicated that the new lexicon differentiated sparkling wines consistently. Further, principal components analysis for studies two and three revealed grouping by wine type (e.g. brut, extra dry, etc.), again validating the new lexicon.
|
19 |
O punk sob o olhar da mídia: um estudo léxico-discursivo / The punk under the gaze of the media: a lexical-discursive study
Melão, Cesar Augusto, 04 April 2013
A mídia de massa constitui um grupo detentor de um grande poder no âmbito discursivo, uma vez que esse grupo tem acesso e pode controlar as informações que vão a público. O movimento punk, por outro lado, representa diversas minorias na sociedade e sua principal ferramenta de divulgação de ideias é a arte, principalmente a música. O discurso punk, porém, tem um alcance bastante limitado em comparação com a mídia de massa. Tendo em mente essa assimetria de poder, analisamos, nesta dissertação, o discurso da mídia em relação ao punk brasileiro. Desde sua chegada ao Brasil, o punk é alvo de várias confusões, acusações e controvérsias. Sem o mesmo destaque que teve nos anos 1980, o movimento punk, hoje em dia, não tem muita expressividade na mídia de massa. Quando ele é veiculado, em geral, é em razão de algum episódio que envolva violência física ou crimes. Em novembro de 2011 um caso de briga entre punks e neonazistas acabou com um punk morto e um neonazista gravemente ferido. Esse acontecimento teve um destaque notável na mídia, diversos periódicos e programas televisivos abordaram o assunto e até dedicaram programas inteiros para falar sobre o assunto. Esse caso serviu como recorte metodológico para compormos nosso corpus. Selecionamos textos que abordam a vida do jovem assassinado, pois vários deles tratam não só o caso do assassinato, mas também o punk como um todo. Além disso, selecionamos alguns textos da fase inicial do movimento punk para termos uma base de como ele era visto naquela época. Tendo o corpus definido, fizemos um levantamento lexical e separamos as lexias em campos semânticos, utilizando as noções sobre Léxico encontradas em Barbosa (1978), Biderman (1978) e Pottier (1975 e 1985). Analisamos esses dados à luz da abordagem triangular proposta por van Dijk (2008), segundo a qual a produção de sentido deve ser entendida de acordo com os seguintes elementos: discurso, cognição e sociedade. 
Além disso, utilizamos o recurso metodológico do mesmo autor, chamado de quadrado ideológico (VAN DIJK, 2005) para situar e compreender criticamente as escolhas lexicais no discurso midiático. A partir das análises dos dados obtidos, concluímos que o punk, enquanto objeto do discurso da mídia de massa, adquire um caráter bastante negativo e estereotipado. O indivíduo punk é visto como um sujeito perigoso, ligado ao crime e a situações violentas, além de ser, segundo o estereótipo criado, preconceituoso e agressivo. Entendemos que diversas informações divulgadas pela mídia são manipuladas e manipuladoras. Segundo o pensamento de van Dijk, a manipulação ocorre quando um grupo com mais poder abusa de sua posição favorável para informar as pessoas de modo parcial, isso gera uma compreensão incompleta do evento sobre o qual se fala no discurso. Apesar de não negarmos que o movimento punk manifeste-se de modo violento algumas vezes, notamos que ele, em muitos casos, é alvo de discursos manipuladores, o que gera um estereótipo majoritariamente negativo. / The mass media constitute a group that holds great discursive power, since it has access to the information that reaches the public and can control it. The punk movement, on the other hand, represents several minorities in society, and its main tool for disseminating ideas is art, especially music. Punk discourse, however, has a very limited reach compared with that of the mass media. With this power asymmetry in mind, we analyze in this thesis the media discourse about Brazilian punk. Since its arrival in Brazil, punk has been the target of confusion, accusations and controversy. Without the prominence it had in the 1980s, the punk movement today has little presence in the mass media. When it is reported on, it is generally because of some incident involving physical violence or crime. In November 2011, a fight between punks and neo-Nazis left one punk dead and one neo-Nazi seriously injured. This event received remarkable prominence in the media: various periodicals and television shows covered it, and some even devoted entire programs to it. This case served as the methodological criterion for delimiting our corpus. We selected texts that discuss the life of the murdered young man, because many of them address not only the murder case but also punk as a whole. In addition, we selected some texts from the early phase of the punk movement as a baseline for how it was seen at that time. Having defined the corpus, we carried out a lexical survey and sorted the lexical items into semantic fields, using the notions of the lexicon found in Barbosa (1978), Biderman (1978) and Pottier (1975 and 1985). We analyzed these data in the light of the triangular approach proposed by van Dijk (2008), according to which the production of meaning must be understood in terms of the following elements: discourse, cognition and society. Furthermore, we used the same author's methodological resource known as the ideological square (van Dijk, 2005) to situate and critically understand the lexical choices in media discourse. From the analysis of the data obtained, we conclude that punk, as an object of mass media discourse, acquires a very negative and stereotyped character. The punk individual is seen as a dangerous person, linked to crime and violent situations, and also, according to the stereotype created, as prejudiced and aggressive. We understand that much of the information disseminated by the media is manipulated and manipulative. According to van Dijk, manipulation occurs when a more powerful group abuses its favorable position to inform people in a partial way, which produces an incomplete understanding of the event the discourse is about. While we do not deny that the punk movement sometimes manifests itself violently, we note that it is in many cases the target of manipulative discourse, which generates a predominantly negative stereotype.
|
20 |
Afasia e linguagem figurada: o acesso lexical dentro de contextos metafóricos / Aphasia and figurative language: the lexical access in metaphoric contexts
Lima, Bruna Seixas, 03 February 2011
Esta pesquisa traz a análise de fenômenos linguísticos extraídos de entrevistas realizadas com seis sujeitos afásicos com diferentes graus de dificuldade de acesso lexical. Observamos a habilidade desses sujeitos em produzir e compreender nomes de animais utilizados em contexto não-literal. Desenvolvemos uma entrevista para determinar se os sujeitos em questão apresentavam dificuldade para acessar os nomes de animais escolhidos. Numa primeira etapa, os sujeitos tiveram de nomeá-los e descrevê-los e, posteriormente, utilizá-los dentro de um contexto provido pela entrevistadora. A hipótese é que possa haver diferença entre a habilidade do sujeito para produzir e compreender nomes de animais dependendo do contexto apresentado. Duas perspectivas de análise diferentes são apresentadas aqui: primeiro, temos as teorias baseadas em correlatos biológicos da linguagem e, em segundo, a teoria linguística de Roman Jakobson sobre o processamento da linguagem e a sua divisão em dois eixos principais, a metáfora e a metonímia (habilidades de abstração baseadas na similaridade e na contiguidade, respectivamente). Alguns sujeitos apresentam dificuldade para produzir formas de palavras no seu sentido literal, mas o mesmo não acontece quando as mesmas palavras são produzidas no seu sentido não-literal, sugerindo que nesses sujeitos o sistema semântico-lexical pode estar mais preservado do que se imagina, sendo que o tipo de entrada ou saída dessas formas lexicais pode ser o elemento prejudicado. A análise das entrevistas realizadas revela que a compreensão dessas mesmas metáforas foi uma tarefa mais laboriosa para os sujeitos, o que reforça nossa hipótese, uma vez que durante a tarefa de compreensão das metáforas os sujeitos não foram providos do contexto dado na tarefa de produção. / This research presents an analysis of linguistic phenomena taken from interviews conducted with six aphasic subjects with different degrees of lexical access deficit. We observe the ability of these subjects to produce and comprehend names of animals used in a non-literal context. We developed an interview to determine whether the subjects had difficulty accessing the chosen animal names. In the first part of the interview, the subjects were asked to name and describe the animal pictures presented and, afterwards, to produce and comprehend those names in a context provided by the interviewer. The hypothesis is that a subject's ability to produce and comprehend animal names may differ depending on the context presented. Two distinct perspectives of analysis are presented here: first, theories based on biological correlates of language and, second, Roman Jakobson's linguistic theory of language processing and its division into two main axes, metaphor and metonymy (abstraction abilities based on similarity and contiguity, respectively). Some subjects have difficulty producing word forms in their literal meaning, whereas the same does not occur when those words are produced in their non-literal meaning. This suggests that in these subjects the semantic-lexical system may be better preserved than expected, and that the affected element may be the type of input or output of the lexical forms. The interviews show that comprehending these same metaphors was a more laborious task for the subjects, which reinforces our hypothesis, since during the comprehension task the subjects were not given the context provided in the production task.
|