41

Semantic Role Agency in Perceptions of the Lexical Items Sick and Evil

Simmons, Nathan G. 18 November 2008 (has links)
Inspired by an ongoing debate in the clinical sciences concerning the value of evil as a label for human behavior (e.g., Mowrer 1960; Staub 1999; Wellman 2000; Williams 2004), this thesis examines the semantic role of AGENT in the lexical items sick and evil. Williams argues that the label evil removes responsibility from the doctor, whereas the label sick empowers the doctor to bring about a cure. While this view is not universally accepted in the field, it raises an interesting question in applied linguistic semantics about the assignment of agency with respect to sick and evil. Based on the close association between the meanings of sick and evil found in historical, psychological, and legal perspectives, this thesis assumes that the semantic feature [+/-RESPONSIBILITY] is assigned to sick or evil at some point along a continuum, with EVIL at one pole receiving [+RESPONSIBILITY] and SICK at the opposite pole receiving [-RESPONSIBILITY]. A survey of 106 respondents using a variety of prompts shows this continuum model to be only partially supported. There is a correlation between NON-RESPONSIBILITY and SICK, and a continuum exists that allows the assignment of PARTIAL RESPONSIBILITY to both terms. However, there is no definitive significant correlation between RESPONSIBILITY and EVIL. Further conclusions include evidence of adherence to a legal model of guilt, innocence, and insanity in general conceptions of SICK and EVIL; demographic variation, by contrast, shows little predictive potential for how people perceive SICK and EVIL. The thesis concludes by proposing an alternative model, based on a Greimas square, that more appropriately fits the trends found in the survey data.
42

Stratégie domaine par domaine pour la création d'un FrameNet du français : annotations en corpus de cadres et rôles sémantiques / Domain by domain strategy for creating a French FrameNet: corpus annotations of semantic frames and roles

Djemaa, Marianne 14 June 2017 (has links)
This thesis describes the creation of the French FrameNet (FFN), a FrameNet-type resource for French built from the Berkeley FrameNet (Baker et al., 1998) and two treebanks: the French Treebank (Abeillé et al., 2003) and the Sequoia Treebank (Candito and Seddah, 2012). The Berkeley FrameNet is a model for the semantic annotation of prototypical situations and their participants. It consists of: a) a structured set of prototypical situations, called frames, which incorporate semantic characterizations of the situations' participants (Frame Elements, or FEs); b) a lexicon of lexical units (LUs) that can evoke those frames; c) a set of English-language frame annotations. To create the FFN, we designed a "domain by domain" methodology: we defined four "domains", each centered on a specific notion (cause, verbal communication, cognitive stance, or commercial transaction). We then sought full frame and lexical coverage for these domains, and annotated the first 100 corpus occurrences of each LU in our domains. This strategy guarantees greater consistency in frame structuring than other approaches and is conducive to work on both intra-domain and inter-domain frame polysemy. Annotating frames on continuous text, without selecting particular LU occurrences, preserves the natural distribution of the lexical and syntactic characteristics of frame-evoking elements in our corpus. At present, the FFN includes 105 distinct frames and 873 distinct LUs, which combine into 1,109 LU-frame pairs (i.e. 1,109 senses). 16,167 frame occurrences, together with their FEs, have been annotated in our corpus. The thesis first situates the FrameNet model in a larger theoretical background. It then justifies our use of the Berkeley FrameNet as the resource base and motivates the domain-by-domain methodology. It clarifies some Berkeley FrameNet notions that we found too vague to be applied coherently to the FFN; in particular, it introduces more directly syntactic criteria both for defining a frame's lexical perimeter and for distinguishing core FEs from non-core ones. The creation of the FFN itself is then described: first the delimitation of the frame structure used in the resource, then the creation of its lexicon. The Cognitive Stances notional domain, which covers frames concerning a cognizer's degree of certainty about some particular content, is presented in detail. Next, the thesis describes our methodology for annotating the corpus with frames and FEs, and reviews several linguistic phenomena that required special treatment to obtain consistent annotation, such as object-complement constructions. Finally, it gives quantitative data on the current state of the FFN and on its evaluation, and concludes with perspectives on improving and exploiting the resource.
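The frame/LU structure the abstract describes (frames with core and non-core FEs, a lexicon of LUs, and one sense per LU-frame pair) can be pictured with a minimal sketch. The class and field names below are illustrative assumptions, not the FFN's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A prototypical situation, with its participant roles (Frame Elements)."""
    name: str
    core_fes: list                      # core Frame Elements
    non_core_fes: list = field(default_factory=list)

@dataclass
class LexicalUnit:
    """A lemma paired with the frame it evokes; each LU-frame pair is one sense."""
    lemma: str
    frame: Frame

# Hypothetical example in the spirit of the Cognitive Stances domain.
certainty = Frame("Certainty", core_fes=["Cognizer", "Content"])
lus = [LexicalUnit("savoir.v", certainty),
       LexicalUnit("certitude.n", certainty)]

# Counting distinct (lemma, frame) pairs mirrors how the FFN's 1,109 senses
# arise from 873 LUs and 105 frames.
senses = {(lu.lemma, lu.frame.name) for lu in lus}
print(len(senses))  # 2
```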
43

A cognitive semantic assessment of עִם and אֵת's semantic potential

Lyle, Kristopher Aaron 03 1900 (has links)
Thesis (MA)--Stellenbosch University, 2012. / This thesis provides a critical assessment of the semantic potential of two Biblical Hebrew lexemes: עִם and אֵת. Previous lexical inquiries into the target lexemes provide the impetus for the current research, because the linguistic frameworks assumed by those studies are outmatched in explanatory power by more recent theoretical developments, primarily within cognitive linguistics (and semantics). As its methodological framework, the current study appropriates these advances and demonstrates a semantic potential of the target lexemes determined through criteria offered by Tyler and Evans (2003). These criteria aid in the task of semantic demarcation as well as in identifying the primary sense, from which the remaining network of senses is derived. Furthermore, not only is an attempt made to represent the range of עִם and אֵת's semantic potential, but a proposal for the development of these senses is offered as well, primarily through an implementation of the theory of grammaticalization, as posited by Heine et al. (1991). The identified semantic networks are then analyzed from two different perspectives of lexical inquiry: 1) as a monosemy-polysemy cline, and 2) from both a semasiological and an onomasiological point of departure (the onomasiological method represents a unique contribution to the assessment of עִם and אֵת, since most Biblical Hebrew lexical inquiries are limited to semasiological endeavors). The investigation uses the Pentateuch as its data set and reveals (at least) eleven distinct senses in the semantic network of each lexeme. Even though the two lexemes' semantic potentials comprise largely the same senses, they are not completely synonymous and represent different meanings. Significantly, it is determined that 1) both target lexemes share the same primary sense (i.e., proto-scene), 2) both indicate the same core senses, and consequently 3) the target lexemes may rightly be considered near synonyms.
44

[en] A SEMANTIC-ASPECTUAL ANALYSIS ABOUT COPULA (LINKING-VERBS) / [pt] UMA ANÁLISE SEMÂNTICO-ASPECTUAL DOS VERBOS DE LIGAÇÃO

SHEILA MEJLACHOWICZ 06 October 2003 (has links)
[en] This research proposes a new way of analyzing verbos de ligação (copulas, or linking verbs) in Portuguese. Traditional Grammar treats this group of verbs as having a purely syntactic function, paying no attention to their meaning. In fact, these verbs and their complements constitute an expression with a meaning of its own. This characteristic enables us to compare verbos de ligação to light verbs, since they exhibit the same kind of behavior.
45

Modeling Preferences for Ambiguous Utterance Interpretations / Modélisation de préférences pour l'interprétation d'énoncés ambigus

Mirzapour, Mehdi 28 September 2018 (has links)
The problem of automatic logical meaning representation for ambiguous natural language utterances has long interested researchers in computational and logical semantics. Ambiguity in natural language may arise at the lexical, syntactic, or semantic level of meaning construction, or it may be caused by other factors such as ungrammaticality or the lack of the context in which the sentence is actually uttered. The traditional Montagovian framework and the family of its modern extensions have tried to capture this phenomenon by providing models that enable the automatic generation of logical formulas as meaning representations. However, one line of research has not yet been investigated in depth: ranking the interpretations of ambiguous utterances according to the real preferences of language users. This gap suggests a new direction of study, which this dissertation partially carries out by modeling meaning preferences in alignment with some of the well-studied theories of human preferential performance available in the linguistics and psycholinguistics literature. To fulfill this goal, we use and extend Categorial Grammars for our syntactic analysis and Categorial Proof Nets as our syntactic parse. We also use the Montagovian Generative Lexicon to derive multi-sorted logical formulas as our semantic meaning representation. This paves the way for our fivefold contributions: (i) ranking multiple-quantifier scopings by means of the underspecified Hilbert epsilon operator and categorial proof nets; (ii) modeling semantic gradience in sentences whose meanings involve implicit coercions, using the Montagovian Generative Lexicon and introducing a procedure for incorporating types and coercions from crowd-sourced lexical data gathered with a serious game called JeuxDeMots; (iii) introducing new locality-based, referent-sensitive metrics for measuring linguistic complexity by means of Categorial Proof Nets; (iv) introducing algorithms for sentence completion, with different linguistically motivated metrics for selecting the best candidates; and (v) integrating these different computational metrics for ranking preferences into a single model.
46

A semântica dos adjetivos: como e por que incluí-la em uma ontologia de domínio jurídico / The semantics of adjectives: how and why to include it in a legal-domain ontology

Bertoldi, Anderson 26 February 2007 (has links)
The main goal of this research is to study the semantics of adjectives in order to codify them in a legal ontology. A survey of legal ontologies and computational lexicons shows that adjectives are not codified systematically in these specialized knowledge resources. Taking the codification of adjectives in a legal ontology as its target, the research then analyzes non-specialized language ontologies and lexicons, looking for practical and theoretical elements for including adjectives systematically, rather than occasionally, in a legal ontology. This research defends the position that integrating linguistic approaches is more fruitful for natural language processing; accordingly, the methodology applied here combines different theoretical approaches. The corpus study and the construction of the legal ontology show the importance of adjectives in organizing specialized knowledge: in specialized domains, adjectives have the main function of classifying entities.
47

A semântica dos compostos nominais – um estudo de corpus paralelo inglês/português / The semantics of noun compounds: a study of an English/Portuguese parallel corpus

Teixeira, Lílian Figueiró 10 March 2009 (has links)
Noun compounds are productive constructions in many languages: new combinations are easily created in contexts of language use. However, this linguistic phenomenon is idiosyncratic, which makes its study a challenge for linguistics and for Natural Language Processing research. This work investigates how the constituents of English noun compounds formed by two nouns (NN compounds) relate semantically, and examines the characteristics of their translation equivalents in Portuguese found in ten editions of National Geographic magazine. The goal is to identify the most frequent relations in the corpus in order to propose a typology that expresses the semantic compositionality of these constructions. The work is divided into three parts, covering the theoretical bases of the study, the methodological resources adopted from Corpus Linguistics, and the analysis and discussion of the data. Concepts from the semantics of noun compounds such as productivity, semantic transparency, headedness, lexicalization, and nominalization are discussed. Two theories were used f
48

Jazykovědné termíny v mediálním diskurzu (Studie determinologizace) / Linguistics terms in mass media (Study of determinologization)

Pavlová, Hana January 2019 (has links)
The diploma thesis focuses on linguistic terminology in current mass-media discourse. Its main objective is to analyze the occurrences of linguistic terms in the SYN2013PUB corpus and to describe their function, meaning, and context, as well as the process of their determinologization. The first part surveys existing approaches to terms and terminology, the specifics of linguistic terminology, the function of terms in the professional style, and their use in the journalistic style. The thesis further deals with the relations between the general lexicon and terminology, and with terminologization and determinologization. The analysis of examples indicates how determinologization can be traced in texts. The examples are classified according to indicators of determinologization: typical means of the journalistic style that contain terms, journalistic innovations in collocability, specific ways of explaining the terms, and semantic or formal modifications of the terms. Determinologization is treated as an unfinished process, accomplished to varying extents across the corpus occurrences of linguistic terms.
49

Aide à l'identification de relations lexicales au moyen de la sémantique distributionnelle et son application à un corpus bilingue du domaine de l'environnement / Assisting the identification of lexical relations by means of distributional semantics, with an application to a bilingual corpus in the environmental domain

Bernier-Colborne, Gabriel 08 1900 (has links)
Identifying semantic relations is one of the main tasks involved in terminology work. This task, which aims to establish links between terms whose meanings are related, can be assisted by computational methods, including those based on distributional semantics. These methods estimate the semantic similarity of words from corpus data, which can help terminologists identify semantic relations. The quality of the results produced by distributional methods depends on several decisions that must be made when applying them, such as choosing a model and selecting its parameters. In turn, these decisions depend on various factors related to the target application, such as the types of semantic relations one wishes to identify. These can include typical paradigmatic relations such as (near-)synonymy (e.g. preserve -> protect), but also other relations such as syntactic derivation (e.g. preserve -> preservation). This dissertation aims to further develop a methodological framework based on distributional semantics for the identification of semantic relations using specialized corpora. To this end, we investigate how various aspects of terminology work must be accounted for when selecting a distributional semantic model and its parameters, as well as those of the method used to query the model. These aspects include the descriptive framework, the target relations, the part of speech of the terms being described, and the language (in this case, French or English). Our results show that two of the relations that distributional semantic models capture most accurately are (near-)synonymy and syntactic derivation; however, the models that produce the best results for these two relations are very different. Thus, the target relations are an important factor to consider when choosing a model and tuning it to obtain the most accurate results. Another factor is the part of speech of the terms being described: among other things, our results suggest that relations between verbs are not captured as accurately as those between nouns or adjectives. The descriptive framework used for a given project is also important. In this work, we compare two descriptive frameworks, one based on lexical semantics and the other on frame semantics. Our results show that terms evoking the same semantic frame are not captured as accurately as certain lexical relations, such as synonymy. We show that this is due to (at least) two factors: a high percentage of frame-evoking terms are verbs, and the models that capture syntactic derivation most accurately are very different from those that work best for typical paradigmatic relations such as synonymy. In summary, we evaluate two different distributional semantic models, systematically analyze the influence of their parameters, and investigate how this influence varies with respect to various aspects of terminology work. We show many examples of distributional neighbourhoods, which we explore using graphs, and discuss sources of noise. This dissertation thus provides important guidelines for the use of distributional semantic models in terminology work.
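The distributional idea this abstract builds on — words that occur in similar contexts receive similar vectors — can be illustrated with a minimal sketch. The toy corpus, the fixed window of 2 words, and the raw-count vectors below are illustrative assumptions, far simpler than the models the dissertation evaluates:

```python
import math
from collections import defaultdict

# Tiny invented corpus; real terminology work uses large specialized corpora.
corpus = [
    "we must preserve the forest to protect wildlife",
    "laws protect the forest and preserve biodiversity",
    "the preservation of wildlife depends on forest laws",
]

# Build co-occurrence count vectors within a symmetric window of 2 words.
vectors = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - 2), min(len(words), i + 3)):
            if j != i:
                vectors[w][words[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    num = sum(u[k] * v[k] for k in u if k in v)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

# The near-synonyms share more contexts than an unrelated pair does.
print(cosine(vectors["preserve"], vectors["protect"]))
print(cosine(vectors["preserve"], vectors["laws"]))
```

On this toy data the near-synonym pair scores higher, but as the dissertation stresses, which relations such a model favors depends heavily on its parameters (window size, weighting, and so on).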
50

Klasifikátor pro sémantické vzory užívání anglických sloves / Classifier for semantic patterns of English verbs

Kríž, Vincent January 2012 (has links)
The goal of the diploma thesis is to design, implement and evaluate classifiers for the automatic classification of semantic patterns of English verbs according to a pattern lexicon that draws on Corpus Pattern Analysis. We use a pilot collection of 30 sample English verbs as training and test data sets, and employ standard machine-learning methods: decision trees, k-nearest neighbours (kNN), support vector machines (SVM), and the AdaBoost algorithm. Among other things, we concentrate on feature design and selection, experimenting with both morpho-syntactic and semantic features. Our results show that morpho-syntactic features are the most important for statistically driven semantic disambiguation; nevertheless, for some verbs the use of semantic features plays an important role.
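As a rough illustration of one of the methods the abstract mentions (kNN over morpho-syntactic features), here is a minimal sketch; the binary features, pattern labels, and training examples are invented for illustration and are not the thesis's data:

```python
from collections import Counter

# Hypothetical binary morpho-syntactic features for occurrences of a verb,
# e.g. (has_direct_object, subject_is_human, has_prepositional_phrase).
# "pattern_1" / "pattern_2" stand in for CPA pattern numbers.
train = [
    ((1, 1, 0), "pattern_1"),
    ((1, 1, 1), "pattern_1"),
    ((0, 0, 1), "pattern_2"),
    ((0, 1, 1), "pattern_2"),
]

def knn_classify(x, train, k=3):
    """Label x by majority vote among its k nearest neighbours
    (Hamming distance on binary feature vectors)."""
    dist = lambda a, b: sum(ai != bi for ai, bi in zip(a, b))
    nearest = sorted(train, key=lambda item: dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((1, 0, 0), train))  # → pattern_1
```

A real system would extract such features from parsed corpus occurrences and compare kNN against the other classifiers named in the abstract.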
