Global ETD Search

81	Measuring Semantic Distance using Distributional Profiles of Concepts
Mohammad, Saif (01 August 2008)
Semantic distance is a measure of how close or distant in meaning two units of language are. A large number of important natural language problems, including machine translation and word sense disambiguation, can be viewed as semantic distance problems. The two dominant approaches to estimating semantic distance are the WordNet-based semantic measures and the corpus-based distributional measures. In this thesis, I compare them, both qualitatively and quantitatively, and identify the limitations of each. This thesis argues that semantic distance is essentially a property of concepts (rather than words) and that two concepts are semantically close if they occur in similar contexts. Instead of identifying the co-occurrence (distributional) profiles of words (distributional hypothesis), I argue that distributional profiles of concepts (DPCs) can be used to infer the semantic properties of concepts and indeed to estimate semantic distance more accurately. I propose a new hybrid approach to calculating semantic distance that combines corpus statistics and a published thesaurus (the Macquarie Thesaurus). The algorithm estimates the DPCs using the categories in the thesaurus as very coarse concepts and, notably, without requiring any sense-annotated data. Even though using only about 1000 concepts to represent the vocabulary of a language seems drastic, I show that the method achieves results better than the state of the art in a number of natural language tasks. I show how cross-lingual DPCs can be created by combining text in one language with a thesaurus from another. Using these cross-lingual DPCs, we can solve problems in one, possibly resource-poor, language using a knowledge source from another, possibly resource-rich, language. I show that the approach is also useful in tasks that inherently involve two or more languages, such as machine translation and multilingual text summarization. The proposed approach is computationally inexpensive, can estimate both semantic relatedness and semantic similarity, and can be applied to all parts of speech. Extensive experiments on ranking word pairs by semantic distance, real-word spelling correction, solving Reader's Digest word choice problems, determining word sense dominance, word sense disambiguation, and word translation show that the new approach is markedly superior to previous ones.
Computational Linguistics Natural Language Processing Lexical semantics semantic distance distributional similarity semantic similarity semantic relatedness word concept co-occurrence matrix distributional profiles of concepts thesaurus corpus-based techniques word senses cross-lingual techniques word sense dominance word sense disambiguation wordnet 0984 0800
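The core idea of entry 81, profiling concepts rather than words, can be illustrated with a toy sketch. It is not the thesis's algorithm, which per the abstract estimates DPCs from corpus statistics and the Macquarie Thesaurus categories. The sketch assumes a hypothetical three-category thesaurus, a fixed window of 3 tokens, raw co-occurrence counts, and cosine similarity as the profile comparison. An ambiguous word such as "bank" credits every category that lists it, which reflects the abstract's point that no sense-annotated data is needed. A word pair is scored by its closest pair of categories.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy thesaurus: each category stands in for a coarse concept.
THESAURUS = {
    "FINANCE": {"bank", "money", "loan"},
    "RIVER": {"bank", "shore", "water"},
    "FOOD": {"bread", "meal"},
}


def categories_of(word, thesaurus):
    """All categories listing `word`; ambiguous words belong to several."""
    return [cat for cat, members in thesaurus.items() if word in members]


def concept_profiles(sentences, thesaurus, window=3):
    """Build one profile per category: counts of the context words found within
    `window` tokens of any member word. No sense tagging is used: an occurrence
    of an ambiguous word is credited to every category that lists it."""
    profiles = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            cats = categories_of(word, thesaurus)
            if not cats:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = [t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i]
            for cat in cats:
                profiles[cat].update(context)
    return profiles


def cosine(p, q):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def word_relatedness(w1, w2, profiles, thesaurus):
    """Score a word pair by its closest pair of categories (senses)."""
    pairs = [(a, b) for a in categories_of(w1, thesaurus) for b in categories_of(w2, thesaurus)]
    return max((cosine(profiles[a], profiles[b]) for a, b in pairs), default=0.0)


if __name__ == "__main__":
    corpus = [s.split() for s in [
        "the bank approved the loan and the money",
        "we walked along the shore near the water",
        "she ate bread with her meal",
    ]]
    dpcs = concept_profiles(corpus, THESAURUS)
    print(word_relatedness("loan", "money", dpcs, THESAURUS))
    print(word_relatedness("loan", "bread", dpcs, THESAURUS))
```

With a thesaurus of realistic size, the category lookup would normally be precomputed as a word-to-categories index instead of being scanned for every token.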
82	Investigating the universality of a semantic web-upper ontology in the context of the African languages
Anderson, Winston Noël (08 1900)
Ontologies are foundational to, and upper ontologies provide semantic integration across, the Semantic Web. Multilingualism has been shown to be a key challenge to the development of the Semantic Web, and is a particular challenge to the universality requirement of upper ontologies. Universality implies a qualitative mapping from lexical ontologies, like WordNet, to an upper ontology, such as SUMO. Are a given natural language family's core concepts currently included in an existing, accepted upper ontology? Does SUMO preserve an ontological non-bias with respect to the multilingual challenge, particularly in the context of the African languages? The approach to developing WordNets mapped to shared core concepts in the non-Indo-European language families has highlighted these challenges, and they are examined here in a new context: the Southern African languages. This is achieved through a new mapping from African language core concepts to SUMO. It is shown that SUMO has no significant natural language ontology bias.
Computing; M.Sc. (Computer Science)
Upper Ontology Suggested Upper Merged Ontology (SUMO) Tree comparison Ontology Resource Description Framework (RDF) Lexical semantics Semantic networks Language resources Open environment WordNet Extensible Mark-up Language (XML) African languages of Sub-Saharan Origin Proto-Bantu language 401.430285 Linguistic universals Semantic Web Semantics -- Data processing Ontologies (Information retrieval) Document markup languages Proto-Bantu language RDF (Document markup language)
83	Vers des moteurs de recherche "intelligents" : un outil de détection automatique de thèmes : méthode basée sur l'identification automatique des chaînes de référence / Toward "intelligent" search engines : an automatic topic detection tool : method based on automatic reference chain identification
Longo, Laurence (12 December 2013)
Cette thèse se situe dans le domaine du Traitement Automatique des Langues et vise à optimiser la classification des documents dans les moteurs de recherche. Les travaux se concentrent sur le développement d’un outil de détection automatique des thèmes des documents (ATDS-fr). Utilisant peu de connaissances, la méthode hybride adoptée allie des techniques statistiques de segmentation thématique à des méthodes linguistiques identifiant des marqueurs de cohésion. Parmi eux, les chaînes de référence – séquence d’expressions référentielles se rapportant à la même entité du discours (e.g. Paul…il…cet homme) – ont fait l’objet d’une attention particulière, car elles constituent un indice textuel important dans la détection des thèmes (i.e. ce sont des marqueurs d’introduction, de maintien et de changement thématique). Ainsi, à partir d’une étude des chaînes de référence menée dans un corpus issu de genres textuels variés (analyses politiques, rapports publics, lois européennes, éditoriaux, roman), nous avons développé un module d’identification automatique des chaînes de référence RefGen qui a été évalué suivant les métriques actuelles de la coréférence.
This thesis, in the field of Natural Language Processing, aims to optimize document classification in search engines. The work focuses on the development of a tool that automatically detects document topics (ATDS-fr). Requiring little prior knowledge, the hybrid method combines statistical topic segmentation techniques with linguistic methods that identify cohesion markers. Among these, reference chains, that is, sequences of referential expressions referring to the same discourse entity (e.g. Paul ... he ... this man), have received special attention because they are important topic markers (i.e. they mark topic introduction, maintenance and change). Thus, from a study of reference chains in a corpus spanning various textual genres (political analyses, public reports, European laws, editorials and a novel), we developed RefGen, an automatic reference chain identification module, which was evaluated using current coreference metrics.
Détection automatique de thèmes Chaînes de référence Traitement automatique des langues Sémantique lexicale Coréférence Genres textuels Segmentation thématique Marqueurs linguistiques Cohésion Linguistique de corpus Topic detection Reference chains Natural language processing Lexical semantics Coreference Textual genre Topic segmentation Linguistic markers Cohesion Corpus linguistics 401.4 004.678
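The statistical half of the hybrid method in entry 83, topic segmentation from lexical cohesion, can be sketched as follows. This is a simplified segmenter in the spirit of Hearst's TextTiling, not ATDS-fr or RefGen. The block size, the 0.1 threshold and the tiny stopword list are illustrative assumptions. Note that word overlap alone cannot tell that "he" or "this man" continues a chain started by "Paul". The thesis uses reference chains to capture exactly this kind of link.

```python
from collections import Counter
from math import sqrt

# Illustrative stopword list (an assumption, not taken from the thesis).
STOPWORDS = {"the", "a", "an"}


def bag(sentence):
    """Lower-cased bag of content words for one sentence."""
    return Counter(w for w in sentence.lower().split() if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def topic_boundaries(sentences, block=2, threshold=0.1):
    """Return each gap index i (a boundary falls just before sentence i) where
    the lexical similarity of the `block` sentences on either side of the gap
    drops below `threshold`."""
    bags = [bag(s) for s in sentences]
    boundaries = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block):gap], Counter())
        right = sum(bags[gap:gap + block], Counter())
        if cosine(left, right) < threshold:
            boundaries.append(gap)
    return boundaries


if __name__ == "__main__":
    text = [
        "Paul bought a boat",
        "Paul painted the boat red",
        "The stock market fell today",
        "Market analysts expected the fall",
    ]
    print(topic_boundaries(text))  # prints [2]
```

A fixed threshold keeps the sketch short. TextTiling itself places boundaries using depth scores computed around local minima of the similarity curve.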
84	Étude sémantique des mots "chance", "fortune", "hasard" et "risque" du XVIIIe au XXIe siècle : perspectives sur le lexique du français et ses usages / A semantic study of the words "chance", "fortune", "hasard" and "risque" from the eighteenth century onward : approaches to the French lexicon and its uses
Courbon, Bruno (09 September 2009)
La recherche a pour objet la structuration du champ lexical des mots « chance », « fortune », « hasard » et « risque » du XVIIIe au XXIe siècle. Témoin de mutations qu’a connues la civilisation occidentale durant cette période, ce champ, qui se rattache à la notion de fortune / hasard, présente une relative homogénéité sémantique. Les mots (et leurs dérivés) sont étudiés à travers le déploiement, la régulation et la répartition des normes d’usages, non seulement en français hexagonal, mais aussi en français québécois. L’étude se fonde sur l’exploitation de deux types de corpus. D’une part, un corpus d’articles extraits d’une cinquantaine de dictionnaires sert à mettre en évidence la productivité morphosémantique et sémantique de ces unités dans une perspective historique large. D’autre part, un grand ensemble d’énoncés diversifiés permet, par la mise au jour de types de contextes, d’effectuer un suivi diachronique des usages. L’approche continuiste des différences d’usages s’appuie sur une représentation fréquentielle des changements sémantiques. La thèse apporte une contribution à la question de la variation des usages et du changement sémantique, qui ouvre sur plusieurs perspectives. Elle se veut d’abord une réflexion sur la théorie et la méthodologie descriptives, appréhendées à la lumière de l’analyse de la nature et du rôle des corpus. Elle met ensuite en évidence l’importance de la dimension intersubjective dans l’activité de signification, en particulier le rôle déterminant des structures syntagmatiques dans l’établissement de nouveaux usages sémantiques. Enfin, elle permet de mettre en relation le changement sémantique avec les conditions sociohistoriques et les représentations collectives.
The present study deals with the way in which the lexical field grouping the words “chance”, “fortune”, “hasard” and “risque” has been structured in the French language from the eighteenth century to the present day. Bearing witness to major changes in Western civilization during this period, the field, which corresponds to the linguistic representation of the notion of fortune / hasard, shows a relative semantic homogeneity. We have examined these words and their derived forms through the deployment, regulation, and distribution of norms of use, not only in Hexagonal French, but also in Quebec French. Two types of corpora have been analysed. On the one hand, a corpus of articles from around 50 dictionaries has been used to highlight the morphosemantic and semantic productivity of these units over a broad historical span. On the other hand, a large and varied set of texts has made it possible, by revealing context types, to trace lexical uses diachronically. The continuist approach to differences in usage rests upon a frequential representation of semantic changes. The thesis contributes to the question of usage variation and semantic change, opening up several new perspectives. It first deals with the theory and methodology of lexical description, considered through an analysis of the nature and role of corpora. It then highlights the importance of the intersubjective dimension in meaning-making, in particular the decisive role of syntagmatic structures in establishing new semantic uses. Finally, it relates semantic change to its sociohistorical background and to the collective representations of the time.
Sémantique lexicale Champ lexical Unité lexicale Chance Fortune Hasard Risque Diachronie Diatopie Néologie Structure Syntagmatique Actanciel Évolution Changement sémantique Usages Représentations sociales Variation Corpus Linguistics Historical semantics Historical lexicology Hexagonal French French-speaking world Lexical semantics Lexicology Lexicography Lexical field Lexical unit Diachrony Dialects Neology Structure Lexical combinations Semantic roles Semantic change Social representations
85	La polysémie des noms de parties du corps humain en français : analyse sémantique de artère, bouche, coeur, épaule et pied / Polysemy of French human body part nouns : semantic analysis of artère, bouche, cœur, épaule and pied
Bertin, Thomas (26 October 2018)
Cette étude s'inscrit dans le champ de la sémantique lexicale et explore plus particulièrement la question de la polysémie dans le domaine nominal. Dans une première étape, on explicite les enjeux théoriques d'une telle recherche. Cela conduit à accorder une place centrale au concept d'invariant sémantique pour rendre compte de l'identité sémantique d'un nom (en langue) par-delà sa variation de sens (en contexte). Dans une deuxième étape, on circonscrit l'objet empirique – les noms de parties du corps humain en français contemporain – tout en justifiant ce terrain d'étude. Puis, on précise l'approche méthodologique. La suite de la thèse est consacrée à l'investigation empirique proprement dite. Il s'agit d'abord d'offrir une description générale du potentiel de variation sémantique des noms de parties du corps humain en français. Ensuite, c'est une analyse sémantique approfondie du nom cœur qui est proposée. D'une part, on formule un invariant sémantique susceptible de subsumer tous ses emplois (au cœur du sujet, Paul a mal au cœur, avoir à cœur de réussir...). D'autre part, on montre en quoi la diversité de ses emplois présente un caractère finalement régulier. Enfin, quatre autres noms (artère, épaule, bouche et pied) font également l'objet d'une analyse spécifique. Chacune de ces quatre études est l'occasion d'éprouver la pertinence du concept d'invariant sémantique pour rendre compte de la polysémie dans le domaine nominal.
This study falls within the field of lexical semantics. More specifically, it deals with polysemy in the nominal domain. As a first step, the theoretical issues at stake in such research are clarified. This leads to giving a central place to the concept of semantic invariant, which accounts for the semantic identity of a given noun (in the language system) beyond its variation in meaning (in context). As a second step, the empirical object of this research, human body part nouns in contemporary French, is delimited. This gives an opportunity to justify the choice of these nouns as a field of research and to set out the methodological approach. The rest of the dissertation consists of the empirical investigation itself. First, an overall description of the semantic variation potential of French human body part nouns is provided. Then, an in-depth semantic analysis of the noun cœur (“heart”) is developed: on the one hand, a semantic invariant covering all of cœur's uses (au cœur du sujet “at the heart of the matter”, Paul a mal au cœur “Paul feels sick”, avoir à cœur de réussir “to be set on succeeding”…) is formulated; on the other hand, the diversity of these uses is shown to be ultimately regular. Finally, four more nouns (artère “artery”, épaule “shoulder”, bouche “mouth” and pied “foot”) each receive a specific semantic analysis. Each of these four studies offers a new opportunity to test the relevance of the semantic invariant concept for accounting for polysemy in the nominal domain.
Sémantique lexicale Domaine nominal Polysémie Invariant sémantique Nom de partie de corps humain Coeur Artère Épaule Bouche Pied Lexical semantics Nominals Polysemy Semantic invariant Human body part noun Coeur ("heart") Artère ("artery") Épaule ("shoulder") Bouche ("mouth") Pied ("foot") 401.4
86	[en] ESPÍRITO DE CORPUS: CREATION OF A MARINE CORPS BILINGUAL LEXICON / [pt] ESPÍRITO DE CORPUS: CRIAÇÃO DE UM LÉXICO BILÍNGUE DO CORPO DE FUZILEIROS NAVAIS
MARIANA LEMOS MULLER (07 June 2022)
[pt] Este estudo apresenta uma pesquisa temática envolvendo Terminologia, Estudos de Tradução Baseados em Corpus, Terminologia Computacional e Semântica Lexical, e tem como objeto de estudo a área do Corpo de Fuzileiros Navais. O objetivo desta pesquisa foi criar um material terminológico por meio de uma metodologia híbrida de extração de termos desenvolvida a partir de testes com ferramentas de Extração Automática de Termos (EAT). Assim, buscou-se solucionar tanto problemas tradutórios relacionados à subárea de estudo quanto à detecção e validação de candidatos a termos em um corpus. Primeiramente, foi realizado um estudo piloto com o objetivo de avaliar as ferramentas TermoStat Web 3.0 e AntConc 3.5.7. Após os testes por meio da análise de um corpus paralelo bilíngue, foram selecionadas as melhores condições identificadas para se obter uma metodologia eficaz de extração automática de termos aliada à análise humana. Em seguida, essa metodologia foi utilizada para a análise de um corpus bilíngue comparável. Os candidatos a termos extraídos foram então validados pelos critérios de Semântica Lexical propostos por L'Homme (2020) e, em seguida, foram detectados seus equivalentes terminológicos. Este estudo resultou na criação do léxico bilíngue Espírito de Corpus.
[en] This study presents thematic research on the Marine Corps domain, involving Terminology, Corpus-Based Translation Studies, Computational Terminology and Lexical Semantics. The objective of this research was to create a terminological resource through a hybrid term extraction methodology developed from tests with Automatic Term Extraction (ATE) tools. Thus, we sought to solve both translation problems related to this subdomain and problems in detecting and validating term candidates in a corpus. First, a pilot study was conducted to evaluate two tools: TermoStat Web 3.0 and AntConc 3.5.7. After testing them on a bilingual parallel corpus, we selected the best conditions identified in order to obtain an effective automatic term extraction methodology combined with human analysis. This methodology was then used to analyze a comparable bilingual corpus. The automatically extracted term candidates were then validated against the Lexical Semantics criteria proposed by L'Homme (2020), and their translation equivalents were identified. This study resulted in the creation of the bilingual lexicon Espírito de Corpus.
[pt] LINGUA PORTUGUESA [pt] USMC [pt] CFN [pt] TERMOSTAT [pt] ANTCONC [pt] EXTRACAO AUTOMATICA DE TERMOS [pt] TERMINOLOGIA COMPUTACIONAL [pt] CORPO DE FUZILEIROS NAVAIS [pt] LINGUA INGLESA [pt] LEXICO [pt] TERMINOLOGIA [pt] SEMANTICA LEXICAL [en] PORTUGUESE LANGUAGE [en] UNITED STATES MARINE CORPS [en] CFN [en] TERMOSTAT [en] ANTCONC [en] AUTOMATIC TERM EXTRACTION [en] COMPUTATIONAL LINGUISTICS [en] MARINE CORPS [en] CORPUS-BASED TRANSLATION STUDIES [en] ENGLISH LANGUAGE [en] LEXICON [en] TERMINOLOGY [en] LEXICAL SEMANTICS
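The automatic step of the term extraction workflow in entry 86 can be illustrated by comparing a specialized corpus against a general reference corpus. The sketch below is not how TermoStat or AntConc is implemented. It substitutes Dunning's log-likelihood (G²) as the keyness score and handles single words only, whereas real ATE tools also extract multiword terms. The toy corpora are invented for illustration. The ranked output corresponds to the term candidates that the workflow described above would still submit to human validation.

```python
from collections import Counter
from math import log


def log_likelihood(a, b, c, d):
    """Dunning's G2 for a word seen `a` times in a study corpus of `c` tokens
    and `b` times in a reference corpus of `d` tokens (0 * log 0 taken as 0)."""
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a:
        g2 += a * log(a / e1)
    if b:
        g2 += b * log(b / e2)
    return 2 * g2


def rank_term_candidates(study_tokens, reference_tokens, top=5):
    """Rank words over-represented in the study corpus by G2, highest first."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = sum(study.values()), sum(ref.values())
    scored = []
    for word, a in study.items():
        b = ref[word]
        if a / c > b / d:  # keep only words relatively more frequent in the study corpus
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top]]


if __name__ == "__main__":
    study = "the marine corps landing force the amphibious assault the landing craft".split()
    reference = "the cat sat on the mat and the dog ran to the park".split()
    print(rank_term_candidates(study, reference))
```

Here "the" is filtered out because it is relatively more frequent in the reference corpus, while the repeated domain word "landing" receives the highest score.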
87	Från Holocaust till Förintelsen : Etableringsprocessen av begreppet Förintelsen i svenskt språkbruk / The transfer of the concept of the Holocaust and its establishment as the Swedish word förintelsen with its new meaning, Förintelsen : On the establishment of the word and the concept of Förintelsen in the Swedish language
Andrée, Johan (January 2022)
Jag har i min masteruppsats, utifrån ett begreppshistoriskt perspektiv med transnationell tillämpning, analyserat begreppsöverföringen och etableringen från begreppet Holocaust till det svenska ordet förintelsen med dess nya betydelse Förintelsen. Jag har även analyserat när, hur och vilka som befäste begreppet Förintelsen i svenskt språkbruk.
Ordet och begreppet Holocaust
Min analys visar att begreppet Holocaust överförs till Sverige och etableras i Sverige genom det svenska ordet och begreppet Förintelsen.
Ordet och begreppet Förintelsen
Mitt arbete presenterar ett resultat som kullkastar tidigare uppfattningar om etableringen av ordet och begreppet Förintelsen i svenskt språkbruk. Etableringen av ordet och begreppet Förintelsen genomfördes på följande sätt. Bonniers Förlag lanserade år 1978 en översättning av romanen Holocaust, i Sverige marknadsförd med titeln Förintelsen. År 1979 premiärvisade SVT den amerikanska tv-serien Holocaust, i Sverige marknadsförd med titeln Förintelsen. Tidningarna fylldes från år 1979 av artiklar om det nazistiska folkmordet på judar under andra världskriget med rubriken Förintelsen samtidigt som Skandinaviska Institutet för judisk utbildning och kultur producerade läromedel för Sveriges samtliga skolor och bibliotek under begreppet Förintelsen. År 1982 repriserades tv-serien Förintelsen, omgärdad av faktaprogram, allt under det samlande namnet Förintelsen. Ordet och begreppet Förintelsen hade slutligen, efter fyra års intensiv användning, slagit igenom i svenskt språkbruk. Mitt arbete visar att det fanns en samstämmighet, en gemensam vilja hos Bonniers Förlag, SVT, massmedia och judiska organisationer att benämna det nazistiska folkmordet på judar under andra världskriget under den gemensamma benämningen Förintelsen. Alla samverkade för att befästa ordet och begreppet Förintelsen. Jag har i mitt arbete visat att det inte enbart var titeln på en tv-serie som befäste begreppet Förintelsen i svenskt språkbruk. Det krävdes en omfattande samverkan för att etablera ordet och begreppet Förintelsen i Sverige såsom ett samlande begrepp för det nazistiska folkmordet på judar under andra världskriget.
In my master's thesis, from a conceptual-historical perspective with a transnational application, I have analyzed the transfer of the concept of the Holocaust and its establishment as the Swedish word förintelsen with its new meaning, Förintelsen. I have also analyzed when, how, and by whom the concept of Förintelsen was consolidated in Swedish language use.
The word and the concept of the Holocaust
My analysis shows that the concept of the Holocaust was transferred to and established in Sweden through the Swedish word and concept Förintelsen.
The word and the concept of Förintelsen
My work presents a result that overturns previous assumptions about the establishment of the word and the concept of Förintelsen in Swedish language use. The establishment of the word and the concept of Förintelsen was carried out as follows. In 1978, Bonniers Förlag launched a translation of the novel Holocaust, marketed in Sweden under the title Förintelsen. In 1979, SVT premiered the American television series Holocaust, marketed in Sweden under the title Förintelsen. From 1979 onward, newspapers were filled with articles about the Nazi genocide of Jews during World War II under the heading Förintelsen, while the Scandinavian Institute for Jewish Education and Culture produced teaching materials for all of Sweden's schools and libraries under the term Förintelsen. In 1982, the television series Förintelsen was rebroadcast, accompanied by factual programs, all under the collective name Förintelsen. After four years of intensive use, the word and the concept of Förintelsen had finally become established in Swedish language use. My work shows that there was a consensus, a shared will among Bonniers Förlag, SVT, the mass media and Jewish organizations, to refer to the Nazi genocide of Jews during World War II by the common name Förintelsen. All of them worked together to consolidate the word and the concept of Förintelsen. In my work, I have shown that it was not only the title of a TV series that consolidated the concept of Förintelsen in Swedish language use. Extensive collaboration was required to establish the word and concept of Förintelsen in Sweden as a unifying term for the Nazi genocide of Jews during World War II.
History Historia
