81

Measuring Semantic Distance using Distributional Profiles of Concepts

Mohammad, Saif 01 August 2008
Semantic distance is a measure of how close or distant in meaning two units of language are. Many important natural language problems, including machine translation and word sense disambiguation, can be viewed as semantic distance problems. The two dominant approaches to estimating semantic distance are the WordNet-based semantic measures and the corpus-based distributional measures. In this thesis, I compare them, both qualitatively and quantitatively, and identify the limitations of each. This thesis argues that semantic distance is essentially a property of concepts (rather than words) and that two concepts are semantically close if they occur in similar contexts. Instead of identifying the co-occurrence (distributional) profiles of words (distributional hypothesis), I argue that distributional profiles of concepts (DPCs) can be used to infer the semantic properties of concepts and indeed to estimate semantic distance more accurately. I propose a new hybrid approach to calculating semantic distance that combines corpus statistics and a published thesaurus (Macquarie Thesaurus). The algorithm estimates the DPCs using the categories in the thesaurus as very coarse concepts and, notably, without requiring any sense-annotated data. Even though using only about 1000 concepts to represent the vocabulary of a language seems drastic, I show that the method achieves results better than the state of the art in a number of natural language tasks. I show how cross-lingual DPCs can be created by combining text in one language with a thesaurus from another. Using these cross-lingual DPCs, we can solve problems in one, possibly resource-poor, language using a knowledge source from another, possibly resource-rich, language. I show that the approach is also useful in tasks that inherently involve two or more languages, such as machine translation and multilingual text summarization.
The proposed approach is computationally inexpensive, can estimate both semantic relatedness and semantic similarity, and can be applied to all parts of speech. Extensive experiments on ranking word pairs by semantic distance, real-word spelling correction, solving Reader's Digest word choice problems, determining word sense dominance, word sense disambiguation, and word translation show that the new approach is markedly superior to previous ones.
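The idea of a distributional profile of a concept can be illustrated with a small sketch. This is not the thesis's algorithm: it assumes raw co-occurrence counts, a fixed context window, and cosine as the closeness measure, and it leaves out any refinement steps the thesis may apply. The function names (`build_dpcs`, `cosine`, `word_closeness`), the toy thesaurus categories and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_dpcs(sentences, thesaurus, window=2):
    """Count, for each thesaurus category, the words that co-occur with
    any word listed under that category (assumption: raw counts)."""
    word_to_cats = defaultdict(set)
    for cat, words in thesaurus.items():
        for w in words:
            word_to_cats[w].add(cat)

    dpcs = defaultdict(Counter)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            cats = word_to_cats.get(w, ())
            if not cats:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    for cat in cats:
                        dpcs[cat][tokens[j]] += 1
    return dict(dpcs), dict(word_to_cats)


def cosine(p, q):
    """Cosine between two sparse count vectors stored as Counters."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def word_closeness(w1, w2, dpcs, word_to_cats):
    """Closeness of two words: the best cosine over all pairs of
    categories (coarse senses) the two words are listed under."""
    empty = Counter()
    return max(
        (cosine(dpcs.get(c1, empty), dpcs.get(c2, empty))
         for c1 in word_to_cats.get(w1, ())
         for c2 in word_to_cats.get(w2, ())),
        default=0.0,
    )


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    toy_thesaurus = {"FINANCE": ["bank", "money"], "RIVER": ["bank", "shore"]}
    toy_corpus = [["deposit", "money", "at", "the", "bank"],
                  ["the", "river", "bank", "and", "the", "shore"]]
    profiles, cats = build_dpcs(toy_corpus, toy_thesaurus)
    print(word_closeness("money", "shore", profiles, cats))
```

Taking the maximum over category pairs treats two words as close when their closest coarse senses are close, which is one simple way to move from concept-level profiles back to word pairs.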
82

Investigating the universality of a semantic web-upper ontology in the context of the African languages

Anderson, Winston Noël 08 1900
Ontologies are foundational to the Semantic Web, and upper ontologies provide semantic integration across it. Multilingualism has been shown to be a key challenge to the development of the Semantic Web, and is a particular challenge to the universality requirement of upper ontologies. Universality implies a qualitative mapping from lexical ontologies, like WordNet, to an upper ontology, such as SUMO. Are a given natural language family's core concepts currently included in an existing, accepted upper ontology? Does SUMO preserve an ontological non-bias with respect to the multilingual challenge, particularly in the context of the African languages? The approach to developing WordNets mapped to shared core concepts in the non-Indo-European language families has highlighted these challenges, and this is examined in a unique new context: the Southern African languages. This is achieved through a new mapping from African language core concepts to SUMO. It is shown that SUMO has no significant natural language ontology bias. / Computing / M. Sc. (Computer Science)
83

Vers des moteurs de recherche "intelligents" : un outil de détection automatique de thèmes : méthode basée sur l'identification automatique des chaînes de référence / Toward "intelligent" search engines : an automatic topic detection tool : method based on automatic reference chains identification

Longo, Laurence 12 December 2013
This thesis, in the field of Natural Language Processing, aims to optimize document classification in search engines. The work focuses on the development of a tool that automatically detects the topics of documents (ATDS-fr). Relying on little prior knowledge, the hybrid method combines statistical techniques for topic segmentation with linguistic methods that identify cohesion markers. Among these, reference chains (sequences of referential expressions referring to the same discourse entity, e.g. Paul ... he ... this man) have been given special attention because they are an important textual cue for topic detection (i.e. they are markers of topic introduction, maintenance and change). Thus, from a study of reference chains in a corpus drawn from various textual genres (political analyses, public reports, European laws, editorials and a novel), we developed RefGen, a module for the automatic identification of reference chains, which was evaluated according to current coreference metrics.
84

Étude sémantique des mots "chance", "fortune", "hasard" et "risque" du XVIIIe au XXIe siècle : perspectives sur le lexique du français et ses usages / A semantic study of the words "chance", "fortune", "hasard" and "risque" from the eighteenth century onward : approaches to the French lexicon and its uses

Courbon, Bruno 09 September 2009
The present study examines how the lexical field comprising the French words “chance”, “fortune”, “hasard” and “risque” has been structured from the eighteenth century to the present day. Reflecting major changes in Western societies during this period, the field, which corresponds to the linguistic representation of the notion of fortune / hasard, shows relative semantic homogeneity. The words and their derived forms are examined through the deployment, regulation, and distribution of norms of use, not only in Hexagonal French but also in Quebec French. Two types of corpora are analysed. On the one hand, a corpus of articles from around 50 dictionaries is used to highlight the morphosemantic and semantic productivity of these units over a broad historical span. On the other hand, a large and varied set of utterances makes it possible, by bringing context types to light, to trace uses diachronically. The continuist approach to differences in use rests on a frequency-based representation of semantic changes. The thesis contributes to the question of usage variation and semantic change and opens several perspectives. It first reflects on descriptive theory and methodology, considered in light of an analysis of the nature and role of corpora. It then highlights the importance of the intersubjective dimension in the activity of meaning-making, in particular the decisive role of syntagmatic structures in establishing new semantic uses. Finally, it relates semantic change to sociohistorical conditions and collective representations.
85

La polysémie des noms de parties du corps humain en français : analyse sémantique de artère, bouche, coeur épaule et pied / Polysemy of French human body part nouns : semantic analysis of artère, bouche, cœur, épaule and pied

Bertin, Thomas 26 October 2018
This study falls within the field of lexical semantics and deals more specifically with polysemy in the nominal domain. As a first step, the theoretical issues raised by such research are clarified. This leads to giving a central place to the concept of semantic invariant in order to account for the semantic identity of a noun (in the language system) beyond its variation in meaning (in context). As a second step, the empirical object of the research, human body part nouns in contemporary French, is delimited and the choice of this field of study is justified. The methodological approach is then set out. The rest of the dissertation is devoted to the empirical investigation itself. First, an overall description of the potential for semantic variation of French human body part nouns is provided. Then, an in-depth semantic analysis of the noun cœur (“heart”) is developed: on the one hand, a semantic invariant capable of subsuming all its uses (au cœur du sujet, Paul a mal au cœur, avoir à cœur de réussir…) is formulated; on the other hand, it is shown that the diversity of its uses is ultimately regular. Finally, four more nouns (artère “artery”, épaule “shoulder”, bouche “mouth” and pied “foot”) are each given a specific analysis. Each of these four studies offers a further opportunity to test the relevance of the concept of semantic invariant for accounting for polysemy in the nominal domain.
86

[en] ESPÍRITO DE CORPUS: CREATION OF A MARINE CORPS BILINGUAL LEXICON / [pt] ESPÍRITO DE CORPUS: CRIAÇÃO DE UM LÉXICO BILÍNGUE DO CORPO DE FUZILEIROS NAVAIS

MARIANA LEMOS MULLER 07 June 2022
This study presents thematic research on the Marine Corps domain involving Terminology, Corpus-Based Translation Studies, Computational Terminology and Lexical Semantics. The objective of the research was to create a terminological resource through a hybrid term extraction methodology developed from tests with Automatic Term Extraction (ATE) tools. In this way, we sought to solve both translation problems related to the subarea of study and problems in the detection and validation of term candidates in a corpus. First, a pilot study was conducted to evaluate two tools, TermoStat Web 3.0 and AntConc 3.5.7. After tests based on the analysis of a bilingual parallel corpus, the best conditions identified were selected to obtain an effective methodology of automatic term extraction combined with human analysis. This methodology was then used to analyze a comparable bilingual corpus. The extracted term candidates were validated against the Lexical Semantics criteria proposed by L'Homme (2020), and their terminological equivalents were then identified. This study resulted in the creation of the bilingual lexicon Espírito de Corpus.
87

Från Holocaust till Förintelsen : Etableringsprocessen av begreppet Förintelsen i svenskt språkbruk / The transfer and the establishment from the concept of Holocaust to the Swedish word förintelsen with its new meaning Förintelsen : About the establishment of the word and the concept of Förintelsen in Swedish language

Andrée, Johan January 2022
In my master's thesis, from a conceptual-history perspective with transnational application, I have analyzed the transfer and establishment of the concept of Holocaust as the Swedish word förintelsen with its new meaning Förintelsen. I have also analyzed when, how and by whom the concept of Förintelsen was consolidated in Swedish language use.
The word and the concept of Holocaust: My analysis shows that the concept of Holocaust was transferred to Sweden and established there through the Swedish word and concept Förintelsen.
The word and the concept of Förintelsen: My work presents a result that overturns previous views about how the word and the concept of Förintelsen became established in Swedish language use. The establishment proceeded as follows. In 1978, Bonniers Förlag launched a translation of the novel Holocaust, marketed in Sweden under the title Förintelsen. In 1979, SVT premiered the American television series Holocaust, marketed in Sweden under the title Förintelsen. From 1979, the newspapers were filled with articles about the Nazi genocide of Jews during World War II under the heading Förintelsen, while the Scandinavian Institute for Jewish Education and Culture produced teaching materials for all of Sweden's schools and libraries under the term Förintelsen. In 1982, the television series Förintelsen was rerun, surrounded by factual programs, all under the collective name Förintelsen. After four years of intensive use, the word and the concept of Förintelsen had finally broken through in Swedish language use.
My work shows that there was a consensus, a common will among Bonniers Förlag, SVT, the mass media and Jewish organizations, to refer to the Nazi genocide of Jews during World War II by the common name Förintelsen. All of them worked together to consolidate the word and the concept. I have shown that it was not only the title of a TV series that consolidated the concept of Förintelsen in Swedish language use. Extensive collaboration was required to establish the word and concept of Förintelsen in Sweden as a unifying concept for the Nazi genocide of Jews during World War II.
