51

Word sense selection in texts: an integrated model

Kwong, Oi Yee. 2000
Thesis (Ph.D.)--University of Cambridge, 2000. Cover title: "September 2000." Includes bibliographical references.
52

Broad-coverage hierarchical word sense disambiguation

Ciaramita, Massimiliano. January 2005
Thesis (Ph.D.)--Brown University, 2005. Vita. Thesis advisor: Mark Johnson. Includes bibliographical references (leaves 127-138). Also available online.
53

Grammars for generating isiXhosa and isiZulu weather bulletin verbs

Mahlaza, Zola. January 2018
The Met Office has investigated the use of natural language generation (NLG) technologies to streamline the production of weather forecasts. Their approach would be of great benefit in South Africa, where there is no fast, large-scale producer, automated or otherwise, of textual weather summaries for Nguni languages. This is due, among other things, to the complexity of Nguni languages: their structure differs substantially from that of Indo-European languages, so existing technologies developed for the latter cannot simply be reused. Traditional NLG techniques such as templates are not compatible with 'Bantu' languages, and existing works that document scaled-down 'Bantu' language grammars are also not sufficient to generate weather text. To generate weather text in isiXhosa and isiZulu while keeping the scope manageable, we restricted our text to verbs. In particular, we developed a corpus of weather sentences in order to determine verb features, then created context-free verbal grammar rules using an incremental approach. The quality of these rules was evaluated by two linguists. We then investigated the grammatical similarity of isiZulu verbs with their isiXhosa counterparts, and the extent to which a single merged set of grammar rules can produce correct verbs for both languages. The similarity analysis was done through the developed rules' parse trees, and by applying binary similarity measures to the sets of verbs generated by the rules. The parse trees show that the differences between the verb's components are minor, and the similarity measures indicate that the verb sets are at most 59.5% similar (Driver-Kroeber metric). We also examined the importance of the phonological conditioning process by developing functions that calculate the proportion of generated strings that require conditioning. We found that phonological conditioning affects at least 45% of strings for isiXhosa and at least 67% for isiZulu, depending on the type of verb root used. Overall, this work shows that the differences between isiXhosa and isiZulu verbs are minor. However, exploiting these similarities to create a unified rule set for both languages cannot be achieved without significant maintainability compromises, because some dependencies between the verb's 'modules' exist in one language and not the other. Furthermore, the phonological conditioning process should be implemented in order to improve the generated text, given the high proportion of verbs it affects.
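For readers unfamiliar with the metric: the Driver-Kroeber (Ochiai) coefficient for two sets is |A ∩ B| / √(|A|·|B|). A minimal sketch follows; the verb strings are invented placeholders, not drawn from the thesis's grammars.

```python
from math import sqrt

def driver_kroeber(a: set, b: set) -> float:
    """Driver-Kroeber (Ochiai) similarity of two sets:
    |A ∩ B| / sqrt(|A| * |B|). Returns 0.0 for empty inputs."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

# Hypothetical verb sets standing in for output of the two grammars.
xhosa_verbs = {"liyana", "libalele", "kuyabanda"}
zulu_verbs = {"liyana", "liyaduma", "kuyabanda"}
print(driver_kroeber(xhosa_verbs, zulu_verbs))  # 0.666...
```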
54

Spatially motivated dialogue for a pedestrian robot

Frost, Jamie. January 2012
In the field of robotics, there has recently been tremendous progress in the development of autonomous robots that offer various services to their users. Most of the systems developed so far, however, are restricted to indoor scenarios, non-urban outdoor environments, or road usage with cars. Mobile robots still largely lack the capability to navigate safely in highly populated outdoor environments, yet this ability is a key competence for a range of robotic applications. We consider the task of developing a spatially motivated dialogue system that can operate on a robotic platform whose purpose is to aid pedestrians in urban environments by providing information about surrounding objects and services and guiding users to desired destinations. In this thesis, we make a number of contributions to the fields of spatial language interpretation/generation and discourse modelling. This includes the development of a dialogue framework called HURDLE, which builds on the strengths of existing systems, accompanied by a specific implementation for spatially oriented dialogue, including disambiguating amongst objects and locations in the environment, and a natural language parser that combines an extension of Synchronous Context Free Grammars with a part-of-speech tagger. Our research also presents a number of probabilistic models for spatial prepositions such as 'in front of' and 'between' that make significant advances in effectively utilising geometric environment data, encompassing visibility considerations and being reusable in both indoor and outdoor environments. We also present a number of algorithms in which these models can be utilised, most significantly a novel and highly effective algorithm that generates natural language descriptions of objects disambiguated by their location. All these components, while modular, operate in tandem and interact with a variety of external components (such as path planning) on the robot platform.
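The thesis's own preposition models are not reproduced here, but the general idea of grading a spatial preposition geometrically can be sketched as follows; the Gaussian fall-off and the 45-degree spread are assumptions for illustration only.

```python
import numpy as np

def in_front_of_score(target: np.ndarray, landmark: np.ndarray,
                      facing: np.ndarray, sigma_deg: float = 45.0) -> float:
    """Toy applicability score for 'in front of': 1.0 when the target lies
    exactly along the landmark's facing direction, decaying with angular
    deviation. An illustration of grading spatial prepositions
    geometrically, not the model developed in the thesis."""
    offset = target - landmark
    cos_angle = offset @ facing / (np.linalg.norm(offset) * np.linalg.norm(facing))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return float(np.exp(-0.5 * (angle / sigma_deg) ** 2))

# Target directly along the landmark's facing direction scores 1.0.
print(in_front_of_score(np.array([0.0, 2.0]), np.array([0.0, 0.0]),
                        np.array([0.0, 1.0])))
```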
55

Compositional entity-level sentiment analysis

Moilanen, Karo. January 2010
This thesis presents a computational text analysis tool called AFFECTiS (Affect Interpretation/Inference System) which focuses on the task of interpreting natural language text based on its subjective, non-factual, affective properties that go beyond the 'traditional' factual, objective dimensions of meaning that have so far been the main focus of Natural Language Processing and Computational Linguistics. The thesis presents a fully compositional uniform wide-coverage computational model of sentiment in text that builds on a number of fundamental compositional sentiment phenomena and processes discovered by detailed linguistic analysis of the behaviour of sentiment across key syntactic constructions in English. Driven by the Principle of Semantic Compositionality, the proposed model breaks sentiment interpretation down into strictly binary combinatory steps each of which explains the polarity of a given sentiment expression as a function of the properties of the sentiment carriers contained in it and the grammatical and semantic context(s) involved. An initial implementation of the proposed compositional sentiment model is described which attempts direct logical sentiment reasoning rather than basing computational sentiment judgements on indirect data-driven evidence. Together with deep grammatical analysis and large hand-written sentiment lexica, the model is applied recursively to assign sentiment to all (sub)sentential structural constituents and to concurrently equip all individual entity mentions with gradient sentiment scores. The system was evaluated on an extensive multi-level and multi-task evaluation framework encompassing over 119,000 test cases from which detailed empirical experimental evidence is drawn. The results across entity-, phrase-, sentence-, word-, and document-level data sets demonstrate that AFFECTiS is capable of human-like sentiment reasoning and can interpret sentiment in a way that is not only coherent syntactically but also defensible logically - even in the presence of the many ambiguous extralinguistic, paralogical, and mixed sentiment anomalies that so tellingly characterise the challenges involved in non-factual classification.
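The strictly binary composition described above can be illustrated with a toy recursion over a parse tree; the combination rules below (negators reverse polarity, otherwise the first non-neutral carrier wins) are simplified stand-ins, not the thesis's actual rule set.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    word: Optional[str] = None          # set on leaves only
    polarity: int = 0                   # +1 positive, -1 negative, 0 neutral
    reverser: bool = False              # e.g. negators such as "not", "fail to"
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def compose(node: Node) -> int:
    """Binary combinatory step: a node's polarity is a function of its two
    children and the operator involved (here, only polarity reversal)."""
    if node.word is not None:           # leaf: lexical polarity
        return node.polarity
    l, r = compose(node.left), compose(node.right)
    if node.left.reverser or node.right.reverser:
        return -(r if node.left.reverser else l)   # reversal operator
    return l if l != 0 else r           # default: first non-neutral carrier

# "not good" -> negative
tree = Node(left=Node(word="not", reverser=True),
            right=Node(word="good", polarity=+1))
print(compose(tree))  # -1
```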
56

An inheritance-based theory of the lexicon in combinatory categorial grammar

McConville, Mark. January 2008
This thesis proposes an extended version of the Combinatory Categorial Grammar (CCG) formalism, with the following features: 1. grammars incorporate inheritance hierarchies of lexical types, defined over a simple, feature-based constraint language 2. CCG lexicons are, or at least can be, functions from forms to these lexical types This formalism, which I refer to as ‘inheritance-driven’ CCG (I-CCG), is conceptualised as a partially model-theoretic system, involving a distinction between category descriptions and their underlying category models, with these two notions being related by logical satisfaction. I argue that the I-CCG formalism retains all the advantages of both the core CCG framework and proposed generalisations involving such things as multiset categories, unary modalities or typed feature structures. In addition, I-CCG: 1. provides non-redundant lexicons for human languages 2. captures a range of well-known implicational word order universals in terms of an acquisition-based preference for shorter grammars This thesis proceeds as follows: Chapter 2 introduces the ‘baseline’ CCG formalism, which incorporates just the essential elements of category notation, without any of the proposed extensions. Chapter 3 reviews parts of the CCG literature dealing with linguistic competence in its most general sense, showing how the formalism predicts a number of language universals in terms of either its restricted generative capacity or the prioritisation of simpler lexicons. Chapter 4 analyses the first motivation for generalising the baseline category notation, demonstrating how certain fairly simple implicational word order universals are not formally predicted by baseline CCG, although they intuitively do involve considerations of grammatical economy. Chapter 5 examines the second motivation underlying many of the customised CCG category notations — to reduce lexical redundancy, thus allowing for the construction of lexicons which assign (each sense of) open class words and morphemes to no more than one lexical category, itself denoted by a non-composite lexical type. Chapter 6 defines the I-CCG formalism, incorporating into the notion of a CCG grammar both a type hierarchy of saturated category symbols and an inheritance hierarchy of constrained lexical types. The constraint language is a simple, feature-based, highly underspecified notation, interpreted against an underlying notion of category models — this latter point is crucial, since it allows us to abstract away from any particular inference procedure and focus on the category notation itself. I argue that the partially model-theoretic I-CCG formalism solves the lexical redundancy problem fairly definitively, thereby subsuming all the other proposed variant category notations. Chapter 7 demonstrates that the I-CCG formalism also provides the beginnings of a theory of the CCG lexicon in a stronger sense — with just a small number of substantive assumptions about types, it can be shown to formally predict many implicational word order universals in terms of an acquisition-based preference for simpler lexical inheritance hierarchies, i.e. those with fewer types and fewer constraints. Chapter 8 concludes the thesis.
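The notion of an inheritance hierarchy of constrained lexical types can be sketched as a toy structure in which each type refines its parent's feature constraints; the type names and features below are invented for illustration and do not reproduce I-CCG's notation.

```python
class LexType:
    """A lexical type that inherits and refines its parent's constraints."""
    def __init__(self, name, parent=None, **constraints):
        self.name, self.parent, self.local = name, parent, constraints

    def features(self) -> dict:
        inherited = self.parent.features() if self.parent else {}
        return {**inherited, **self.local}  # local constraints refine inherited

# Toy hierarchy: transitive and intransitive verbs share the 'verb' supertype,
# so the result category is stated once rather than redundantly per entry.
verb = LexType("verb", result="s")
intrans = LexType("intransitive-verb", parent=verb, args=("np",))
trans = LexType("transitive-verb", parent=verb, args=("np", "np"))

print(trans.features())  # {'result': 's', 'args': ('np', 'np')}
```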
57

Fördomsfulla associationer i en svensk vektorbaserad semantisk modell / Bias in a Swedish Word Embedding

Jonasson, Michael. January 2019
Word embeddings are a powerful technique in which word meaning can be represented by vectors of real numbers. The vectors allow geometric operations that capture semantically important relationships between the words they represent. In this study, the WEAT method is implemented and applied to examine whether statistical relationships between words that can be perceived as biased exist in a Swedish word embedding trained on a corpus from a Swedish newspaper. The results show that the word relationships in the embedding can reflect several of the previously IAT-documented biases that were tested. The WEFAT method is also implemented and applied to explore the embedding's ability to represent two actual statistical properties of the real world, which it does successfully in both investigations. The results of the study as a whole lend support to the validity of both methods, while also illuminating the problems of using word embeddings in language technology applications.
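For reference, the WEAT effect size (Caliskan et al., 2017) measures the differential association of two target word sets X and Y with two attribute sets A and B via cosine similarity. A minimal sketch, with toy two-dimensional vectors standing in for rows of a real embedding:

```python
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B):
    """s(w, A, B): mean cosine similarity of w with A minus that with B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Normalized differential association of target sets X, Y with
    attribute sets A, B."""
    s_x = [assoc(x, A, B) for x in X]
    s_y = [assoc(y, A, B) for y in Y]
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y)

# Toy vectors; real use would look up word vectors in the trained embedding.
X = [np.array([1.0, 0.1])]; Y = [np.array([0.1, 1.0])]
A = [np.array([1.0, 0.0])]; B = [np.array([0.0, 1.0])]
print(weat_effect_size(X, Y, A, B))  # ~2.0: X leans toward A, Y toward B
```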
58

Sentiment Analysis of Equity Analyst Research Reports using Convolutional Neural Networks

Löfving, Olof. January 2019
Natural language processing, a subfield of artificial intelligence and computer science, has recently attracted great research interest due to the vast amount of information created on the internet in the modern era. One of the main natural language processing areas is sentiment analysis, a field that studies the polarity of human natural language and generally tries to categorize it as positive, negative or neutral. In this thesis, sentiment analysis has been applied to research reports written by equity analysts. The objective has been to investigate whether there exists a distinct distribution of the reports and whether the sentiment in these reports can be classified. The thesis consists of two parts: first, investigating how the reports can be divided into different sentiment labelling regimes; and second, categorizing the sentiment using machine learning techniques. Logistic regression as well as several convolutional neural network structures have been used to classify the sentiment. Working with textual data requires mapping text to real-valued features. Several feature extraction methods have been investigated, including Bag of Words, term frequency-inverse document frequency, and Word2vec. Of the tested labelling regimes, classifying the documents using upgrades and downgrades of the report recommendation shows the most promising potential. For this regime, the convolutional neural network architectures outperform logistic regression by a significant margin. Of the networks tested, a double input channel utilizing two different Word2vec representations performs the best. The two representations originate from different sources: one from the set of equity research reports, and the other trained by the Google Brain team on an extensive Google News data set. This suggests that using one representation that captures topic-specific words and one that better represents more common words enhances classification performance.
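The double-input-channel idea can be sketched in Keras: the same token sequence feeds two embedding layers, which in practice would be initialized from the two different Word2vec models. All sizes and layer choices below are illustrative assumptions, not the thesis's configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len, emb_dim = 20000, 200, 300
tokens = keras.Input(shape=(seq_len,), dtype="int32")

def channel():
    # In practice, initialize each channel from a pretrained Word2vec matrix,
    # e.g. layers.Embedding(..., embeddings_initializer=
    #          keras.initializers.Constant(w2v_matrix))
    emb = layers.Embedding(vocab_size, emb_dim, trainable=False)(tokens)
    conv = layers.Conv1D(128, 5, activation="relu")(emb)
    return layers.GlobalMaxPooling1D()(conv)

# Two independent channels: domain-specific and Google News representations.
merged = layers.concatenate([channel(), channel()])
output = layers.Dense(3, activation="softmax")(merged)  # pos / neutral / neg
model = keras.Model(tokens, output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```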
59

Automatic Error Detection and Correction in Neural Machine Translation : A comparative study of Swedish to English and Greek to English

Papadopoulou, Anthi. January 2019
Automatic detection and correction of errors in machine translation output are important steps to ensure an optimal quality of the final output. In this work, we compared neural machine translation output for two language pairs, Swedish to English and Greek to English. The comparison was made using common machine translation metrics (BLEU, METEOR, TER) and syntax-related ones (POSBLEU, WPF, WER on POS classes). It was found that neither the common metrics nor the purely syntax-related ones captured the quality of the machine translation output accurately, but the decomposition of WER over POS classes was the most informative. A sample of each language was taken to aid in comparing manual and automatic error categorization across five error categories: reordering errors, inflectional errors, missing words, extra words, and incorrect lexical choices. Both Spearman's ρ and Pearson's r showed a good correlation with human judgment, with values above 0.9. Finally, based on the results of this error categorization, automatic post-editing rules were implemented and applied, and their performance was checked against the sample and against the rest of the data set, with varying results: the impact on the sample was greater, showing improvement in all metrics, while the impact on the rest of the data set was negative. An investigation of this, alongside the fact that correction was not possible for Greek due to extremely free reference translations and a lack of error patterns in the spoken language, reinforced the belief that automatic post-editing is tightly connected to consistency in the reference translation, while also suggesting that more than one reference translation may be needed to ensure better results when handling machine translation output.
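WER is the word-level edit distance divided by the reference length; a naive way to decompose it over POS classes is to recompute it on the per-class token subsequences. The sketch below illustrates the idea only; published decompositions attribute errors via the full alignment rather than per-class subsequences.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(hyp) + 1)]
         for i in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i-1][j] + 1,                      # deletion
                          d[i][j-1] + 1,                      # insertion
                          d[i-1][j-1] + (ref[i-1] != hyp[j-1]))  # substitution
    return d[-1][-1]

def wer(ref, hyp):
    return edit_distance(ref, hyp) / max(len(ref), 1)

def wer_per_pos(ref_tagged, hyp_tagged):
    """ref_tagged / hyp_tagged: lists of (word, pos) pairs."""
    classes = {pos for _, pos in ref_tagged}
    return {c: wer([w for w, p in ref_tagged if p == c],
                   [w for w, p in hyp_tagged if p == c]) for c in classes}

ref = [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]
hyp = [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]
print(wer_per_pos(ref, hyp))  # {'DET': 1.0, 'NOUN': 0.0, 'VERB': 0.0}
```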
60

Transcription of Historical Encrypted Manuscripts : Evaluation of an automatic interactive transcription tool.

Johansson, Kajsa. January 2019
Countless historical sources are preserved in national libraries and archives all over the world and contain important information about our history. Some of these sources are encrypted to prevent outsiders from reading them. This thesis examines a semi-automatic interactive transcription tool, based on unsupervised learning without any labelled training data, that has been developed for transcribing encrypted sources, and compares it to manual transcription. The tool is based on handwritten text recognition (HTR) techniques, and the system identifies clusters of symbols based on similarity measures. It is evaluated on ciphers with number sequences that have previously been transcribed manually, to compare how well the transcription tool performs. The weaknesses of the tool are described and suggestions for improving it are proposed. Transcription based on HTR techniques and clustering shows promising results, and the unsupervised method based on clustering should be further investigated on ciphers with various symbol sets.
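The clustering idea behind such unsupervised transcription can be sketched as grouping segmented symbol images by similarity, so that each cluster maps to one transcription symbol. The features, the algorithm choice, and the random stand-in data below are all illustrative assumptions, not the evaluated tool's pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

# Random stand-ins for 40 segmented symbol patches of 16x16 pixels each,
# flattened into feature vectors; a real pipeline would extract descriptors
# from the scanned manuscript.
rng = np.random.default_rng(0)
symbols = rng.random((40, 16 * 16))

# Group the patches by similarity; each cluster id can then be mapped to a
# transcription symbol (here, a hypothetical cipher alphabet of size 10).
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(symbols)
print(labels[:10])  # cluster id per symbol occurrence
```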
