About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

From E-Language to I-Language : foundations of a pre-processor for the Construction Integration Model

Powell, Christopher Mark January 2005 (has links)
No description available.
12

Example-based methods for natural language processing with applications to machine translation and preposition correction

Smith, James Sullivan January 2012 (has links)
We investigate the use of example-based methods for Natural Language Processing tasks. Specifically, we look at machine translation and preposition prediction. We propose a new framework for the hybridisation of Example-Based and Statistical Machine Translation (EBMT and SMT) systems. We add powerful new functionality to the Moses SMT system to allow it to work effectively with our EBMT system. Within this framework, we investigate the use of two types of EBMT system. We first create an EBMT system which uses string-based matching and evaluate it within the hybrid framework. We investigate several variations, but find that the hybrid system is unable to match the performance of the pure SMT system. We next create a syntax-based EBMT system which uses dependency trees to compare inputs to the example base, and show that this system is consistently better than the string-based approach. We find that while the SMT system still performs better overall, the syntax-based hybrid does perform particularly well for some examples. We then look at the application of example-based methods to preposition prediction for non-native English writers. We create two systems, one syntax-based and the other string-based. The syntax-based system again uses dependency information to make predictions, and we show that it performs with very high precision but low recall. The string-based system uses n-gram counts to make preposition predictions. We show that this approach is simple and fast, and performs as well as or better than other leading systems in the field. We conclude that example-based techniques continue to yield impressive results for NLP tasks, and expect the field to benefit further as computing and data resources develop.
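
A minimal sketch of the n-gram counting idea behind the string-based preposition predictor described above. The candidate set, trigram window and toy corpus are assumptions for illustration only, not the system built in the thesis:

    from collections import Counter

    # Score candidate prepositions by how often the surrounding trigram
    # (left word, preposition, right word) occurs in a reference corpus.
    CANDIDATES = ["in", "on", "at", "for", "to", "of"]

    def train_trigram_counts(corpus_sentences):
        """Count (left, preposition, right) trigrams in a tokenised corpus."""
        counts = Counter()
        for tokens in corpus_sentences:
            for i in range(1, len(tokens) - 1):
                if tokens[i] in CANDIDATES:
                    counts[(tokens[i - 1], tokens[i], tokens[i + 1])] += 1
        return counts

    def predict_preposition(left, right, counts):
        """Return the candidate preposition with the highest trigram count."""
        scores = {p: counts[(left, p, right)] for p in CANDIDATES}
        return max(scores, key=scores.get)

    corpus = [["she", "arrived", "at", "the", "station"],
              ["he", "arrived", "at", "the", "airport"],
              ["they", "arrived", "in", "the", "morning"]]
    counts = train_trigram_counts(corpus)
    print(predict_preposition("arrived", "the", counts))  # -> "at"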
13

Corpus and sentiment analysis

Cheng, Tai Wai David January 2007 (has links)
Information extraction/retrieval has been of interest to researchers since the early 1960s. A series of conferences and competitions held by DARPA/NIST since the late 1980s has resulted in the analysis of news reports and government reports in English and other languages, notably Chinese and Arabic. A number of methods have been developed for analysing 'free' natural language texts, and a number of message-understanding systems have been built, focusing on named entity extraction and on templates for dealing with certain kinds of news. The templates were handcrafted, and a great deal of ad-hoc knowledge went into the creation of such systems; seven of these systems are reviewed. Although IE systems built for different tasks often differ from one another, the core elements are shared by nearly every extraction system. Some of these core elements, such as the parser and part-of-speech (POS) tagger, are tuned for optimal performance on a specific domain or on text with pre-defined structures. The extensive use of gazetteers and manually crafted grammar rules further limits the portability of existing IE systems across languages and domains. The goal of this thesis is to develop an algorithm that can extract information as unambiguously as possible from free texts, in our case financial news, and from arbitrary domains. We believe that corpus linguistics and statistical techniques are more appropriate and efficient for this task than approaches relying on machine learning, POS taggers, parsers and so on, which are tuned to a predefined domain. Based on this belief, a framework using corpus linguistics and statistical techniques was developed to extract information as unambiguously as possible from arbitrary domains. A contrastive evaluation was carried out not only in the domains of financial texts and movie reviews, but also with multilingual texts (Chinese and English), and the results are encouraging. Our preliminary evaluation, based on the correlation between time series of positive (negative) sentiment word and phrase counts and time series of indices produced by stock exchanges (Financial Times Stock Exchange, Dow Jones Industrial Average, Nasdaq, S&P 500, Hang Seng Index, Shanghai Index and Shenzhen Index), showed that when the positive (negative) sentiment series correlates with a stock exchange index, the negative (positive) series shows a smaller degree of correlation and in many cases a degree of anti-correlation. Any interpretation of this result requires a careful, econometrically well-grounded analysis of the financial time series, which is beyond the scope of this work.
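
A minimal sketch of the kind of correlation check described in this abstract, assuming the sentiment word counts and index closes have already been aligned by day. The series below are invented placeholders, not the thesis's data:

    import numpy as np

    # Toy aligned daily series: counts of positive/negative sentiment words in
    # financial news, and the closing value of a stock index.
    positive_counts = np.array([120, 135, 128, 150, 160, 155, 170], dtype=float)
    negative_counts = np.array([80, 70, 90, 60, 55, 65, 50], dtype=float)
    index_close = np.array([5000, 5080, 5040, 5150, 5210, 5190, 5260], dtype=float)

    def pearson(x, y):
        """Pearson correlation coefficient between two equal-length series."""
        return float(np.corrcoef(x, y)[0, 1])

    print("positive vs index:", pearson(positive_counts, index_close))  # strongly positive
    print("negative vs index:", pearson(negative_counts, index_close))  # anti-correlated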
14

Word order and case in models of simulated language evolution

Moy, Joanna January 2005 (has links)
No description available.
15

Measuring Topic Homogeneity and its Application to Dictionary-based Word Sense Disambiguation

Gledson, Ann Lesley January 2008 (has links)
The use of topical features is abundant in Natural Language Processing (NLP), a major example being dictionary-based Word Sense Disambiguation (WSD). Topic features rely on the context of a target word, yet although the role of context has been discussed as an 'open problem' in the WSD literature, the nature of context, and how it might vary between different styles of document, has been largely ignored.
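
For readers unfamiliar with the dictionary-based WSD setting this abstract refers to, the following is a simplified Lesk-style baseline that scores each sense by the overlap between its dictionary gloss and the target word's context. The toy glosses are invented, and this baseline is not the topic-homogeneity method developed in the thesis:

    # Simplified Lesk: pick the sense whose dictionary gloss shares the most
    # words with the target word's context. Glosses here are toy examples.
    SENSES = {
        "bank#1": "financial institution that accepts deposits and makes loans",
        "bank#2": "sloping land beside a body of water such as a river",
    }

    def disambiguate(context_words, senses):
        context = {w.lower() for w in context_words}
        best, best_overlap = None, -1
        for sense, gloss in senses.items():
            overlap = len(context & set(gloss.split()))
            if overlap > best_overlap:
                best, best_overlap = sense, overlap
        return best

    print(disambiguate("he sat on the river bank near the water".split(), SENSES))  # -> bank#2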
16

Authoring access control policies with controlled natural language

Shi, Leilei January 2011 (has links)
This thesis is based on research carried out under the EPSRC-funded EEAP project and the EC-funded TAS3 project. The research aimed to develop a technique enabling users to write access control policies in natural language. One of the main intentions was to help non-technical users overcome the difficulty of authoring security policies in computer languages. Policies are relatively easy for humans to specify in natural language, but much more difficult for them to specify in computer-based languages such as XML. Consequently, humans usually need some sort of Human Computer Interface (HCI) to ease the task of policy specification. The usual solution to this problem is a Graphical User Interface (GUI) that is relatively easy for humans to use and that converts the chosen icons, menu items and entered text strings into the computer-based policy language. However, users still have to learn how to use the GUI, and this can be difficult, especially for novice users. This thesis describes the research that was performed in order to allow human users to specify access control policies using a subset of English called Controlled Natural Language (CNL). The CNL was designed for the task of authoring access control policies based on the Role Based Access Control (RBAC) model, with enhancements for a distributed environment. An ontology was created as a common representation of policies from different languages. As the result of the research, the author has designed and implemented an interface enabling users to author access control policies in the CNL. A policy in CNL can be converted to a policy in one of several machine language formats, so that it can be automatically enforced by a Policy Enforcement Point (PEP) and Policy Decision Point (PDP). The design is modular, and a set of APIs has been specified so that new modules can be added and existing modules can be extended in functionality or replaced.
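
A minimal sketch of the CNL-to-policy idea: one restricted English pattern is mapped onto an RBAC-style permission triple. The sentence pattern and output form are invented for illustration and are far simpler than the CNL, ontology and machine policy formats described in the thesis:

    import re

    # Parse one controlled-natural-language pattern,
    # "A <role> can <action> <resource>.", into an RBAC-style triple.
    CNL_PATTERN = re.compile(r"^A (?P<role>[\w ]+) can (?P<action>\w+) (?P<resource>[\w ]+)\.$")

    def parse_policy(sentence):
        """Return (role, action, resource) for a sentence in the restricted pattern."""
        match = CNL_PATTERN.match(sentence.strip())
        if not match:
            raise ValueError(f"not in the controlled language: {sentence!r}")
        return (match.group("role"), match.group("action"), match.group("resource"))

    print(parse_policy("A project manager can approve purchase orders."))
    # -> ('project manager', 'approve', 'purchase orders')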
17

Computational modelling of word sense sentiment

Su, Fangzhong January 2010 (has links)
In recent years, sentiment analysis, which employs computational models to tackle opinion, attitude, emotion or judgement in text, has become an important discipline in the area of natural language processing. It also has great potential in real-world applications, ranging from personal decisions such as recognizing customer opinions in movie or hotel service reviews, to industry or government concerns such as analyzing user feedback on new products or tracking the public's reaction to national or international events. The main objective of this thesis is to investigate the interaction between word sense ambiguity and sentiment analysis. Towards this goal, it validates three key hypotheses: (1) sentiment can be assigned to word senses by humans, and sentiment assignment at the word sense level is more reliable than at the word level; (2) word sense sentiment can be assigned by automatic algorithms with high accuracy and limited training data; and (3) word sense sentiment can improve word translation. This work begins with an investigation of the reliability of manual subjectivity and polarity label assignment on word senses. High agreement is obtained in the human annotation study, indicating that subjectivity and polarity labelling on word senses is a well-defined task and should be suitable for automatic learning as well. Then various machine learning approaches, including heuristic unsupervised learning, supervised learning, and graph-based semi-supervised learning, are proposed to automatically determine word sense sentiment. The experimental results show the effectiveness of all three learning models. In particular, for word sense subjectivity classification, the proposed semi-supervised graph-cut approach significantly outperforms the unsupervised heuristic-based approach, the supervised approach, and all prior competing approaches proposed by other researchers. We then automatically generate a complete subjectivity lexicon of more than 110,000 word senses using the semi-supervised graph-cut approach. Lastly, the potential application of word sense sentiment information in cross-lingual lexical substitution is explored. We posit a new assumption that good word substitutions will transfer a word's contextual sentiment from the source language into the target language. In practice, to test this assumption, the word sense subjectivity information is incorporated as an additional feature in a system for English-Chinese lexical substitution. The usefulness of word sense sentiment information is then confirmed by experiments, as the incorporation of subjectivity information yields significant improvement over a sentiment-unaware system.
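
A toy illustration of an s-t minimum-cut formulation for word sense subjectivity of the kind mentioned above: seed senses are pinned to a SUBJ or OBJ terminal by heavy edges, related senses are linked by similarity weights, and the cut assigns every remaining sense to one side. The senses, weights and the use of networkx are assumptions for illustration, not the thesis's implementation:

    import networkx as nx

    SEEDS = {"great#1": "SUBJ", "terrible#1": "SUBJ", "table#1": "OBJ"}
    SIMILAR = [("great#1", "fantastic#1", 0.8),
               ("fantastic#1", "superb#1", 0.7),
               ("table#1", "desk#1", 0.9),
               ("superb#1", "desk#1", 0.1)]

    G = nx.DiGraph()
    for sense, terminal in SEEDS.items():       # pin seed senses to a terminal
        G.add_edge(terminal, sense, capacity=100.0)
        G.add_edge(sense, terminal, capacity=100.0)
    for u, v, w in SIMILAR:                     # symmetric similarity edges
        G.add_edge(u, v, capacity=w)
        G.add_edge(v, u, capacity=w)

    cut_value, (subj_side, obj_side) = nx.minimum_cut(G, "SUBJ", "OBJ")
    print(sorted(s for s in subj_side if s != "SUBJ"))  # senses labelled subjective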
18

Information extraction across sentences

Swampillai, Kumutha January 2011 (has links)
Most relation extraction systems identify relations by searching within sentences (within-sentence relations). Such an approach excludes finding any relations that cross sentence boundaries (cross-sentence relations). This thesis quantifies the cross-sentence relations in two major information extraction corpora, ACE03 (9.4%) and MUC6 (27.4%), revealing the extent of this limitation. In response, a composite kernel approach to cross-sentence relation extraction is proposed which models relations using parse tree and flat surface features. Support vector machine classifiers are trained using cross-sentential relations from the MUC6 corpus to determine the effectiveness of this approach. It was shown that composite kernels are able to extract cross-sentential relations with f-measure scores of 0.512, 0.116 and 0.633 for the PerOrg, PerPost and PostOrg models, respectively. Moreover, combining within-sentence and cross-sentence extraction models increases the number of relations correctly identified by 24% over within-sentence relation extraction alone.
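
A minimal sketch of the composite kernel idea: a weighted sum of a structural (tree) kernel and a flat surface-feature kernel is still a valid kernel and can be passed to an SVM as a precomputed Gram matrix. The random matrices and labels below are stand-ins, not the MUC6 features used in the thesis:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n = 20
    y = np.array([0] * 10 + [1] * 10)          # relation / no-relation labels

    def random_psd(size):
        """Random positive semi-definite matrix standing in for a kernel."""
        A = rng.normal(size=(size, size))
        return A @ A.T

    K_tree, K_flat = random_psd(n), random_psd(n)
    alpha = 0.5
    K_composite = alpha * K_tree + (1 - alpha) * K_flat   # still a valid kernel

    clf = SVC(kernel="precomputed").fit(K_composite, y)
    print(clf.predict(K_composite[:5]))        # predictions for the first 5 training rows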
19

BDI agents and the semantic Web : developing user-facing autonomous applications

Dickinson, Ian John January 2006 (has links)
No description available.
20

Natural Language Generation (NLG) of discourse relations for different reading levels

Williams, Sandra January 2004 (has links)
This thesis describes original research in the field of Natural Language Generation (NLG). NLG is the subfield of artificial intelligence that is concerned with the automatic production of documents from underlying data. This thesis claims that an NLG system can generate more readable output texts by making appropriate choices at the discourse level, for instance by using shorter sentences and more common discourse cue phrases. The choices we investigated were the selection and placement of cue phrases, ordering, and punctuation, and we examined their effects on both good readers and poor readers. The NLG system built for this research is called GIRL (Generator for Individual Reading Levels). GIRL is part of a literacy assessment application; it generates feedback reports about reading skills for adults with poor literacy. This research focussed on the microplanner. Microplanning transforms discourse representations from hierarchical tree structures into ordered lists of individual sentence structures. The key innovations in microplanning were new ways to represent discourse-level knowledge and new algorithms for making discourse-level decisions. Knowledge about how humans realise discourse relations was acquired from a corpus annotated with discourse relations and represented in GIRL's microplanner as constraint satisfaction problem (CSP) graphs. A CSP solver was incorporated into the microplanner to generate all "legal" ways of realising each input discourse relation. Knowledge about which discourse-level choices affect readability was obtained from pilot experiments and from psycholinguistics, and was represented as sets of rules for scoring the solutions output by the CSP solver. GIRL's output was evaluated with thirty-eight users, including both good readers and poor readers. We developed a methodology for an evaluation experiment that involved measuring reading speed and comprehension and eliciting judgements. The results, although not statistically significant, indicated that the algorithms produced more readable output and that the effect was greater for poor readers.
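
A toy illustration of treating cue-phrase choice, placement and punctuation as a small constraint satisfaction problem and enumerating the "legal" realisations of one discourse relation. The constraints are invented for illustration and are not the corpus-derived CSP graphs used in GIRL:

    from itertools import product

    CUES = ["although", "but", "however"]
    POSITIONS = ["before_nucleus", "between_clauses"]
    PUNCTUATION = [",", ";"]

    def legal(cue, position, punct):
        """Toy constraints on which combinations form a legal realisation."""
        if cue == "although" and position != "before_nucleus":
            return False                       # 'although' opens the sentence
        if cue in ("but", "however") and position != "between_clauses":
            return False                       # coordinators sit between clauses
        if cue == "however" and punct != ";":
            return False                       # 'however' follows a semicolon here
        return True

    solutions = [c for c in product(CUES, POSITIONS, PUNCTUATION) if legal(*c)]
    for cue, position, punct in solutions:
        print(cue, position, punct)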
