421

Development of a Graphics Ontology for Natural Language Interfaces

Niknam, Mehdi 13 October 2010 (has links)
The overall context of this thesis research is to explore natural language as a medium for interacting with computer software in the graphics domain, e.g. programs like MS Paint or OpenGL. A core element of most natural language understanding systems is an ontology, which represents the concepts and items of the underlying domain of discourse. This thesis presents an ontology for the graphics domain based on several resources, including documentation and textbooks on graphics systems, existing ontologies, and - most importantly - a collection of natural language instructions used to create and modify graphic images. The ontology was developed in several phases and finally tested as part of a complex natural language interface. This interface accepts verbal instructions in the graphics domain as input and creates matching graphic images as output. Our tests indicate a system accuracy of approximately 80%.
422

A functional theory of creative reading : process, knowledge, and evaluation

Moorman, Kenneth Matthew 08 1900 (has links)
No description available.
423

Efficient prediction of relational structure and its application to natural language processing

Riedel, Sebastian January 2009 (has links)
Many tasks in Natural Language Processing (NLP) require us to predict a relational structure over entities. For example, in Semantic Role Labelling we try to predict the 'semantic role' relation between a predicate verb and its argument constituents. Often NLP tasks not only involve related entities but also relations that are stochastically correlated. For instance, in Semantic Role Labelling the roles of different constituents are correlated: we cannot assign the agent role to one constituent if we have already assigned this role to another. Statistical Relational Learning (also known as First Order Probabilistic Logic) allows us to capture this nature of NLP tasks because it is based on the notions of entities, relations and stochastic correlations between relationships. It is therefore often straightforward to formulate an NLP task in a First Order probabilistic language such as Markov Logic. However, the generality of this approach comes at a price: the process of finding the relational structure with highest probability, also known as maximum a posteriori (MAP) inference, is often inefficient, if not intractable. In this work we seek to improve the efficiency of MAP inference for Statistical Relational Learning. We propose a meta-algorithm, namely Cutting Plane Inference (CPI), that iteratively solves small subproblems of the original problem using any existing MAP technique and inspects parts of the problem that are not yet included in the current subproblem but could potentially lead to an improved solution. Our hypothesis is that this algorithm can dramatically improve the efficiency of existing methods while remaining at least as accurate. We frame the algorithm in Markov Logic, a language that combines First Order Logic and Markov Networks. Our hypothesis is evaluated on two tasks: Semantic Role Labelling and Entity Resolution. It is shown that the proposed algorithm improves the efficiency of two existing methods by two orders of magnitude and leads an approximate method to more probable solutions. We also show that CPI, at convergence, is guaranteed to be at least as accurate as the method used within its inner loop. Another core contribution of this work is a theoretical and empirical analysis of the boundary conditions of Cutting Plane Inference. We describe cases when Cutting Plane Inference will definitely be difficult (because it instantiates large networks or needs many iterations) and when it will be easy (because it instantiates small networks and needs only a few iterations).
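A minimal, hypothetical sketch of the cutting-plane meta-loop described in this abstract: repeatedly solve a small subproblem with a base MAP solver, then instantiate only the factors that the current solution violates. The callables base_map_solver and violated_factors are illustrative assumptions, not the thesis implementation.

```python
# Illustrative Cutting Plane Inference skeleton (not the published code).
# base_map_solver(active_factors) -> assignment
# violated_factors(all_factors, assignment) -> set of factors the assignment violates
def cutting_plane_inference(all_factors, base_map_solver, violated_factors,
                            max_iterations=100):
    active = set()                          # factors instantiated so far
    solution = base_map_solver(active)      # MAP over the initial (empty) subproblem
    for _ in range(max_iterations):
        new = violated_factors(all_factors, solution) - active
        if not new:                         # nothing violated: converged
            break
        active |= new                       # grow the subproblem ...
        solution = base_map_solver(active)  # ... and re-solve with the base method
    return solution
```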
424

Advanced natural language processing for improved prosody in text-to-speech synthesis / G. I. Schlünz

Schlünz, Georg Isaac January 2014 (has links)
Text-to-speech synthesis enables the speech-impaired user of an augmentative and alternative communication system to partake in any conversation on any topic, because it can produce dynamic content. Current synthetic voices, however, do not sound very natural, lacking in the areas of emphasis and emotion. These qualities are important to convey meaning and intent beyond what can be achieved by the vocabulary of words alone. Put differently, speech synthesis requires a more comprehensive analysis of its text input beyond the word level to infer the meaning and intent that elicit emphasis and emotion. The synthesised speech then needs to imitate the effects that these textual factors have on the acoustics of human speech. This research addresses these challenges by commencing with a literature study on the state of the art in the fields of natural language processing, text-to-speech synthesis and speech prosody. It is noted that the higher linguistic levels of discourse, information structure and affect are necessary for the text analysis to shape the prosody appropriately for more natural synthesised speech. Discourse and information structure account for meaning, intent and emphasis, and affect formalises the modelling of emotion. The OCC model is shown to be a suitable point of departure for a new model of affect that can leverage the higher linguistic levels. The audiobook is presented as a text and speech resource for the modelling of discourse, information structure and affect, because its narrative structure is prosodically richer than the random constitution of a traditional text-to-speech corpus. A set of audiobooks is selected and phonetically aligned for subsequent investigation. The new model of discourse, information structure and affect, called e-motif, is developed to take advantage of the audiobook text. It is a subjective model that does not specify any particular belief system in order to appraise its emotions, but defines only anonymous affect states. Its cognitive and social features rely heavily on the coreference resolution of the text, but this process is found not to be accurate enough to produce usable feature values. The research concludes with an experimental investigation of the influence of the e-motif features on human speech and synthesised speech. The aligned audiobook speech is inspected for prosodic correlates of the cognitive and social features, revealing that some activity occurs in the intonational domain. However, when the aligned audiobook speech is used in the training of a synthetic voice, the e-motif effects are overshadowed by those of the structural features that come standard in the voice-building framework. / PhD (Information Technology), North-West University, Vaal Triangle Campus, 2014
425

Language modelling using structured penalties (Modélisation du langage à l'aide de pénalités structurées)

Nelakanti, Anil Kumar 11 February 2014 (has links) (PDF)
Modeling natural language is among the fundamental challenges of artificial intelligence and the design of interactive machines, with applications spanning various domains, such as dialogue systems, text generation and machine translation. We propose a discriminatively trained log-linear model to learn the distribution of words following a given context. Due to data sparsity, it is necessary to regularize the model appropriately using a penalty term. We design a penalty term that properly encodes the structure of the feature space to avoid overfitting and improve generalization while appropriately capturing long-range dependencies. Some useful properties of specific structured penalties can be exploited to reduce the number of parameters required to encode the model. The outcome is an efficient model that suitably captures long dependencies in language without a significant increase in time or space requirements. In a log-linear model, both training and testing become increasingly expensive with a growing number of classes. The number of classes in a language model is the size of the vocabulary, which is typically very large. A common trick is to cluster the classes and apply the model in two steps: the first step picks the most probable cluster and the second picks the most probable word from the chosen cluster. This idea can be generalized to a hierarchy of larger depth with multiple levels of clustering. However, the performance of the resulting hierarchical classifier depends on the suitability of the clustering to the problem. We study different strategies for building the hierarchy of categories from their observations.
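A minimal sketch of the two-step cluster-then-word prediction trick described above, assuming the cluster assignments and both conditional distributions are already available; the function name and all numbers are illustrative only.

```python
import numpy as np

def two_step_predict(p_cluster, p_word_given_cluster, cluster_words):
    """p_cluster: (K,) cluster probabilities for the current context.
    p_word_given_cluster: dict cluster -> word probabilities within that cluster.
    cluster_words: dict cluster -> list of words in that cluster."""
    k = int(np.argmax(p_cluster))                 # step 1: most probable cluster
    w = int(np.argmax(p_word_given_cluster[k]))   # step 2: most probable word in it
    return cluster_words[k][w]

# Toy usage with two clusters of two words each (illustrative numbers only).
p_c = np.array([0.3, 0.7])
p_w = {0: np.array([0.6, 0.4]), 1: np.array([0.1, 0.9])}
words = {0: ["the", "a"], 1: ["cat", "dog"]}
print(two_step_predict(p_c, p_w, words))          # -> "dog"
```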
426

TagLine: Information Extraction for Semi-Structured Text Elements In Medical Progress Notes

Finch, Dezon K. 01 January 2012 (has links)
Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in semi-structured text elements. A prototype system (TagLine) was developed as a method for extracting information from the semi-structured portions of text using machine learning. Features for the learner were suggested by prior work, as well as by examining the text and selecting those attributes that help distinguish the various classes of text lines. The classes were derived empirically from the text and guided by an ontology developed by the Consortium for Health Informatics Research (CHIR), a nationwide research initiative focused on medical informatics. Decision trees and Levenshtein approximate string matching techniques were tested and compared on 5,055 unseen lines of text. The performance of the decision tree method was found to be superior to the fuzzy string matching method on this task: decision trees achieved an overall accuracy of 98.5 percent, while the string matching method achieved an accuracy of only 87 percent. Overall, the results for line classification were very encouraging. The labels applied to the lines were used to evaluate TagLine's performance for identifying the semi-structured text elements, including tables, slots and fillers. Results for slots and fillers were impressive, while the results for tables were also acceptable.
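A hedged illustration of the general approach described here — classifying text lines with a decision tree over simple surface features; the features, labels and training lines below are invented for the example and are not TagLine's actual feature set.

```python
from sklearn.tree import DecisionTreeClassifier

def line_features(line):
    return [len(line),                                            # line length
            sum(c.isdigit() for c in line) / max(len(line), 1),   # digit ratio
            int(":" in line),                                     # looks like "slot: filler"
            int(line.strip().startswith("|"))]                    # looks like a table row

train_lines = ["Temp: 98.6", "BP: 120/80", "| med | dose |", "Patient resting."]
train_labels = ["slot", "slot", "table", "prose"]

clf = DecisionTreeClassifier().fit([line_features(l) for l in train_lines],
                                   train_labels)
print(clf.predict([line_features("HR: 72")]))                     # likely -> ['slot']
```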
427

DASE: Document-Assisted Symbolic Execution for Improving Automated Test Generation

Zhang, Lei 17 June 2015 (has links)
Software testing is crucial for uncovering software defects and ensuring software reliability. Symbolic execution has been utilized for automatic test generation to improve testing effectiveness. However, existing test generation techniques based on symbolic execution fail to take full advantage of programs' rich documentation specifying their input constraints, which can further enhance the effectiveness of test generation. In this work we present a general approach, Document-Assisted Symbolic Execution (DASE), to improve automated test generation and bug detection. DASE leverages natural language processing techniques and heuristics to analyze programs' readily available documentation and extract input constraints. The input constraints are then used as pruning criteria: inputs far from being valid are trimmed off. In this way, DASE guides symbolic execution to focus on those inputs that are semantically more important. We evaluated DASE on 88 programs from 5 mature real-world software suites: GNU Coreutils, GNU findutils, GNU grep, GNU Binutils, and elftoolchain. Compared to symbolic execution without input constraints, DASE increases line coverage, branch coverage, and call coverage by 5.27–22.10%, 5.83–21.25% and 2.81–21.43%, respectively. In addition, DASE detected 13 previously unknown bugs, 6 of which have already been confirmed by the developers.
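A simplified, hypothetical sketch of the core idea: mine input constraints (here, just the valid command-line flags) from documentation text and use them to prune implausible inputs before spending symbolic-execution effort on them. The real system extracts much richer constraints; this regex-based filter is only illustrative.

```python
import re

MAN_PAGE = """
OPTIONS
  -a, --all        do not ignore entries starting with .
  -l               use a long listing format
"""

def extract_valid_flags(doc):
    # Collect tokens that look like documented short or long options.
    return set(re.findall(r"(?<![\w-])(--?[A-Za-z][\w-]*)", doc))

def is_plausible(candidate_argv, valid_flags):
    # Prune candidates whose flags the documentation never mentions.
    return all(tok in valid_flags for tok in candidate_argv if tok.startswith("-"))

flags = extract_valid_flags(MAN_PAGE)
print(is_plausible(["-l", "file.txt"], flags))   # True: keep exploring this input
print(is_plausible(["-z", "file.txt"], flags))   # False: trim this input off
```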
428

Automatic Afrikaans part-of-speech tagging (Outomatiese Afrikaanse woordsoortetikettering) / by Suléne Pilon

Pilon, Suléne January 2005 (has links)
Any community that wants to be part of technological progress has to ensure that the language(s) of that community has/have the necessary human language technology resources. Part of these resources are so-called "core technologies", including part-of-speech taggers. The first part-of-speech tagger for Afrikaans is developed in this research project. It is indicated that three resources (a tag set, a tagging algorithm and annotated training data) are necessary for the development of such a part-of-speech tagger. Since none of these resources exist for Afrikaans, three objectives are formulated for this project, i.e. (a) to develop a linguistically accurate tag set for Afrikaans; (b) to determine which algorithm is the most effective one to use; and (c) to find an effective method for generating annotated Afrikaans training data. To reach the first objective, a unique and language-specific tag set was developed for Afrikaans. The resulting tag set is relatively big and consists of 139 tags. The level of specificity of the tag set can easily be adjusted to make the tag set smaller and less specific. After the development of the tag set, research is done on different approaches to, and techniques that can be used in, the development of a part-of-speech tagger. The available algorithms are evaluated by means of the prerequisites that were set and, in doing so, the most effective algorithm for the purposes of this project, TnT, is identified. Bootstrapping is then used to generate training data with the help of the TnT algorithm. This process results in 20,000 correctly annotated words, and thus the third resource necessary for the development of a part-of-speech tagger, annotated training data, is developed. The tagger that is trained with 20,000 words reaches an accuracy of 85.87% when evaluated. The tag set is then simplified to thirteen tags in order to determine the effect that the size of the tag set has on the accuracy of the tagger. The tagger is 93.69% accurate when using the diminished tag set. The main conclusion of this study is that training data of 20,000 words is not enough for the Afrikaans TnT tagger to compete with other state-of-the-art taggers. The tagger and the data that were developed in this project can be used to generate even more training data in order to develop an optimally accurate Afrikaans TnT tagger. Different techniques might also lead to better results; therefore other algorithms should be tested. / Thesis (M.A.)--North-West University, Potchefstroom Campus, 2005.
430

Determining non-urgent emergency room use factors from primary care data and natural language processing: a proof of concept

St-Maurice, Justin 28 March 2012 (has links)
The objective of this study was to discover biopsychosocial concepts from primary care that were statistically related to inappropriate emergency room use, by using natural language processing tools. De-identified free text was extracted from a clinic in Guelph, Ontario and analyzed with MetaMap and GATE. Over 10 million concepts were extracted from 13,836 patient records. There were 77 codes that fell within the biopsychosocial realm, were highly statistically significant (p < 0.001) and had an OR > 2.0. Thematically, these codes involved mental health and pain-related biopsychosocial concepts. Consistent with other literature, pain and mental health problems are seen to be important factors in inappropriate emergency room use. Despite sources of error in the NLP procedure, the study demonstrates the feasibility of combining natural language processing and primary care data to analyze the issue of inappropriate emergency room use. This technique could be used to analyze other, more complex problems. / Graduate
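A small illustrative calculation of the kind of association statistics reported above (an odds ratio and p-value relating an extracted concept to emergency room use); the counts in the 2x2 table are invented for the example, not study data, and the example requires scipy.

```python
from scipy.stats import fisher_exact

# Rows: concept present / concept absent; columns: non-urgent ER use / no non-urgent ER use.
table = [[120, 400],
         [300, 3000]]

odds_ratio, p_value = fisher_exact(table)
print(f"OR = {odds_ratio:.2f}, p = {p_value:.3g}")  # a code would be flagged if OR > 2.0 and p < 0.001
```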
