411

Outomatiese Afrikaanse woordsoortetikettering / Automatic Afrikaans part-of-speech tagging / deur Suléne Pilon

Pilon, Suléne January 2005 (has links)
Any community that wants to be part of technological progress has to ensure that the language(s) of that community has/have the necessary human language technology resources. Part of these resources are so-called "core technologies", including part-of-speech taggers. The first part-of-speech tagger for Afrikaans is developed in this research project. It is indicated that three resources (a tag set, a tagging algorithm and annotated training data) are necessary for the development of such a part-of-speech tagger. Since none of these resources exist for Afrikaans, three objectives are formulated for this project, i.e. (a) to develop a linguistically accurate tag set for Afrikaans; (b) to determine which algorithm is the most effective one to use; and (c) to find an effective method for generating annotated Afrikaans training data. To reach the first objective, a unique and language-specific tag set was developed for Afrikaans. The resulting tag set is relatively large and consists of 139 tags. The level of specificity of the tag set can easily be adjusted to make the tag set smaller and less specific. After the development of the tag set, research is done on different approaches to, and techniques that can be used in, the development of a part-of-speech tagger. The available algorithms are evaluated against the prerequisites that were set, and in doing so the most effective algorithm for the purposes of this project, TnT, is identified. Bootstrapping is then used to generate training data with the help of the TnT algorithm. This process results in 20,000 correctly annotated words, and thus the third resource necessary for the development of a part-of-speech tagger, annotated training data, is developed. The tagger that is trained with 20,000 words reaches an accuracy of 85.87% when evaluated. The tag set is then simplified to thirteen tags in order to determine the effect that the size of the tag set has on the accuracy of the tagger. The tagger is 93.69% accurate when using the diminished tag set. The main conclusion of this study is that training data of 20,000 words is not enough for the Afrikaans TnT tagger to compete with other state-of-the-art taggers. The tagger and the data that are developed in this project can be used to generate even more training data in order to develop an optimally accurate Afrikaans TnT tagger. Different techniques might also lead to better results; therefore other algorithms should be tested. / Thesis (M.A.)--North-West University, Potchefstroom Campus, 2005.
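The bootstrapping loop this abstract describes can be sketched as follows; this is a minimal illustration using the TnT implementation shipped with NLTK, and the seed sentences, toy tags and correction step are placeholders, not Pilon's 139-tag set or data:

```python
from nltk.tag import tnt

def train_tagger(tagged_sents):
    """Train a TnT tagger on sentences of (word, tag) pairs."""
    tagger = tnt.TnT()
    tagger.train(tagged_sents)
    return tagger

def manually_correct(tagged_sent):
    # Placeholder for the human verification pass: in the real workflow
    # a linguist corrects the machine-assigned tags before retraining.
    return tagged_sent

# Round 1: train on a small hand-annotated seed corpus (toy tags).
seed = [[("die", "DET"), ("kat", "NOUN"), ("slaap", "VERB")]]
tagger = train_tagger(seed)

# Round 2: tag raw text, hand-correct the output, and retrain on the
# enlarged corpus -- repeating until enough annotated words accumulate.
draft = tagger.tag(["die", "hond", "blaf"])
tagger = train_tagger(seed + [manually_correct(draft)])
```

Each round enlarges the training set, so the draft annotations the human must correct become progressively more accurate.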
412

Natūralios kalbos apdorojimo terminų ontologija: kūrimo problemos ir jų sprendimo būdai / Ontology of natural language processing terms: development issues and their solutions

Ramonas, Vilmantas 17 June 2010 (has links)
This thesis discusses the development of an ontology of natural language processing terms, the problems encountered during development, and their solutions. To this end, 217 NLP terms were collected from different sources and translated into Lithuanian; the translation problems are briefly discussed. Both computational and philosophical ontologies are described, and their similarities and differences noted. The philosophical view of the similarity of concepts and objects is discussed in more detail, as background needed to understand the principles of building computational ontologies as well as possible. The term NLP itself is examined: what constitutes NLP, which natural language processing technologies have already been developed, and which are still under development. The structure and principles of Topic Maps were chosen for composing the ontology of NLP terms, so the principles of Topic Map (TM) construction and the main TM components are described at length: topics, topic names, associations, roles in associations, and others. A tree was then drawn from the collected terms, preserving the structure found in the sources. It was concluded that the number of terms had to be reduced and the original structure inherited from the sources abandoned, so only 69 terms were kept, on the assumption that these are the most important. These terms were assigned a number of types, dividing them into groups. In search of a still better classification, each term was given one or more meta-descriptions that characterise it best, e.g. machine translation – translation, high level of automation. All meta-descriptions were then grouped into 7 major groups... [see full text]
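A minimal sketch of the Topic Maps building blocks named above (topics, topic names, associations, roles); the example uses a term from the abstract, but the class design is an assumption, not the thesis's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    names: list                                 # topic names, e.g. in two languages
    types: list = field(default_factory=list)   # types assigned in the grouping step

@dataclass
class Association:
    kind: str                                   # association type
    roles: dict = field(default_factory=dict)   # role name -> Topic playing it

mt = Topic(names=["machine translation", "mašininis vertimas"])
translation = Topic(names=["translation"])
# Link a term to one of its meta-descriptions, as in the grouping step.
link = Association(kind="is-described-by",
                   roles={"term": mt, "meta-description": translation})
```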
413

Data Mining in Social Media for Stock Market Prediction

Xu, Feifei 09 August 2012 (has links)
In this thesis, machine learning algorithms are used in NLP to obtain the public sentiment on individual stocks from social media, in order to study its relationship with stock price changes. The NLP approach to sentiment detection is a two-stage process, performing neutral vs. polarized sentiment detection before positive vs. negative sentiment detection; SVMs prove to be the best classifiers, with overall accuracy rates of 71.84% and 74.3%, respectively. It is discovered that users' overnight activity on StockTwits correlates significantly and positively with the stock's trading volume on the next business day. By the Granger causality test, the collective after-hours sentiment has strong predictive power for the next day's stock price change in 9 of the 15 stocks studied; and the overall accuracy rate of predicting the up and down movement of stocks from the collective sentiment is 58.9%.
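The two-stage cascade can be sketched with scikit-learn; the tiny training messages below are stand-ins for the StockTwits data, and a LinearSVC is one plausible SVM configuration, not necessarily the one used in the thesis:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Stage 1: neutral vs. polarized.
stage1 = make_pipeline(TfidfVectorizer(), LinearSVC())
stage1.fit(["earnings call at 4pm", "$AAPL to the moon", "awful quarter"],
           ["neutral", "polarized", "polarized"])

# Stage 2: positive vs. negative, applied only to polarized messages.
stage2 = make_pipeline(TfidfVectorizer(), LinearSVC())
stage2.fit(["$AAPL to the moon", "awful quarter"],
           ["positive", "negative"])

def sentiment(message):
    # Cascade: filter out neutral messages before polarity classification.
    if stage1.predict([message])[0] == "neutral":
        return "neutral"
    return stage2.predict([message])[0]

print(sentiment("great quarter, loading up on $AAPL"))
```

Splitting the task lets each classifier specialize, which is why the two stages report separate accuracy figures (71.84% and 74.3%).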
414

Answer extraction for simple and complex questions

Joty, Shafiz Rayhan, University of Lethbridge. Faculty of Arts and Science January 2008 (has links)
When a user is served with a ranked list of relevant documents by a standard document search engine, his search task is usually not over. He has to go through the entire document contents to find the precise piece of information he was looking for. Question answering, the retrieval of answers to natural language questions from a document collection, tries to remove this onus from the end-user by providing direct access to relevant information. This thesis is concerned with open-domain question answering. We have considered both simple and complex questions. Simple questions (i.e. factoid and list) are easier to answer than questions that have complex information needs and require inferencing and synthesizing information from multiple documents. Our question answering system for simple questions is based on question classification and document tagging. Question classification extracts useful information (i.e. the answer type) about how to answer the question, and document tagging extracts useful information from the documents, which is used in finding the answer to the question. For complex questions, we experimented with both empirical and machine learning approaches. We extracted several features of different types (i.e. lexical, lexical-semantic, syntactic and semantic) for each of the sentences in the document collection in order to measure its relevancy to the user query. A hill-climbing local search strategy is used to fine-tune the feature weights. We also experimented with two unsupervised machine learning techniques, the k-means and Expectation Maximization (EM) algorithms, and evaluated their performance. For all these methods, we have shown the effects of different kinds of features. / xi, 214 leaves : ill. (some col.) ; 29 cm. --
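The hill-climbing step mentioned above can be sketched as follows: sentence relevance is a weighted sum of feature scores, one weight is perturbed at a time, and only improving moves are kept. The feature count, step size and evaluate() are illustrative stand-ins, not the thesis's actual setup:

```python
import random

def relevance(features, weights):
    # Relevance of a sentence as a weighted sum of its feature scores.
    return sum(w * f for w, f in zip(weights, features))

def evaluate(weights):
    # Stand-in for scoring the ranked sentences against gold answers
    # (a real system would compute e.g. an F-score here).
    return -sum((w - 0.5) ** 2 for w in weights)

weights = [random.random() for _ in range(4)]  # lexical, syntactic, ... weights
best = evaluate(weights)
for _ in range(1000):
    i = random.randrange(len(weights))
    candidate = weights[:]
    candidate[i] += random.uniform(-0.1, 0.1)  # perturb one weight
    score = evaluate(candidate)
    if score > best:                           # keep only improving moves
        weights, best = candidate, score
```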
415

The Effect of Natural Language Processing in Bioinspired Design

Burns, Madison Suzann 1987- 14 March 2013 (has links)
Bioinspired design methods are a new and evolving collection of techniques used to extract biological principles from nature to solve engineering problems. The application of bioinspired design methods is typically confined to existing problems encountered in new product design or redesign. A primary goal of this research is to utilize existing bioinspired design methods to solve a complex engineering problem, to examine the versatility of the methods in solving new problems. Here, current bioinspired design methods are applied to seek a biologically inspired solution to geoengineering. Bioinspired solutions developed in the case study include droplet density shields, phosphorescent mineral injection, and reflective orbiting satellites. The success of the methods in the case study indicates that bioinspired design methods have the potential to solve new problems and provide a platform of innovation for old problems. A secondary goal of this research is to help engineers use bioinspired design methods more efficiently by reducing post-processing time and eliminating the need for extensive knowledge of biological terminology through natural language processing techniques. Using the complex problem of geoengineering, a hypothesis is developed that asserts the usefulness of nouns in creating higher-quality solutions. A distinction is made between the types of nouns in a sentence, primary and spatial, and the hypothesis is refined to state that primary nouns are the most influential part of speech in providing biological inspiration for high-quality ideas. Through three design experiments, the author determines that engineers are more likely to develop a higher-quality solution using the primary noun in a given passage of biological text. The identification of primary nouns through part-of-speech tagging will provide engineers with an analogous biological system without extensive analysis of the results. The use of noun identification to improve the efficiency of bioinspired design method applications is a new concept and is the primary contribution of this research.
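Extracting noun candidates from a biological passage via part-of-speech tagging might look like the sketch below (using NLTK); distinguishing primary from spatial nouns would need the further analysis the thesis describes, and the sample passage is invented:

```python
import nltk
# First use requires: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

passage = ("The stratocumulus cloud reflects sunlight back into space "
           "because of the density of its droplets.")
tags = nltk.pos_tag(nltk.word_tokenize(passage))        # Penn Treebank tags
nouns = [word for word, tag in tags if tag.startswith("NN")]
print(nouns)   # candidate anchors for a biological analogy
```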
416

Development of a Graphics Ontology for Natural Language Interfaces

Niknam, Mehdi 13 October 2010 (has links)
The overall context of this thesis research is to explore natural language as a medium for interacting with computer software in the graphics domain, e.g. programs like MS Paint or OpenGL. A core element of most natural language understanding systems is an ontology, which represents the concepts and items of the underlying domain of discourse. This thesis presents an ontology for the graphics domain based on several resources, including documentation and textbooks on graphics systems, existing ontologies, and - most importantly - a collection of natural language instructions for creating and modifying graphic images. The ontology was developed in several phases, and finally tested as part of a complex natural language interface. This natural language interface accepts verbal instructions in the graphics domain as input and creates matching graphic images as output. The results of our tests indicate a system accuracy of approximately 80%.
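One way an ontology can mediate between verbal instructions and graphics operations is sketched below; the concept names, synonym sets and two-entry ontology are invented for illustration and are not Niknam's ontology:

```python
# Hypothetical ontology fragment: concept -> (instruction synonyms, operation).
ONTOLOGY = {
    "CreateShape":  ({"draw", "create", "make"},            "canvas.add_shape"),
    "ChangeColour": ({"colour", "color", "paint", "fill"},  "shape.set_fill"),
}

def interpret(instruction):
    # Match instruction words against each concept's synonym set.
    words = set(instruction.lower().split())
    for concept, (synonyms, operation) in ONTOLOGY.items():
        if words & synonyms:
            return concept, operation
    return None, None

print(interpret("Please draw a red circle"))  # ('CreateShape', 'canvas.add_shape')
```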
417

A functional theory of creative reading : process, knowledge, and evaluation

Moorman, Kenneth Matthew 08 1900 (has links)
No description available.
418

Efficient prediction of relational structure and its application to natural language processing

Riedel, Sebastian January 2009 (has links)
Many tasks in Natural Language Processing (NLP) require us to predict a relational structure over entities. For example, in Semantic Role Labelling we try to predict the 'semantic role' relation between a predicate verb and its argument constituents. Often NLP tasks not only involve related entities but also relations that are stochastically correlated. For instance, in Semantic Role Labelling the roles of different constituents are correlated: we cannot assign the agent role to one constituent if we have already assigned this role to another. Statistical Relational Learning (also known as First Order Probabilistic Logic) allows us to capture the aforementioned nature of NLP tasks because it is based on the notions of entities, relations and stochastic correlations between relations. It is therefore often straightforward to formulate an NLP task in a first-order probabilistic language such as Markov Logic. However, the generality of this approach comes at a price: the process of finding the relational structure with the highest probability, also known as maximum a posteriori (MAP) inference, is often inefficient, if not intractable. In this work we seek to improve the efficiency of MAP inference for Statistical Relational Learning. We propose a meta-algorithm, namely Cutting Plane Inference (CPI), that iteratively solves small subproblems of the original problem using any existing MAP technique, and inspects parts of the problem that are not yet included in the current subproblem but could potentially lead to an improved solution. Our hypothesis is that this algorithm can dramatically improve the efficiency of existing methods while remaining at least as accurate. We frame the algorithm in Markov Logic, a language that combines First Order Logic and Markov Networks. Our hypothesis is evaluated on two tasks: Semantic Role Labelling and Entity Resolution. It is shown that the proposed algorithm improves the efficiency of two existing methods by two orders of magnitude and leads an approximate method to more probable solutions. We also show that CPI, at convergence, is guaranteed to be at least as accurate as the method used within its inner loop. Another core contribution of this work is a theoretical and empirical analysis of the boundary conditions of Cutting Plane Inference. We describe cases when Cutting Plane Inference will definitely be difficult (because it instantiates large networks or needs many iterations) and when it will be easy (because it instantiates small networks and needs only a few iterations).
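The CPI meta-algorithm can be sketched schematically as below; map_solver and the is_violated_by check are abstract placeholders for any MAP technique and any Markov Logic grounding, not Riedel's implementation:

```python
def cutting_plane_inference(ground_formulae, map_solver, max_iters=100):
    """Schematic CPI loop. map_solver(active) returns a MAP solution of
    the subproblem built from the ground formulae in `active`; each
    formula exposes is_violated_by(solution). Both are placeholders."""
    active = set()
    solution = map_solver(active)            # start from the empty subproblem
    for _ in range(max_iters):
        violated = {f for f in ground_formulae if f.is_violated_by(solution)}
        if violated <= active:               # no new violations: converged
            return solution
        active |= violated                   # instantiate only the violated parts
        solution = map_solver(active)        # re-solve the enlarged subproblem
    return solution
```

The efficiency gain comes from never instantiating the full ground network: only formulae the current solution actually violates enter the subproblem.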
419

Advanced natural language processing for improved prosody in text-to-speech synthesis / G. I. Schlünz

Schlünz, Georg Isaac January 2014 (has links)
Text-to-speech synthesis enables the speech-impeded user of an augmentative and alternative communication system to partake in any conversation on any topic, because it can produce dynamic content. Current synthetic voices do not sound very natural, however, lacking in the areas of emphasis and emotion. These qualities are furthermore important to convey meaning and intent beyond that which can be achieved by the vocabulary of words only. Put differently, speech synthesis requires a more comprehensive analysis of its text input beyond the word level to infer the meaning and intent that elicit emphasis and emotion. The synthesised speech then needs to imitate the effects that these textual factors have on the acoustics of human speech. This research addresses these challenges by commencing with a literature study on the state of the art in the fields of natural language processing, text-to-speech synthesis and speech prosody. It is noted that the higher linguistic levels of discourse, information structure and affect are necessary for the text analysis to shape the prosody appropriately for more natural synthesised speech. Discourse and information structure account for meaning, intent and emphasis, and affect formalises the modelling of emotion. The OCC model is shown to be a suitable point of departure for a new model of affect that can leverage the higher linguistic levels. The audiobook is presented as a text and speech resource for the modelling of discourse, information structure and affect because its narrative structure is prosodically richer than the random constitution of a traditional text-to-speech corpus. A set of audiobooks is selected and phonetically aligned for subsequent investigation. The new model of discourse, information structure and affect, called e-motif, is developed to take advantage of the audiobook text. It is a subjective model that does not specify any particular belief system in order to appraise its emotions, but defines only anonymous affect states. Its cognitive and social features rely heavily on the coreference resolution of the text, but this process is found not to be accurate enough to produce usable feature values. The research concludes with an experimental investigation of the influence of the e-motif features on human speech and synthesised speech. The aligned audiobook speech is inspected for prosodic correlates of the cognitive and social features, revealing that some activity occurs in the intonational domain. However, when the aligned audiobook speech is used in the training of a synthetic voice, the e-motif effects are overshadowed by those of the structural features that come standard in the voice-building framework. / PhD (Information Technology), North-West University, Vaal Triangle Campus, 2014
420

Modélisation du langage à l'aide de pénalités structurées / Language modelling with structured penalties

Nelakanti, Anil Kumar 11 February 2014 (has links) (PDF)
Modeling natural language is among the fundamental challenges of artificial intelligence and the design of interactive machines, with applications spanning various domains, such as dialogue systems, text generation and machine translation. We propose a discriminatively trained log-linear model to learn the distribution of words following a given context. Due to data sparsity, it is necessary to appropriately regularize the model using a penalty term. We design a penalty term that properly encodes the structure of the feature space to avoid overfitting and improve generalization while appropriately capturing long-range dependencies. Some useful properties of specific structured penalties can be exploited to reduce the number of parameters required to encode the model. The outcome is an efficient model that suitably captures long dependencies in language without a significant increase in time or space requirements. In a log-linear model, both training and testing become increasingly expensive with a growing number of classes. The number of classes in a language model is the size of the vocabulary, which is typically very large. A common trick is to cluster classes and apply the model in two steps; the first step picks the most probable cluster and the second picks the most probable word from the chosen cluster. This idea can be generalized to a hierarchy of greater depth with multiple levels of clustering. However, the performance of the resulting hierarchical classifier depends on the suitability of the clustering to the problem. We study different strategies for building the hierarchy of categories from their observations.
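The two-step trick factorises the word probability as p(word | context) = p(cluster | context) * p(word | cluster, context). A toy numeric sketch appears below; the two-word clusters and unnormalised scores are invented, whereas the thesis learns such scores with a log-linear model:

```python
import math

def softmax(scores):
    # Normalise unnormalised scores into a probability distribution.
    z = sum(math.exp(s) for s in scores.values())
    return {k: math.exp(s) / z for k, s in scores.items()}

# Hypothetical scores for one fixed context.
cluster_scores = {"animals": 2.0, "verbs": 0.5}
word_scores = {"animals": {"cat": 1.5, "dog": 1.0},
               "verbs":   {"runs": 0.7, "sleeps": 0.2}}

p_cluster = softmax(cluster_scores)                       # step 1: pick a cluster
p_word = {c: softmax(ws) for c, ws in word_scores.items()}  # step 2: word within cluster

def p(word, cluster):
    return p_cluster[cluster] * p_word[cluster][word]

print(p("cat", "animals"))
```

Each softmax now ranges over a cluster-sized set instead of the whole vocabulary, which is the source of the speed-up, at the cost of making accuracy depend on how well the clustering fits the data.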
