131

Data-Driven Solutions to Bottlenecks in Natural Language Generation

Biran, Or January 2016 (has links)
Concept-to-text generation suffers from what can be called generation bottlenecks - aspects of the generated text which should change for different subject domains, and which are usually hard to obtain or require manual work. Some examples are domain-specific content, a type system, a dictionary, discourse style and lexical style. These bottlenecks have stifled attempts to create generation systems that are generic, or at least apply to a wide range of domains in non-trivial applications. This thesis comprises two parts. In the first, we propose data-driven solutions that automate obtaining the information and models required to solve some of these bottlenecks. Specifically, we present an approach to mining domain-specific paraphrasal templates from a simple text corpus; an approach to extracting a domain-specific taxonomic thesaurus from Wikipedia; and a novel document planning model which determines both ordering and discourse relations, and which can be extracted from a domain corpus. We evaluate each solution individually and independently of its ultimate use in generation, and show significant improvements in each. In the second part of the thesis, we describe a framework for creating generation systems that rely on these solutions, as well as on hybrid concept-to-text and text-to-text generation, and which can be automatically adapted to any domain using only a domain-specific corpus. We illustrate the breadth of applications this framework supports with three examples: biography generation and company description generation, which we use to evaluate the framework itself and the contribution of our solutions; and justification of machine learning predictions, a novel application which we evaluate in a task-based study to show its importance to users.
132

Apply syntactic features in a maximum entropy framework for English and Chinese reading comprehension. / CUHK electronic theses & dissertations collection

January 2008 (has links)
Automatic reading comprehension (RC) systems integrate various kinds of natural language processing (NLP) technologies to analyze a given passage and generate or extract answers in response to questions about the passage. Previous work applied many NLP technologies, including shallow syntactic analyses (e.g. base noun phrases), semantic analyses (e.g. named entities) and discourse analyses (e.g. pronoun referents), in the bag-of-words (BOW) matching approach. This thesis proposes a novel RC approach that integrates a set of NLP technologies in a maximum entropy (ME) framework to estimate each candidate answer sentence's probability of being an answer. In contrast to previous RC approaches, which handle English only, the presented approach is the first for both English and Chinese, the two languages used by most people in the world. In order to support the evaluation of the bilingual RC systems, a parallel English and Chinese corpus is also designed and developed. Annotations deemed relevant to the RC task are also included in the corpus. In addition, useful NLP technologies are explored from a new perspective: referring to pedagogical guidelines for human readers, reading skills are summarized and mapped to various NLP technologies. Practical NLP technologies, categorized as shallow syntactic analyses (i.e. part-of-speech tags, voices and tenses) and deep syntactic analyses (i.e. syntactic parse trees and dependency parse trees), are then selected for integration. The proposed approach is evaluated on an English corpus, namely Remedia, and on our bilingual corpus. The experimental results show that our approach significantly improves the RC results on both English and Chinese corpora. / Xu, Kui. / Adviser: Helen Mei-Ling Meng. / Source: Dissertation Abstracts International, Volume: 70-06, Section: B, page: 3618. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2008. / Includes bibliographical references (leaves 132-141). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307.
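The maximum entropy scoring step described in this abstract can be illustrated with a small sketch (not the thesis's implementation; logistic regression is equivalent to a MaxEnt classifier, and the features and toy data below are assumptions chosen only to make the example runnable):

```python
# Sketch: score each candidate sentence with a maximum-entropy (logistic-regression)
# model over simple features. Feature names and training pairs are illustrative only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def features(question, sentence):
    q = set(question.lower().split())
    s = set(sentence.lower().split())
    return {
        "word_overlap": len(q & s),                      # bag-of-words overlap baseline
        "len_ratio": len(s) / max(len(q), 1),            # crude length feature
        "wh_word": next((w for w in q if w in {"who", "what", "when", "where"}), "none"),
    }

# Toy training data: (question, candidate sentence, is_answer)
train = [
    ("who wrote the report", "the report was written by dr lee", 1),
    ("who wrote the report", "the report has twelve pages", 0),
]

vec = DictVectorizer()
X = vec.fit_transform([features(q, s) for q, s, _ in train])
y = [label for _, _, label in train]

maxent = LogisticRegression().fit(X, y)

# Rank a new candidate sentence by P(answer | features)
cand = features("who wrote the report", "dr lee wrote it")
print(maxent.predict_proba(vec.transform([cand]))[0, 1])
```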
133

Unsupervised learning of Arabic non-concatenative morphology

Khaliq, Bilal January 2015 (has links)
Unsupervised approaches to learning the morphology of a language play an important role in computer processing of language from a practical and theoretical perspective, due to their minimal reliance on manually produced linguistic resources and human annotation. Such approaches have been widely researched for the problem of concatenative affixation, but less attention has been paid to the intercalated (non-concatenative) morphology exhibited by Arabic and other Semitic languages. The aim of this research is to learn the root and pattern morphology of Arabic, with accuracy comparable to manually built morphological analysis systems. The approach is kept free from human supervision or manual parameter settings, assuming only that roots and patterns intertwine to form a word. Promising results were obtained by applying a technique adapted from previous work in concatenative morphology learning, which uses machine learning to determine relatedness between words. The output, with probabilistic relatedness values between words, was then used to rank all possible roots and patterns to form a lexicon. Analysis using trilateral roots resulted in correct root identification accuracy of approximately 86% for inflected words. Although the machine learning-based approach is effective, it is conceptually complex, so an alternative, simpler and computationally efficient approach was then devised to obtain morpheme scores based on comparative counts of roots and patterns. In this approach, root and pattern scores are defined in terms of each other in a mutually recursive relationship, converging to an optimized morpheme ranking. This technique gives slightly better accuracy while being conceptually simpler and more efficient. The approach, after further enhancements, was evaluated on a version of the Quranic Arabic Corpus, attaining a final accuracy of approximately 93%. A comparative evaluation shows this to be superior to two existing, widely used manually built Arabic stemmers, thus demonstrating the practical feasibility of unsupervised learning of non-concatenative morphology.
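The mutually recursive root/pattern scoring described here can be sketched in a few lines; the candidate segmentations below are invented toy examples, not the thesis's data or code:

```python
# Sketch: a root's score is accumulated from the patterns it co-occurs with, and vice
# versa, iterated to convergence (HITS-style). Candidate analyses are made up.
from collections import defaultdict

# word -> candidate (root, pattern) analyses; "_" marks root-consonant slots
candidates = {
    "kataba":  [("ktb", "_a_a_a"), ("kat", "___aba")],   # invented segmentations
    "maktab":  [("ktb", "ma__a_"), ("mkt", "_a__ab")],
    "kutub":   [("ktb", "_u_u_")],
    "darasa":  [("drs", "_a_a_a")],
    "madrasa": [("drs", "ma__a_a")],
}

root_score = defaultdict(lambda: 1.0)
pattern_score = defaultdict(lambda: 1.0)

for _ in range(20):                                   # iterate until roughly converged
    new_root, new_pattern = defaultdict(float), defaultdict(float)
    for analyses in candidates.values():
        for root, pattern in analyses:
            new_root[root] += pattern_score[pattern]
            new_pattern[pattern] += root_score[root]
    rz, pz = sum(new_root.values()), sum(new_pattern.values())   # normalize
    root_score = defaultdict(lambda: 0.0, {r: v / rz for r, v in new_root.items()})
    pattern_score = defaultdict(lambda: 0.0, {p: v / pz for p, v in new_pattern.items()})

# pick the highest-scoring analysis for each word
for word, analyses in candidates.items():
    best = max(analyses, key=lambda rp: root_score[rp[0]] * pattern_score[rp[1]])
    print(word, "->", best)
```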
134

A natural language based indexing technique for Chinese information retrieval.

January 1997 (has links)
Pang Chun Kiu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. / Includes bibliographical references (leaves 101-107). / Chapter 1 --- Introduction --- p.2 / Chapter 1.1 --- Chinese Indexing using Noun Phrases --- p.6 / Chapter 1.2 --- Objectives --- p.8 / Chapter 1.3 --- An Overview of the Thesis --- p.8 / Chapter 2 --- Background --- p.10 / Chapter 2.1 --- Technology Influences on Information Retrieval --- p.10 / Chapter 2.2 --- Related Work --- p.13 / Chapter 2.2.1 --- Statistical/Keyword Approaches --- p.13 / Chapter 2.2.2 --- Syntactical approaches --- p.15 / Chapter 2.2.3 --- Semantic approaches --- p.17 / Chapter 2.2.4 --- Noun Phrases Approach --- p.18 / Chapter 2.2.5 --- Chinese Information Retrieval --- p.20 / Chapter 2.3 --- Our Approach --- p.21 / Chapter 3 --- Chinese Noun Phrases --- p.23 / Chapter 3.1 --- Different types of Chinese Noun Phrases --- p.23 / Chapter 3.2 --- Ambiguous noun phrases --- p.27 / Chapter 3.2.1 --- Ambiguous English Noun Phrases --- p.27 / Chapter 3.2.2 --- Ambiguous Chinese Noun Phrases --- p.28 / Chapter 3.2.3 --- Statistical data on the three NPs --- p.33 / Chapter 4 --- Index Extraction from De-de Conj. NP --- p.35 / Chapter 4.1 --- Word Segmentation --- p.36 / Chapter 4.2 --- Part-of-speech tagging --- p.37 / Chapter 4.3 --- Noun Phrase Extraction --- p.37 / Chapter 4.4 --- The Chinese noun phrase partial parser --- p.38 / Chapter 4.5 --- Handling Parsing Ambiguity --- p.40 / Chapter 4.6 --- Index Building Strategy --- p.41 / Chapter 4.7 --- The cross-set generation rules --- p.44 / Chapter 4.8 --- Example 1: Indexing De-de NP --- p.46 / Chapter 4.9 --- Example 2: Indexing Conjunctive NP --- p.48 / Chapter 4.10 --- Experimental results and Discussion --- p.49 / Chapter 5 --- Indexing Compound Nouns --- p.52 / Chapter 5.1 --- Previous Researches on Compound Nouns --- p.53 / Chapter 5.2 --- Indexing two-term Compound Nouns --- p.55 / Chapter 5.2.1 --- About the thesaurus《同義詞詞林》 --- p.56 / Chapter 5.3 --- Indexing Compound Nouns of three or more terms --- p.58 / Chapter 5.4 --- Corpus learning approach --- p.59 / Chapter 5.4.1 --- An Example --- p.60 / Chapter 5.4.2 --- Experimental Setup --- p.63 / Chapter 5.4.3 --- An Experiment using the third level of the Cilin --- p.65 / Chapter 5.4.4 --- An Experiment using the second level of the Cilin --- p.66 / Chapter 5.5 --- Contextual Approach --- p.68 / Chapter 5.5.1 --- The algorithm --- p.69 / Chapter 5.5.2 --- An Illustrative Example --- p.71 / Chapter 5.5.3 --- Experiments on compound nouns --- p.72 / Chapter 5.5.4 --- Experiment I: Word Distance Based Extraction --- p.73 / Chapter 5.5.5 --- Experiment II: Semantic Class Based Extraction --- p.75 / Chapter 5.5.6 --- Experiments III: On different boundaries --- p.76 / Chapter 5.5.7 --- The Final Algorithm --- p.79 / Chapter 5.5.8 --- Experiments on other compounds --- p.82 / Chapter 5.5.9 --- Discussion --- p.83 / Chapter 6 --- Overall Effectiveness --- p.85 / Chapter 6.1 --- Illustrative Example for the Integrated Algorithm --- p.86 / Chapter 6.2 --- Experimental Setup --- p.90 / Chapter 6.3 --- Experimental Results & Discussion --- p.91 / Chapter 7 --- Conclusion --- p.95 / Chapter 7.1 --- Summary --- p.95 / Chapter 7.2 --- Contributions --- p.97 / Chapter 7.3 --- Future Directions --- p.98 / Chapter 7.3.1 --- Word-sense determination --- p.98 / Chapter 7.3.2 --- Hybrid approach for compound noun indexing --- p.99 / Chapter A --- Cross-set Generation Rules --- p.108 / Chapter B --- Tag set by Tsinghua University --- p.110 / Chapter C --- Noun Phrases Test Set --- p.113 / Chapter D --- Compound Nouns Test Set --- p.124 / Chapter D.1 --- Three-term Compound Nouns --- p.125 / Chapter D.1.1 --- NVN --- p.125 / Chapter D.1.2 --- Other three-term compound nouns --- p.129 / Chapter D.2 --- Four-term Compound Nouns --- p.133 / Chapter D.3 --- Five-term and six-term Compound Nouns --- p.134
135

The use of multiple speech recognition hypotheses for natural language understanding.

January 2003 (has links)
Wang Ying. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 102-104). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Overview --- p.1 / Chapter 1.2 --- Thesis Goals --- p.3 / Chapter 1.3 --- Thesis Outline --- p.3 / Chapter 2 --- Background --- p.4 / Chapter 2.1 --- Speech Recognition --- p.4 / Chapter 2.2 --- Natural Language Understanding --- p.6 / Chapter 2.2.1 --- Rule-based Approach --- p.7 / Chapter 2.2.2 --- Corpus-based Approach --- p.7 / Chapter 2.3 --- Integration of Speech Recognition with NLU --- p.8 / Chapter 2.3.1 --- Word Graph --- p.9 / Chapter 2.3.2 --- N-best List --- p.9 / Chapter 2.4 --- The ATIS Domain --- p.10 / Chapter 2.5 --- Chapter Summary --- p.14 / Chapter 3 --- Generation of Speech Recognition Hypotheses --- p.15 / Chapter 3.1 --- Grammar Development for the OpenSpeech Recognizer --- p.16 / Chapter 3.2 --- Generation of Speech Recognition Hypotheses --- p.22 / Chapter 3.3 --- Evaluation of Speech Recognition Hypotheses --- p.24 / Chapter 3.3.1 --- Recognition Accuracy --- p.24 / Chapter 3.3.2 --- Concept Accuracy --- p.28 / Chapter 3.4 --- Results and Analysis --- p.33 / Chapter 3.5 --- Chapter Summary --- p.38 / Chapter 4 --- Belief Networks for NLU --- p.40 / Chapter 4.1 --- Problem Formulation --- p.40 / Chapter 4.2 --- The Original NLU Framework --- p.41 / Chapter 4.2.1 --- Semantic Tagging --- p.41 / Chapter 4.2.2 --- Concept Selection --- p.42 / Chapter 4.2.3 --- Bayesian Inference --- p.43 / Chapter 4.2.4 --- Thresholding --- p.44 / Chapter 4.2.5 --- Goal Identification --- p.45 / Chapter 4.3 --- Evaluation Method of Goal Identification Performance --- p.45 / Chapter 4.4 --- Baseline Result --- p.48 / Chapter 4.5 --- Chapter Summary --- p.50 / Chapter 5 --- The Effects of Recognition Errors on NLU --- p.51 / Chapter 5.1 --- Experiments --- p.51 / Chapter 5.1.1 --- Perfect Case: The Use of Transcripts --- p.53 / Chapter 5.1.2 --- Train on Recognition Hypotheses --- p.53 / Chapter 5.1.3 --- Test on Recognition Hypotheses --- p.55 / Chapter 5.1.4 --- Train and Test on Recognition Hypotheses --- p.56 / Chapter 5.2 --- Analysis of Results --- p.60 / Chapter 5.3 --- Chapter Summary --- p.67 / Chapter 6 --- The Use of Multiple Speech Recognition Hypotheses for NLU --- p.69 / Chapter 6.1 --- The Extended NLU Framework --- p.76 / Chapter 6.1.1 --- Semantic Tagging --- p.76 / Chapter 6.1.2 --- Recognition Confidence Score Normalization --- p.77 / Chapter 6.1.3 --- Concept Selection --- p.79 / Chapter 6.1.4 --- Bayesian Inference --- p.80 / Chapter 6.1.5 --- Combination with Confidence Scores --- p.81 / Chapter 6.1.6 --- Thresholding --- p.84 / Chapter 6.1.7 --- Goal Identification --- p.84 / Chapter 6.2 --- Experiments --- p.86 / Chapter 6.2.1 --- The Use of First Best Recognition Hypothesis --- p.86 / Chapter 6.2.2 --- Train on Multiple Recognition Hypotheses --- p.86 / Chapter 6.2.3 --- Test on Multiple Recognition Hypotheses --- p.87 / Chapter 6.2.4 --- Train and Test on Multiple Recognition Hypotheses --- p.88 / Chapter 6.3 --- Significance Testing --- p.90 / Chapter 6.4 --- Result Analysis --- p.91 / Chapter 6.5 --- Chapter Summary --- p.97 / Chapter 7 --- Conclusions and Future Work --- p.98 / Chapter 7.1 --- Conclusions --- p.98 / Chapter 7.2 --- Contribution --- p.99 / Chapter 7.3 --- Future Work --- p.100 / Bibliography --- p.102 / Chapter A --- Speech Recognition Hypotheses Distribution --- p.105 / Chapter B --- Recognition Errors in Three Kinds of Queries --- p.107 / Chapter C --- The Effects of Recognition Errors in N-Best list on NLU --- p.114 / Chapter D --- Training on Multiple Recognition Hypotheses --- p.117 / Chapter E --- Testing on Multiple Recognition Hypotheses --- p.132 / Chapter F --- Hand-designed Grammar For ATIS --- p.139
136

Application of Boolean Logic to Natural Language Complexity in Political Discourse

Taing, Austin 01 January 2019 (has links)
Press releases serve as a major influence on public opinion of a politician, since they are a primary means of communicating with the public and directing discussion. Thus, the public's ability to digest them is an important factor for politicians to consider. This study employs several well-studied measures of linguistic complexity and proposes a new one to examine whether politicians change their language to become more or less difficult to parse in different situations. This study uses 27,500 press releases from the US Senate between 2004 and 2008 and examines election cycles and natural disasters, namely hurricanes, as situations where politicians' language may change. We calculate the syntactic complexity measures clauses per sentence, T-unit length, and complex-T ratio, as well as the Automated Readability Index and Flesch Reading Ease of each press release. We also propose a proof-of-concept measure called logical complexity to find out whether classical Boolean logic can be applied as a practical linguistic complexity measure. We find that language becomes more complex in coastal senators' press releases concerning hurricanes, but see no significant change around election cycles. Our measure produces results similar to the well-established ones, showing that logical complexity is a useful lens for measuring linguistic complexity.
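For reference, the two standard readability formulas named in this abstract can be computed as follows (a rough sketch, not the study's code; the syllable counter is a crude vowel-group heuristic):

```python
# Flesch Reading Ease and Automated Readability Index from raw text.
import re

def count_syllables(word):
    # Approximate syllables as groups of consecutive vowels
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    chars = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)                     # words per sentence
    flesch = 206.835 - 1.015 * wps - 84.6 * (syllables / len(words))
    ari = 4.71 * (chars / len(words)) + 0.5 * wps - 21.43
    return {"flesch_reading_ease": flesch, "automated_readability_index": ari}

print(readability("The senator released a statement. It urged residents to evacuate "
                  "before the hurricane made landfall."))
```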
137

A System for Natural Language Unmarked Clausal Transformations in Text-to-Text Applications

Miller, Daniel 01 June 2009 (has links)
A system is proposed which separates clauses from complex sentences into simpler stand-alone sentences. This is useful as an initial step on raw text, where the resulting processed text may be fed into text-to-text applications such as Automatic Summarization, Question Answering, and Machine Translation, which have difficulty processing complex sentences. Grammatical natural language transformations provide a possible method to simplify complex sentences and enhance the results of text-to-text applications. Using shallow parsing, this system improves on existing systems' ability to identify and separate marked and unmarked embedded clauses in complex sentence structure, yielding syntactically simplified source text for further processing.
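Purely as an illustration of the clause-separation idea: the thesis itself uses shallow parsing, whereas the sketch below leans on a dependency parse via spaCy, which is an assumption not taken from the thesis:

```python
# Illustrative only: peel adverbial/relative/complement clauses off each sentence
# and emit them alongside the remaining (simplified) main clause.
import spacy

nlp = spacy.load("en_core_web_sm")
CLAUSE_DEPS = {"advcl", "relcl", "ccomp"}   # clause-introducing dependency labels

def split_clauses(text):
    results = []
    for sent in nlp(text).sents:
        clause_token_ids = set()
        for head in (t for t in sent if t.dep_ in CLAUSE_DEPS):
            span = list(head.subtree)
            clause_token_ids.update(t.i for t in span)
            results.append(" ".join(t.text for t in span))      # extracted clause
        # remaining tokens form the simplified main clause
        results.append(" ".join(t.text for t in sent if t.i not in clause_token_ids))
    return results

print(split_clauses("The bill, which passed last week, will fund repairs after the storm hit."))
```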
138

Chatbot for Information Retrieval from Unstructured Natural Language Documents

Fredriksson, Joakim, Höppner, Falk January 2019 (has links)
This thesis presents the development of a chatbot which retrieves information from a data source consisting of unstructured natural language text. This project was made in collaboration with the company Jayway in Halmstad. Elasticsearch was used to create the search function, and the service Dialogflow was used to process the natural language input from the user. A Python script was created to retrieve the information from the data source, and a request handler was written which connected the tools together to create a working chatbot. The chatbot correctly answers questions with an accuracy of 72% according to testing with a sample of n = 25. The testing consisted of asking the chatbot questions and determining whether the answers were correct. Further research could explore how chatbots might help the elderly or people with disabilities use the web with a natural dialogue instead of a traditional user interface.
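A minimal sketch of the glue described above might look as follows; the index name, field name, local Elasticsearch URL, and webhook route are assumptions, and the request format follows Dialogflow's v2 fulfillment payload:

```python
# Flask webhook receives a Dialogflow fulfillment request and answers from an
# Elasticsearch index of unstructured documents (elasticsearch-py 8.x client style).
from flask import Flask, request, jsonify
from elasticsearch import Elasticsearch

app = Flask(__name__)
es = Elasticsearch("http://localhost:9200")

@app.route("/webhook", methods=["POST"])
def webhook():
    payload = request.get_json()
    user_query = payload["queryResult"]["queryText"]      # Dialogflow v2 fulfillment field

    # Full-text search over the document index
    hits = es.search(index="documents",
                     query={"match": {"content": user_query}},
                     size=1)["hits"]["hits"]

    answer = hits[0]["_source"]["content"] if hits else "Sorry, I could not find an answer."
    return jsonify({"fulfillmentText": answer})           # text Dialogflow shows the user

if __name__ == "__main__":
    app.run(port=5000)
```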
139

GeneTUC: Natural Language Understanding in Medical Text

Sætre, Rune January 2006 (has links)
Natural Language Understanding (NLU) is a 50-year-old research field, but its application to molecular biology literature (BioNLU) is less than 10 years old. After the complete human genome sequence was published by the Human Genome Project and Celera in 2001, there has been an explosion of research, shifting the NLU focus from domains like news articles to the domain of molecular biology and medical literature. BioNLU is needed, since almost 2000 new articles are published and indexed every day, and biologists need to know about existing knowledge regarding their own research. So far, BioNLU results are not as good as in other NLU domains, so more research is needed to solve the challenges of creating useful NLU applications for the biologists.

The work in this PhD thesis is a “proof of concept”. It is the first to show that an existing Question Answering (QA) system can be successfully applied in the hard BioNLU domain, after the essential challenge of unknown entities is solved. The core contribution is a system that discovers and classifies unknown entities and relations between them automatically. The World Wide Web (through Google) is used as the main resource, and the performance is almost as good as that of other named entity extraction systems, but the advantage of this approach is that it is much simpler and requires less manual labor than any of the other comparable systems.

The first paper in this collection gives an overview of the field of NLU and shows how the Information Extraction (IE) problem can be formulated with Local Grammars. The second paper uses Machine Learning to automatically recognize protein names based on features from the GSearch Engine. In the third paper, GSearch is substituted with Google, and the task in this paper is to extract all unknown names belonging to one of 273 biomedical entity classes, like genes, proteins, processes etc. After getting promising results with Google, the fourth paper shows that this approach can also be used to retrieve interactions or relationships between the named entities. The fifth paper describes an online implementation of the system, and shows that the method scales well to a larger set of entities.

The final paper concludes the “proof of concept” research, and shows that the performance of the original GeneTUC NLU system has increased from handling 10% of the sentences in a large collection of abstracts in 2001, to 50% in 2006. This is still not good enough to create a commercial system, but it is believed that another 40% performance gain can be achieved by importing more verb templates into GeneTUC, just as nouns were imported during this work. Work has already begun on this, in the form of a local Master's thesis.
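The web-hit-based classification idea can be sketched roughly as follows; the patterns and class list are assumptions, and hit_count is a hypothetical stand-in for the web-search counts (Google) used in the thesis:

```python
# Illustrative sketch only: score each candidate class by how often its lexical
# patterns occur on the web, and pick the best-scoring class for an unknown entity.
PATTERNS = [
    "{entity} is a {cls}",
    "{cls}s such as {entity}",
    "{entity} and other {cls}s",
]

CLASSES = ["gene", "protein", "enzyme", "disease"]

def hit_count(query: str) -> int:
    """Hypothetical: return the number of web search hits for `query`."""
    raise NotImplementedError("plug in a real search API here")

def classify_entity(entity: str) -> str:
    scores = {cls: sum(hit_count(p.format(entity=entity, cls=cls)) for p in PATTERNS)
              for cls in CLASSES}
    return max(scores, key=scores.get)

# classify_entity("p53")  # would return the class whose patterns get the most hits
```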
140

GPSG-Recognition is NP-Hard

Ristad, Eric Sven 01 March 1985 (has links)
Proponents of generalized phrase structure grammar (GPSG) cite its weak context-free generative power as proof of the computational tractability of GPSG-Recognition. Since context-free languages (CFLs) can be parsed in time proportional to the cube of the sentence length, and GPSGs only generate CFLs, it seems plausible that GPSGs can also be parsed in cubic time. This longstanding, widely assumed GPSG "efficient parsability" result is misleading: parsing the sentences of an arbitrary GPSG is likely to be intractable, because a reduction from 3SAT proves that the universal recognition problem for the GPSGs of Gazdar (1981) is NP-hard. Crucially, the time to parse a sentence of a CFL can be the product of sentence length cubed and context-free grammar size squared, and the GPSG grammar can result in an exponentially large set of derived context-free rules. A central object in the 1981 GPSG theory, the metarule, inherently results in an intractable parsing problem, even when severely constrained. The implications for linguistics and natural language parsing are discussed.
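To make the complexity argument concrete, the bound the abstract describes can be written out as a short sketch (following the abstract's own statement, not the paper's notation):

```latex
% CFL parsing cost as stated in the abstract: cubic in sentence length n and
% quadratic in the size |G| of the context-free grammar.
\[
  T_{\mathrm{CFL}}(n, G) = O\!\left(|G|^{2} \cdot n^{3}\right)
\]
% If compiling a GPSG of size m can yield a derived CFG with |G| = 2^{\Omega(m)}
% rules, the "cubic" bound hides a factor exponential in the GPSG itself, so
% fixed-grammar cubic parsing does not make universal GPSG recognition tractable.
```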
