Global ETD Search

91	The use of multiple speech recognition hypotheses for natural language understanding. January 2003 (has links) Wang Ying. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 102-104). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Overview --- p.1 / Chapter 1.2 --- Thesis Goals --- p.3 / Chapter 1.3 --- Thesis Outline --- p.3 / Chapter 2 --- Background --- p.4 / Chapter 2.1 --- Speech Recognition --- p.4 / Chapter 2.2 --- Natural Language Understanding --- p.6 / Chapter 2.2.1 --- Rule-based Approach --- p.7 / Chapter 2.2.2 --- Corpus-based Approach --- p.7 / Chapter 2.3 --- Integration of Speech Recognition with NLU --- p.8 / Chapter 2.3.1 --- Word Graph --- p.9 / Chapter 2.3.2 --- N-best List --- p.9 / Chapter 2.4 --- The ATIS Domain --- p.10 / Chapter 2.5 --- Chapter Summary --- p.14 / Chapter 3 --- Generation of Speech Recognition Hypotheses --- p.15 / Chapter 3.1 --- Grammar Development for the OpenSpeech Recognizer --- p.16 / Chapter 3.2 --- Generation of Speech Recognition Hypotheses --- p.22 / Chapter 3.3 --- Evaluation of Speech Recognition Hypotheses --- p.24 / Chapter 3.3.1 --- Recognition Accuracy --- p.24 / Chapter 3.3.2 --- Concept Accuracy --- p.28 / Chapter 3.4 --- Results and Analysis --- p.33 / Chapter 3.5 --- Chapter Summary --- p.38 / Chapter 4 --- Belief Networks for NLU --- p.40 / Chapter 4.1 --- Problem Formulation --- p.40 / Chapter 4.2 --- The Original NLU Framework --- p.41 / Chapter 4.2.1 --- Semantic Tagging --- p.41 / Chapter 4.2.2 --- Concept Selection --- p.42 / Chapter 4.2.3 --- Bayesian Inference --- p.43 / Chapter 4.2.4 --- Thresholding --- p.44 / Chapter 4.2.5 --- Goal Identification --- p.45 / Chapter 4.3 --- Evaluation Method of Goal Identification Performance --- p.45 / Chapter 4.4 --- Baseline Result --- p.48 / Chapter 4.5 --- Chapter Summary --- p.50 / Chapter 5 --- The Effects of Recognition Errors on NLU --- p.51 / Chapter 5.1 
--- Experiments --- p.51 / Chapter 5.1.1 --- Perfect Case: The Use of Transcripts --- p.53 / Chapter 5.1.2 --- Train on Recognition Hypotheses --- p.53 / Chapter 5.1.3 --- Test on Recognition Hypotheses --- p.55 / Chapter 5.1.4 --- Train and Test on Recognition Hypotheses --- p.56 / Chapter 5.2 --- Analysis of Results --- p.60 / Chapter 5.3 --- Chapter Summary --- p.67 / Chapter 6 --- The Use of Multiple Speech Recognition Hypotheses for NLU --- p.69 / Chapter 6.1 --- The Extended NLU Framework --- p.76 / Chapter 6.1.1 --- Semantic Tagging --- p.76 / Chapter 6.1.2 --- Recognition Confidence Score Normalization --- p.77 / Chapter 6.1.3 --- Concept Selection --- p.79 / Chapter 6.1.4 --- Bayesian Inference --- p.80 / Chapter 6.1.5 --- Combination with Confidence Scores --- p.81 / Chapter 6.1.6 --- Thresholding --- p.84 / Chapter 6.1.7 --- Goal Identification --- p.84 / Chapter 6.2 --- Experiments --- p.86 / Chapter 6.2.1 --- The Use of First Best Recognition Hypothesis --- p.86 / Chapter 6.2.2 --- Train on Multiple Recognition Hypotheses --- p.86 / Chapter 6.2.3 --- Test on Multiple Recognition Hypotheses --- p.87 / Chapter 6.2.4 --- Train and Test on Multiple Recognition Hypotheses --- p.88 / Chapter 6.3 --- Significance Testing --- p.90 / Chapter 6.4 --- Result Analysis --- p.91 / Chapter 6.5 --- Chapter Summary --- p.97 / Chapter 7 --- Conclusions and Future Work --- p.98 / Chapter 7.1 --- Conclusions --- p.98 / Chapter 7.2 --- Contribution --- p.99 / Chapter 7.3 --- Future Work --- p.100 / Bibliography --- p.102 / Chapter A --- Speech Recognition Hypotheses Distribution --- p.105 / Chapter B --- Recognition Errors in Three Kinds of Queries --- p.107 / Chapter C --- The Effects of Recognition Errors in N-Best list on NLU --- p.114 / Chapter D --- Training on Multiple Recognition Hypotheses --- p.117 / Chapter E --- Testing on Multiple Recognition Hypotheses --- p.132 / Chapter F --- Hand-designed Grammar For ATIS --- p.139 Automatic speech recognition
92	Application of Boolean Logic to Natural Language Complexity in Political Discourse Taing, Austin 01 January 2019 (has links) Press releases serve as a major influence on public opinion of a politician, since they are a primary means of communicating with the public and directing discussion. Thus, the public’s ability to digest them is an important factor for politicians to consider. This study employs several well-studied measures of linguistic complexity and proposes a new one to examine whether politicians change their language to become more or less difficult to parse in different situations. This study uses 27,500 press releases from the US Senate between 2004–2008 and examines election cycles and natural disasters, namely hurricanes, as situations where politicians’ language may change. We calculate the syntactic complexity measures clauses per sentence, T-unit length, and complex-T ratio, as well as the Automated Readability Index and Flesch Reading Ease of each press release. We also propose a proof-of-concept measure called logical complexity to find if classical Boolean logic can be applied as a practical linguistic complexity measure. We find that language becomes more complex in coastal senators’ press releases concerning hurricanes, but see no significant change for those in election cycles. Our measure shows similar results to the well-established ones, showing that logical complexity is a useful lens for measuring linguistic complexity. linguistic complexity readability natural language processing Computational Linguistics Computer Sciences
93	Chatbot for Information Retrieval from Unstructured Natural Language Documents Fredriksson, Joakim, Höppner, Falk January 2019 (has links) This thesis presents the development of a chatbot that retrieves information from a data source consisting of unstructured natural language text. The project was carried out in collaboration with the company Jayway in Halmstad. Elasticsearch was used to create the search function, and the service Dialogflow was used to process the natural language input from the user. A Python script was created to retrieve the information from the data source, and a request handler was written to connect the tools into a working chatbot. The chatbot answers questions correctly with an accuracy of 72%, according to testing with a sample of n = 25. The testing consisted of asking the chatbot questions and determining whether each answer was correct. Further research could explore how chatbots might help the elderly or people with disabilities use the web through natural dialogue instead of a traditional user interface. chatbot natural language processing Computer Sciences Datavetenskap (datalogi)
94	GeneTUC: Natural Language Understanding in Medical Text Sætre, Rune January 2006 (has links) Natural Language Understanding (NLU) is a 50-year-old research field, but its application to molecular biology literature (BioNLU) is less than 10 years old. After the complete human genome sequence was published by the Human Genome Project and Celera in 2001, there has been an explosion of research, shifting the NLU focus from domains like news articles to the domain of molecular biology and medical literature. BioNLU is needed, since almost 2000 new articles are published and indexed every day, and biologists need to know about existing knowledge regarding their own research. So far, BioNLU results are not as good as in other NLU domains, so more research is needed to solve the challenges of creating useful NLU applications for biologists. The work in this PhD thesis is a “proof of concept”. It is the first to show that an existing Question Answering (QA) system can be successfully applied in the hard BioNLU domain, after the essential challenge of unknown entities is solved. The core contribution is a system that discovers and classifies unknown entities and relations between them automatically. The World Wide Web (through Google) is used as the main resource, and the performance is almost as good as other named entity extraction systems, but the advantage of this approach is that it is much simpler and requires less manual labor than any of the other comparable systems. The first paper in this collection gives an overview of the field of NLU and shows how the Information Extraction (IE) problem can be formulated with Local Grammars. The second paper uses Machine Learning to automatically recognize protein names based on features from the GSearch Engine. 
In the third paper, GSearch is substituted with Google, and the task in this paper is to extract all unknown names belonging to one of 273 biomedical entity classes, like genes, proteins, processes, etc. After getting promising results with Google, the fourth paper shows that this approach can also be used to retrieve interactions or relationships between the named entities. The fifth paper describes an online implementation of the system, and shows that the method scales well to a larger set of entities. The final paper concludes the “proof of concept” research, and shows that the performance of the original GeneTUC NLU system has increased from handling 10% of the sentences in a large collection of abstracts in 2001, to 50% in 2006. This is still not good enough to create a commercial system, but it is believed that another 40% performance gain can be achieved by importing more verb templates into GeneTUC, just like nouns were imported during this work. Work has already begun on this, in the form of a local Master's thesis. Information Extraction (IE) Natural Language Processing (NLP) Bio-informatics
96	Automatic Supervised Thesauri Construction with Roget’s Thesaurus Kennedy, Alistair H 07 December 2012 (has links) Thesauri are important tools for many Natural Language Processing applications. Roget's Thesaurus is particularly useful. It is of high quality and has been in development for over a century and a half. Yet its applications have been limited, largely because the only publicly available edition dates from 1911. This thesis proposes and tests methods of automatically updating the vocabulary of the 1911 Roget’s Thesaurus. I use the Thesaurus as a source of training data in order to learn from Roget’s for the purpose of updating Roget’s. The lexicon is updated in two stages. First, I develop a measure of semantic relatedness that enhances existing distributional techniques. I improve existing methods by using known sets of synonyms from Roget’s to train a distributional measure to better identify near synonyms. Second, I use the new measure of semantic relatedness to find where in Roget’s to place a new word. Existing words from Roget’s are used as training data to tune the parameters of three methods of inserting words. Over 5000 new words and word-senses were added using this process. I conduct two kinds of evaluation on the updated Thesaurus. One is on the procedure for updating Roget’s. This is accomplished by removing some words from the Thesaurus and testing my system's ability to reinsert them in the correct location. Human evaluation of the newly added words is also performed. Annotators must determine whether a newly added word is in the correct location. They found that in most cases the new words were almost indistinguishable from those already existing in Roget's Thesaurus. The second kind of evaluation is to establish the usefulness of the updated Roget’s Thesaurus on actual Natural Language Processing applications. 
These applications include determining semantic relatedness between word pairs or sentence pairs, identifying the best synonym from a set of candidates, solving SAT-style analogy problems, pseudo-word-sense disambiguation, and sentence ranking for text summarization. The updated Thesaurus consistently performed at least as well as, or better than, the original Thesaurus on all these applications. Roget's Thesaurus Natural Language Processing Distributional Semantics Thesauri construction
97	Corpus construction based on Ontological domain knowledge Benis, Nirupama, Kaliyaperumal, Rajaram January 2011 (has links) The purpose of this thesis is to contribute a corpus for sentence-level interpretation of biomedical language. The available corpora for the biomedical domain are small in terms of amount of text and number of predicates. Moreover, these corpora were developed rather intuitively. In this effort, which we call BioOntoFN, we created a corpus from the domain knowledge provided by an ontology. By doing this, we believe that we can provide a rough set of rules for creating corpora from ontologies. We also designed an annotation tool specifically for building our corpus. We built a corpus for biological transport events. The ontology we used is the part of the Gene Ontology pertaining to transport: the term transport (GO:0006810) and all of its child concepts, which could be called a sub-ontology. The annotation of the corpus follows the rules of FrameNet, and the output is annotated text in an XML format similar to that of FrameNet. The text for the corpus is taken from abstracts of MEDLINE articles. The annotation tool is a GUI created using Java. Text mining Biomedical text mining Natural Language Processing
98	Using Rhetorical Figures and Shallow Attributes as a Metric of Intent in Text Strommer, Claus Walter January 2011 (has links) In this thesis we propose a novel metric of document intent evaluation based on the detection and classification of rhetorical figures. In doing so we dispel the notion that rhetoric lacks the structure and consistency necessary to be relevant to computational linguistics. We show how the combination of document attributes available through shallow parsing and rules extracted from the definitions of rhetorical figures produces a metric that can be used to reliably classify the intent of texts. This metric works equally well on entire documents and on portions of a document. rhetoric classification natural language processing computational linguistics epanaphora Computer Science
99	Generating natural language text in response to questions about database structure / McKeown, Kathleen R. January 1900 (has links) Thesis (Ph. D.)--University of Pennsylvania, 1982. / Cover title. Includes bibliographical references and index.
100	Model selection based speaker adaptation and its application to nonnative speech recognition / He, Xiaodong, January 2003 (has links) Thesis (Ph. D.)--University of Missouri-Columbia, 2003. / Typescript. Vita. Includes bibliographical references (leaves 99-110). Also available on the Internet.
