121

Spectral Methods for Natural Language Processing

Stratos, Karl January 2016 (has links)
Many state-of-the-art results in natural language processing (NLP) are achieved with statistical models involving latent variables. Unfortunately, computational problems associated with such models (for instance, finding the optimal parameter values) are typically intractable, forcing practitioners to rely on heuristic methods without strong guarantees. While heuristics are often sufficient for empirical purposes, their de-emphasis on theoretical aspects has certain negative ramifications. First, it can impede the development of rigorous theoretical understanding which can generate new ideas and algorithms. Second, it can lead to black art solutions that are unreliable and difficult to reproduce. In this thesis, we argue that spectral methods (that is, methods that use singular value decomposition or similar matrix or tensor factorizations) can effectively remedy these negative ramifications. To this end, we develop spectral methods for two unsupervised language processing tasks. The first task is learning lexical representations from unannotated text (e.g., hierarchical clustering of a vocabulary). The second task is estimating parameters of latent-variable models used in NLP applications (e.g., for unsupervised part-of-speech tagging). We show that our spectral algorithms have the following advantages over previous methods:
1. The algorithms provide a new theoretical framework that is amenable to rigorous analysis; in particular, they are shown to be statistically consistent.
2. The algorithms are simple to implement, efficient, and scalable to large amounts of data, and they yield results competitive with the state of the art.
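As a toy illustration of the general recipe behind spectral lexical representations (not the thesis's specific algorithms), the sketch below builds word vectors from a truncated SVD of a small co-occurrence matrix; the corpus, window size, and rank are all illustrative choices:

```python
import numpy as np

# Toy corpus; in practice this would be a large unannotated text collection.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog played",
]

# Build a word-by-context co-occurrence matrix (window of 1).
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
C = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                C[idx[w], idx[sent[j]]] += 1

# Rank-k truncated SVD of the count matrix; the left singular vectors,
# scaled by the singular values, serve as low-dimensional lexical
# representations that can then be clustered hierarchically.
U, s, Vt = np.linalg.svd(C, full_matrices=False)
k = 2
embeddings = U[:, :k] * s[:k]

for w in ("cat", "dog", "mat"):
    print(w, np.round(embeddings[idx[w]], 3))
```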
122

Data-Driven Solutions to Bottlenecks in Natural Language Generation

Biran, Or January 2016 (has links)
Concept-to-text generation suffers from what can be called generation bottlenecks: aspects of the generated text which should change across subject domains, and which are usually hard to obtain or require manual work. Examples include domain-specific content, a type system, a dictionary, discourse style, and lexical style. These bottlenecks have stifled attempts to create generation systems that are generic, or that at least apply to a wide range of domains in non-trivial applications. This thesis comprises two parts. In the first, we propose data-driven solutions that automate obtaining the information and models required to solve some of these bottlenecks. Specifically, we present an approach to mining domain-specific paraphrasal templates from a simple text corpus; an approach to extracting a domain-specific taxonomic thesaurus from Wikipedia; and a novel document planning model which determines both ordering and discourse relations, and which can be extracted from a domain corpus. We evaluate each solution individually, independently of its ultimate use in generation, and show significant improvements in each. In the second part of the thesis, we describe a framework for creating generation systems that rely on these solutions, as well as on hybrid concept-to-text and text-to-text generation, and that can be automatically adapted to any domain using only a domain-specific corpus. We illustrate the breadth of applications this framework supports with three examples: biography generation and company description generation, which we use to evaluate the framework itself and the contribution of our solutions; and justification of machine learning predictions, a novel application which we evaluate in a task-based study to show its importance to users.
123

Apply syntactic features in a maximum entropy framework for English and Chinese reading comprehension

Xu, Kui January 2008 (has links)
Automatic reading comprehension (RC) systems integrate various kinds of natural language processing (NLP) technologies to analyze a given passage and generate or extract answers in response to questions about the passage. Previous work applied many NLP technologies, including shallow syntactic analyses (e.g., base noun phrases), semantic analyses (e.g., named entities) and discourse analyses (e.g., pronoun referents), in a bag-of-words (BOW) matching approach. This thesis proposes a novel RC approach that integrates a set of NLP technologies in a maximum entropy (ME) framework to estimate the probability that each candidate sentence answers the question. In contrast to previous RC approaches, which handle English only, the presented approach is the first to cover both English and Chinese, the two languages used by most people in the world. To support the evaluation of bilingual RC systems, a parallel English and Chinese corpus was also designed and developed, with annotations deemed relevant to the RC task included in the corpus. In addition, useful NLP technologies are explored from a new perspective: following pedagogical guidelines for human readers, reading skills are summarized and mapped to various NLP technologies. Practical NLP technologies, categorized as shallow syntactic analyses (i.e., part-of-speech tags, voices and tenses) and deep syntactic analyses (i.e., syntactic parse trees and dependency parse trees), are then selected for integration. The proposed approach is evaluated on an English corpus, namely Remedia, and on our bilingual corpus. The experimental results show that our approach significantly improves RC results on both the English and Chinese corpora. / Adviser: Helen Mei-Ling Meng. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2008. / Includes bibliographical references (leaves 132-141).
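To make the modeling idea concrete, here is a minimal sketch of ME scoring of candidate answer sentences, using the standard equivalence between binary logistic regression and a two-class maximum entropy model; the feature set and data below are invented placeholders, not the thesis's features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each candidate sentence is described by a few illustrative features
# (the thesis's actual features are richer: POS tags, voice, tense,
# syntactic and dependency parse features).
# Columns here: [bag-of-words overlap, POS-tag overlap, same-voice flag]
X_train = np.array([
    [0.9, 0.8, 1],   # answer sentence
    [0.2, 0.3, 0],   # non-answer
    [0.7, 0.9, 1],   # answer sentence
    [0.1, 0.2, 1],   # non-answer
])
y_train = np.array([1, 0, 1, 0])

# A binary logistic-regression model is a two-class maximum entropy
# model: p(answer | features) is proportional to exp(w . f).
me = LogisticRegression().fit(X_train, y_train)

# Score new candidate sentences and pick the most probable answer.
candidates = np.array([[0.8, 0.7, 1], [0.3, 0.1, 0]])
probs = me.predict_proba(candidates)[:, 1]
print("answer probabilities:", np.round(probs, 3),
      "-> pick sentence", int(np.argmax(probs)))
```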
124

Unsupervised learning of Arabic non-concatenative morphology

Khaliq, Bilal January 2015 (has links)
Unsupervised approaches to learning the morphology of a language play an important role in computer processing of language, from both practical and theoretical perspectives, due to their minimal reliance on manually produced linguistic resources and human annotation. Such approaches have been widely researched for the problem of concatenative affixation, but less attention has been paid to the intercalated (non-concatenative) morphology exhibited by Arabic and other Semitic languages. The aim of this research is to learn the root and pattern morphology of Arabic with accuracy comparable to manually built morphological analysis systems. The approach is kept free from human supervision or manual parameter settings, assuming only that roots and patterns intertwine to form a word. Promising results were obtained by applying a technique adapted from previous work in concatenative morphology learning, which uses machine learning to determine relatedness between words. The output, with probabilistic relatedness values between words, was then used to rank all possible roots and patterns to form a lexicon. Analysis using trilateral roots resulted in correct root identification accuracy of approximately 86% for inflected words. Although the machine learning-based approach is effective, it is conceptually complex. An alternative, simpler and computationally more efficient approach was therefore devised, which obtains morpheme scores from comparative counts of roots and patterns. In this approach, root and pattern scores are defined in terms of each other in a mutually recursive relationship, converging to an optimized morpheme ranking. This technique gives slightly better accuracy while being conceptually simpler and more efficient. The approach, after further enhancements, was evaluated on a version of the Quranic Arabic Corpus, attaining a final accuracy of approximately 93%. A comparative evaluation shows this to be superior to two existing, widely used, manually built Arabic stemmers, demonstrating the practical feasibility of unsupervised learning of non-concatenative morphology.
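A simplified sketch of the mutually recursive scoring idea, assuming trilateral roots and treating every choice of three letters as a candidate root; the toy romanized vocabulary, the uniform initialization, and the normalization scheme are illustrative assumptions, not the thesis's exact algorithm:

```python
from collections import defaultdict
from itertools import combinations

# Toy vocabulary of Arabic-like words, romanized for readability;
# real input would be undiacritized Arabic script.
words = ["kataba", "kitaab", "maktab", "darasa", "madrasa", "dars"]

def segmentations(word):
    """Yield (root, pattern) pairs: each choice of 3 letters forms a
    candidate trilateral root; the remaining letters, with '*' marking
    the root slots, form the pattern."""
    positions = range(len(word))
    for combo in combinations(positions, 3):
        root = "".join(word[i] for i in combo)
        pattern = "".join("*" if i in combo else word[i] for i in positions)
        yield root, pattern

# Mutually recursive scoring: a root is good if it co-occurs with good
# patterns, and vice versa. Initialize uniformly and iterate toward a
# fixed point, normalizing so scores stay bounded.
root_score = defaultdict(lambda: 1.0)
pat_score = defaultdict(lambda: 1.0)
for _ in range(10):
    new_root, new_pat = defaultdict(float), defaultdict(float)
    for w in words:
        for r, p in segmentations(w):
            new_root[r] += pat_score[p]
            new_pat[p] += root_score[r]
    zr, zp = sum(new_root.values()), sum(new_pat.values())
    root_score = defaultdict(float, {r: v / zr for r, v in new_root.items()})
    pat_score = defaultdict(float, {p: v / zp for p, v in new_pat.items()})

# Analyze a word by picking its highest-scoring (root, pattern) split.
for w in ["maktab", "madrasa"]:
    best = max(segmentations(w),
               key=lambda rp: root_score[rp[0]] * pat_score[rp[1]])
    print(w, "->", best)
```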
125

A natural language based indexing technique for Chinese information retrieval.

Pang Chun Kiu January 1997 (has links)
Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. / Includes bibliographical references (leaves 101-107). / Table of contents:

Chapter 1 Introduction --- p.2
  1.1 Chinese Indexing using Noun Phrases --- p.6
  1.2 Objectives --- p.8
  1.3 An Overview of the Thesis --- p.8
Chapter 2 Background --- p.10
  2.1 Technology Influences on Information Retrieval --- p.10
  2.2 Related Work --- p.13
    2.2.1 Statistical/Keyword Approaches --- p.13
    2.2.2 Syntactical approaches --- p.15
    2.2.3 Semantic approaches --- p.17
    2.2.4 Noun Phrases Approach --- p.18
    2.2.5 Chinese Information Retrieval --- p.20
  2.3 Our Approach --- p.21
Chapter 3 Chinese Noun Phrases --- p.23
  3.1 Different types of Chinese Noun Phrases --- p.23
  3.2 Ambiguous noun phrases --- p.27
    3.2.1 Ambiguous English Noun Phrases --- p.27
    3.2.2 Ambiguous Chinese Noun Phrases --- p.28
    3.2.3 Statistical data on the three NPs --- p.33
Chapter 4 Index Extraction from De-de Conj. NP --- p.35
  4.1 Word Segmentation --- p.36
  4.2 Part-of-speech tagging --- p.37
  4.3 Noun Phrase Extraction --- p.37
  4.4 The Chinese noun phrase partial parser --- p.38
  4.5 Handling Parsing Ambiguity --- p.40
  4.6 Index Building Strategy --- p.41
  4.7 The cross-set generation rules --- p.44
  4.8 Example 1: Indexing De-de NP --- p.46
  4.9 Example 2: Indexing Conjunctive NP --- p.48
  4.10 Experimental results and Discussion --- p.49
Chapter 5 Indexing Compound Nouns --- p.52
  5.1 Previous Researches on Compound Nouns --- p.53
  5.2 Indexing two-term Compound Nouns --- p.55
    5.2.1 About the thesaurus《同義詞詞林》--- p.56
  5.3 Indexing Compound Nouns of three or more terms --- p.58
  5.4 Corpus learning approach --- p.59
    5.4.1 An Example --- p.60
    5.4.2 Experimental Setup --- p.63
    5.4.3 An Experiment using the third level of the Cilin --- p.65
    5.4.4 An Experiment using the second level of the Cilin --- p.66
  5.5 Contextual Approach --- p.68
    5.5.1 The algorithm --- p.69
    5.5.2 An Illustrative Example --- p.71
    5.5.3 Experiments on compound nouns --- p.72
    5.5.4 Experiment I: Word Distance Based Extraction --- p.73
    5.5.5 Experiment II: Semantic Class Based Extraction --- p.75
    5.5.6 Experiments III: On different boundaries --- p.76
    5.5.7 The Final Algorithm --- p.79
    5.5.8 Experiments on other compounds --- p.82
    5.5.9 Discussion --- p.83
Chapter 6 Overall Effectiveness --- p.85
  6.1 Illustrative Example for the Integrated Algorithm --- p.86
  6.2 Experimental Setup --- p.90
  6.3 Experimental Results & Discussion --- p.91
Chapter 7 Conclusion --- p.95
  7.1 Summary --- p.95
  7.2 Contributions --- p.97
  7.3 Future Directions --- p.98
    7.3.1 Word-sense determination --- p.98
    7.3.2 Hybrid approach for compound noun indexing --- p.99
Appendix A Cross-set Generation Rules --- p.108
Appendix B Tag set by Tsinghua University --- p.110
Appendix C Noun Phrases Test Set --- p.113
Appendix D Compound Nouns Test Set --- p.124
  D.1 Three-term Compound Nouns --- p.125
    D.1.1 NVN --- p.125
    D.1.2 Other three-term compound nouns --- p.129
  D.2 Four-term Compound Nouns --- p.133
  D.3 Five-term and six-term Compound Nouns --- p.134
126

The use of multiple speech recognition hypotheses for natural language understanding.

Wang Ying January 2003 (has links)
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 102-104). / Abstracts in English and Chinese. / Table of contents:

Chapter 1 Introduction --- p.1
  1.1 Overview --- p.1
  1.2 Thesis Goals --- p.3
  1.3 Thesis Outline --- p.3
Chapter 2 Background --- p.4
  2.1 Speech Recognition --- p.4
  2.2 Natural Language Understanding --- p.6
    2.2.1 Rule-based Approach --- p.7
    2.2.2 Corpus-based Approach --- p.7
  2.3 Integration of Speech Recognition with NLU --- p.8
    2.3.1 Word Graph --- p.9
    2.3.2 N-best List --- p.9
  2.4 The ATIS Domain --- p.10
  2.5 Chapter Summary --- p.14
Chapter 3 Generation of Speech Recognition Hypotheses --- p.15
  3.1 Grammar Development for the OpenSpeech Recognizer --- p.16
  3.2 Generation of Speech Recognition Hypotheses --- p.22
  3.3 Evaluation of Speech Recognition Hypotheses --- p.24
    3.3.1 Recognition Accuracy --- p.24
    3.3.2 Concept Accuracy --- p.28
  3.4 Results and Analysis --- p.33
  3.5 Chapter Summary --- p.38
Chapter 4 Belief Networks for NLU --- p.40
  4.1 Problem Formulation --- p.40
  4.2 The Original NLU Framework --- p.41
    4.2.1 Semantic Tagging --- p.41
    4.2.2 Concept Selection --- p.42
    4.2.3 Bayesian Inference --- p.43
    4.2.4 Thresholding --- p.44
    4.2.5 Goal Identification --- p.45
  4.3 Evaluation Method of Goal Identification Performance --- p.45
  4.4 Baseline Result --- p.48
  4.5 Chapter Summary --- p.50
Chapter 5 The Effects of Recognition Errors on NLU --- p.51
  5.1 Experiments --- p.51
    5.1.1 Perfect Case: The Use of Transcripts --- p.53
    5.1.2 Train on Recognition Hypotheses --- p.53
    5.1.3 Test on Recognition Hypotheses --- p.55
    5.1.4 Train and Test on Recognition Hypotheses --- p.56
  5.2 Analysis of Results --- p.60
  5.3 Chapter Summary --- p.67
Chapter 6 The Use of Multiple Speech Recognition Hypotheses for NLU --- p.69
  6.1 The Extended NLU Framework --- p.76
    6.1.1 Semantic Tagging --- p.76
    6.1.2 Recognition Confidence Score Normalization --- p.77
    6.1.3 Concept Selection --- p.79
    6.1.4 Bayesian Inference --- p.80
    6.1.5 Combination with Confidence Scores --- p.81
    6.1.6 Thresholding --- p.84
    6.1.7 Goal Identification --- p.84
  6.2 Experiments --- p.86
    6.2.1 The Use of First Best Recognition Hypothesis --- p.86
    6.2.2 Train on Multiple Recognition Hypotheses --- p.86
    6.2.3 Test on Multiple Recognition Hypotheses --- p.87
    6.2.4 Train and Test on Multiple Recognition Hypotheses --- p.88
  6.3 Significance Testing --- p.90
  6.4 Result Analysis --- p.91
  6.5 Chapter Summary --- p.97
Chapter 7 Conclusions and Future Work --- p.98
  7.1 Conclusions --- p.98
  7.2 Contribution --- p.99
  7.3 Future Work --- p.100
Bibliography --- p.102
Appendix A Speech Recognition Hypotheses Distribution --- p.105
Appendix B Recognition Errors in Three Kinds of Queries --- p.107
Appendix C The Effects of Recognition Errors in N-Best list on NLU --- p.114
Appendix D Training on Multiple Recognition Hypotheses --- p.117
Appendix E Testing on Multiple Recognition Hypotheses --- p.132
Appendix F Hand-designed Grammar For ATIS --- p.139
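A rough sketch of the core combination step suggested by the outline above: normalize recognition confidence scores across the N-best list, then weight each hypothesis's goal posterior by its normalized confidence. The toy data, the keyword stand-in for the Belief Network goal identifier, and the sum-normalization are illustrative assumptions, not the thesis's exact scheme:

```python
import numpy as np

# An N-best list from the recognizer: hypothesis text with raw
# confidence scores (values invented for illustration).
nbest = [
    ("show me flights from boston to denver", 0.62),
    ("show me flights from austin to denver", 0.21),
    ("show me lights from boston to denver", 0.08),
]

def goal_posterior(hypothesis):
    """Stand-in for the Belief Network goal identifier: return
    p(goal | hypothesis) for each goal via a toy keyword rule."""
    p_flight = 0.9 if "flights" in hypothesis else 0.3
    return {"flight_info": p_flight, "other": 1.0 - p_flight}

# Normalize recognition scores across the N-best list so they can be
# treated as hypothesis weights.
scores = np.array([s for _, s in nbest])
weights = scores / scores.sum()

# Combine: weight each hypothesis's goal posterior by its normalized
# recognition confidence and sum over the N-best list.
combined = {}
for (hyp, _), w in zip(nbest, weights):
    for goal, p in goal_posterior(hyp).items():
        combined[goal] = combined.get(goal, 0.0) + w * p

print(max(combined, key=combined.get), combined)
```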
127

Application of Boolean Logic to Natural Language Complexity in Political Discourse

Taing, Austin 01 January 2019 (has links)
Press releases are a major influence on public opinion of a politician, since they are a primary means of communicating with the public and directing discussion. The public's ability to digest them is therefore an important factor for politicians to consider. This study employs several well-studied measures of linguistic complexity, and proposes a new one, to examine whether politicians change their language to become more or less difficult to parse in different situations. Using 27,500 press releases from the US Senate between 2004 and 2008, we examine election cycles and natural disasters, namely hurricanes, as situations where politicians' language may change. We calculate the syntactic complexity measures clauses per sentence, T-unit length, and complex-T ratio, as well as the Automated Readability Index and Flesch Reading Ease, for each press release. We also propose a proof-of-concept measure called logical complexity, to test whether classical Boolean logic can serve as a practical measure of linguistic complexity. We find that language becomes more complex in coastal senators' press releases concerning hurricanes, but see no significant change around election cycles. Our measure produces results similar to the well-established ones, suggesting that logical complexity is a useful lens for measuring linguistic complexity.
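For reference, the two readability formulas named above are standard and straightforward to compute; the sketch below implements them with a crude vowel-group syllable heuristic (real implementations use pronunciation dictionaries such as CMUdict), and the sample text is invented:

```python
import re

def ari(text):
    """Automated Readability Index:
    4.71*(chars/words) + 0.5*(words/sentences) - 21.43"""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = text.split()
    chars = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / sentences - 21.43

def count_syllables(word):
    """Crude syllable estimate: count vowel groups."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease:
    206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)"""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = text.split()
    syllables = sum(count_syllables(w) for w in words)
    return (206.835 - 1.015 * len(words) / sentences
            - 84.6 * syllables / len(words))

release = ("The senator urged residents to prepare for the hurricane. "
           "Federal assistance will be made available to affected counties.")
print(round(ari(release), 1), round(flesch_reading_ease(release), 1))
```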
128

A System for Natural Language Unmarked Clausal Transformations in Text-to-Text Applications

Miller, Daniel 01 June 2009 (has links)
A system is proposed which separates clauses from complex sentences into simpler stand-alone sentences. This is useful as an initial step on raw text, where the resulting processed text may be fed into text-to-text applications such as Automatic Summarization, Question Answering, and Machine Translation, all of which have difficulty processing complex sentences. Grammatical natural language transformations provide a possible method of simplifying complex sentences to enhance the results of such applications. Using shallow parsing, this system improves on existing systems' ability to identify and separate marked and unmarked embedded clauses in complex sentence structures, producing a syntactically simplified source for further processing.
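As a toy, regex-level illustration of clause separation (the proposed system instead uses shallow parsing and also handles unmarked clauses), the following splits one non-restrictive relative-clause pattern into two stand-alone sentences:

```python
import re

def split_nonrestrictive(sentence):
    """Split one pattern, 'NP, which/who VP, REST', into two
    stand-alone sentences: 'NP REST' and 'NP VP.'"""
    m = re.match(r"(.*?)(\w+), (?:which|who) ([^,]+), (.*)", sentence)
    if not m:
        return [sentence]
    pre, head, clause, rest = m.groups()
    main = f"{pre}{head} {rest}"
    extracted = f"{pre}{head} {clause}.".strip()
    return [main, extracted[0].upper() + extracted[1:]]

s = "The bill, which passed the Senate last week, now heads to the House."
for out in split_nonrestrictive(s):
    print(out)
# -> The bill now heads to the House.
# -> The bill passed the Senate last week.
```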
129

Chatbot for Information Retrieval from Unstructured Natural Language Documents

Fredriksson, Joakim, Höppner, Falk January 2019 (has links)
This thesis presents the development of a chatbot which retrieves information from a data source consisting of unstructured natural language text. The project was made in collaboration with the company Jayway in Halmstad. Elasticsearch was used to create the search function, and the service Dialogflow was used to process the natural language input from the user. A Python script was created to retrieve the information from the data source, and a request handler was written to connect the tools into a working chatbot. The chatbot answers questions correctly with an accuracy of 72% in testing with a sample of n = 25, where testing consisted of asking the chatbot questions and determining whether each answer was correct. Further research could explore how chatbots might help the elderly or people with disabilities use the web through natural dialogue instead of a traditional user interface.
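A minimal sketch of how such a pipeline fits together, using Flask for the webhook and the Elasticsearch Python client (8.x-style API); the index name, field names, and fallback answer are assumptions for illustration, not details from the thesis:

```python
from elasticsearch import Elasticsearch
from flask import Flask, jsonify, request

app = Flask(__name__)
es = Elasticsearch("http://localhost:9200")

@app.route("/webhook", methods=["POST"])
def webhook():
    # Dialogflow's v2 webhook request carries the user's text in
    # queryResult.queryText.
    query_text = request.get_json()["queryResult"]["queryText"]

    # Full-text search over the indexed unstructured documents.
    result = es.search(index="documents",
                       query={"match": {"content": query_text}},
                       size=1)
    hits = result["hits"]["hits"]
    answer = hits[0]["_source"]["content"] if hits else "Sorry, I don't know."

    # Dialogflow reads the reply from fulfillmentText.
    return jsonify({"fulfillmentText": answer})

if __name__ == "__main__":
    app.run(port=5000)
```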
130

Semantic annotation of Chinese texts with message structures based on HowNet

Wong, Ping-wai. January 2007 (has links)
Thesis (Ph.D.)--University of Hong Kong, 2007.
