Global ETD Search

391	Using Zipf Frequencies As A Representativeness Measure In Statistical Active Learning Of Natural Language Cobanoglu, Onur 01 June 2008 (has links) (PDF) Active learning has proven to be a successful strategy for the quick development of corpora to be used in statistical induction of natural language. The vast majority of studies in this field have concentrated on finding and testing various informativeness measures for samples; however, representativeness measures for samples have not been thoroughly studied. In this thesis, we introduce a novel representativeness measure which, being based on Zipf's law, is model-independent and validated both theoretically and empirically. Experiments conducted on the WSJ corpus with a wide-coverage parser show that our representativeness measure leads to better performance than previously introduced representativeness measures when used with most of the known informativeness measures. QA General Works 36-39
392	Sentiment Analysis In Turkish Erogul, Umut 01 June 2009 (has links) (PDF) Sentiment analysis is the automatic classification of a text, trying to determine the attitude of the writer with respect to a specific topic. The attitude may be either their judgment or evaluation, their feelings or the intended emotional communication. The recent increase in the use of review sites and blogs has made a great amount of subjective data available. Nowadays, it is nearly impossible to manually process all the relevant data available, and as a consequence, the importance given to the automatic classification of unformatted data has increased. To date, all of the research carried out on sentiment analysis has focused on the English language. In this thesis, two Turkish datasets tagged with sentiment information are introduced and existing methods for English are applied to these datasets. This thesis also suggests new methods for Turkish sentiment analysis. QA General 15707
393	Ontology Based Information Extraction On Free Text Radiological Reports Using Natural Language Processing Approach Soysal, Ergin 01 September 2010 (has links) (PDF) This thesis describes an information extraction system that is designed to process free text Turkish radiology reports in order to extract and convert the available information into a structured information model. The system uses natural language processing techniques together with a domain ontology in order to transform the verbal descriptions into a target information model, so that they can be used for computational purposes. The developed domain ontology is effectively used in the entity recognition and relation extraction phases of the information extraction task. The ontology provides flexibility in the design of extraction rules, and the structure of the ontology also determines the information model that describes the structure of the extracted semantic information. In addition, some of the missing terms in the sentences are identified with the help of the ontology. One of the main contributions of this thesis is the use of an ontology in information extraction, which increases the expressive power of extraction rules and helps to determine missing items in the sentences. The system is the first information extraction system for Turkish texts. Since Turkish is a morphologically rich language, the system uses a morphological analyzer and the extraction rules are also based on the morphological features. TRIES achieved 93% recall and 98% precision in the performance evaluations. T Information Technology 58.5-58.64
394	Chinese to English machine translation using SNePS as an interlingua Liao, Min-Hung. January 1997 (has links) Thesis (M.A.)--State University of New York at Buffalo, 1997. / Includes bibliographical references (leaves 172-174). Also available in print.
395	Understanding acknowledgments / Ward, Karen, January 2001 (has links) Thesis (Ph. D.)--Oregon Graduate Institute, 2001.
396	Generating documents by means of computational registers Oldham, Joseph Dowell. January 2000 (has links) (PDF) Thesis (Ph. D.)--University of Kentucky, 2000. / Title from document title page. Document formatted into pages; contains ix, 169 p. : ill. Includes abstract. Includes bibliographical references (p. 160-167).
397	The use of prosodic features in Chinese speech recognition and spoken language processing / Wong, Jimmy Pui Fung. January 2003 (has links) Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2003. / Includes bibliographical references (leaves 97-101). Also available in electronic version. Access restricted to campus users.
398	Learning language from ambiguous perceptual context Chen, David Lieh-Chiang 05 July 2012 (has links) Building a computer system that can understand human languages has been one of the long-standing goals of artificial intelligence. Currently, most state-of-the-art natural language processing (NLP) systems use statistical machine learning methods to extract linguistic knowledge from large, annotated corpora. However, constructing such corpora can be expensive and time-consuming due to the expertise required to annotate such data. In this thesis, we explore alternative ways of learning which do not rely on direct human supervision. In particular, we draw our inspiration from the fact that humans are able to learn language through exposure to linguistic inputs in the context of a rich, relevant, perceptual environment. We first present a system that learned to sportscast for RoboCup simulation games by observing how humans commentate a game. Using the simple assumption that people generally talk about events that have just occurred, we pair each textual comment with a set of events that it could be referring to. By applying an EM-like algorithm, the system simultaneously learns a grounded language model and aligns each description to the corresponding event. The system does not use any prior language knowledge and was able to learn to sportscast in both English and Korean. Human evaluations of the generated commentaries indicate they are of reasonable quality and in some cases even on par with those produced by humans. For the sportscasting task, while each comment could be aligned to one of several events, the level of ambiguity was low enough that we could enumerate all the possible alignments. However, it is not always possible to restrict the set of possible alignments to such limited numbers. Thus, we present another system that allows each sentence to be aligned to one of exponentially many connected subgraphs without explicitly enumerating them.
The system first learns a lexicon and uses it to prune the nodes in the graph that are unrelated to the words in the sentence. By only observing how humans follow navigation instructions, the system was able to infer the corresponding hidden navigation plans and parse previously unseen instructions in new environments for both English and Chinese data. With the rise in popularity of crowdsourcing, we also present results on collecting additional training data using Amazon’s Mechanical Turk. Since our system only needs supervision in the form of language being used in relevant contexts, it is easy for virtually anyone to contribute to the training data. / text Natural language processing; Natural language learning; Connecting language and perception; Machine learning; Artificial intelligence
399	Grounded language learning models for ambiguous supervision Kim, Joo Hyun, active 2013 30 January 2014 (has links) Communicating through natural language interfaces is a long-standing, ultimate goal for artificial intelligence (AI) agents. One core issue on the way to this goal is "grounded" language learning, a process of learning the semantics of natural language with respect to relevant perceptual inputs. In order to ground the meanings of language in a real world situation, computational systems are trained with data in the form of natural language sentences paired with relevant but ambiguous perceptual contexts. With such ambiguous supervision, the ambiguity between a natural language (NL) sentence and a corresponding set of possible logical meaning representations (MRs) must be resolved. In this thesis, we focus on devising effective models for simultaneously disambiguating such supervision and learning the underlying semantics of language to map NL sentences into proper logical MRs. We present probabilistic generative models for learning such correspondences along with a reranking model to improve the performance further. First, we present a probabilistic generative model that learns the mappings from NL sentences into logical forms where the true meaning of each NL sentence is one of a handful of candidate logical MRs. It simultaneously disambiguates the meaning of each sentence in the training data and learns to probabilistically map an NL sentence to its corresponding MR form depicted in a single tree structure. We perform evaluations on the RoboCup sportscasting corpus, proving that our model is more effective than those proposed by previous researchers. Next, we describe two PCFG induction models for grounded language learning that extend the previous grounded language learning model of Börschinger, Jones, and Johnson (2011).
Börschinger et al.’s approach works well in situations of limited ambiguity, such as in the sportscasting task. However, it does not scale well to highly ambiguous situations where there are large sets of possible meanings for each sentence, such as in the navigation instruction following task first studied by Chen and Mooney (2011). The two models we present overcome such limitations by employing a learned semantic lexicon as a basic correspondence unit between NL and MR for PCFG rule generation. Finally, we present a method of adapting discriminative reranking to grounded language learning in order to improve the performance of our proposed generative models. Although such generative models are intuitive and easy to implement, they do not always perform best, since they maximize the joint probability of data and model rather than directly maximizing conditional probability. Because we do not have gold-standard references for training a secondary conditional reranker, we incorporate weak supervision of evaluations against the perceptual world during the process of improving model performance. All these approaches are evaluated on the two publicly available domains that have been actively used in many other grounded language learning studies. Our methods demonstrate consistently improved performance over those of previous studies in the domains with different languages; this proves that our methods are language-independent and can be generally applied to other grounded learning problems as well. Further possible applications of the presented approaches include summarized machine translation tasks and learning from real perception data assisted by computer vision and robotics. / text Grounded language learning; Semantic parsing; Learning from ambiguous supervision; Probabilistic alignment; Natural language processing
400	Retrieving information from heterogeneous freight data sources to answer natural language queries Seedah, Dan Paapanyin Kofi 09 February 2015 (has links) The ability to retrieve accurate information from databases without an extensive knowledge of the contents and organization of each database is extremely beneficial to the dissemination and utilization of freight data. The challenges, however, are: 1) correctly identifying only the relevant information and keywords from questions when dealing with multiple sentence structures, and 2) automatically retrieving, preprocessing, and understanding multiple data sources to determine the best answer to a user’s query. Current named entity recognition systems have the ability to identify entities but require an annotated corpus for training, which does not currently exist in the field of transportation planning. A hybrid approach which combines multiple models to classify specific named entities was therefore proposed as an alternative. The retrieval and classification of freight related keywords facilitated the process of finding which databases are capable of answering a question. Values in data dictionaries can be queried by mapping keywords to data element fields in various freight databases using ontologies. A number of challenges still arise as a result of different entities sharing the same names, the same entity having multiple names, and differences in classification systems. Dealing with ambiguities is required to accurately determine which database provides the best answer from the list of applicable sources.
This dissertation 1) develops an approach to identifying and classifying keywords from freight related natural language queries, 2) develops a standardized knowledge representation of freight data sources using an ontology that both computer systems and domain experts can utilize to identify relevant freight data sources, and 3) provides recommendations for addressing ambiguities in freight related named entities. Finally, the use of knowledge base expert systems to intelligently sift through data sources to determine which ones provide the best answer to a user’s question is proposed. / text Freight data; Heterogeneous data sources; Freight ontology; Natural language processing; Ambiguity; Disambiguation; Knowledge systems
