
Exploiting Vocabulary, Morphological, and Subtree Knowledge to Improve Chinese Syntactic Analysis / 語彙的、形態的、および部分木知識を用いた中国語構文解析の精度向上

Shen, Mo 23 March 2016 (has links)
In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Kyoto University's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink. / Kyoto University / 0048 / New-system doctorate by coursework / Doctor of Informatics / Thesis no. 甲第19848号 / 情博第599号 / 新制||情||104 (University Library) / 32884 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / Examiners: Associate Professor Daisuke Kawahara (chief), Professor Sadao Kurohashi, Professor Hisashi Kashima / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM

High-quality Knowledge Acquisition of Predicate-argument Structures for Syntactic and Semantic Analysis / 構文・意味解析のための高品質な述語項構造知識の獲得

Jin, Gongye 23 March 2016 (has links)
If the author of the published paper digitizes the paper and releases it to third parties via digital media such as computer networks or CD-ROMs, the volume, number, and pages of the Journal of Natural Language Processing publication must be clearly indicated for all viewers. / Kyoto University / 0048 / New-system doctorate by coursework / Doctor of Informatics / Thesis no. 甲第19850号 / 情博第601号 / 新制||情||105 (University Library) / 32886 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / Examiners: Associate Professor Daisuke Kawahara (chief), Professor Sadao Kurohashi, Professor Tatsuya Kawahara / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM

Weighted Aspects for Sentiment Analysis

Byungkyu Yoo (14216267) 05 December 2022 (has links)
When people write a review about a business, they write and rate it based on their personal experience of the business. Sentiment analysis is a natural language processing technique that determines the sentiment of text, including reviews. However, unlike computers, humans draw on personal experience, emphasizing the preferences and observations they deem important while ignoring components that matter less to them personally. Traditional sentiment analysis does not consider such preferences. To utilize these human preferences in sentiment analysis, this paper explores various methods of weighting aspects in an attempt to improve sentiment analysis accuracy. Two types of methods are considered. The first applies human preference by assigning weights to aspects when calculating the overall sentiment. The second uses the results of the first to improve the accuracy of traditional supervised sentiment analysis. The results show that the methods have high accuracy when people have strong opinions, but the aspect weights do not significantly improve the accuracy.
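The first type of method described above can be sketched minimally: per-aspect sentiment scores are combined into an overall score using reviewer-specific importance weights. All names, score ranges, and weight values here are illustrative assumptions, not the paper's actual formulation.

```python
def overall_sentiment(aspect_scores, weights):
    """Combine per-aspect sentiment scores (e.g. in [-1, 1]) into an
    overall score, weighting each aspect by its importance to the reviewer.
    Aspects without an explicit weight default to 1.0."""
    total = sum(weights.get(a, 1.0) for a in aspect_scores)
    if total == 0:
        return 0.0
    return sum(s * weights.get(a, 1.0) for a, s in aspect_scores.items()) / total

# A reviewer who cares mostly about food in a restaurant review:
scores = {"food": 0.9, "service": -0.4, "price": 0.1}
weights = {"food": 3.0, "service": 1.0, "price": 1.0}
print(overall_sentiment(scores, weights))  # weighted mean, approximately 0.48
```

With strong weights on one aspect, the overall score tracks that aspect's sentiment closely, which mirrors the finding that the approach works best when reviewers hold strong opinions.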

Methods for measuring semantic similarity of texts

Gaona, Miguel Angel Rios January 2014 (has links)
Measuring semantic similarity is a task needed in many Natural Language Processing (NLP) applications. For example, in Machine Translation evaluation, semantic similarity is used to assess the quality of the machine translation output by measuring the degree of equivalence between a reference translation and the machine translation output. The problem of semantic similarity (Corley and Mihalcea, 2005) is defined as measuring and recognising semantic relations between two texts. Semantic similarity covers different types of semantic relations, mainly bidirectional and directional. This thesis proposes new methods to address the limitations of existing work on both types of semantic relations. Recognising Textual Entailment (RTE) is a directional relation where a text T entails the hypothesis H (entailment pair) if the meaning of H can be inferred from the meaning of T (Dagan and Glickman, 2005; Dagan et al., 2013). Most RTE methods rely on machine learning algorithms. de Marneffe et al. (2006) propose a multi-stage architecture where a first stage determines an alignment between the T-H pairs, to be followed by an entailment decision stage. A limitation of such approaches is that instead of recognising a non-entailment, an alignment that fits an optimisation criterion will be returned, but the alignment by itself is a poor predictor of non-entailment. We propose an RTE method following a multi-stage architecture, where both stages are based on semantic representations. Furthermore, instead of using simple similarity metrics to predict the entailment decision, we use a Markov Logic Network (MLN). The MLN is based on rich relational features extracted from the output of the predicate-argument alignment structures between T-H pairs. This MLN learns to reward pairs with similar predicates and similar arguments, and to penalise pairs otherwise. The proposed methods show promising results. A source of errors was found to be the alignment step, which has low coverage. However, we show that when an alignment is found, the relational features improve the final entailment decision. The task of Semantic Textual Similarity (STS) (Agirre et al., 2012) is defined as measuring the degree of bidirectional semantic equivalence between a pair of texts. The STS evaluation campaigns use datasets that consist of pairs of texts from NLP tasks such as Paraphrasing and Machine Translation evaluation. Methods for STS are commonly based on computing similarity metrics between the pair of sentences, where the similarity scores are used as features to train regression algorithms. Existing methods for STS achieve high performance on certain tasks, but poor results on others, particularly on unknown (surprise) tasks. Our solution to alleviate this unbalanced performance is to model STS in the context of Multi-task Learning using Gaussian Processes (MTL-GP) (Álvarez et al., 2012) and state-of-the-art STS features (Šarić et al., 2012). We show that the MTL-GP outperforms previous work on the same datasets.
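The similarity-features-plus-regression pipeline for STS can be illustrated with two toy lexical features of the kind such systems feed to a regressor; the feature choice here is a simplifying assumption, not the thesis's actual feature set.

```python
import math
from collections import Counter

def sts_features(text1, text2):
    """Two simple lexical features commonly used as STS regression inputs:
    token-level cosine similarity and Jaccard word overlap."""
    a, b = Counter(text1.lower().split()), Counter(text2.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values())) *
            math.sqrt(sum(v * v for v in b.values())))
    cosine = dot / norm if norm else 0.0
    union = set(a) | set(b)
    jaccard = len(set(a) & set(b)) / len(union) if union else 0.0
    return [cosine, jaccard]

# Identical sentences score at the top of both features:
print(sts_features("the cat sat", "the cat sat"))  # approximately [1.0, 1.0]
```

In a full system, vectors of such features for each sentence pair would be passed to a regression model trained against gold similarity judgements.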

Generation of referring expressions for an unknown audience

Kutlák, Roman January 2014 (has links)
When computers generate text, they have to consider how to describe the entities mentioned in the text. This becomes more difficult when the audience is unknown, as it is not clear what information is available to the addressees. This thesis investigates the generation of descriptions in situations where an algorithm does not have a precise model of the addressee's knowledge. It starts with the collection and analysis of a corpus of descriptions of famous people. The analysis of the corpus revealed a number of useful patterns, which informed the remainder of the thesis. One difficult question is how to choose information that helps addressees identify the described person. This thesis introduces a corpus-based method for determining which properties are more likely to be known by the addressees, and a probability-based method for identifying properties that are distinguishing. One pattern observed in the collected corpus is the inclusion of multiple properties, each of which uniquely identifies the referent. This thesis introduces a novel corpus-based method for determining how many properties to include in a description. Finally, a number of algorithms that leverage the findings of the corpus analysis, together with their computational implementation, are proposed and tested in an evaluation involving human participants. The proposed algorithms outperformed the Incremental Algorithm both in the number of correctly identified referents and in providing a better mental image of the referent. The main contributions of this thesis are: (1) a corpus-based analysis of descriptions produced for an unknown audience; (2) a computational heuristic for estimating what information is likely to be known to addressees; and (3) algorithms that can generate referring expressions that benefit addressees without an explicit model of the addressee's knowledge.
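The combination of distinguishing properties and estimated audience familiarity can be sketched as follows; the function, the familiarity scores, and the example referents are hypothetical illustrations, not the thesis's algorithms or data.

```python
def describe(target_props, other_referents, familiarity, k=2):
    """Choose up to k properties that uniquely identify the target
    (no other candidate referent has them), preferring properties the
    audience is most likely to know, per a corpus-derived familiarity score."""
    shared = set().union(*other_referents) if other_referents else set()
    distinguishing = target_props - shared
    return sorted(distinguishing, key=lambda p: -familiarity.get(p, 0.0))[:k]

# Describing a famous physicist among other candidate referents:
einstein = {"physicist", "german-born", "relativity", "nobel-laureate"}
others = [{"physicist", "danish", "nobel-laureate"}, {"german-born", "composer"}]
familiarity = {"relativity": 0.9, "nobel-laureate": 0.6}
print(describe(einstein, others, familiarity, k=1))  # ['relativity']
```

Raising `k` corresponds to the corpus pattern of including several independently identifying properties rather than a single minimal one.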

Representations of spatial location in language processing

Apel, Jens January 2010 (has links)
The production or comprehension of linguistic information is often not an isolated task decoupled from the visual environment. Rather, people refer to objects or listen to other people describing objects around them. Previous studies have shown that in such situations people either fixate these objects, often multiple times (Cooper, 1974), or they attend to the objects much longer than is required for mere identification (Meyer, Sleiderink, & Levelt, 1998). Most interestingly, during comprehension people also attend to the location of objects even when those objects have been removed (Altmann, 2004). The main focus of this thesis was to investigate the role of the spatial location of objects during language processing. The first part of the thesis tested whether attention to objects’ former locations facilitates language production and comprehension processes (Experiments 1-5). In two initial eye-tracking experiments, participants were instructed to name objects that either changed their positions (Experiment 1) or were withdrawn from the computer screen (Experiment 2) during language production. Production was impaired when speakers did not attend to the original position of the objects. Most interestingly, fixating an empty region in which an object had been located resulted in faster articulation and initiation times. During the language comprehension tasks, participants were instructed to evaluate facts presented by talking heads appearing in different positions on the computer screen. During evaluation, the talking heads changed position (Experiment 3) or were withdrawn from the screen (Experiments 4-5). People showed a strong tendency to gaze at the centre of the screen and only moved towards the heads’ former locations if the screen was empty and if evaluation was not preceded by an intervening task, as tested in Experiment 5. Fixating the former location resulted in faster response times but not in better evaluation accuracy. The second part of this thesis investigated the role of spatial location representations in reading (Experiments 6-7). Specifically, I examined to what extent people reading garden-path sentences regress to specific target words in order to reanalyse the sentences. The results of two eye-tracking experiments showed that readers do not target very precisely. A spatial representation is used, but it appears to be fairly coarse (i.e., it only represents whether information is to the left or to the right of fixation). The findings from this thesis give us a clearer understanding of the influence of spatial location information on language processing. In language production particularly, it appears that spatial location is an integral part of the cognitive model and strongly connected with linguistic and visual representations.

Primary semantic type labeling in monologue discourse using a hierarchical classification approach

Larson, Erik John 20 August 2010 (has links)
The question of whether a machine can reproduce human intelligence is older than modern computation, but it has received a great deal of attention since the first digital computers emerged decades ago. Language understanding, a hallmark of human intelligence, has been the focus of a great deal of work in Artificial Intelligence (AI). In 1950, mathematician Alan Turing proposed a kind of game, or test, to evaluate the intelligence of a machine by assessing its ability to understand written natural language. But nearly sixty years after Turing proposed his test of machine intelligence (pose questions to a machine and a person without seeing either, and try to determine which is the machine) no system has passed the Turing Test, and the question of whether a machine can understand natural language cannot yet be answered. The present investigation is, firstly, an attempt to advance the state of the art in natural language understanding by building a machine whose input is English natural language and whose output is a set of assertions that represent answers to certain questions posed about the content of the input. The machine we explore here, in other words, should pass a simplified version of the Turing Test, and by doing so help clarify and expand our understanding of machine intelligence. Toward this goal, we explore a constraint framework for partial solutions to the Turing Test, propose a problem whose solution would constitute a significant advance in natural language processing, and design and implement a system adequate for addressing the problem proposed. The fully implemented system finds primary specific events and their locations in monologue discourse using a hierarchical classification approach, and as such provides answers to questions of central importance in the interpretation of discourse.
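The hierarchical classification strategy (a coarse classifier assigns a broad semantic type, then a type-specific classifier refines it) can be sketched as below; the stand-in keyword classifiers and label names are illustrative assumptions, not the dissertation's actual models.

```python
def hierarchical_label(sentence, coarse_clf, fine_clfs):
    """Two-stage hierarchical classification: a coarse classifier assigns a
    broad semantic type, then a classifier specialised for that type refines
    the label (falling back to the coarse label if no refiner exists)."""
    coarse = coarse_clf(sentence)
    fine = fine_clfs[coarse](sentence) if coarse in fine_clfs else coarse
    return coarse, fine

# Toy classifiers based on keyword cues:
coarse = lambda s: "event" if "went" in s else "state"
fine_clfs = {"event": lambda s: "event:motion" if "went" in s else "event:other"}
print(hierarchical_label("She went home", coarse, fine_clfs))
# ('event', 'event:motion')
```

The advantage of the two-stage design is that each fine-grained classifier only has to discriminate within one coarse type, which keeps its decision space small.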

Speech and Language Technologies for Semantically Linked Instructional Content

Swaminathan, Ranjini January 2011 (has links)
Recent advances in technology have made it possible to offer educational content online in the form of e-learning systems. The Semantically Linked Instructional Content (SLIC) system, developed at The University of Arizona, is one such system that hosts educational and technical videos online. This dissertation proposes the integration of speech and language technologies with the SLIC system. Speech transcripts are being used increasingly in video browsing systems to help understand the video content better and to search the content with text queries. Transcripts are especially useful for people with disabilities and those who have a limited understanding of the language of the video. Automatic Speech Recognizers (ASRs) are commonly used to generate speech transcripts for videos but are not consistent in their performance. This issue is more pronounced in a system like SLIC due to the technical nature of the talks, with words not seen in the ASR vocabulary and many speakers with different voices and accents making recognition harder. The videos in SLIC come with presentation slides that contain words specific to the talk subject, and the speech transcript itself can be considered to be composed of these slide words interspersed with other words. Furthermore, the errors in the transcript are words that sound similar to what was actually spoken; notes instead of nodes, for example. The errors that occur due to misrecognized slide words can be fixed if we know which slide words were actually spoken and where they occur in the transcript. In other words, the slide words are matched or aligned with the transcript. In this dissertation, two algorithms are developed to phonetically align transcript words with slide words, based on a Hidden Markov Model and a hybrid hidden semi-Markov model respectively. The slide words constitute the hidden states and the transcript words are the observed states in both models. The alignment algorithms are adapted for different applications such as transcript correction (as already mentioned), search and indexing, video segmentation, and closed captioning. Results from experiments conducted show that the corrected transcripts have improved accuracy and yield better search results for slide word queries.
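The transcript-correction idea can be illustrated with a much simpler stand-in than the dissertation's HMM: match each transcript word to its most similar slide word by string similarity (a crude proxy for phonetic similarity) and substitute it when the match is strong. The function name and threshold are assumptions for illustration.

```python
import difflib

def correct_transcript(transcript_words, slide_words, threshold=0.8):
    """Replace a transcript word with the most similar slide word when the
    similarity is high; string similarity stands in for phonetic similarity."""
    corrected = []
    for word in transcript_words:
        best = max(slide_words,
                   key=lambda s: difflib.SequenceMatcher(None, word, s).ratio())
        score = difflib.SequenceMatcher(None, word, best).ratio()
        corrected.append(best if score >= threshold else word)
    return corrected

# "notes" was misrecognized for the slide word "nodes":
print(correct_transcript(["the", "notes", "overlap"], ["nodes", "network"]))
# ['the', 'nodes', 'overlap']
```

A real alignment model additionally exploits word order and the fact that slide words tend to appear in clusters, which is what the hidden-state sequence of an HMM captures.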

Mental representation and processing of syntactic structure : evidence from Chinese

Cai, Zhenguang January 2011 (has links)
From the perspective of cognitive psychology, our knowledge of language can be viewed as mental representations, and our use of language can be understood as the computation or processing of those representations. This thesis explores the mental representation and processing of syntactic structure. The method used is structural priming, a phenomenon in which people tend to repeat linguistic structure they have recently processed. The language under investigation is Chinese. The main research theme is divided into four questions. The first question is how syntactic structure is mentally represented. For a long time this has been a question for syntacticians, whose main evidence is their intuition. There are, however, recent calls for experimental methods in the investigation of syntactic representation. I propose that structural priming can be used as an experimental approach to the investigation of syntactic representation. More specifically, structural priming can illuminate the constituent structure of a syntactic construction and help us determine which syntactic analysis corresponds to the representation of the construction. Three structural priming experiments on controversial constructions in Mandarin are reported to show that structural priming can distinguish alternative analyses of a syntactic construction. The second question concerns the use of thematic and lexical information in grammatical encoding in sentence production. Models of grammatical encoding differ in the locus of conceptual effects on grammatical encoding and in the extent to which grammatical encoding is lexically guided. Five experiments are reported on these two issues. First, the results indicate that thematic information affects grammatical encoding by prompting the processor to map thematic roles onto the same linear order as they were previously mapped. Though conceptual information was previously believed to affect only the assignment of grammatical functions (e.g., subject and object) to nouns (i.e., functional processing), this finding suggests that it can influence the linear order of sentence constituents (i.e., positional processing) as well. The results also show that the processor persists in using the same argument structure of the verb, implying that grammatical encoding is lexically guided to some extent. The third question concerns the processing of verb-phrase (VP) ellipsis in comprehension. Previous research on this topic disagrees on whether the interpretation of VP ellipsis is based on the syntactic or the semantic representation of the antecedent, and on whether the antecedent representation is copied or reconstructed at the ellipsis site. An experiment is presented whose results show no structural priming effect from the ellipsis site. This suggests that no syntactic structure is reconstructed at the ellipsis site, and possibly no copying of the antecedent structure occurs either. The results thus favour a semantic account of VP ellipsis processing. The last question concerns the lexico-syntactic representation of cognates in Cantonese-Mandarin bilinguals. Previous research has paid little attention to whether cognates have shared or distinct lemmas in bilinguals. Two experiments show that the structural priming effect from the cognate of a verb was smaller than from the verb itself, suggesting that Cantonese/Mandarin cognates have distinct rather than shared lemmas, though the syntactic information associated with cognates is collectively represented across the two languages. At the end of the thesis, I discuss the implications of these empirical studies and directions for further research.

Personality and alignment processes in dialogue : towards a lexically-based unified model

Brockmann, Carsten January 2009 (has links)
This thesis explores approaches to modelling individual differences in language use. The differences under consideration fall into two broad categories: Variation of the personality projected through language, and modelling of language alignment behaviour between dialogue partners. In a way, these two aspects oppose each other – language related to varying personalities should be recognisably different, while aligning speakers agree on common language during a dialogue. The central hypothesis is that such variation can be captured and produced with restricted computational means. Results from research on personality psychology and psycholinguistics are transformed into a series of lexically-based Affective Language Production Models (ALPMs) which are parameterisable for personality and alignment. The models are then explored by varying the parameters and observing the language they generate. ALPM-1 and ALPM-2 re-generate dialogues from existing utterances which are ranked and filtered according to manually selected linguistic and psycholinguistic features that were found to be related to personality. ALPM-3 is based on true overgeneration of paraphrases from semantic representations using the OPENCCG framework for Combinatory Categorial Grammar (CCG), in combination with corpus-based ranking and filtering by way of n-gram language models. Personality effects are achieved through language models built from the language of speakers of known personality. In ALPM-4, alignment is captured via a cache language model that remembers the previous utterance and thus influences the choice of the next. This model provides a unified treatment of personality and alignment processes in dialogue. In order to evaluate the ALPMs, dialogues between computer characters were generated and presented to human judges who were asked to assess the characters’ personality. 
In further internal simulations, cache language models were used to reproduce results of psycholinguistic priming studies. The experiments showed that the models are capable of producing natural language dialogue which exhibits human-like personality and alignment effects.
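The cache-language-model mechanism described for ALPM-4 can be sketched minimally: a unigram cache of recently heard words is interpolated with a fixed base model, so words from the partner's previous utterance become more probable in the next. The class name, base probabilities, and interpolation weight are illustrative assumptions, not the thesis's implementation.

```python
from collections import Counter

class CacheLM:
    """Toy cache language model: interpolates a fixed base unigram model
    with a cache of recently observed words, so recent words are favoured."""
    def __init__(self, base_probs, cache_weight=0.5):
        self.base = base_probs        # word -> base unigram probability
        self.cache = Counter()        # counts of recently observed words
        self.lam = cache_weight       # interpolation weight for the cache
    def observe(self, utterance):
        """Add the partner's previous utterance to the cache."""
        self.cache.update(utterance)
    def prob(self, word):
        total = sum(self.cache.values())
        cache_p = self.cache[word] / total if total else 0.0
        return (1 - self.lam) * self.base.get(word, 0.0) + self.lam * cache_p

lm = CacheLM({"sofa": 0.01, "couch": 0.01})
before = lm.prob("sofa")
lm.observe(["sofa"])         # the partner just said "sofa"
after = lm.prob("sofa")      # alignment effect: "sofa" is now far more likely
```

Because the cache boosts whatever was just said, a generator ranking candidate utterances by this model will tend to reuse the partner's words, which is exactly the lexical alignment behaviour the model is meant to capture.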
