21. Word Sense Disambiguation Using WordNet and Conceptual Expansion. Guo, Jian-Yi, 24 January 2006.
A single English word can have several different meanings, and a single meaning can be expressed by several different English words; which meaning applies depends on the sense intended. Selecting the most appropriate sense for an ambiguous word in context is therefore a critical problem for applications built on natural language processing. At present, however, most word sense disambiguation methods either handle only restricted parts of speech, such as nouns, or achieve unsatisfactory accuracy, leaving the ambiguity unresolved for users.
In this study, a new word sense disambiguation method using the WordNet lexical database, SemCor text files, and the Web is presented. In addition to nouns, the proposed method also disambiguates verbs, adjectives, and adverbs in sentences. The text files and sentences used in the experiments were randomly selected from SemCor. The semantic similarity between the senses of the two semantically ambiguous words in a word pair is measured to select the applicable candidate senses of the target word in that pair. A synonym weighting method accounts for the sense diversity within the synonym sets that WordNet provides, and the corresponding synonym sets of the candidate senses are determined. The candidate senses, expanded with the senses in the corresponding synonym sets and enhanced with a context window, form new queries. These queries are submitted to a search engine, and the candidate senses are ranked by the number of matching documents found on the Web. The first sense in the ranked list is taken as the most appropriate sense of the target word.
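The abstract sketches a pipeline of synonym expansion, query construction, and web-count ranking. The following Python sketch illustrates that pipeline with NLTK's WordNet interface; the search-engine lookup is a stub (the thesis's actual engine and query format are not specified here), so the hit counts below are placeholders.

```python
# A minimal sketch of the synonym-expansion idea, assuming NLTK's
# WordNet data is installed (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def web_hit_count(query):
    # Placeholder: replace with a real search-engine API call that
    # returns the number of matching documents. Returns 0 here so
    # the sketch runs without network access.
    return 0

def rank_senses(target, context_words):
    """Rank WordNet senses of `target` by web evidence for
    synonym-expanded queries built with a context window."""
    scored = []
    for synset in wn.synsets(target):
        # Expand the candidate sense with its synonyms (WordNet lemmas).
        synonyms = [l.replace('_', ' ') for l in synset.lemma_names()]
        # Form a query from the expanded sense plus the context window.
        query = ' '.join(synonyms + context_words)
        scored.append((web_hit_count(query), synset))
    # Candidate senses ranked by matching-document count, best first.
    return [s for _, s in sorted(scored, key=lambda x: -x[0])]
```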
The proposed method, as well as the methods of Stetina et al. and Mihalcea et al., is evaluated on the SemCor text files. The experimental results show that, for the top sense selected, the proposed method disambiguates nouns, verbs, adjectives, and adverbs with an average accuracy of 81.3%, slightly better than Stetina et al.'s 80% and Mihalcea et al.'s 80.1%. Furthermore, the proposed method is the only one whose accuracy for verbs reaches 70% for the top sense selected. Moreover, for the top three senses selected, it is superior to the other two methods, with an average accuracy across the four parts of speech exceeding 96%. The proposed method is expected to improve word sense disambiguation performance in applications such as machine translation, document classification, and information retrieval.

22. Disambiguating Multiple Links in Historical Record Linkage. Richards, Laura, 30 August 2013.
Historians and social scientists are very interested in longitudinal data created from historical sources, because such data creates opportunities for studying people's lives over time. Generating it is a challenging problem, however, since historical sources do not carry personal identifiers. At the University of Guelph, the People-in-Motion group has constructed a record linkage system that links the 1871 Canadian census to the 1881 Canadian census. In this thesis, we discuss one aspect of linking historical census data: the problem of disambiguating multiple links created at the linkage step. We show that the disambiguation techniques explored in this thesis improve upon the linkage rate of the People-in-Motion system while keeping the false positive rate no greater than 5%.
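The abstract does not detail the disambiguation features used, but a common baseline for resolving multiple links is to keep only the pairs whose score is the strict best match for both records. The sketch below illustrates that idea; the (record, record, score) link format is an assumption, not the thesis's actual data model.

```python
# A minimal sketch of multiple-link disambiguation by mutual best match.
from collections import defaultdict

def disambiguate(links):
    """links: iterable of (record_1871, record_1881, score) triples.
    Keep a link only when its score is the unique best for both sides."""
    by_left, by_right = defaultdict(list), defaultdict(list)
    for a, b, s in links:
        by_left[a].append(s)
        by_right[b].append(s)
    kept = []
    for a, b, s in links:
        # Drop the link if either record has an equal-or-better alternative.
        if max(by_left[a]) == s and by_left[a].count(s) == 1 \
           and max(by_right[b]) == s and by_right[b].count(s) == 1:
            kept.append((a, b, s))
    return kept

# 'anna' matches two 1881 records with equal scores, so her links are
# dropped as ambiguous; 'bert' keeps his unique best match.
links = [('anna', 'r1', 0.9), ('anna', 'r2', 0.9), ('bert', 'r3', 0.8)]
print(disambiguate(links))  # [('bert', 'r3', 0.8)]
```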

23. Closing the gap in WSD: supervised results with unsupervised methods. Brody, Samuel, 2009.
Word Sense Disambiguation (WSD) holds promise for many NLP applications requiring broad-coverage language understanding, such as summarization (Barzilay and Elhadad, 1997) and question answering (Ramakrishnan et al., 2003). Recent studies have also shown that WSD can benefit machine translation (Vickrey et al., 2005) and information retrieval (Stokoe, 2005). Much work has focused on the computational treatment of sense ambiguity, primarily using data-driven methods. The most accurate WSD systems to date are supervised and rely on the availability of sense-labeled training data. This restriction poses a significant barrier to widespread use of WSD in practice, since such data is extremely expensive to acquire for new languages and domains. Unsupervised WSD, which does not require sense-labeled data, holds the key to enabling such applications. However, unsupervised methods fall far behind supervised ones in accuracy and ease of use. In this thesis we explore the reasons for this and present solutions to remedy the situation.

We hypothesize that one of the main problems with unsupervised WSD is its lack of a standard formulation and of the general-purpose tools common to supervised methods. As a first step, we examine existing approaches to unsupervised WSD, with the aim of detecting independent principles that can be utilized in a general framework. We investigate ways of leveraging the diversity of existing methods using ensembles, a common tool in the supervised learning framework. This approach allows us to achieve accuracy beyond that of the individual methods, without extensive modification of the underlying systems. Our examination of existing unsupervised approaches highlights the importance of using the predominant sense in cases of uncertainty, and the effectiveness of statistical similarity methods as a tool for WSD. It also emphasizes the need for a way to merge and combine learning elements, and the potential of a supervised-style approach to the problem.

Relying on existing methods does not take full advantage of the insights gained from the supervised framework. We therefore present an unsupervised WSD system that circumvents the question of the actual disambiguation method, the main source of discrepancy among unsupervised systems, and deals directly with the data. Our method uses statistical and semantic similarity measures to produce labeled training data in a completely unsupervised fashion, which allows the training and use of any standard supervised classifier for the actual disambiguation. Classifiers trained with our method significantly outperform those using other methods of data generation, and represent a big step in bridging the accuracy gap between supervised and unsupervised methods.

Finally, we address a major drawback of classical unsupervised systems: their reliance on a fixed sense inventory and lexical resources. This dependence is a substantial setback in cases where such resources are unavailable, which are exactly the areas in which unsupervised methods are most needed. Unsupervised sense discrimination, which does not share those restrictions, presents a promising solution. We therefore develop an unsupervised sense discrimination system based on a well-studied probabilistic generative model, Latent Dirichlet Allocation (Blei et al., 2003), which has many of the advantages of supervised frameworks. The model's probabilistic nature lends itself to easy combination and extension, and its generative aspect is well suited to linguistic tasks. Our model achieves state-of-the-art performance on the unsupervised sense induction task while remaining independent of any fixed sense inventory, and thus represents a fully unsupervised, general-purpose WSD tool.
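As a rough illustration of LDA-based sense induction (not the thesis's extended model), the following sketch clusters the occurrence contexts of an ambiguous word into latent "senses" using scikit-learn's off-the-shelf LDA; each topic plays the role of an induced sense.

```python
# A minimal sketch of sense induction with vanilla LDA, assuming each
# item in `contexts` is the text surrounding one occurrence of the
# ambiguous word.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def induce_senses(contexts, n_senses=3):
    """Cluster occurrence contexts into `n_senses` latent topics and
    label each occurrence with its most probable topic."""
    vectorizer = CountVectorizer(stop_words='english')
    X = vectorizer.fit_transform(contexts)
    lda = LatentDirichletAllocation(n_components=n_senses, random_state=0)
    doc_topics = lda.fit_transform(X)
    # One induced 'sense' id per occurrence, no sense inventory needed.
    return doc_topics.argmax(axis=1)
```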

24. Towards the Development of an Automatic Diacritizer for the Persian Orthography based on the Xerox Finite State Transducer. Nojoumian, Peyman, 12 August 2011.
Because Persian orthography lacks short vowels (diacritics), many natural language processing applications for the language, including information retrieval, machine translation, text-to-speech, and automatic speech recognition systems, must disambiguate their input before any further processing. In machine translation, for example, the whole text must first be correctly diacritized so that the correct words, parts of speech, and meanings are matched and retrieved from the lexicon. This is primarily a consequence of Persian's ambiguous orthography; the core engine of any Persian language processor should therefore include a diacritizer and a lexical disambiguator. This dissertation describes the design and implementation of an automatic diacritizer for Persian based on the state-of-the-art finite state transducer technology developed at Xerox by Beesley & Karttunen (2003). Results of morphological analysis and generation on a test corpus are shown, including the insertion of diacritics.
The study also looks at issues raised by the phonological and semantic ambiguities that result from the absence of short vowels in the writing system. It suggests a hybrid model (rule-based and inductive), inspired by psycholinguistic experiments on the human mental lexicon, for disambiguating heterophonic homographs in Persian using frequency and collocation information. A syntactic parser could be developed on top of the proposed model to discover Ezafe (the linking short vowel /e/ within a noun phrase) or to disambiguate homographs, but its implementation is left for future work.
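The dissertation implements the diacritizer in the Xerox finite-state formalism; the Python sketch below only illustrates the underlying lookup-and-rank idea behind the frequency-based disambiguation of homographs, with a hypothetical transliterated lexicon and made-up counts.

```python
# A minimal sketch of diacritic restoration by lexicon lookup plus
# frequency ranking. Forms are Latin transliterations; the entries and
# counts are hypothetical stand-ins, not the dissertation's lexicon.
LEXICON = {
    # undiacritized form -> candidate diacritized forms with corpus counts
    'krm': [('karam', 120), ('kerm', 45), ('korm', 3)],
}

def diacritize(token):
    """Return candidate diacritized forms, most frequent first."""
    candidates = LEXICON.get(token, [(token, 0)])
    return [form for form, _ in sorted(candidates, key=lambda c: -c[1])]

print(diacritize('krm'))  # ['karam', 'kerm', 'korm']
```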

25. Privacy policy-based framework for privacy disambiguation in distributed systems. Alhalafi, Dhafer, 2015.
With the increasing pervasiveness of distributed systems, now and into the future, there will be growing concern for the privacy of users in a world where almost everyone is connected to the internet through numerous devices. Current ways of considering privacy in distributed system development centre on protecting personally identifiable information such as names and national insurance numbers; however, with the abundance of distributed systems it is becoming easier to identify people through information that is not personally identifiable, which heightens privacy concerns. Ideas about privacy have therefore changed and should be reconsidered in the development of distributed systems, and this requires a new way to conceptualise privacy. Despite active effort on privacy and security concerns during the early design stages of distributed systems, little work has been done on specifying and designing a reliable and meaningful privacy policy framework, and the usual procedure for privacy policy development risks marginalising stakeholders, defeating the object of what such policies are designed to do. This study proposes a new Privacy Policy Framework (PPF) that combines a new method for disambiguating the meaning of privacy, as understood by users, owners, and developers of distributed systems, with distributed system architecture and technical considerations. In developing the PPF, semi-structured interviews and questionnaires were conducted to determine the current state of privacy policy and technical considerations; the same methods were used to demonstrate the application and evaluation of the PPF itself. The study contributes a new understanding of and approach to privacy in distributed systems, and a practical route to user privacy and privacy disambiguation through the development of a privacy button concept.

27. Examining concepts of author disambiguation: co-authorship as a disambiguation feature in EconBiz. Wiechmann, Swantje, 13 April 2022.
Name ambiguity of authors is a long-standing challenge in digital libraries. Simple string searches for authors often give unsatisfactory results: publications in which the author's name is written differently cannot be found, and publications by other authors of the same name get included. Authors can be distinguished with persistent identifiers, which improves the search function and also contributes to the data linking process. But many library catalogs are not fully disambiguated, and it is not feasible for large libraries like the ZBW - Leibniz Information Centre for Economics to disambiguate their bibliographic records manually.

The goal of this work is to help the ZBW with this disambiguation task. For this purpose I analysed the ZBW's datasets and identified the challenges that need to be overcome. I proposed an approach that uses already disambiguated records to assign persistent identifiers to ambiguous author references; a minimal sketch of this co-authorship matching idea follows the contents list below. The approach was able to disambiguate ambiguous author references, though the quality of the method needs further evaluation.

Contents:
List of Figures
List of Tables
Listings
List of Algorithms
1 Introduction
1.1 Aim of the work
1.2 Structure of the thesis
2 Author disambiguation
2.1 Formal definition
2.2 Types of disambiguation methods
2.3 Types of disambiguation features
2.4 Co-authorship as a disambiguation feature
3 The datasets of the ZBW
3.1 Methodology
3.2 Analysing the Econis data
3.3 Disambiguating names from other sources
4 Current approaches
4.1 Solving the common name problem
4.2 An incremental approach
4.3 Approaches in related fields
5 Proposal of an application-relevant approach
5.1 Disambiguation approach
5.2 Test datasets
5.3 Evaluation of the results
5.3.1 Partly disambiguated bibliographic records
5.3.2 Ambiguous bibliographic records with two or more authors
5.3.3 Ambiguous bibliographic records with one author
6 Conclusion
A Full figures
B Extended tables
C Full results of the tests of the proposed method
C.1 Partly disambiguated bibliographic records
C.2 Ambiguous bibliographic records
Bibliography
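
As a rough illustration of the co-authorship feature described above, the sketch below assigns a persistent identifier to an ambiguous author reference by co-author overlap with already disambiguated records. The record fields, identifier format, and threshold are assumptions for illustration, not the thesis's actual design.

```python
# A minimal sketch of co-authorship-based author disambiguation.
def assign_identifier(ambiguous, disambiguated, min_overlap=1):
    """Pick the persistent id of the disambiguated record with the same
    name whose co-author set overlaps most with the ambiguous reference."""
    best_id, best_overlap = None, 0
    for rec in disambiguated:
        if rec['name'] != ambiguous['name']:
            continue
        overlap = len(set(rec['coauthors']) & set(ambiguous['coauthors']))
        if overlap > best_overlap:
            best_id, best_overlap = rec['id'], overlap
    # Refuse to assign when the co-author evidence is too weak.
    return best_id if best_overlap >= min_overlap else None

known = [{'name': 'J. Smith', 'id': 'gnd:123', 'coauthors': ['A. Lee', 'B. Chen']},
         {'name': 'J. Smith', 'id': 'gnd:456', 'coauthors': ['C. Braun']}]
ref = {'name': 'J. Smith', 'coauthors': ['B. Chen', 'D. Kim']}
print(assign_identifier(ref, known))  # gnd:123
```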

28. High resolution fMRI of hippocampal subfields and medial temporal cortex during working memory. Newmark, Randall, 22 January 2016.
Computational models combined with electrophysiological studies have informed our understanding about the role of hippocampal subfields (dentate gyrus, DG; CA subfields, subiculum) and Medial Temporal Lobe (MTL) cortex (entorhinal, perirhinal, parahippocampal cortices) during working memory (WM) tasks. Only recently have functional neuroimaging studies begun to examine under which conditions the MTL are recruited for WM processing in humans, but subfield contributions have not been examined in the WM context. High-resolution fMRI is well suited to test hypotheses regarding the recruitment of MTL subregions and hippocampal subfields. This dissertation describes three experiments using high-resolution fMRI to examine the role of hippocampal subfields and MTL structures in humans during WM.
Experiment 1 investigated MTL activity when participants performed a task that required encoding and maintaining overlapping and non-overlapping stimulus pairs during WM. During encoding, activity in CA3/DG and CA1 was greater for stimulus pairs with overlapping features. During delay, activity in CA1 and entorhinal cortex was greater for overlapping stimuli. These results indicate that CA3/DG and CA1 support disambiguating overlapping representations while CA1 and entorhinal cortex maintain these overlapping items.
Experiment 2 investigated MTL activity when participants performed a WM task that required encoding and maintaining either low or high WM loads. The results show a load effect in entorhinal and perirhinal cortex during the delay period and suggest that these regions act as a buffer for WM by actively maintaining novel information in a capacity-dependent manner.
Experiment 3 investigated MTL activity when participants performed a WM task that required maintaining similar and dissimilar items at different loads. Analysis of a load by similarity interaction effect revealed areas of activity localized to the CA1 subfield. CA1 showed greater activity for higher WM loads for dissimilar, but not similar stimuli.
Our findings help identify hippocampal and MTL regions that contribute to disambiguation in a WM context, as well as regions that are active in a capacity-dependent manner and may support long-term memory formation. These results inform our understanding of the contributions of hippocampal subfields and MTL subregions during WM and help translate findings from animal work to the cognitive domain of WM in humans.

29. Translation of keywords between English and Swedish / Översättning av nyckelord mellan engelska och svenska. Ahmady, Tobias; Klein Rosmar, Sander, 2014.
In this project, we have investigated how to perform rule-based machine translation of sets of keywords between two languages. The goal was to translate an input set containing one or more keywords in a source language to a corresponding set of keywords, with the same number of elements, in the target language. Some words in the source language may have several senses, however, and may be translated to several, or no, words in the target language. When ambiguous translations occur, the best translation of the keyword should be chosen with respect to the context. In traditional machine translation, a word's context is determined by the phrase or sentence in which it occurs; in this project, the set of keywords itself represents the context. By investigating traditional approaches to machine translation (MT), we designed and described models for the specific purpose of keyword translation. We propose a solution based on direct translation for translating keywords between English and Swedish, in which we also introduce a simple graph-based model for resolving ambiguous translations.
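As a rough illustration of the keywords-as-context idea (not the authors' actual graph model), the sketch below performs direct dictionary translation and breaks ties by co-occurrence with the other keywords' candidate translations. The dictionary entries and co-occurrence counts are made up for the example.

```python
# A minimal sketch of keyword translation with a co-occurrence
# tie-breaker for ambiguous entries.
DICT_EN_SV = {
    'bank': ['bank', 'strand'],   # financial institution vs. riverbank
    'river': ['flod'],
    'account': ['konto'],
}
COOCCUR = {('strand', 'flod'): 5, ('bank', 'konto'): 7}

def cooc(a, b):
    return COOCCUR.get((a, b), 0) + COOCCUR.get((b, a), 0)

def translate_keywords(keywords):
    result = []
    for kw in keywords:
        candidates = DICT_EN_SV.get(kw, [kw])
        # The other keywords' possible translations act as the context.
        others = [c for k in keywords if k != kw
                  for c in DICT_EN_SV.get(k, [k])]
        best = max(candidates, key=lambda c: sum(cooc(c, o) for o in others))
        result.append(best)
    return result

# 'river' pulls 'bank' toward its riverbank reading.
print(translate_keywords(['bank', 'river']))  # ['strand', 'flod']
```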

30. Adaptive Semantic Annotation of Entity and Concept Mentions in Text. Mendes, Pablo N., 5 June 2014.
No description available.