101

Five English Verbs : A Comparison between Dictionary meanings and Meanings in Corpus collocations

Sörensen, Susanne January 2006 (has links)
In Norstedts Comprehensive English-Swedish Dictionary (2000) it is stated that the numbered list of senses under each headword is frequency-ordered. The aim of this study is therefore to see whether this frequency order of senses agrees with the frequencies found in the British National Corpus (BNC). Five polysemous English verbs were studied. For each verb, a simple search in the corpus was carried out, displaying 50 random occurrences. Each collocate was encoded with the most compatible sense from the numbered list of senses in the dictionary. The encoded tokens were compiled and listed in frequency order, and this list was compared to the dictionary's list of senses. Only two of the verbs showed agreement between the highest-ranked dictionary sense and the most frequent sense in the BNC simple search. None of the verbs' dictionary sense orders agreed completely with the frequency order that emerged from the corpus occurrences, which is why complementary collocational learning is advocated.
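A minimal sketch (not from the thesis) of the comparison procedure described above, using hypothetical sense codes for one verb's 50 sampled corpus occurrences:

```python
from collections import Counter

# Hypothetical sense codes assigned to 50 sampled BNC occurrences of one verb;
# "s1", "s2", ... mirror the numbered sense list of the dictionary entry.
sampled_senses = ["s2"] * 21 + ["s1"] * 14 + ["s4"] * 9 + ["s3"] * 6

dictionary_order = ["s1", "s2", "s3", "s4"]  # sense order as printed in the dictionary
corpus_order = [sense for sense, _ in Counter(sampled_senses).most_common()]

top_sense_agrees = dictionary_order[0] == corpus_order[0]
full_order_agrees = dictionary_order == corpus_order

print(corpus_order)        # ['s2', 's1', 's4', 's3']
print(top_sense_agrees)    # False
print(full_order_agrees)   # False
```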
102

La géométrie du sens : la polysémie verbale en question / The geometry of meaning : verbal polysemy in question

Sendi, Monia 18 December 2015 (has links)
Notre étude porte sur la notion de géométrie de sens. Ainsi, l’étude de la notion de verbe occupe l’axe principal de notre thèse. Cette notion est connue par sa complexité. Cela s’explique par la forte polysémie des verbes français. Nous avons focalisé notre étude sur l’analyse syntactico-sémantique des verbes « monter » et « passer ». La notion de polysémie, malgré sa grande importance, reste toujours très difficile à formaliser. Nous avons tenté, dans cette étude, d’utiliser trois dictionnaires électroniques pour désambiguïser les verbes « monter » et « passer ». Ce travail permet de rendre compte de l’influence de la syntaxe et du lexique sur les sens de ces deux verbes. Dans notre démarche, nous avons utilisé la méthode de désambiguïsation automatique de B. Victorri qui a pour spécificité l’analyse des unités lexicales polysémiques. Ce modèle se base sur des théories linguistiques et il exploite les mathématiques et l’informatique dans l’objectif de bien décrire et de résoudre le problème de la polysémie verbale. Donc, notre travail est pluridisciplinaire. C’est là où l’informatique et les mathématiques sont au service de l’analyse des langues naturelles. / Our study focuses on the concept of the geometry of meaning, and the study of the notion of the verb occupies the main axis of our thesis. This notion is known for its complexity, which is explained by the strong polysemy of French verbs. We focused our study on the syntactic-semantic analysis of the two verbs "monter" ("to climb") and "passer" ("to pass"). Indeed, this multiplicity of uses involves the notion of verbal polysemy, which, despite its importance, remains very difficult to formalize. In this study we used three electronic dictionaries to disambiguate the verbs "monter" and "passer". This work allows us to account for the influence of syntax and the lexicon on the senses of these two verbs. We show that a polysemous verb can be disambiguated not by recourse to a list of synonyms but by a set of meanings in specific syntactic constructions. In our approach, we used the automatic disambiguation method of B. Victorri, whose specificity is the analysis of polysemous lexical units, and we found that the theoretical analysis and the analysis given by this method converge. The model is based on linguistic theories and draws on mathematics and computer science with the aim of clearly describing and solving the problem of verbal polysemy. Our work is thus multidisciplinary: computer science and mathematics are placed at the service of the analysis of natural languages.
103

Knowledge Extraction for Hybrid Question Answering

Usbeck, Ricardo 18 May 2017 (has links)
Since the proposal of hypertext by Tim Berners-Lee to his employer CERN on March 12, 1989, the World Wide Web has grown to more than one billion Web pages and continues to grow. With the later proposed Semantic Web vision, Berners-Lee et al. suggested an extension of the existing (Document) Web to allow better reuse, sharing and understanding of data. Both the Document Web and the Web of Data (which is the current implementation of the Semantic Web) grow continuously. This is a mixed blessing, as the two forms of the Web grow concurrently and most commonly contain different pieces of information. Modern information systems must thus bridge a Semantic Gap to allow holistic and unified access to information, independent of the representation of the data. One way to bridge the gap between the two forms of the Web is the extraction of structured data, i.e., RDF, from the growing amount of unstructured and semi-structured information (e.g., tables and XML) on the Document Web. Note that unstructured data here stands for any type of textual information such as news, blogs or tweets. While extracting structured data from unstructured data allows the development of powerful information systems, it requires high-quality and scalable knowledge extraction frameworks to lead to useful results. The dire need for such approaches has led to the development of a multitude of annotation frameworks and tools. However, most of these approaches are not evaluated on the same datasets or using the same measures. The resulting Evaluation Gap needs to be tackled by a concise evaluation framework to foster fine-grained and uniform evaluations of annotation tools and frameworks over any knowledge base. Moreover, with the constant growth of data and the ongoing decentralization of knowledge, intuitive ways for non-experts to access the generated data are required. Humans have adapted their search behavior to current Web data through access paradigms such as keyword search so as to retrieve high-quality results; hence, most Web users only expect Web documents in return. However, humans think and most commonly express their information needs in natural language rather than in keyword phrases. Answering complex information needs often requires the combination of knowledge from various, differently structured data sources. Thus, we observe an Information Gap between natural-language questions and current keyword-based search paradigms, which in addition do not make use of the available structured and unstructured data sources. Question Answering (QA) systems provide an easy and efficient way to bridge this gap by allowing data to be queried via natural language, thus reducing (1) a possible loss of precision and (2) a potential loss of time while reformulating the search intention into a machine-readable form. Furthermore, QA systems enable answering natural language queries with concise results instead of links to verbose Web documents. Additionally, they allow as well as encourage the access to and the combination of knowledge from heterogeneous knowledge bases (KBs) within one answer. Consequently, three main research gaps are considered and addressed in this work: First, addressing the Semantic Gap between the unstructured Document Web and the structured Web of Data requires the development of scalable and accurate approaches for the extraction of structured data in RDF. This research challenge is addressed by several approaches within this thesis.
This thesis presents CETUS, an approach for recognizing entity types to populate RDF KBs. Furthermore, our knowledge-base-agnostic disambiguation framework AGDISTIS can efficiently detect the correct URIs for a given set of named entities. Additionally, we introduce REX, a Web-scale framework for RDF extraction from semi-structured (i.e., templated) websites which makes use of the semantics of the reference knowledge base to check the extracted data. The ongoing research on closing the Semantic Gap has already yielded a large number of annotation tools and frameworks. However, these approaches are currently still hard to compare since the published evaluation results are calculated on diverse datasets and evaluated based on different measures. On the other hand, the issue of comparability of results is not to be regarded as intrinsic to the annotation task. Indeed, it is now well established that scientists spend between 60% and 80% of their time preparing data for experiments. That data preparation is such a tedious problem in the annotation domain is mostly due to the different formats of the gold standards as well as the different data representations across reference datasets. We tackle the resulting Evaluation Gap in two ways: First, we introduce a collection of three novel datasets, dubbed N3, to leverage the possibility of optimizing NER and NED algorithms via Linked Data and to ensure maximal interoperability to overcome the need for corpus-specific parsers. Second, we present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools and frameworks on multiple datasets. The decentralized architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Moreover, the increasing demand for natural-language interfaces, as exemplified by current mobile applications, requires systems to deeply understand the underlying user information need. Consequently, a natural-language interface for asking questions requires a hybrid approach to data usage, i.e., simultaneously performing a search over full texts and semantic knowledge bases. To close the Information Gap, this thesis presents HAWK, a novel entity search approach developed for hybrid QA based on combining structured RDF and unstructured full-text data sources.
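As a rough illustration of the hybrid idea behind HAWK (a toy sketch, not the thesis implementation), the snippet below answers a question by merging a lookup over hypothetical RDF-style triples with a keyword match over hypothetical full-text documents:

```python
# Toy hybrid QA sketch: combine a structured fact lookup with full-text evidence.
# The triples, documents and question decomposition are all hypothetical.
kb = {  # (subject, predicate, object) triples, RDF-style
    ("Leipzig", "locatedIn", "Saxony"),
    ("Leipzig", "hasUniversity", "Leipzig University"),
}
documents = {
    "doc1": "Leipzig University was founded in 1409 in Saxony.",
    "doc2": "Dresden is the capital of Saxony.",
}

def hybrid_answer(entity: str, predicate: str, keyword: str):
    """Return structured answers plus full-text documents that support them."""
    structured = [o for s, p, o in kb if s == entity and p == predicate]
    evidence = [doc_id for doc_id, text in documents.items()
                if entity in text and keyword in text]
    return structured, evidence

# "Which university in Leipzig was founded ...?" -> structured answer + textual evidence
print(hybrid_answer("Leipzig", "hasUniversity", "founded"))
```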
104

Extraktion geographischer Entitäten zur Suche nutzergenerierter Inhalte für Nachrichtenereignisse / Extraction of geographic entities for finding user-generated content for news events

Katz, Philipp 22 October 2014 (has links)
The influence of so-called user-generated content on the Web has grown steadily in recent years. On platforms such as blogs, social networks or media portals, users continuously publish text messages, images or videos. Content documenting current societal events, such as the Euromaidan in Kiev, is also distributed through these platforms. User-generated content thus offers the potential to provide additional background information about events directly from the scene. This work pursues the vision of a news platform that uses methods from information retrieval and information extraction to detect news events, automatically enrich them with relevant user-generated content, and present them to the reader. For retrieving user-generated content, this work relies primarily on geographic entities, i.e. place names. The thesis introduces several new methods for extracting these entities from given news documents. The extracted entities are used to generate targeted search queries. It is shown that a geo-supported search is better suited for finding user-generated content than a conventional keyword-based search.
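A minimal sketch (with a hypothetical gazetteer and query format, not the extraction methods developed in the thesis) of the basic pipeline described above, i.e. extracting place names from a news text and turning them into a targeted search query:

```python
# Hypothetical gazetteer of place names; the thesis develops far more
# sophisticated extraction methods than this simple lookup.
GAZETTEER = {"Kiew", "Leipzig", "Dresden"}

def extract_locations(text: str) -> list[str]:
    """Naive geographic entity extraction via gazetteer lookup."""
    tokens = text.replace(",", " ").replace(".", " ").split()
    return [tok for tok in tokens if tok in GAZETTEER]

def build_geo_query(text: str, event_keywords: list[str]) -> str:
    """Combine extracted place names with event keywords into one search query."""
    return " ".join(event_keywords + extract_locations(text))

news = "Tausende Demonstranten versammelten sich auf dem Maidan in Kiew."
print(build_geo_query(news, ["Euromaidan", "Protest"]))  # Euromaidan Protest Kiew
```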
105

Word-sense disambiguation in biomedical ontologies

Alexopoulou, Dimitra 11 June 2010 (has links)
With the ever-increasing volume of biomedical literature, text-mining has emerged as an important technology to support bio-curation and search. Word sense disambiguation (WSD), the correct identification of the meaning of terms in text in the light of ambiguity, is an important problem in text-mining. Since the late 1940s many approaches based on supervised machine learning (decision trees, naive Bayes, neural networks, support vector machines) and unsupervised machine learning (context-clustering, word-clustering, co-occurrence graphs) have been developed. Knowledge-based methods that make use of the WordNet computational lexicon have also been developed. But only a few make use of ontologies, i.e. hierarchical controlled vocabularies, to solve the problem, and none exploit inference over ontologies and the use of metadata from publications. This thesis addresses the WSD problem in biomedical ontologies by suggesting different approaches for word sense disambiguation that use ontologies and metadata. The "Closest Sense" method assumes that the ontology defines multiple senses of the term; it computes the shortest path of co-occurring terms in the document to one of these senses. The "Term Cooc" method defines a log-odds ratio for co-occurring terms, including inferred co-occurrences. The "MetaData" approach trains a classifier on metadata; it does not require any ontology, but requires training data, which the other methods do not. These approaches are compared to each other when applied to a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. On average over all conditions, the approaches achieve an 80% success rate. The MetaData approach performs best, with 96%, when trained on high-quality data; its performance deteriorates as the quality of the training data decreases. The Term Cooc approach performs better on the Gene Ontology (92% success) than on MeSH (73% success), as MeSH is not a strict is-a/part-of hierarchy but rather a loose is-related-to hierarchy. The Closest Sense approach achieves an 80% success rate on average. Furthermore, the thesis showcases applications ranging from ontology design to semantic search where WSD is important.
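To make the "Term Cooc" idea concrete, here is a small sketch of a log-odds ratio computed from raw co-occurrence counts (the exact scoring used in the thesis may differ):

```python
import math

def log_odds_ratio(n_both: int, n_a_only: int, n_b_only: int, n_neither: int) -> float:
    """Log-odds ratio of a 2x2 co-occurrence table, with a 0.5 correction
    so the value stays finite when a cell count is zero."""
    a, b, c, d = (n + 0.5 for n in (n_both, n_a_only, n_b_only, n_neither))
    return math.log((a * d) / (b * c))

# Hypothetical document counts: both terms, only one term, or neither term present.
score = log_odds_ratio(n_both=40, n_a_only=10, n_b_only=5, n_neither=945)
print(round(score, 2))  # a large positive value: the terms co-occur far more than chance,
                        # supporting the corresponding sense of the ambiguous term
```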
106

Zjednoznačňování slovních významů / Word Sense Disambiguation

Kraus, Michal January 2008 (has links)
The master's thesis deals with word sense disambiguation of Czech words. The reader is introduced to the history of the task, and the algorithms used are presented: the naive Bayes classifier, the AdaBoost classifier, the maximum entropy method and decision trees are described, and their use is clearly demonstrated. The following parts of the thesis describe the data used. The last part of the thesis describes the results achieved, and the thesis concludes with some ideas for improving the system.
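As a generic illustration of one of the supervised approaches listed above (a naive Bayes word sense disambiguation sketch on bag-of-words context features; not the thesis's actual Czech system or data):

```python
# Minimal supervised WSD sketch: predict the sense of an ambiguous word
# from its sentence context with a naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training sentences for the ambiguous English word "bank".
contexts = [
    "he deposited cash at the bank downtown",
    "the bank approved the loan application",
    "they had a picnic on the river bank",
    "the boat drifted toward the grassy bank",
]
senses = ["FINANCE", "FINANCE", "RIVER", "RIVER"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(contexts, senses)

print(model.predict(["the bank approved my loan request"]))  # ['FINANCE']
print(model.predict(["we walked along the river bank"]))     # ['RIVER']
```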
107

A Framework to Understand Emoji Meaning: Similarity and Sense Disambiguation of Emoji using EmojiNet

Wijeratne, Sanjaya January 2018 (has links)
No description available.
108

Improving Artist Content Matching with Stacking : A comparison of meta-level learners for stacked generalization

Magnússon, Fannar January 2018 (has links)
Using automatic methods to assign incoming tracks and albums from multiple sources to artist entities in a digital rights management company, where no universal artist identifier is available and artist names can be ambiguous, is a challenging problem. In this work we propose to use stacked generalization to combine the predictions of heterogeneous classifiers for an improved quality of artist content matching on two datasets from a digital rights management company. We compare the performance of a non-linear meta-level learner to a linear meta-level learner for the stacked generalization on the two datasets, as well as on eight additional datasets, to see how well our results generalize. We conduct experiments and evaluate how the different meta-level learners perform, using the base learners' class probabilities or a combination of the base learners' class probabilities and the original input features as meta-features. Our results indicate that stacking with a non-linear meta-level learner can improve predictions on the artist chooser problem. Furthermore, our results indicate that when using a linear meta-level learner for stacked generalization, using the base learners' class probabilities as meta-features works best, while using a combination of the base learners' class probabilities and the original input features as meta-features works best when using a non-linear meta-level learner. Among all the evaluated stacking approaches, stacking with a non-linear meta-level learner, using a combination of the base learners' class probabilities and the original input features as meta-features, performs the best in our experiments over the ten evaluation datasets. / Att använda automatiska metoder för att tilldela spår och album från olika källor till artister i en digital underhållningstjänst är problematiskt då det inte finns några universellt använda identifierare för artister och namn på artister kan vara tvetydiga. I det här verket föreslår vi en användning av staplad generalisering för att kombinera förutsägningar från heterogena klassificerare för förbättra artistmatchningen i två datamäng från en digital underhållningstjänst. Vi jämför prestandan mellan en linjär och en icke-linjär metainlärningsmetod för den staplade generaliseringen av de två datamängder, samt även åtta ytterligare datamäng för att se hur resultaten kan generaliseras. Vi utför experiment och utvärderar hur de olika metainlärningsmetoderna presterar genom att använda basinlärningsmetodens klassannolikheter eller en kombination av basinlärningsmetodens klassannolikheter och den ursprungliga representationen som metarepresentation. Våra resultat indikerar att staplandet med en icke-linjär metainlärningsmetod kan förbättra förutsägningarna i problemet med att tilldela artister. Vidare indikerar våra resultat att när man använder en linjär metainlärningsmetod för en staplad generalisering är det bäst att använda basinlärningsmetodens klassannolikheter som metarepresentation, medan när man använder en icke-linjär metainlärningsmetod för en staplade generaliseringen är det bäst att använda en kombination av basinlärningsmetodens klassannolikheter och den ursprungliga representationen som metarepresentation. Av alla utvärderade sätt att stapla är staplandet med en icke-linjär metainlärningsmetod med en kombination av basinlärningsmetodens klassannolikheter och den ursprungliga representationen som metarepresentation den ansats som presterar bäst i våra experiment över de tio datamängderna.
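A brief sketch of the stacking setup described above using scikit-learn on synthetic data (a generic illustration, not the thesis's datasets or exact learners); `passthrough=True` corresponds to feeding the original input features to the meta-level learner alongside the base learners' class probabilities:

```python
# Stacked generalization with a linear vs. a non-linear meta-level learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]

# Linear meta-level learner on the base learners' class probabilities only.
linear_stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
)

# Non-linear meta-level learner on class probabilities plus the original features.
nonlinear_stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=GradientBoostingClassifier(random_state=0),
    stack_method="predict_proba",
    passthrough=True,
)

for name, model in [("linear", linear_stack), ("non-linear + passthrough", nonlinear_stack)]:
    print(name, cross_val_score(model, X, y, cv=3).mean())
```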
109

電腦輔助克漏詞多選題出題系統之研究 / A Study on Computer Aided Generation of Multiple-Choice Cloze Items

王俊弘, Wang, Chun-Hung Unknown Date (has links)
多選題測驗試題已證明能有效地評估學生的學習成效，然而，以人為方式建立題庫是一件耗時費力的工作。藉由電腦高速運算的能力，電腦輔助產生試題系統能有效率地建置大規模的題庫，同時減少人為的干預而得以保持試題的隱密性。受惠於網路上充裕的文字資源，本研究發展一套克漏詞試題出題系統，利用既有的語料自動產生涵蓋各種不同主題的克漏詞試題。藉由分析歷屆大學入學考試的資料，系統可產生類似難度的模擬試題，並且得到出題人員在遴選測驗標的方面的規律性。在產生試題的過程中導入詞義辨析的演算法，利用詞典與selectional preference模型的輔助，分析句子中特定詞彙的語義，以擷取包含測驗編撰者所要測驗的詞義的句子，並以collocation為基礎的方法篩選誘答選項。實驗結果顯示系統可在每產生1.6道試題中，得到1道可用的試題。我們嘗試產生不同類型的試題，並將這套系統融入網路線上英文測驗的環境中，依學生的作答情形分析試題的鑑別度。 / Multiple-choice tests have proved to be an efficient tool for measuring students' achievement. Manually constructing test items, however, is a time-consuming and labor-intensive task. Harnessing the computing power of computers, computer-assisted item generation offers the possibility of creating large numbers of items while reducing human intervention, thereby helping to keep the items confidential. With the abundant text resources on the Web, this study develops a system capable of generating cloze items that cover a wide range of topics based on existing corpora. By analyzing training data from the College Entrance Examinations in Taiwan, we identify special regularities of the test items, and our system can generate items of similar style based on results of the analysis. We propose a word sense disambiguation-based method for locating sentences in which designated words carry specific senses, and apply collocation-based methods for selecting distractors. Experimental results indicate that our system was able to produce a usable item for every 1.6 items it returned. We try to create different types of items and integrate the reported item generator into a Web-based system for learning English. The outcome of on-line examinations is analyzed in order to estimate the item discrimination of the test items generated by our system.
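A toy sketch of the collocation-based distractor selection idea (hypothetical counts and threshold; the thesis's actual scoring is more elaborate): candidate distractors that collocate strongly with the stem's context are discarded, because they could also fit the blank:

```python
# Toy collocation-based distractor filtering for a cloze item.
# Counts are hypothetical; a real system would estimate them from a corpus.
collocation_counts = {
    ("strong", "coffee"): 120,
    ("powerful", "coffee"): 2,
    ("heavy", "coffee"): 1,
    ("strong", "tea"): 80,
}

def filter_distractors(context_word: str, answer: str, candidates: list[str],
                       max_count: int = 5) -> list[str]:
    """Keep only candidates that rarely collocate with the context word,
    so the intended answer remains the single best fit for the blank."""
    return [c for c in candidates
            if c != answer and collocation_counts.get((c, context_word), 0) <= max_count]

# Stem: "I'd like a cup of ____ coffee."  Answer: "strong".
print(filter_distractors("coffee", "strong", ["powerful", "heavy", "strong"]))
# ['powerful', 'heavy'] -> plausible-looking but non-collocating distractors
```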
110

Measuring Semantic Distance using Distributional Profiles of Concepts

Mohammad, Saif 01 August 2008 (has links)
Semantic distance is a measure of how close or distant in meaning two units of language are. A large number of important natural language problems, including machine translation and word sense disambiguation, can be viewed as semantic distance problems. The two dominant approaches to estimating semantic distance are the WordNet-based semantic measures and the corpus-based distributional measures. In this thesis, I compare them, both qualitatively and quantitatively, and identify the limitations of each. This thesis argues that estimating semantic distance is essentially a property of concepts (rather than words) and that two concepts are semantically close if they occur in similar contexts. Instead of identifying the co-occurrence (distributional) profiles of words (distributional hypothesis), I argue that distributional profiles of concepts (DPCs) can be used to infer the semantic properties of concepts and indeed to estimate semantic distance more accurately. I propose a new hybrid approach to calculating semantic distance that combines corpus statistics and a published thesaurus (Macquarie Thesaurus). The algorithm determines estimates of the DPCs using the categories in the thesaurus as very coarse concepts and, notably, without requiring any sense-annotated data. Even though the use of only about 1000 concepts to represent the vocabulary of a language seems drastic, I show that the method achieves results better than the state-of-the-art in a number of natural language tasks. I show how cross-lingual DPCs can be created by combining text in one language with a thesaurus from another. Using these cross-lingual DPCs, we can solve problems in one, possibly resource-poor, language using a knowledge source from another, possibly resource-rich, language. I show that the approach is also useful in tasks that inherently involve two or more languages, such as machine translation and multilingual text summarization. The proposed approach is computationally inexpensive, it can estimate both semantic relatedness and semantic similarity, and it can be applied to all parts of speech. Extensive experiments on ranking word pairs as per semantic distance, real-word spelling correction, solving Reader's Digest word choice problems, determining word sense dominance, word sense disambiguation, and word translation show that the new approach is markedly superior to previous ones.
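A compact sketch of the distributional-profiles-of-concepts idea on a toy corpus with toy thesaurus categories (not the Macquarie-based setup of the thesis): co-occurrence counts are accumulated per thesaurus category rather than per word, and semantic distance is taken as one minus the cosine similarity of the resulting profiles:

```python
import math
from collections import Counter

# Toy thesaurus: words mapped to coarse concept categories (stand-ins for
# the roughly 1000 Macquarie Thesaurus categories used in the thesis).
THESAURUS = {"doctor": "MEDICINE", "nurse": "MEDICINE", "hospital": "MEDICINE",
             "bank": "FINANCE", "loan": "FINANCE", "money": "FINANCE"}
STOPWORDS = {"the", "a", "he", "at", "about", "with", "for", "from"}

corpus = [
    "the doctor met the nurse at the hospital",
    "the nurse asked the doctor about the patient",
    "the bank approved the loan with little money down",
    "he borrowed money from the bank for the loan",
]

def concept_profiles(sentences):
    """Accumulate, for each concept category, counts of co-occurring content words."""
    profiles = {}
    for sentence in sentences:
        words = [w for w in sentence.split() if w not in STOPWORDS]
        categories = {THESAURUS[w] for w in words if w in THESAURUS}
        for cat in categories:
            profiles.setdefault(cat, Counter()).update(
                w for w in words if THESAURUS.get(w) != cat)
    return profiles

def cosine(p, q):
    dot = sum(p[w] * q[w] for w in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

profiles = concept_profiles(corpus)
print(1 - cosine(profiles["MEDICINE"], profiles["FINANCE"]))  # 1.0: disjoint toy profiles
```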
