• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 9
  • 2
  • 1
  • 1
  • Tagged with
  • 15
  • 15
  • 8
  • 8
  • 7
  • 6
  • 6
  • 6
  • 5
  • 5
  • 4
  • 4
  • 4
  • 3
  • 3
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Graph-Based Keyphrase Extraction Using Wikipedia

Dandala, Bharath 12 1900 (has links)
Keyphrases describe a document in a coherent and simple way, giving the prospective reader a way to quickly determine whether the document satisfies their information needs. The pervasion of huge amount of information on Web, with only a small amount of documents have keyphrases extracted, there is a definite need to discover automatic keyphrase extraction systems. Typically, a document written by human develops around one or more general concepts or sub-concepts. These concepts or sub-concepts should be structured and semantically related with each other, so that they can form the meaningful representation of a document. Considering the fact, the phrases or concepts in a document are related to each other, a new approach for keyphrase extraction is introduced that exploits the semantic relations in the document. For measuring the semantic relations between concepts or sub-concepts in the document, I present a comprehensive study aimed at using collaboratively constructed semantic resources like Wikipedia and its link structure. In particular, I introduce a graph-based keyphrase extraction system that exploits the semantic relations in the document and features such as term frequency. I evaluated the proposed system using novel measures and the results obtained compare favorably with previously published results on established benchmarks.
2

Using the SKOS Model for Standardizing Semantic Similarity and Relatedness Measures for Ontological Terminologies

Arockiasamy, Savarimuthu 14 August 2009 (has links)
No description available.
3

Role of semantic indexing for text classification

Sani, Sadiq January 2014 (has links)
The Vector Space Model (VSM) of text representation suffers a number of limitations for text classification. Firstly, the VSM is based on the Bag-Of-Words (BOW) assumption where terms from the indexing vocabulary are treated independently of one another. However, the expressiveness of natural language means that lexically different terms often have related or even identical meanings. Thus, failure to take into account the semantic relatedness between terms means that document similarity is not properly captured in the VSM. To address this problem, semantic indexing approaches have been proposed for modelling the semantic relatedness between terms in document representations. Accordingly, in this thesis, we empirically review the impact of semantic indexing on text classification. This empirical review allows us to answer one important question: how beneficial is semantic indexing to text classification performance. We also carry out a detailed analysis of the semantic indexing process which allows us to identify reasons why semantic indexing may lead to poor text classification performance. Based on our findings, we propose a semantic indexing framework called Relevance Weighted Semantic Indexing (RWSI) that addresses the limitations identified in our analysis. RWSI uses relevance weights of terms to improve the semantic indexing of documents. A second problem with the VSM is the lack of supervision in the process of creating document representations. This arises from the fact that the VSM was originally designed for unsupervised document retrieval. An important feature of effective document representations is the ability to discriminate between relevant and non-relevant documents. For text classification, relevance information is explicitly available in the form of document class labels. Thus, more effective document vectors can be derived in a supervised manner by taking advantage of available class knowledge. Accordingly, we investigate approaches for utilising class knowledge for supervised indexing of documents. Firstly, we demonstrate how the RWSI framework can be utilised for assigning supervised weights to terms for supervised document indexing. Secondly, we present an approach called Supervised Sub-Spacing (S3) for supervised semantic indexing of documents. A further limitation of the standard VSM is that an indexing vocabulary that consists only of terms from the document collection is used for document representation. This is based on the assumption that terms alone are sufficient to model the meaning of text documents. However for certain classification tasks, terms are insufficient to adequately model the semantics needed for accurate document classification. A solution is to index documents using semantically rich concepts. Accordingly, we present an event extraction framework called Rule-Based Event Extractor (RUBEE) for identifying and utilising event information for concept-based indexing of incident reports. We also demonstrate how certain attributes of these events e.g. negation, can be taken into consideration to distinguish between documents that describe the occurrence of an event, and those that mention the non-occurrence of that event.
4

Context-Aware Adaptive Hybrid Semantic Relatedness in Biomedical Science

January 2016 (has links)
abstract: Text mining of biomedical literature and clinical notes is a very active field of research in biomedical science. Semantic analysis is one of the core modules for different Natural Language Processing (NLP) solutions. Methods for calculating semantic relatedness of two concepts can be very useful in solutions solving different problems such as relationship extraction, ontology creation and question / answering [1–6]. Several techniques exist in calculating semantic relatedness of two concepts. These techniques utilize different knowledge sources and corpora. So far, researchers attempted to find the best hybrid method for each domain by combining semantic relatedness techniques and data sources manually. In this work, attempts were made to eliminate the needs for manually combining semantic relatedness methods targeting any new contexts or resources through proposing an automated method, which attempted to find the best combination of semantic relatedness techniques and resources to achieve the best semantic relatedness score in every context. This may help the research community find the best hybrid method for each context considering the available algorithms and resources. / Dissertation/Thesis / Doctoral Dissertation Biomedical Informatics 2016
5

Measuring Semantic Relatedness Using Salient Encyclopedic Concepts

Hassan, Samer 08 1900 (has links)
While pragmatics, through its integration of situational awareness and real world relevant knowledge, offers a high level of analysis that is suitable for real interpretation of natural dialogue, semantics, on the other end, represents a lower yet more tractable and affordable linguistic level of analysis using current technologies. Generally, the understanding of semantic meaning in literature has revolved around the famous quote ``You shall know a word by the company it keeps''. In this thesis we investigate the role of context constituents in decoding the semantic meaning of the engulfing context; specifically we probe the role of salient concepts, defined as content-bearing expressions which afford encyclopedic definitions, as a suitable source of semantic clues to an unambiguous interpretation of context. Furthermore, we integrate this world knowledge in building a new and robust unsupervised semantic model and apply it to entail semantic relatedness between textual pairs, whether they are words, sentences or paragraphs. Moreover, we explore the abstraction of semantics across languages and utilize our findings into building a novel multi-lingual semantic relatedness model exploiting information acquired from various languages. We demonstrate the effectiveness and the superiority of our mono-lingual and multi-lingual models through a comprehensive set of evaluations on specialized synthetic datasets for semantic relatedness as well as real world applications such as paraphrase detection and short answer grading. Our work represents a novel approach to integrate world-knowledge into current semantic models and a means to cross the language boundary for a better and more robust semantic relatedness representation, thus opening the door for an improved abstraction of meaning that carries the potential of ultimately imparting understanding of natural language to machines.
6

Deriving A Better Metric To Assess theQuality of Word Embeddings Trained OnLimited Specialized Corpora

Munbodh, Mrinal January 2020 (has links)
No description available.
7

Semantic Methods for Intelligent Distributed Design Environments

Witherell, Paul W. 01 September 2009 (has links)
Continuous advancements in technology have led to increasingly comprehensive and distributed product development processes while in pursuit of improved products at reduced costs. Information associated with these products is ever changing, and structured frameworks have become integral to managing such fluid information. Ontologies and the Semantic Web have emerged as key alternatives for capturing product knowledge in both a human-readable and computable manner. The primary and conclusive focus of this research is to characterize relationships formed within methodically developed distributed design knowledge frameworks to ultimately provide a pervasive real-time awareness in distributed design processes. Utilizing formal logics in the form of the Semantic Web’s OWL and SWRL, causal relationships are expressed to guide and facilitate knowledge acquisition as well as identify contradictions between knowledge in a knowledge base. To improve the efficiency during both the development and operational phases of these “intelligent” frameworks, a semantic relatedness algorithm is designed specifically to identify and rank underlying relationships within product development processes. After reviewing several semantic relatedness measures, three techniques, including a novel meronomic technique, are combined to create AIERO, the Algorithm for Identifying Engineering Relationships in Ontologies. In determining its applicability and accuracy, AIERO was applied to three separate, independently developed ontologies. The results indicate AIERO is capable of consistently returning relatedness values one would intuitively expect. To assess the effectiveness of AIERO in exposing underlying causal relationships across product development platforms, a case study involving the development of an industry-inspired printed circuit board (PCB) is presented. After instantiating the PCB knowledge base and developing an initial set of rules, FIDOE, the Framework for Intelligent Distributed Ontologies in Engineering, was employed to identify additional causal relationships through extensional relatedness measurements. In a conclusive PCB redesign, the resulting “intelligent” framework demonstrates its ability to pass values between instances, identify inconsistencies amongst instantiated knowledge, and identify conflicting values within product development frameworks. The results highlight how the introduced semantic methods can enhance the current knowledge acquisition, knowledge management, and knowledge validation capabilities of traditional knowledge bases.
8

Semantic Representation of L2 Lexicon in Japanese University Students

Matikainen, Tiina Johanna January 2011 (has links)
In a series of studies using semantic relatedness judgment response times, Jiang (2000, 2002, 2004a) has claimed that L2 lexical entries fossilize with their equivalent L1 content or something very close to it. In another study using a more productive test of lexical knowledge (Jiang 2004b), however, the evidence for this conclusion was less clear. The present study is a partial replication of Jiang (2004b) with Japanese learners of English. The aims of the study are to investigate the influence of the first language (L1) on second language (L2) lexical knowledge, to investigate whether lexical knowledge displays frequency-related, emergent properties, and to investigate the influence of the L1 on the acquisition of L2 word pairs that have a common L1 equivalent. Data from a sentence completion task was completed by 244 participants, who were shown sentence contexts in which they chose between L2 word pairs sharing a common equivalent in the students' first language, Japanese. The data were analyzed using the statistical analyses available in the programming environment R to quantify the participants' ability to discriminate between synonymous and non-synonymous use of these L2 word pairs. The results showed a strong bias against synonymy for all word pairs; the participants tended to make a distinction between the two synonymous items by assigning each word a distinct meaning. With the non-synonymous items, lemma frequency was closely related to the participants' success in choosing the correct word in the word pair. In addition, lemma frequency and the degree of similarity between the words in the word pair were closely related to the participants' overall knowledge of the non-synonymous meanings of the vocabulary items. The results suggest that the participants had a stronger preference for non-synonymous options than for the synonymous option. This suggests that the learners might have adopted a one-word, one-meaning learning strategy (Willis, 1998). The reasonably strong relationship between several of the usage-based statistics and the item measures from R suggest that with exposure learners are better able to use words in ways that are similar to native speakers of English, to differentiate between appropriate and inappropriate contexts and to recognize the boundary separating semantic overlap and semantic uniqueness. Lexical similarity appears to play a secondary role, in combination with frequency, in learners' ability to differentiate between appropriate and inappropriate contexts when using L2 word pairs that have a single translation in the L1. / CITE/Language Arts
9

Indirect Relatedness, Evaluation, and Visualization for Literature Based Discovery

Henry, Sam 01 January 2019 (has links)
The exponential growth of scientific literature is creating an increased need for systems to process and assimilate knowledge contained within text. Literature Based Discovery (LBD) is a well established field that seeks to synthesize new knowledge from existing literature, but it has remained primarily in the theoretical realm rather than in real-world application. This lack of real-world adoption is due in part to the difficulty of LBD, but also due to several solvable problems present in LBD today. Of these problems, the ones in most critical need of improvement are: (1) the over-generation of knowledge by LBD systems, (2) a lack of meaningful evaluation standards, and (3) the difficulty interpreting LBD output. We address each of these problems by: (1) developing indirect relatedness measures for ranking and filtering LBD hypotheses; (2) developing a representative evaluation dataset and applying meaningful evaluation methods to individual components of LBD; (3) developing an interactive visualization system that allows a user to explore LBD output in its entirety. In addressing these problems, we make several contributions, most importantly: (1) state of the art results for estimating direct semantic relatedness, (2) development of set association measures, (3) development of indirect association measures, (4) development of a standard LBD evaluation dataset, (5) division of LBD into discrete components with well defined evaluation methods, (6) development of automatic functional group discovery, and (7) integration of indirect relatedness measures and automatic functional group discovery into a comprehensive LBD visualization system. Our results inform future development of LBD systems, and contribute to creating more effective LBD systems.
10

Semantic Analysis of Natural Language and Definite Clause Grammar using Statistical Parsing and Thesauri

Dagerman, Björn January 2013 (has links)
Services that rely on the semantic computations of users’ natural linguistic inputs are becoming more frequent. Computing semantic relatedness between texts is problematic due to the inherit ambiguity of natural language. The purpose of this thesis was to show how a sentence could be compared to a predefined semantic Definite Clause Grammar (DCG). Furthermore, it should show how a DCG-based system could benefit from such capabilities. Our approach combines openly available specialized NLP frameworks for statistical parsing, part-of-speech tagging and word-sense disambiguation. We compute the semantic relatedness using a large lexical and conceptual-semantic thesaurus. Also, we extend an existing programming language for multimodal interfaces, which uses static predefined DCGs: COactive Language Definition (COLD). That is, every word that should be acceptable by COLD needs to be explicitly defined. By applying our solution, we show how our approach can remove dependencies on word definitions and improve grammar definitions in DCG-based systems.

Page generated in 0.1045 seconds