About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Automatic Document Topic Identification Using Hierarchical Ontology Extracted from Human Background Knowledge

Hassan, Mostafa January 2013
The rapid growth in the number of documents available to end users around the world has greatly increased the need for machines to understand the topics of those documents and to group related documents automatically. This is one of the main current challenges in text mining. In this thesis, we introduce a novel approach for identifying document topics that uses human background knowledge to automatically find the best-matching topic for each input document. This task has several applications. For example, it can improve the relevance of search engine results by categorizing them according to their general topic, and it can let users choose the domain most relevant to their needs. It can also serve applications such as news publishing, where each news article must be automatically assigned to one of a set of predefined main topics. To achieve this, we need to extract background knowledge in a form suited to the task. The contributions of the thesis fall into two main modules. In the first module, we introduce a new approach for extracting background knowledge from a human knowledge source, in the form of a knowledge repository, and storing it in a well-structured, organized form, namely an ontology. We define a methodology for identifying ontological concepts and the relations between them, and we use the ontology to infer the semantic similarity between documents and to identify their topics. We apply the proposed approach to perhaps the best-known knowledge repository, Wikipedia. The second module defines the framework for automatic document topic identification (ADTI).
We present a new approach that uses the knowledge stored in the created ontology to automatically find the best-matching topics for input documents, without a training process such as the one required in document classification. We conduct several experiments comparing the performance of ADTI with that of its competitors from other text mining tasks, namely document clustering and document classification. The results show that our document topic identification approach outperforms several document clustering techniques. They also show that, although ADTI requires no training, its performance is competitive with a state-of-the-art document classification method.
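The abstract does not spell out ADTI's algorithm. As a loose illustration of how a concept hierarchy can assign topics without any training step, the sketch below maps document words to concepts in a tiny, hypothetical ontology and lets each concept vote for its top-level ancestor, with votes decaying by distance in the hierarchy. The toy ontology, the `decay` parameter, and the function names are assumptions for illustration; they are not the thesis's method or its Wikipedia-derived ontology.

```python
import re
from collections import Counter

# Hypothetical toy ontology: each concept maps to its parent concept.
# Concepts whose parent is None are treated as top-level topics.
PARENT = {
    "sports": None,
    "science": None,
    "football": "sports",
    "goalkeeper": "football",
    "tennis": "sports",
    "physics": "science",
    "quantum": "physics",
    "biology": "science",
    "cell": "biology",
}

def ancestors(concept):
    """Yield (node, distance) pairs from the concept up to its root topic."""
    distance = 0
    while concept is not None:
        yield concept, distance
        concept = PARENT[concept]
        distance += 1

def identify_topic(text, decay=0.5):
    """Return the top-level topic with the highest decayed vote, or None."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Keep only tokens that name a concept in the (hypothetical) ontology.
    concept_counts = Counter(t for t in tokens if t in PARENT)
    scores = Counter()
    for concept, count in concept_counts.items():
        for node, dist in ancestors(concept):
            if PARENT[node] is None:  # only root concepts are candidate topics
                scores[node] += count * decay ** dist
    if not scores:
        return None
    return scores.most_common(1)[0][0]
```

For example, `identify_topic("The goalkeeper saved a penalty in the football final")` returns `"sports"`, since both matched concepts sit under that root. No labelled examples are needed; all the knowledge lives in the hierarchy.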

Bottom-Up Ontology Creation with a Direct Instance Input Interface

Wei, Charles C.H. 01 April 2009
In general, an ontology is created by following a top-down, or so-called genus-species, approach, in which the species are differentiated from the genus and from each other by means of differentiae [8]. The superconcept is the genus, every subconcept is a species, and the differentiae correspond to roles. To complete the ontology, a user organizes the data into an appropriate structure and then adds the instances of that domain. This resembles how concepts are learned in school: students first learn general knowledge and apply it to exercises and homework for practice, and once they are familiar with that knowledge, they use it to solve problems in daily life. This deductive approach builds on fundamental knowledge the student has already acquired. By contrast, a more intuitive way of learning is the bottom-up approach, which is based on atomism and is also a common way for people to acquire knowledge. By sensing the world through vision, hearing, and touch, people learn information about actual objects, i.e., instances, in the world. Once an instance has been collected, a relationship between it and existing knowledge is created, and an ontology forms automatically. The primary goal of this thesis is to build a better instance input interface for the ontology development tool Protégé in order to simplify ontology construction. The second goal is to show the feasibility of a bottom-up approach to building an ontology. Instead of first setting up the organization of classes and properties (slots), a user simply enters all the information about an instance, and the program forms the ontology automatically: after an instance has been entered, the system finds an appropriate location in the ontology to store it.
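As an illustration of the bottom-up idea of entering an instance first and letting the system find its place, the sketch below assumes a hypothetical class hierarchy in which each class lists the slots it adds. An instance is stored under the most specific class whose required slots, including inherited ones, it fills. The hierarchy, slot names, and placement rule are assumptions for illustration; they are not taken from the thesis or from Protégé's API.

```python
# Hypothetical class hierarchy: each class names its parent and the slots it adds.
CLASSES = {
    "Thing": {"parent": None, "slots": set()},
    "Person": {"parent": "Thing", "slots": {"name", "birth_date"}},
    "Student": {"parent": "Person", "slots": {"student_id"}},
    "Employee": {"parent": "Person", "slots": {"employer"}},
    "Book": {"parent": "Thing", "slots": {"title", "isbn"}},
}

def required_slots(cls):
    """All slots a class requires, including those inherited from ancestors."""
    slots = set()
    while cls is not None:
        slots |= CLASSES[cls]["slots"]
        cls = CLASSES[cls]["parent"]
    return slots

def depth(cls):
    """Number of edges between a class and the root of the hierarchy."""
    d = 0
    while CLASSES[cls]["parent"] is not None:
        cls = CLASSES[cls]["parent"]
        d += 1
    return d

def place_instance(instance):
    """Return the most specific class whose required slots the instance fills."""
    filled = set(instance)  # slot names present in the instance
    fitting = [c for c in CLASSES if required_slots(c) <= filled]
    # "Thing" requires no slots, so `fitting` is never empty.
    return max(fitting, key=depth)
```

Ties between equally deep classes resolve to the first class listed, and a real system would also need a rule for instances whose extra slots fit no existing class, such as proposing a new subclass.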

Sémantická anotace textu / Semantic Annotation of Text

Dytrych, Jaroslav January 2017
This thesis deals with intelligent systems that support the semantic annotation of text. It discusses the motivation for creating such systems and the state of the art in the areas where they are used. The thesis also describes a newly proposed and implemented annotation system that provides advanced semantic filtering and presents alternative annotation suggestions in a unique way. The results of the completed experiments clearly show the advantages of the proposed solution. They also show that the user interface of an annotation tool affects the annotation process. The displayed information was optimised for the task of disambiguating ambiguous entity names, and the proposed methods for speeding up annotation and improving the quality of the resulting annotations were evaluated experimentally. A comparison with the general-purpose tool Protégé demonstrated the benefits of the created system for collaborative creation of ontologies anchored in the text. The conclusion analyses and summarises all the results achieved.
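The abstract mentions presenting alternative annotation suggestions for ambiguous entity names. One simple way to produce such a ranked list, shown here as a hedged sketch, is a Lesk-style overlap score between the words around the name and a short description of each candidate entity. The knowledge base, candidate descriptions, and scoring rule are hypothetical and are not the system described in the thesis.

```python
import re

# Hypothetical knowledge base: ambiguous name -> candidate entities, each with
# a small bag of descriptive words.
CANDIDATES = {
    "paris": {
        "Paris (France)": {"capital", "france", "city", "seine", "europe"},
        "Paris (Texas)": {"city", "texas", "county", "lamar"},
        "Paris Hilton": {"celebrity", "hotel", "heiress", "television"},
    },
}

def rank_candidates(name, context):
    """Return (entity, overlap) pairs for `name`, best context match first."""
    context_words = set(re.findall(r"[a-z]+", context.lower()))
    candidates = CANDIDATES.get(name.lower(), {})
    # Score each candidate by how many description words occur in the context.
    scored = [(len(words & context_words), entity) for entity, words in candidates.items()]
    # Sort by overlap (descending), then by entity label for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(entity, score) for score, entity in scored]
```

Returning the whole ranked list, rather than only the top candidate, leaves the final choice to the annotator.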
