  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Role of semantic indexing for text classification

Sani, Sadiq January 2014
The Vector Space Model (VSM) of text representation suffers from a number of limitations for text classification. Firstly, the VSM is based on the Bag-Of-Words (BOW) assumption where terms from the indexing vocabulary are treated independently of one another. However, the expressiveness of natural language means that lexically different terms often have related or even identical meanings. Thus, failure to take into account the semantic relatedness between terms means that document similarity is not properly captured in the VSM. To address this problem, semantic indexing approaches have been proposed for modelling the semantic relatedness between terms in document representations. Accordingly, in this thesis, we empirically review the impact of semantic indexing on text classification. This empirical review allows us to answer one important question: how beneficial is semantic indexing to text classification performance? We also carry out a detailed analysis of the semantic indexing process which allows us to identify reasons why semantic indexing may lead to poor text classification performance. Based on our findings, we propose a semantic indexing framework called Relevance Weighted Semantic Indexing (RWSI) that addresses the limitations identified in our analysis. RWSI uses relevance weights of terms to improve the semantic indexing of documents. A second problem with the VSM is the lack of supervision in the process of creating document representations. This arises from the fact that the VSM was originally designed for unsupervised document retrieval. An important feature of effective document representations is the ability to discriminate between relevant and non-relevant documents. For text classification, relevance information is explicitly available in the form of document class labels. Thus, more effective document vectors can be derived in a supervised manner by taking advantage of available class knowledge.
Accordingly, we investigate approaches for utilising class knowledge for supervised indexing of documents. Firstly, we demonstrate how the RWSI framework can be utilised for assigning supervised weights to terms for supervised document indexing. Secondly, we present an approach called Supervised Sub-Spacing (S3) for supervised semantic indexing of documents. A further limitation of the standard VSM is that an indexing vocabulary that consists only of terms from the document collection is used for document representation. This is based on the assumption that terms alone are sufficient to model the meaning of text documents. However, for certain classification tasks, terms are insufficient to adequately model the semantics needed for accurate document classification. A solution is to index documents using semantically rich concepts. Accordingly, we present an event extraction framework called Rule-Based Event Extractor (RUBEE) for identifying and utilising event information for concept-based indexing of incident reports. We also demonstrate how certain attributes of these events, e.g. negation, can be taken into consideration to distinguish between documents that describe the occurrence of an event, and those that mention the non-occurrence of that event.
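The Bag-Of-Words limitation described in this abstract can be illustrated with a toy example. The sketch below is not RWSI or S3; it only assumes a generic form of semantic indexing in which each document vector is multiplied by a term-term relatedness matrix. The vocabulary, the relatedness values, and the function names (`bow_vector`, `semantic_index`) are hypothetical and chosen purely for illustration.

```python
import math

def bow_vector(text, vocab):
    """Count how often each vocabulary term occurs in the text."""
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in vocab]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_index(vec, relatedness):
    """Spread each term's weight onto related terms (d' = d x T)."""
    n = len(vec)
    return [sum(vec[i] * relatedness[i][j] for i in range(n)) for j in range(n)]

vocab = ["car", "automobile", "banana"]

# Hypothetical term-term relatedness matrix (assumption, not from the thesis).
relatedness = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

doc_a = bow_vector("car car", vocab)
doc_b = bow_vector("automobile", vocab)

# Plain BOW: the documents share no term, so their similarity is zero.
plain = cosine(doc_a, doc_b)

# After semantic indexing, the related terms make the documents similar.
smoothed = cosine(semantic_index(doc_a, relatedness),
                  semantic_index(doc_b, relatedness))

print(plain, round(smoothed, 3))
```

With these assumed values the plain similarity is zero while the semantically indexed similarity is close to one, which is the effect the abstract attributes to modelling term relatedness.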
72

In-house indexing of periodical literature : a study of university libraries in Kenya

Matanji, Peter Hezron Marisia 03 1900
The present study investigated identification, access and usage of periodicals in university libraries in Kenya, with a view to recommending a tool for assisting users to identify information. Using questionnaires completed by 316 university library users and 27 librarians, backed by participant observations, document analysis and interviews, it was found that usage of periodicals was low because most users browsed through periodicals to identify information, a method that is not effective. In-house indexing was investigated and found to be an effective tool in facilitating access to relevant information. The study recommends establishment of in-house indexing programs and databases in university libraries; formulation of consistent indexing policies to achieve quality indexing; and that indexing should be focused on both content and user requirements by specifying points of view and study methodologies to enhance retrieval of relevant information. / Information Science / M. A. (Information Science)
73

Image manipulation and user-supplied index terms.

Schultz, Leah 05 1900
This study investigates the relationships between the use of a zoom tool, the terms participants supply to describe an image, and the type of image being viewed. Participants were assigned to two groups, one with access to the tool and one without, and were asked to supply terms to describe forty images, divided into four categories: landscape, portrait, news, and cityscape. The terms provided by participants were categorized according to models proposed in earlier image studies. The findings suggest that access to the tool made no significant difference to the number of terms supplied, but participants varied widely in how they used the tool. The study shows that there are differences in the level of meaning of the terms supplied in some of the models. The type of image being viewed was related to the number of zooms, and relationships exist between the type of image and both the number of terms supplied and their level of meaning in the various models from previous studies. The results of this study provide further insight into how people think about images and how the manipulation of those images may affect the terms they assign to describe images. The inclusion of these tools in search and retrieval scenarios may affect the outcome of the process, and the more collection managers know about how people interact with images, the better they will be able to provide access to the growing amount of pictorial information.
74

Combining Image Features For Semantic Descriptions

Soysal, Medeni 01 January 2003
Digital multimedia content production and the amount of content present all over the world have exploded in recent years. The consequences can be observed everywhere in many different forms, for example huge digital video archives of broadcasting companies, commercial image archives, virtual museums, etc. In order for these sources to be useful and accessible, this technological advance must be accompanied by effective techniques of indexing and retrieval. The most effective way of indexing is the one providing a basis for retrieval in terms of semantic concepts, upon which ordinary users of multimedia databases base their queries. On the other hand, semantic classification of images using low-level features is a challenging problem. Combining experts with different classifier structures, trained on MPEG-7 low-level color and texture descriptors, is examined as a possible solution. For combining different classifiers and features, advanced decision mechanisms are proposed, which utilize basic expert combination strategies in different settings. Each of these decision mechanisms, namely Single Feature Combination (SFC), Multiple Feature Direct Combination (MFDC), and Multiple Feature Cascaded Combination (MFCC), achieves significant classification performance improvements over single experts. Simulations are conducted on eight different visual semantic classes, resulting in accuracy improvements of between 3.5% and 6.5% compared with the best performance of single expert systems.
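The abstract does not spell out which "basic expert combination strategies" are used, so the following is only a sketch of two common ones, the sum rule and majority voting, and not of SFC, MFDC, or MFCC themselves. The class names, the scores, and the function names are hypothetical.

```python
from collections import Counter

def sum_rule(expert_scores):
    """Average each class score over all experts and return the top class."""
    classes = expert_scores[0].keys()
    averaged = {
        c: sum(scores[c] for scores in expert_scores) / len(expert_scores)
        for c in classes
    }
    return max(averaged, key=averaged.get)

def majority_vote(expert_scores):
    """Let each expert vote for its top class and return the most voted class."""
    votes = Counter(max(scores, key=scores.get) for scores in expert_scores)
    return votes.most_common(1)[0][0]

# Hypothetical per-class scores from three experts, e.g. trained on
# different colour or texture descriptors (values are made up).
experts = [
    {"indoor": 0.55, "outdoor": 0.45},
    {"indoor": 0.55, "outdoor": 0.45},
    {"indoor": 0.05, "outdoor": 0.95},
]

print(sum_rule(experts))       # one very confident expert outweighs two weak ones
print(majority_vote(experts))  # two of the three experts prefer "indoor"
```

The two rules can disagree, as they do here, which is one reason a combination scheme has to choose its decision mechanism deliberately.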
75

Novelty Detection by Latent Semantic Indexing

Zhang, Xueshan January 2013
As a new topic in text mining, novelty detection is a natural extension of information retrieval systems, or search engines. Aiming at refining raw search results by filtering out old news and saving only the novel messages, it saves modern people from the nightmare of information overload. One of the difficulties in novelty detection is the inherent ambiguity of language, which is the carrier of information. Among the sources of ambiguity, synonymy proves to be a notable factor. To address this issue, previous studies mainly employed WordNet, a lexical database which can be perceived as a thesaurus. Rather than borrowing a dictionary, we proposed a statistical approach employing Latent Semantic Indexing (LSI) to learn semantic relationships automatically with the help of language resources. To apply LSI, which involves matrix factorization, an immediate problem is that the dataset in novelty detection is dynamic and changing constantly. To imitate a real-world scenario, texts are ranked in chronological order and examined one by one. Each text is only compared with those having appeared earlier, while later ones remain unknown. As a result, the data matrix starts as a one-row vector representing the first report, and has a new row added at the bottom every time we read a new document. Such a changing dataset makes it hard to employ matrix methods directly. Although LSI has long been acknowledged as an effective text mining method when considering semantic structure, it has never been used in novelty detection, nor have other statistical treatments. We tried to change this situation by introducing an external text source to build the latent semantic space, onto which the incoming news vectors were projected. We used the Reuters-21578 dataset and the TREC data as sources of latent semantic information. Topics were divided into years and types in order to take the differences between them into account.
Results showed that LSI, though very effective in traditional information retrieval tasks, yielded only a slight improvement in performance for some data types. The extent of improvement depended on the similarity between news data and external information. An examination of the co-occurrence matrix attributed this limited performance to the unique features of microblogs. Their short sentence lengths and restricted dictionary made it very hard to recover and exploit latent semantic information via traditional data structures.
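The procedure described in this abstract (build a latent space from an external corpus, project incoming documents onto it, compare each document only with earlier ones) can be sketched as follows. This is a toy illustration under several assumptions, not the author's implementation: the latent basis is approximated by power iteration on a tiny made-up corpus, the projection omits the usual singular-value scaling, and the cosine threshold of 0.8 and all function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    norm = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def top_right_singular_vectors(matrix, k, iterations=200):
    """Approximate the top-k right singular vectors of a document-term
    matrix by power iteration with deflation on its Gram matrix."""
    n_terms = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n_terms)]
            for i in range(n_terms)]
    basis = []
    for index in range(k):
        # Deterministic, strictly positive start vector.
        vec = [1.0 / (i + 1 + index) for i in range(n_terms)]
        for _ in range(iterations):
            vec = [dot(gram_row, vec) for gram_row in gram]
            # Deflation: remove components along vectors already found.
            for found in basis:
                overlap = dot(vec, found)
                vec = [a - overlap * b for a, b in zip(vec, found)]
            norm = math.sqrt(dot(vec, vec))
            if norm == 0.0:
                break
            vec = [a / norm for a in vec]
        basis.append(vec)
    return basis

def novel_documents(stream, basis, threshold=0.8):
    """Return positions of documents that are not similar (cosine below
    the threshold) to any earlier document after projection onto basis."""
    seen, novel = [], []
    for position, vec in enumerate(stream):
        latent = [dot(vec, b) for b in basis]
        if all(cosine(latent, old) < threshold for old in seen):
            novel.append(position)
        seen.append(latent)
    return novel

# Vocabulary order: car, automobile, engine, banana, fruit (made-up data).
external_corpus = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
basis = top_right_singular_vectors(external_corpus, k=2)

# Incoming stream in chronological order: "car", "automobile", "banana".
stream = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 1, 0]]

# Identity basis = comparison in the raw term space, for contrast.
identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]

print(novel_documents(stream, basis))     # [0, 2]
print(novel_documents(stream, identity))  # [0, 1, 2]
```

In the raw term space the second document looks novel because it shares no word with the first; in the latent space learned from the external corpus, "car" and "automobile" fall on the same direction and the second document is filtered out. The abstract's own finding is that, on real microblog data, the gain from this kind of projection was small.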
76

Indexing Compressed Text

He, Meng January 2003
As a result of the rapid growth of the volume of electronic data, text compression and indexing techniques are receiving more and more attention. These two issues are usually treated as independent problems, but approaches to combining them have recently attracted the attention of researchers. In this thesis, we review and test some of the more effective and some of the more theoretically interesting techniques. Various compression and indexing techniques are presented, and we also present two compressed text indices. Based on these techniques, we implement a compressed full-text index, so that compressed texts can be indexed to support fast queries without decompressing the whole texts. The experiments show that our index is compact and supports fast search.
77

Mobilių objektų indeksavimas duomenų bazėse / Indexing of mobile objects in databases

Tamošiūnas, Saulius 02 July 2014
The main aim of this work is to examine the problems of indexing moving objects in databases and the proposed solutions, and to compare the effectiveness of several of them. The R tree, which indexes past data, and the STR and TB trees derived from it were compared from various perspectives. The experiments were carried out using generated moving-object data. The results showed that the effectiveness of the indexes depends on the particular conditions and circumstances in which they are used. / Over the past few years, there has been continuous improvement in wireless communications and positioning technologies. As a result, tracking the changing positions of continuously moving objects is becoming increasingly feasible and necessary. Databases that deal with objects that change their location and/or shape over time are called spatio-temporal databases. Traditional database approaches for effective information retrieval cannot be used because a moving-objects database is highly dynamic. This creates a need for so-called spatio-temporal indexing techniques. Depending on the problem they address, indices fall into two main groups: (a) those indexing the past and (b) those indexing current and predicted future positions. Techniques covering both problems have also been proposed. This work surveys well-known and widely used indices. It also compares the performance of several methods for indexing the past. The STR tree, the TB tree and the R tree, the predecessor of many indices, are compared in various respects using generated datasets of simulated object movement.
79

Modelling and managing temporal data and its application to Scottish dental information systems

Lu, Jiang January 1997
No description available.
80

Improved indexes for next generation bioinformatics applications

Wu, Man-kit, Edward., 胡文傑. January 2009
published_or_final_version / Computer Science / Master / Master of Philosophy
