  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
511

Usability of a Keyphrase Browsing Tool Based on a Semantic Cloud Model

Johnston, Onaje Omotola 08 1900 (has links)
The goal of this research was to facilitate the scrutiny and utilization of Web search engine retrieval results. I used a graphical keyphrase browsing interface to visualize the conceptual information space of the results, presenting document characteristics that make document relevance determinations easier.
512

PC-Gipsy: a usable PC-based image processing system

Melder, Karl Henry 26 January 2010 (has links)
Master of Information Systems
513

Information Storage and Retrieval Systems

Creech, Teresa Adams 05 1900 (has links)
This thesis describes the implementation of a general purpose personal information storage and retrieval system. Chapter one contains an introduction to information storage and retrieval. Chapter two contains a description of the features a useful personal information retrieval system should contain. This description forms the basis for the implementation of the personal information storage and retrieval system described in chapter three. The system is implemented in UCSD Pascal on an Apple II microcomputer.
514

Using Information Retrieval to Improve Integration Testing

Alazzam, Iyad January 2012 (has links)
Software testing is an important part of the software development process, and integration testing is one of its most important and expensive levels. Unfortunately, developers have limited time to perform integration testing and debugging, and integration testing becomes very hard as the combinations grow in size and the chains of calls from one module to another grow in number, length, and complexity. This research provides a new methodology for integration testing that reduces the number of test cases needed to a significant degree while retaining as much of its effectiveness as possible. The proposed approach determines the best order in which to integrate the classes currently available for integration, as well as the external method calls that should be tested and the order in which to test them for maximum effectiveness. Our approach limits the number of integration test cases; that number depends mainly on the dependency among modules and on the number of integrated classes in the application. The dependency among modules is determined using an information retrieval technique called Latent Semantic Indexing (LSI). In addition, this research extends mutation testing for use in integration testing as a method to evaluate the effectiveness of the integration testing process. We have developed a set of integration mutation operators to support the development of integration mutation testing. We have conducted experiments on ten Java applications. To evaluate the proposed methodology, we created mutants using new mutation operators that exercise integration testing. Our experiments show that the test cases killed more than 60% of the created mutants.
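The abstract names Latent Semantic Indexing as the technique for estimating dependency among modules. A minimal sketch of that idea, assuming each module's identifiers and comments are treated as a document and scikit-learn's TruncatedSVD supplies the low-rank projection (the module names and dimensions are illustrative, not from the thesis):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative module "documents": identifiers and comments pulled from source files.
modules = {
    "OrderService": "create order validate payment total invoice customer",
    "PaymentGateway": "charge card payment validate transaction invoice",
    "ReportWriter": "render report export pdf chart summary",
}

names = list(modules)
tfidf = TfidfVectorizer().fit_transform(modules.values())

# LSI: project the term-document matrix onto a small number of latent concepts.
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Pairwise cosine similarity in the latent space approximates dependency strength.
sim = cosine_similarity(lsi)
for i, a in enumerate(names):
    for j, b in enumerate(names):
        if i < j:
            print(f"{a} ~ {b}: {sim[i, j]:.2f}")
```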
515

Multi-Perspective Semantic Information Retrieval in the Biomedical Domain

January 2020 (has links)
abstract: Information Retrieval (IR) is the task of obtaining pieces of data (such as documents or snippets of text) that are relevant to a particular query or need from a large repository of information. IR is a valuable component of several downstream Natural Language Processing (NLP) tasks, such as Question Answering. Practically, IR is at the heart of many widely used technologies like search engines. While probabilistic ranking functions, such as the Okapi BM25 function, have been utilized in IR systems since the 1970s, modern neural approaches pose certain advantages over their classical counterparts. In particular, the release of BERT (Bidirectional Encoder Representations from Transformers) has had a significant impact in the NLP community by demonstrating how a Masked Language Model (MLM) trained on a considerable corpus of data can improve a variety of downstream NLP tasks, including sentence classification and passage re-ranking. IR systems are also important in the biomedical and clinical domains. Given the continuously increasing amount of scientific literature in the biomedical domain, the ability to find answers to specific clinical queries in a repository of millions of articles is of practical value to medics, doctors, and other medical professionals. Moreover, the biomedical domain presents domain-specific challenges, including handling clinical jargon and evaluating the similarity or relatedness of various medical symptoms when determining the relevance between a query and a sentence. This work presents contributions to several aspects of biomedical semantic information retrieval. First, it introduces Multi-Perspective Sentence Relevance, a novel methodology for utilizing BERT-based models for contextual IR. The system is evaluated using the BioASQ Biomedical IR Challenge. Finally, practical contributions in the form of a live IR system for medics and a proposed challenge on the Living Systematic Review clinical task are provided. / Dissertation/Thesis / Masters Thesis Computer Science 2020
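The abstract contrasts classical probabilistic ranking with neural re-ranking. A minimal sketch of the Okapi BM25 function it names, using the standard k1 and b parameters (the toy corpus and parameter values are illustrative):

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of one document for a bag-of-words query."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # smoothed inverse document frequency
        f = tf[term]
        denom = f + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * (f * (k1 + 1)) / denom
    return score

# Toy corpus of tokenized "documents" and a query; rank by descending BM25 score.
corpus = [
    "aspirin reduces fever and inflammation".split(),
    "ibuprofen treats pain fever and inflammation".split(),
    "insulin regulates blood glucose".split(),
]
query = "fever inflammation".split()
ranked = sorted(corpus, key=lambda d: bm25_score(query, d, corpus), reverse=True)
print(ranked[0])
```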
516

Vícejazyčné vyhledávání informací v oblasti medicíny / Cross-Lingual Information Retrieval in the Medical Domain

Saleh, Shadi January 2020 (has links)
Cross-Lingual Information Retrieval in the Medical Domain Shadi Saleh In recent years, there has been an exponential growth of the digital content available on the Internet, which has correlated with the increasing number of non-English Internet users due to the spread of the Internet across the globe. This raises the importance of unlocking resources for those who want to look up information that is not limited to the languages they understand, for example those who want to use the Internet to find medical content related to their health conditions (self-diagnosis) but do not have access to resources in their language. Cross-Lingual Information Retrieval (CLIR) breaks the language barrier by allowing search for documents written in a language different from the query language. This thesis tackles the task of CLIR in the medical domain and investigates the two main approaches: query translation (QT), where queries are machine translated into the language of the documents, and document translation (DT), where documents are translated into the language of the queries. We proceed with our research by employing Statistical Machine Translation (SMT) systems that are tuned for the QT approach and the DT approach in the medical domain for seven European languages (Czech, German, French, Spanish, Hungarian, Polish and Swedish) and...
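A minimal sketch of the query-translation route described above, with a toy word-for-word table standing in for the SMT systems the thesis tunes (all words, documents, and the scoring below are illustrative):

```python
# Toy Czech-to-English "translation" table standing in for a full MT system.
CS_EN = {"horečka": "fever", "bolest": "pain", "hlavy": "head", "léčba": "treatment"}

def translate_cs_to_en(text):
    return " ".join(CS_EN.get(tok, tok) for tok in text.lower().split())

def search(query, documents):
    """Toy retrieval: rank documents by the number of terms shared with the query."""
    q = set(query.lower().split())
    return sorted(documents, key=lambda d: len(q & set(d.lower().split())), reverse=True)

docs_en = ["fever and head pain treatment", "blood glucose monitoring", "pain relief options"]
query_cs = "horečka bolest hlavy"

# Query translation (QT): translate the query into the document language, then search.
qt_results = search(translate_cs_to_en(query_cs), docs_en)

# Document translation (DT) would instead translate every document into the query
# language and search with the original query; the pipeline is symmetric.
print(qt_results[0])
```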
517

A Theory for the Measurement of Internet Information Retrieval

MacCall, Steven Leonard 05 1900 (has links)
The purpose of this study was to develop and evaluate a measurement model for evaluating Internet information retrieval strategy performance, whose theoretical basis is a modification of the classical measurement model embodied in the Cranfield studies and their progeny. Though not the first, the Cranfield studies were the most influential of the early evaluation experiments. The general problem with this model was, and continues to be, the subjectivity of the concept of relevance. In cyberspace, information scientists are using quantitative measurement models for evaluating information retrieval performance that are based on the Cranfield model. This research modified that model by incorporating end-user relevance judgments rather than objective relevance judgments, and by adopting a fundamental unit of measure developed for the cyberspace of Internet information retrieval rather than recall- and precision-type measures. The proposed measure, the Content-bearing Click (CBC) Ratio, was developed as a quantitative measure reflecting the performance of an Internet IR strategy. Since the hypertext "click" is common to many Internet IR strategies, it was chosen as the fundamental unit of measure rather than the "document." The CBC Ratio is a ratio of hypertext click counts that can be viewed as a false drop measure determining the average number of irrelevant content-bearing clicks that an end user checks before retrieving relevant information. After measurement data were collected, they were used to evaluate the reliability of several methods for aggregating relevance judgments. After reliability coefficients were calculated, the measurement model was used to compare web catalog and web database performance in an experimental setting. Conclusions were then reached concerning the reliability of the proposed measurement model and its ability to measure Internet IR performance, as well as implications for clinical use of the Internet and for future research in Information Science.
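The abstract defines the CBC Ratio only informally. A minimal sketch of one plausible reading, assuming a click log in which each content-bearing click carries an end-user relevance judgment (the log format and the exact ratio below are illustrative assumptions, not the thesis's formula):

```python
# One plausible reading of a click-based false-drop measure: the number of
# irrelevant content-bearing clicks per relevant retrieval. The log format and
# this formula are illustrative assumptions, not the thesis's definition.
click_log = [
    {"content_bearing": True,  "relevant": False},
    {"content_bearing": False, "relevant": False},   # navigational click, ignored
    {"content_bearing": True,  "relevant": False},
    {"content_bearing": True,  "relevant": True},
    {"content_bearing": True,  "relevant": True},
]

cbc = [c for c in click_log if c["content_bearing"]]
irrelevant = sum(1 for c in cbc if not c["relevant"])
relevant = sum(1 for c in cbc if c["relevant"])

cbc_ratio = irrelevant / relevant if relevant else float("inf")
print(f"irrelevant content-bearing clicks per relevant retrieval: {cbc_ratio:.2f}")
```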
518

The Cluster Hypothesis: A Visual/Statistical Analysis

Sullivan, Terry 05 1900 (has links)
By allowing judgments based on a small number of exemplar documents to be applied to a larger number of unexamined documents, clustered presentation of search results represents an intuitively attractive possibility for reducing the cognitive resource demands on human users of information retrieval systems. However, clustered presentation of search results is sensible only to the extent that naturally occurring similarity relationships among documents correspond to topically coherent clusters. The Cluster Hypothesis posits just such a systematic relationship between document similarity and topical relevance. To date, experimental validation of the Cluster Hypothesis has proved problematic, with collection-specific results both supporting and failing to support this fundamental theoretical postulate. The present study consists of two computational information visualization experiments, representing a two-tiered test of the Cluster Hypothesis under adverse conditions. Both experiments rely on multidimensionally scaled representations of interdocument similarity matrices. Experiment 1 is a term-reduction condition, in which descriptive titles are extracted from Associated Press news stories drawn from the TREC information retrieval test collection. The clustering behavior of these titles is compared to the behavior of the corresponding full text via statistical analysis of the visual characteristics of a two-dimensional similarity map. Experiment 2 is a dimensionality reduction condition, in which inter-item similarity coefficients for full text documents are scaled into a single dimension and then rendered as a two-dimensional visualization; the clustering behavior of relevant documents within these unidimensionally scaled representations is examined via visual and statistical methods. Taken as a whole, results of both experiments lend strong though not unqualified support to the Cluster Hypothesis. In Experiment 1, semantically meaningful 6.6-word document surrogates systematically conform to the predictions of the Cluster Hypothesis. In Experiment 2, the majority of the unidimensionally scaled datasets exhibit a marked nonuniformity of distribution of relevant documents, further supporting the Cluster Hypothesis. Results of the two experiments are profoundly question-specific. Post hoc analyses suggest that it may be possible to predict the success of clustered searching based on the lexical characteristics of users' natural-language expression of their information need.
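Both experiments rely on multidimensionally scaled representations of interdocument similarity matrices. A minimal sketch of that step, assuming TF-IDF cosine similarity and scikit-learn's MDS implementation (the toy documents and the two-dimensional target are illustrative; Experiment 2 scales to a single dimension instead):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.manifold import MDS

docs = [
    "senate passes budget bill",
    "house debates budget resolution",
    "earthquake strikes coastal region",
    "aftershocks follow coastal earthquake",
]

# Interdocument similarity matrix, converted to dissimilarities for scaling.
tfidf = TfidfVectorizer().fit_transform(docs)
dissim = 1.0 - cosine_similarity(tfidf)
np.fill_diagonal(dissim, 0.0)

# Scale into two dimensions; clustering behavior can then be inspected visually
# or tested statistically, as in the experiments described above.
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(dissim)
print(coords)
```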
519

Learning hash codes for multimedia retrieval

Chen, Junjie 28 August 2019 (has links)
The explosive growth of multimedia data in online media repositories and social networks has led to high demand for fast and accurate large-scale multimedia retrieval services. Hashing, which codes high-dimensional data into a low-dimensional binary space, has been considered effective for this retrieval application. Despite recent progress, how to learn hashing models that make the best trade-off between retrieval efficiency and accuracy remains an open research issue. This thesis research aims to develop hashing models that are effective for image and video retrieval. An unsupervised hashing model called APHash is first proposed to learn hash codes for images by exploiting the distribution of the data. To reduce the underlying computational complexity, a methodology that makes use of an asymmetric similarity matrix is explored and found effective. In addition, the deep learning approach to learning hash codes for images is studied. In particular, a novel deep model called DeepQuan is proposed, which incorporates product quantization methods into an unsupervised deep model. Rather than adopting only the quadratic loss as the optimization objective, as most related deep models do, DeepQuan optimizes the data representations and their quantization codebooks to explore the clustering structure of the underlying data manifold, and the introduction of a weighted triplet loss into the learning objective is found to be effective. Furthermore, the case where some labeled data are available for learning is also considered. To alleviate the high training cost (which is especially crucial for a large-scale database), another hashing model named Similarity Preserving Deep Asymmetric Quantization (SPDAQ) is proposed for both image and video retrieval, in which the compact binary codes and quantization codebooks for all items in the database can be explicitly learned in an efficient manner. All the proposed hashing methods have been rigorously evaluated on benchmark datasets and found to outperform related state-of-the-art methods.
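Common to these models is the mapping of high-dimensional features to short binary codes so that retrieval reduces to Hamming-distance comparison. A minimal sketch of that general idea using random-projection sign hashing; this illustrates binary hashing in general, not APHash, DeepQuan, or SPDAQ:

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_codes(features, projection):
    """Map real-valued feature vectors to binary codes via the sign of random projections."""
    return (features @ projection > 0).astype(np.uint8)

# Illustrative database of 1000 feature vectors (e.g. image embeddings) and one query.
dim, n_bits = 128, 32
database = rng.standard_normal((1000, dim))
query = rng.standard_normal(dim)

projection = rng.standard_normal((dim, n_bits))
db_codes = hash_codes(database, projection)
q_code = hash_codes(query[None, :], projection)[0]

# Retrieval: rank database items by Hamming distance to the query code.
hamming = (db_codes != q_code).sum(axis=1)
top10 = np.argsort(hamming)[:10]
print(top10, hamming[top10])
```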
520

Using Morphological Analysis in an Information Retrieval System for Résumés / Användning av morfologisk analys i ett informationssökningssystem för CVn

Norrby, Sara January 2016 (has links)
This thesis investigates the use of an information retrieval system for résumés in Swedish and how morphological methods, such as lemmatization, affect the results. To investigate this, a small information retrieval system was built using lemmatization and compound splitting. The thesis also discusses how the relevance of a résumé can be decided and evaluates the information retrieval system in terms of precision, recall and ranking ability.  The results show that using morphological analysis had a positive effect in some cases, especially when the query contained more Swedish words than names of skills. In the cases where the query consisted mostly of technical skills, it proved to have a negative impact. Lemmatization was the method that had a small positive effect on ranking ability, while compound splitting had a negative impact regardless of the queries' features. / This thesis investigates how the use of morphological analysis, such as lemmatization, affects the performance of an information retrieval system for résumés in Swedish. It also discusses how the relevance of a résumé can be judged, and the information retrieval system is evaluated in terms of precision and recall as well as discounted cumulative gain, a measure of ranking ability. The results show that morphological analysis has a positive effect in cases where the query to the search system contains many Swedish words. When the query contained many names of different technologies, using morphology proved to be detrimental, particularly the splitting of compound words. Lemmatization was the method that had a positive effect in some cases, while compound splitting had only a negative effect.
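A minimal sketch of the indexing step described above, with a toy lemma table and a naive compound splitter standing in for real Swedish morphological analysis (all words and mappings are illustrative):

```python
# Toy lemma table and compound splitting; real Swedish morphology would come
# from a proper analyzer, this only illustrates how index terms are normalized.
LEMMAS = {"utvecklade": "utveckla", "systemen": "system", "databaser": "databas"}
KNOWN_WORDS = {"system", "utvecklare", "databas"}

def lemmatize(token):
    return LEMMAS.get(token, token)

def split_compound(token):
    """Greedy split into two known words, e.g. 'systemutvecklare' -> ['system', 'utvecklare']."""
    for i in range(3, len(token) - 2):
        head, tail = token[:i], token[i:]
        if head in KNOWN_WORDS and tail in KNOWN_WORDS:
            return [head, tail]
    return [token]

def index_terms(text):
    terms = []
    for token in text.lower().split():
        for part in split_compound(token):
            terms.append(lemmatize(part))
    return terms

print(index_terms("Utvecklade systemen och databaser"))   # ['utveckla', 'system', 'och', 'databas']
print(index_terms("Erfaren systemutvecklare"))             # ['erfaren', 'system', 'utvecklare']
```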
