581 |
The effects of search strategies and information interaction on sensemaking
Wilson, Mathew J. January 2015
No description available.
|
582 |
Evaluation and development of conceptual document similarity metrics with content-based recommender applications
Gouws, Stephan 12 1900
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2010.

ENGLISH ABSTRACT: The World Wide Web brought with it an unprecedented level of information overload.
Computers are very effective at processing and clustering numerical and binary data;
however, the conceptual clustering of natural-language data is considerably harder to
automate. Most past approaches rely on simple keyword-matching techniques or
probabilistic methods to measure semantic relatedness, but these do not always
accurately capture conceptual relatedness as judged by humans.
In this thesis we propose and evaluate the use of novel Spreading Activation (SA)
techniques for computing semantic relatedness, by modelling the article hyperlink structure
of Wikipedia as an associative network structure for knowledge representation. The
SA technique is adapted, and several problems are addressed, so that it can operate over
the Wikipedia hyperlink structure. Inter-concept and inter-document similarity metrics are
developed which make use of SA to compute the conceptual similarity between two concepts
and between two natural-language documents. We evaluate these approaches over
two document similarity datasets and achieve results which compare favourably with the
state of the art.
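To make the approach concrete, here is a minimal sketch of spreading activation over a hyperlink graph. It assumes the graph is available as a plain adjacency dict; the decay factor, pruning threshold, fan-out normalisation, and function names are illustrative choices, not the thesis's actual implementation.

```python
from collections import defaultdict

def spreading_activation(graph, seeds, decay=0.5, threshold=0.01, max_hops=3):
    """Propagate activation from seed concepts over a hyperlink graph.

    graph: dict mapping an article to the list of articles it links to.
    seeds: dict mapping seed articles to initial activation (e.g. 1.0).
    Activation decays by `decay` per hop and fans out over a node's links;
    propagation stops below `threshold` or after `max_hops`.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            links = graph.get(node, [])
            if not links:
                continue
            spread = energy * decay / len(links)  # fan-out normalisation
            if spread < threshold:
                continue
            for neighbour in links:
                next_frontier[neighbour] += spread
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return activation

def concept_similarity(graph, a, b, **kw):
    """Score two concepts by the cosine overlap of their activation vectors."""
    va = spreading_activation(graph, {a: 1.0}, **kw)
    vb = spreading_activation(graph, {b: 1.0}, **kw)
    dot = sum(va[n] * vb[n] for n in set(va) & set(vb))
    norm = (sum(x * x for x in va.values()) ** 0.5) * \
           (sum(x * x for x in vb.values()) ** 0.5)
    return dot / norm if norm else 0.0
```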
Furthermore, document preprocessing techniques are evaluated in terms of the performance
gain they yield for the well-known cosine document similarity metric and the Normalised
Compression Distance (NCD) metric. Results indicate that a nearly two-fold increase in
accuracy can be achieved for NCD by applying simple preprocessing techniques.
Nonetheless, the cosine similarity metric still significantly outperforms NCD.
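The NCD itself has a standard closed form, so a small self-contained sketch is easy to give; the preprocessing shown (lowercasing, punctuation stripping, whitespace collapsing) is only an assumed example of the "simple preprocessing techniques" evaluated, not the thesis's exact pipeline.

```python
import re
import zlib

def preprocess(text):
    # Illustrative preprocessing: lowercase, strip punctuation, collapse whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def ncd(x, y):
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    # where C(.) is the length of the compressed string.
    cx = len(zlib.compress(x.encode("utf-8")))
    cy = len(zlib.compress(y.encode("utf-8")))
    cxy = len(zlib.compress((x + " " + y).encode("utf-8")))
    return (cxy - min(cx, cy)) / max(cx, cy)

doc_a = "The World Wide Web brought an unprecedented level of information overload."
doc_b = "Information overload on the web is an unprecedented problem."
print(ncd(preprocess(doc_a), preprocess(doc_b)))  # lower = more similar
```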
Finally, we show that using our Wikipedia-based method to augment the cosine vector
space model provides superior results to either method in isolation. Combining the two
methods leads to an increased Pearson correlation of ρ = 0.72 on the Lee (2005) document
similarity dataset, which matches the reported result for the state-of-the-art Explicit
Semantic Analysis (ESA) technique, while requiring less than 10% of the Wikipedia
database that ESA requires.
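One plausible reading of such an augmentation is a linear mixture of the two scores; the sketch below assumes that form, with an illustrative weight `alpha` and a `wiki_sim` callable standing in for the SA-based metric above. The thesis may combine the metrics differently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cosine_sim(doc_a: str, doc_b: str) -> float:
    # Standard TF-IDF cosine similarity between two documents.
    tfidf = TfidfVectorizer().fit_transform([doc_a, doc_b])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

def combined_sim(doc_a, doc_b, wiki_sim, alpha=0.5):
    # Hypothetical linear mixture of the vector-space score and a
    # Wikipedia-based relatedness score; alpha is an assumed weight.
    return alpha * cosine_sim(doc_a, doc_b) + (1 - alpha) * wiki_sim(doc_a, doc_b)
```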
As a use case for document similarity techniques, a purely content-based news-article
recommender system is designed and implemented for a large online media company.
This system is used to gather additional human-generated relevance ratings which we
use to evaluate the performance of three state-of-the-art document similarity metrics for
providing content-based document recommendations.

AFRIKAANS ABSTRACT: The World Wide Web has brought with it a level of information overload never seen before. Computers are very effective at processing and grouping numerical and binary data, but the conceptual clustering of natural-language data is considerably harder to automate. Traditionally such algorithms rely on simple keyword-recognition techniques or probabilistic methods to compute semantic relatedness, but these approaches do not model conceptual relatedness, as measured by humans, very accurately.

In this thesis we propose the use of a novel spreading-activation (SA) strategy with which inter-concept relatedness can be computed, by modelling the article link structure of Wikipedia as an associative network. The SA technique is adapted to function over the Wikipedia link structure, and several problems that arise in doing so are addressed. Inter-concept and inter-document relatedness measures are developed which use SA to compute the conceptual relatedness between two concepts and between two natural-language documents. We evaluate this approach over two document-relatedness datasets, and the results compare well with those of other leading methods.

Furthermore, text-preprocessing techniques are investigated in terms of the possible improvement they can bring to the performance of the well-known cosine vector space measure and the Normalised Compression Distance (NCD) measure. Results indicate that NCD's accuracy can nearly be doubled by using simple preprocessing techniques, but that the cosine vector space measure still delivers considerably better results.

Finally, we show that the Wikipedia-based method can be used to augment the vector space measure into a combined measure that delivers better results than either of the two methods separately. Combining the two methods leads to an increased Pearson correlation of ρ = 0.72 on the Lee document-relatedness dataset. This equals the reported result for Explicit Semantic Analysis (ESA), the current best Wikipedia-based technique. Our approach, however, requires less than 10% of the Wikipedia database that ESA requires.

As a test application for document-relatedness techniques, we design and implement a system for an online media company that recommends news articles to users purely on the basis of the articles' content. Journalists who use the system assign a score to each recommendation, and we use this data to evaluate the accuracy of three leading document-relatedness measures in the context of content-based news-article recommendation.
|
583 |
REQUIREMENTS TRACING USING INFORMATION RETRIEVAL
Sundaram, Senthil Karthikeyan 01 January 2007
It is important to track how a requirement changes throughout the software lifecycle, and each requirement should be validated during and at the end of each phase. It is common to build traceability matrices to demonstrate that requirements are satisfied by the design, and such matrices are needed in various tasks in the software development process. Unfortunately, developers and designers do not always build traceability matrices, or do not maintain them to the proper level of detail; traceability matrices are therefore often built after the fact. Generating them is a time-consuming, error-prone, and mundane process, and most of the time they are built manually. Consider the case where an analyst is tasked to trace a high-level requirement document to a lower-level requirement specification: the analyst may have to look through M x N elements, where M and N are the number of high- and low-level requirements, respectively. Few tools are available to assist analysts in tracing unstructured textual artifacts, and the very few that exist require extensive pre-processing.

The prime objective of this work was to dynamically generate traceability links for unstructured textual artifacts using information retrieval (IR) methods. Given a user query and a document collection, IR methods identify all the documents that match the query, and a closer look at the requirements tracing process reveals that it can be stated as a recursive IR problem. The main goals of this work were to solve the requirements traceability problem using IR methods and to improve the accuracy of the generated traceability links while making the best use of the analyst's time. This work looked into adopting different IR methods and using user feedback to improve the generated links; it also applied refinements such as filtering to the original IR methods, and analyzed a voting mechanism for selecting the traceability links identified by different IR methods. Finally, the IR methods were evaluated using six datasets. The results showed that automating the requirements tracing process with IR methods helped save analysts' time and generated good-quality traceability matrices.
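The core move (each high-level requirement becomes a query over the low-level requirements) is easy to sketch. The TF-IDF vector space model and the similarity threshold below are one standard instantiation and assumed details, not necessarily the exact methods evaluated in this dissertation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def trace(high_reqs, low_reqs, threshold=0.1):
    """Build a candidate traceability matrix: treat each high-level
    requirement as a query over the low-level requirements and keep
    links whose cosine similarity clears `threshold` (the filtering step).
    Inputs are dicts of {requirement_id: requirement_text}."""
    vectorizer = TfidfVectorizer(stop_words="english")
    low_matrix = vectorizer.fit_transform(list(low_reqs.values()))
    high_matrix = vectorizer.transform(list(high_reqs.values()))
    scores = cosine_similarity(high_matrix, low_matrix)
    low_ids = list(low_reqs)
    return {
        hid: sorted(
            ((low_ids[j], s) for j, s in enumerate(scores[i]) if s >= threshold),
            key=lambda t: -t[1],
        )
        for i, hid in enumerate(high_reqs)
    }
```

An analyst then vets the ranked candidate links, and that feedback can be folded back into the query representation (standard relevance feedback) to improve later rounds.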
|
584 |
Structured and collaborative search: an integrated approach to share documents among users
Francq, Pascal 02 June 2003
<p align="justify">Aujourd'hui, la gestion des documents est l'un des problèmes les plus importants en informatique. L'objectif de cette thèse est de proposer un système de gestion documentaire basé sur une approche appelée recherche structurée et collaborative. Les caractéristiques essentielles sont :</p>
<ul><li><p align="justify">Dès lors que les utilisateurs ont plusieurs centres d'intérêts, ils sont décrits par des profils, un profil correspondant à un centre d'intérêt particulier. C'est la partie structurée du système.</li>
</p>
<li><p align="justify">Pour construire une description des profils, les utilisateurs jugent des documents en fonction de leur intérêt</li>
</p>
<li><p align="justify">Le système regroupe les profils similaires pour former un certain nombre de communautés virtuelles</li></p>
<li><p align="justify">Une fois les communautés virtuelles définies, des documents jugés comme intéressants par certains utilisateurs d'une communauté peuvent être partagés dans toute la communauté. C'est la partie collaborative du système.</p>
</li></ul>
<p align="justify">Le système a été validé sur plusieurs corpora de documents en utilisant une méthodologie précise et offre des résultats prometteurs.</p>
|
585 |
Computer assisted tutoring in radiology
Jeffery, Nathan January 1997
No description available.
|
586 |
A SYSTEM ANALYSIS OF A MULTILEVEL SECURE LOCAL AREA NETWORK (COMPUTER).
Benbrook, Jimmie Glen, 1943- January 1986
No description available.
|
587 |
MINING UNSTRUCTURED SOFTWARE REPOSITORIES USING IR MODELS
Thomas, Stephen 12 December 2012
Mining Software Repositories, the process of analyzing data related to software
development practices, is an emerging field that aims to aid development teams in their
day-to-day tasks. However, the data in many software repositories currently goes unused
because it is unstructured and therefore difficult to mine and analyze. Information
Retrieval (IR) techniques, which were developed specifically to handle unstructured data,
have recently been used by researchers to mine and analyze the unstructured data in
software repositories, with some success.
The main contribution of this thesis is the idea that the research and practice of using
IR models to mine unstructured software repositories can be improved by going beyond the
current state of affairs. First, we propose new applications of IR models to existing software
engineering tasks. Specifically, we present a technique to prioritize test cases based on their
IR similarity, giving highest priority to those test cases that are most dissimilar. In another
new application of IR models, we empirically recover how developers use their mailing list
while developing software.
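For the test-case prioritization idea, a greedy farthest-first ordering is the natural reading: always schedule next the test least similar to those already scheduled. The sketch below assumes tests are represented by their text and uses token-overlap (Jaccard) similarity as a stand-in for a full IR model; names and defaults are illustrative.

```python
def jaccard(a, b):
    # Token-overlap similarity; a stand-in for a proper IR similarity.
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0

def prioritize_dissimilar(tests, similarity=jaccard):
    # tests: {test_id: test_text}. Greedily pick the test whose maximum
    # similarity to the already-ordered tests is smallest (farthest-first).
    remaining = list(tests)
    order = [remaining.pop(0)]  # arbitrary (assumed) starting test
    while remaining:
        nxt = min(
            remaining,
            key=lambda t: max(similarity(tests[t], tests[s]) for s in order),
        )
        remaining.remove(nxt)
        order.append(nxt)
    return order
```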
Next, we show how the use of advanced IR techniques can improve results. Using a
framework for combining disparate IR models, we find that bug localization performance
can be improved by 14–56% on average, compared to the best individual IR model. In
addition, by using topic evolution models on the history of source code, we can uncover the
evolution of source code concepts with an accuracy of 87–89%.
Finally, we show the risks of current research, which uses IR models as black boxes without
fully understanding their assumptions and parameters. We show that data duplication
in source code has undesirable effects for IR models, and that by eliminating the duplication,
the accuracy of IR models improves. Additionally, we find that in the bug localization
task, an unwise choice of parameter values results in an accuracy of only 1%, whereas
optimal parameters can achieve an accuracy of 55%.
Through empirical case studies on real-world systems, we show that all of our proposed
techniques and methodologies significantly improve the state of the art.

Thesis (Ph.D., Computing)--Queen's University, 2012.
|
588 |
Role of Semantic web in the changing context of Enterprise Collaboration
Khilwani, Nitesh January 2011
In order to compete with the global giants, enterprises are concentrating on their core competencies and collaborating with organizations that complement their skills and core activities. The current trend is to develop temporary alliances of independent enterprises, in which companies can come together to share skills, core competencies and resources. However, knowledge sharing and communication among multidisciplinary companies is a complex and challenging problem. In a collaborative environment, the meaning of knowledge is drastically affected by the context in which it is viewed and interpreted, necessitating treatment of both the structure and the semantics of the data stored in enterprise repositories. Keeping the present market and technological scenario in mind, this research aims to propose tools and techniques that can enable companies to assimilate distributed information resources and achieve their business goals.
|
589 |
Data mining using the crossing minimization paradigm
Abdullah, Ahsan January 2007
Our ability and capacity to generate, record and store multi-dimensional, apparently unstructured data is increasing rapidly, while the cost of data storage is going down. The recorded data is not perfect, as noise gets introduced into it from different sources; some of the basic forms of noise are incorrectly recorded values and missing values. The formal study of discovering useful hidden information in data is called data mining, and because of the size and complexity of the problem, practical data mining problems are best attempted using automatic means. Data mining can be categorized into two types: supervised learning (classification) and unsupervised learning (clustering). Clustering only the records in a database (or data matrix) gives a global view of the data and is called one-way clustering; for a detailed analysis or a local view, biclustering (also called co-clustering or two-way clustering) is required, involving the simultaneous clustering of the records and the attributes.

In this dissertation, a novel, fast, white-noise-tolerant data mining solution is proposed based on the Crossing Minimization (CM) paradigm; the solution works for one-way as well as two-way clustering and discovers overlapping biclusters. For decades the CM paradigm has traditionally been used in graph drawing and in VLSI (Very Large Scale Integration) circuit design for reducing wire length and congestion. The utility of the proposed technique is demonstrated by comparing it with other biclustering techniques using simulated noisy data as well as real data from agriculture, biology and other domains.

Two other interesting and hard problems also addressed in this dissertation are (i) the Minimum Attribute Subset Selection (MASS) problem and (ii) the Bandwidth Minimization (BWM) problem for sparse matrices. The proposed CM technique is demonstrated to provide very convincing results on these problems using real public-domain data.

Pakistan is the fourth-largest supplier of cotton in the world. An apparent anomaly between cotton yield and pesticide consumption in Pakistan, showing unexpected periods of negative correlation, was observed during 1989-97. By applying the indigenous CM technique for one-way clustering to real Agro-Met data (2001-2002), a possible explanation of the anomaly is presented in this thesis.
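To illustrate the paradigm, here is a minimal sketch of the classic barycenter crossing-minimisation heuristic repurposed for clustering: alternately reorder the rows and columns of a 0/1 (records x attributes) matrix by the mean position of their nonzeros, so related rows and columns drift together and clusters emerge as dense blocks. This is a textbook heuristic stated under assumptions; the dissertation's actual CM algorithm and its noise handling are not reproduced here.

```python
import numpy as np

def barycenter_order(matrix, iterations=10):
    # Alternately sort rows, then columns, by the barycenter (mean index)
    # of their nonzero entries. Returns permutations of the original
    # row and column indices.
    m = np.asarray(matrix, dtype=float)
    rows = np.arange(m.shape[0])
    cols = np.arange(m.shape[1])
    for _ in range(iterations):
        col_pos = np.arange(m.shape[1])
        row_bary = (m * col_pos).sum(axis=1) / np.maximum(m.sum(axis=1), 1)
        order = np.argsort(row_bary)
        m, rows = m[order], rows[order]
        row_pos = np.arange(m.shape[0])[:, None]
        col_bary = (m * row_pos).sum(axis=0) / np.maximum(m.sum(axis=0), 1)
        order = np.argsort(col_bary)
        m, cols = m[:, order], cols[order]
    return rows, cols

# Example: an interleaved block matrix whose two clusters get pulled together.
data = np.array([[1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1]])
print(barycenter_order(data))
```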
|
590 |
Faculty Use of the World Wide Web: Modeling Information Seeking Behavior in a Digital Environment
Fortin, Maurice G. 12 1900
There is a long history of studying library users and their information seeking behaviors and activities, and researchers have developed models to better understand them; most of these models, however, were developed before the onset of the Internet. This research project studied faculty members' use of the Internet, and their information seeking behaviors and activities on it, at Angelo State University, a Master's I institution.

Using both quantitative and qualitative methodologies, differences were found between tenured and tenure-track faculty members in the perceived value of the Internet for meeting their research and classroom information needs. Similar differences were also found among faculty members in the broad discipline areas of the humanities, social sciences, and sciences. Tenure-track faculty members reported a higher average Internet use per week than tenured faculty members.

Based on in-depth, semi-structured interviews with seven tenured and seven tenure-track faculty members, an Internet Information Seeking Activities Model was developed to describe the information seeking activities on the Internet by faculty members at Angelo State University. The model consists of four basic stages of activities: "Gathering," "Validating," "Linking" (with a sub-stage of "Re-validating"), and "Monitoring," plus two parallel stages, "Communicating" and "Mentoring." The model was compared to Ellis's behavioral model of information seeking by faculty members; the Internet model places a greater emphasis on validating information retrieved from the Internet, but otherwise makes no substantive changes to Ellis's model.
|