581 |
The effects of search strategies and information interaction on sensemaking
Wilson, Mathew J. January 2015
No description available.
|
582 |
Evaluation and development of conceptual document similarity metrics with content-based recommender applications
Gouws, Stephan 12 1900
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2010.

ENGLISH ABSTRACT: The World Wide Web brought with it an unprecedented level of information overload.
Computers are very effective at processing and clustering numerical and binary data;
however, the conceptual clustering of natural-language data is considerably harder to
automate. Most past approaches rely on simple keyword-matching techniques or
probabilistic methods to measure semantic relatedness, but these do not always
accurately capture conceptual relatedness as judged by humans.
In this thesis we propose and evaluate the use of novel Spreading Activation (SA)
techniques for computing semantic relatedness, by modelling the article hyperlink structure
of Wikipedia as an associative network structure for knowledge representation. The
SA technique is adapted, and several problems are addressed, so that it can operate over
the Wikipedia hyperlink structure. Inter-concept and inter-document similarity metrics are
developed which make use of SA to compute the conceptual similarity between two concepts
and between two natural-language documents. We evaluate these approaches over
two document similarity datasets and achieve results which compare favourably with the
state of the art.
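To make the approach concrete, here is a minimal sketch of spreading activation over a hyperlink graph. It assumes the graph is available as a plain adjacency dict; the decay factor, pruning threshold, fan-out normalisation, and function names are illustrative choices, not the thesis's actual implementation.

```python
from collections import defaultdict

def spreading_activation(graph, seeds, decay=0.5, threshold=0.01, max_hops=3):
    """Propagate activation from seed concepts over a hyperlink graph.

    graph: dict mapping an article to the list of articles it links to.
    seeds: dict mapping seed articles to initial activation (e.g. 1.0).
    Activation decays by `decay` per hop and fans out over a node's links;
    propagation stops below `threshold` or after `max_hops`.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            links = graph.get(node, [])
            if not links:
                continue
            spread = energy * decay / len(links)  # fan-out normalisation
            if spread < threshold:
                continue
            for neighbour in links:
                next_frontier[neighbour] += spread
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return activation

def concept_similarity(graph, a, b, **kw):
    """Score two concepts by the cosine overlap of their activation vectors."""
    va = spreading_activation(graph, {a: 1.0}, **kw)
    vb = spreading_activation(graph, {b: 1.0}, **kw)
    dot = sum(va[n] * vb[n] for n in set(va) & set(vb))
    norm = (sum(x * x for x in va.values()) ** 0.5) * \
           (sum(x * x for x in vb.values()) ** 0.5)
    return dot / norm if norm else 0.0
```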
Furthermore, document preprocessing techniques are evaluated in terms of the performance
gain they yield for the well-known cosine document similarity metric and the Normalised
Compression Distance (NCD) metric. Results indicate that a nearly two-fold increase in
accuracy can be achieved for NCD by applying simple preprocessing techniques.
Nonetheless, the cosine similarity metric still significantly outperforms NCD.
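The NCD itself has a standard closed form, so a small self-contained sketch is easy to give; the preprocessing shown (lowercasing, punctuation stripping, whitespace collapsing) is only an assumed example of the "simple preprocessing techniques" evaluated, not the thesis's exact pipeline.

```python
import re
import zlib

def preprocess(text):
    # Illustrative preprocessing: lowercase, strip punctuation, collapse whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def ncd(x, y):
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    # where C(.) is the length of the compressed string.
    cx = len(zlib.compress(x.encode("utf-8")))
    cy = len(zlib.compress(y.encode("utf-8")))
    cxy = len(zlib.compress((x + " " + y).encode("utf-8")))
    return (cxy - min(cx, cy)) / max(cx, cy)

doc_a = "The World Wide Web brought an unprecedented level of information overload."
doc_b = "Information overload on the web is an unprecedented problem."
print(ncd(preprocess(doc_a), preprocess(doc_b)))  # lower = more similar
```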
Finally, we show that using our Wikipedia-based method to augment the cosine vector
space model provides superior results to either method in isolation. Combining the two
methods leads to an increased Pearson correlation of ρ = 0.72 on the Lee (2005) document
similarity dataset, which matches the reported result for the state-of-the-art Explicit
Semantic Analysis (ESA) technique, while requiring less than 10% of the Wikipedia
database that ESA requires.
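One plausible reading of such an augmentation is a linear mixture of the two scores; the sketch below assumes that form, with an illustrative weight `alpha` and a `wiki_sim` callable standing in for the SA-based metric above. The thesis may combine the metrics differently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cosine_sim(doc_a: str, doc_b: str) -> float:
    # Standard TF-IDF cosine similarity between two documents.
    tfidf = TfidfVectorizer().fit_transform([doc_a, doc_b])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

def combined_sim(doc_a, doc_b, wiki_sim, alpha=0.5):
    # Hypothetical linear mixture of the vector-space score and a
    # Wikipedia-based relatedness score; alpha is an assumed weight.
    return alpha * cosine_sim(doc_a, doc_b) + (1 - alpha) * wiki_sim(doc_a, doc_b)
```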
As a use case for document similarity techniques, a purely content-based news-article
recommender system is designed and implemented for a large online media company.
This system is used to gather additional human-generated relevance ratings which we
use to evaluate the performance of three state-of-the-art document similarity metrics for
providing content-based document recommendations.

AFRIKAANS ABSTRACT: The World Wide Web has brought with it a level of information overload never seen before. Computers are very effective at processing and grouping numerical and binary data, but the conceptual clustering of natural-language data is considerably harder to automate. Traditionally such algorithms rely on simple keyword-recognition techniques or probabilistic methods to compute semantic relatedness, but these approaches do not model conceptual relatedness, as measured by humans, very accurately.

In this thesis we propose the use of a novel spreading-activation (SA) strategy with which inter-concept relatedness can be computed, by modelling the article link structure of Wikipedia as an associative network. The SA technique is adapted to function over the Wikipedia link structure, and several problems that arise in doing so are addressed. Inter-concept and inter-document relatedness measures are developed which use SA to compute the conceptual relatedness between two concepts and between two natural-language documents. We evaluate this approach over two document-relatedness datasets, and the results compare well with those of other leading methods.

Furthermore, text-preprocessing techniques are investigated in terms of the possible improvement they can bring to the performance of the well-known cosine vector space measure and the Normalised Compression Distance (NCD) measure. Results indicate that NCD's accuracy can nearly be doubled by using simple preprocessing techniques, but that the cosine vector space measure still delivers considerably better results.

Finally, we show that the Wikipedia-based method can be used to augment the vector space measure into a combined measure that delivers better results than either of the two methods separately. Combining the two methods leads to an increased Pearson correlation of ρ = 0.72 on the Lee document-relatedness dataset. This equals the reported result for Explicit Semantic Analysis (ESA), the current best Wikipedia-based technique. Our approach, however, requires less than 10% of the Wikipedia database that ESA requires.

As a test application for document-relatedness techniques, we design and implement a system for an online media company that recommends news articles to users purely on the basis of the articles' content. Journalists who use the system assign a score to each recommendation, and we use this data to evaluate the accuracy of three leading document-relatedness measures in the context of content-based news-article recommendation.
|
583 |
REQUIREMENTS TRACING USING INFORMATION RETRIEVAL
Sundaram, Senthil Karthikeyan 01 January 2007
It is important to track how a requirement changes throughout the software lifecycle, and each requirement should be validated during and at the end of each phase. It is common to build traceability matrices to demonstrate that requirements are satisfied by the design, and such matrices are needed in various tasks in the software development process. Unfortunately, developers and designers do not always build traceability matrices, or do not maintain them to the proper level of detail; traceability matrices are therefore often built after the fact. Generating them is a time-consuming, error-prone, and mundane process, and most of the time they are built manually. Consider the case where an analyst is tasked to trace a high-level requirement document to a lower-level requirement specification: the analyst may have to look through M x N elements, where M and N are the number of high- and low-level requirements, respectively. Few tools are available to assist analysts in tracing unstructured textual artifacts, and the very few that exist require extensive pre-processing.

The prime objective of this work was to dynamically generate traceability links for unstructured textual artifacts using information retrieval (IR) methods. Given a user query and a document collection, IR methods identify all the documents that match the query, and a closer look at the requirements tracing process reveals that it can be stated as a recursive IR problem. The main goals of this work were to solve the requirements traceability problem using IR methods and to improve the accuracy of the generated traceability links while making the best use of the analyst's time. This work looked into adopting different IR methods and using user feedback to improve the generated links; it also applied refinements such as filtering to the original IR methods, and analyzed a voting mechanism for selecting the traceability links identified by different IR methods. Finally, the IR methods were evaluated using six datasets. The results showed that automating the requirements tracing process with IR methods helped save analysts' time and generated good-quality traceability matrices.
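The core move (each high-level requirement becomes a query over the low-level requirements) is easy to sketch. The TF-IDF vector space model and the similarity threshold below are one standard instantiation and assumed details, not necessarily the exact methods evaluated in this dissertation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def trace(high_reqs, low_reqs, threshold=0.1):
    """Build a candidate traceability matrix: treat each high-level
    requirement as a query over the low-level requirements and keep
    links whose cosine similarity clears `threshold` (the filtering step).
    Inputs are dicts of {requirement_id: requirement_text}."""
    vectorizer = TfidfVectorizer(stop_words="english")
    low_matrix = vectorizer.fit_transform(list(low_reqs.values()))
    high_matrix = vectorizer.transform(list(high_reqs.values()))
    scores = cosine_similarity(high_matrix, low_matrix)
    low_ids = list(low_reqs)
    return {
        hid: sorted(
            ((low_ids[j], s) for j, s in enumerate(scores[i]) if s >= threshold),
            key=lambda t: -t[1],
        )
        for i, hid in enumerate(high_reqs)
    }
```

An analyst then vets the ranked candidate links, and that feedback can be folded back into the query representation (standard relevance feedback) to improve later rounds.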
|
584 |
Structured and collaborative search: an integrated approach to share documents among users
Francq, Pascal 02 June 2003
<p align="justify">Aujourd'hui, la gestion des documents est l'un des problèmes les plus importants en informatique. L'objectif de cette thèse est de proposer un système de gestion documentaire basé sur une approche appelée recherche structurée et collaborative. Les caractéristiques essentielles sont :</p>
<ul><li><p align="justify">Dès lors que les utilisateurs ont plusieurs centres d'intérêts, ils sont décrits par des profils, un profil correspondant à un centre d'intérêt particulier. C'est la partie structurée du système.</li>
</p>
<li><p align="justify">Pour construire une description des profils, les utilisateurs jugent des documents en fonction de leur intérêt</li>
</p>
<li><p align="justify">Le système regroupe les profils similaires pour former un certain nombre de communautés virtuelles</li></p>
<li><p align="justify">Une fois les communautés virtuelles définies, des documents jugés comme intéressants par certains utilisateurs d'une communauté peuvent être partagés dans toute la communauté. C'est la partie collaborative du système.</p>
</li></ul>
<p align="justify">Le système a été validé sur plusieurs corpora de documents en utilisant une méthodologie précise et offre des résultats prometteurs.</p>
|
585 |
Computer assisted tutoring in radiology
Jeffery, Nathan January 1997
No description available.
|
586 |
A SYSTEM ANALYSIS OF A MULTILEVEL SECURE LOCAL AREA NETWORK (COMPUTER).
Benbrook, Jimmie Glen, 1943- January 1986
No description available.
|
587 |
MINING UNSTRUCTURED SOFTWARE REPOSITORIES USING IR MODELS
Thomas, Stephen 12 December 2012
Mining Software Repositories, the process of analyzing data related to software
development practices, is an emerging field that aims to aid development teams in their
day-to-day tasks. However, the data in many software repositories currently goes unused
because it is unstructured and therefore difficult to mine and analyze. Information
Retrieval (IR) techniques, which were developed specifically to handle unstructured data,
have recently been used by researchers to mine and analyze the unstructured data in
software repositories, with some success.
The main contribution of this thesis is the idea that the research and practice of using
IR models to mine unstructured software repositories can be improved by going beyond the
current state of affairs. First, we propose new applications of IR models to existing software
engineering tasks. Specifically, we present a technique to prioritize test cases based on their
IR similarity, giving highest priority to those test cases that are most dissimilar. In another
new application of IR models, we empirically recover how developers use their mailing list
while developing software.
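For the test-case prioritization idea, a greedy farthest-first ordering is the natural reading: always schedule next the test least similar to those already scheduled. The sketch below assumes tests are represented by their text and uses token-overlap (Jaccard) similarity as a stand-in for a full IR model; names and defaults are illustrative.

```python
def jaccard(a, b):
    # Token-overlap similarity; a stand-in for a proper IR similarity.
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0

def prioritize_dissimilar(tests, similarity=jaccard):
    # tests: {test_id: test_text}. Greedily pick the test whose maximum
    # similarity to the already-ordered tests is smallest (farthest-first).
    remaining = list(tests)
    order = [remaining.pop(0)]  # arbitrary (assumed) starting test
    while remaining:
        nxt = min(
            remaining,
            key=lambda t: max(similarity(tests[t], tests[s]) for s in order),
        )
        remaining.remove(nxt)
        order.append(nxt)
    return order
```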
Next, we show how the use of advanced IR techniques can improve results. Using a
framework for combining disparate IR models, we find that bug localization performance
can be improved by 14–56% on average, compared to the best individual IR model. In
addition, by using topic evolution models on the history of source code, we can uncover the
evolution of source code concepts with an accuracy of 87–89%.
Finally, we show the risks of current research, which uses IR models as black boxes without
fully understanding their assumptions and parameters. We show that data duplication
in source code has undesirable effects for IR models, and that by eliminating the duplication,
the accuracy of IR models improves. Additionally, we find that in the bug localization
task, an unwise choice of parameter values results in an accuracy of only 1%, whereas
optimal parameters can achieve an accuracy of 55%.
Through empirical case studies on real-world systems, we show that all of our proposed
techniques and methodologies significantly improve the state of the art.

Thesis (Ph.D., Computing)--Queen's University, 2012.
|
588 |
Role of Semantic web in the changing context of Enterprise Collaboration
Khilwani, Nitesh January 2011
In order to compete with the global giants, enterprises are concentrating on their core competencies and collaborating with organizations that complement their skills and core activities. The current trend is to develop temporary alliances of independent enterprises, in which companies can come together to share skills, core competencies and resources. However, knowledge sharing and communication among multidisciplinary companies is a complex and challenging problem. In a collaborative environment, the meaning of knowledge is drastically affected by the context in which it is viewed and interpreted, necessitating treatment of both the structure and the semantics of the data stored in enterprise repositories. Keeping the present market and technological scenario in mind, this research aims to propose tools and techniques that can enable companies to assimilate distributed information resources and achieve their business goals.
|
589 |
Data mining using the crossing minimization paradigm
Abdullah, Ahsan January 2007
Our ability and capacity to generate, record and store multi-dimensional, apparently unstructured data is increasing rapidly, while the cost of data storage is going down. The recorded data is not perfect, as noise gets introduced into it from different sources; some of the basic forms of noise are incorrectly recorded values and missing values. The formal study of discovering useful hidden information in data is called data mining, and because of the size and complexity of the problem, practical data mining problems are best attempted using automatic means. Data mining can be categorized into two types: supervised learning (classification) and unsupervised learning (clustering). Clustering only the records in a database (or data matrix) gives a global view of the data and is called one-way clustering; for a detailed analysis or a local view, biclustering (also called co-clustering or two-way clustering) is required, involving the simultaneous clustering of the records and the attributes.

In this dissertation, a novel, fast, white-noise-tolerant data mining solution is proposed based on the Crossing Minimization (CM) paradigm; the solution works for one-way as well as two-way clustering and discovers overlapping biclusters. For decades the CM paradigm has traditionally been used in graph drawing and in VLSI (Very Large Scale Integration) circuit design for reducing wire length and congestion. The utility of the proposed technique is demonstrated by comparing it with other biclustering techniques using simulated noisy data as well as real data from agriculture, biology and other domains.

Two other interesting and hard problems also addressed in this dissertation are (i) the Minimum Attribute Subset Selection (MASS) problem and (ii) the Bandwidth Minimization (BWM) problem for sparse matrices. The proposed CM technique is demonstrated to provide very convincing results on these problems using real public-domain data.

Pakistan is the fourth-largest supplier of cotton in the world. An apparent anomaly between cotton yield and pesticide consumption in Pakistan, showing unexpected periods of negative correlation, was observed during 1989-97. By applying the indigenous CM technique for one-way clustering to real Agro-Met data (2001-2002), a possible explanation of the anomaly is presented in this thesis.
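To illustrate the paradigm, here is a minimal sketch of the classic barycenter crossing-minimisation heuristic repurposed for clustering: alternately reorder the rows and columns of a 0/1 (records x attributes) matrix by the mean position of their nonzeros, so related rows and columns drift together and clusters emerge as dense blocks. This is a textbook heuristic stated under assumptions; the dissertation's actual CM algorithm and its noise handling are not reproduced here.

```python
import numpy as np

def barycenter_order(matrix, iterations=10):
    # Alternately sort rows, then columns, by the barycenter (mean index)
    # of their nonzero entries. Returns permutations of the original
    # row and column indices.
    m = np.asarray(matrix, dtype=float)
    rows = np.arange(m.shape[0])
    cols = np.arange(m.shape[1])
    for _ in range(iterations):
        col_pos = np.arange(m.shape[1])
        row_bary = (m * col_pos).sum(axis=1) / np.maximum(m.sum(axis=1), 1)
        order = np.argsort(row_bary)
        m, rows = m[order], rows[order]
        row_pos = np.arange(m.shape[0])[:, None]
        col_bary = (m * row_pos).sum(axis=0) / np.maximum(m.sum(axis=0), 1)
        order = np.argsort(col_bary)
        m, cols = m[:, order], cols[order]
    return rows, cols

# Example: an interleaved block matrix whose two clusters get pulled together.
data = np.array([[1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1]])
print(barycenter_order(data))
```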
|
590 |
Faculty Use of the World Wide Web: Modeling Information Seeking Behavior in a Digital Environment
Fortin, Maurice G. 12 1900
There is a long history of studying library users and their information seeking behaviors and activities, and researchers have developed models to better understand them; most of these models, however, were developed before the onset of the Internet. This research project studied faculty members' use of the Internet, and their information seeking behaviors and activities on it, at Angelo State University, a Master's I institution.

Using both quantitative and qualitative methodologies, differences were found between tenured and tenure-track faculty members in the perceived value of the Internet for meeting their research and classroom information needs. Similar differences were also found among faculty members in the broad discipline areas of the humanities, social sciences, and sciences. Tenure-track faculty members reported a higher average Internet use per week than tenured faculty members.

Based on in-depth, semi-structured interviews with seven tenured and seven tenure-track faculty members, an Internet Information Seeking Activities Model was developed to describe the information seeking activities on the Internet by faculty members at Angelo State University. The model consists of four basic stages of activities: "Gathering," "Validating," "Linking" (with a sub-stage of "Re-validating"), and "Monitoring," plus two parallel stages, "Communicating" and "Mentoring." The model was compared to Ellis's behavioral model of information seeking by faculty members; the Internet model places a greater emphasis on validating information retrieved from the Internet, but otherwise makes no substantive changes to Ellis's model.
|