541

Improving Web Search Ranking Using the Internet Archive

Li, Liyan 02 June 2020 (has links)
Current web search engines retrieve results based only on the latest content of web pages stored in their indices, despite the fact that many web resources update frequently. We explore possible techniques and data sources for improving web search result ranking using the historical content change of web pages. We compare web pages with their previous versions and separately model the texts and relevance signals in the newly added, retained, and removed parts. We examine in particular the Internet Archive, the largest web archiving service to date, for its effectiveness in improving web search performance. We experiment with several retrieval techniques, including language modeling approaches that use refined document and query representations built by comparing current web pages to previous versions, and learning-to-rank methods for combining relevance features from different versions of web pages. Experimental results on two large-scale retrieval datasets (ClueWeb09 and ClueWeb12) suggest that it is promising to use web page content change history to improve web search performance. However, the effectiveness achievable at this moment is affected by the practical coverage of the Internet Archive and by the amount of regularly-changing resources among the information relevant to search queries. Our work is the first step towards a promising area combining web search and web archiving, and opens new opportunities for commercial search engines and web archiving services. / Master of Science / Current web search engines rank search results based only on the most recent version of the web pages stored in their databases, despite the fact that many web resources update frequently. We explore possible techniques and data sources for improving web search result ranking using the historical content change of web pages. We compare web pages with their previous versions and extract the newly added, retained, and removed parts. We examine in particular the Internet Archive, the largest web archiving service to date, for its effectiveness in improving web search performance. We experiment with several retrieval techniques, including language modeling approaches that use refined document and query representations built by comparing current web pages to previous versions, and learning-to-rank methods for combining relevance features from different versions of web pages. Experimental results on two large-scale retrieval datasets (ClueWeb09 and ClueWeb12) suggest that it is promising to use web page content change history to improve web search performance. However, the effectiveness achievable at this point is affected by the practical coverage of the Internet Archive and by the amount of frequently changing resources among the information relevant to search queries. Our work is the first step towards a promising area combining web search and web archiving, and opens new opportunities for commercial search engines and web archiving services.
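As a rough illustration of the kind of representation the abstract describes, the following Python sketch (our own assumption of one plausible formulation, not code from the thesis) partitions a page's terms into retained, added, and removed parts relative to an archived snapshot and scores a query with a weighted, Dirichlet-smoothed mixture of those parts. The mixture weights, smoothing parameters, and example pages are invented for the example.

```python
from collections import Counter
import math

def term_partition(current_tokens, archived_tokens):
    """Split term counts into retained, added, and removed parts
    by comparing the current page to an archived snapshot."""
    cur, old = Counter(current_tokens), Counter(archived_tokens)
    retained = {t: min(cur[t], old[t]) for t in cur.keys() & old.keys()}
    added = {t: cur[t] - old[t] for t in cur if cur[t] > old[t]}
    removed = {t: old[t] - cur[t] for t in old if old[t] > cur[t]}
    return retained, added, removed

def score(query_tokens, parts, weights=(0.6, 0.3, 0.1), mu=2000, p_collection=1e-6):
    """Dirichlet-smoothed query likelihood over a weighted mixture of the
    retained, added, and removed term distributions (weights are illustrative)."""
    doc_len = sum(sum(p.values()) for p in parts) or 1
    total = 0.0
    for q in query_tokens:
        tf = sum(w * part.get(q, 0) for w, part in zip(weights, parts))
        total += math.log((tf + mu * p_collection) / (doc_len + mu))
    return total

current = "solar panel efficiency improved in 2020 models".split()
archived = "solar panel efficiency in 2015 models".split()
parts = term_partition(current, archived)
print(score("solar panel efficiency".split(), parts))
```

Learning-to-rank methods, as mentioned above, would instead treat the per-part scores as separate features and learn how to combine them from training data.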
542

Experimental comparison of schemes for interpreting Boolean queries

Lee, Whay C. January 1988 (has links)
The standard interpretation of the logical operators in a Boolean retrieval system is in general too strict. A standard Boolean query rarely comes close to retrieving all and only those documents which are relevant to the user. An AND query is often too narrow and an OR query is often too broad. The choice of AND results in retrieval at the left end of a typical average recall-precision graph, while the choice of OR results in retrieval at the right end, implying a tradeoff between precision and recall. This study examines various proposed schemes, the P-norm, Classical Fuzzy-Set, MMM, Paice and TIRS, which provide means to soften the interpretation of the logical operators, and thus to attain both high precision and high recall search performance. Each of the above schemes has shown great improvement over the standard Boolean scheme in terms of retrieval effectiveness. The differences in retrieval effectiveness between P-norm, Paice and MMM are shown to be relatively small. However, related performance results give evidence of the ranking: P-norm, Paice, MMM and then TIRS. This study employs the INNER PRODUCT function for computing the similarity between a document point and a query point in TIRS. There may be other choices of similarity functions for TIRS, but irrespective of the function used, the TIRS approach, having to deal with associated min-terms rather than the original query, is difficult to realize and involves far greater computational overhead than the other schemes. The P-norm scheme, being a distance-based approach, has greater intuitive appeal than the Paice or MMM scheme. However, in terms of the computational overhead required of each scheme, both Paice and MMM are superior to P-norm. The Paice and MMM schemes are essentially variations of the classical fuzzy-set scheme. Both perform much better than the classical fuzzy-set scheme in terms of retrieval effectiveness. / Master of Science
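For readers unfamiliar with the schemes being compared, the sketch below gives minimal illustrative implementations of the softened AND/OR operators (our own rendering of the published formulas, not code from the study). Term weights are assumed to lie in [0, 1], and the MMM coefficients are example values.

```python
def pnorm_or(weights, p=2.0):
    """P-norm OR with equal query-term importance."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1.0 / p)

def pnorm_and(weights, p=2.0):
    """P-norm AND with equal query-term importance."""
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / len(weights)) ** (1.0 / p)

def fuzzy_or(weights):
    """Classical fuzzy-set OR: the maximum weight."""
    return max(weights)

def fuzzy_and(weights):
    """Classical fuzzy-set AND: the minimum weight."""
    return min(weights)

def mmm_or(weights, c1=0.6, c2=0.4):
    """Mixed Min and Max OR: a weighted mix of max and min."""
    return c1 * max(weights) + c2 * min(weights)

def mmm_and(weights, c1=0.7, c2=0.3):
    """Mixed Min and Max AND: a weighted mix of min and max."""
    return c1 * min(weights) + c2 * max(weights)

# A document with weights 0.9 and 0.2 for the two query terms:
print(pnorm_and([0.9, 0.2]), fuzzy_and([0.9, 0.2]), mmm_and([0.9, 0.2]))
```

With p = 1 the P-norm operators reduce to a simple average, losing the AND/OR distinction, and as p grows they approach the strict fuzzy-set min/max; this is one way to see the tradeoff between precision and recall that these schemes try to balance.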
543

Array processor support in GIPSY

Fabregas, Gregg Roland January 1989 (has links)
The CSPI mini-MAP array processor is supported for use with a RATFOR preprocessor in the software environment defined by the Generalized Image Processing System (GIPSY). A set of interface routines presents the mini-MAP as a tightly-coupled slave processor with well-defined rules for control from the host computer. The slave is programmed by adapting host-based software, using a prescribed set of guidelines for conversion. A software protocol has been defined to allow mini-MAP data memory to be allocated dynamically. Several examples of modified GIPSY commands are examined. / Master of Science
544

Performance Evaluation of Web Archiving Through In-Memory Page Cache

Vishwasrao, Saket Dilip 23 June 2017 (has links)
This study proposes and evaluates a new method for Web archiving. We leverage the caching infrastructure in Web servers for archiving: Redis is used as the page cache, and its persistence mechanism is exploited for archiving. We experimentally evaluate the performance of our archival technique using the Greek version of Wikipedia deployed on Amazon cloud infrastructure. We show that there is a slight increase in the latencies of the rendered pages due to archiving. Though server performance is comparable at larger page cache sizes, the maximum throughput the server can handle decreases significantly at lower cache sizes because archiving causes more disk write operations. Since pages are dynamically rendered and the technology stack of Wikipedia is used extensively in a number of Web applications, our results should have broad impact. / Master of Science / This study proposes and evaluates a new method for Web archiving. To reduce the response time for serving web pages, Web servers store recently rendered pages in memory; this process is known as caching. We modify this caching mechanism of Web servers for archival. We then experimentally evaluate the impact of our archival technique on Web servers. We observe that the time to render a Web page increases only slightly as long as the Web server is under moderate load. Through our experiments, we establish limits on the maximum number of requests a Web server can handle without increasing the response time. We ensure our experiments are conducted on Web servers using technologies that are widely used today. Thus our results should have broad impact.
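The following sketch shows, under our own assumptions rather than the thesis implementation, the general shape of such an approach: rendered pages are served from a Redis page cache, and each fresh rendering is also written under a timestamped key so that Redis persistence (RDB snapshots or an append-only file) doubles as an archive. The key names, TTL, and the `render` callback are hypothetical.

```python
import time
import redis  # assumes the redis-py client is installed and a Redis server is running

r = redis.Redis(host="localhost", port=6379)

def get_page(url, render):
    """Return a cached rendering of `url`, archiving each fresh rendering."""
    cached = r.get("cache:" + url)
    if cached is not None:
        return cached.decode("utf-8")
    html = render(url)                                  # expensive dynamic rendering
    r.setex("cache:" + url, 3600, html)                 # normal cache entry with a TTL
    r.set(f"archive:{url}:{int(time.time())}", html)    # persisted archival copy
    return html
```

The extra `archive:` write is what trades throughput for archival coverage: at small cache sizes more pages are re-rendered, so more archive writes reach disk, which matches the drop in maximum throughput reported above.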
545

Information Retrieval Models for Software Test Selection and Prioritization

Gådin, Oskar January 2024 (has links)
There are many software systems currently in use for different applications. To make sure that these systems function, they need to be properly tested and maintained. When a system grows in scope it becomes more difficult to test and maintain, so test selection and prioritization tools that incorporate artificial intelligence, information retrieval, and natural language processing are useful. In this thesis, different information retrieval models were implemented and evaluated using multiple datasets based on different filters and pre-processing methods. The data was provided by Westermo Network Technologies AB and represents one of their systems. The datasets contained information about the test results and about the data used in the tests. The results showed that for models not trained on this data, it is more beneficial to give them less data, related only to test failures. When the models had access to more data, they made inaccurate connections, as the data was unrelated. The results also showed that when a model is not adjusted to the data, a simple model can be more effective than a more advanced one. / There are many software systems currently in use for different services. To ensure that these systems work correctly, it is necessary to test and maintain them properly. When a system grows in scope it becomes harder to test and maintain, and test selection and prioritization tools that integrate artificial intelligence, information retrieval, and natural language processing are therefore useful. In this report, different information retrieval models were evaluated using different datasets based on different filters and pre-processing methods. The data was provided by Westermo Network Technologies AB and represents one of their systems. The datasets contained information about the test results and about the data used in the tests. The results showed that for models not trained on this data, it is more beneficial to give them less data, related only to test failures. Giving the models access to more data showed that they made incorrect connections because the data was unrelated. The results also showed that when a model was not adjusted to the data, a simpler model could be more effective than a more advanced one.
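As a concrete, entirely invented illustration of how an untrained information retrieval model can be applied to test prioritization, the sketch below ranks test cases by the TF-IDF cosine similarity between a change description and the text associated with each test, such as past failure logs. The test names and documents are hypothetical and do not come from the Westermo data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical text associated with each test case (e.g., past failure logs).
test_docs = {
    "test_link_flap": "port link down event recovery ring topology",
    "test_dhcp_lease": "dhcp server lease renewal timeout",
    "test_firmware_upgrade": "firmware image upload reboot version check",
}

def prioritize(change_description):
    """Rank tests by similarity to the change description, most similar first."""
    names = list(test_docs)
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform([change_description] + [test_docs[n] for n in names])
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return sorted(zip(names, scores), key=lambda pair: -pair[1])

print(prioritize("fix ring recovery after link down"))
```

A model like this sees only the terms it is given, which is one reason the finding above holds: feeding it extra, unrelated data introduces spurious term overlaps rather than useful signal.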
546

Query Expansion Study for Clinical Decision Support

Zhuang, Wenjie 12 February 2018 (has links)
Information retrieval is widely used for retrieving relevant information among a variety of data, such as text documents, images, audio, and videos. Since the first medical batch retrieval system was developed in the mid-1960s, significant research efforts have focused on applying information retrieval to medical data. However, despite the vast developments in medical information retrieval and accompanying technologies, the actual promise of this area remains unfulfilled due to properties of medical data and the huge volume of medical literature. Specifically, the recall and precision obtained on the selected dataset from the TREC clinical decision support track are low. The overriding objective of this thesis is to improve the performance of information retrieval techniques applied to biomedical text documents. We have focused on improving recall and precision among the top retrieved results. To that end, we have removed redundant words and then expanded queries by adding MeSH terms to TREC CDS topics. We have also used other external data sources and domain knowledge to implement the expansion. In addition, we have considered using the doc2vec model to optimize retrieval. Finally, we have applied learning to rank, which sorts documents by relevance and places relevant documents ahead of irrelevant ones, so that the relevant retrieved data is returned at the top. We have discovered that queries expanded with external data sources and domain knowledge perform better than applying the TREC topic information directly. / Master of Science / Information retrieval is widely used for retrieving relevant information among a variety of data. Since the first medical batch retrieval system was developed in the mid-1960s, significant research efforts have focused on applying information retrieval to medical data. However, the actual promise of this area remains unfulfilled due to certain properties of medical data and the sheer volume of medical literature. The overriding objective of this thesis is to improve the performance of information retrieval techniques applied to biomedical text documents. This thesis presents several ways to implement query expansion in order to make retrieval more effective. It then discusses some approaches to place documents relevant to the queries at the top of the results.
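To make the expansion step concrete, here is a minimal sketch under our own assumptions: redundant words are dropped and synonyms drawn from a MeSH-like mapping are appended to the query before retrieval. The stop-word list and the `MESH_SYNONYMS` dictionary are stand-ins for the real resources, not the thesis implementation.

```python
# Hypothetical stand-ins for MeSH lookups and redundant-word filtering.
MESH_SYNONYMS = {
    "heart attack": ["myocardial infarction"],
    "high blood pressure": ["hypertension"],
}
STOPWORDS = {"a", "an", "the", "of", "with", "patient"}

def expand(query):
    """Drop redundant words, then append MeSH-derived expansion terms."""
    kept = [w for w in query.lower().split() if w not in STOPWORDS]
    expansion = []
    for phrase, terms in MESH_SYNONYMS.items():
        if phrase in " ".join(kept):
            expansion.extend(terms)
    return " ".join(kept + expansion)

print(expand("patient with heart attack"))  # -> "heart attack myocardial infarction"
```

The expanded string would then be submitted to the retrieval engine in place of the raw topic text; learning to rank can afterwards reorder the retrieved documents using the relevance features of each document.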
547

A Generic Approach to Component-Level Evaluation in Information Retrieval

Kürsten, Jens 19 November 2012 (has links) (PDF)
Research in information retrieval deals with the theories and models that constitute the foundations for any kind of service that provides access or pointers to particular elements of a collection of documents in response to a submitted information need. The specific field of information retrieval evaluation is concerned with the critical assessment of the quality of search systems. Empirical evaluation based on the Cranfield paradigm, using a specific collection of test queries in combination with relevance assessments in a laboratory environment, is the classic approach to comparing the impact of retrieval systems and their underlying models on retrieval effectiveness. In the past two decades international campaigns, like the Text Retrieval Conference, have led to huge advances in the design of experimental information retrieval evaluations. In general, however, the focus of this system-driven paradigm has remained on the comparison of overall system results; that is, retrieval systems are treated as black boxes, and the approach has been criticised for precisely this reason. Recent works on this subject have proposed the study of system configurations and their individual components. This thesis proposes a generic approach to the evaluation of retrieval systems at the component level. The focus of the thesis at hand is on the key components that are needed to address typical ad-hoc search tasks, like finding books on a particular topic in a large set of library records. A central approach in this work is the further development of the Xtrieval framework by integrating widely-used IR toolkits in order to eliminate the limitations of individual tools. Strong empirical results at international campaigns that provided various types of evaluation tasks confirm both the validity of this approach and the flexibility of the Xtrieval framework. Modern information retrieval systems contain various components that are important for solving particular subtasks of the retrieval process. This thesis presents a detailed analysis of the important system components needed to address ad-hoc retrieval tasks. Here, the design and implementation of the Xtrieval framework offer a variety of approaches for flexible system configurations. Xtrieval has been designed as an open system and allows the integration of further components and tools as well as addressing search tasks other than ad-hoc retrieval. This approach makes it possible to conduct automated component-level evaluation of retrieval approaches. Both the scale and the impact of these possibilities for the evaluation of retrieval systems are demonstrated by the design of an empirical experiment that covers more than 13,000 individual system configurations. This experimental set-up is tested on four test collections for ad-hoc search. The results of this experiment are manifold. For instance, particular implementations of ranking models fail systematically on all tested collections. The exploratory analysis of the ranking models empirically confirms the relationships between different implementations of models that share theoretical foundations. The obtained results also suggest that the impact on retrieval effectiveness of most instances of IR system components depends on the test collection used for evaluation. Due to the scale of the designed component-level evaluation experiment, not all possible interactions of the system components under examination could be analysed in this work.
For this reason the resulting data set will be made publicly available to the entire research community. / The research field of information retrieval deals with theories and models that form the foundation of any service that, in response to a formulated information need, provides access to or pointers to corresponding elements of a document collection. The quality of search algorithms is studied in the subfield of information retrieval evaluation. The classic approach to the empirical comparison of retrieval systems is based on the Cranfield paradigm and uses a specific corpus together with a set of sample queries and associated relevance assessments. Over the past two decades, international evaluation campaigns such as the Text Retrieval Conference have led to major advances in the methodology of empirically assessing search methods. However, the general focus of this system-based approach still lies on the comparison of complete systems; that is, the systems are treated as black boxes. Recently, this evaluation method has come under criticism, above all because of the black-box character of the object under study. Current work calls for a more differentiated look at individual system properties and their components. The present thesis introduces and empirically investigates a generic approach to the component-based evaluation of retrieval systems. Its focus therefore lies on central components that are important for handling classic ad-hoc search problems, such as finding books on a particular topic in a set of library records. A central approach of the work is the further development of the Xtrieval framework through the integration of widely used retrieval systems, with the aim of mutually eliminating system-specific weaknesses. Outstanding results in international comparison, for a wide variety of search problems, demonstrate both the potential of the approach and the flexibility of the Xtrieval framework. Modern retrieval systems contain numerous components that are important for solving specific subtasks within the overall retrieval process. The work presented here enables a close examination of the individual components of ad-hoc retrieval. To this end, Xtrieval is presented as a framework that allows a broad spectrum of methods to be combined flexibly with one another. The system has an open design and permits the integration of further methods as well as the handling of retrieval tasks beyond ad-hoc retrieval. This enables the component-based evaluation of retrieval methods that research has repeatedly called for but that had not previously been realised successfully. The power and significance of these evaluation possibilities are demonstrated on selected instances of the components in an empirical analysis covering more than 13,000 system configurations. The results on the four ad-hoc test collections examined are manifold. For example, systematic failures of particular ranking models were identified, and the theoretical relationships between specific classes of these models were confirmed on the basis of empirical results. The scale of the experiment makes it impossible to analyse all possible influences and interactions between the examined components. The empirical data produced are therefore made publicly available for further studies.
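A generic, hedged sketch of automated component-level evaluation in the spirit described above (not Xtrieval itself): enumerate the cross product of component choices, run every configuration against every test collection, and record an effectiveness measure per run. The component names, collections, and the placeholder `evaluate` function are illustrative only.

```python
from itertools import product

# Illustrative component choices; a real study would plug in concrete implementations.
stemmers = ["none", "porter", "snowball"]
stopword_lists = ["none", "smart"]
ranking_models = ["bm25", "lm_dirichlet", "tfidf"]
collections = ["collection_a", "collection_b"]

def evaluate(collection, stemmer, stopwords, model):
    """Placeholder: index the collection with the given components,
    run the test queries, and return an effectiveness score such as MAP."""
    return 0.0

results = []
for coll in collections:
    for stem, stop, model in product(stemmers, stopword_lists, ranking_models):
        results.append({
            "collection": coll, "stemmer": stem,
            "stopwords": stop, "model": model,
            "map": evaluate(coll, stem, stop, model),
        })

print(len(results), "runs")  # 2 collections x 18 component configurations = 36 runs
```

Scaling the same grid to thousands of configurations is what makes it possible to ask which component choices matter on which collections, and which interactions between components remain unexplored.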
548

Recuperação de informação: análise sobre a contribuição da ciência da computação para a ciência da informação / Information Retrieval: analysis about the contribution of Computer Science to Information Science

Ferneda, Edberto 15 December 2003 (has links)
Since its birth, Information Science has been studying methods for the automatic treatment of information. This research focused on Information Retrieval, an area that involves the application of computational methods to the treatment and retrieval of information, in order to assess to what extent Computer Science contributes to the advancement of Information Science. Information Retrieval is first contextualized within the interdisciplinary body of Information Science, and the basic elements of the information retrieval process are presented. Computational models of information retrieval are analyzed on the basis of a categorization into "quantitative" and "dynamic" models. Some natural language processing techniques used in information retrieval are likewise discussed. In the current context of the Web, techniques for representing and retrieving information are presented, from search engines to the Semantic Web. It is concluded that, despite the unquestionable importance of computational methods and techniques in the treatment of information, they serve only as auxiliary tools, since they employ a conception of "information" that is extremely restricted compared with the one used by Information Science. / Since its birth, Information Science has been studying methods for the automatic treatment of information. This research has focused on Information Retrieval, an area that involves the application of computational methods in the treatment and retrieval of information, in order to assess how Computer Science contributes to the progress of Information Science. Initially, Information Retrieval is contextualized in the interdisciplinary body of Information Science and, after that, the basic elements of the information retrieval process are presented. Computational models related to information retrieval are analyzed according to "quantitative" and "dynamic" categories. Some natural language processing techniques used in information retrieval are equally discussed. In the current context of the Web, the techniques of information retrieval are presented, from search engines to the Semantic Web. It can be concluded that, in spite of the unquestionable importance of the computational methods and techniques for dealing with information, they are regarded only as auxiliary tools, because their concept of "information" is extremely restricted in relation to that used by Information Science.
549

Interactive analogical retrieval: practice, theory and technology

Vattam, Swaroop 24 August 2012 (has links)
Analogy is ubiquitous in human cognition. One of the important questions related to understanding the situated nature of analogy-making is how people retrieve source analogues via their interactions with external environments. This dissertation studies interactive analogical retrieval in the context of biologically inspired design (BID). BID involves the creative use of analogies to biological systems to develop solutions for complex design problems (e.g., designing a device for acquiring water in desert environments based on the analogous fog-harvesting abilities of the Namibian beetle). Finding the right biological analogues is one of the critical first steps in BID. Designers routinely search online in order to find their biological sources of inspiration. But this task of online bio-inspiration seeking represents an instance of interactive analogical retrieval that is extremely time consuming and challenging to accomplish. This dissertation focuses on understanding and supporting the task of online bio-inspiration seeking. Through a series of field studies, this dissertation uncovered the salient characteristics and challenges of online bio-inspiration seeking. An information-processing model of interactive analogical retrieval was developed in order to explain those challenges and to identify their underlying causes. A set of measures was put forth to ameliorate those challenges by targeting the identified causes. These measures were then implemented in an online information-seeking technology designed specifically to support the task of online bio-inspiration seeking. Finally, the validity of the proposed measures was investigated through a series of experimental studies and a deployment study. The trends are encouraging and suggest that the proposed measures have the potential to change the dynamics of online bio-inspiration seeking in favor of ameliorating the identified challenges.
550

Kontextbasiertes Information-Retrieval : Modell, Konzeption und Realisierung kontextbasierter Information-Retrieval-Systeme /

Morgenroth, Karlheinz. January 2006 (has links)
Universität, Diss., 2006--Bamberg.
