291 |
Leveraging Structure for Effective Question Answering / Bonadiman, Daniele / 25 September 2020
In this thesis, we focus on Answer Sentence Selection (A2S), which is the core task of retrieval-based question answering. A2S consists of selecting the sentences that answer user queries from a collection of documents retrieved by a search engine. Over more than two decades, several machine-learning solutions have been proposed for this task, ranging from simple approaches based on manual feature engineering to more complex Structural Tree Kernel models and, more recently, neural network architectures.
In particular, the latter require little human effort, as they can automatically extract relevant features from plain text. The development of neural architectures brought improvements in many areas of A2S, reaching unprecedented results: they substantially increase accuracy on almost all benchmark datasets for A2S. However, this has come at the cost of a huge increase in the number of parameters and in the computational cost of the models. The large number of parameters has led to two drawbacks: the models require a massive amount of data to train effectively, and huge computational power to maintain an acceptable number of transactions per second in a production environment. Current state-of-the-art techniques for A2S use huge Transformer architectures with up to 340 million parameters, pre-trained on a massive amount of data, e.g., BERT. BERT and related models in the same family, such as RoBERTa, are general architectures, i.e., they can be applied to many NLP tasks without any architectural change.
In contrast to the trend above, we focus on specialized architectures for A2S that can effectively encode the local structure of the question and answer candidate and global information, i.e., the structure of the task and the context in which the answer candidate appears.
In particular, we propose solutions to effectively encode both the local and the global structure of A2S in efficient neural network models. (i) We encode syntactic information in a fast CNN architecture, exploiting the capability of Structural Tree Kernels to encode syntactic structure. (ii) We propose an efficient model that can use semantic relational information between question and answer candidates by pretraining word representations on a relational knowledge base. (iii) This efficient approach is further extended to encode each answer candidate's contextual information, encoding all answer candidates in their original context. Lastly, (iv) we propose a solution to encode task-specific structure that is available, for example, in the community Question Answering task.
The final model, which encodes different aspects of the task, achieves state-of-the-art performance on A2S compared with other efficient architectures. The proposed model is more efficient than attention-based architectures and outperforms BERT by two orders of magnitude in terms of transactions per second during training and testing, i.e., it processes 700 questions per second compared to 6 questions per second for BERT when training on a single GPU.
|
292 |
Enhancing Document Retrieval in the FinTech Domain: Applications of Advanced Language Models / Hansen, Jesper / January 2024
In this thesis, methods of creating an information retrieval (IR) model within the FinTech domain are explored. Given the domain-specific and data-scarce environment, methods of artificially generating data to train and evaluate IR models are implemented and their limitations are discussed. The generative model GPT-J 6B is used to generate pseudo-queries for a document corpus, resulting in a training set and a test set of 148 and 166 query-document pairs, respectively. Transformer-based models, both fine-tuned and original versions, are tested against the baseline model BM25, which has historically been seen as an effective document retrieval model. The models are evaluated using mean reciprocal rank at k (MRR@k) and the time cost of retrieving relevant documents. The main finding is that the BM25 baseline performs well in comparison to the transformer alternatives: it reaches the highest score at MRR@2, with MRR@2 = 0.612. For MRR@5 and MRR@10, a model combining BM25 with a cross-encoder slightly outperforms the baseline, reaching scores of MRR@5 = 0.655 and MRR@10 = 0.672. However, the increase in performance is slim and may not be enough to motivate an implementation. Finally, further research using real-world data is required to argue that transformer-based models are more robust in a real-world setting.
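The MRR@k metric named in this abstract can be sketched as follows. This is a minimal illustration, assuming each query has exactly one relevant document (as with query-document pairs); the function name `mrr_at_k` and the toy document IDs are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def mrr_at_k(rankings: Sequence[Sequence[str]],
             relevant: Sequence[str],
             k: int) -> float:
    """Mean reciprocal rank at cutoff k.

    rankings[i] is the ranked list of document IDs returned for query i,
    relevant[i] is the single relevant document ID for that query.
    A query contributes 1/rank if its relevant document appears in the
    top k results, and 0 otherwise.
    """
    if not rankings:
        raise ValueError("at least one query is required")
    total = 0.0
    for ranking, target in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id == target:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Toy data (hypothetical): three queries with their ranked results.
rankings = [
    ["d1", "d2", "d3"],  # relevant document at rank 1
    ["d5", "d4", "d6"],  # relevant document at rank 2
    ["d7", "d8", "d9"],  # relevant document not retrieved
]
relevant = ["d1", "d4", "d0"]

score_at_2 = mrr_at_k(rankings, relevant, k=2)  # (1 + 1/2 + 0) / 3
```

Because a hit at rank 1 contributes 1 and a hit at rank 2 contributes only 0.5, small cutoffs such as k = 2 reward systems that place the relevant document at the very top, while larger cutoffs give credit for hits further down the list.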
|
293 |
Smoothing the information seeking path: Removing representational obstacles in the middle-school digital library / Abbas, June M. / 05 1900
Middle school students' interaction within a digital library is explored. Interface features used, obstacles encountered, search strategies and search techniques used, and representation obstacles are examined. A mechanism for evaluating users' descriptors is tested, and the effects on retrieval of augmenting the system's resource descriptions with these descriptors are explored. Transaction log analysis (TLA) was used, with external corroborating achievement data provided by teachers. Analysis was conducted using quantitative and qualitative methods. Coding schemes were developed for the failure analysis, for the analysis of search strategies and techniques, for the extent-of-match analysis between terms in students' questions and their search terms, and for the extent-of-match analysis between search terms and controlled vocabulary. There are five chapters with twelve supporting appendixes. Chapter One presents an introduction to the problem and reviews the pilot study. Chapter Two presents the literature review and theoretical basis for the study. Chapter Three describes the research questions, hypotheses, and methods. Chapter Four presents findings. Chapter Five presents a summary of the findings and their support of the hypotheses. Unanticipated findings, limitations, speculations, and areas of further research are indicated. Findings indicate that middle school users interact with the system in various sequences of patterns. User groups' interactions and scaffold use are influenced by the teacher's objectives for using the ADL. Users preferred single-word searches over Boolean, phrase, or natural language searches. Users tended to repeat the same exact search instead of using the advanced scaffolds. A high percentage of users attempted at least one search that included spelling or typographical errors, punctuation, or sequentially repeated searches. Search terms matched the DQs in some instantiation in 54% of all searches. Terms used by the system to represent the resources do not adequately represent the user groups' information needs; however, using student-generated keywords to augment resource descriptions can have a positive effect on retrieval.
|
294 |
Building an Intelligent Filtering System Using Idea Indexing / Yang, Li / 08 1900
The widely used vector model maintains its popularity because of its simplicity, its speed, and the appeal of using spatial proximity as a stand-in for semantic proximity. However, the model has a disadvantage: the vagueness that arises when documents overlap only in their keywords. Efforts have been made to improve the vector model. Research on improving document representation has focused on four areas, namely, statistical co-occurrence of related items, forming term phrases, grouping of related words, and representing the content of documents. In this thesis, we propose the idea-indexing model to improve document representation for the filtering task in IR. The idea-indexing model matches document terms with the ideas they express and indexes the document with these ideas. This indexing scheme represents the document by its semantics instead of by sets of independent terms. We show in this thesis that indexing with ideas leads to better performance.
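The keyword-overlap vagueness of the vector model mentioned in this abstract can be illustrated with a minimal sketch. It assumes raw term-frequency vectors and cosine similarity, which is one common form of the vector model; it does not implement the thesis's idea-indexing scheme, and the helper names and example sentences are hypothetical.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Represent a text as a bag of lowercase, whitespace-separated terms."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two texts about unrelated topics that share the ambiguous keyword "bank".
doc_river = term_vector("bank river water")
doc_money = term_vector("bank loan money")

# Two texts about the same topic that share no keyword at all.
doc_car = term_vector("car engine repair")
doc_auto = term_vector("automobile motor maintenance")

overlap_score = cosine_similarity(doc_river, doc_money)   # nonzero
synonym_score = cosine_similarity(doc_car, doc_auto)      # zero
```

In this toy setting the unrelated pair scores higher than the related pair, because proximity is computed over independent terms rather than over what the terms mean. That is the kind of mismatch that indexing by meaning rather than by surface terms is intended to address.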
|
295 |
Three-dimensional Information Space: An Exploration of a World Wide Web-based, Three-dimensional, Hierarchical Information Retrieval Interface Using Virtual Reality Modeling Language / Scannell, Peter / 12 1900
This study examined the differences between a 3-D VRML search interface, similar to Cone Trees, used as a front end to Yahoo on the World Wide Web, and a conventional text-based 1-D interface to the same database. The study sought to determine how quickly users could find information using both interfaces, their degree of satisfaction with both search interfaces, and which interface they preferred.
|
296 |
Information fusion for monolingual and cross-language spoken document retrieval / CUHK electronic theses & dissertations collection / Digital dissertation consortium / January 2002
Lo Wai-kit. / "October 2002." / Thesis (Ph.D.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (p. 170-184). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Mode of access: World Wide Web. / Abstracts in English and Chinese.
|
297 |
Análise de métodos de produção de interfaces visuais para recuperação da informação [Analysis of methods for producing visual interfaces for information retrieval] / Xavier, Raphael Figueiredo / January 2009
Advisor: Edberto Ferneda / Committee: Guilherme de Ataíde Dias / Committee: Silvana Aparecida Borseti Gregório Vidotti / Abstract: The advent of the Web and the consequent increase in the volume of electronic information have caused many problems regarding the access, search, location, and retrieval of information in large volumes of data. This work reviews the different existing models, methods, and algorithms for generating Visual Interfaces for Information Retrieval, classified according to their production process: Data Analysis and Transformation, Application of Classification and Visual Distribution Algorithms, and Application of Visual Transformation Techniques. The results are intended to serve other researchers as a tool for choosing one methodological combination or another when developing specific proposals for Visual Interfaces for Information Retrieval, and they suggest the need for further research into new visual transformation techniques. / Master's degree
|
298 |
Shifts of Focus Among Dimensions of User Information Problems as Represented During Interactive Information Retrieval / Robins, David B. (David Bruce) / 05 1900
The goal of this study is to increase understanding of information problems as they are revealed in interactions among users and search intermediaries during information retrieval. Specifically, this study seeks to investigate: (a) how interaction between users and search intermediaries reveals aspects of user information problems; (b) the concept of representation with respect to information problems in interactive information retrieval; and (c) how users and search intermediaries focus on aspects of user information problems during the course of searches. This project extends research on interactive information retrieval and presents a theoretical framework that synthesizes rational and non-rational questions concerning mental representation as it pertains to users' understanding of information problems.
|
299 |
A Framework for the Development of Social Linking Theory / Thomas Ciszek / 18 November 2005
This paper characterizes the need for a theory that links context to information through the behaviors rooted in cultural identity and social awareness. Based on hypermedia objects and four methods of social communication, I develop a framework for a theory of social linking. This theory assumes that social interaction is the plinth from which we communicate and argues that studies in human-computer interaction and information retrieval require ongoing exploration of social communication.
|
300 |
The retrieval and reuse of engineering knowledge from records of design rationale / Wang, Hongwei / January 2012
No description available.
|