Global ETD Search

1	Execution performance issues in full-text information retrieval Brown, Eric William 01 January 1996 The task of an information retrieval system is to identify documents that will satisfy a user's information need. Effective fulfillment of this task has long been an active area of research, leading to sophisticated retrieval models for representing information content in documents and queries and measuring similarity between the two. The maturity and proven effectiveness of these systems have resulted in demand for increased capacity, performance, scalability, and functionality, especially as information retrieval is integrated into more traditional database management environments. In this dissertation we explore a number of functionality and performance issues in information retrieval. First, we consider creation and modification of the document collection, concentrating on management of the inverted file index. An inverted file architecture based on a persistent object store is described and experimental results are presented for inverted file creation and modification. Our architecture provides performance that scales well with document collection size, and the database features supported by the persistent object store provide many solutions to issues that arise during integration of information retrieval into more general database environments. We then turn to query evaluation speed and introduce a new optimization technique for statistical ranking retrieval systems that support structured queries. Experimental results from a variety of query sets show that execution time can be reduced by more than 50% with no noticeable impact on retrieval effectiveness, making these more complex retrieval models attractive alternatives for environments that demand high performance. Computer science | Information Systems
2	Adaptive query modification in a probabilistic information retrieval model Haines, David Leon 01 January 1996 There is a vast amount of information available with the aid of computers. It is now far easier to make information available on a CD-ROM or on the Internet than it is to find specific information to fill someone's need. To expect all users to be experts in negotiating the vast amount of available data is unrealistic. Information retrieval systems are designed to help users sort through this sea of text and find the documents that best meet their needs. Information retrieval systems search for documents that match a user's information need based on some user-supplied representation of that need. One important consideration is that the naive users, the ones who most need help, are unlikely to be able to express their need in the best possible way. The specification of the user's query is a difficult task for the user to do well and for the system to understand completely. One important source of information about the user's need is a collection of example documents that illustrate how the user's need can be met. These documents not only provide more information than the user could possibly specify directly, they are also often possible to obtain at a low cost. In this dissertation, a probabilistic theory of how to utilize information available in example documents to automatically improve a user's query and to thereby improve the effectiveness of the information retrieval system is described. This has been done by extending the inference network model of information retrieval developed by Turtle and Croft (47) by adding the mechanism of annotated inference networks and by providing methods to measure and control the contribution of individual components of a query.
The research described here not only provides a sound theoretical understanding of how to extract information from example documents but also suggests methods that lead to practical improvements in performance. Computer science | Information Systems
3	Supporting connection mobility in wireless networks Ramjee, Ramachandran 01 January 1997 A multimedia connection in a wireless network typically utilizes three important network resources: wireless link resources, wired link resources and network server resources. When the users participating in the connection are mobile, these resources must be reallocated as the users move, in such a way that the connection is not disrupted. This dissertation contributes a set of algorithms for supporting connection mobility through efficient and, in certain cases, optimal use of these network resources. In the first part of this thesis, we examine various techniques for allocating wireless channel resources to connections. We define three important practical problems in channel allocation faced by network engineers. We then derive new and optimal admission control policies for each of these problems. We further show that the optimal policies provide significant performance gains over other previously proposed policies. We also develop computationally efficient algorithms for deploying these optimal policies in real time at the base stations. In the second part of this thesis, we examine ways of rerouting the connections of mobile users so that the wired link resources are utilized efficiently. We propose, implement, and experimentally and analytically evaluate the performance of several connection rerouting schemes. Our study shows that one of our schemes is particularly well suited for performing connection rerouting. This scheme operates in two phases: a real-time phase where a reroute operation is executed without causing any disruption to user traffic, and a non-real-time phase where more efficient reroutes are effected. In the third and final part of this thesis, we examine ways of efficiently utilizing the computational resources in the network. We study policies for migrating user agents, which act as proxies for mobile users, as users move.
We show that two simple threshold policies that we propose, a Count policy which limits the number of agents in each server and a Distance policy which gives preference to migration of agents that are farther away from their users, deliver excellent performance across a wide range of system parameters and configurations. Computer science | Information Systems
4	Solving the word mismatch problem through automatic text analysis Xu, Jinxi 01 January 1997 Information Retrieval (IR) is concerned with locating documents that are relevant for a user's information need or query from a large collection of documents. A fundamental problem for information retrieval is word mismatch. A query is usually a short and incomplete description of the underlying information need. The users of IR systems and the authors of the documents often use different words to refer to the same concepts. This thesis addresses the word mismatch problem through automatic text analysis. We investigate two text analysis techniques, corpus analysis and local context analysis, and apply them in two domains of word mismatch, stemming and general query expansion. Experimental results show that these techniques can result in more effective retrieval. Computer science | Information Systems
5	A language modeling approach to information retrieval Ponte, Jay Michael 01 January 1998 In today's world, there is no shortage of information. However, for a specific information need, only a small subset of all of the available information will be useful. The field of information retrieval (IR) is the study of methods to provide users with that small subset of information relevant to their needs and to do so in a timely fashion. Information sources can take many forms, but this thesis will focus on text based information systems and investigate problems germane to the retrieval of written natural language documents. Central to these problems is the notion of "topic." In other words, what are documents about? However, topics depend on the semantics of documents and retrieval systems are not endowed with knowledge of the semantics of natural language. The approach taken in this thesis will be to make use of probabilistic language models to investigate text based information retrieval and related problems. One such problem is the prediction of topic shifts in text, the topic segmentation problem. It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection. Two complementary sets of features are studied individually and then combined into a single language model. The language modeling approach allows this problem to be approached in a principled way without complex semantic modeling. Next, the problem of document retrieval in response to a user query will be investigated. Models of document indexing and document retrieval have been extensively studied over the past three decades. The integration of these two classes of models has been the goal of several researchers but it is a very difficult problem. Much of the reason for this is that the indexing component requires inferences as to the semantics of documents.
Instead, an approach to retrieval based on probabilistic language modeling will be presented. Models are estimated for each document individually. The approach to modeling is non-parametric and integrates the entire retrieval process into a single model. One advantage of this approach is that collection statistics, which are used heuristically for the assignment of concept probabilities in other probabilistic models, are used directly in the estimation of language model probabilities in this approach. The language modeling approach has been implemented and tested empirically and performs very well on standard test collections and query sets. In order to improve retrieval effectiveness, IR systems use additional techniques such as relevance feedback, unsupervised query expansion and structured queries. These and other techniques are discussed in terms of the language modeling approach and empirical results are given for several of the techniques developed. These results provide further proof of concept for the use of language models for retrieval tasks. Computer science | Information Systems
6	Network support for applications requiring quality of service in heterogeneous environments Firoiu, Victor 01 January 1998 Group communication, be it one-to-many (such as TV broadcasting) or many-to-many (such as teleconferencing), is becoming increasingly important because it enables the widespread dissemination of information (such as in today's World Wide Web) and the collaboration between remote groups. This kind of communication can be supported efficiently in digital networks through multicasting, a technique of non-redundant simultaneous data transmission from a sender to a set of receivers. Multicast applications such as voice and video require Quality of Service guarantees (such as maximum packet delay, packet loss probability), which can be provided by reserving network resources. In this dissertation we propose solutions to several critical problems of multicasting in heterogeneous environments: differences in network resource availability, differences in receiver Quality of Service requirements, and differences in resource reservation protocols. In the first part of the dissertation we consider the problem of resource reservation for multicast sessions in the context of both network and receiver heterogeneity. We develop centralized and distributed algorithms that accommodate this heterogeneity by performing a differentiated per-link resource reservation. We apply these algorithms in the context of packetized voice and MPEG video multicast connections over wide area networks. We find that our algorithms enable a network to carry as much as 50% more traffic compared to the case where the network does not accommodate heterogeneity. In the second part of the dissertation we present algorithms for local (link) admission control and resource reservation at an Earliest Deadline First packet scheduler that provides heterogeneous packet delay guarantees at a link.
When the data transmission is characterized by piecewise linear traffic envelopes, we show that the algorithms have very low computational complexity and thus practical applicability. In the third part of the dissertation we focus on resource reservation protocols in the heterogeneous environment of IP over ATM networks. We describe a method for establishing reservations in the ATM network for IP flows (named ATM shortcutting). This method provides better performance to IP flows by avoiding the IP processing of IP packets, and better utilization of ATM network resources. In the last part of the dissertation we quantify the improvement in utilization of an IP/ATM network when using ATM shortcutting. We present methods to evaluate this benefit given an IP/ATM network topology, link capacities and traffic patterns. We use these methods in simulation experiments using random networks. These experiments indicate that in many cases ATM shortcutting brings benefits in network utilization when it decreases the average length of network routes. Computer science | Information Systems
7	A generative theory of relevance Lavrenko, Victor 01 January 2004 We present a new theory of relevance for the field of Information Retrieval. Relevance is viewed as a generative process, and we hypothesize that both user queries and relevant documents represent random observations from that process. Based on this view, we develop a formal retrieval model that has direct applications to a wide range of search scenarios. The new model substantially outperforms strong baselines on the tasks of ad-hoc retrieval, cross-language retrieval, handwriting retrieval, automatic image annotation, video retrieval, and topic detection and tracking. Empirical success of our approach is due to a new technique we propose for modeling exchangeable sequences of discrete random variables. The new technique represents an attractive counterpart to existing formulations, such as multinomial mixtures, pLSI and LDA: it is effective, easy to train, and makes no assumptions about the geometric structure of the data. Computer science | Information systems
8	Neurolinguistically constrained simulation of sentence comprehension: Integrating artificial intelligence and brain theory Gigley, Helen Mueller 01 January 1982 An artificial intelligence approach to the simulation of neurolinguistically constrained processes in sentence comprehension is developed using control strategies for simulation of cooperative computation in associative networks. The desirability of this control strategy in contrast to ATN and production system strategies is explained. A first pass implementation of HOPE, an artificial intelligence simulation model of sentence comprehension, constrained by studies of aphasic performance, psycholinguistics, neurolinguistics, and linguistic theory is described. Claims that the model could serve as a basis for sentence production simulation and for a model of language acquisition as associative learning are discussed. HOPE is a model that performs in a "normal" state and includes a "lesion" simulation facility. HOPE is also a research tool. Its modifiability and use as a tool to investigate hypothesized "causes" of degradation in comprehension performance by aphasic patients are described. Issues of using behavioral constraints in modelling and obtaining appropriate data for simulated process modelling are discussed. Finally, problems of validation of the simulation results are raised, and issues of how to interpret clinical results to define the evolution of the model are discussed. Conclusions with respect to the feasibility of artificial intelligence simulation process modelling are discussed based on the current state of the research.
The significance of the research for artificial intelligence techniques, the need for AI simulation models, the use of such models as investigative tools, the potential use for enriching our understanding of the brain and its function, and the potential for contributing to better understanding of aphasic performance leading to enhanced therapy, together suggest many exciting prospects for future development. Computer science | Information Systems
9	Inference networks for document retrieval Turtle, Howard Robert 01 January 1991 Information retrieval is concerned with selecting documents from a collection that will be of interest to a user with a stated information need or query. Research aimed at improving the performance of retrieval systems, that is, selecting those documents most likely to match the user's information need, remains an area of considerable theoretical and practical importance. This dissertation describes a new formal retrieval model that uses probabilistic inference networks to represent documents and information needs. Retrieval is viewed as an evidential reasoning process in which multiple sources of evidence about document and query content are combined to estimate the probability that a given document matches a query. This model generalizes several current retrieval models and provides a framework within which disparate information retrieval research results can be integrated. To test the effectiveness of the inference network model, a retrieval system based on the model was implemented. Two test collections were built and used to compare retrieval performance with that of conventional retrieval models. The inference network model gives substantial improvements in retrieval performance with computational costs that are comparable to those associated with conventional retrieval models and which are feasible for large collections. Computer science | Information Systems
10	Sentence level information patterns for novelty detection Li, Xiaoyan 01 January 2006 The detection of new information in a document stream is an important component of many potential applications. In this thesis, a new novelty detection approach based on the identification of sentence level information patterns is proposed. Given a user's information need, some information patterns in sentences such as combinations of query words, sentence lengths, named entities and phrases, and other sentence patterns, may contain more important and relevant information than single words. The work of the thesis includes three parts. First, we redefine "what is novelty detection" in light of the proposed information patterns. Examples of several different types of information patterns are given corresponding to different types of users' information needs. Second, we analyze why the proposed information pattern concept has a significant impact in novelty detection. A thorough analysis of sentence level information patterns is elaborated on data from the TREC novelty tracks, including sentence lengths, named entities (NEs), and sentence level opinion patterns. Finally, we present how we perform novelty detection based on information patterns, which focuses on the identification of previously unseen query-related patterns in sentences. A unified pattern-based approach is presented to novelty detection for both specific NE topics and more general topics. Experiments on novelty detection were carried out on data from the TREC 2002, 2003 and 2004 novelty tracks. Experimental results show that the proposed approach significantly improves the performance of novelty detection for both specific and general topics, and therefore the overall performance for all topics, in terms of precision at top ranks. Future research directions are suggested. Computer science | Information systems

1	Execution performance issues in full-text information retrieval Brown, Eric William 01 January 1996 (has links) The task of an information retrieval system is to identify documents that will satisfy a user's information need. Effective fulfillment of this task has long been an active area of research, leading to sophisticated retrieval models for representing information content in documents and queries and measuring similarity between the two. The maturity and proven effectiveness of these systems has resulted in demand for increased capacity, performance, scalability, and functionality, especially as information retrieval is integrated into more traditional database management environments. In this dissertation we explore a number of functionality and performance issues in information retrieval. First, we consider creation and modification of the document collection, concentrating on management of the inverted file index. An inverted file architecture based on a persistent object store is described and experimental results are presented for inverted file creation and modification. Our architecture provides performance that scales well with document collection size and the database features supported by the persistent object store provide many solutions to issues that arise during integration of information retrieval into more general database environments. We then turn to query evaluation speed and introduce a new optimization technique for statistical ranking retrieval systems that support structured queries. Experimental results from a variety of query sets show that execution time can be reduced by more than 50% with no noticeable impact on retrieval effectiveness, making these more complex retrieval models attractive alternatives for environments that demand high performance. Read more Computer science\|Information Systems
2	Adaptive query modification in a probabilistic information retrieval model Haines, David Leon 01 January 1996 (has links) There is a vast amount of information available with the aid of computers. It is now far easier to make information available on a CD-ROM or on the Internet than it is to find specific information to fill someone's need. To expect all users to be experts in negotiating the vast amount of available data is unrealistic. Information retrieval systems are designed to help users sort through this sea of text and find the documents that best meet their needs. Information retrieval systems search for documents that match a user's information need based on some user-supplied representation of that need. One important consideration is that the naive users, the ones who most need help, are unlikely to be able to express their need in the best possible way. The specification of the user's query is a difficult task for the user to do well and for the system to understand completely. One important source of information about the user's need is a collection of example documents that illustrate how the user's need can be met. These documents not only provide more information than the user could possibly specify directly, they are also often possible to obtain at a low cost. In this dissertation, a probabilistic theory of how to utilize information available in example documents to automatically improve a user's query and to thereby improve the effectiveness of the information retrieval system is described. This has been done by extending the inference network model of information retrieval developed by Turtle and Croft (47) by adding the mechanism of annotated inference networks and by providing methods to measure and control the contribution of individual components of a query. 
The research described here not only provides a sound theoretical understanding of how to extract information from example documents but also suggests methods that lead to practical improvements in performance. Read more Computer science\|Information Systems
3	Supporting connection mobility in wireless networks Ramjee, Ramachandran 01 January 1997 (has links) A multimedia connection in a wireless network typically utilizes three important network resources: wireless link resources, wired link resources and network server resources. When the users participating in the connection are mobile, these resources must be reallocated as the users move in a manner so that the connection is not disrupted. This dissertation contributes a set of algorithms for supporting connection mobility through efficient and, in certain cases, optimal use of these network resources. In the first part of this thesis, we examine various techniques for allocating wireless channel resources to connections. We define three important practical problems in channel allocation faced by network engineers. We then derive new and optimal admission control policies for each of these problems. We further show that the optimal policies provide significant performance gains over other previously proposed policies. We also develop computationally-efficient algorithms for deploying these optimal policies in real-time at the base-stations. In the second part of this thesis, we examine ways of rerouting the connections of mobile users so that the wired link resources are utilized efficiently. We propose, implement, and experimentally and analytically evaluate the performance of several connection rerouting schemes. Our study shows that one of our schemes is particularly well suited for performing connection rerouting. This scheme operates in two phases: a real-time phase where a reroute operation is executed without causing any disruption to user traffic, and a non-real-time phase where more efficient reroutes are effected. In the third and final part of this thesis, we examine ways of efficiently utilizing the computational resources in the network. We study policies for migrating user agents, which act as proxies for mobile users, as users move. 
We show that two simple threshold policies that we propose, a Count policy which limits the number of agents in each server and a Distance policy which gives preference to migration of agents that are farther away from their users, deliver excellent performance across a wide range of system parameters and configurations. Read more Computer science\|Information Systems
4	Solving the word mismatch problem through automatic text analysis Xu, Jinxi 01 January 1997 (has links) Information Retrieval (IR) is concerned with locating documents that are relevant for a user's information need or query from a large collection of documents. A fundamental problem for information retrieval is word mismatch. A query is usually a short and incomplete description of the underlying information need. The users of IR systems and the authors of the documents often use different words to refer to the same concepts. This thesis addresses the word mismatch problem through automatic text analysis. We investigate two text analysis techniques, corpus analysis and local context analysis, and apply them in two domains of word mismatch, stemming and general query expansion. Experimental results show that these techniques can result in more effective retrieval. Computer science\|Information Systems
5	A language modeling approach to information retrieval Ponte, Jay Michael 01 January 1998 (has links) In today's world, there is no shortage of information. However, for a specific information need, only a small subset of all of the available information will be useful. The field of information retrieval (IR) is the study of methods to provide users with that small subset of information relevant to their needs and to do so in a timely fashion. Information sources can take many forms, but this thesis will focus on text based information systems and investigate problems germane to the retrieval of written natural language documents. Central to these problems is the notion of "topic." In other words, what are documents about? However, topics depend on the semantics of documents and retrieval systems are not endowed with knowledge of the semantics of natural language. The approach taken in this thesis will be to make use of probabilistic language models to investigate text based information retrieval and related problems. One such problem is the prediction of topic shifts in text, the topic segmentation problem. It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection. Two complementary sets of features are studied individually and then combined into a single language model. The language modeling approach allows this problem to be approached in a principled way without complex semantic modeling. Next, the problem of document retrieval in response to a user query will be investigated. Models of document indexing and document retrieval have been extensively studied over the past three decades. The integration of these two classes of models has been the goal of several researchers but it is a very difficult problem. Much of the reason for this is that the indexing component requires inferences as to the semantics of documents. 
Instead, an approach to retrieval based on probabilistic language modeling will be presented. Models are estimated for each document individually. The approach to modeling is non-parametric and integrates the entire retrieval process into a single model. One advantage of this approach is that collection statistics, which are used heuristically for the assignment of concept probabilities in other probabilistic models, are used directly in the estimation of language model probabilities in this approach. The language modeling approach has been implemented and tested empirically and performs very well on standard test collections and query sets. In order to improve retrieval effectiveness, IR systems use additional techniques such as relevance feedback, unsupervised query expansion and structured queries. These and other techniques are discussed in terms of the language modeling approach and empirical results are given for several of the techniques developed. These results provide further proof of concept for the use of language models for retrieval tasks. Read more Computer science\|Information Systems
6	Network support for applications requiring quality of service in heterogeneous environments Firoiu, Victor 01 January 1998 (has links) Group communication, be it one-to-many (such as TV broadcasting) or many-to-many (such as teleconferencing) is becoming increasingly important because it enables the widespread dissemination of information (such as in today's Word Wide Web) and the collaboration between remote groups. This kind of communication can be supported efficiently in digital networks through multicasting, a technique of non-redundant simultaneous data transmission from a sender to a set of receivers. Multicast applications such as voice and video require Quality of Service guarantees (such as maximum packet delay, packet loss probability), which can be provided by reserving network resources. In this dissertation we propose solutions to several critical problems of multicasting in heterogeneous environments: differences in network resource availability, differences in receiver Quality of Service requirements, differences in network resource availability and differences in resource reservation protocols. In the first part of the dissertation we consider the problem of resource reservation for multicast sessions in the context of both network and receiver heterogeneity. We develop centralized and distributed algorithms that accommodate this heterogeneity by performing a differentiated per-link resource reservation. We apply these algorithms in the context of packetized voice and MPEG video multicast connections over wide area networks. We find that our algorithms enable a network to carry as much as a 50% more traffic compared to the case where the network does not accommodate heterogeneity. In the second part of the dissertation we present algorithms for local (link) admission control and resource reservation at an Earliest Deadline First packet scheduler that provides heterogeneous packet delay guarantees at a link. 
When the data transmission is characterized by piecewise linear traffic envelopes, we show that the algorithms have very low computational complexity and thus practical applicability. In the third part of the dissertation we focus on resource reservation protocols in the heterogeneous environment of IP over ATM networks. We describe a method for establishing reservations in the ATM network for IP flows (named ATM shortcutting). This method provides better performance to IP flows by avoiding the IP processing of IP packets, and better utilization of ATM network resources. In the last part of the dissertation we quantify the improvement in utilization of an IP/ATM network when using ATM shortcutting. We present methods to evaluate this benefit given an IP/ATM network topology, link capacities and traffic patterns. We use these methods in simulation experiments on random networks. These experiments indicate that in many cases ATM shortcutting brings benefits in network utilization when it decreases the average length of network routes. Read more Computer science\|Information Systems
7	A generative theory of relevance Lavrenko, Victor 01 January 2004 (has links) We present a new theory of relevance for the field of Information Retrieval. Relevance is viewed as a generative process, and we hypothesize that both user queries and relevant documents represent random observations from that process. Based on this view, we develop a formal retrieval model that has direct applications to a wide range of search scenarios. The new model substantially outperforms strong baselines on the tasks of ad-hoc retrieval, cross-language retrieval, handwriting retrieval, automatic image annotation, video retrieval, and topic detection and tracking. Empirical success of our approach is due to a new technique we propose for modeling exchangeable sequences of discrete random variables. The new technique represents an attractive counterpart to existing formulations, such as multinomial mixtures, pLSI and LDA: it is effective, easy to train, and makes no assumptions about the geometric structure of the data. Computer science\|Information systems
8	NEUROLINGUISTICALLY CONSTRAINED SIMULATION OF SENTENCE COMPREHENSION: INTEGRATING ARTIFICIAL INTELLIGENCE AND BRAIN THEORY GIGLEY, HELEN MUELLER 01 January 1982 (has links) An artificial intelligence approach to the simulation of neurolinguistically constrained processes in sentence comprehension is developed using control strategies for simulation of cooperative computation in associative networks. The desirability of this control strategy in contrast to ATN and production system strategies is explained. A first pass implementation of HOPE, an artificial intelligence simulation model of sentence comprehension, constrained by studies of aphasic performance, psycholinguistics, neurolinguistics, and linguistic theory is described. Claims that the model could serve as a basis for sentence production simulation and for a model of language acquisition as associative learning are discussed. HOPE is a model that performs in a "normal" state and includes a "lesion" simulation facility. HOPE is also a research tool. Its modifiability and use as a tool to investigate hypothesized "causes" of degradation in comprehension performance by aphasic patients are described. Issues of using behavioral constraints in modelling and obtaining appropriate data for simulated process modelling are discussed. Finally, problems of validation of the simulation results are raised; and issues of how to interpret clinical results to define the evolution of the model are discussed. Conclusions with respect to the feasibility of artificial intelligence simulation process modelling are discussed based on the current state of the research. 
The significance of the research for artificial intelligence techniques, the need for AI simulation models, the use of such models as investigative tools, the potential use for enriching our understanding of the brain and its function, and the potential for contributing to better understanding of aphasic performance leading to enhanced therapy, together suggest many exciting prospects for future development. Read more Computer science\|Information Systems
9	Inference networks for document retrieval Turtle, Howard Robert 01 January 1991 (has links) Information retrieval is concerned with selecting documents from a collection that will be of interest to a user with a stated information need or query. Research aimed at improving the performance of retrieval systems, that is, selecting those documents most likely to match the user's information need, remains an area of considerable theoretical and practical importance. This dissertation describes a new formal retrieval model that uses probabilistic inference networks to represent documents and information needs. Retrieval is viewed as an evidential reasoning process in which multiple sources of evidence about document and query content are combined to estimate the probability that a given document matches a query. This model generalizes several current retrieval models and provides a framework within which disparate information retrieval research results can be integrated. To test the effectiveness of the inference network model, a retrieval system based on the model was implemented. Two test collections were built and used to compare retrieval performance with that of conventional retrieval models. The inference network model gives substantial improvements in retrieval performance with computational costs that are comparable to those associated with conventional retrieval models and which are feasible for large collections. Read more Computer science\|Information Systems
10	Sentence level information patterns for novelty detection Li, Xiaoyan 01 January 2006 (has links) The detection of new information in a document stream is an important component of many potential applications. In this thesis, a new novelty detection approach based on the identification of sentence level information patterns is proposed. Given a user's information need, some information patterns in sentences, such as combinations of query words, sentence lengths, named entities and phrases, and other sentence patterns, may contain more important and relevant information than single words. The work of the thesis includes three parts. First, we redefine "what is novelty detection" in light of the proposed information patterns. Examples of several different types of information patterns are given, corresponding to different types of users' information needs. Second, we analyze why the proposed information pattern concept has a significant impact on novelty detection. A thorough analysis of sentence level information patterns is conducted on data from the TREC novelty tracks, including sentence lengths, named entities (NEs), and sentence level opinion patterns. Finally, we present how we perform novelty detection based on information patterns, which focuses on the identification of previously unseen query-related patterns in sentences. A unified pattern-based approach to novelty detection is presented for both specific NE topics and more general topics. Experiments on novelty detection were carried out on data from the TREC 2002, 2003 and 2004 novelty tracks. Experimental results show that the proposed approach significantly improves the performance of novelty detection for both specific and general topics, and therefore the overall performance for all topics, in terms of precision at top ranks. Future research directions are suggested. Read more Computer science\|Information systems

Search results