201 |
Crawling, Collecting, and Condensing News Comments. Gobaan, Raveendran (January 2013).
Traditionally, public opinion has been gauged, and policy decided, by issuing surveys and performing censuses designed to measure what the public thinks about a certain topic. Within the past five years, social networks such as Facebook and Twitter have gained traction as sources for collecting public opinion about current events. Academic research on Facebook data proves difficult, since the platform is generally closed. Twitter, on the other hand, restricts the conversations of its users, making it difficult to extract large-scale concepts from the microblogging infrastructure.
News comments provide a rich source of discourse from individuals who are passionate about an issue. Due to the overhead of commenting, the population of commenters is necessarily biased towards individuals who have either strong opinions on a topic or in-depth knowledge of the issue at hand. Furthermore, their comments often collect insight derived from reading multiple articles on a given topic. Unfortunately, the commenting systems employed by news companies are not implemented by a single entity and are often stored and generated using AJAX, which causes traditional crawlers to ignore them. To make matters worse, the comments are often noisy, containing spam, poor grammar, and excessive typos. Finally, due to the anonymity of comment systems, conversations can be derailed by malicious users or by the inherent biases of the commenters.
In this thesis we discuss the design and creation of a crawler built to extract comments from domains across the internet. For practical purposes we create a semi-automatic parser generator and describe how our system attempts to employ user feedback to predict which remote procedure calls are used to load comments. By reducing comment systems to remote procedure calls, we reduce the internet to a much simpler space, where we can focus on the data almost independently of its presentation. Thus we are able to quickly create high-fidelity parsers to extract comments from a web page.
Once we have this system, we demonstrate its usefulness by extracting meaningful opinions from the large collections we gather. Doing so in real time, however, is shown to foil traditional summarization systems, which are designed to handle dozens of well-formed documents. In attempting to solve this problem we create a new algorithm, KLSum+, which outperforms all of its competitors in efficiency while generally scoring well on the ROUGE-SU4 metric. The algorithm factors in background models to boost accuracy, yet runs over 50 times faster than the alternatives. Furthermore, using the resulting summaries we see that the collected data can provide useful insight into public opinion and even surface the key points of discourse.
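KLSum+ itself is not specified in the abstract; purely as a hedged illustration of the family it builds on, the sketch below shows classic greedy KL-Sum sentence selection with a background unigram model mixed in. The mixing weight, discounting scheme, and tokenization are assumptions, not details from the thesis.

```python
import math
from collections import Counter

def unigram_dist(tokens, vocab, eps=1e-9):
    """Smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + eps * len(vocab)
    return {w: (counts[w] + eps) / total for w in vocab}

def kl(p, q):
    """KL divergence D(p || q) for distributions given as dicts over the same vocabulary."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

def greedy_kl_sum(sentences, background_tokens, max_sentences=5, bg_weight=0.3):
    """Greedily add the sentence that keeps the summary's unigram distribution
    closest (in KL divergence) to a target built from the source comments and
    discounted by a background model (assumed discounting scheme)."""
    doc_tokens = [t.lower() for s in sentences for t in s.split()]
    bg_tokens = [t.lower() for t in background_tokens]
    vocab = set(doc_tokens) | set(bg_tokens)
    doc_p = unigram_dist(doc_tokens, vocab)
    bg_p = unigram_dist(bg_tokens, vocab)
    # Down-weight terms that are also common in the background collection.
    raw = {w: max(doc_p[w] - bg_weight * bg_p[w], 1e-12) for w in vocab}
    z = sum(raw.values())
    target = {w: v / z for w, v in raw.items()}

    chosen, chosen_tokens = [], []
    while len(chosen) < min(max_sentences, len(sentences)):
        best, best_kl = None, float("inf")
        for i, s in enumerate(sentences):
            if i in chosen:
                continue
            candidate = chosen_tokens + [t.lower() for t in s.split()]
            d = kl(target, unigram_dist(candidate, vocab))
            if d < best_kl:
                best, best_kl = i, d
        chosen.append(best)
        chosen_tokens += [t.lower() for t in sentences[best].split()]
    return [sentences[i] for i in chosen]
```

The greedy step re-scores every remaining sentence at each iteration, which keeps the selection cheap relative to approaches that search over whole summaries.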
|
202 |
An Evaluation of Contextual Suggestion. Dean-Hall, Adriel (21 January 2014).
This thesis examines techniques that can be used to evaluate systems that solve the complex task of suggesting points of interest to users. A traveller visiting an unfamiliar, foreign city might be looking for a place to have fun in the last few hours before returning home. Our traveller might browse various search engines and travel websites to find something that he is interested in doing; however, this process is time-consuming, and the visitor may want to find a suggestion quickly.
We will consider the type of system that is able to handle this complex request in such a way that the user is satisfied. Because the type of suggestion one person wants differs from the type of suggestion another person wants, we will consider systems that incorporate some level of personalization. In this work we will develop user profiles that are based on real users and set up experiments in which many research groups can participate, competing to develop the best techniques for implementing this kind of system. These systems will make suggestions of attractions to visit in various US cities to many users.
This thesis is divided into two stages. During the first stage we will look at what information will go into our user profiles and what information we need to know about the users in order to decide whether they would visit an attraction. The second stage will be deciding how to evaluate the suggestions that various systems make in order to determine which system is able to make the best suggestions.
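The abstract does not give a scoring model; as a hedged illustration only of how profile-based personalization of the kind described here is often realised, the sketch below rates a candidate attraction by comparing its description to attractions the profiled user has rated positively and negatively. The similarity measure, rating convention, and example data are assumptions.

```python
import math
from collections import Counter

def tf_vector(text):
    """Simple term-frequency vector for a short description."""
    return Counter(text.lower().split())

def cosine(a, b):
    common = set(a) & set(b)
    num = sum(a[w] * b[w] for w in common)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def score_candidate(candidate_desc, profile):
    """profile: list of (description, rating) pairs the user has already judged;
    rating > 0 means liked, rating < 0 means disliked (assumed convention)."""
    cand = tf_vector(candidate_desc)
    pos = [cosine(cand, tf_vector(d)) for d, r in profile if r > 0]
    neg = [cosine(cand, tf_vector(d)) for d, r in profile if r < 0]
    # Prefer candidates that resemble liked attractions and not disliked ones.
    return (max(pos) if pos else 0.0) - (max(neg) if neg else 0.0)

profile = [("live jazz bar with late night shows", 1),
           ("quiet art museum with guided tours", -1)]
candidates = ["rooftop cocktail bar with live music", "modern art gallery"]
print(sorted(candidates, key=lambda d: score_candidate(d, profile), reverse=True))
```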
|
203 |
Design and Evaluation of Temporal Summarization Systems. Guttikonda, Rakesh (January 2014).
Temporal Summarization (TS) is a new track introduced as part of the Text REtrieval Conference (TREC) in 2013. The track aims to develop systems that can return important updates related to an event over time. In TREC 2013, the TS track specifically used disaster-related events such as earthquakes, hurricanes, and bombings. This thesis focuses on building an effective TS system using a combination of Information Retrieval techniques; the developed system returns updates about disaster-related events in a timely manner.
Through participation in TREC 2013 and experiments conducted after TREC, we examine the effectiveness of techniques that can be employed in building TS systems, such as distributional similarity for term expansion. The thesis also describes the effectiveness of other techniques used in our system, such as stemming, adaptive sentence selection over time, and de-duplication, by comparing the system against baseline systems.
The second part of the thesis examines the current methodology used for evaluating TS systems. We propose a modified evaluation method that could reduce the manual effort of assessors while correlating well with the official track evaluation. We also propose a supervised-learning-based evaluation method, which correlates well with the official track's evaluation of systems and could save assessors' time by as much as 80%.
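As a hedged illustration of the de-duplication step mentioned above (not the thesis's exact method), the following sketch drops a candidate update when its word-overlap similarity to any already-emitted update exceeds a threshold; the threshold and tokenization are assumptions.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sets of tokens."""
    return len(a & b) / len(a | b) if a | b else 0.0

def emit_updates(candidate_sentences, threshold=0.6):
    """Return candidate updates in arrival order, skipping any sentence that is
    a near-duplicate of an update already emitted."""
    emitted, token_sets = [], []
    for sent in candidate_sentences:
        tokens = set(sent.lower().split())
        if all(jaccard(tokens, seen) < threshold for seen in token_sets):
            emitted.append(sent)
            token_sets.append(tokens)
    return emitted
```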
|
204 |
A multi-paradigm query interface for an object-oriented database. Doan, Dac Khoa (January 1996).
No description available.
|
205 |
Learning Automatic Question Answering from Community Data. Wang, Di (21 August 2012).
Although traditional search engines can retrieve thousands or millions of web links related to the input keywords, users still need to manually locate answers to their information needs within multiple returned documents or initiate further searches. Question Answering (QA) is an effective paradigm for addressing this problem: it automatically finds one or more accurate and concise answers to natural language questions. Existing QA systems often rely on off-the-shelf Natural Language Processing (NLP) resources and tools that are not optimized for the QA task. They also tend to require hand-crafted rules to extract properties from input questions, which in turn makes building comprehensive QA systems costly in time and manpower. In this thesis, we study the potential of using Community Question Answering (cQA) archives as a central building block of QA systems. To that end, the thesis proposes two cQA-based query expansion and structured query generation approaches, one employed in Text-based QA and the other in Ontology-based QA. In addition, based on the above structured query generation method, an end-to-end open-domain Ontology-based QA system is developed and evaluated on a standard factoid QA benchmark.
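The abstract does not detail the expansion mechanism; as a hedged sketch of cQA-based query expansion in general, the snippet below retrieves archived community questions similar to the input question and adds frequent content terms from their answers to the query. The archive format, stopword list, and cut-offs are assumptions.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "to", "is", "what", "how", "who", "and"}

def content_terms(text):
    return [t for t in text.lower().split() if t not in STOPWORDS]

def similar_questions(query, archive, top_k=5):
    """archive: list of (question, answer) pairs from a cQA site.
    Rank archived questions by content-word overlap with the input question."""
    q_terms = set(content_terms(query))
    scored = sorted(archive,
                    key=lambda qa: len(q_terms & set(content_terms(qa[0]))),
                    reverse=True)
    return scored[:top_k]

def expand_query(query, archive, num_terms=5):
    """Expand the query with frequent content terms drawn from answers to
    similar archived questions."""
    counts = Counter()
    for _, answer in similar_questions(query, archive):
        counts.update(content_terms(answer))
    original = set(content_terms(query))
    expansion = [t for t, _ in counts.most_common(num_terms + len(original))
                 if t not in original][:num_terms]
    return query.split() + expansion
```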
|
206 |
Novelty and Diversity in Retrieval Evaluation. Kolla, Maheedhar (21 December 2012).
Queries submitted to search engines rarely provide a complete and precise description of a user's information need. Most queries are ambiguous to some extent, having multiple interpretations. For example, the seemingly unambiguous query "tennis lessons" might be submitted by a user interested in attending classes in her neighborhood, seeking lessons for her child, looking for online video lessons, or planning to start a business teaching tennis. Search engines face the challenging task of satisfying different groups of users with diverse information needs associated with a given query. One solution is to optimize ranking functions to satisfy diverse sets of information needs. Unfortunately, existing evaluation frameworks do not support such optimization. Instead, ranking functions are rewarded for satisfying the most likely intent associated with a given query.

In this thesis, we propose a framework and associated evaluation metrics capable of optimizing ranking functions to satisfy diverse information needs. Our proposed measures explicitly reward ranking functions that present the user with information that is novel with respect to previously viewed documents. The measures reflect the quality of a ranking function by taking into account its ability to satisfy the diverse users submitting a query.

Moreover, the task of identifying and establishing test frameworks to compare ranking functions at web scale can be tedious. One reason is the dynamic nature of the web, where documents are constantly added and updated, forcing search engine developers to seek additional human assessments. Along with the issues of novelty and diversity, we therefore explore an approximate approach to comparing different ranking functions that overcomes the lack of complete human assessments. We demonstrate that our approach can accurately sort ranking functions according to their ability to satisfy diverse users, even in the face of incomplete human assessments.
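The thesis's own measures are not given in the abstract; as a hedged illustration of the general family of novelty-aware metrics it relates to, the sketch below computes an alpha-nDCG-style discounted gain in which a document's credit for covering an intent shrinks if earlier documents in the ranking already covered that intent. The redundancy penalty alpha, the intent representation, and the example data are assumptions.

```python
import math

def novelty_aware_dcg(ranking, alpha=0.5):
    """ranking: list of sets, where each set holds the query intents (aspects)
    satisfied by the document at that rank. Credit for an intent decays by a
    factor of (1 - alpha) for every earlier document that already covered it."""
    seen = {}          # intent -> number of earlier documents covering it
    score = 0.0
    for rank, intents in enumerate(ranking, start=1):
        gain = sum((1 - alpha) ** seen.get(i, 0) for i in intents)
        score += gain / math.log2(rank + 1)   # standard DCG position discount
        for i in intents:
            seen[i] = seen.get(i, 0) + 1
    return score

# Example: three documents covering intents of the query "tennis lessons".
ranking = [{"classes nearby", "kids lessons"}, {"classes nearby"}, {"online videos"}]
print(novelty_aware_dcg(ranking))
```

Under this kind of measure, a ranking that repeats the most popular intent scores lower than one that covers several intents early, which is what rewards diversity-aware ranking functions.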
|
207 |
An ontology-driven concept-based information retrieval approach for web documents. Li, Zhan (11 1900).
Building computer agents that can utilize the meanings in the text of Web documents is a promising extension of current search technology. Concept-based information retrieval applies "intelligent" agents to identify Web documents that match user queries. A new concept-based information retrieval framework, Hybrid Ontology-based Textual Information Retrieval (HOTIR), is introduced in this thesis. HOTIR accepts conventional keyword-based queries, translates them into concept-based queries, enriches definitions of concepts with supplementary knowledge from a knowledge base, and ranks documents by aggregating "equivalent" concepts identified in them. The concept-based queries in HOTIR are organized in a hierarchy of concepts (HofC) and definitions of concepts are added from a knowledge base to enhance their meanings. The knowledge base is a modified ontology (ModOnt) that can enrich the HofC with concept definitions in the form of related-concepts, terms, their importance values, and their relations. The ModOnt relies on an adaptive assignment of term importance (AATI) scheme that continuously updates the importance of terms/concepts using Web documents. The identified concepts in a Web document that match those in the HofC are evaluated using ordered weighted averaging (OWA) operators, and documents are ranked according to the degree to which they satisfy the HofC. The case studies and experiments presented in the thesis are designed to validate the performance of HOTIR.
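As a hedged sketch of the ordered weighted averaging (OWA) aggregation mentioned above (the weight vector and the per-concept scores are invented for illustration; the thesis's actual weights are not given here), the snippet below combines per-concept match scores for a document into a single ranking score.

```python
def owa(scores, weights):
    """Ordered weighted averaging: sort the scores in descending order, then take
    a weighted sum with a fixed weight vector. The weights attach to positions in
    the sorted order, not to particular concepts."""
    assert len(weights) == len(scores) and abs(sum(weights) - 1.0) < 1e-9
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))

# Degrees to which one document satisfies each concept in the hierarchy (assumed values).
concept_scores = [0.9, 0.4, 0.7]
# "Or-like" weights emphasize the best-matched concepts; equal weights give a plain mean.
print(owa(concept_scores, [0.5, 0.3, 0.2]))
```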
|
208 |
Belief Revision for Adaptive Information Agents. Lau, Raymond Yiu Keung (January 2003).
As the richness and diversity of information available to us in our everyday lives has expanded, so has the need to manage this information. The lack of effective information management tools has given rise to what is colloquially known as the information overload problem. Intelligent agent technologies have been explored to develop personalised tools for autonomous information retrieval (IR). However, these so-called adaptive information agents are still primitive in terms of their learning autonomy, inference power, and explanatory capabilities. For instance, users often need to provide large amounts of direct relevance feedback to train the agents before these agents can acquire the users' specific information requirements. Existing information agents are also weak in dealing with the serendipity issue in IR because they cannot infer document relevance with respect to possibly related IR contexts. This thesis exploits theories and technologies from the fields of Information Retrieval (IR), Symbolic Artificial Intelligence and Intelligent Agents for the development of the next generation of adaptive information agents to alleviate the problem of information overload. In particular, the fundamental issues of representation, learning, and classification (e.g., classifying documents as relevant or not) pertaining to these agents are examined.

The design of the adaptive information agent model stems from a basic intuition in IR. By way of illustration, given a retrieval context involving a science student and the query "Java", what information items should an intelligent information agent recommend to its user? The agent should recommend documents about "Computer Programming" if it believes that its user is a computer science student and that every computer science student needs to learn programming. However, if the agent later discovers that its user is studying "volcanology", and the agent also believes that volcanologists are interested in the volcanoes of Java, the agent may recommend documents about "Merapi" (a volcano in Java with a recent eruption in 1994). This scenario illustrates that a retrieval context is not only a set of terms and their frequencies but also the relationships among terms (e.g., java ∧ science → computer, computer → programming, java ∧ science ∧ volcanology → merapi, etc.). In addition, retrieval contexts represented in information agents should be revised in accordance with the changing information requirements of the users. Therefore, to enhance the adaptive and proactive IR behaviour of information agents, an expressive representation language is needed to represent complex retrieval contexts, and an effective learning mechanism is required to revise the agents' beliefs about the changing retrieval contexts. Moreover, a sound reasoning mechanism is essential for information agents to infer document relevance with respect to a retrieval context, enhancing their proactiveness and learning autonomy.

The theory of belief revision advocated by Alchourrón, Gärdenfors, and Makinson (AGM) provides a rigorous formal foundation for modelling evolving retrieval contexts in terms of changing epistemic states in adaptive information agents. The expressive power of the AGM framework allows sufficient details of retrieval contexts to be captured. Moreover, the AGM framework enforces the principles of minimal and consistent belief changes. These principles coincide with the requirements of modelling changing information retrieval contexts.
The AGM belief revision logic has a close connection with the Logical Uncertainty Principle, which describes the fundamental approach underlying logic-based IR models. Accordingly, the AGM belief functions are applied to develop the learning components of adaptive information agents. Expectation inference, which is characterised by axioms leading to conservatively monotonic IR behaviour, plays a significant role in developing the agents' classification components. Because of the direct connection between the AGM belief functions and the expectation inference relations, seamless integration of the information agents' learning and classification components is made possible. Essentially, the learning functions and the classification functions of adaptive information agents are conceptualised as belief revision (K ∗ q) and expectation inference (q |∼ d with respect to K), respectively. This conceptualisation can be interpreted as: (1) learning is the process of revising the representation K of a retrieval context with respect to a user's relevance feedback q, which can be seen as a refined query; (2) classification is the process of determining the degree of relevance of a document d with respect to the refined query q, given the agent's expectation (i.e., beliefs) K about the retrieval context.

At the computational level, how to induce the epistemic entrenchment which defines the AGM belief functions, and how to implement the AGM belief functions by means of an effective and efficient computational algorithm, are among the core research issues addressed. Automated methods of discovering context-sensitive term associations such as (computer → programming) and preclusion relations such as (volcanology ↛ programming) are explored. In addition, an effective classification method underpinned by expectation inference is developed for adaptive information agents. Last but not least, quantitative evaluations, based on well-known IR benchmarking processes, are applied to examine the performance of the prototype agent system. The performance of the belief-revision-based information agent system is compared with that of a vector-space-based agent system and with other adaptive information filtering systems that participated in TREC-7. As a whole, encouraging results are obtained from our initial experiments.
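The AGM operations themselves are defined axiomatically rather than as code. Purely as a hedged toy illustration of entrenchment-guided revision (the literal-only representation, the conflict check, and the entrenchment values are all assumptions, not the thesis's algorithm), the sketch below revises a small belief base by discarding a less-entrenched conflicting belief before adding new evidence.

```python
def negate(literal):
    """Literals are strings; a leading '-' marks negation."""
    return literal[1:] if literal.startswith("-") else "-" + literal

def revise(belief_base, new_literal, new_entrenchment):
    """belief_base: dict mapping literal -> entrenchment degree (higher = harder
    to give up). A conflicting belief is retracted only if it is less entrenched
    than the incoming evidence, mirroring the AGM idea of minimal, consistent
    change (toy version restricted to literals)."""
    base = dict(belief_base)
    conflict = negate(new_literal)
    if conflict in base:
        if base[conflict] >= new_entrenchment:
            return base                      # incoming belief is rejected
        del base[conflict]                   # give up the weaker, conflicting belief
    base[new_literal] = new_entrenchment
    return base

# A retrieval context about the user: initially believed to be a CS student.
context = {"science_student": 0.9, "cs_student": 0.6}
# New, more strongly supported evidence that the user is not a CS student.
print(revise(context, "-cs_student", 0.8))   # cs_student retracted, -cs_student added
```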
|
209 |
Automated spatial information retrieval and visualisation of spatial data. Walker, Arron R. (January 2007).
An increasing amount of freely available Geographic Information System (GIS) data on the Internet has stimulated recent research into Spatial Information Retrieval (SIR). Typically, SIR looks at the problem of retrieving spatial data on a dataset-by-dataset basis. In practice, however, GIS datasets are generally not analysed in isolation: more often than not, multiple datasets are required to create a map for a particular analysis task. To do this using current SIR techniques, each dataset is retrieved one by one using traditional retrieval methods and manually added to the map. To automate map creation, the traditional SIR paradigm of matching a query to a single dataset type must be extended to include discovering relationships between different dataset types.

This thesis presents a Bayesian inference retrieval framework that incorporates expert knowledge in order to retrieve all relevant datasets and automatically create a map given an initial user query. The framework consists of a Bayesian network that utilises causal relationships between GIS datasets. A series of Bayesian learning algorithms is presented to automatically discover these causal linkages from historic expert knowledge about GIS datasets. The new retrieval model improves support for complex and vague queries through the discovered dataset relationships. In addition, the framework learns which datasets are best suited to a particular query input through feedback supplied by the user.

The thesis evaluates the new Bayesian framework for SIR by utilising a test set of queries and responses and measuring the performance of the new algorithms against conventional algorithms. This contribution will increase the performance and efficiency of knowledge extraction from GIS by allowing users to focus on interpreting data instead of finding which data are relevant to their analysis. In addition, these techniques will help make GIS accessible to non-technical users.
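As a hedged sketch of how learned links between dataset types might drive retrieval in a framework like the one described (the dataset names, link probabilities, and the forward-only propagation are invented for illustration and are a simplification of full Bayesian network inference), the snippet below pushes relevance from the queried dataset type to related types and returns every type worth adding to the map.

```python
def propagate_relevance(query_dataset, links, threshold=0.3):
    """links: dict mapping a dataset type to a list of (related_type, prob) pairs,
    where prob approximates P(related is relevant | this dataset is relevant).
    Starting from the queried dataset (relevance 1.0), push belief forward through
    the network and keep every dataset whose relevance clears the threshold."""
    relevance = {query_dataset: 1.0}
    frontier = [query_dataset]
    while frontier:
        node = frontier.pop()
        for related, prob in links.get(node, []):
            r = relevance[node] * prob
            if r > relevance.get(related, 0.0):
                relevance[related] = r
                frontier.append(related)
    return {d: r for d, r in relevance.items() if r >= threshold}

# Invented example links between GIS dataset types.
links = {
    "flood_extent": [("elevation", 0.9), ("river_network", 0.8)],
    "river_network": [("rainfall_gauges", 0.6)],
}
print(propagate_relevance("flood_extent", links))
```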
|
210 |
Resource Discovery and Fair Intelligent Admission Control over Scalable Internet (January 2004).
The Internet currently supports a best-effort connectivity service. There has been an increasing demand for the Internet to support Quality of Service (QoS), both to satisfy stringent service requirements from many emerging networking applications and to utilize network resources efficiently. However, it has been found that even with an augmented QoS architecture the Internet cannot achieve the desired QoS, and there are concerns about the scalability of the available QoS solutions. If the network is not provisioned adequately, the Internet is not able to handle congestion conditions: it is unaware of its internal network QoS states and therefore cannot provide QoS when the network state changes dynamically. This thesis addresses the following question: is it possible to deliver applications with QoS in the Internet fairly and efficiently while preserving scalability? In this dissertation we answer this question affirmatively by proposing an innovative service architecture: Resource Discovery (RD) and Fair Intelligent Admission Control (FIAC) over the scalable Internet. The main contributions of this dissertation are as follows:

1. To detect the network QoS state, we propose the Resource Discovery (RD) framework, which provides the network QoS state dynamically. RD adopts a feedback-loop mechanism to collect the network QoS state and report it to the Fair Intelligent Admission Control module, so that FIAC can perform resource control efficiently and fairly.

2. To facilitate network resource management and flow admission control, two scalable Fair Intelligent Admission Control architectures are designed and analyzed at two levels: per-class and per-flow. Per-class FIAC handles aggregate admission control for certain pre-defined aggregates; per-flow FIAC handles flow admission control in terms of fairness within the class.

3. To further improve scalability, an Edge-Aware Resource Discovery and Fair Intelligent Admission Control scheme is proposed which does not require the involvement of core routers.

We devise and analyze implementations of the proposed solutions and demonstrate the effectiveness of the approach. For Resource Discovery, two closed-loop feedback solutions are designed and investigated. The first is a core-aware solution based on direct QoS state information. To further improve scalability, an edge-aware solution is designed in which only the edges (not the core) are involved in the feedback QoS state estimation. For admission control, the FIAC module bridges the gap between 'external' traffic requirements and the 'internal' network ability. By utilizing the QoS state information from RD, FIAC intelligently allocates resources via per-class admission control and per-flow fairness control. We study the performance and robustness of RD-FIAC through extensive simulations. Our results show that RD can obtain the internal network QoS state and that FIAC can adjust resource allocation efficiently and fairly.
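As a hedged sketch of feedback-driven per-class admission control of the kind outlined above (the class shares, capacity figure, and measured per-class loads are invented for illustration, not taken from the thesis), the snippet below admits a new flow only if the class's measured load plus the requested rate stays within that class's share of the link capacity reported by the feedback loop.

```python
def admit_flow(requested_rate, flow_class, class_share, measured_class_load, link_capacity):
    """Per-class admission decision driven by feedback from a resource-discovery loop.
    class_share: fraction of link capacity allotted to each traffic class.
    measured_class_load: current per-class load as reported by the feedback loop."""
    budget = class_share[flow_class] * link_capacity
    return measured_class_load[flow_class] + requested_rate <= budget

# Invented example: the feedback loop reports current per-class load on a 100 Mbps link.
share = {"gold": 0.5, "silver": 0.3, "best_effort": 0.2}
load = {"gold": 42.0, "silver": 25.0, "best_effort": 18.0}
print(admit_flow(5.0, "gold", share, load, link_capacity=100.0))     # True: fits the 50 Mbps budget
print(admit_flow(10.0, "silver", share, load, link_capacity=100.0))  # False: exceeds the 30 Mbps budget
```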
|