Dynamic query expansion is a method of automatically identifying terms relevant to a target domain based on an incomplete query input. With the explosive growth of online media, such tools are essential for efficient search result refining to track emerging themes in noisy, unstructured text streams. It's crucial for large-scale predictive analytics and decision-making, systems which use open source indicators to find meaningful information rapidly and accurately. The problems of information overload and semantic mismatch are systemic during the Information Retrieval (IR) tasks undertaken by such systems.
In this dissertation, we develop approaches to dynamic query expansion algorithms that can help improve the efficacy of such systems using only a small set of seed queries and requires no training or labeled samples. We primarily investigate four significant problems related to the retrieval and assessment of event-related information, viz. (1) How can we adapt the query expansion process to support rank-based analysis when tracking a fixed set of entities? A scalable framework is essential to allow relative assessment of emerging themes such as airport threats. (2) What visual knowledge discovery framework to adopt that can incorporate users' feedback back into the search result refinement process? A crucial step to efficiently integrate real-time `situational awareness' when monitoring specific themes using open source indicators. (3) How can we contextualize query expansions? We focus on capturing semantic relatedness between a query and reference text so that it can quickly adapt to different target domains. (4) How can we synchronously perform knowledge discovery and characterization (unstructured to structured) during the retrieval process? We mainly aim to model high-order, relational aspects of event-related information from microblog texts. / Ph. D. / Analysis of real-time, social media can provide critical insights into ongoing societal events. Where consequences and implications of specific events include monetary losses, threats to critical infrastructure and national security, disruptions to daily life, and a potential to cause loss of life and physical property. It is imperative for developing good ‘ground truth’ to develop adequate data-driven information systems, i.e., an authoritative record of events reported in the media cataloged alongside important dimensions. Availability of high-quality ground truth events can support various analytic efforts, e.g., identifying precursors of attacks, developing predictive indicators using surrogate data sources, and tracking the progression of events over space and time. A dynamic search result refinement is useful for expanding a general set of user queries into a more relevant collection. The challenges of information overload and misalignment of context between the user query and retrieved results can overwhelm both human and machine. In this dissertation, we focus our efforts on these specific challenges.
With the ever-increasing volume of user-generated data large-scale analysis is a tedious task. Our first focus is to develop a scalable model that dynamically tracks and ranks evolving topics as they appear in social media. Then to simplify the cognitive tasks involving sense-making of evolving themes, we take a visual approach to retrieve situationally critical and emergent information effectively. This visual analytics approach learns from user’s interactions during the exploratory process and then generates a better representation of the data. Thus, improving the situational understanding and usability of underlying data models. Such features are crucial for big-data based decision & support systems.
To make the event-focused retrieval process more robust, we developed a context-rich procedure that adds new relevant key terms to the user’s original query by utilizing the linguistic structures in text. This context-awareness allows the algorithm to retrieve those relevant characteristics that can help users to gain adequate information from social media about real-world events. Online social commentary about events is very informal and can be incomplete. However, to get the complete picture and adequately describe these events we develop an approach that models the underlying relatedness of information and iteratively extract meaning and denotations from event-related texts. We learn how to express the high-order relationships between events and entities and group them to identify those attributes that best explain the events the user is trying to uncover.
In all the augmentations we develop, our strategy is to allow only very minimal human supervision using just a small set of seed event triggers and requires no training or labeled samples. We show a comprehensive evaluation of these augmentations on real-world domains - threats on airports, cyber attacks, and protests. We also demonstrate their applicability as for real-time analysis that provides vital event characteristics, and contextually consistent information can be a beneficial aid for emergency responders.
Identifer | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/84852 |
Date | 17 August 2018 |
Creators | Khandpur, Rupinder P. |
Contributors | Computer Science, Ramakrishnan, Naren, Lu, Chang-Tien, Han, Eui-Hong, North, Christopher L., Reddy, Chandan K. |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |
Page generated in 0.0024 seconds