Querying, Exploring and Mining the Extended Document

The evolution of the Web into an interactive medium that encourages active user engagement has ignited a huge increase in the amount, complexity and diversity of available textual data. This evolution forces us to re-evaluate our view of documents as simple pieces of text and of document collections as immutable and isolated. Extended documents published in the context of blogs, micro-blogs, on-line social networks and customer feedback portals can be associated with a wealth of meta-data in addition to their textual component: tags, links, sentiment, entities mentioned in text, etc. Collections of user-generated documents grow, evolve, co-exist and interact: they are dynamic and integrated.
These unique characteristics of modern documents and document collections present us with exciting opportunities for improving the way we interact with them. At the same time, this additional complexity, combined with the vast amounts of available textual data, presents us with formidable computational challenges. In this context, we introduce, study and extensively evaluate an array of effective and efficient solutions for querying, exploring and mining extended documents and dynamic, integrated document collections.
For collections of socially annotated extended documents, we present an improved probabilistic search and ranking approach based on our growing understanding of the dynamics of the social annotation process.
For extended documents, such as blog posts, associated with entities extracted from text and categorical attributes, we enable their interactive exploration through the efficient computation of strong entity associations. Associated entities are computed for all possible attribute value restrictions of the document collection.
For extended documents, such as user reviews, annotated with a numerical rating, we introduce a keyword-query refinement approach. The solution enables the interactive navigation and exploration of large result sets.
We extend the skyline query to document streams, such as news articles, associated with categorical attributes and partially ordered domains. The technique incrementally maintains a small set of recent, uniquely interesting extended documents from the stream.
Finally, we introduce a solution for the scalable integration of structured data sources into Web search. Queries are analysed in order to determine what structured data, if any, should be used to augment Web search results.

http://hdl.handle.net/1807/29857

textual data

data management

0984

Identifier	oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OTU.1807/29857
Date	31 August 2011
Creators	Sarkas, Nikolaos
Contributors	Koudas, Nick
Source Sets	Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language	en_ca
Detected Language	English
Type	Thesis
