391

Privacy Preserving Data Mining using Unrealized Data Sets: Scope Expansion and Data Compression

Fong, Pui Kuen 16 May 2013 (has links)
In previous research, the author developed a novel PPDM method – Data Unrealization – that preserves both the privacy and the utility of discrete-valued training samples. That method transforms original samples into unrealized ones and guarantees 100% accurate decision tree mining results. This dissertation extends that research and achieves the following: (1) it expands the application of Data Unrealization to other data mining algorithms, (2) it introduces data compression methods that reduce storage requirements for unrealized training samples and increase data mining performance, and (3) it adds a second level of privacy protection that works seamlessly with Data Unrealization. From an application perspective, this dissertation proves that statistical information (i.e., counts, probabilities and information entropy) can be retrieved precisely from unrealized training samples, so that Data Unrealization is applicable to all counting-based, probability-based and entropy-based data mining models with 100% accuracy. For data compression, this dissertation introduces a new number sequence – the J-Sequence – as a means of compressing training samples through the J-Sampling process. J-Sampling converts the samples into a list of numbers with many replications. Applying run-length encoding to the resulting list can further compress the samples into a constant storage space regardless of the sample size. In this way, the storage requirement of the sample database becomes O(1) and the time complexity of a statistical database query becomes O(1). J-Sampling also serves as an encryption layer over the unrealized samples already protected by Data Unrealization; meanwhile, data mining can be performed on these samples without decryption. In order to retain privacy preservation and to handle data compression internally, a column-oriented database management system is recommended for storing the encrypted samples.
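The J-Sequence and the J-Sampling process are specific to this dissertation and are not reproduced here, but the run-length encoding step they feed into is standard. A minimal Python sketch, assuming a hypothetical J-Sampled output consisting of long runs of a few repeated numbers:

```python
from itertools import groupby

def run_length_encode(values):
    """Compress a list with long runs of repeated values into (value, count) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]

def run_length_decode(pairs):
    """Invert the encoding, reproducing the original list."""
    return [v for v, count in pairs for _ in range(count)]

# A hypothetical J-Sampled output: many replications of a few numbers.
j_sampled = [3] * 500 + [7] * 1200 + [3] * 800
encoded = run_length_encode(j_sampled)
print(encoded)            # [(3, 500), (7, 1200), (3, 800)]
assert run_length_decode(encoded) == j_sampled
```

The encoded form grows with the number of runs rather than the number of samples, which is what makes a constant-storage claim plausible for lists dominated by replications.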
392

Optimal and Robust Routing of Subscriptions for Unifying Access to the Past and the Future in Publish/Subscribe

Li, Guoli 18 February 2011 (has links)
A flexible, scalable, and asynchronous middleware abstraction is needed for business process management, which involves thousands of tasks and a large number of running instances of large business processes. The content-based publish/subscribe system is an ideal candidate to serve as an enterprise service bus for these applications. In the publish/subscribe paradigm, information providers called publishers disseminate publications to all subscribers who have expressed interest by registering subscriptions through a loosely coupled interface. However, the traditional publish/subscribe paradigm only supports stateless subscriptions; that is, event correlation is ignored. Moreover, subscribers can only receive publications issued after their subscriptions. There are many application contexts, however, where access to publications from the past is necessary, such as replaying a business process execution in order to debug it. Even more interesting uses arise when data from the past can be correlated with data in the future. Therefore, new languages and new functionality are needed in the standard publish/subscribe model in order to support business process management. We propose a new subscription language, PADRES SQL (PSQL), which can express event patterns and unify both historic and future views for subscribers. PADRES allows a subscriber to access data published both in the past and in the future. Furthermore, complex event detection happens in the broker network. The main difficulties of distributed event detection are routing a composite subscription (including where and how to decompose it) and routing the individual parts of the subscription. Our composite subscription routing decisions are based on a cost model that minimizes routing and detection delay. An adaptive subscription routing protocol is proposed to determine efficient placements under dynamically changing workloads. PADRES also provides robust message delivery by exploiting alternative paths in a cyclic overlay. Routing optimizations and efficient matching algorithms are studied to improve the performance of the extended publish/subscribe model. With the above features, we propose the Ninos system, a distributed business process execution architecture, as a case study; it uses light-weight activity agents to carry out business process execution in a distributed environment. Ninos demonstrates that decentralized business process execution is the trend for next-generation products, and that the publish/subscribe model is ideal to serve as an enterprise service bus (ESB) for distributed applications.
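PSQL itself and the PADRES broker internals are not given in the abstract; the sketch below illustrates only the underlying content-based matching step, with subscriptions modeled as conjunctions of attribute predicates. The attribute names and predicates are hypothetical:

```python
from typing import Any, Callable, Dict

# A subscription is a conjunction of attribute predicates; a publication is a
# set of attribute-value pairs. This mirrors the content-based model the
# abstract describes, not PADRES's actual PSQL syntax.
Predicate = Callable[[Any], bool]
Subscription = Dict[str, Predicate]
Publication = Dict[str, Any]

def matches(sub: Subscription, pub: Publication) -> bool:
    """A publication matches if every predicate holds on the named attribute."""
    return all(attr in pub and pred(pub[attr]) for attr, pred in sub.items())

# Hypothetical example: a subscriber interested in failed runs of one process.
sub = {
    "process": lambda v: v == "order_fulfilment",
    "status": lambda v: v == "FAILED",
}
pub = {"process": "order_fulfilment", "status": "FAILED", "instance": 42}
print(matches(sub, pub))  # True
```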
393

Ranked Retrieval in Uncertain and Probabilistic Databases

Soliman, Mohamed January 2011 (has links)
Ranking queries are widely used in data exploration, data analysis and decision making scenarios. While most currently proposed ranking techniques focus on deterministic data, several emerging applications involve data that are imprecise or uncertain. Ranking uncertain data raises new challenges in query semantics and processing, making conventional methods inapplicable. Furthermore, the interplay between ranking and uncertainty models introduces new dimensions for ordering query results that do not exist in traditional settings. This dissertation introduces new formulations and processing techniques for ranking queries on uncertain data. The formulations are based on a marriage of traditional ranking semantics with possible-worlds semantics under widely adopted uncertainty models. In particular, we focus on studying the impact of tuple-level and attribute-level uncertainty on the semantics and processing techniques of ranking queries. Under the tuple-level uncertainty model, we introduce a processing framework that leverages the capabilities of relational database systems to recognize and handle data uncertainty in score-based ranking. The framework encapsulates a state space model and efficient search algorithms that compute query answers by lazily materializing the necessary parts of the space. Under the attribute-level uncertainty model, we give a new probabilistic ranking model, based on partial orders, to encapsulate the space of possible rankings originating from uncertainty in attribute values. We present a set of efficient query evaluation algorithms, including sampling-based techniques built on the theory of Markov chains and Monte Carlo methods, to compute query answers. We build on our techniques for ranking under attribute-level uncertainty to support rank join queries on uncertain data. We show how to extend current rank join methods to handle uncertainty in scoring attributes. We provide a pipelined query operator implementation of an uncertainty-aware rank join algorithm, integrated with sampling techniques, to compute query answers.
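The abstract mentions sampling-based techniques for query evaluation. Below is a minimal sketch of Monte Carlo estimation over possible worlds under tuple-level uncertainty, assuming (as a simplification) that tuples are independent; the function and its inputs are illustrative, not the dissertation's operators:

```python
import random

def topk_probabilities(tuples, k, trials=10000, seed=0):
    """Estimate, per tuple, the probability it ranks in the top-k of a possible world.

    `tuples` is a list of (id, score, membership_probability). Each trial samples
    one possible world by including each tuple independently with its probability,
    then ranks the world by score. Independence is a simplifying assumption; real
    tuple-level models also support exclusion rules among tuples.
    """
    rng = random.Random(seed)
    hits = {tid: 0 for tid, _, _ in tuples}
    for _ in range(trials):
        world = [(tid, score) for tid, score, p in tuples if rng.random() < p]
        world.sort(key=lambda t: t[1], reverse=True)
        for tid, _ in world[:k]:
            hits[tid] += 1
    return {tid: count / trials for tid, count in hits.items()}

data = [("a", 0.9, 0.5), ("b", 0.8, 0.9), ("c", 0.7, 0.9)]
print(topk_probabilities(data, k=2))  # roughly {'a': 0.5, 'b': 0.9, 'c': 0.5}
```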
394

Semantic Analysis of Wikipedia's Linked Data Graph for Entity Detection and Topic Identification Applications

AlemZadeh, Milad January 2012 (has links)
The Semantic Web and the Linked Data community are now shaping the future of the Web. The standards and technologies defined in this field have opened a strong pathway towards a new era of knowledge management and representation for the computing world. The data structures and semantic formats introduced by the Semantic Web standards offer a platform for all data and knowledge providers in the world to present their information in a free, publicly available, semantically tagged, inter-linked, and machine-readable structure. As a result, the adoption of the Semantic Web standards by data providers creates numerous opportunities for the development of new applications which were not possible or, at best, hardly achievable with the current state of the Web, which consists mostly of unstructured or semi-structured data with minimal semantic metadata, tailored mainly for human readability. This dissertation introduces a framework for effective analysis of Semantic Web data towards the development of solutions for a series of related applications. To build such a framework, Wikipedia is chosen as the main knowledge resource, largely because it is the central dataset in the Linked Data community. In this work, Wikipedia and its Semantic Web version, DBpedia, are used to create a semantic graph which constitutes the knowledge base and the back-end foundation of the framework. The semantic graph introduced in this research consists of two main concepts: entities and topics. The entities act as the knowledge items, while the topics create the class hierarchy of the knowledge items. Therefore, by assigning entities to various topics, the semantic graph presents all the knowledge items in a categorized hierarchy ready for further processing. Furthermore, this dissertation introduces various analysis algorithms over the entity and topic graphs which can be used in a variety of applications, especially in the natural language understanding and knowledge management fields. After explaining the details of the analysis algorithms, a number of possible applications are presented and potential solutions to them are provided. The main themes of these applications are entity detection, topic identification, and context acquisition. To demonstrate the efficiency of the framework algorithms, some of the applications are developed and comprehensively studied, with detailed experimental results compared against appropriate benchmarks. These results show how the framework can be used in different configurations and how different parameters affect the performance of the algorithms.
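A minimal sketch of the entity/topic structure the abstract describes: topics form a class hierarchy, and entities (knowledge items) are assigned to topics. The sample topics and entities are hypothetical, not drawn from DBpedia:

```python
from collections import defaultdict

# Topics form a class hierarchy via parent links; entities attach to topics.
topic_parent = {
    "Programming_languages": "Computing",
    "Databases": "Computing",
    "Computing": None,
}
entity_topics = defaultdict(set)
entity_topics["Python_(programming_language)"].add("Programming_languages")
entity_topics["PostgreSQL"].add("Databases")

def topic_ancestors(topic):
    """Walk the class hierarchy upward from a topic to the root."""
    while topic is not None:
        yield topic
        topic = topic_parent.get(topic)

def entities_under(topic, entity_topics):
    """All entities assigned to a topic or to any of its descendants."""
    return {e for e, ts in entity_topics.items()
            if any(topic in topic_ancestors(t) for t in ts)}

print(entities_under("Computing", entity_topics))
# {'Python_(programming_language)', 'PostgreSQL'}
```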
395

Characterizing User Search Intent and Behavior for Click Analysis in Sponsored Search

Ashkan, Azin January 2013 (has links)
Interpreting user actions to better understand their needs provides an important tool for improving information access services. In the context of organic Web search, considerable effort has been made to model user behavior and infer query intent, with the goal of improving the overall user experience. Much less work has been done in the area of sponsored search, i.e., with respect to the advertisement links (ads) displayed on search result pages by many commercial search engines. This thesis develops and evaluates new models and methods required to interpret user browsing and click behavior and to understand query intent in this very different context. The initial part of the thesis is concerned with extending the query categories for commercial search and with inferring query intent, focusing on two major tasks: i) enriching queries with contextual information obtained from the search result pages returned for those queries, and ii) developing relatively simple methods for the reliable labeling of training data via crowdsourcing. A central idea of this thesis is to study the impact of contextual factors (including query intent, ad placement, and page structure) on user behavior. This information is then incorporated into probabilistic models to evaluate the quality of advertisement links within the context in which they are displayed, over their history of appearance. To account for these factors, a number of query and location biases are proposed and formulated into a group of browsing and click models. To explore user intent and behavior and to evaluate the performance of the proposed models and methods, logs of query and click information provided for research purposes are used. Overall, query intent is found to have a substantial impact on predictions of user click behavior in sponsored search. Predictions are further improved by considering ads in the context of the other ads displayed on a result page. The parameters of the browsing and click models are learned using an expectation maximization technique applied to the click signals recorded in the logs. The user's initial motivation to browse the ad list and their browsing persistence are found to be related to query intent and browsing/click behavior. Accommodating these biases, along with the location bias, in user models provides effective contextual signals and improves the performance of the existing models.
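As a concrete illustration of fitting a click model by expectation maximization, the sketch below uses the standard position-based (examination hypothesis) model, a simpler relative of the query- and location-biased models the thesis proposes; the log format and parameter names are assumptions:

```python
from collections import defaultdict

def em_position_bias(logs, positions, ads, iters=50):
    """Fit a position-based click model by EM.

    `logs` is a list of (ad, position, clicked) impressions. The model assumes
    P(click) = theta[position] * alpha[ad]: an ad is clicked iff it is examined
    (position effect) and attractive (ad effect). This is the basic examination
    hypothesis; the thesis's richer models also condition on query intent.
    """
    theta = {p: 0.5 for p in positions}   # examination probabilities
    alpha = {a: 0.5 for a in ads}         # ad attractiveness
    for _ in range(iters):
        exam_num, exam_den = defaultdict(float), defaultdict(float)
        attr_num, attr_den = defaultdict(float), defaultdict(float)
        for a, p, clicked in logs:
            if clicked:
                e_exam, e_attr = 1.0, 1.0   # a click implies both events
            else:
                denom = 1.0 - theta[p] * alpha[a]
                e_exam = theta[p] * (1.0 - alpha[a]) / denom
                e_attr = alpha[a] * (1.0 - theta[p]) / denom
            exam_num[p] += e_exam; exam_den[p] += 1.0
            attr_num[a] += e_attr; attr_den[a] += 1.0
        theta = {p: exam_num[p] / exam_den[p] for p in positions}
        alpha = {a: attr_num[a] / attr_den[a] for a in ads}
    return theta, alpha

logs = [("ad1", 1, 1), ("ad1", 2, 0), ("ad2", 1, 0), ("ad2", 2, 0)] * 100
theta, alpha = em_position_bias(logs, positions=[1, 2], ads=["ad1", "ad2"])
```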
396

Interactive Classification Of Satellite Image Content Based On Query By Example

Dalay, Oral 01 January 2006 (has links) (PDF)
In our attempt to construct a semantic filter for satellite image content, we have built software that allows the user to indicate a small number of image regions containing a specific geographical object, such as a bridge, and to retrieve similar objects in the same satellite image. We are particularly interested in a data analysis approach based on user interaction: the user can guide the classification procedure through interaction and visual observation of the results. We have applied a two-step procedure for this, and preliminary results show that we eliminate many true negatives while keeping most of the true positives.
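The abstract does not specify the region features or the classifier. A hypothetical nearest-prototype sketch of the query-by-example step, with made-up feature vectors:

```python
import numpy as np

def rank_regions_by_example(features, example_ids, top_n=20):
    """Rank candidate image regions by similarity to a few user-marked examples.

    `features` is an (n_regions, n_dims) array of per-region feature vectors
    (texture, spectral statistics, etc. are left unspecified, as in the
    abstract). Regions are scored by distance to the mean of the examples;
    a hypothetical stand-in for the thesis's two-step procedure.
    """
    prototype = features[example_ids].mean(axis=0)
    dists = np.linalg.norm(features - prototype, axis=1)
    order = np.argsort(dists)
    marked = set(example_ids)
    return [int(i) for i in order if i not in marked][:top_n]

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 16))      # hypothetical region features
print(rank_regions_by_example(feats, example_ids=[3, 17, 42]))
```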
398

Deep Web Collection Selection

King, John Douglas January 2004 (has links)
The deep web contains a massive number of collections that are mostly invisible to search engines. These collections often contain high-quality, structured information that cannot be crawled using traditional methods. An important problem is selecting which of these collections to search. Automatic collection selection methods try to solve this problem by suggesting the best subset of deep web collections to search based on a query. A few methods for deep web collection selection have been proposed, including the Collection Retrieval Inference Network system and the Glossary of Servers Server system. The drawback of these methods is that they require communication between the search broker and the collections, and need metadata about each collection. This thesis compares three sampling methods that require neither communication between the broker and the collections nor metadata about each collection. It also adapts some traditional information retrieval techniques to this area. In addition, the thesis tests these techniques using the INEX collection, comprising 18 collections (12,232 XML documents in total) and 36 queries. The experiments show that the performance of the sampling-based techniques is satisfactory on average.
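As an illustration of sampling-based collection selection, the sketch below scores collections by a smoothed query likelihood over documents sampled from each collection; the scoring formula is an illustrative stand-in, not one of the thesis's three methods:

```python
import math
from collections import Counter

def collection_scores(samples, query_terms):
    """Rank collections by a simple query-likelihood score over sampled documents.

    `samples` maps collection name -> list of sampled documents (token lists),
    obtained without collection cooperation, e.g. by query-based sampling.
    """
    scores = {}
    for name, docs in samples.items():
        tf = Counter(t for doc in docs for t in doc)
        size = sum(tf.values())
        # log query likelihood with add-one smoothing over the sample
        scores[name] = sum(math.log((tf[t] + 1) / (size + len(tf) + 1))
                           for t in query_terms)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

samples = {
    "movies": [["film", "actor", "film"], ["director", "scene"]],
    "biology": [["gene", "protein"], ["cell", "gene", "enzyme"]],
}
print(collection_scores(samples, ["gene", "cell"]))  # biology ranks first
```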
399

Effective and Efficient Similarity Search in Video Databases

Jie Shao Unknown Date (has links)
Searching for relevant information based on content features in video databases is an interesting and challenging research topic that has drawn a lot of attention recently. Video similarity search has many practical applications, such as TV broadcast monitoring, copyright compliance enforcement and search result clustering. However, existing studies fall short of fast and accurate solutions, due to the diverse variations among the videos in large collections. In this thesis, we introduce database support for effective and efficient video similarity search across various sources, even in the presence of transformation distortion, partial content re-ordering, insertion, deletion or replacement. Specifically, we focus on processing two different types of content-based queries: video clip retrieval in a large collection of segmented short videos, and video subsequence identification from a long unsegmented stream. The first part of the thesis investigates how to process a number of individual kNN searches on the same database simultaneously to reduce the computational overhead of current content-based video search systems. We propose a Dynamic Query Ordering (DQO) algorithm for efficiently processing Batch Nearest Neighbor (BNN) search in high-dimensional space, with advanced optimizations of both I/O cost and CPU cost. The second part of the thesis tackles the previously unstudied problem of temporally localizing similar content in a long unsegmented video sequence, extended to identify occurrences whose ordering or length potentially differs from the query due to content editing. A graph transformation and matching approach supported by the above BNN search is proposed as a filter-and-refine query processing strategy to identify the most similar subsequence effectively yet efficiently. The third part of the thesis extends the Bounded Coordinate System (BCS) method we introduced earlier for video clip retrieval. A novel collective perspective is presented that exploits the distributional discrepancy of samples to assess the similarity between two video clips. Several non-parametric hypothesis tests from statistics are utilized to check whether two ensembles of points come from the same distribution. The proposed similarity measures provide a more comprehensive analysis that captures the essence of invariant distribution information for retrieving video clips. For each part, we present comprehensive experimental evaluations, which show improved performance compared with state-of-the-art methods. Finally, some planned extensions of this work are highlighted as future research objectives.
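A brute-force sketch of the batching idea behind BNN search: a single shared scan of the data answers all kNN queries at once. The thesis's DQO algorithm adds query ordering and I/O and CPU optimizations that are not reproduced here:

```python
import heapq
import numpy as np

def batch_knn(data, queries, k):
    """Answer many kNN queries in one pass over the data.

    Scanning the dataset once and updating a bounded heap per query shares the
    data access cost across the batch, which is the motivation behind BNN
    search. This brute-force version only illustrates the batching idea.
    """
    heaps = [[] for _ in queries]        # max-heaps via negated distances
    for idx, point in enumerate(data):   # single shared scan of the data
        for qi, q in enumerate(queries):
            d = float(np.linalg.norm(point - q))
            if len(heaps[qi]) < k:
                heapq.heappush(heaps[qi], (-d, idx))
            elif d < -heaps[qi][0][0]:
                heapq.heapreplace(heaps[qi], (-d, idx))
    return [sorted((-nd, i) for nd, i in h) for h in heaps]

rng = np.random.default_rng(1)
data = rng.normal(size=(5000, 32))       # e.g. hypothetical per-frame features
queries = rng.normal(size=(8, 32))
results = batch_knn(data, queries, k=5)  # one (distance, index) list per query
```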
400

Contextual information retrieval from the WWW

Limbu, Dilip Kumar January 2008 (has links)
Contextual information retrieval (CIR) is a critical technique for today's search engines in terms of facilitating queries and returning relevant information. Despite its importance, little progress has been made in its application, due to the difficulty of capturing and representing contextual information about users. This thesis details the development and evaluation of the contextual SERL search, designed to tackle some of the challenges associated with CIR from the World Wide Web. The contextual SERL search utilises a rich contextual model that exploits implicit and explicit data to modify queries to more accurately reflect the user's interests, as well as to continually build the user's contextual profile and a shared contextual knowledge base. These profiles are used to filter results from a standard search engine to improve the relevance of the pages displayed to the user. The contextual SERL search has been tested in an observational study that captured both qualitative and quantitative data about the ability of the framework to improve the user's web search experience. A total of 30 subjects, with different levels of search experience, participated in the experiment. The results demonstrate that when the contextual profile and the shared contextual knowledge base are used, the contextual SERL search improves search effectiveness, efficiency and subjective satisfaction. Effectiveness improves as subjects entered fewer queries to reach the target information than with the contemporary search engine. In the case of a particularly complex search task, efficiency improves as subjects browsed fewer hits, visited fewer URLs, made fewer clicks and took less time to reach the target information. Finally, subjects expressed a higher degree of satisfaction with the quality of contextual support when using the shared contextual knowledge base than when using their contextual profile alone. These results suggest that integrating a user's contextual factors and information seeking behaviours is very important for the successful development of a CIR framework. It is believed that this framework and other similar projects will help provide the basis for the next generation of contextual information retrieval from the Web.
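A hypothetical sketch of profile-based result filtering in the spirit of the contextual SERL search: results from a standard engine are re-scored by their overlap with a bag-of-terms contextual profile. The data layout and weighting are assumptions:

```python
from collections import Counter

def rerank_with_profile(results, profile, weight=1.0):
    """Re-rank search engine results using a user's contextual profile.

    `results` is a list of (url, snippet_tokens, engine_score); `profile` is a
    bag of terms built from the user's implicit and explicit data. Each result
    is boosted by its overlap with the profile, a simplified stand-in for the
    filtering the contextual SERL search performs.
    """
    profile_terms = Counter(profile)
    total = sum(profile_terms.values()) or 1
    def score(item):
        url, tokens, engine_score = item
        overlap = sum(profile_terms[t] for t in tokens) / total
        return engine_score + weight * overlap
    return sorted(results, key=score, reverse=True)

profile = ["python", "database", "indexing", "python"]
results = [
    ("a.example", ["python", "tutorial"], 0.8),
    ("b.example", ["snake", "care"], 0.9),
]
print(rerank_with_profile(results, profile))  # a.example moves above b.example
```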
