341 |
Supporting Scientific Collaboration through Workflows and Provenance. Ellqvist, Tommy. January 2010
Science is changing. Computers, fast communication, and new technologies have created new ways of conducting research. For instance, researchers from different disciplines are processing and analyzing scientific data whose volume is growing exponentially. This kind of research requires tools that can handle huge amounts of data, provide access to vast computational resources, and support collaboration within large teams of scientists. This thesis focuses on tools that help support scientific collaboration.
Workflows and provenance are two concepts that have proven useful in supporting scientific collaboration. Workflows provide a formal specification of scientific experiments, and provenance offers a model for documenting data and process dependencies. Together, they enable the creation of tools that can support collaboration through the whole scientific life-cycle, from the specification of experiments to the validation of results. However, existing models for workflows and provenance are often specific to particular tasks and tools. This makes it hard to analyze the history of data that has been generated across several application areas by different tools. Moreover, workflow design is a time-consuming process that often requires extensive knowledge of the tools involved and collaboration with researchers who have different expertise. This thesis addresses these problems.
Our first contribution is a study of the differences between two approaches to interoperability between provenance models: direct data conversion and mediation. We perform a case study in which we integrate three different provenance models using the mediation approach, and show its advantages compared with data conversion. Our second contribution supports workflow design by allowing multiple users to design workflows concurrently. Current workflow tools do not let users work simultaneously on the same workflow. We propose a method that uses the provenance of workflow evolution to enable real-time collaborative design of workflows. Our third contribution supports workflow design through the reuse of existing workflows. Collections of workflows for reuse are available, but more efficient methods for summarizing search results are still needed. We explore new summarization strategies that consider the workflow structure.
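One way to picture "provenance of workflow evolution" is an action-based version tree: each edit is recorded as an action under the version it was made against, so two users editing at the same time simply create sibling branches instead of overwriting each other. The sketch below assumes that model; the `Action` kinds, `VersionTree`, and its methods are hypothetical names for illustration, not the thesis's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Action:
    kind: str              # hypothetical kinds: "add_module", "delete_module", "add_connection"
    args: Tuple[str, ...]


@dataclass
class VersionTree:
    # Version 0 is the empty root workflow; every other version stores one action and its parent.
    parent: Dict[int, Optional[int]] = field(default_factory=lambda: {0: None})
    action: Dict[int, Optional[Action]] = field(default_factory=lambda: {0: None})
    user: Dict[int, str] = field(default_factory=lambda: {0: "root"})

    def record(self, parent_version: int, act: Action, user: str) -> int:
        """Append an action under parent_version; concurrent edits become sibling branches."""
        vid = len(self.parent)
        self.parent[vid] = parent_version
        self.action[vid] = act
        self.user[vid] = user
        return vid

    def materialize(self, version: int) -> Tuple[Set[str], Set[Tuple[str, str]]]:
        """Rebuild a workflow by replaying the actions on the path from the root to version."""
        chain = []
        v: Optional[int] = version
        while v is not None and self.action[v] is not None:
            chain.append(self.action[v])
            v = self.parent[v]
        modules: Set[str] = set()
        connections: Set[Tuple[str, str]] = set()
        for act in reversed(chain):
            if act.kind == "add_module":
                modules.add(act.args[0])
            elif act.kind == "delete_module":
                modules.discard(act.args[0])
                connections = {c for c in connections if act.args[0] not in c}
            elif act.kind == "add_connection":
                connections.add((act.args[0], act.args[1]))
        return modules, connections


# Two users branch from the same version at the same time; neither edit overwrites the other.
tree = VersionTree()
v1 = tree.record(0, Action("add_module", ("reader",)), "alice")
v2 = tree.record(v1, Action("add_module", ("plot",)), "alice")
v3 = tree.record(v1, Action("add_module", ("filter",)), "bob")
```

Because every version is derived by replay, the tree itself doubles as the provenance record of who changed what and when; a real collaborative tool would additionally broadcast each `record` call to the other connected users.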
|
342 |
Exploring Bit-Difference for Approximate KNN Search in High-dimensional Databases. Cui, Bin; Shen, Heng Tao; Shen, Jialie; Tan, Kian Lee. 01 1900
In this paper, we develop a novel index structure to support efficient approximate k-nearest neighbor (KNN) queries in high-dimensional databases. In high-dimensional spaces, the cost of computing distances (e.g., Euclidean distance) between points accounts for the dominant portion of the overall query response time for in-memory processing. To reduce distance computation, we first propose a structure (BID) that uses BIt-Difference to answer approximate KNN queries. BID represents each feature of a point's feature vector with a single bit, and the number of differing bits is used to prune points that are farther away. To handle real datasets, which are typically skewed, we enhance the BID mechanism with clustering, a cluster-adapted bitcoder, and dimensional weights; the enhanced structure is named BID⁺. Extensive experiments on both real-life and synthetic high-dimensional datasets show that the proposed method yields significant performance advantages over existing index structures. / Singapore-MIT Alliance (SMA)
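The filter-and-refine idea behind bit-difference pruning can be sketched as follows: encode every point as one bit per dimension, rank candidates by Hamming distance (XOR plus popcount), and compute exact Euclidean distances only for a short list. This is a minimal sketch, assuming a per-dimension midpoint threshold as the encoding rule; `BitIndex`, `knn`, and the `candidates` parameter are hypothetical names, and BID⁺'s clustering, cluster-adapted bitcoder, and dimensional weights are not shown.

```python
import math
from typing import List, Sequence, Tuple


class BitIndex:
    """Filter-and-refine approximate KNN with one bit per dimension (a sketch, not BID itself)."""

    def __init__(self, points: Sequence[Sequence[float]]):
        self.points: List[Tuple[float, ...]] = [tuple(p) for p in points]
        dims = len(self.points[0])
        # Assumed encoding rule: split each dimension at the midpoint of its observed range.
        self.thresholds = [
            (min(p[d] for p in self.points) + max(p[d] for p in self.points)) / 2
            for d in range(dims)
        ]
        self.codes = [self._encode(p) for p in self.points]

    def _encode(self, p: Sequence[float]) -> int:
        code = 0
        for d, (x, t) in enumerate(zip(p, self.thresholds)):
            if x >= t:
                code |= 1 << d          # bit d is set when coordinate d lies in the upper half
        return code

    def knn(self, query: Sequence[float], k: int, candidates: int) -> List[int]:
        q = self._encode(query)
        # Filter: rank all points by bit difference (XOR then popcount), which is far cheaper
        # than a full Euclidean distance in high dimensions.
        order = sorted(range(len(self.points)), key=lambda i: bin(self.codes[i] ^ q).count("1"))
        shortlist = order[:candidates]
        # Refine: compute exact Euclidean distances only for the surviving candidates.
        shortlist.sort(key=lambda i: math.dist(query, self.points[i]))
        return shortlist[:k]


index = BitIndex([(0, 0), (0, 1), (1, 0), (1, 1), (0.1, 0.1), (0.9, 0.9)])
nearest = index.knn((0.05, 0.0), k=2, candidates=3)   # indices of the approximate 2-NN
```

The answer is approximate because a true neighbor whose bit code differs a lot from the query's can be filtered out before refinement; raising `candidates` trades speed for accuracy.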
|
343 |
Characterizing User Search Intent and Behavior for Click Analysis in Sponsored Search. Ashkan, Azin. January 2013
Interpreting user actions to better understand their needs provides an important tool for improving information access services. In the context of organic Web search, considerable effort has been made to model user behavior and infer query intent, with the goal of improving the overall user experience. Much less work has been done in the area of sponsored search, i.e., with respect to the advertisement links (ads) displayed on search result pages by many commercial search engines. This thesis develops and evaluates new models and methods required to interpret user browsing and click behavior and understand query intent in this very different context.
The first part of the thesis is concerned with extending the query categories for commercial search and with inferring query intent, focusing on two major tasks: i) enriching queries with contextual information obtained from the search result pages returned for them, and ii) developing relatively simple methods for reliably labeling training data via crowdsourcing. A central idea of this thesis is to study the impact of contextual factors (including query intent, ad placement, and page structure) on user behavior. This information is then incorporated into probabilistic models that evaluate the quality of advertisement links in the context in which they were displayed over their history of appearances. To account for these factors, a number of query and location biases are proposed and formulated into a group of browsing and click models.
To explore user intent and behavior, and to evaluate the performance of the proposed models and methods, we use query and click logs provided for research purposes. Overall, query intent is found to have a substantial impact on predictions of user click behavior in sponsored search. Predictions are further improved by considering ads in the context of the other ads displayed on a result page. The parameters of the browsing and click models are learned with an expectation-maximization technique applied to the click signals recorded in the logs. The user's initial motivation to browse the ad list and their browsing persistence are found to be related to query intent and browsing/click behavior. Incorporating these biases, along with the location bias, into user models provides effective contextual signals that improve the performance of existing models.
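To make "learned with an expectation-maximization technique applied to click signals" concrete, the sketch below fits the standard position-based click model (PBM), where a click requires that the ad's rank is examined (probability `gamma`) and that the ad is attractive (probability `alpha`). PBM is a well-known baseline swapped in for illustration; the thesis's own models add query-intent and ad-context biases that are not shown, and `fit_pbm` and the toy log are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Observation = Tuple[str, int, int]   # (ad_id, rank, clicked: 1 or 0)


def fit_pbm(log: List[Observation], iterations: int = 50) -> Tuple[Dict[str, float], Dict[int, float]]:
    """EM for a position-based click model: P(click) = gamma[rank] * alpha[ad]."""
    alpha: Dict[str, float] = defaultdict(lambda: 0.5)   # ad attractiveness
    gamma: Dict[int, float] = defaultdict(lambda: 0.5)   # probability that a rank is examined
    for _ in range(iterations):
        a_num, a_den = defaultdict(float), defaultdict(float)
        g_num, g_den = defaultdict(float), defaultdict(float)
        for ad, rank, clicked in log:
            al, ga = alpha[ad], gamma[rank]
            if clicked:
                # A click implies the ad was both examined and attractive.
                p_attr = p_exam = 1.0
            else:
                # E-step: posterior of each hidden variable given that no click happened.
                denom = 1.0 - al * ga
                p_attr = al * (1.0 - ga) / denom
                p_exam = ga * (1.0 - al) / denom
            a_num[ad] += p_attr
            a_den[ad] += 1.0
            g_num[rank] += p_exam
            g_den[rank] += 1.0
        # M-step: each parameter becomes the average posterior over its observations.
        alpha = defaultdict(lambda: 0.5, {a: a_num[a] / a_den[a] for a in a_den})
        gamma = defaultdict(lambda: 0.5, {r: g_num[r] / g_den[r] for r in g_den})
    return dict(alpha), dict(gamma)


# Synthetic toy log: ad "A" is clicked more often than ad "B" at both ranks.
log = ([("A", 1, 1)] * 8 + [("A", 1, 0)] * 2 + [("B", 1, 1)] * 2 + [("B", 1, 0)] * 8
       + [("A", 2, 1)] * 4 + [("A", 2, 0)] * 6 + [("B", 2, 0)] * 10)
alpha, gamma = fit_pbm(log)
```

Separating examination from attractiveness is what lets such models correct for location bias: an ad shown at a rarely examined rank is not penalized as heavily for missing clicks.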
|
344 |
Implementing Personal Data Protection Policies: An Approach Based on SPARQL Query Rewriting (Mise en oeuvre de politiques de protection de données à caractère personnel : une approche reposant sur la réécriture de requêtes SPARQL). Oulmakhzoune, Said. 29 April 2013
With the constant proliferation of information systems around the globe, the need for decentralized and scalable data-sharing mechanisms has become a major factor of integration in a wide range of applications. The literature on information integration across autonomous entities has tacitly assumed that the data of each party can be revealed and shared with other parties. Much research on managing heterogeneous sources and integrating databases has been proposed, for example based on centralized or distributed mediators that control access to data managed by different parties. However, real-life data-sharing scenarios in many application domains, such as healthcare, e-commerce, and e-government, show that data integration and sharing are often hampered by legitimate and widespread data privacy and security concerns. Protecting individual data may therefore be a prerequisite for organizations to share their data in open environments such as the Internet. The work undertaken in this thesis aims to enforce the security and privacy requirements of software systems that take the form of web services, using query-rewriting principles. The user query (a SPARQL query) is rewritten so that only authorized data are returned with respect to a given confidentiality and privacy-preference policy. The rewriting algorithm is instrumented by an access control model (OrBAC) for confidentiality constraints and a privacy-aware model (PrivOrBAC) for privacy constraints. A secure, privacy-preserving execution model for data services is then defined. The model exploits the services' semantics to allow service providers to enforce their privacy and security policies locally without changing the implementation of their data services, i.e., data services are treated as black boxes. We integrate our model into the Axis 2.0 architecture and evaluate its efficiency in the healthcare application domain.
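The core idea of rewriting a SPARQL query so that only authorized data come back can be sketched on a basic graph pattern: each triple pattern over a protected predicate is moved into an `OPTIONAL` block that also requires a consent triple for the same subject. This is a minimal illustration, not the OrBAC/PrivOrBAC-based algorithm; the `ex:consentsTo` and `ex:researchUse` vocabulary, the policy dictionary format, and `rewrite_query` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def rewrite_query(select_vars: Sequence[str], patterns: List[Triple],
                  policy: Dict[str, Tuple[str, str]], prefixes: Dict[str, str]) -> str:
    """Rewrite a basic graph pattern so protected values are returned only with consent.

    policy maps a protected predicate to a (consent_predicate, purpose) pair that must also
    hold for the same subject; both the vocabulary and this policy format are hypothetical.
    """
    required: List[str] = []
    guarded: List[str] = []
    for s, p, o in patterns:
        if p in policy:
            consent_pred, purpose = policy[p]
            # Protected triple moves into an OPTIONAL block guarded by the consent triple, so the
            # rest of the row is still returned but the protected value stays unbound without consent.
            guarded.append(f"OPTIONAL {{ {s} {p} {o} . {s} {consent_pred} {purpose} . }}")
        else:
            required.append(f"{s} {p} {o} .")
    header = "".join(f"PREFIX {name}: <{iri}>\n" for name, iri in prefixes.items())
    body = "\n  ".join(required + guarded)
    return f"{header}SELECT {' '.join(select_vars)} WHERE {{\n  {body}\n}}"


query = rewrite_query(
    ["?name", "?diagnosis"],
    [("?patient", "ex:name", "?name"), ("?patient", "ex:diagnosis", "?diagnosis")],
    policy={"ex:diagnosis": ("ex:consentsTo", "ex:researchUse")},
    prefixes={"ex": "http://example.org/"},
)
```

Using `OPTIONAL` rather than a plain extra triple keeps the non-sensitive parts of each answer; a full implementation would also need to handle protected variables that appear in other required patterns, where moving the pattern changes the query's meaning.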
|
345 |
CDAR: contour detection aggregation and routing in sensor networks. Pulimi, Venkat. 05 May 2010
Wireless sensor networks offer the advantages of low cost, flexible measurement of phenomena in a wide variety of applications, and easy deployment. Since sensor nodes are typically battery powered, energy efficiency is an important objective in designing sensor network algorithms. These algorithms are often application-specific, owing to the need to carefully optimize energy usage and because deployments usually support only one or a few applications.
This thesis concerns applications in which the sensors monitor a continuous scalar field, such as temperature, and addresses the problem of determining the location of a contour line in this scalar field in response to a query, and communicating this information to a designated sink node. An energy-efficient solution to this problem is proposed and evaluated. This solution includes new contour detection and query propagation algorithms, in-network processing algorithms, and routing algorithms. Only a small fraction of network nodes may be adjacent to the desired contour line, and the contour detection and query propagation algorithms attempt to minimize processing and communication by the other network nodes. The in-network processing algorithms reduce communication volume through suppression, compression, and aggregation techniques. Finally, the routing algorithms attempt to route the contour information to the sink as efficiently as possible while meshing with the other algorithms. Simulation results show that the proposed algorithms significantly reduce data and message volumes compared with baseline models, while maintaining the integrity of the contour representation.
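The local test behind contour detection can be illustrated with a simple rule: a node lies on the contour for level `level` if its own reading is at or above the level and at least one neighbor's reading is below it, so only those nodes need to report. This is a minimal sketch, assuming sensors on a regular grid with 4-neighbors; the thesis's deployment model, detection rule, and query propagation are not reproduced, and `contour_nodes` is a hypothetical name.

```python
from typing import List, Tuple


def contour_nodes(readings: List[List[float]], level: float) -> List[Tuple[int, int]]:
    """Return grid positions on the high side of the contour line for the given level."""
    rows, cols = len(readings), len(readings[0])
    result: List[Tuple[int, int]] = []
    for r in range(rows):
        for c in range(cols):
            if readings[r][c] < level:
                continue                      # only nodes at or above the level can report
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                # A below-level neighbor means the contour passes between the two nodes.
                if 0 <= nr < rows and 0 <= nc < cols and readings[nr][nc] < level:
                    result.append((r, c))
                    break
    return result


# A warm spot in the middle of a cool field: only the centre node borders the contour.
field_readings = [[10, 10, 10], [10, 30, 10], [10, 10, 10]]
hot_boundary = contour_nodes(field_readings, level=20)
```

Each decision needs only one-hop neighbor readings, which is why nodes far from the contour can, in principle, stay silent.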
|
346 |
Best effort query answering for mediators with union views. Papri, Rowshon Jahan. 07 1900
Consider an SQL query that involves joins of several relations, optionally followed by selections and/or projections. It can be represented by a conjunctive datalog query Q without negation or arithmetic subgoals. We consider the problem of answering such a query Q using a mediator M. For each relation R that corresponds to a subgoal in Q, M contains several sources; each source for R provides some of the tuples in R. The capabilities of each source are described in terms of templates. Because of the restrictions imposed by the templates, it might not be possible to obtain all the tuples in the result, Result(Q), using M. We consider best-effort query answering: find as many tuples in Result(Q) as possible. We present an algorithm to determine whether Q can be answered in this way using M. / Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and Computer Science.
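Source templates are often written as binding patterns (adornments) such as `"bf"`, meaning the first argument must be supplied and the second is returned. A standard way to check whether a conjunctive query can be evaluated under such templates is to greedily call any subgoal whose required arguments are constants or already bound, repeating until no progress is made. The sketch below shows that generic feasibility test, with the templates of all sources for a relation pooled together; it is not claimed to be the thesis's algorithm, and the uppercase-variable convention and `executable_order` are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Subgoal = Tuple[str, Tuple[str, ...]]    # (relation, argument terms)


def is_var(term: str) -> bool:
    return term[:1].isupper()            # assumed datalog convention: variables start uppercase


def executable_order(query: List[Subgoal], templates: Dict[str, List[str]]) -> Optional[List[Subgoal]]:
    """Return a callable order of subgoals, or None if some subgoal can never be called.

    templates maps a relation to the adornments of all its sources ("b" = bound, "f" = free).
    """
    bound: set = set()
    remaining = list(query)
    order: List[Subgoal] = []
    progress = True
    while remaining and progress:
        progress = False
        for goal in list(remaining):
            rel, args = goal
            for t in templates.get(rel, []):
                # Callable if every "b" position holds a constant or an already-bound variable.
                if all(a == "f" or not is_var(x) or x in bound for a, x in zip(t, args)):
                    order.append(goal)
                    remaining.remove(goal)
                    bound.update(x for x in args if is_var(x))   # its outputs are now bound
                    progress = True
                    break
    return order if not remaining else None


query = [("review", ("Book", "Rating")), ("book", ("Book", "knuth"))]
plan = executable_order(query, {"book": ["fb"], "review": ["bf"]})
```

The greedy strategy is safe here because binding more variables never makes a subgoal harder to call, so trying subgoals in any order reaches the same final answer.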
|
347 |
Modeling and Querying Graph Data. Yang, Hong. 12 March 2009
Databases are used in many applications, spanning virtually the entire data-processing services industry. The data in many database applications can be most naturally represented as a graph consisting of various types of nodes and edges with several properties. Such graph data can be classified into four categories: social networks describing the relationships between individual people and/or groups of people (e.g., genealogies, coauthorship networks among academics); information networks, in which the structure of the network reflects the structure of the information stored in the nodes (e.g., citation networks among academic papers); geographic networks, which provide geographic information about public transport systems, airline routes, etc.; and biological networks (e.g., biochemical networks, neuronal networks). To analyze such networks and extract the information users are interested in, certain typical queries must be run. Many of these query patterns span multiple categories, such as finding nodes with certain properties in a path or graph, finding the distance between nodes, finding sub-graphs, and enumerating paths. However, classical query languages such as SQL and OQL are ill-suited to the types of queries needed in these applications. Therefore, a data model that can effectively represent graph objects and their properties, and a query language that enables users to answer queries across multiple categories, are needed. In this research, a graph data model and a query language are proposed to address these shortcomings of current database applications. The proposed graph data model is an object-oriented graph data model that aims to represent graph objects and their properties for various applications.
The graph query language enables users to query graph objects and their properties in a graph under specified conditions. The ability to specify the relationships among the entities composing the queried sub-graph makes the language more flexible than other query languages.
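A typical cross-category query from the list above, "find a path between two nodes whose intermediate nodes satisfy a property condition", can be sketched over a small property graph with breadth-first search. The `PropertyGraph` class and its methods are a hypothetical illustration of the query pattern, not the object-oriented data model or query language proposed in the thesis.

```python
from collections import deque
from typing import Any, Callable, Dict, List, Optional, Tuple


class PropertyGraph:
    def __init__(self) -> None:
        self.props: Dict[str, Dict[str, Any]] = {}
        self.adj: Dict[str, List[Tuple[str, str]]] = {}   # node -> [(edge_label, neighbor)]

    def add_node(self, nid: str, **props: Any) -> None:
        self.props[nid] = props
        self.adj.setdefault(nid, [])

    def add_edge(self, src: str, label: str, dst: str) -> None:
        self.adj[src].append((label, dst))                # directed, labeled edge

    def shortest_path(self, src: str, dst: str, edge_label: Optional[str] = None,
                      node_filter: Callable[[Dict[str, Any]], bool] = lambda p: True
                      ) -> Optional[List[str]]:
        """BFS along edges with the given label (any label if None); every node reached
        after src, including dst, must satisfy node_filter."""
        prev: Dict[str, Optional[str]] = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if u == dst:
                path: List[str] = []
                node: Optional[str] = u
                while node is not None:                   # walk predecessors back to src
                    path.append(node)
                    node = prev[node]
                return path[::-1]
            for label, v in self.adj[u]:
                if v in prev or (edge_label is not None and label != edge_label):
                    continue
                if not node_filter(self.props[v]):
                    continue
                prev[v] = u
                queue.append(v)
        return None


# A tiny citation network (an "information network" in the abstract's terms).
g = PropertyGraph()
for nid, year in [("a", 2005), ("b", 1999), ("c", 2010), ("d", 2003)]:
    g.add_node(nid, year=year)
for src, dst in [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")]:
    g.add_edge(src, "cites", dst)
```

Expressing this in SQL would require recursive joins plus property predicates at every step, which illustrates why the abstract calls for a graph-aware model and language.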
|
348 |
Enhanced Web Search Engines with Query-Concept Bipartite Graphs. Chen, Yan. 16 August 2010
With the rapid growth of information on the Web, Web search engines have gained great momentum as a means of exploiting valuable Web resources. Although keyword-based Web search engines provide relevant results in response to users' queries, further improvement is still needed. Three important issues are: (1) ambiguous keywords in queries can be interpreted in different ways, so search results can be diverse; (2) identifying the keywords in long queries is difficult for search engines; and (3) query-specific Web page summaries are desirable for previewing search results. Based on clickthrough data, this thesis proposes a query-concept bipartite graph for representing the relations between queries, and applies these relations to (1) personalized query suggestion, (2) Web search with long queries, and (3) query-specific Web page summarization. Experimental results show that query-concept bipartite graphs improve performance in all three applications.
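A query-concept bipartite graph can be sketched from clickthrough pairs: each click links a query to a concept from the clicked page, with edge weights counting clicks, and two queries are related when their concept vectors overlap. The sketch below assumes concepts have already been extracted from clicked pages and uses cosine similarity to rank suggestions; both choices, and the names `build_graph` and `suggest`, are illustrative assumptions rather than the thesis's exact method.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(clicks: Iterable[Tuple[str, str]]) -> Graph:
    """clicks: (query, concept) pairs, one per click; edge weight = click count."""
    graph: Graph = defaultdict(lambda: defaultdict(float))
    for query, concept in clicks:
        graph[query][concept] += 1.0
    return graph


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def suggest(graph: Graph, query: str, n: int = 3) -> List[str]:
    """Rank other queries by how similar their concept neighborhoods are to this query's."""
    if query not in graph:
        return []
    scores = [(cosine(graph[query], graph[other]), other) for other in graph if other != query]
    scores = [s for s in scores if s[0] > 0]              # drop queries sharing no concept
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [q for _, q in scores[:n]]


clicks = [("jaguar", "animal"), ("jaguar", "car"), ("jaguar speed", "animal"),
          ("jaguar price", "car"), ("jaguar price", "car"),
          ("python snake", "animal"), ("python", "programming")]
graph = build_graph(clicks)
```

Going through concepts rather than shared keywords is what lets an ambiguous query such as "jaguar" be related to queries about either sense, depending on which concepts users actually clicked.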
|
349 |
Data Processing Techniques on Modern Hardware Architectures. Tsirogiannis, Dimitrios. 31 August 2011
The last decade has been characterized by radical changes in the computing landscape. We have witnessed the advent of multi-core processors, flash-based storage systems, and the proliferation of scale-out architectures, such as map-reduce-based systems and massively parallel databases. Although data management systems have embraced modern hardware technologies to some extent, they have not realized their full potential.
The goal of this thesis is two-fold. First, it demonstrates the large potential for performance improvement offered by modern hardware architectures and then proposes how data management systems must change in order to realize this potential. Second, it demonstrates that utilizing modern hardware architectures is important for both performance and energy efficiency. Towards this goal, we propose query processing and indexing techniques for chip multiprocessors, and we analyze the trade-offs of executing complex database queries on modern processor technologies. Subsequently, we propose query processing methods tailored to flash-based storage systems. Finally, we analyze the power consumption of database systems and identify opportunities for improving their energy efficiency.
|