311

Efficient Querying and Analytics of Semantic Web Data / Interrogation et Analyse Efficiente des Données du Web Sémantique

Roatis, Alexandra 22 September 2014 (has links)
The utility and relevance of data lie in the information that can be extracted from it. The high rate of data publication and its increased complexity, for instance the heterogeneous, self-describing Semantic Web data, motivate the interest in efficient techniques for data manipulation. In this thesis we leverage mature relational data management technology for querying Semantic Web data. The first part focuses on query answering over data subject to RDFS constraints, stored in relational data management systems. The implicit information resulting from RDF reasoning is required to correctly answer such queries. We introduce the database fragment of RDF, going beyond the expressive power of previously studied fragments. We devise novel techniques for answering Basic Graph Pattern queries within this fragment, exploring the two established approaches for handling RDF semantics, namely graph saturation and query reformulation. In particular, we consider graph updates within each approach and propose a method for incrementally maintaining the saturation. We experimentally study the performance trade-offs of our techniques, which can be deployed on top of any relational data management engine. The second part of this thesis considers the new requirements for data analytics tools and methods emerging from the development of the Semantic Web. We fully redesign, from the bottom up, core data analytics concepts and tools in the context of RDF data. We propose the first complete formal framework for warehouse-style RDF analytics. Notably, we define analytical schemas tailored to heterogeneous, semantic-rich RDF graphs, and analytical queries which (beyond relational cubes) allow flexible querying of the data and the schema, as well as powerful aggregation and OLAP-style operations. Experiments on a fully implemented platform demonstrate the practical interest of our approach.
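
To make the two evaluation strategies concrete, here is a minimal Python sketch of saturation versus reformulation for a single RDFS rule (subclass reasoning). The vocabulary handling and data are deliberately simplified illustrations, not the thesis's database fragment.

```python
# Sketch of the two classic ways to answer queries under RDFS semantics:
# (1) saturation: materialize implicit triples, then query the saturated graph;
# (2) reformulation: rewrite the query into a union capturing implicit answers.
# Simplified to one rule family (rdfs:subClassOf / rdf:type); hypothetical data.

TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def saturate(triples):
    """Fixpoint: add (x, rdf:type, C) whenever (x, rdf:type, D) and D subClassOf C."""
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        sub = {(s, o) for s, p, o in triples if p == SUBCLASS}
        for d, c in list(sub):                      # transitivity of subClassOf
            for c2 in [o for s, o in sub if s == c]:
                if (d, SUBCLASS, c2) not in triples:
                    triples.add((d, SUBCLASS, c2)); changed = True
        for x, p, d in list(triples):               # type propagation
            if p == TYPE:
                for s, o in sub:
                    if s == d and (x, TYPE, o) not in triples:
                        triples.add((x, TYPE, o)); changed = True
    return triples

def reformulate(cls, triples):
    """Rewrite 'find instances of cls' into a union over all subclasses of cls."""
    subs, changed = {cls}, True
    while changed:
        changed = False
        for s, p, o in triples:
            if p == SUBCLASS and o in subs and s not in subs:
                subs.add(s); changed = True
    return subs  # the query becomes: x rdf:type C, for any C in subs

g = {("Paper1", TYPE, "ConfPaper"), ("ConfPaper", SUBCLASS, "Publication")}
assert ("Paper1", TYPE, "Publication") in saturate(g)
assert "ConfPaper" in reformulate("Publication", g)
```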
312

[en] RXQEE - RELATIONAL-XML QUERY EXECUTION ENGINE / [pt] RXQEE: UMA MÁQUINA DE EXECUÇÃO DE CONSULTAS DE INTEGRAÇÃO DE DADOS RELACIONAIS E XML

AMANDA VIEIRA LOPES 10 February 2005 (has links)
In the traditional approach to evaluating data integration queries, heterogeneous data in data sources are converted into the global data model by wrappers before being delivered to algebraic operators. Consequently, query execution plans (QEPs) are composed exclusively of operations in the global data model. This work proposes a new data integration query evaluation strategy, named Moving Wrappers, in which data conversion is treated as an operation that can be placed in any part of the QEP, based on a query optimization process. This permits the use of algebraic operators in the data source's native data model, so a QEP may include fragments with operations in different data models, converted to the global data model by inter-algebraic operators. Based on this strategy, a query execution engine (QEE), named RXQEE (Relational-XML Query Execution Engine), was developed as an instance of the QEEF (Query Execution Engine Framework), created in a research project of the TecBD laboratory at PUC-Rio. In particular, RXQEE executes integration queries over Relational and XML data, and therefore implements algebraic operators, in the XML and Relational models, and inter-algebraic operators, permitting the execution of integration QEPs.
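
The Moving Wrappers idea can be illustrated with a toy pipeline in which operators evaluate data in the source's native model and an inter-algebraic conversion operator sits mid-plan. The operator names and data below are hypothetical, not RXQEE's actual API.

```python
# Sketch: operators run in each source's native model, and an inter-algebraic
# conversion operator can be placed anywhere in the plan, so a selection can be
# pushed below the conversion point into the XML model.
import xml.etree.ElementTree as ET

def xml_scan(doc, tag):
    """XML-model operator: yields elements, staying in the XML model."""
    for el in ET.fromstring(doc).iter(tag):
        yield el

def xml_select(elems, pred):
    return (el for el in elems if pred(el))

def to_relational(elems, columns):
    """Inter-algebraic operator: converts XML elements into relational tuples."""
    for el in elems:
        yield tuple(el.findtext(c) for c in columns)

def join(left, right, key_l, key_r):
    """Relational-model operator: simple hash join over tuples."""
    index = {}
    for t in right:
        index.setdefault(t[key_r], []).append(t)
    for t in left:
        for m in index.get(t[key_l], []):
            yield t + m

doc = "<db><emp><name>Ana</name><dept>1</dept></emp></db>"
depts = [("1", "Sales")]
plan = join(to_relational(xml_select(xml_scan(doc, "emp"),
                                     lambda e: e.findtext("dept") == "1"),
                          ["name", "dept"]),
            iter(depts), 1, 0)
print(list(plan))  # [('Ana', '1', '1', 'Sales')]
```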
313

Modelos de custo e estatísticas para consultas por similaridade / Cost models and statistics for similarity searching

Bêdo, Marcos Vinícius Naves 10 October 2017 (has links)
Similarity searching is a foundational paradigm for many modern computer applications, such as clustering, classification, and information retrieval. Within this context, the meaning of similarity is related to the distance between objects, which can be formally expressed by the theory of metric spaces. Many studies have focused on the inclusion of similarity searching into Database Management Systems (DBMSs), aiming to (i) enable similarity comparisons to be combined with the identity and order comparisons DBMSs already support and (ii) provide scalability for very large databases. As a step further, this thesis extends the DBMS query optimizer and, in particular, two of its modules: the Data Distribution Space module and the Cost Model module. Although the Data Distribution Space module enables representations of the stored data, such representations are unsuitable for modeling the behavior of comparisons in metric spaces, so the module must be extended to support distance distributions. Likewise, the Cost Model module must be extended to support cost models that depend on estimates over distance distributions. Our study is organized around five contributions. First, a new synopsis for distance distributions, the Compact-Distance Histogram (CDH), is proposed, from which radius and selectivity estimates for similarity queries can be drawn; an experimental comparison showed the gains of CDH estimates over several competitors. Second, a cost model based on the CDH synopsis, called Stockpile, is proposed, whose estimates proved more accurate than those of other models. Omni-Histograms are the third contribution: indexing structures built from histogram partition constraints that enable the optimized execution of queries combining similarity, identity, and order comparisons. The fourth contribution is the RVRM model, which indicates to what extent the estimates of distance-based synopses can be used to optimize similarity queries on high-dimensional datasets, identifying ranges of dimensionality in which such queries can be executed efficiently. Finally, the thesis proposes the integration of the reviewed synopses and cost models into a single system with a high-level syntax that can be coupled to a DBMS query optimizer.
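
The core use of a distance distribution for optimization can be sketched as follows: build a histogram of pairwise distances, then estimate the selectivity of a range query with radius r as the estimated fraction of distances at most r. The plain equi-width histogram below only illustrates the principle; the thesis's CDH is a compacted structure with stronger guarantees.

```python
# Sketch of distance-distribution-based selectivity estimation, in the spirit of
# a distance histogram such as CDH (here a plain equi-width histogram, not the
# compacted structure the thesis proposes).
import random

def distance_histogram(points, dist, n_bins=16):
    """Histogram of pairwise distances over a sample of the dataset."""
    dists = [dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]]
    width = max(dists) / n_bins
    counts = [0] * n_bins
    for d in dists:
        counts[min(int(d / width), n_bins - 1)] += 1
    return counts, width, len(dists)

def selectivity(hist, radius):
    """Estimated fraction of pairs within `radius`: P(dist <= r)."""
    counts, width, total = hist
    full_bins = int(radius / width)
    est = sum(counts[:full_bins])
    if full_bins < len(counts):  # linear interpolation inside the partial bin
        est += counts[full_bins] * (radius / width - full_bins)
    return est / total

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
euclid = lambda a, b: ((a[0]-b[0])**2 + (a[1]-b[1])**2) ** 0.5
h = distance_histogram(pts, euclid)
print(f"estimated selectivity at r=0.2: {selectivity(h, 0.2):.3f}")
```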
314

Ontology-based semantic query processing in database systems

Necib, Chokri Ben 11 January 2008 (has links)
Currently, database management systems rely solely on the exact syntax of queries to retrieve data. As a consequence, query answers often do not meet the user's intention. In this thesis we propose an ontology-based semantic query processing approach for database systems. We use ontologies to transform a user query into another query that may provide a more meaningful answer to the user. For this purpose, we define and specify different mappings that relate concepts of an ontology with those of an underlying database, and we develop a set of algorithms that allow us to find these mappings in a semi-automatic way. Moreover, we propose a set of semantic rules for transforming queries using terms derived from the ontology; the main effect of a rule is to replace or enrich the terms of a query with other terms that represent the same ontological concepts. We classify the rules and demonstrate their usefulness with practical examples. Furthermore, we make use of the theory of term rewriting systems to formalize the transformation process and to study the basic properties for applying these rules. Finally, we implement a prototype system using current technologies and evaluate its capability on a real-world application.
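
A minimal sketch of the rule-based rewriting idea: a term in a selection predicate is replaced by the set of terms the ontology relates to the same or a narrower concept. The toy ontology and SQL fragment are illustrative assumptions, not the dissertation's mapping formalism.

```python
# Sketch of ontology-driven query rewriting: a term in a predicate is enriched
# with terms the ontology maps to the same (or a narrower) concept.

ontology = {
    "synonyms": {"car": {"automobile"}},
    "narrower": {"car": {"sedan", "coupe"}},
}

def expand_term(term, onto):
    """Rewrite rule: replace a term by the terms denoting the same concept."""
    return ({term}
            | onto["synonyms"].get(term, set())
            | onto["narrower"].get(term, set()))

def rewrite_predicate(column, term, onto):
    values = sorted(expand_term(term, onto))
    return f"{column} IN ({', '.join(repr(v) for v in values)})"

# "type = 'car'" becomes a broader but intention-preserving predicate:
print(rewrite_predicate("type", "car", ontology))
# type IN ('automobile', 'car', 'coupe', 'sedan')
```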
315

Architecting Query Compilers for Diverse Workloads

Ruby Y Tahboub (6624119) 10 June 2019 (has links)
To leverage modern hardware platforms to their fullest, more and more database systems embrace compilation of query plans to native code. In the research community, there is an ongoing debate about the best way to architect such query compilers. This is perceived to be a difficult task, requiring techniques fundamentally different from traditional interpreted query execution. In this dissertation, we contribute to this discussion by drawing attention to an old but underappreciated idea known as Futamura projections, which fundamentally link interpreters and compilers. Guided by this idea, we demonstrate that efficient query compilation can actually be very simple, using techniques that are no more difficult than writing a query interpreter in a high-level language. We first develop LB2: a high-level query compiler implemented in this style that is competitive with the best compiled query engines both in sequential and parallel execution on the standard TPC-H benchmark.

Query engines process a variety of data types and structures, including text, spatial data, and graphs. Several spatial and graph engines are implemented as extensions to relational query engines to leverage optimized memory, storage, and evaluation. Still, the performance of these extensions is often stymied by the interpretive nature of the underlying data management, generic data structures, and the need to execute domain-specific external libraries. On that basis, compiling spatial and graph queries to native code is a desirable avenue to mitigate existing limitations and improve performance. To support compiling spatial queries, we extend the LB2 main-memory query compiler with spatial predicates, indexing structures, and spatial operators. To support compiling graph queries, we extend LB2 with graph data structures and operators. The spatial extension matches the performance of hand-written code and outperforms relational query engines and map-reduce extensions. Similarly, the graph extension matches, and sometimes outperforms, low-level graph engines.
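
The Futamura-projection insight can be illustrated in a few lines: specializing a query interpreter to one fixed query removes the interpretive overhead and yields compiled code. LB2 achieves this with generative programming in a high-level language; the string-based Python sketch below (with a hypothetical predicate AST) conveys only the flavor.

```python
# Sketch of the first Futamura projection applied to a query engine: specializing
# an interpreter to a fixed query yields a compiled evaluator.

def interpret(pred, row):
    """Interpreter: walks the predicate AST for every row (per-row overhead)."""
    op, a, b = pred
    if op == "and":
        return interpret(a, row) and interpret(b, row)
    if op == "lt":
        return row[a] < b
    if op == "eq":
        return row[a] == b

def compile_pred(pred):
    """Specializer: folds the AST walk away, emitting straight-line code."""
    def gen(p):
        op, a, b = p
        if op == "and":
            return f"({gen(a)}) and ({gen(b)})"
        if op == "lt":
            return f"row[{a!r}] < {b!r}"
        if op == "eq":
            return f"row[{a!r}] == {b!r}"
    src = f"lambda row: {gen(pred)}"
    return eval(src)  # the residual program: no AST traversal at run time

query = ("and", ("lt", "price", 100), ("eq", "brand", "acme"))
rows = [{"price": 50, "brand": "acme"}, {"price": 150, "brand": "acme"}]
fast = compile_pred(query)
assert [interpret(query, r) for r in rows] == [fast(r) for r in rows]
```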
316

Automaton Meets Algebra: A Hybrid Paradigm for Efficiently Processing XQuery over XML Stream

Su, Hong 30 January 2006 (has links)
XML stream applications bring the challenge of efficiently processing queries on sequentially accessible, token-based data streams. The automaton paradigm is naturally suited to pattern retrieval on tokenized XML streams, but requires patches to implement the filtering and restructuring functionalities common in XML query languages. In contrast, the algebraic paradigm is well established for processing self-contained tuples, but does not traditionally support token inputs. This dissertation proposes a framework called Raindrop that accommodates both paradigms so as to take advantage of each. First, we propose an architecture for Raindrop. Raindrop is an algebraic framework that models queries at different abstraction levels. We represent the token-based automaton computations as an algebraic subplan at the high level while exposing the automaton details at the low level. The algebraic subplan modeling automaton computations can thus be integrated with the algebraic subplan modeling the non-automaton computations. Second, we explore a novel optimization opportunity. Other XML stream processing systems always perform all the pattern retrieval of a query in the automaton. In contrast, Raindrop allows a plan to perform some of the pattern retrieval in the automaton and the rest outside it. This opens up an automaton-in-or-out optimization opportunity. We study this optimization in two types of run-time environments, one with stable data characteristics and one with fluctuating data characteristics, and provide search strategies catering to each. We also describe how to migrate from the currently running plan to a new plan at run-time. Third, we optimize the automaton computations using schema knowledge. A set of criteria is established to decide which schema constraints are useful to a given query, and optimization rules utilizing different types of schema constraints are proposed based on these criteria. We design a rule application algorithm that ensures both completeness (i.e., no optimization is missed) and minimality (i.e., no redundant optimization is introduced). Experiments on both real and synthetic data illustrate that these techniques bring significant performance improvement with little overhead.
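
A minimal sketch of the hybrid paradigm: a stack-based automaton consumes SAX-like tokens to retrieve a path pattern and emits self-contained tuples that ordinary algebraic operators consume downstream. The token encoding and query are illustrative, not Raindrop's plan language.

```python
# Sketch: an automaton retrieves the pattern //book/title from a token stream
# and hands tuples to an algebra operator that is unaware of tokens.

def pattern_automaton(tokens):
    """Stack-based automaton: yields the text of <title> elements under <book>."""
    stack, buf = [], None
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
            if stack[-2:] == ["book", "title"]:
                buf = []
        elif kind == "text" and buf is not None:
            buf.append(value)
        elif kind == "end":
            if stack[-2:] == ["book", "title"]:
                yield ("".join(buf),)   # tuple enters the algebra paradigm
                buf = None
            stack.pop()

def select(tuples, pred):
    """Ordinary algebra operator over self-contained tuples."""
    return (t for t in tuples if pred(t))

stream = [("start", "book"), ("start", "title"), ("text", "XML Streams"),
          ("end", "title"), ("end", "book"),
          ("start", "book"), ("start", "title"), ("text", "Databases"),
          ("end", "title"), ("end", "book")]
print(list(select(pattern_automaton(stream), lambda t: "XML" in t[0])))
# [('XML Streams',)]
```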
317

Metadata-Aware Query Processing over Data Streams

Ding, Luping 22 April 2008 (has links)
Many modern applications need to process queries over potentially infinite data streams to provide answers in real time. This dissertation proposes novel techniques to optimize CPU and memory utilization in stream processing by exploiting metadata on the streaming data or queries. It focuses on four topics: 1) exploiting stream metadata to optimize SPJ query operators via operator configuration, 2) exploiting stream metadata to optimize SPJ query plans via query rewriting, 3) exploiting workload metadata to optimize parameterized queries via indexing, and 4) exploiting event constraints to optimize event stream processing via run-time early termination. The first part of this dissertation proposes algorithms for one of the most common and expensive query operators, namely join, to identify and purge no-longer-needed data from the operator state at runtime based on punctuations; we also study how punctuations can be exploited in combination with commonly used window constraints. Extensive experimental evaluations demonstrate both reductions in memory usage and improvements in execution time due to the proposed strategies. The second part proposes herald-driven runtime query plan optimization techniques. We identify four query optimization techniques and design a lightweight algorithm to efficiently detect optimization opportunities at runtime upon receiving heralds. We propose a novel execution paradigm that supports multiple concurrent logical plans by maintaining one physical plan. An extensive experimental study confirms that our techniques significantly reduce query execution times. The third part deals with the shared execution of parameterized queries instantiated from a query template. We design a lightweight index mechanism that provides multiple access paths to the data to facilitate a wide range of parameterized queries. To withstand workload fluctuations, we propose an index tuning framework that tunes the index configuration in a timely manner. Extensive experimental evaluations demonstrate the effectiveness of the proposed strategies. The last part proposes event query optimization techniques that exploit event constraints, such as exclusiveness or ordering relationships among events, extracted from workflows. Significant performance gains are shown to be achieved by the proposed constraint-aware event processing techniques.
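
The punctuation-based state purging of the first part can be sketched as follows: when a punctuation promises that no further tuples with certain keys will arrive, the join can safely drop the corresponding buffered state. This one-sided toy join is only an illustration under simplified assumptions, not the dissertation's operators.

```python
# Sketch of punctuation-driven state purging in a stream join: a punctuation
# declaring "no more tuples with key <= k will arrive" lets the join release
# the matching state without affecting correctness.

class PunctuatedJoin:
    def __init__(self):
        self.state = {}          # key -> buffered left tuples

    def on_left(self, key, tup):
        self.state.setdefault(key, []).append(tup)

    def on_right(self, key, tup):
        return [l + tup for l in self.state.get(key, [])]

    def on_punctuation(self, max_done_key):
        """Right input promises no future tuples with key <= max_done_key."""
        purged = [k for k in self.state if k <= max_done_key]
        for k in purged:
            del self.state[k]    # memory freed early
        return purged

j = PunctuatedJoin()
j.on_left(1, ("a",)); j.on_left(5, ("b",))
print(j.on_right(1, ("x",)))   # [('a', 'x')]
print(j.on_punctuation(3))     # purges key 1
print(len(j.state))            # 1 (only key 5 remains buffered)
```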
318

Pattern Mining and Sense-Making Support for Enhancing the User Experience

Mukherji, Abhishek 07 December 2018 (has links)
While data mining techniques such as frequent itemset and sequence mining are well established as powerful pattern discovery tools in domains from science and medicine to business, a key shortcoming is the lack of support for interactive exploration of the high numbers of patterns generated with diverse parameter settings, and of the relationships among the mined patterns. To enhance the user experience, real-time query turnaround and improved support for interactive mining are desired. There is also increasing interest in applying data mining to mobile data: patterns mined over mobile data may enable context-aware applications ranging from automating frequently repeated tasks to providing personalized recommendations. Overall, this dissertation addresses three problems that limit the utility of data mining, namely (a) the lack of interactive exploration tools for mined patterns, (b) insufficient support for mining localized patterns, and (c) high computational mining requirements that prohibit mining patterns on smaller compute units such as a smartphone. This dissertation develops interactive frameworks for the guided exploration of mined patterns and their relationships. Contributions include the PARAS preprocessing and indexing framework, which enables analysts to gain key insights into rule relationships in a parameter-space view thanks to a compact storage of rules that allows query-time reconstruction of complete rulesets. Contributions also include the visual rule exploration framework FIRE, which presents an interactive dual view of the parameter space and the rule space that together enable enhanced sense-making of rule relationships. This dissertation also supports the online mining of localized association rules computed on data subsets by selectively deploying alternative execution strategies that leverage a multidimensional itemset-based data partitioning index. Finally, we designed OLAPH, an on-device context-aware service that learns phone usage patterns over mobile context data such as app usage, location, call, and SMS logs to provide device intelligence. Concepts introduced for modeling mobile data as sequences include compressing context logs into intervaled context events, adding generalized time features, and identifying meaningful sequences via filter expressions.
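
The parameter-space indexing idea behind PARAS can be sketched as storing each rule once with its (support, confidence) coordinates and reconstructing the ruleset for any parameter setting at query time instead of re-mining. The index layout below is a simplified illustration, not PARAS's actual storage scheme.

```python
# Sketch of parameter-space rule indexing: rules are stored once with their
# (support, confidence) coordinates; the ruleset for any (min_sup, min_conf)
# point is reconstructed at query time. Illustrative data.
import bisect

class ParameterSpaceIndex:
    def __init__(self, rules):
        # rules: list of (support, confidence, rule_string), kept sorted by support
        self.rules = sorted(rules)

    def query(self, min_sup, min_conf):
        """All rules valid at this point of the (support, confidence) space."""
        lo = bisect.bisect_left(self.rules, (min_sup,))
        return [r for s, c, r in self.rules[lo:] if c >= min_conf]

idx = ParameterSpaceIndex([
    (0.30, 0.90, "bread -> butter"),
    (0.10, 0.95, "milk -> cereal"),
    (0.25, 0.60, "beer -> chips"),
])
print(idx.query(0.20, 0.80))   # ['bread -> butter']
print(idx.query(0.05, 0.80))   # ['milk -> cereal', 'bread -> butter']
```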
319

Semantic Query Optimization for Processing XML Streams with Minimized Memory Footprint

Li, Ming 25 August 2007 (has links)
XML streams have become increasingly prevalent in modern applications, ranging from network traffic monitoring to real-time information publishing. XQuery evaluation over XML streams requires the temporary buffering of XML elements, which not only consumes system buffer and CPU resources but also causes unnecessary output latency. This thesis presents a semantic query optimization solution that minimizes the memory footprint during XQuery evaluation by exploiting XML schema knowledge. In many practical applications, XML streams are generated conforming to pre-defined schema constraints, typically expressed via a DTD or an XML Schema specification. Utilizing such constraints enables us to predict on the fly the non-occurrence of a given pattern within a bounded context. This helps us avoid data buffering and release buffered data earlier, thus achieving a minimized memory footprint. In this work, we focus on one particular class of constraints, namely Pattern Non-Occurrence (PNO) constraints. We develop an automaton-based technique to detect PNO constraints at runtime. For a given query, optimization opportunities that can be triggered by runtime PNO detection are explored for memory footprint minimization. Optimization decisions are encoded using our proposed Condition-Action Graph (CAG). An optimization-embedded execution strategy then executes the optimized plan by detecting PNO constraints at run-time and triggering the corresponding encoded actions when certain predefined conditions are satisfied. To ensure the efficiency of such PNO-triggered optimization, we propose an optimization strategy that shrinks the CAGs by utilizing constraint knowledge during the query plan compilation phase. We implement our optimization technique within the Raindrop XQuery engine. Our system implementation processes XQuery using the Raindrop algebra, efficiently augmented by our optimization module, which uses the Glushkov automaton technique to capture and monitor PNO constraints in parallel with the query-driven pattern retrieval. Finally, we conduct experimental studies using both real and synthetic data streams to illustrate that our techniques bring significant performance improvements in both memory and CPU usage, as well as improved output latency, over state-of-the-art solutions.
320

Privacy-preserving queries on encrypted databases

Meng, Xianrui 07 December 2016 (has links)
In today's Internet, with the advent of cloud computing, there is a natural desire for enterprises, organizations, and end users to outsource increasingly large amounts of data to a cloud provider. Ensuring security and privacy is therefore becoming a significant challenge for cloud computing, especially for users with sensitive and valuable data. Recently, many efficient and scalable methods for query processing over encrypted data have been proposed. Nevertheless, numerous challenges remain to be addressed due to the high complexity of many important queries on encrypted large-scale datasets. This thesis studies the problem of privacy-preserving database query processing on structured data (e.g., relational and graph databases). In particular, it proposes several practical and provably secure structured encryption schemes that allow the data owner to encrypt data without losing the ability to query and retrieve it efficiently for authorized clients. The thesis comprises two parts. The first part investigates graph encryption schemes. We propose a graph encryption scheme for approximate shortest-distance queries, which allows the client to query the shortest distance between two nodes in an encrypted graph securely and efficiently, and we explore how the techniques can be applied to other graph queries. The second part proposes secure top-k query processing schemes on encrypted relational databases, and further develops a scheme for top-k join queries over multiple encrypted relations. Finally, the thesis demonstrates the practicality of the proposed encryption schemes by prototyping the encryption systems to perform queries on real-world encrypted datasets.
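
The structure of a graph encryption scheme for approximate shortest-distance queries can be sketched with a landmark-based distance oracle stored under PRF-randomized labels. Real constructions also encrypt the distance values themselves and come with formal security proofs; this stdlib-only sketch shows the data flow only.

```python
# Sketch: the owner precomputes distances to a few landmarks, stores them under
# PRF-randomized labels, and the client approximates d(u, v) without revealing
# the plaintext graph to the server. Simplified; not a provably secure scheme.
import hashlib, hmac

KEY = b"client-secret-key"

def prf(label):
    return hmac.new(KEY, label.encode(), hashlib.sha256).hexdigest()

def encrypt_oracle(dist_to_landmarks):
    """dist_to_landmarks: {node: {landmark: distance}} -> server-side index."""
    return {prf(f"{node}|{lm}"): d
            for node, lms in dist_to_landmarks.items()
            for lm, d in lms.items()}

def query_distance(server_index, u, v, landmarks):
    """Approximate d(u, v) as min over landmarks of d(u, L) + d(L, v)."""
    best = None
    for lm in landmarks:
        du = server_index.get(prf(f"{u}|{lm}"))
        dv = server_index.get(prf(f"{v}|{lm}"))
        if du is not None and dv is not None:
            best = du + dv if best is None else min(best, du + dv)
    return best

oracle = {"a": {"L1": 2}, "b": {"L1": 3}}       # toy precomputed distances
index = encrypt_oracle(oracle)
print(query_distance(index, "a", "b", ["L1"]))  # 5 (an upper bound on d(a, b))
```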
