Global ETD Search

41	HyQoZ - Optimisation de requêtes hybrides basée sur des contrats SLA / HyQoZ – SLA-aware hybrid query optimization Lopez-Enriquez, Carlos-Manuel 23 October 2014 (has links) Today we are witnessing an explosion in the amount of widely distributed data produced by different devices (e.g. sensors, computing devices, networks, analysis processes) and exposed through data services. In this context, the task is to evaluate so-called hybrid queries, which combine aspects of classic, mobile and continuous queries over data provided by static or mobile data services in push or pull mode. The objective of my thesis is to propose an approach for optimizing such hybrid queries based on multi-criteria preferences (i.e. SLA – Service Level Agreement). The principle is to combine data and computation services to construct a query evaluator adapted to the SLA required by the user, while taking into account the QoS conditions of the services and the network. Databases Data services Query optimization Service coordination SLA 004
42	Power-Performance Tradeoffs in Database Systems Xu, Zichen 02 July 2009 (has links) With the total energy consumption of computing systems increasing at a steep rate, much attention has been paid to the design of energy-efficient computing systems and applications. So far, database system design has focused on improving the performance of query processing. The objective of this study is to explore the potential for energy conservation in relational database management systems. The hypothesis is that, by modifying the query optimizer in a database management system (DBMS) to take the energy cost of query plans into consideration, we will be able to reduce the energy usage of database servers and control the tradeoffs between energy consumption and system performance. In this thesis, we provide an in-depth anatomy of typical queries in various benchmarks and qualitatively analyze the energy profile of such queries. The results of extensive experiments show that power savings in the range of 11% to 22% can be achieved by equipping the DBMS with a simple query optimizer that selects query plans based on both estimated processing time and energy requirements. We advocate that more research effort be invested in the design and evaluation of power-aware DBMSs, in the hope of reaching higher levels of energy efficiency. Database Management System Power Modeling Power Estimation Energy Concern Query Optimization Feedback Control American Studies Arts and Humanities
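The idea in record 42 of choosing a plan from both estimated time and estimated energy can be illustrated with a small sketch. This is not the optimizer from the thesis: the `Plan` class, the `pick_plan` function, the `energy_weight` parameter, the normalization, and all numbers are hypothetical and chosen only to show the tradeoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    est_time_s: float    # estimated processing time in seconds (made-up value)
    est_energy_j: float  # estimated energy in joules (made-up value)


def pick_plan(plans, energy_weight):
    """Return the plan with the lowest weighted, normalized cost.

    energy_weight = 0.0 considers time only; 1.0 considers energy only.
    """
    if not 0.0 <= energy_weight <= 1.0:
        raise ValueError("energy_weight must be between 0 and 1")
    # Normalize both estimates so that seconds and joules are comparable.
    max_time = max(p.est_time_s for p in plans)
    max_energy = max(p.est_energy_j for p in plans)

    def cost(plan):
        return ((1.0 - energy_weight) * plan.est_time_s / max_time
                + energy_weight * plan.est_energy_j / max_energy)

    return min(plans, key=cost)


# Hypothetical candidate plans for one query.
PLANS = [
    Plan("hash_join", est_time_s=10.0, est_energy_j=900.0),
    Plan("index_nested_loop", est_time_s=14.0, est_energy_j=600.0),
    Plan("sort_merge", est_time_s=12.0, est_energy_j=800.0),
]

if __name__ == "__main__":
    for weight in (0.0, 0.5, 1.0):
        print(weight, pick_plan(PLANS, weight).name)
```

Varying the single weight is one simple way to expose the time versus energy tradeoff; the abstract does not say how the thesis combines the two estimates.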
43	Optimization and Execution of Complex Scientific Queries Fomkin, Ruslan January 2009 (has links) Large volumes of data produced and shared within scientific communities are analyzed by many researchers to investigate different scientific theories. Currently the analyses are implemented in traditional programming languages such as C++. This is inefficient for research productivity, since it is difficult to write, understand, and modify such programs. Furthermore, programs should scale over large data volumes and analysis complexity, which further complicates code development. This Thesis investigates the use of database technologies to implement scientific applications, in which data are complex objects describing measurements of independent events and the analyses are selections of events by applying conjunctions of complex numerical filters on each object separately. An example of such an application is analyses for the presence of Higgs bosons in collision events produced by the ATLAS experiment. For efficient implementation of such an ATLAS application, a new data stream management system SQISLE is developed. In SQISLE queries are specified over complex objects which are efficiently streamed from sources through the query engine. This streaming approach is compared with the conventional approach to load events into a database before querying. Since the queries implementing scientific analyses are large and complex, novel techniques are developed for efficient query processing. To obtain efficient plans for such queries SQISLE implements runtime query optimization strategies, which during query execution collect runtime statistics for a query, reoptimize the query using the collected statistics, and dynamically switch optimization strategies. The cost-based optimization utilizes a novel cost model for aggregate functions over nested subqueries. 
To alleviate estimation errors in large queries the fragments are decomposed into conjunctions of subqueries over which runtime statistics are measured. Performance is further improved by query transformation, view materialization, and partial evaluation. ATLAS queries in SQISLE using these query processing techniques perform close to or better than hard-coded C++ implementations of the same analyses. Scientific data are often stored in Grids, which manage both storage and computational resources. This Thesis includes a framework POQSEC that utilizes Grid resources to scale scientific queries over large data volumes by parallelizing the queries and shipping the data management system itself, e.g. SQISLE, to Grid computational nodes for the parallel query execution. scientific databases query processing data streams cost-based query optimization query rewritings databases and Grids Computer science Datavetenskap
44	Answering Object Queries over Knowledge Bases with Expressive Underlying Description Logics Wu, Jiewen January 2013 (has links) Many information sources can be viewed as collections of objects and descriptions about objects. The relationship between objects is often characterized by a set of constraints that semantically encode background knowledge of some domain. The most straightforward and fundamental way to access information in these repositories is to search for objects that satisfy certain selection criteria. This work considers a description logics (DL) based representation of such information sources and object queries, which allows for automated reasoning over the constraints accompanying objects. Formally, a knowledge base K=(T, A) captures constraints in the terminology (a TBox) T, and objects with their descriptions in the assertions (an ABox) A, using some DL dialect L. In such a setting, object descriptions are L-concepts and object identifiers correspond to individual names occurring in K. Correspondingly, object queries are the well known problem of instance retrieval in the underlying DL knowledge base K, which returns the identifiers of qualifying objects. This work generalizes instance retrieval over knowledge bases to provide users with answers in which both identifiers and descriptions of qualifying objects are given. The proposed query paradigm, called assertion retrieval, is favoured over instance retrieval since it provides more informative answers to users. A more compelling reason is related to performance: assertion retrieval enables a transfer of basic relational database techniques, such as caching and query rewriting, in the context of an assertion retrieval algebra. The main contributions of this work are two-fold: one concerns optimizing the fundamental reasoning task that underlies assertion retrieval, namely, instance checking, and the other establishes a query compilation framework based on the assertion retrieval algebra. 
The former is necessary because an assertion retrieval query can entail a large volume of instance checking requests of the form K ⊨ a:C, where "a" is an individual name and "C" is an L-concept. This work thus proposes a novel absorption technique, ABox absorption, to improve instance checking. ABox absorption handles knowledge bases that have an expressive underlying dialect L, for instance, one that requires disjunctive knowledge. It works particularly well when knowledge bases contain a large number of concrete domain concepts for object descriptions. This work further presents a query compilation framework based on the assertion retrieval algebra to make assertion retrieval more practical. In the framework, a suite of rewriting rules is provided to generate a variety of query plans, with a focus on plans that avoid reasoning w.r.t. the background knowledge bases when sufficient cached results of earlier requests exist. ABox absorption and the query compilation framework have been implemented in a prototypical system, dubbed CARE Assertion Retrieval Engine (CARE). CARE also defines a simple yet effective cost model to search for the best plan generated by query rewriting. Empirical studies of CARE have shown that the proposed techniques in this work make assertion retrieval a practical application over a variety of domains. object queries assertion retrieval knowledge representation description logics query optimization absorption ontology based data access Computer Science
45	Scalable view-based techniques for web data : algorithms and systems Katsifodimos, Asterios 03 July 2013 (has links) (PDF) XML was recommended by W3C in 1998 as a markup language to be used by device- and system-independent methods of representing information. XML is nowadays used as a data model for storing and querying large volumes of data in database systems. In spite of significant research and systems development, processing very large amounts of XML data still raises many performance problems. Materialized views have long been used in databases to speed up queries. Materialized views can be seen as precomputed query results that can be re-used to evaluate (part of) another query, and have been a topic of intensive research, in particular in the context of relational data warehousing. This thesis investigates the applicability of materialized view techniques to optimize the performance of Web data management tools, in particular in distributed settings, considering XML data and queries. We make three contributions. We first consider the problem of choosing the best views to materialize within a given space budget in order to improve the performance of a query workload. Our work is the first to address the view selection problem for a rich subset of XQuery. The challenges we face stem from the expressive power and features of both the query and view languages and from the size of the search space of candidate views to materialize. While the general problem has prohibitive complexity, we propose and study a heuristic algorithm and demonstrate its superior performance compared to the state of the art. Second, we consider the management of large XML corpora in peer-to-peer networks, based on distributed hash tables (or DHTs, in short). 
We consider a platform leveraging distributed materialized XML views, defined by arbitrary XML queries, filled in with data published anywhere in the network, and exploited to efficiently answer queries issued by any network peer. This thesis has contributed important scalability-oriented optimizations, as well as a comprehensive set of experiments deployed in a country-wide WAN. These experiments exceed those of comparable competitor systems by orders of magnitude in terms of data volumes and data dissemination throughput. Thus, they are the most advanced in understanding the performance behavior of DHT-based XML content management in real settings. Finally, we present a novel approach for scalable content-based publish/subscribe (pub/sub, in short) in the presence of constraints on the available computational resources of data publishers. We achieve scalability by off-loading subscriptions from the publisher, and leveraging view-based query rewriting to feed these subscriptions from the data accumulated in others. Our main contribution is a novel algorithm for organizing subscriptions in a multi-level dissemination network in order to serve large numbers of subscriptions, respect capacity constraints, and minimize latency. The efficiency and effectiveness of our algorithm are confirmed through extensive experiments and a large deployment in a WAN. [INFO:INFO_OH] Computer Science/Other XML Web data Materialized views Query optimization View selection Publish/subscribe Data management
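The view selection problem described in record 45 (pick views to materialize within a space budget) can be pictured with a generic greedy heuristic. This is not the algorithm proposed in the thesis, which targets a rich subset of XQuery; the `select_views` function, the candidate names, sizes, and benefit figures are all hypothetical.

```python
def select_views(candidates, space_budget):
    """Greedy sketch: pick views by benefit per unit of space.

    candidates: list of (name, size, benefit) tuples with size > 0.
    Returns the chosen view names and the total space they use.
    """
    chosen, used = [], 0
    # Consider the most "profitable" views first (benefit divided by size).
    ranked = sorted(candidates, key=lambda c: c[2] / c[1], reverse=True)
    for name, size, _benefit in ranked:
        if used + size <= space_budget:
            chosen.append(name)
            used += size
    return chosen, used


# Hypothetical candidate views: (name, size in MB, estimated workload benefit).
CANDIDATES = [
    ("v_authors", 40, 120.0),
    ("v_titles", 30, 60.0),
    ("v_citations", 50, 90.0),
    ("v_abstracts", 80, 90.0),
]

if __name__ == "__main__":
    print(select_views(CANDIDATES, space_budget=100))
```

A greedy ratio rule like this gives no optimality guarantee and ignores interactions between views, which is part of why the general problem is described as having prohibitive complexity.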
47	Gestion de flux de données pour l'observation de systèmes / Data stream management for systems monitoring Petit, Loïc 10 December 2012 (has links) Due to the popularization of technology, non-expert users can now use increasingly advanced devices and applications. Such systems produce data streams as well as persistent data with heterogeneous schemas and dynamics. This thesis focuses on monitoring the data coming from those systems in order to help users understand and diagnose them. We first propose an algebraic model, Astral, that can process data coming from streams or relations without semantic ambiguity. The execution engine Astronef has been developed on top of a service-oriented component framework to allow a high degree of adaptability. 
It embeds a query builder which can select a composition of components to provide an efficient query plan. Its extension Asteroid interfaces with a DBMS in order to manage persistent data in an integrated manner. Our contributions have been put into practice through the deployment of a monitoring system for the digital home and through a performance study. Finally, we extend our approach with an operator that personalizes the results by introducing a top-k preference model. Data stream Monitoring Algebra Query optimization Query equivalence Databases
48	Declarative parallel query processing on large scale astronomical databases / Traitement parallèle et déclaratif de requêtes sur des masses de données issues d'observations astronomiques Mesmoudi, Amin 03 December 2015 (has links) This work was carried out in the framework of the PetaSky project. 
The objective of this project is to provide a set of tools for managing tens of petabytes of data from astronomical observations. Our work is mainly concerned with the design of new systems that guarantee scalability. The contributions of this thesis concern three aspects: benchmarking of existing systems, design of a new system, and optimization of that system. We started by analyzing the ability of MapReduce-based systems that support SQL to manage the LSST data, and their capacity to optimize certain types of queries. We analyzed the impact of data partitioning, indexing and compression on query performance. Our experiments show that there is no “magic” technique to partition, store and index data; rather, the efficiency of dedicated techniques depends mainly on the type of queries and the typology of the data considered. Based on our benchmarking work, we identified some techniques that should be integrated into large-scale data management systems. We designed a new system that supports multiple partitioning mechanisms and several evaluation operators. We used the BSP (Bulk Synchronous Parallel) model as the parallel computation paradigm. Unlike the MapReduce model, we send intermediate results to workers that can continue their processing. Data is logically represented as a graph. Queries are evaluated by exploring the data graph using forward and backward edges. We also offer a semi-automatic partitioning approach, i.e., we provide the system administrator with a set of tools for choosing how to partition the data using the schema of the database and domain knowledge. The first experiments show that our approach provides a significant performance improvement over Map/Reduce systems. Benchmark Scalability Parallel query evaluation Query optimization Graphs Astronomy 004.21
49	Principles for Distributed Databases in Telecom Environment / Principer för distribuerade databaser inom Telecom Miljö Ashraf, Imran, Khokhar, Amir Shahzed January 2010 (has links) Centralized databases are becoming a bottleneck for organizations that are physically distributed and access data remotely. Data management is easy in centralized databases, but it carries a high communication cost and, most importantly, a high response time. The concept of distributing the data over various locations is very attractive for such organizations. In such cases the database is split into fragments that are distributed to the locations where they are needed. This kind of distribution provides local control of data, and data access is also very fast in such databases. However, concurrency control, query optimization and data allocation are factors that affect the response time and must be investigated prior to implementing distributed databases. This thesis uses a mixed-methods approach to meet its objective. In the quantitative part, we performed an experiment to compare the response time of two databases: centralized and fragmented/distributed. The experiment was performed at Ericsson. A literature review was also carried out to identify other important issues related to response time, such as query optimization, concurrency control and data allocation. The literature review revealed that these factors can further improve the response time in a distributed environment. The results of the experiment showed a substantial decrease in the response time due to fragmentation and distribution. distributed databases centralized database fragmentation data allocation query processing query optimization concurrency control Computer Sciences Datavetenskap (datalogi)
50	Graph Models For Query Focused Text Summarization And Assessment Of Machine Translation Using Stopwords Rama, B 06 1900 (has links) (PDF) Text summarization is the task of generating a shortened version of the original text in which the core ideas of the original text are retained. In this work, we focus on query focused summarization. The task is to generate, from a set of documents, a summary which answers the query. Query focused summarization is a hard task because it expects the summary to be biased towards the query while, at the same time, important concepts in the original documents must be preserved with a high degree of novelty. Graph based ranking algorithms which use a biased random surfer model, like Topic-sensitive LexRank, have been applied to query focused summarization. In our work, we propose a look-ahead version of Topic-sensitive LexRank. We incorporate the option of look-ahead in the random walk model and we show that it helps in generating better quality summaries. Next, we consider assessment of machine translation. Assessment of a machine translation output is important for establishing benchmarks for translation quality. An obvious way to assess the quality of machine translation is through the perception of human subjects. Though highly reliable, this approach is not scalable and is time consuming. Hence mechanisms have been devised to automate the assessment process. All such assessment methods are essentially a study of correlations between human translation and the machine translation. In this work, we present a scalable approach to assess the quality of machine translation that borrows features from the study of writing styles, popularly known as Stylometry. To this end, we quantify the characteristic styles of individual machine translators and compare them with that of human generated text. The translator whose style is closest to human style is deemed to generate a higher quality translation. 
We show that our approach is scalable and does not require actual source text translations for evaluation. Natural Language Processing Abstracting Query Optimization Machine Translation Text Summarization Query Focused Summarization Machine Translators Computer Science
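The biased random surfer model mentioned in record 50 can be sketched as a power iteration in which each sentence score mixes query relevance with similarity-weighted scores of other sentences. This follows the commonly stated form of Topic-sensitive LexRank and does not include the look-ahead extension proposed in the thesis; the function name `biased_lexrank`, the `bias` value, and the similarity and relevance numbers are assumptions for illustration.

```python
def biased_lexrank(sim, relevance, bias=0.7, iterations=100):
    """Score sentences with a query-biased random walk (power iteration).

    sim[i][j]: similarity between sentences i and j (positive row sums).
    relevance[i]: relevance of sentence i to the query.
    bias: probability of jumping to a sentence according to query relevance.
    """
    n = len(sim)
    total_relevance = sum(relevance)
    prior = [r / total_relevance for r in relevance]
    # Total similarity leaving each sentence, used to normalize transitions.
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            bias * prior[s]
            + (1.0 - bias) * sum(
                sim[v][s] / out_weight[v] * scores[v] for v in range(n)
            )
            for s in range(n)
        ]
    return scores


# Hypothetical 3-sentence example: sentence 2 is the most query-relevant.
SIM = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
RELEVANCE = [0.1, 0.2, 0.9]

if __name__ == "__main__":
    for index, score in enumerate(biased_lexrank(SIM, RELEVANCE)):
        print(index, round(score, 3))
```

The scores stay a probability distribution at every step, so sentences can be ranked directly by score when assembling a summary.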
