About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. It is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
271

Feeding a data warehouse with data coming from web services. A mediation approach for the DaWeS prototype / Alimenter un entrepôt de données par des données issues de services web. Une approche médiation pour le prototype DaWeS

Samuel, John 06 October 2014 (has links)
The importance of the data warehouse for business analytics cannot be overstated for any enterprise, whatever its size. But the growing reliance on web services has led to a situation where enterprise data is managed by multiple autonomous and heterogeneous service providers. We present our approach and its associated prototype DaWeS [Samuel, 2014; Samuel and Rey, 2014; Samuel et al., 2014], a DAta warehouse fed with data coming from WEb Services: it extracts, transforms, and stores enterprise data from web services and builds performance indicators from the stored data, hiding from end users the heterogeneity of the numerous underlying web services. Its ETL process is grounded in the mediation approach commonly used in data integration; to this end, a classical query-rewriting algorithm (the inverse-rules algorithm) was adapted and tested. This enables DaWeS (i) to be fully configurable in a purely declarative manner (XML, XSLT, SQL, datalog) and (ii) to make part of the warehouse schema dynamic so that it can be easily updated. Together, (i) and (ii) allow DaWeS managers to shift from development to administration when they want to connect new web services or update the APIs (application programming interfaces) of already connected ones. The aim is to make DaWeS scalable and adaptable, so that it smoothly faces the ever-changing and growing offer of web services. This also enables DaWeS to be used with the vast majority of current web service interfaces, which are defined with basic technologies only (HTTP, REST, XML, and JSON) rather than with more advanced standards (WSDL, WADL, hRESTS, or SAWSDL), since those standards are not yet widely used to describe real web services. In terms of applications, the aim is to let a DaWeS administrator offer small and medium companies a service for storing and querying the business data produced by their use of third-party services, without these companies having to manage their own warehouse. In particular, DaWeS enables the easy design (as SQL queries) of personalized performance indicators. We present this mediation approach for ETL and the architecture of DaWeS in detail.
Besides its industrial purpose, building DaWeS raised further scientific challenges, such as optimizing the number of web service API operation calls and handling incomplete information. A theoretical study of the semantics of conjunctive and datalog queries over relations with access limitations (which model web services) yields an upper bound on the number of web service calls required to evaluate such queries; this bound serves as a yardstick against which future optimization techniques can be compared. We also present a heuristic for handling incomplete information. Experiments were run against real web services in three domains: online marketing, project management, and customer support; a first series of randomized tests assessed scalability.
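To make the mediation-as-ETL idea concrete, here is a minimal sketch of evaluating a conjunctive query over access-limited web services while counting API calls — the quantity the thesis bounds. All service names, schemas, and data below are invented for illustration; DaWeS's actual configuration is declarative (XML, XSLT, SQL, datalog), not Python.

```python
# Minimal sketch of mediation over access-limited web services. Each
# "service" is a relation whose first argument must be bound before it
# can be called (an access limitation). Data is hypothetical.
CAMPAIGNS_BY_COMPANY = {  # company -> [(campaign, clicks)]
    "acme": [("spring_sale", 120), ("launch", 45)],
}
TICKETS_BY_CAMPAIGN = {  # campaign -> [ticket_count]
    "spring_sale": [3], "launch": [11],
}

calls = 0  # counting calls lets us compare against an upper bound

def call(service, bound_value):
    """Simulate one web-service API call (input must be bound)."""
    global calls
    calls += 1
    return service.get(bound_value, [])

def answer_query(company):
    """Conjunctive query q(camp, clicks, tickets) :-
       campaigns(company, camp, clicks), tickets(camp, tickets).
       Evaluated left-to-right so every access limitation is respected."""
    results = []
    for camp, clicks in call(CAMPAIGNS_BY_COMPANY, company):
        for tickets in call(TICKETS_BY_CAMPAIGN, camp):
            results.append((camp, clicks, tickets))
    return results

print(answer_query("acme"))  # [('spring_sale', 120, 3), ('launch', 45, 11)]
print("service calls:", calls)  # 3 = 1 company lookup + 2 campaign lookups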
272

Usage-driven unified model for user profile and data source profile extraction / Modèle unifié dirigé par l'usage pour l'extraction du profil de l'utilisateur et de la source de données

Limam, Lyes 24 June 2014 (has links)
This thesis addresses a problem of usage analysis in information retrieval systems: we exploit the history of search queries as the raw material from which to extract a profile model. The objective is to characterize the users and the data sources that interact in a system, so as to allow different types of comparison (user-to-user, source-to-source, user-to-source). Our study of existing work on profile models concluded that the large majority of contributions are strongly tied to the applications within which they were proposed. As a result, the proposed profile models are not reusable and suffer from several weaknesses: they do not take the data source into account, they lack semantic mechanisms, and they do not address scalability (in terms of complexity). We therefore propose a generic, usage-based model of user and data source profiles. Its characteristics are the following. First, it is generic, able to represent both a user and a data source. Second, it builds profiles implicitly from histories of search queries. Third, it defines a profile as a set of topics of interest, each topic corresponding to a semantic cluster of keywords identified by a dedicated clustering algorithm. Finally, a profile is represented in the vector space model. The components of the model are organized as a framework, and the complexity of each component is assessed. The framework provides: a method for disambiguating keyword queries; a method for representing search query logs semantically in the form of a taxonomy; a clustering algorithm that identifies topics of interest, as semantic clusters of keywords, quickly and efficiently; and a method for computing user and data source profiles according to the generic model. The framework makes it possible to perform various tasks related to structuring a distributed environment from a usage point of view; as example applications, it is used for discovering user communities and for categorizing data sources. To validate the framework, we conducted a series of experiments on real logs from the AOL search engine, which demonstrate the effectiveness of the disambiguation method on short queries and reveal the relation between quality-based clustering and structure-based clustering.
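As an illustration of the vector-space profile idea, the following sketch builds keyword-frequency profiles from query histories and compares a user profile with a data source profile by cosine similarity. The logs, keyword treatment, and similarity choice are assumptions made for illustration; the thesis's framework adds query disambiguation, a taxonomy, and a dedicated semantic clustering algorithm on top of this.

```python
# Minimal sketch: profiles as keyword-frequency vectors (vector space
# model), compared with cosine similarity. Toy data throughout.
import math
from collections import Counter

def profile(query_log):
    """Build a profile vector by counting keywords over a query log."""
    return Counter(kw for q in query_log for kw in q.lower().split())

def cosine(p, q):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

user = profile(["jaguar speed", "jaguar habitat", "big cat habitat"])
source = profile(["jaguar habitat map", "cat species"])
print(round(cosine(user, source), 3))  # user-to-source comparison: 0.674
```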
273

QTor : Une approche communautaire pour l'évaluation de requêtes / QTor : Using communities to evaluate queries

Dufromentel-Fougerit, Sébastien 09 December 2016 (has links)
This thesis addresses the organization of a query system over data streams under capacity constraints, where the system is powered by its users and organized around the similarities between their queries. Equivalence relations between queries are used to group participants into communities of interest. These communities then serve as an abstraction that splits the overall organization problem into several smaller and simpler subproblems. To remain independent of the query language, the organization relies on a simple, modular API and on the query rewriting using views mechanism, well known in databases, to determine the possible relations between communities; the choice among the candidate rewritings is made with an adjustable cost model. Relations between communities are materialized by an exchange of resources: a participant from one community joins the other(s) to contribute. This removes the capacity limitations at the abstract level of the organization while still taking them fully into account when actually connecting participants. Within each community, a diffusion (spanning) tree lets all participants retrieve the results they need. The QTor approach, built incrementally, yields an efficient reduction of processing and diffusion costs (optimality is reached, in particular, in the case of query containment) for a limited organization cost and a reasonable latency. Experiments showed great adaptability to variations in both the expressed queries and the participants' capacities. The demonstrator can be used both for simulations (automatic or interactive) and for deployment over a real network, through a single implementation that is generic with respect to the query language.
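A minimal sketch of the community abstraction follows: participants with equivalent queries are grouped into communities, and within each community a diffusion tree is built without exceeding any participant's capacity. The canonical-form trick, the queries, and the capacities are invented stand-ins for QTor's rewriting-based reasoning and cost model.

```python
# Minimal sketch: group participants by query equivalence, then build a
# bounded-fan-out diffusion tree inside each community. Toy data only;
# assumes the community's total capacity suffices to attach everyone.
from collections import defaultdict, deque

def canonical(query):
    """Toy canonical form: sorted selection predicates over one stream."""
    stream, preds = query
    return (stream, tuple(sorted(preds)))

def communities(participants):
    groups = defaultdict(list)
    for name, query, capacity in participants:
        groups[canonical(query)].append((name, capacity))
    return groups

def diffusion_tree(members):
    """Attach members breadth-first, never exceeding a parent's capacity."""
    root, edges = members[0], []
    slots = deque([root])
    for child in members[1:]:
        while slots[0][1] == 0:          # this parent exhausted its capacity
            slots.popleft()
        parent = slots[0]
        slots[0] = (parent[0], parent[1] - 1)
        edges.append((parent[0], child[0]))
        slots.append(child)
    return edges

ps = [("a", ("temps", ("t>10",)), 2), ("b", ("temps", ("t>10",)), 1),
      ("c", ("temps", ("t>10",)), 1), ("d", ("temps", ("t>10",)), 0)]
for com, members in communities(ps).items():
    print(com, diffusion_tree(members))  # edges (a,b), (a,c), (b,d)
```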
274

Processamento de consultas analíticas com predicados de similaridade entre imagens em ambientes de data warehousing / Processing of analytical queries with similarity search predicates over images in data warehousing environments

Teixeira, Jefferson William 29 May 2015 (has links)
A data warehousing environment supports the decision-making process. It consolidates data from distributed, autonomous, and heterogeneous information sources into one of its main components, the data warehouse, and provides efficient processing of analytical queries, known as OLAP (on-line analytical processing) queries. A conventional data warehouse stores only alphanumeric data. An image data warehouse, by contrast, also stores intrinsic features of images, allowing the environment to answer analytical queries extended with similarity predicates over images. Such environments therefore demand strategies for the efficient processing of these complex and costly queries. Although the literature contains approaches to bitmap indexes for data warehousing environments and metric access methods for the efficient processing of similarity queries over images, to the best of our knowledge no approach investigates these two issues in the same setting.
This dissertation fills this gap in the literature with the following main contributions: (i) the proposal of ImageDW-index, an optimization mechanism for the efficient processing of analytical queries extended with similarity predicates over images; and (ii) the definition of different query processing strategies for image data warehouses using ImageDW-index. To validate these proposals, we also introduce two secondary contributions: (iii) ImageDW-Gen, a data generator to populate image data warehouses; and (iv) the proposal of four query classes, each imposing different processing costs on the image similarity predicates. Using ImageDW-Gen, performance tests were carried out to investigate the advantages introduced by the proposed strategies, according to the query classes. Compared with the most closely related work in the literature, ImageDW-index improved the processing of IOLAP queries by, on average, 55.57% to 82.16%, considering one of the proposed strategies.
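The following sketch illustrates the general two-step pattern behind such optimizations, with invented data: a bitmap index prunes rows on the conventional predicate, and only the surviving rows pay for the expensive image-similarity test. The actual ImageDW-index strategies are considerably more elaborate.

```python
# Minimal sketch: bitmap pre-filter + similarity range predicate over
# image feature vectors. Rows, bitmaps, vectors, and the radius are
# all hypothetical illustrations.
import math

rows = [  # (city, image feature vector)
    ("sp", (0.1, 0.9)), ("rj", (0.4, 0.4)), ("sp", (0.2, 0.8)),
]
# One bitmap per attribute value, bit i set <=> rows[i] matches.
bitmap = {"sp": 0b101, "rj": 0b010}

def similar(vec, query, radius):
    """Euclidean range predicate over feature vectors."""
    return math.dist(vec, query) <= radius

def olap_with_similarity(city, query_vec, radius):
    bits, out = bitmap[city], []
    for i, (_, vec) in enumerate(rows):
        # cheap bitmap test first, costly similarity test only on survivors
        if bits >> i & 1 and similar(vec, query_vec, radius):
            out.append(i)
    return out

print(olap_with_similarity("sp", (0.0, 1.0), 0.3))  # rows 0 and 2
```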
275

Querying a Web of Linked Data

Hartig, Olaf 28 July 2014 (has links)
During recent years, a set of best practices for publishing and connecting structured data on the World Wide Web (WWW) has emerged. These best practices are referred to as the Linked Data principles, and the resulting form of Web data is called Linked Data. Their increasing adoption has led to the creation of a globally distributed space of Linked Data that connects data from a wide range of providers and covers domains such as government, libraries, life sciences, and media. Approaches that conceive this data space as a huge distributed database and enable the execution of declarative queries over it hold enormous potential: they allow users to benefit from a virtually unbounded set of up-to-date data. Consequently, several research groups have started to study such approaches. However, existing work focuses mainly on the practical challenges that arise in this context; research on the theoretical foundations of such approaches is almost entirely missing. This dissertation closes this gap.
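One family of approaches studied in this space is link-traversal query execution; the sketch below shows its core loop under strong simplifications, with the Web stubbed by a dictionary and RDF reduced to string triples. All URIs and data are invented; a live system would issue HTTP GETs and parse RDF documents.

```python
# Minimal sketch of link-traversal query execution over Linked Data:
# dereference a seed URI, collect triples, and follow URIs seen in the
# retrieved data. The "web" is a dict standing in for HTTP lookups.
WEB = {  # URI -> triples served when that URI is dereferenced
    "ex:alice": [("ex:alice", "knows", "ex:bob")],
    "ex:bob":   [("ex:bob", "knows", "ex:carol"),
                 ("ex:bob", "name", "Bob")],
    "ex:carol": [("ex:carol", "name", "Carol")],
}

def traverse(seed, predicate):
    """Return all (subject, object) pairs for the predicate, discovered
    by traversing data links starting from the seed URI."""
    seen, frontier, matches = set(), [seed], []
    while frontier:
        uri = frontier.pop()
        if uri in seen or uri not in WEB:
            continue
        seen.add(uri)
        for s, p, o in WEB[uri]:          # dereference = one lookup
            if p == predicate:
                matches.append((s, o))
            for term in (s, o):           # follow links found in the data
                if term.startswith("ex:"):
                    frontier.append(term)
    return matches

print(traverse("ex:alice", "name"))  # [('ex:bob', 'Bob'), ('ex:carol', 'Carol')]
```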
276

Querying semistructured data based on schema matching

Bergholz, André 24 January 2000 (has links)
Most of today's data is still stored in files rather than in databases, a trend only reinforced by the growth of the World Wide Web in the 1990s. From this observation the research area of semistructured data has evolved. Semistructured data is typically stored in documents and has an irregular, partial, and implicit structure; HTML or BibTeX files, or genome data kept in ASCII files, are examples. Traditional database management requires design and ensures declarativity; in the field of semistructured data the possibilities for design are limited, so a more flexible approach is needed. This thesis presents a new framework for querying semistructured data. We argue that semistructured data should be represented by a set of partial schemata rather than by one complete, global schema. Because of the irregularities of the data and the many exceptions to any intended schema, a complete schema would quickly become very large and unrepresentative, giving users a distorted picture of the data. Partial schemata, in contrast, can serve as good representations of parts of the data; statistical methods can assess the quality of a partial schema and identify irrelevant parts of the database. A database system based on partial schemata is more flexible and reflects the degree of structure of the data at many levels: its usability and performance grow with the degree of structure and with the duration of its use. Partial schemata can be obtained in two ways. First, a database designer may provide them: modeling a semistructured database completely is all but impossible, but modeling certain parts of it is feasible. Second, partial schemata can be extracted from user queries if the query language is designed appropriately. We suggest splitting the notion of a query into a "What"-part and a "How"-part. Partial schemata represent the "What"-part. They cover semantically richer concepts than database schemata traditionally do, among them predicates, variable definitions, and path descriptions, borrowed from query-language concepts: variable definitions connect different parts of the database, and path descriptions hide the irregularity of the structure by allowing a certain fuzziness. Schemata can be used for query optimization, but they also give users hints on the content of the database. Finding the occurrences (matches) of such a schema forms the most important part of query execution.
All queries of our approach, such as the focus query or the transformation query, are based on this matching; the "How"-part then modifies the matches to form the answer, hiding parts of them or restructuring them completely. We examine the expressiveness of the query language by mapping the operators of the relational algebra onto it and by demonstrating the querying of XML documents. Our approach and its optimization techniques are conceptually modeled and implemented as a prototype on the basis of constraint satisfaction problems (CSPs). CSPs form a general class of search problems for which many techniques and heuristics exist. A CSP consists of variables, each with an associated domain; constraints restrict the values that variables can take simultaneously. We transform the problem of finding the matches of a schema in a database into a CSP: the schema is turned into variables, the domains are formed from the database, and the constraints derive from the schema's predicates, variable definitions, path descriptions, and graph structure. We prove that under certain conditions the matches of a schema can be found without any search and in polynomial time, namely if the schema is a tree, contains no variable definitions, and the injectivity requirement on embeddings is dropped. For optimization purposes, the containment relationship between schemata is explored: we formulate a sufficient condition for schema containment and test it, again using CSP techniques. Containment can be used in two ways, depending on its direction: either the search space for a new schema is reduced, or the first few matches of a new schema are presented immediately without any search. The query approach was prototyped first in a public-domain Prolog system and later in the constraint system ECLiPSe, and was tested with queries against XML documents, measuring the effect of the various optimizations; a graphical user interface is also provided.
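A minimal sketch of the schema-matching-as-CSP reduction follows, assuming a toy labeled graph: schema nodes become CSP variables, database nodes with matching labels form the domains, and schema edges plus injectivity become constraints checked by a small backtracking search (a stand-in for ECLiPSe's constraint solving).

```python
# Minimal sketch of schema matching as constraint satisfaction.
# Labels, nodes, and edges are invented for illustration.
DB_NODES = {1: "person", 2: "name", 3: "person", 4: "name", 5: "city"}
DB_EDGES = {(1, 2), (3, 4), (3, 5)}

# Partial schema: a person node with a name child (a tree pattern).
SCHEMA_NODES = {"x": "person", "y": "name"}
SCHEMA_EDGES = [("x", "y")]

def matches():
    vars_ = list(SCHEMA_NODES)
    # domain of a variable = database nodes carrying the same label
    domains = {v: [n for n, lbl in DB_NODES.items()
                   if lbl == SCHEMA_NODES[v]] for v in vars_}

    def backtrack(i, assignment):
        if i == len(vars_):
            yield dict(assignment)
            return
        v = vars_[i]
        for node in domains[v]:
            if node in assignment.values():   # injective embedding
                continue
            assignment[v] = node
            # check every schema edge whose endpoints are both assigned
            if all((assignment[a], assignment[b]) in DB_EDGES
                   for a, b in SCHEMA_EDGES
                   if a in assignment and b in assignment):
                yield from backtrack(i + 1, assignment)
            del assignment[v]

    return list(backtrack(0, {}))

print(matches())  # [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}]
```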
277

AQuES

Stillger, Michael 21 January 2000 (has links)
Parallel query evaluation for relational database management systems (RDBMS) remains a challenging problem, because of the different kinds of execution parallelism involved and the properties of the underlying parallel architecture. Modern systems must show near-optimal performance despite running in a heterogeneous hardware environment and dealing with unpredictable system load; changes to the system while a query is running may additionally require dynamic behavior from the executing components to guarantee near-optimal response times. This thesis presents a new, flexible approach to the optimization and evaluation of complex relational queries for a distributed and dynamic environment, with particular attention to dynamic optimization. In particular, it presents: (1) the architecture of a new system of distributed, cooperating components, inspired by agent-oriented concepts; (2) the design and implementation of a new communication infrastructure for the identified system components; (3) the design and implementation of a flexible query optimizer based on a new randomized algorithm; and (4) the design and implementation of a parallel query evaluation engine that enables runtime optimization of queries. Beyond the specific requirements of RDBMS, the design emphasizes the configurability and extensibility of the distributed system and its components.
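The randomized optimizer can be illustrated by a generic iterative-improvement loop over join orders, shown below with an invented cost model and cardinalities; the thesis's actual algorithm, cost model, and parallel, distributed setting are of course richer.

```python
# Minimal sketch of randomized query optimization in the spirit of
# iterative improvement: start from a random left-deep join order and
# keep random swaps that lower a toy cost estimate. Data is invented.
import random

CARD = {"orders": 10_000, "customer": 1_000, "nation": 25, "region": 5}
SELECTIVITY = 0.001  # crude uniform join selectivity

def cost(order):
    """Sum of intermediate-result sizes for a left-deep join order."""
    size, total = CARD[order[0]], 0
    for rel in order[1:]:
        size = size * CARD[rel] * SELECTIVITY
        total += size
    return total

def optimize(relations, steps=200, seed=0):
    rng = random.Random(seed)
    best = relations[:]
    rng.shuffle(best)                            # random starting point
    for _ in range(steps):
        cand = best[:]
        i, j = rng.sample(range(len(cand)), 2)
        cand[i], cand[j] = cand[j], cand[i]      # random neighbour move
        if cost(cand) < cost(best):              # keep only improvements
            best = cand
    return best, cost(best)

print(optimize(list(CARD)))
```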
278

針對複合式競賽挑選最佳球員組合的方法 / Selecting the best group of players for a composite competition

鄧雅文, Teng, Ya Wen Unknown Date (has links)
In a large database, the top-k query is an important mechanism for retrieving the most valuable information for users: it ranks data objects with a scoring function and reports the k objects with the highest scores. However, when an object has more than one score, how to rank objects without losing information becomes challenging. In this work, we model an object with multiple scores as an uncertain data object whose uncertainty is a probability distribution over its scores, and we introduce a novel problem named the Best-kGROUP query. Imagine the following scenario: a composite competition consists of several games, each requiring a distinct number of players, the largest number being k, and we want to select the best group of k players from all the players for the competition. A group x is considered better than a group y if x has a higher aggregated probability of ranking first in more games than y, and the best group is one that no other group is better than. To speed up the selection process, groups that are definitely worse than some other group should be discarded first; we identify these groups using a dynamic-programming-based approach and a filtering algorithm. The remaining groups, none of which is beaten in every game by another group, are called skyline groups; from these skyline groups we can easily select the best group for the composite competition. Experiments show that our approach outperforms the alternatives in selecting the best group to defeat the other groups in composite competitions.
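The skyline-group filter at the heart of the approach can be sketched as follows, assuming the per-game "probability of being top" values are already available (here produced by an invented toy aggregate; deriving them from score distributions is where the thesis's dynamic program comes in): a group is discarded if some other group is at least as good in every game and strictly better in at least one.

```python
# Minimal sketch of skyline-group filtering over per-game probabilities.
# Player probabilities and the aggregate are invented for illustration.
from itertools import combinations

PLAYERS = {"p1": (0.9, 0.2), "p2": (0.5, 0.6), "p3": (0.3, 0.8)}

def group_score(group):
    """Toy per-game aggregate: product of member probabilities."""
    scores = [1.0, 1.0]
    for p in group:
        scores = [s * v for s, v in zip(scores, PLAYERS[p])]
    return tuple(scores)

def dominates(a, b):
    """a dominates b: at least as good everywhere, better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and a != b

def skyline_groups(k):
    cands = {g: group_score(g) for g in combinations(sorted(PLAYERS), k)}
    return [g for g, s in cands.items()
            if not any(dominates(t, s) for t in cands.values())]

print(skyline_groups(2))  # only non-dominated groups survive
```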
279

QPPT: Query Processing on Prefix Trees

Kissinger, Thomas, Schlegel, Benjamin, Habich, Dirk, Lehner, Wolfgang 28 May 2013 (has links) (PDF)
Modern database systems have to process huge amounts of data while delivering results with low latency. To achieve this, data is nowadays typically held entirely in main memory, to benefit from a bandwidth and access latency that disks could never reach. Current in-memory databases are usually column stores that exchange columns or vectors between operators and suffer from high tuple-reconstruction overhead. In this paper, we present the indexed table-at-a-time processing model, which makes indexes first-class citizens of the database system. The processing model comprises the concepts of intermediate indexed tables and cooperative operators, which make indexes the common data exchange format between plan operators. To keep the cost of materializing intermediate indexes low, we employ optimized prefix trees that offer balanced read/write performance. The indexed table-at-a-time processing model allows the efficient construction of composed operators such as the multi-way-select-join-group. Such operators speed up the processing of complex OLAP queries, so that our approach outperforms state-of-the-art in-memory databases.
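A minimal sketch of the "indexes as exchange format" idea follows, with an invented byte-wise trie over integer keys: the build side of a join materializes an intermediate indexed table, and the probe side consumes that index directly instead of rescanning tuples. Real QPPT prefix trees are far more heavily optimized.

```python
# Minimal sketch: a fixed-depth, byte-wise prefix tree over 32-bit keys
# used as the data exchange format between two join-plan operators.
def insert(trie, key, payload):
    node = trie
    for shift in (24, 16, 8, 0):             # one trie level per key byte
        node = node.setdefault((key >> shift) & 0xFF, {})
    node.setdefault("rows", []).append(payload)

def lookup(trie, key):
    node = trie
    for shift in (24, 16, 8, 0):
        node = node.get((key >> shift) & 0xFF)
        if node is None:
            return []
    return node["rows"]

# Build side of a join materializes an intermediate indexed table ...
index = {}
for row_id, key in enumerate([42, 7, 42]):
    insert(index, key, row_id)

# ... and the probe side consumes the index directly.
print(lookup(index, 42))  # [0, 2]
print(lookup(index, 99))  # []
```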
280

A Methodology for Domain-Specific Conceptual Data Modeling and Querying

Tian, Hao 02 May 2007 (has links)
Traditional data management technologies originating in the business domain currently face many challenges from other domains, such as scientific research. Data structures in databases are becoming more and more complex, and query functionality is moving from the back-end database level towards the front-end user-interface level. Traditional query languages such as SQL and OQL, and form-based query interfaces, cannot fully meet today's needs. This research is motivated by the data management issues in life science applications. I propose a methodology for domain-specific conceptual data modeling and querying. The methodology can be applied to any domain to capture more domain semantics and to empower end users to formulate queries at the conceptual level, with terminology and functions familiar to them. The query system resulting from the methodology is designed to work on all major types of database management systems (DBMS) and supports end users in dynamically defining and adding new domain-specific functions: all user-defined functions can either be pre-defined by domain experts and/or data model creators when the system is created, or be defined dynamically by end users from the client side at any time. The methodology comprises a domain-specific conceptual data model (DSC-DM) and a domain-specific conceptual query language (DSC-QL). DSC-QL uses only the abstract concepts, relationships, and functions defined in DSC-DM. It is a user-oriented, high-level query language intentionally designed to be flexible, extensible, and readily usable. DSC-QL queries are much simpler than the corresponding SQL or OQL queries thanks to advanced features such as user-defined functions, composite and set attributes, dot-path expressions, and super-classes. DSC-QL can be translated into SQL and OQL through a dynamic mapping function, which is automatically updated when the underlying database schema evolves. The operational and declarative semantics of DSC-QL are formally defined in terms of graphs. A normal form for DSC-QL, serving as a standard format for the mappings from flexible conceptual expressions to restricted SQL or OQL statements, is also defined, together with two translation algorithms from normalized DSC-QL to SQL and OQL. Through comparison, DSC-QL is shown to strike a good balance between simplicity and expressive power, making it suitable for end users. Implementation details of the query system are reported as well. Two prototypes have been built: one for the neuroscience domain, on an object-oriented DBMS, and one for the traditional business domain, on a relational DBMS.
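To illustrate the kind of translation DSC-QL performs, here is a minimal sketch that maps a conceptual dot-path expression onto SQL joins via a declared concept-to-table mapping. The mapping, the query syntax, and the schema are hypothetical stand-ins for the thesis's formally defined language and its normal-form translation algorithms.

```python
# Minimal sketch: translate a conceptual dot-path query into SQL joins
# using a declared mapping. All names below are invented examples.
MAPPING = {  # concept -> (table, primary key)
    "Neuron": ("neuron", "id"),
    "Dendrite": ("dendrite", "id"),
}
RELATIONSHIPS = {  # (concept, role) -> (target concept, foreign key)
    ("Neuron", "dendrites"): ("Dendrite", "neuron_id"),
}

def to_sql(root, path, attr, value):
    """Translate  root.path.attr = value  into a join query."""
    table, pk = MAPPING[root]
    sql = [f"SELECT * FROM {table} t0"]
    current, alias = root, "t0"
    for i, role in enumerate(path, start=1):
        target, fk = RELATIONSHIPS[(current, role)]
        t_table, t_pk = MAPPING[target]
        sql.append(f"JOIN {t_table} t{i} ON t{i}.{fk} = {alias}.{pk}")
        current, alias, pk = target, f"t{i}", t_pk
    sql.append(f"WHERE {alias}.{attr} = '{value}'")
    return "\n".join(sql)

# Conceptual query: Neuron.dendrites.length = '42'
print(to_sql("Neuron", ["dendrites"], "length", "42"))
```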
