71 |
Avaliação do Star Schema Benchmark aplicado a bancos de dados NoSQL distribuídos e orientados a colunas / Evaluation of the Star Schema Benchmark applied to NoSQL column-oriented distributed database systems. Lucas de Carvalho Scabora, 06 May 2016.
Due to the explosive increase in data volume, centralized data warehousing applications have become very costly and face several problems in dealing with data scalability. This is related to the fact that these applications need both to store huge volumes of data and to perform analytical queries (i.e., OLAP queries) against these voluminous data efficiently. One solution is to employ scenarios characterized by the use of NoSQL databases managed in parallel and distributed environments. Among the challenges related to these scenarios, there is a need to investigate the performance of data warehousing applications that store the data warehouse (DW) in column-oriented NoSQL databases. In this context, benchmarks are widely used to perform standardized, experimental analyses of distinct systems. However, most benchmarks for DWs focus on relational database systems and centralized environments. In this master's research, we investigate how to extend the Star Schema Benchmark (SSB), which was proposed for centralized DWs, to the distributed, column-oriented NoSQL database HBase. We introduce proposals and analyses mainly based on experimental performance tests considering each of the four steps of a benchmark, i.e., schema and workload, data generation, parameters and metrics, and validation. The main results of this research are as follows: (i) a proposal of the FactDate schema, which optimizes queries that access few dimensions of the DW; (ii) an investigation of the applicability of different schemas to different business scenarios; (iii) a proposal of two additional queries for the SSB workload; (iv) an analysis of the distribution of the data generated by the SSB, verifying whether the data aggregated by OLAP queries are balanced across the nodes of a cluster; (v) an investigation of the influence of three important Hadoop MapReduce framework parameters on OLAP query processing; (vi) an evaluation of the relationship between OLAP query performance and the number of nodes in a cluster; and (vii) the use of hierarchical materialized views, through the Spark framework, to optimize the performance of consecutive OLAP queries that require progressively more or less aggregated data. These results represent important findings that enable the future proposal of a benchmark for DWs stored in NoSQL databases and managed in parallel and distributed environments.
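A minimal sketch of the idea behind finding (vii), hierarchical materialized views in Spark: a fine-grained aggregate is cached once, and consecutive coarser roll-ups are answered from it instead of rescanning the fact table. The input path and the SSB-style column names (d_year, d_yearmonthnum, lo_revenue) are assumptions for illustration, not the thesis's actual code.

```python
# Illustrative sketch (not the thesis's code): reusing a cached, fine-grained
# aggregate as a hierarchical materialized view for consecutive roll-up queries.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ssb-hierarchical-mv").getOrCreate()

# Assumption: the SSB fact table already joined with its date dimension is
# available as a DataFrame (e.g., exported from an HBase-backed table).
lineorder = spark.read.parquet("/data/ssb/lineorder_with_dates")  # hypothetical path

# Materialize the most detailed level of the hierarchy once (year, month).
monthly = (lineorder
           .groupBy("d_year", "d_yearmonthnum")
           .agg(F.sum("lo_revenue").alias("revenue"))
           .cache())
monthly.count()  # force materialization of the cached view

# A consecutive, coarser roll-up (per year) is answered from the cached view
# instead of rescanning the fact table.
yearly = monthly.groupBy("d_year").agg(F.sum("revenue").alias("revenue"))
yearly.show()
```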
|
72 |
Acesso a dados baseado em ontologias com NoSQL / Ontology-based data access with NoSQL. Barbara Tieko Agena, 27 November 2017.
Ontology-based data access (OBDA) aims to give users access to data without requiring specific knowledge of how the data are stored in their sources. To this end, an ontology is used as a high-level conceptual layer, exploiting its capacity to describe the domain and to deal with incomplete data. Currently, NoSQL (Not Only SQL) systems are becoming popular, offering features that relational database systems do not support. This has created the need to adapt OBDA systems to these new types of databases. The objective of this research is to propose a new architecture for OBDA systems that allows access to data in both relational and NoSQL databases. To this end, we propose the use of a simpler mapping responsible for the communication between the ontology and the databases. Two OBDA system prototypes were built, one for NoSQL systems and one for relational database systems, for an empirical validation of the architecture proposed in this work.
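To make the notion of a "simpler mapping" concrete, here is a hedged sketch of an OBDA-style mapping from ontology terms to a MongoDB collection and its fields, used to answer ontology-level requests. All names (ontology IRIs, collection, fields, connection URI) are hypothetical and this is not the mapping language actually proposed in the thesis.

```python
# Minimal illustrative sketch of an OBDA-style mapping over MongoDB.
from pymongo import MongoClient

# Ontology classes and properties mapped onto a collection and its fields.
MAPPING = {
    "class": {"ex:Student": {"collection": "students"}},
    "property": {"ex:enrolledIn": {"collection": "students", "field": "course"}},
}

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["university"]

def instances_of(cls):
    """Return all documents that instantiate the given ontology class."""
    m = MAPPING["class"][cls]
    return list(db[m["collection"]].find({}, {"_id": 0}))

def values_of(prop, subject_name):
    """Return the values of an ontology property for one named subject."""
    m = MAPPING["property"][prop]
    return [d.get(m["field"])
            for d in db[m["collection"]].find({"name": subject_name})]

print(values_of("ex:enrolledIn", "Alice"))
```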
|
74 |
MongoDB jako datové úložiště pro Google App Engine SDK / MongoDB as a Datastore for Google App Engine SDK. Heller, Stanislav, January 2013.
This thesis discusses use cases of the NoSQL database MongoDB as a datastore for the user data that is stored by the Datastore stubs in the Google App Engine SDK. The existing stubs are not well optimized for higher load; they significantly slow down application development and testing when larger data sets need to be stored. The analysis focuses on the features of MongoDB, the Google App Engine NoSQL Datastore, and the SDK's interface for data manipulation, the Datastore Service Stub API. As a result, a new datastore stub was designed and implemented to solve the problems of the existing stubs. The new stub uses MongoDB as the database layer for storing test data and is fully integrated into the Google App Engine SDK.
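A simplified sketch of the core idea only: persisting datastore entities in MongoDB keyed by kind and key, instead of the in-memory or file-backed storage of the default stubs. This is not the real App Engine apiproxy stub interface, whose method signatures are more involved; class and field names here are assumptions.

```python
# Sketch of a MongoDB-backed entity store; not the actual GAE DatastoreServiceStub API.
from pymongo import MongoClient

class MongoDatastoreStub:
    def __init__(self, app_id, uri="mongodb://localhost:27017"):
        self._col = MongoClient(uri)[app_id]["entities"]
        self._col.create_index([("kind", 1), ("key", 1)], unique=True)

    def put(self, kind, key, properties):
        # Upsert so repeated puts overwrite the stored entity.
        self._col.replace_one({"kind": kind, "key": key},
                              {"kind": kind, "key": key, "props": properties},
                              upsert=True)

    def get(self, kind, key):
        doc = self._col.find_one({"kind": kind, "key": key})
        return doc["props"] if doc else None

stub = MongoDatastoreStub("testapp")
stub.put("Greeting", "g1", {"content": "hello", "author": "tester"})
print(stub.get("Greeting", "g1"))
```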
|
75 |
SQL and NoSQL databases: A case study in the Azure cloud. Miiro, Fabian; Nääs, Mikael, January 2015.
This paper compares Azure SQL Database to the Azure NoSQL solution DocumentDB. The case study is carried out in the scope of a turn-based multiplayer game created by Warmkitten. NoSQL is a relatively new type of database that, unlike the familiar relational kind, promises increased performance and unlimited scalability; it is marketed as being built for today's problems without yesterday's limitations. Warmkitten would like to know whether the promises of NoSQL hold for the game they are developing, or whether the current SQL solution fits their needs. They are most interested in scaling, performance, and .NET interoperability. The comparison was carried out by first creating a test suite for the game's multiplayer functionality. Both the relational and the non-relational solutions were then optimized based on best practices and expert guidelines, and the tests were run under circumstances mimicking real-world scenarios. The paper shows that NoSQL is a valid replacement for SQL, provided that it is a good fit for the problem at hand and that enough time and thought are put into understanding the differences between the database types, in particular how NoSQL scales and what the loss of relational constraints implies. Performance-wise, DocumentDB is faster under normal load with quieter periods, whereas SQL is better for applications with continuously heavy database usage. The final verdict is a recommendation for Warmkitten to continue with the already developed SQL solution, because the recorded performance improvements do not outweigh the increased transactional complexity, given that the game relies heavily on transactions.
|
76 |
Odvozování schématu v NoSQL databázích / Schema Inference for NoSQL Databases. Veinhardt Latták, Ivan, January 2021.
NoSQL databases are becoming increasingly popular due to their undeniable advantages in the context of storing and processing big data, mainly horizontal scalability and the lack of a requirement to define a data schema upfront. In the absence of an explicit schema, however, an implicit schema inherent to the stored data still exists and can be inferred. Once inferred, a schema is of great value to stakeholders and database maintainers. Nevertheless, the problem of schema inference is non-trivial and is still the subject of ongoing research. We explore the many aspects of NoSQL schema inference and data modeling, analyze a number of existing schema inference solutions in terms of their inner workings and capabilities, point out their shortcomings, and devise (1) a novel horizontally scalable approach based on the Apache Spark platform and (2) a new NoSQL Schema metamodel capable of modeling, among other things, inter-entity referential relationships and deeply nested JSON constructs. We then experimentally evaluate the newly designed approach along with the preexisting solutions with respect to their functional and performance capabilities.
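A minimal sketch of the general idea behind Spark-based schema inference: derive a small per-document schema and merge the schemas across the collection in a map-reduce fashion. The sample documents and the merging logic are simplified assumptions; the thesis's metamodel is far richer (references, nesting depth, required vs. optional fields).

```python
# Sketch of horizontally scalable schema inference with Spark (simplified).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-inference").getOrCreate()
sc = spark.sparkContext

docs = [  # stand-in for a JSON collection read from a NoSQL store
    {"name": "Alice", "age": 30, "address": {"city": "Prague"}},
    {"name": "Bob", "tags": ["vip"]},
]

def doc_schema(doc, prefix=""):
    """Map each (possibly nested) field path to its observed type name."""
    schema = {}
    for k, v in doc.items():
        path = f"{prefix}{k}"
        if isinstance(v, dict):
            schema.update(doc_schema(v, path + "."))
        else:
            schema[path] = type(v).__name__
    return schema

def merge(a, b):
    """Union two partial schemas, collecting all types seen per field."""
    out = {k: set(v) if isinstance(v, set) else {v} for k, v in a.items()}
    for k, v in b.items():
        out.setdefault(k, set()).update(v if isinstance(v, set) else {v})
    return out

inferred = sc.parallelize(docs).map(doc_schema).reduce(merge)
print(inferred)  # e.g. {'name': {'str'}, 'age': {'int'}, 'address.city': {'str'}, ...}
```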
|
77 |
A graph database management system for a logistics-related service. Walldén, Marcus; Özkan, Aylin, January 2016.
Higher demands on database systems have led to an increased popularity of certain database system types in some niche areas. One such niche area is graph networks, such as social networks or logistics networks. Analyses of such networks often focus on complex relational patterns that sometimes cannot be solved efficiently by traditional relational databases, which has led to the adoption of specialized non-relational database systems. Among the database systems that have seen a surge in popularity in this area are graph database systems. This thesis presents a prototype of a logistics-network-related service using Neo4j, currently the most widely used graph database management system. The logistics network covered by the service is based on existing data from PostNord, Sweden's biggest provider of logistics solutions, and primarily focuses on customer support and business-to-business use. By creating a prototype of the service, this thesis strives to indicate some of the positive and negative aspects of a graph database system, as well as to give an indication of how a service using a graph database system could be built. The results indicate that Neo4j is very intuitive and easy to use, which would make it well suited for prototyping and smaller systems, but due to the evaluation method used, more research in this area is needed to confirm these conclusions.
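An illustrative sketch of the kind of query such a logistics service might run with the official Neo4j Python driver: finding a route between two terminals with a variable-length path pattern. Node labels, relationship types, place names, and credentials are hypothetical, not PostNord's actual data model.

```python
# Sketch of a logistics-style graph query via the Neo4j Python driver.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

FIND_ROUTE = """
MATCH p = shortestPath(
    (a:Terminal {name: $src})-[:CONNECTED_TO*..10]-(b:Terminal {name: $dst}))
RETURN [n IN nodes(p) | n.name] AS route
"""

with driver.session() as session:
    record = session.run(FIND_ROUTE, src="Stockholm", dst="Malmo").single()
    print(record["route"] if record else "no route found")

driver.close()
```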
|
78 |
Módulo de consultas distribuídas do Infinispan / Module that supports distributed queries in Infinispan. Lacerra, Israel Danilo, 26 November 2012.
With the large amount of data available to computer applications nowadays, there is an increasing need for mechanisms that facilitate the retrieval of such data and improve data access performance. In this context we see the emergence of so-called NOSQL databases, which are typically non-relational databases that give up requirements previously seen as fundamental in order to achieve better availability and performance in big data environments. In this work we deal with this scenario and implement a module that supports distributed queries in JBoss Infinispan, a distributed cache system that also works as an in-memory NOSQL database. Besides presenting the implementation of that module, we discuss the emergence of the NOSQL movement, the characterization of NOSQL databases, and where Infinispan fits in this context.
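A purely conceptual sketch, in plain Python rather than the Infinispan/JBoss Java API, of what a distributed query module essentially does: scatter a query to the nodes that own each data partition and merge their partial results. The node layout and data are invented for illustration.

```python
# Conceptual scatter-gather query over partitioned data (not the Infinispan API).
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions of a key-value cache spread over three nodes.
NODES = {
    "node1": {"p1": {"city": "Sao Paulo", "pop": 12_000_000}},
    "node2": {"p2": {"city": "Campinas", "pop": 1_200_000}},
    "node3": {"p3": {"city": "Santos", "pop": 430_000}},
}

def local_query(node_data, predicate):
    """Each node evaluates the query only against its own partition."""
    return [v for v in node_data.values() if predicate(v)]

def distributed_query(predicate):
    """Scatter the query to all nodes in parallel, then gather and merge."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda d: local_query(d, predicate), NODES.values())
    return [row for part in partials for row in part]

print(distributed_query(lambda v: v["pop"] > 1_000_000))
```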
|
79 |
Time Series databaser för sensorsystem: En experimentell studie av prestanda för Time Series databaser för sensorsystem som grundas på NoSQL eller RDBMS / Time Series databases for sensor systems. Warrén, Linus; Tallkvist, Daniel, January 2019.
Purpose – The purpose of this study is to recommend a database and an associated database model that is optimized for a sensor system. There is a lack of comparisons of databases and data models for larger sensor systems. The study also provides scientific support for those who wish to build a sensor system like the one considered in this paper. Method – The paper starts with a literature study whose purpose is to choose the databases and database models to be included in the comparison. To achieve the purpose of the study, a quantitative approach was chosen. The study follows the steps that define an experimental study in software development according to Shari Lawrence Pfleeger. Four predefined cases are used to compare the databases and the database models obtained in the literature study. Findings – The literature study shows that a Time Series DBMS is the recommended database model for implementing sensor systems. The findings also show that TimescaleDB is preferable to InfluxDB in four out of four predefined cases. The null hypothesis is rejected and the alternative hypothesis is accepted at the 1% significance level. Implications – The implications of the paper are to enhance knowledge about Time Series DBMSs, specifically TimescaleDB and InfluxDB, for sensor systems. The results can be applied when similar sensor systems are created; according to the experiment, TimescaleDB is better than InfluxDB for sensor systems with a similar data structure. Limitations – Two Time Series DBMSs (TimescaleDB and InfluxDB) were used in the experiments. The experiments were carried out in Azure and were limited to the 10 vCPUs that a standard account has access to. Few beacons were available for generating test data, so files containing the data that a beacon emits were created to simulate beacons. Keywords – Time Series DBMS, NoSQL, RDBMS, TimescaleDB, InfluxDB, Sensor systems
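A hedged sketch of how beacon-style sensor readings could be stored in a TimescaleDB hypertable and aggregated per hour, using psycopg2. The table, column names, and connection string are assumptions for illustration, not the study's actual test setup.

```python
# Sketch: TimescaleDB hypertable for sensor readings plus an hourly roll-up.
import psycopg2

conn = psycopg2.connect("dbname=sensors user=postgres password=secret host=localhost")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS readings (
        time        TIMESTAMPTZ NOT NULL,
        beacon_id   TEXT        NOT NULL,
        temperature DOUBLE PRECISION
    );
""")
# Turn the plain table into a TimescaleDB hypertable partitioned on time.
cur.execute("SELECT create_hypertable('readings', 'time', if_not_exists => TRUE);")
conn.commit()

# Hourly average per beacon, the kind of query a sensor dashboard would run.
cur.execute("""
    SELECT time_bucket('1 hour', time) AS bucket,
           beacon_id,
           avg(temperature)
    FROM readings
    GROUP BY bucket, beacon_id
    ORDER BY bucket;
""")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```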
|
80 |
Modélisation NoSQL des entrepôts de données multidimensionnelles massives / Modeling Multidimensional Data Warehouses into NoSQL. El Malki, Mohammed, 08 December 2016.
Decision support systems play a major role in companies and large organizations, enabling analyses dedicated to decision making. With the advent of big data, the volume of data to analyze reaches critical sizes, challenging conventional approaches to data warehousing, for which current solutions are mainly based on R-OLAP databases. With the emergence of major Web platforms such as Google, Facebook, Twitter, and Amazon, many solutions for processing big data have been developed and are called "Not Only SQL" (NoSQL). These new approaches are an interesting path toward building multidimensional data warehouses capable of handling large volumes of data. Questioning the R-OLAP approach requires revisiting the principles of modeling multidimensional data warehouses. In this manuscript, we proposed processes for implementing multidimensional data warehouses with NoSQL models. We defined four processes for each of two NoSQL models, a column-oriented model and a document-oriented model. Moreover, the NoSQL context also makes it more complex to efficiently compute the pre-aggregates that are typically set up in the R-OLAP context (the lattice). We extended our implementation processes to take into account the construction of the lattice in both retained models. As it is difficult to choose a single NoSQL implementation that efficiently supports all applicable processing, we proposed two translation processes. The first concerns intra-model processes, i.e., rules for moving from one implementation to another implementation of the same NoSQL logical model, while the second defines the transformation rules from an implementation of one logical model to an implementation of another logical model.
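To make the document-oriented option more concrete, here is a hedged sketch of one possible way to lay out a star-schema fact with embedded dimensions in MongoDB and to compute a roll-up with the aggregation pipeline. The collection and field names are invented, and this is not one of the four formal implementation processes defined in the thesis.

```python
# Sketch: document-oriented layout of a fact with embedded dimensions + roll-up.
from pymongo import MongoClient

facts = MongoClient("mongodb://localhost:27017")["dw"]["sales"]

facts.insert_one({
    "amount": 120.0,
    "quantity": 3,
    "customer": {"name": "ACME", "country": "France"},   # embedded dimension
    "date": {"day": 8, "month": 12, "year": 2016},        # embedded dimension
    "product": {"category": "books", "brand": "B1"},      # embedded dimension
})

# Roll-up: total amount per (year, country), analogous to an OLAP aggregate
# that would belong to one node of the pre-aggregation lattice.
pipeline = [
    {"$group": {
        "_id": {"year": "$date.year", "country": "$customer.country"},
        "total": {"$sum": "$amount"},
    }},
    {"$sort": {"_id.year": 1}},
]
for row in facts.aggregate(pipeline):
    print(row)
```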
|