91 |
Analys och jämförelse av relationsdatabaser vid behandling av spatiala data : En studie kring prestanda hos relationsdatabaser / Analysis and comparison of relational databases when processing spatial data : A study on the performance of relational databasesKarlsson, David January 2023 (has links)
There are a large number of databases used in many different areas. Among these, some can process spatial data. The problem this entails is choosing a database that can handle a given type of spatial data with the best possible performance. This report presents an analysis of this question based on a dataset obtained from Norconsult Digital. The chosen databases comprise three SQL databases (PostgreSQL, MySQL and SQLite) and one NoSQL database (MongoDB). These databases underwent five equivalent operations/tests, with the result that PostgreSQL with its GiST/SP-GiST indexes and MongoDB performed at a level well above the other databases tested. Based on this work, it can be concluded that more thorough performance tests should be carried out, including larger and more complex datasets as well as more candidate databases and spatial indexes, in order to give a better picture of which databases with spatial data support perform best.
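For a concrete sense of what such a test involves, the sketch below runs one spatial operation against PostgreSQL from Python. It assumes the PostGIS extension and a hypothetical table buildings with a geometry column geom; it illustrates the kind of query benchmarked, not the thesis's actual test code.

```python
# Minimal sketch of a spatial test against PostgreSQL, assuming PostGIS.
# Table and column names ("buildings", "geom") are illustrative only.
import psycopg2

conn = psycopg2.connect("dbname=spatial_test user=postgres")
cur = conn.cursor()

# GiST is PostgreSQL's general-purpose spatial index; SP-GiST is the
# space-partitioned variant evaluated alongside it in the thesis.
cur.execute("CREATE INDEX IF NOT EXISTS buildings_geom_gist "
            "ON buildings USING GIST (geom);")

# Example spatial operation: find all geometries within 500 m of a point.
cur.execute("""
    SELECT id FROM buildings
    WHERE ST_DWithin(geom::geography,
                     ST_SetSRID(ST_MakePoint(18.07, 59.33), 4326)::geography,
                     500);
""")
print(cur.fetchall())
conn.commit()
conn.close()
```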
|
92 |
Cauldron: A Scalable Domain Specific Database for Product DataOttosson, Love January 2017 (has links)
This project investigated how NoSQL databases can be used together with a logical layer, instead of a relational database with separate backend logic, to search for products with customer-specific constraints in an e-commerce scenario. The motivation for moving away from a relational database was the scalability issues and increased read latencies experienced as the data grew. The work resulted in a framework called Cauldron that uses pipelines, sequences of execution steps, to expose its data stored in an in-memory key-value store and a document database. Cauldron uses write replication between distributed instances to increase read throughput at the cost of write latency. A product database with customer-specific constraints was implemented using Cauldron to compare it against an existing solution based on a relational database. The new product database can serve search queries 10 times faster in the general case and up to 25 times faster in extreme cases compared to the existing solution.
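The pipeline idea can be illustrated with a toy sketch: a query is answered by running a sequence of execution steps over an in-memory key-value store. Everything below is a hypothetical reconstruction of the concept, not Cauldron's actual API.

```python
# Toy pipeline: each step transforms a shared context dict; the store is a
# plain in-memory key-value map. All names here are invented for illustration.
store = {
    "p1": {"name": "chair", "price": 40, "regions": ["SE", "NO"]},
    "p2": {"name": "desk", "price": 120, "regions": ["SE"]},
}

def load_products(ctx):
    ctx["products"] = list(store.values())
    return ctx

def apply_customer_constraints(ctx):
    # Customer-specific constraint: only products sold in the customer's region.
    region = ctx["customer"]["region"]
    ctx["products"] = [p for p in ctx["products"] if region in p["regions"]]
    return ctx

def run_pipeline(steps, ctx):
    for step in steps:
        ctx = step(ctx)
    return ctx

result = run_pipeline([load_products, apply_customer_constraints],
                      {"customer": {"region": "NO"}})
print([p["name"] for p in result["products"]])  # ['chair']
```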
|
93 |
Implementation of healthcare web service and its integration into OutlookHoe Oh, Chee, Larsson, Ludvig January 2022 (has links)
The healthcare sector still uses paper printouts for daily tasks that could very well be managed with digital solutions. One of the areas that can benefit from digitization is the health evaluation forms that patients fill out. This thesis addresses the implementation and integration of a healthcare service. The product requires that the service's back-end and database be further developed for later integration into Microsoft Outlook with an Office add-in. In addition, it provides a review of Office add-ins and the ideal API authorization method for the system. An agile methodology is adopted for project management and the chosen software development method is use-case driven. To summarize the results, it can be determined that the use cases were successfully implemented and that relevant experience was collected for answering the various questions regarding Office add-ins. The existing API authorization method was incorrectly implemented, but despite that, the API keys method is deemed suitable for the system.
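As a rough illustration of the API-keys method deemed suitable here, the sketch below guards a hypothetical Flask endpoint with an API key checked in constant time; the header name, route and key store are assumptions, not the thesis system's real interface.

```python
# Minimal sketch of API-key authorization. Endpoint, header name and key
# store are hypothetical; in practice keys would be hashed in the database.
import hmac
from flask import Flask, request, jsonify, abort

app = Flask(__name__)
VALID_KEYS = {"example-key-123"}  # placeholder key store

def key_is_valid(candidate):
    # Compare against every stored key in constant time to avoid timing leaks.
    return any(hmac.compare_digest(candidate, k) for k in VALID_KEYS)

@app.route("/api/forms")
def list_forms():
    key = request.headers.get("X-Api-Key", "")
    if not key_is_valid(key):
        abort(401)
    return jsonify([{"id": 1, "title": "health evaluation"}])

if __name__ == "__main__":
    app.run()
```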
|
94 |
Compactions in Apache Cassandra : Performance Analysis of Compaction Strategies in Apache CassandraKona, Srinand January 2016 (has links)
Context: The global communication system is growing tremendously, generating a wide range of data. Telecom operators, who generate large amounts of data, need to manage these data efficiently. As database management technology advances, the 21st century has seen a remarkable growth of NoSQL databases. Apache Cassandra is an advanced NoSQL database system, popular for handling semi-structured and unstructured Big Data. Cassandra maintains its on-disk data efficiently by consolidating it with different compaction strategies. This research focuses on analyzing the performance of different compaction strategies in different use cases for the default Cassandra stress model. The analysis can suggest better usage of compaction strategies in Cassandra for a write-heavy workload. Objectives: In this study, we identify appropriate performance metrics to evaluate the performance of compaction strategies. We provide a detailed analysis of Size Tiered Compaction Strategy, Date Tiered Compaction Strategy, and Leveled Compaction Strategy for a write-heavy (90/10) workload, using the default cassandra-stress tool. Methods: A detailed literature study was conducted on NoSQL databases and the workings of the different compaction strategies in Apache Cassandra. The performance metrics were chosen based on this literature study and on the opinions of the supervisors and Ericsson's Apache Cassandra team. Two tools were developed for collecting the chosen metrics: the first, written in Jython, collects the Cassandra metrics, and the second, written in Python, collects the operating-system metrics. Graphs were generated in Microsoft Excel from the values obtained by the scripts. Results: Date Tiered Compaction Strategy and Size Tiered Compaction Strategy showed more or less similar behaviour during the stress tests. Leveled Compaction Strategy showed some remarkable results that affected system performance compared to the other two strategies. Date Tiered Compaction Strategy does not perform well for the default Cassandra stress model. Size Tiered Compaction Strategy can be preferred for the default Cassandra stress model, but is not suitable for big data. Conclusions: With a detailed analysis and logical comparison of the metrics, we conclude that Leveled Compaction Strategy performs best for a write-heavy (90/10) workload using the default Cassandra stress model, compared to Size Tiered and Date Tiered Compaction Strategies.
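For reference, switching a table between the compared strategies is a one-line schema change; a sketch of doing so from Python with the DataStax driver follows, with placeholder keyspace and table names.

```python
# Sketch: change a table's compaction strategy via CQL. The keyspace/table
# names ("stress_ks.standard1") are placeholders, not the study's setup.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Leveled compaction, which the study found best for its write-heavy
# (90/10) cassandra-stress workload:
session.execute("""
    ALTER TABLE stress_ks.standard1
    WITH compaction = {'class': 'LeveledCompactionStrategy',
                       'sstable_size_in_mb': 160}
""")

# The alternatives evaluated were SizeTieredCompactionStrategy and
# DateTieredCompactionStrategy, set the same way with a different 'class'.
cluster.shutdown()
```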
|
95 |
An artefact to analyse unstructured document data stores / by André Romeo BotesBotes, André Romeo January 2014 (has links)
Structured data stores have been the dominant technologies for the past few decades. Although dominant, structured data stores lack the functionality to handle the 'Big Data' phenomenon. A new technology has recently emerged which stores unstructured data and can handle the 'Big Data' phenomenon. This study describes the development of an artefact to aid in the analysis of NoSQL document data stores in terms of relational database model constructs. Design science research (DSR) is the methodology implemented in the study and it is used to assist in the understanding, design and development of the problem, artefact and solution. This study explores the existing literature on DSR, in addition to structured and unstructured data stores. The literature review formulates the descriptive and prescriptive knowledge used in the development of the artefact. The artefact is developed using a series of six activities derived from two DSR approaches. The problem domain is derived from the existing literature and a real application environment (RAE). The reviewed literature provided a general problem statement. A representative from NFM (the RAE) is interviewed for a situation analysis providing a specific problem statement. An objective is formulated for the development of the artefact and suggestions are made to address the problem domain, in support of the artefact's objective. The artefact is designed and developed using the descriptive knowledge of structured and unstructured data stores, combined with prescriptive knowledge of algorithms, pseudo code, continuous design and object-oriented design. The artefact evolves through multiple design cycles into a final product that analyses document data stores in terms of relational database model constructs. The artefact is evaluated for acceptability and utility, which lends credibility and rigour to the research in the DSR paradigm. Acceptability is demonstrated through simulation and the utility is evaluated using a real application environment (RAE). A representative from NFM is interviewed for the evaluation of the artefact. Finally, the study is communicated by describing its findings, summarising the artefact and looking into future possibilities for research and application. / MSc (Computer Science), North-West University, Vaal Triangle Campus, 2014
|
96 |
Automatické generování umělých XML dokumentů / Automatic Generation of Synthetic XML DocumentsBetík, Roman January 2015 (has links)
The aim of this thesis is to research the current possibilities and limitations of the automatic generation of synthetic XML and JSON documents used in the area of Big Data. The first part of the work discusses and compares the properties of the most widely used XML, Big Data and JSON data generators. The next part of the thesis proposes an algorithm for generating semi-structured data. The main focus of the algorithm is parallel execution of the generation process while preserving the ability to control the contents of the generated documents. The data generator can also use samples of real data when generating synthetic data, and it is capable of automatically creating simple references between JSON documents. The last part of the thesis presents the results of experiments in which the data generator was used to test the MongoDB database, describes its added value and compares it to other solutions.
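A minimal sketch of the parallel-generation idea, assuming a trivial hypothetical schema: each worker process derives its documents from a per-document seed, so the output stays controllable and reproducible while generation runs in parallel.

```python
# Parallel synthetic-JSON generation sketch; the schema is invented for
# illustration and much simpler than a real generator's.
import json
import random
from multiprocessing import Pool

def make_document(seed):
    rng = random.Random(seed)  # per-document seed keeps runs reproducible
    return {
        "id": seed,
        "name": "user-%d" % rng.randrange(10_000),
        "score": round(rng.uniform(0, 100), 2),
        "tags": rng.sample(["a", "b", "c", "d"], k=2),
    }

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        docs = pool.map(make_document, range(1000))
    with open("synthetic.json", "w") as f:
        json.dump(docs, f)
```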
|
97 |
NoSQL Database Selection Focused on Performance Criteria for Web-driven ApplicationsKharboutli, Zacky January 2019 (has links)
This paper delivers a comparative analysis of the performance of three NoSQL technologies in Web applications: graph stores, key-value stores, and document stores. The study aims to assist developers and organizations in picking the suitable NoSQL solution for their application. For this purpose, three identical e-book applications were developed, each connected to a database from one of the selected technologies, to examine how they perform compared to each other against various performance measures.
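A measurement of this kind can be as simple as timing repeated identical requests against each application; the sketch below does so with hypothetical endpoints standing in for the three services.

```python
# Throughput harness sketch: issue the same query against each application
# and report requests per second. The URLs are invented placeholders.
import time
import urllib.request

ENDPOINTS = {
    "document":  "http://localhost:8001/books?author=Tolkien",
    "key-value": "http://localhost:8002/books?author=Tolkien",
    "graph":     "http://localhost:8003/books?author=Tolkien",
}

def throughput(url, n=200):
    start = time.perf_counter()
    for _ in range(n):
        urllib.request.urlopen(url).read()
    return n / (time.perf_counter() - start)

for name, url in ENDPOINTS.items():
    print(f"{name}: {throughput(url):.1f} req/s")
```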
|
98 |
NoSQL: a análise da modelagem e consistência dos dados na era do Big Data / NoSQL: an analysis of data modeling and consistency in the Big Data eraRodrigues, Wagner Braz 19 October 2017 (links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / The new storage models, known as NoSQL, have arisen to address current data challenges, defined by the properties of volume, velocity and variety (the 3 V's) established in the Big Data concept. These new storage models build on distributed computing and horizontal scalability, which allow the processing of the large amounts of data that the 3 V's imply. This thesis uses the relational model as its theoretical framework, presenting its solutions and shortcomings. The relational model made it possible to persist data structures in secondary memory. Its modeling establishes rules for the creation of a sound data model, using mathematical concepts and a representation accessible to human interpretation. The properties defined by the ACID transactional model (Atomicity, Consistency, Isolation, Durability), implemented in relational DBMSs, ensure that stored data remain consistent. The use of the relational model, however, separated the transient structures in primary memory, used at run time by software applications, from those persisted in secondary memory, an effect known as impedance mismatch. The new models in the NoSQL categories bring back transient structures previously used in primary memory, although they give up the strong structuring offered by the relational model. Distributed computing makes it possible to transact and store data across several computers, known as nodes, organized in clusters. This increases availability and decreases the likelihood of system failures; however, it introduces inconsistency into the data, according to the properties defined by the CAP theorem (FOX; BREWER, 1999). This study was carried out by means of a bibliographic review, first analyzing the needs that led to the creation of the relational model. We then establish the state of the art of the theories and techniques involving NoSQL and distributed data processing, as well as the different categories NoSQL introduces. A suitable tool from each NoSQL category was chosen and analyzed in order to understand its structures, metamodels and operations. Besides establishing the state of the art regarding NoSQL, we demonstrate how current hardware advances make the rapprochement of transient and persistent data structures possible, and we outline options for handling the consistency effects described by the CAP theorem.
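The impedance mismatch mentioned above is easy to see in a few lines: a nested in-memory structure maps one-to-one onto a document store, but must be flattened into rows across several tables for relational persistence. The example data below is invented for illustration.

```python
# Illustration of the "impedance mismatch": the transient structure maps
# directly to a document store, but must be decomposed into flat rows
# spread over two relational tables and rejoined on read.
order = {                       # transient structure in primary memory
    "id": 1,
    "customer": "Ana",
    "items": [
        {"sku": "A-10", "qty": 2},
        {"sku": "B-77", "qty": 1},
    ],
}

document = order                # document store: persisted as-is, one document

orders_rows = [(order["id"], order["customer"])]
order_items_rows = [(order["id"], i["sku"], i["qty"]) for i in order["items"]]
print(orders_rows)        # [(1, 'Ana')]
print(order_items_rows)   # [(1, 'A-10', 2), (1, 'B-77', 1)]
```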
|
99 |
Organização e armazenamento de imagens multitemporais georreferenciadas para suporte ao processo de detecção de mudanças / Organization and storage of georeferenced multitemporal images to support the change detection processSouza, Luiz Eduardo Christovam de January 2018 (links)
Advisor: Maria de Lourdes Bueno Trindade Galo / Abstract: Nowadays the size of datasets has been reaching levels never seen before, mainly due to new sensors and the spread of the internet, with web 2.0 and social media. Among the various types of sensors, imaging sensors, mainly carried by satellites, have produced big Earth-observation datasets. Regular Earth observation by satellites makes it possible to monitor Land Use/Cover Change (LUCC). However, much LUCC research normally uses only small parts of the big Earth-observation datasets, because there is still a scientific-technological gap in the organization, storage, analysis and representation of big Earth-observation data. This research therefore defined a database for the organization, storage and retrieval of spatio-temporal data to support a LUCC task. The application chosen was the time-series analysis of the Normalized Difference Vegetation Index (NDVI) of images acquired from 1984 to 2017 by the Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+) and Operational Land Imager (OLI) sensors for the region of Porto Velho, Rondônia. An NDVI time series was built for the position of each pixel in the study area. Reference areas were defined to retrieve reference time series that describe the land cover types and the change classes (anthropic and natural). The Fast Dynamic Time Warping (FastDTW) algorithm was used to measure the similarity between the time series to be classified and the reference ones. To find the time series clas... / Master
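The similarity measure at the heart of this classification is dynamic time warping; FastDTW approximates the exact quadratic-time recurrence shown in the sketch below. The NDVI series values are invented for illustration (NDVI = (NIR − Red) / (NIR + Red), one value per pixel per acquisition date).

```python
# Classic DTW distance between two series, the O(n*m) recurrence that
# FastDTW approximates in linear time. Series values are invented.
def dtw_distance(a, b):
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best alignment ending here extends a match, insertion or deletion.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

reference_forest = [0.80, 0.82, 0.79, 0.81, 0.80]
unknown_pixel    = [0.79, 0.80, 0.45, 0.30, 0.28]  # NDVI drop suggests clearing
print(dtw_distance(reference_forest, unknown_pixel))
```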
|
100 |
Improving the Performance of the Eiffel Event Persistence SolutionHellenberg, Rickard January 2019 (links)
Deciding which database management system (DBMS) to use has perhaps never been harder. In recent years there has been explosive growth in new types of database management systems that address different issues and perform well in different scenarios. This thesis is a case study on improving an Event Persistence Solution for the Eiffel Framework, a framework used to achieve traceability in very-large-scale systems development. The purpose of this thesis is to investigate whether it is possible to improve the performance of the Eiffel Event Persistence Solution by changing from MongoDB to Elasticsearch or ArangoDB. Experiments were conducted to measure the request throughput for four types of requests. As a prerequisite to measuring performance, support for the different DBMSs and the ability to switch between them were implemented. The results showed that Elasticsearch performed better than MongoDB for nested-document search as well as for graph-traversal operations. ArangoDB had even better performance for graph-traversal operations but inadequate performance for nested-document search.
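As an illustration of the nested-document-search workload, the sketch below posts an Elasticsearch nested query repeatedly and reports throughput; the index name, field names and link type are hypothetical stand-ins for the Eiffel event schema.

```python
# Throughput sketch for nested-document search against Elasticsearch.
# "events", "links" and "CAUSE" are assumed placeholder names.
import json
import time
import urllib.request

QUERY = json.dumps({
    "query": {"nested": {
        "path": "links",
        "query": {"term": {"links.type": "CAUSE"}},
    }}
}).encode()

def search_once():
    req = urllib.request.Request(
        "http://localhost:9200/events/_search",
        data=QUERY,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).read()

n = 500
start = time.perf_counter()
for _ in range(n):
    search_once()
print(f"{n / (time.perf_counter() - start):.1f} requests/s")
```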
|