Global ETD Search

21	A Comparative Study of Databases for Storing Sensor Data Fjällid, Jimmy January 2019 More than 800 zettabytes of data are predicted to be generated per year by the Internet of Things by 2021. Storing this data requires highly scalable databases. Many data storage solutions exist that specialize in specific use cases, and designing a system that accepts arbitrary sensor data while remaining scalable is a challenge. The problem was approached through a comparative study of six common databases, examining documented features and independent evaluations, followed by the construction of a prototype system. Elasticsearch was found to be the best-suited data storage system for the specific use case presented in this report, and a flexible prototype system was designed. No single database was determined to be best suited for sensor data in general, but with more specific requirements and knowledge of future use, a decision could be made. IoT NoSQL Computer and Information Sciences
22	Integrando banco de dados relacional e orientado a grafos para otimizar consultas com alto grau de indireção / Integrating relational and graph-oriented database to optimize queries with high degree of indirection Catarino, Marino Hilario 10 November 2017 An important indicator in academia is the degree of impact of a publication, which can help in evaluating the quality and degree of internationalization of an institution. To better delimit this indicator, it is necessary to analyze the collaboration networks of the authors involved. The relational data model is predominant in the databases in use today, but it hampers the analysis of collaboration networks because it does not serve all types of queries with the same performance. One alternative for the queries that lose performance in the relational model is a graph-oriented database model, which resembles a collaboration network. However, it is unclear which parameters can be used to decide when to use each model. This work proposes an analysis of queries that, based on the syntax of a query and the execution environment, can point to the most suitable data model for executing that query. With this analysis, it is possible to delimit the scenarios in which an integration between the relational and graph-oriented models is more appropriate. Collaboration networks Graph database Integration Internationalization NoSQL
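As a rough illustration of what a query with a "high degree of indirection" looks like on a collaboration network, the sketch below finds every author within a given number of co-authorship hops of a starting author. The author names, the `coauthorships` list, and both function names are hypothetical and are not taken from the thesis. In a relational schema each extra hop would typically mean another self-join on a co-authorship table, whereas a graph traversal simply follows edges.

```python
from collections import deque

# Hypothetical data: each pair is two authors who share at least one publication.
coauthorships = [
    ("Ana", "Bruno"),
    ("Bruno", "Carla"),
    ("Carla", "Diego"),
    ("Ana", "Eva"),
]


def build_adjacency(pairs):
    """Build an undirected adjacency map from co-authorship pairs."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def collaborators_within(adjacency, start, max_hops):
    """Breadth-first search: map each reachable author to its hop distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        author = queue.popleft()
        if distances[author] == max_hops:
            continue  # do not expand beyond the requested indirection depth
        for neighbor in adjacency.get(author, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[author] + 1
                queue.append(neighbor)
    del distances[start]  # the starting author is not their own collaborator
    return distances


adjacency = build_adjacency(coauthorships)
reach = collaborators_within(adjacency, "Ana", 2)
print(reach)
```

With `max_hops=2`, Diego is excluded because he is three hops from Ana; raising the limit is the kind of change that adds one more level of indirection to the query.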
23	Classificação de Imagem Orbital Rapideye utilizando banco de dados NOSQL e método GEOBIA / Classification of a Rapideye orbital image using a NoSQL database and the GEOBIA method RIBEIRO, Evelaine Berger 20 June 2017 With the information acquired from images captured by remote sensing and the techniques available in Geographic Information Systems, it is possible to generate thematic maps of land use and land cover. To do so, images are classified to define the classes of interest. This classification can be done pixel by pixel or by region. For high-resolution images such as Rapideye, classification by region is indicated. This method considers the information of each pixel and its neighborhood, grouping pixels with similar characteristics to form regions. It is therefore recommended to apply the GEOBIA segmentation method, which segments the image into regions in order to extract spatial, spectral, and texture characteristics. The result of this method is the vector of regions and a relational database holding the attributes (spatial, spectral, and texture). The objective of this work was to classify the land use and land cover of a Rapideye image using a graph-oriented NoSQL database to analyze the attributes extracted through GEOBIA. The methodology used multivariate analysis to analyze the attributes resulting from the segmentation. The dendrogram made it possible to separate the groups of attributes (spatial, spectral, and texture), which were then used in queries searching the graph formed by the NoSQL database for clusters of regions with similar characteristics. The regions were classified according to the classes of interest defined in the photointerpretation process, generating the classified image. To validate the result, the image of the study area was also classified with the Minimum Distance, Maximum Likelihood, and KNN algorithms, together with the confusion matrix. The KNN algorithm gave the best classification, with a kappa index of 0.77, and was therefore used for comparison with the image classified by the NoSQL database by means of cross tabulation. The cross tabulation showed that the image classified by the NoSQL database obtained positive results. It is concluded that the research met its proposed objectives, with satisfactory results for the method developed for land use and land cover classification. Digital Classification GEOBIA NoSQL
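For readers unfamiliar with the kappa index mentioned in this abstract, the following minimal sketch computes Cohen's kappa from a confusion matrix. The function name and the 2x2 matrix are illustrative assumptions; the numbers are not data from the thesis.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: reference, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the main diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical two-class confusion matrix, for illustration only.
matrix = [
    [45, 5],
    [10, 40],
]
kappa = cohen_kappa(matrix)
print(round(kappa, 2))
```

Here the observed agreement is 0.85 and the chance agreement is 0.50, which gives a kappa of 0.70; a value of 1 would mean perfect agreement and 0 would mean agreement no better than chance.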
24	Um método para paralelização automática de workflows intensivos em dados / A method for automatic paralelization of data-intensive workflows Watanabe, Elaine Naomi 22 May 2017 The analysis of large-scale datasets is one of the major current computational challenges, present not only in modern science but also in industry and the public sector. In these scenarios, data processing is usually modeled as a set of activities interconnected through data flows, known as workflows. Because of their high computational cost, several strategies have been proposed to improve the efficiency of data-intensive workflows, such as clustering activities to minimize data transfers and parallelizing data processing to reduce makespan, so that two or more activities run at the same time on different computational resources. The parallelism, in this case, is defined by the structure of the workflow's activity composition model. In general, Workflow Management Systems, which are responsible for coordinating and executing these activities in a distributed environment, are not aware of the type of processing each activity performs and therefore cannot automatically explore strategies for parallel execution. Parallelizable activities are defined by the user at workflow design time, and creating a structure that makes efficient use of a distributed environment is not a trivial task. This work aims to provide more efficient executions of data-intensive workflows and, to that end, proposes a method for the automatic parallelization of these applications, aimed at users who are not specialists in high-performance computing. The method defines nine semantic annotations that characterize how data is accessed and consumed by activities and, taking into account the available computational resources, automatically creates strategies that exploit data parallelism. The proposed method generates replicas of the annotated activities and also defines a workflow data indexing and distribution scheme that allows greater parallel access. Its efficiency was evaluated on two workflow models with real data, executed on the Amazon cloud platform. A relational DBMS (PostgreSQL) and a NoSQL DBMS (MongoDB) were used to manage up to 20.5 million data objects in 21 scenarios with different data partitioning and replication settings. The experiments showed that the parallelization of activity execution promoted by the method reduced the workflow makespan by up to 66.6% without increasing its monetary cost. Data Parallelism Data-intensive Workflows NoSQL
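The general idea of data parallelism through replicated activities can be sketched as follows. This is not the method from the thesis (its nine semantic annotations and its indexing scheme are not reproduced here); `partition`, `activity`, and `run_replicated` are hypothetical names, and the activity is a trivial stand-in. The sketch only shows the structure: split the data, run one replica of the activity per partition, then merge the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, num_partitions):
    """Round-robin split of items into num_partitions lists."""
    return [items[i::num_partitions] for i in range(num_partitions)]


def activity(chunk):
    """Stand-in for a workflow activity: sums the squares of its input."""
    return sum(x * x for x in chunk)


def run_replicated(items, num_replicas):
    """Run one replica of the activity per partition and merge the results."""
    chunks = partition(items, num_replicas)
    with ThreadPoolExecutor(max_workers=num_replicas) as pool:
        partials = list(pool.map(activity, chunks))
    return sum(partials)


data = list(range(1000))
sequential = activity(data)
parallel = run_replicated(data, 4)
print(sequential == parallel)
```

Threads are used here only to keep the sketch short; in CPython they would not speed up CPU-bound work, and a real deployment would place replicas on separate processes or machines. The approach also assumes the activity's results can be merged independently of how the data was split, which is the kind of property an annotation would need to declare.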
26	MC.d.o.t : Motion capture data och dess tillgänglighet / Motion capture : Data and its availability Larsson, Albin January 2014 Hardware can become outdated and programs can stop being developed, so files created with such hardware or software can become unusable over time. Keeping track of many individual files can also become burdensome for users in the long run. With a database-oriented storage solution, different APIs can be used to make data compatible with several different tools and programs, and the database can serve as a centralized solution for keeping information organized. Databases fall into two primary groups: SQL and NoSQL. This work aims to investigate which type is suited to handling motion capture data. Tests were performed on MySQL (SQL) and Neo4j (NoSQL), the latter being specialized for data such as motion capture data. Surprisingly, the tests showed that MySQL handles motion capture data better than Neo4j. Further work examining more kinds of databases is proposed to give a more complete picture. NoSQL SQL Motion capture Availability Computer Sciences
27	Srovnání distribuovaných "No-SQL" databází s důrazem na výkon a škálovatelnost / Comparison of Distributed "No-SQL" Databases with an Emphasis on Performance and Scalability Petera, Martin January 2014 This thesis deals with the performance of NoSQL databases. Its aim is to compare the most common representatives of distributed database systems with an emphasis on performance and scalability. The Yahoo! Cloud Serving Benchmark (YCSB) is used for this purpose. The YCSB tool allows performance testing through performance indicators such as throughput and response time. The comparison is followed by a thorough explanation of how to work with this tool, which lets readers test or compare the performance of distributed database systems other than those described in this thesis, and helps them set up a testing environment and apply the testing method described here should they need to. The thesis can serve as an aid in the difficult choice of a specific system for an intended solution from the wide variety of NoSQL database systems.
29	Podpora MongoDB pro UnifiedPush Server / MongoDB Support for UnifiedPush Server Pecsérke, Róbert January 2016 This master's thesis deals with the design and implementation of an extension for UnifiedPush Server that enables the server to access the non-relational database MongoDB and exploits the horizontal scalability potential of non-relational databases. The thesis also includes the design of performance tests and a comparison of performance when running on one node and on multiple nodes, the design of a migration scenario from MySQL to MongoDB, and the identification of bottlenecks. The application is implemented in Java and uses the Java Persistence API to access databases. To access non-relational databases it uses Hibernate OGM, an implementation of the JPA standard.
30	Visualisering av storadatamängder i en webbläsare / Visualization of big data in a webbrowser Amri, Abdurrahman, Hashosh, Wameedh January 2016 Scania AB owns complex test systems that generate large and intricate amounts of data. One of these test systems is the Hardware-In-the-Loop (HIL) rig, which tests electrical systems and produces electrical signals along with the results of the tests performed. The data from these electrical signals is saved in JSON files. To take advantage of this data, it must be stored in a suitable database that can handle it well. The signal data describes various vehicle properties over time and can be regarded as a multivariate time series. In the current system, the data is stored and visualized inefficiently. The purpose of the first part of this study was to investigate four relevant databases that can handle JSON files. The second part concerned methods for visualizing signals, that is, how signals can be represented as graphs. The investigation showed that two databases were better qualified to replace the current database. CouchDB and PostgreSQL were tested to measure their performance in storing and retrieving data, in terms of response time in relation to file size and number of signals. The test results showed that CouchDB has higher performance for retrieving data than PostgreSQL. A prototype was then developed to visualize the signals in the web browser. SQL NoSQL JSON Python Visualization Signals Computer Sciences
