Global ETD Search

151	A Cloud Based Platform for Big Data Science Islam, Md. Zahidul January 2014 With the advent of cloud computing, resizable, scalable infrastructures for data processing are now available to everyone. Software platforms and frameworks that support data-intensive distributed applications, such as Amazon Web Services and Apache Hadoop, give users the tools and infrastructure needed to work with thousands of scalable computers and process terabytes of data. However, writing scalable applications that run on top of these distributed frameworks is still a demanding and challenging task. The thesis aimed to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large data sets, collectively known as “big data”. The term “big data” in this thesis refers to large, diverse, complex, longitudinal and/or distributed data sets generated from instruments, sensors, internet transactions, email, social networks, Twitter streams, and/or all digital sources available today and in the future. We introduced architectures and concepts for implementing a cloud-based infrastructure for analyzing large volumes of semi-structured and unstructured data. We built and evaluated an application prototype for collecting, organizing, processing, visualizing and analyzing data from the retail industry gathered from indoor navigation systems and social networks (Twitter, Facebook, etc.). Our finding was that developing a large-scale data analysis platform is often quite complex when the processed data is expected to grow continuously in the future. The architecture varies depending on requirements. If the goal is to build a data warehouse and analyze the data afterwards (batch processing), the best choices are Hadoop clusters with Pig or Hive. This architecture has been proven at Facebook and Yahoo for years.
On the other hand, if the application involves real-time data analytics, the recommendation is Hadoop clusters with Storm, which has been used successfully at Twitter. After evaluating the developed prototype, we introduced a new architecture able to handle large-scale batch and real-time data. We also proposed an upgrade of the existing prototype to handle real-time indoor navigation data. Keywords: Big Data; Data Analysis; Hadoop; Hive; Sentiment Analysis; Predictive Analysis; Fraud Detection; Big data concepts; NoSQL Databases; Amazon AWS; Windows Azure; Data Visualization; Lambda architecture; Software Engineering
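The combination of batch and real-time processing described in entry 151 (which lists Lambda architecture among its keywords) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the event shape, the `store` field, and the function names `batch_view`, `speed_view`, and `query` are assumptions made for illustration, and in-memory counters stand in for the Hadoop and Storm layers.

```python
from collections import Counter


def batch_view(master_events):
    """Batch layer: recompute visit counts from the full event log."""
    return Counter(event["store"] for event in master_events)


def speed_view(recent_events):
    """Speed layer: count events not yet absorbed by the batch layer."""
    view = Counter()
    for event in recent_events:
        view[event["store"]] += 1
    return view


def query(store, batch, speed):
    """Serving layer: merge the batch view and the real-time view."""
    return batch[store] + speed[store]


# Hypothetical indoor-navigation events: one record per store visit.
master = [{"store": "A"}, {"store": "A"}, {"store": "B"}]
recent = [{"store": "A"}]

batch = batch_view(master)
speed = speed_view(recent)
print(query("A", batch, speed))  # 3
```

The point of the split is that the batch view can be rebuilt from scratch at any time, while the speed view only has to cover the events that arrived since the last batch run.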
152	Effektiv och underhållssäker lagring av medicinsk data / Efficient and maintainable storage of medical data Ekberg, Albin, Holm, Jacob January 2014 Creating a database to manage medical data is not an easy task. We create a database for a presentation tool that displays the patient medical data stored in it. We examine which of three database designs (MySQL with a relational design, MySQL with an EAV design, and MongoDB) is best suited for storing medical data. The analysis is performed in two steps. The first step determines which database retrieves data most efficiently. The second step examines how easy it is to change the structure of each database. The results show that the best choice depends on whether efficiency or maintainability matters most: MySQL with a relational design proves to be the most efficient, while MongoDB is the easiest to maintain. Keywords: mysql; mongodb; eav; entity attribute value; effective database; databasetest; testing databases; SQL; NoSQL; Computer Sciences
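The difference between the relational and EAV (entity-attribute-value) designs compared in entry 152 can be sketched in a few lines of Python. The attribute names and values below are hypothetical and are not taken from the thesis; plain Python structures stand in for MySQL tables.

```python
from collections import defaultdict

# Relational design: one row per patient, one column per attribute.
relational_row = {"patient_id": 1, "heart_rate": 72, "systolic_bp": 120}

# EAV design: one row per (entity, attribute, value) triple.
eav_rows = [
    (1, "heart_rate", 72),
    (1, "systolic_bp", 120),
    (1, "temperature", 36.8),  # a new attribute needs no schema change
]


def pivot(rows):
    """Rebuild one record per entity from EAV triples."""
    records = defaultdict(dict)
    for entity, attribute, value in rows:
        records[entity][attribute] = value
    return dict(records)


print(pivot(eav_rows))
```

The sketch shows the trade-off the abstract reports: the EAV layout absorbs new attributes without structural changes, but reading a full record requires a pivot step that the relational layout avoids.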
153	Výhody a nevýhody relačních a nerelačních (noSQL) databází pro analytické úlohy / Advantages and disadvantages of relational and non-relational (NoSQL) databases for analytical tasks Klapač, Milan January 2015 This work focuses on NoSQL databases, their use for analytical tasks, and their comparison with relational and OLAP databases. The aim is to analyse the benefits of NoSQL databases and their use for analytical purposes. The first part presents the basic principles of Business Intelligence, Data Warehousing, and Big Data. The second part deals with the key features of relational and NoSQL databases. The last part of the thesis describes the properties of four basic types of NoSQL databases and analyses their advantages, disadvantages and areas of application. The end of this part includes specific examples of the use of NoSQL databases, together with the reasons for the selection of those solutions.
154	Time Series Similarity Search in Distributed Key-Value Data Stores Using R-Trees Charapko, Aleksey 01 January 2015 Time series data are sequences of data points collected at certain time intervals. The advance in mobile and sensor technologies has led to rapid growth in the available amount of time series data. The ability to search large time series data sets can be extremely useful in many applications. In healthcare, a system monitoring vital signs can perform a search against past data and identify possible health-threatening conditions. In engineering, a system can analyze the performance of complicated equipment and identify possible failure situations or maintenance needs based on historical data. Existing search methods for time series data are limited in many ways. Systems utilizing memory-bound or disk-bound indexes are restricted by the resources of a single machine or hard drive. Systems that do not use indexes must search through the entire database whenever a search is requested. The proposed system uses a multidimensional index in a distributed storage environment to break the bound of one physical machine and allow for high data scalability. Utilizing an index allows the system to locate the patterns similar to the query without having to examine the entire dataset, which can significantly reduce the amount of computing resources required. The system uses an Apache HBase distributed key-value database to store the index and time series data across a cluster of machines. Evaluations were conducted to examine the system’s performance using synthesized data of up to 30 million data points. The evaluation results showed that, despite some drawbacks inherited from the R-tree data structure, the system can efficiently search and retrieve patterns in large time series datasets.
Keywords: Thesis; University of North Florida; UNF; Dissertations; Academic -- UNF -- Computing; Time Series; HBase; Databases; NoSQL; Indexing; R-tree; Databases and Information Systems
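Entry 154 states that an index lets the search skip most of the dataset. The core pruning idea behind an R-tree can be sketched as follows. This is a simplified, single-level illustration and not the thesis's system: it assumes each time series window has already been reduced to a small feature vector (the points below are hypothetical), groups the vectors into fixed-size leaves with bounding boxes, and skips every leaf whose box lies farther from the query than the search radius. A real R-tree nests such boxes in a balanced tree, and the thesis stores its index in HBase.

```python
import math


def mindist(point, lo, hi):
    """Smallest Euclidean distance from point to the box [lo, hi]."""
    return math.sqrt(
        sum(max(l - p, 0.0, p - h) ** 2 for p, l, h in zip(point, lo, hi))
    )


def build_leaves(points, leaf_size):
    """Group points into leaves, each stored with its bounding box."""
    leaves = []
    for i in range(0, len(points), leaf_size):
        chunk = points[i:i + leaf_size]
        lo = tuple(min(axis) for axis in zip(*chunk))
        hi = tuple(max(axis) for axis in zip(*chunk))
        leaves.append((lo, hi, chunk))
    return leaves


def range_search(leaves, query, radius):
    """Return points within radius of query and the number of leaves read."""
    hits, visited = [], 0
    for lo, hi, chunk in leaves:
        if mindist(query, lo, hi) > radius:
            continue  # the whole leaf is too far away: prune it unread
        visited += 1
        hits.extend(p for p in chunk if math.dist(query, p) <= radius)
    return hits, visited


points = [(0, 0), (1, 1), (10, 10), (11, 11)]
leaves = build_leaves(points, leaf_size=2)
hits, visited = range_search(leaves, (0.5, 0.5), 1.0)
print(hits, visited)  # [(0, 0), (1, 1)] 1
```

Only one of the two leaves is read in this example; the pruning test never discards a true match because the distance to a bounding box is a lower bound on the distance to any point inside it.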
155	HUR DATALAGRING KAN MÖJLIGGÖRA OCH BEGRÄNSA VÄRDESKAPANDE MED BUSINESS INTELLIGENCE / How Data Storage Can Enable and Limit Value Creation with Business Intelligence Namér, Samuel, Shadman, Altai Jörgen, Svensson, Thomas January 2021 Organizations depend on IT for the successful completion of many organizational activities. In this paper, we aim to contribute to the research field and to the awareness of the opportunities and limitations that data storage places on value creation with Business Intelligence. Thus, the research question asked in this thesis is: Which opportunities and limitations does data storage place on value creation with Business Intelligence? A case study was conducted on an IT organization, along with two expert interviews, in order to answer the research question. Semi-structured interviews were held with developers and an IT architect of the IT organization. We conclude that there might be situations where data storage affects BI systems, but factors such as BI maturity, time and budget play a big part in how the value that an IT organization aims to create can be realized. We identified that a migration to a graph database could be applied to the IT organization for more effective and optimized value creation through the BI system. This is due to the advantages of graph databases for the type of data that the IT organization works with. Keywords: Data storage; SQL; NoSQL; Cloud Storage; Business Intelligence; BI-maturity; Case Study; Interview Study; Information Systems, Social aspects
156	Techniques for Storing and Processing Next-Generation DNA Sequencing Data Camerlengo, Terry Luke 02 June 2014 No description available. Keywords: Bioinformatics; DNA sequence storage; 4 bit encoding; reference-based compression; Needleman-Wunsch; DNA base pair compression; sequence compression; MongoDB; NGS; Data management; bioinformatics; NoSQL; 3 bases per byte
157	Is it time to move beyond Stored Procedures? Najafi Zadeh, Sam, Hellgren, Viktor January 2024 This project investigates the feasibility of migrating from an SQL database utilizing complex stored procedures to a NoSQL database, specifically focusing on the Saab application BAAS. The motivation behind the investigation is that the stored procedures are complex to maintain. The aim is to determine whether such a migration can simplify the database structure while maintaining acceptable performance levels. The study involves developing a proof of concept by translating a frequently used SQL stored procedure into application-side logic implemented with MongoDB, a document-oriented NoSQL database. Performance tests comparing execution times of SQL stored procedures and the NoSQL alternative showed that while SQL stored procedures are faster, primarily due to pre-compilation and optimized indexing, the NoSQL solution achieves acceptable execution times and offers enhanced maintainability and flexibility. This report provides a detailed evaluation of the potential benefits and drawbacks of migrating to NoSQL, emphasizing the importance of optimizing indexing strategies to close the performance gap. The findings suggest that, despite the time investment required for migration, the reduction in complexity and improved maintainability may justify the transition for organizations facing similar challenges. Keywords: SQL; NoSQL; Stored Procedures; Data Migration; MongoDB; Document-Oriented Database; Relational Database; Computer Sciences; Computer Engineering; Other Computer and Information Science
158	Distribuovaný repositář digitálních forenzních dat / Distributed Forensic Digital Data Repository Josefík, Martin January 2018 This work deals with the design of a distributed repository for storing digital forensic data. The theoretical part of the thesis describes digital forensics and its purpose. This part also explains Big Data and suitable storage systems, along with their properties, advantages and disadvantages. The main part of the thesis deals with the design and implementation of distributed storage for digital forensic data. The design also focuses on suitable indexing of stored data and on supporting new types of digital forensic data. The performance of the implemented system was evaluated for a chosen type of digital forensic data: PCAP files.
159	En jämförelse i kostnad och prestanda för molnbaserad datalagring / A comparison in cost and performance for cloud-based data storage Burgess, Olivia, Oucif, Sara January 2024 As data volumes grow and the demands for scalability and availability within cloud services increase, the need for studies on their performance and cost-effectiveness is emphasized.
These analyses are crucial for optimizing services and providing businesses with valuable recommendations to make well-grounded decisions about cloud data storage. This thesis examines cost and performance for relational and non-relational data storage solutions implemented on Microsoft Azure and Google Cloud Platform. The tool Hyperfine is used to measure latency, and the cost efficiency of the cloud services is calculated from this result together with their estimated monthly costs. For the relational services evaluated, the results indicate that Azure SQL Database initially exhibits low latency, which then increases proportionally with the data volume, while Google Cloud SQL shows slightly higher latency at smaller data volumes but more consistent latency with more data. Azure SQL Database is more cost-effective, making it a more favorable option than Google Cloud SQL for companies seeking high performance at lower costs. For the non-relational services evaluated, Azure Cosmos DB consistently demonstrates lower latency and superior cost efficiency compared to Google Cloud Datastore, making it the preferred solution for companies prioritizing economic efficiency in their database management. Keywords: Azure Cosmos DB; Azure SQL Database; Google Cloud Datastore; Google Cloud SQL; Cloud databases; Cloud services; Cost efficiency; NoSQL; Performance; SQL; Computer Sciences
160	Modélisation et construction des bases de données géographiques floues et maintien de la cohérence de modèles pour les SGBD SQL et NoSQL / Modeling and construction of fuzzy geographic databases with supporting models consistency for SQL and NoSQL database systems Soumri Khalfi, Besma 12 June 2017 Today, research on the storage and integration of spatial data is an important element that revitalizes research on data quality. Taking into account the imperfection of geographic data, particularly imprecision, adds real complexity. Along with the increase in quality requirements centered on data (accuracy, completeness, topicality), the need for intelligible (logically consistent) information is constantly increasing. From this point of view, we are interested in Imprecise Geographic Databases (IGDBs) and their logical coherence. This work proposes solutions for building consistent IGDBs for SQL and NoSQL database systems. The design methods proposed so far for modeling imprecise geographic data do not satisfactorily meet the modeling needs of the real world. We present an extension to the F-Perceptory approach for IGDB design. To generate a coherent definition of the imprecise geographic objects and build the IGDB in a relational system, we present a set of rules for automatic model transformation. Based on these rules, we develop a process to generate the physical model from the fuzzy conceptual model. We implement these solutions as a prototype called FPMDSG. For document-oriented NoSQL databases, we present a logical model called Fuzzy GeoJSON to better express the structure of imprecise geographic data. In addition, these systems provide little support for data consistency; therefore, we present a validation methodology for consistent storage. The proposed solutions are implemented as a schema-driven pipeline based on the Fuzzy GeoJSON schema and semantic constraints. Keywords: GIS; Imperfection; Imprecision; Fuzzy geographic data; Fuzzy data modeling; Fuzzy database design; Data consistency; Spatial databases; NoSQL databases; F-Perceptory approach
