Global ETD Search

21	System Support for Large-scale Geospatial Data Analytics January 2020 (has links) abstract: The volume of available spatial data has increased tremendously. Such data includes but is not limited to: weather maps, socioeconomic data, vegetation indices, geotagged social media, and more. These applications need a powerful data management platform to support scalable and interactive analytics on big spatial data. Even though existing single-node spatial database systems (DBMSs) provide support for spatial data, they suffer from performance issues when dealing with big spatial data. Challenges to building large-scale spatial data systems are as follows: (1) System Scalability: The massive-scale of available spatial data hinders making sense of it using traditional spatial database management systems. Moreover, large-scale spatial data, besides its tremendous storage footprint, may be extremely difficult to manage and maintain due to the heterogeneous shapes, skewed data distribution and complex spatial relationship. (2) Fast analytics: When the user runs spatial data analytics applications using graphical analytics tools, she does not tolerate delays introduced by the underlying spatial database system. Instead, the user needs to see useful information quickly. In this dissertation, I focus on designing efficient data systems and data indexing mechanisms to bolster scalable and interactive analytics on large-scale geospatial data. I first propose a cluster computing system GeoSpark which extends the core engine of Apache Spark and Spark SQL to support spatial data types, indexes, and geometrical operations at scale. In order to reduce the indexing overhead, I propose Hippo, a fast, yet scalable, sparse database indexing approach. In contrast to existing tree index structures, Hippo stores disk page ranges (each works as a pointer of one or many pages) instead of tuple pointers in the indexed table to reduce the storage space occupied by the index.
Moreover, I present Tabula, a middleware framework that sits between a SQL data system and a spatial visualization dashboard to make the user experience with the dashboard more seamless and interactive. Tabula adopts a materialized sampling cube approach, which pre-materializes samples, not for the entire table as in the SampleFirst approach, but for the results of potentially unforeseen queries (represented by an OLAP cube cell). / Dissertation/Thesis / Doctoral Dissertation Computer Science 2020 Computer science big data cluster computing data visualization database index distributed data systems geospatial data
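The page-range idea described for Hippo in the abstract above can be illustrated with a minimal sketch. It assumes a table stored as a list of pages and summarizes each page range with a simple min/max pair; that summary is a simplification chosen for illustration (closer to a block-range index), not the summary structure Hippo itself uses, which the abstract does not describe. The names `RangeEntry`, `build_sparse_index`, and `lookup` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RangeEntry:
    """One index entry: a range of pages plus a summary of the values in it."""
    first_page: int
    last_page: int
    low: int   # smallest value stored in the page range (assumed summary)
    high: int  # largest value stored in the page range (assumed summary)


def build_sparse_index(pages, pages_per_entry):
    """Build one entry per group of pages instead of one pointer per tuple."""
    entries = []
    for start in range(0, len(pages), pages_per_entry):
        chunk = pages[start:start + pages_per_entry]
        values = [v for page in chunk for v in page]
        if not values:
            continue  # nothing to summarize for an empty range
        entries.append(
            RangeEntry(start, start + len(chunk) - 1, min(values), max(values))
        )
    return entries


def lookup(pages, index, key):
    """Scan only the page ranges whose summary may contain the key."""
    hits = []
    for entry in index:
        if entry.low <= key <= entry.high:
            for page_no in range(entry.first_page, entry.last_page + 1):
                for slot, value in enumerate(pages[page_no]):
                    if value == key:
                        hits.append((page_no, slot))
    return hits


# Four pages of three tuples each: 12 tuples, but only 2 index entries.
pages = [[3, 8, 5], [12, 9, 14], [21, 25, 22], [30, 27, 33]]
index = build_sparse_index(pages, pages_per_entry=2)
```

The trade-off the sketch shows is the one the abstract names: the index holds far fewer entries than a per-tuple index would, at the cost of scanning every page in a matching range at lookup time.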
22	Learning Decision List from Distributed Data Sources Charllo, Bala Vignesh January 2018 (has links) No description available. Computer Science Learning Decision List Distributed Data Mining Learning by Negotiations
23	An Architecture For High-performance Privacy-preserving And Distributed Data Mining Secretan, James 01 January 2009 (has links) This dissertation discusses the development of an architecture and associated techniques to support Privacy Preserving and Distributed Data Mining. The field of Distributed Data Mining (DDM) attempts to solve the challenges inherent in coordinating data mining tasks with databases that are geographically distributed, through the application of parallel algorithms and grid computing concepts. The closely related field of Privacy Preserving Data Mining (PPDM) adds the dimension of privacy to the problem, trying to find ways that organizations can collaborate to mine their databases collectively, while at the same time preserving the privacy of their records. Developing data mining algorithms for DDM and PPDM environments can be difficult and there is little software to support it. In addition, because these tasks can be computationally demanding, taking hours or even days to complete data mining tasks, organizations should be able to take advantage of high-performance and parallel computing to accelerate these tasks. Unfortunately there is no such framework that is able to provide all of these services easily for a developer. In this dissertation such a framework is developed to support the creation and execution of DDM and PPDM applications, called APHID (Architecture for Private, High-performance Integrated Data mining). The architecture allows users to flexibly and seamlessly integrate cluster and grid resources into their DDM and PPDM applications. The architecture is scalable, and is split into highly de-coupled services to ensure flexibility and extensibility. This dissertation first develops a comprehensive example algorithm, a privacy-preserving Probabilistic Neural Network (PNN), which serves as a basis for analysis of the difficulties of DDM/PPDM development.
The privacy-preserving PNN is the first such PNN in the literature, and provides not only a practical algorithm ready for use in privacy-preserving applications, but also a template for other data intensive algorithms, and a starting point for analyzing APHID's architectural needs. After analyzing the difficulties in the PNN algorithm's development, as well as the shortcomings of researched systems, this dissertation presents the first concrete programming model joining high performance computing resources with a privacy preserving data mining process. Unlike many of the existing PPDM development models, the platform of services is language independent, allowing layers and algorithms to be implemented in popular languages (Java, C++, Python, etc.). An implementation of a PPDM algorithm is developed in Java utilizing the new framework. Performance results are presented, showing that APHID can enable highly simplified PPDM development while speeding up resource intensive parts of the algorithm. Data Mining Privacy Preserving Data Mining Distributed Data Mining High-Performance Computing Computer Engineering Engineering
24	Resource Allocation for Federated Learning over Wireless Networks Jansson, Fredrik January 2022 (has links) This thesis examines resource allocation for Federated Learning in wireless networks. In Federated Learning, a server and a number of users exchange neural network parameters during training. This thesis aims to create a realistic simulation of a Federated Learning process by creating a channel model and using compression when channel capacity is insufficient. In the thesis we learn that Federated Learning can handle high ratios of sparsification compression. We will also investigate how the choice of users and scheduling schemes affects the convergence speed and accuracy of the training process. This thesis will conclude that the choice of scheduling scheme depends on how the data is distributed across users. Federated Learning Neural Networks Channel Model Scheduling Compression Distributed Data Communication Systems Kommunikationssystem
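As a rough illustration of the sparsification compression mentioned in the abstract above, the sketch below keeps only the largest-magnitude entries of a parameter update before it would be sent over a limited channel. Top-k magnitude selection is one common form of sparsification and is an assumption here; the thesis may use a different scheme. The function names `sparsify_top_k` and `densify` and the sample values are hypothetical.

```python
def sparsify_top_k(update, keep_ratio):
    """Keep only the largest-magnitude entries of a flat parameter update.

    Returns a dict mapping index -> value for the entries that are kept.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, int(len(update) * keep_ratio))
    # Indices of the k entries with the largest absolute value.
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return {i: update[i] for i in sorted(ranked[:k])}


def densify(sparse, length):
    """Rebuild a full-length update, filling dropped entries with zeros."""
    dense = [0.0] * length
    for i, value in sparse.items():
        dense[i] = value
    return dense


# Example: keep a quarter of an 8-entry update (2 entries survive).
update = [0.02, -0.90, 0.10, 0.75, -0.01, 0.30, 0.00, -0.05]
sparse = sparsify_top_k(update, keep_ratio=0.25)
restored = densify(sparse, len(update))
```

With `keep_ratio=0.25` only two index/value pairs need to be transmitted instead of eight values; a lower ratio means less traffic but a coarser update on the receiving side.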
25	DISCOVERY OF LINEAR TRAJECTORIES IN GEOGRAPHICALLY DISTRIBUTED DATASETS JHAVER, RISHI January 2003 (has links) No description available. Computer Science distributed data sets data aggregation in-network aggregation temporal databases sensor data sets
26	A 3D Deep Learning Architecture for Denoising Low-Dose CT Scans Kasparian, Armen Caspar 11 April 2024 (has links) This paper introduces 3D-DDnet, a cutting-edge 3D deep learning (DL) framework designed to improve the image quality of low-dose computed tomography (LDCT) scans. Although LDCT scans are advantageous for reducing radiation exposure, they inherently suffer from reduced image quality. Our novel 3D DL architecture addresses this issue by effectively enhancing LDCT images to achieve parity with the quality of standard-dose CT scans. By exploiting the inter-slice correlation present in volumetric CT data, 3D-DDnet surpasses existing denoising benchmarks. It incorporates distributed data parallel (DDP) and transfer learning techniques to significantly accelerate the training process. The DDP approach is particularly tailored for operation across multiple Nvidia A100 GPUs, facilitating the processing of large-scale volumetric data sets that were previously unmanageable due to size constraints. Comparative analyses demonstrate that 3D-DDnet reduces the mean square error (MSE) by 10% over its 2D counterpart, 2D-DDnet. Moreover, by applying transfer learning from pre-trained 2D models, 3D-DDnet effectively 'jump starts' the learning process, cutting training times by half without compromising on model accuracy. / Master of Science / This research focuses on improving the quality of low-dose CT scans using advanced technology. CT scans are medical imaging techniques used to see inside the body. Low-dose CT (LDCT) scans use less radiation than standard CT scans, making them safer, but the downside is that the images are not as clear. To solve this problem, we developed a new deep learning method to make these low-dose images clearer and as good as regular CT scans. Our approach, called 3D-DDnet, is unique because it looks at the scans in 3D, considering how slices of the scan are related, which helps remove the noise and improve the image quality. 
Additionally, we used a technique called distributed data parallel (DDP) with advanced GPUs (graphics processing units, which are powerful computer components) to speed up the training of our system. This means our method can learn to improve images faster and work with larger data sets than before. Our results are promising: 3D-DDnet improved the image quality of low-dose CT scans significantly better than previous methods. Also, by using what we call "transfer learning" (starting with a pre-made model and adapting it), we cut the training time in half without losing accuracy. This development is essential for making low-dose CT scans more effective and safer for patients. deep learning distributed data parallel transfer learning computed tomography image enhancement
27	Integração entre sistema multi-agentes e sistemas de banco de dados distribuídos. / Integration between multi-agent systems and distributed data base systems. Carvalho, Fábio Silva 26 June 2008 (has links) Sistemas multi-agentes devem oferecer recursos suficientes para que seus agentes possam interagir de maneira satisfatória e atingir seus objetivos. Um exemplo de recurso é um conjunto de dados armazenados em algum tipo de mecanismo de persistência, como um sistema gerenciador de banco de dados. O acesso a dados deve ser possível mesmo que eles estejam distribuídos, fato inclusive que também caracteriza os sistemas multi-agentes. Assim, este trabalho apresenta um sistema chamado DASE cujo objetivo é prover a agentes o acesso a dados distribuídos de forma simples e transparente, ou seja, independentemente da complexidade que o ambiente dos agentes possui e das peculiaridades do Sistema de Banco de Dados Distribuído. O DASE suporta qualquer Sistema Gerenciador de Banco de Dados, seja ele centralizado ou distribuído, desde que o mesmo esteja em conformidade com o JDBC. Além disso, oferece recursos importantes como controle de concorrência, suporte a ambientes de dados simultâneos e uso de sentenças de acesso a dados pré-definidas e parametrizadas. Todos os aspectos mais importantes analisados durante o projeto deste sistema estão descritos neste trabalho, evidenciando e justificando o porquê de cada decisão que certamente refletiram no funcionamento e comportamento do DASE. O sistema foi implementado de acordo com o seu projeto, resultando em uma versão funcional e estável, o que foi comprovado através de seu uso em um projeto que envolvia sistemas multiagentes e controle de tráfego aéreo. Além disso, alguns testes de análise de desempenho considerando cenários variados foram realizados. / Multi-agent systems must offer the needed resources to allow their agents to interact and to reach their goals. 
An example of a resource is a set of data stored in any kind of resource manager, such as a database management system. Data access must be possible even if the data is distributed, a characteristic that is also present in multi-agent systems. Thus, this work describes a system called DASE whose objective is to provide agents with distributed data access in a simple and transparent way, in other words, hiding the complexity of the agent environment and the peculiarities of distributed database systems. DASE supports any database management system, centralized or distributed, that complies with JDBC (Java Database Connectivity). In addition, it offers important features, such as concurrency control, simultaneous data environments and predefined, parameterized data access statements. All the important aspects analyzed during the design of DASE are described, explaining and justifying every decision that shaped DASE's functions and behavior. The system was implemented following its design, resulting in a functional and stable version, which was verified through its adoption in a project based on multi-agent systems and air traffic control. In addition, a number of performance tests were carried out across different scenarios. Banco de dados distribuídos Concurrency control Controle de concorrência Distributed data base Distributed systems Multi-agent systems Sistemas distribuídos Sistemas multiagentes
28	Integração entre sistema multi-agentes e sistemas de banco de dados distribuídos. / Integration between multi-agent systems and distributed data base systems. Fábio Silva Carvalho 26 June 2008 (has links) Sistemas multi-agentes devem oferecer recursos suficientes para que seus agentes possam interagir de maneira satisfatória e atingir seus objetivos. Um exemplo de recurso é um conjunto de dados armazenados em algum tipo de mecanismo de persistência, como um sistema gerenciador de banco de dados. O acesso a dados deve ser possível mesmo que eles estejam distribuídos, fato inclusive que também caracteriza os sistemas multi-agentes. Assim, este trabalho apresenta um sistema chamado DASE cujo objetivo é prover a agentes o acesso a dados distribuídos de forma simples e transparente, ou seja, independentemente da complexidade que o ambiente dos agentes possui e das peculiaridades do Sistema de Banco de Dados Distribuído. O DASE suporta qualquer Sistema Gerenciador de Banco de Dados, seja ele centralizado ou distribuído, desde que o mesmo esteja em conformidade com o JDBC. Além disso, oferece recursos importantes como controle de concorrência, suporte a ambientes de dados simultâneos e uso de sentenças de acesso a dados pré-definidas e parametrizadas. Todos os aspectos mais importantes analisados durante o projeto deste sistema estão descritos neste trabalho, evidenciando e justificando o porquê de cada decisão que certamente refletiram no funcionamento e comportamento do DASE. O sistema foi implementado de acordo com o seu projeto, resultando em uma versão funcional e estável, o que foi comprovado através de seu uso em um projeto que envolvia sistemas multiagentes e controle de tráfego aéreo. Além disso, alguns testes de análise de desempenho considerando cenários variados foram realizados. / Multi-agent systems must offer the needed resources to allow their agents to interact and to reach their goals. 
An example of a resource is a set of data stored in any kind of resource manager, such as a database management system. Data access must be possible even if the data is distributed, a characteristic that is also present in multi-agent systems. Thus, this work describes a system called DASE whose objective is to provide agents with distributed data access in a simple and transparent way, in other words, hiding the complexity of the agent environment and the peculiarities of distributed database systems. DASE supports any database management system, centralized or distributed, that complies with JDBC (Java Database Connectivity). In addition, it offers important features, such as concurrency control, simultaneous data environments and predefined, parameterized data access statements. All the important aspects analyzed during the design of DASE are described, explaining and justifying every decision that shaped DASE's functions and behavior. The system was implemented following its design, resulting in a functional and stable version, which was verified through its adoption in a project based on multi-agent systems and air traffic control. In addition, a number of performance tests were carried out across different scenarios. Banco de dados distribuídos Controle de concorrência Sistemas distribuídos Sistemas multiagentes Concurrency control Distributed data base Distributed systems Multi-agent systems
29	Distributed knowledge sharing and production through collaborative e-Science platforms Gaignard, Alban 15 March 2013 (has links) (PDF) This thesis addresses the issues of coherent distributed knowledge production and sharing in the Life-science area. In spite of the continuously increasing computing and storage capabilities of computing infrastructures, the management of massive scientific data through centralized approaches became inappropriate, for several reasons: (i) they do not guarantee the autonomy property of data providers, constrained, for either ethical or legal concerns, to keep the control over the data they host, (ii) they do not scale and adapt to the massive scientific data produced through e-Science platforms. In the context of the NeuroLOG and VIP Life-science collaborative platforms, we address on one hand, distribution and heterogeneity issues underlying, possibly sensitive, resource sharing ; and on the other hand, automated knowledge production through the usage of these e-Science platforms, to ease the exploitation of the massively produced scientific data. We rely on an ontological approach for knowledge modeling and propose, based on Semantic Web technologies, to (i) extend these platforms with efficient, static and dynamic, transparent federated semantic querying strategies, and (ii) to extend their data processing environment, from both provenance information captured at run-time and domain-specific inference rules, to automate the semantic annotation of ''in silico'' experiment results. The results of this thesis have been evaluated on the Grid'5000 distributed and controlled infrastructure. 
They contribute to addressing three of the main challenging issues faced in the area of computational science platforms through (i) a model for secured collaborations and a distributed access control strategy allowing for the setup of multi-centric studies while still considering competitive activities, (ii) semantic experiment summaries, meaningful from the end-user perspective, aimed at easing the navigation into massive scientific data resulting from large-scale experimental campaigns, and (iii) efficient distributed querying and reasoning strategies, relying on Semantic Web standards, aimed at sharing capitalized knowledge and providing connectivity towards the Web of Linked Data. Computer Science/Other Scientific workflows Semantic web services Web of linked data Federated knowledge bases Distributed data integration E-Science E-Health
30	Web application development with .NET : 3-tier architecture Dhali, Salle January 2012 (has links) The purpose of this project is to develop a Web application for the Student Union of Mid Sweden University using the modern and comprehensive Microsoft .NET framework platform architecture. At present, the existing web application is divided into several modules which are built with a server-side scripting language and an open source database. The customer would like to develop the entire web application using Microsoft development tools and technologies in order to determine the possible benefits in terms of cost, maintenance, flexibility and security, and also in terms of user-friendly interaction for all parties involved. The primary aim of the project is to start building a bookstore module for the Student Union, which is responsible for selling literature to the students at the University. The module will also be integrated with a database system in which an administrator, a member of staff working in the Student Union, will be able to add a new book when it arrives and also update or delete it later if necessary. In addition, each book's details belong to a certain category that is viewable by the students. The other part of this project aims at finding a pattern similar to the bookstore module in which ordinary users can authenticate themselves against a database, add their curriculum vitae data, and update it at a later stage as required. Human-computer interaction ASP.NET .Net C# SQL ADO.NET N-tier distributed data architecture Computer and systems science Data- och systemvetenskap
