751 |
Self-Organizing Logical-Clustering Topology for Managing Distributed Context Information. Rahman, Hasibur January 2015 (has links)
Internet of Things (IoT) is on the verge of experiencing a paradigm shift, the focus of which is the integration of people, services, context information, and things in the Connected Society, thus enabling the Internet of Everything (IoE). Hundreds of billions of things will be connected to IoT/IoE by 2020. This massive immersion of things paves the way for sensing and analysing anything, anytime and anywhere. This everywhere computing, coupled with Internet- or web-enabled services, has allowed access to a vast amount of distributed context information from heterogeneous sources. This enormous amount of context information will remain under-utilized if not properly managed. Therefore, this thesis proposes a new approach of logical-clustering, as opposed to physical clustering, aimed at enabling efficient context information management. However, applying this new approach requires many research challenges to be met. By adhering to a design science research method, this thesis addresses these challenges and proposes solutions to them. The thesis first outlines the architecture for realizing the logical-clustering topology, for which a two-tier hierarchical distributed hash table (DHT) based system architecture and a Software Defined Networking (SDN)-like approach are utilized, whereby the clustering identifications are managed on the top-level overlay (as context storage) and heterogeneous context information sources are controlled via the bottom level. The feasibility of the architecture has been demonstrated with the ns-3 simulation tool. The next challenge is to enable scalable clustering identification dissemination, for which a distributed Publish/Subscribe (PubSub) model is developed. The massive number of immersed nodes further necessitates a dynamic self-organized system. The thesis concludes by proposing new algorithms for autonomic management of IoT to bring about this self-organization. These algorithms make it possible to structure the logical-clustering topology in an organized way with minimal intervention from outside sources, and further ensure that it evolves correctly. A distributed IoT context information-sharing platform, MediaSense, is employed and extended to prove the feasibility of the dynamic PubSub model and the correctness of the self-organization algorithms, and to serve as context storage. The results are promising: a high number of PubSub messages per second and fast subscription matching. Self-organization further enabled logical-clustering to evolve correctly and delivered results on a par with the existing MediaSense for entity joining, as well as high discovery rates for non-concurrent entity joining. The growth of context information requires proper management, and being able to cluster (i.e. filter) heterogeneous context information based on context similarity can help to avoid under-utilization of resources. This thesis presents an accumulated body of work that can be understood as a step towards realizing the vision of a logical-clustering topology.
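As a rough illustration of the logical-clustering idea described above, the hypothetical Python sketch below groups heterogeneous context sources by a context-similarity key instead of by physical network position; the names, the hashing scheme, and the overlay stand-in are illustrative assumptions, not the thesis's actual design.

```python
# Hypothetical sketch of logical (rather than physical) clustering: sources
# are grouped by a similarity key derived from their context descriptors.
import hashlib

def cluster_id(context_type: str, region: str) -> str:
    """Derive a logical-clustering identification from a context descriptor."""
    return hashlib.sha1(f"{context_type}:{region}".encode()).hexdigest()[:8]

class TopLevelOverlay:
    """Toy stand-in for the top-level DHT overlay acting as context storage."""
    def __init__(self):
        self.clusters = {}  # cluster id -> set of source ids

    def join(self, source_id: str, context_type: str, region: str) -> str:
        cid = cluster_id(context_type, region)
        self.clusters.setdefault(cid, set()).add(source_id)
        return cid

overlay = TopLevelOverlay()
overlay.join("sensor-42", "temperature", "stockholm")
overlay.join("phone-7", "temperature", "stockholm")  # same logical cluster
overlay.join("sensor-9", "humidity", "stockholm")    # a different cluster
```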
|
752 |
Application-aware resource management for datacenters / Applikationsmedveten resurshantering för datacenter. Souza, Abel Pinto Coelho de January 2018 (has links)
High Performance Computing (HPC) and Cloud Computing datacenters are extensively used to steer and solve complex problems in science, engineering, and business, such as calculating correlations and making predictions. Even in a single datacenter server, there are thousands of hardware and software metrics, Key Performance Indicators (KPIs), that individually and in aggregate can give insight into the performance, robustness, and efficiency of the datacenter and the provisioned applications. At the datacenter level, the number of KPIs is even higher. The fast-growing interest in datacenter management from both the public sector and industry, together with the rapid expansion in scale and complexity of datacenter resources and the services provided on them, has made monitoring, profiling, controlling, and provisioning compute resources dynamically at runtime a challenging and complex task. Commonly, correlations of application KPIs, like response time and throughput, with resource capacities show that the runtime systems (e.g., containers or virtual machines) used to provision these applications do not utilize available resources efficiently. This reduces datacenter efficiency, which in turn results in higher operational costs and longer waiting times for results. The goal of this thesis is to develop tools and autonomic techniques for improving datacenter operations, management, and utilization, while improving, or at least minimizing the impact on, application performance. To this end, we make use of application resource descriptors to create a library that dynamically adjusts the amount of resources used, enabling elasticity for scientific workflows in HPC datacenters. For mission-critical applications, high availability is of great concern, since these services must be kept running even in the event of system failures. By modeling and correlating specific resource counters, like CPU, memory, and network utilization, with the number of runtime synchronizations, we present adaptive mechanisms to dynamically select which fault-tolerance mechanism to use. Likewise, for scientific applications we propose a hybrid extensible architecture for dual-level scheduling of data-intensive jobs in HPC infrastructures, allowing operational simplification, on-boarding of new types of applications, and greater job throughput with higher overall datacenter efficiency.
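To make the elasticity idea concrete, here is a minimal sketch, under assumed names and thresholds, of the kind of KPI-driven feedback loop such a library might run; it is not the thesis's implementation.

```python
# Hedged sketch of an application-aware resource manager's feedback loop:
# compare an application KPI (response time) against its descriptor's target
# and resize the runtime's CPU allocation accordingly. All names, targets,
# and bounds are illustrative assumptions.

def adjust_cpu(observed_ms: float, target_ms: float, current_cores: float,
               min_cores: float = 0.5, max_cores: float = 16.0) -> float:
    """Proportionally scale the CPU allocation toward the KPI target."""
    ratio = observed_ms / target_ms
    proposed = current_cores * ratio      # slower than target -> more cores
    return max(min_cores, min(max_cores, proposed))

# e.g. a container at 2 cores missing a 100 ms target by 50% grows to 3 cores
print(adjust_cpu(observed_ms=150.0, target_ms=100.0, current_cores=2.0))
```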
|
753 |
Additional Classes Effect on Model Accuracy using Transfer Learning. Kazan, Baran January 2020 (has links)
This empirical study investigates how much a model's accuracy changes when a new image class is added, by using a pre-trained model with the same labels and measuring the precision of the previous classes to observe the changes. The purpose is to determine whether transfer learning is beneficial for users who do not have enough data to train a model from scratch. The pre-trained model used to create the new model was Inception V3, which has the same labels as the eight different classes used to train the model. To test the model, classes of wild and non-wild animals were used as samples. The training algorithm was implemented in a single Python class using the PyTorch and TensorBoard libraries, where TensorBoard was used to collect and present the results. The results showed that the accuracy of the first two classes was 94.96% in training and 97.07% in validation. When training the model with a total of eight classes, the accuracy was 91.89% in training and 95.40% in validation. The precision of both classes was 100% when the model contained only the cat and dog classes. After adding six additional classes to the model, the precision changed to 95.82% for cats and 97.16% for dogs.
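For readers unfamiliar with the technique, the following is a minimal PyTorch sketch of this kind of transfer learning: freezing a pre-trained Inception V3 and replacing its classification heads for a new set of classes. The class count and layer choices are assumptions for illustration, not the study's exact setup.

```python
# Minimal transfer-learning sketch with a pre-trained Inception V3.
# Assumes torchvision >= 0.13 (for the weights enum); not the study's code.
import torch
import torch.nn as nn
from torchvision import models

model = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)

for param in model.parameters():          # freeze the pre-trained features
    param.requires_grad = False

num_classes = 8                           # e.g. cat, dog + six added classes
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new main head
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, num_classes)

# only the freshly replaced heads are trained
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)
```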
|
754 |
Onboard computer fault-tolerance detection and mitigation. Olofsson, Gustav January 2020 (has links)
The aim of this thesis is to design a software library responsible for preventing, detecting, handling, and logging faults caused by radiation in a representative flight computer system based on the Cobham Gaisler GR740 quad-core LEON4FT processor chip. The LEON processor family is commonly used in space applications; it is based on the open SPARC instruction set and has been extended with fault-tolerance features to cope with both on-chip radiation effects and upsets in external memory. Compared to previous chips, the new GR740 device introduces a new computer architecture with multiple buses, a Level-2 cache, and a memory scrubber that accelerates fault mitigation in external SDRAM memories. As the processor system design keeps getting more complex, the software must handle more hardware and new events, including central handling and logging routines for faults. The report describes the analysis performed to identify sources of faults and propose suitable mitigation techniques, the established software requirements and how they are translated into a software architecture, and how that architecture is implemented and finally demonstrated on hardware. It is also shown how the developed demonstrator application software library can be integrated into the RTEMS real-time operating system commonly used in European space missions. The results are based on executing the demonstrator; they show that the software is functionally working and validate that the performance of the scrubber matches the derived scrubbing timings. After the project is completed, the software library design will be evaluated for use in Cobham Gaisler's payload computer platform for the GOMX-5 mission. Radiation upsets will be emulated by injecting faults while running the developed API on demonstrator applications. Furthermore, implementation of the software into NASA cFS/cFE will be analysed.
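As a hedged illustration of what a derived scrubbing timing involves, the sketch below computes the time for a memory scrubber to sweep an external SDRAM once; the memory size and throughput figures are invented for the example and are not GR740 measurements.

```python
# Back-of-the-envelope sketch of one derived scrubbing timing: the time for
# a hardware scrubber to complete one full pass over external SDRAM.
# Size and rate below are illustrative assumptions only.

def full_scrub_time_s(memory_bytes: int, scrub_rate_bytes_per_s: float) -> float:
    return memory_bytes / scrub_rate_bytes_per_s

mem = 256 * 1024 * 1024    # assume 256 MiB of external SDRAM
rate = 400 * 1024 * 1024   # assume the scrubber sweeps 400 MiB/s
print(f"one full scrub pass: {full_scrub_time_s(mem, rate):.2f} s")
```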
|
755 |
Performance and power modeling of GPU systems with dynamic voltage and frequency scaling. Wang, Qiang 13 August 2020 (has links)
To address the ever-increasing demand for computing capacity, more and more heterogeneous systems have been designed to use both general-purpose and special-purpose processors. Their huge energy consumption raises new environmental concerns and challenges. Besides performance, energy efficiency is another key factor to be considered by system designers and consumers. In particular, contemporary graphics processing units (GPUs) support dynamic voltage and frequency scaling (DVFS) to balance computational performance and energy consumption. However, accurate and straightforward performance and power estimation for a given GPU kernel under different frequency settings is still lacking for real hardware, even though it is essential for determining the best frequency configuration for energy saving. In this thesis, we investigate how to improve the energy efficiency of GPU systems by accurately modeling the effects of GPU DVFS on the target GPU kernel. We also propose efficient algorithms to solve the communication contention problem in scheduling multiple distributed deep learning (DDL) jobs on GPU clusters. We introduce our studies as follows. First, we present EPPMiner, a benchmark suite for evaluating the performance, power, and energy of different heterogeneous systems. EPPMiner consists of 16 benchmark programs that cover a broad range of application domains and show great variety in how intensively they utilize the processors. We have implemented a prototype of EPPMiner that supports OpenMP, CUDA, and OpenCL, and demonstrated its usage through three showcases. The showcases justify that GPUs provide much better energy efficiency than other types of computing systems, and in particular illustrate the effectiveness of GPU DVFS on the energy efficiency of GPU applications. Second, we present a fine-grained analytical model to estimate the execution time of GPU kernels with both core and memory frequency scaling. Compared to cycle-level simulators, which are too slow to apply on real hardware, our model needs only one-off micro-benchmarks to extract a set of hardware parameters and kernel performance counters, without any source code analysis. Our experimental results show that the proposed performance model can capture the kernel performance scaling behaviors under different frequency settings and achieves decent accuracy. Third, we design a cross-benchmarking suite that simulates kernels with a wide range of instruction distributions. The synthetic kernels generated by this suite can be used for model pre-training or as supplementary training samples. We then build machine learning models to predict the execution time and runtime power of a GPU kernel under different voltage and frequency settings. Validated on three modern GPUs with a wide frequency scaling range, using a collection of 24 real application kernels, the model trained only with our cross-benchmarking suite achieves considerably accurate results. Finally, we establish a new DDL job scheduling framework that organizes DDL jobs as Directed Acyclic Graphs (DAGs) and considers communication contention between nodes. We propose an efficient job placement algorithm, Least-Workload-First- (LWF-), to balance GPU utilization and consolidate the allocated GPUs for each job, and, for scheduling the communication tasks, we propose Ada-SRSF to address the communication contention issue. Our simulation results show that LWF- achieves up to 1.59x improvement over the classical first-fit algorithms. More importantly, Ada-SRSF reduces the average job completion time by up to 36.7% compared to solutions that either avoid all communication contention or accept all of it.
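To give a flavor of frequency-scaling models in general (this is not the thesis's model), the sketch below bounds kernel time by whichever of the core or memory pipeline dominates and brute-forces an energy-minimizing frequency pair; all constants and the cubic dynamic-power term are illustrative assumptions.

```python
# Simplified DVFS sketch: execution time limited by the dominant pipeline,
# energy = power x time. Workloads, wattages, and frequencies are invented.

def exec_time(compute_work, memory_work, f_core_ghz, f_mem_ghz):
    return max(compute_work / f_core_ghz, memory_work / f_mem_ghz)

def energy(compute_work, memory_work, f_core_ghz, f_mem_ghz,
           static_w=30.0, dyn_coeff=25.0):
    t = exec_time(compute_work, memory_work, f_core_ghz, f_mem_ghz)
    power = static_w + dyn_coeff * f_core_ghz ** 3  # crude dynamic-power term
    return power * t

# brute-force the best frequency pair for an example kernel
candidates = [(fc, fm) for fc in (0.8, 1.0, 1.2, 1.4) for fm in (3.0, 3.5, 4.0)]
best = min(candidates, key=lambda p: energy(100.0, 180.0, *p))
print("energy-optimal (f_core, f_mem):", best)
```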
|
757 |
Contributions to Parallel Simulation of Equation-Based Models on Graphics Processing Units. Stavåker, Kristian January 2011 (has links)
In this thesis we investigate techniques and methods for parallel simulation of equation-based, object-oriented (EOO) Modelica models on graphics processing units (GPUs). Modelica is being developed through an international effort via the Modelica Association. With Modelica it is possible to build computationally heavy models; simulating such models, however, might take a considerable amount of time. Therefore, techniques for utilizing parallel multi-core architectures for simulation are desirable. The goal of this work is mainly automatic parallelization of equation-based models; that is, it is up to the compiler, and not the end-user modeler, to make sure that the generated code can efficiently utilize parallel multi-core architectures. Not only does the code generation process have to be altered, but the accompanying run-time system has to be modified as well. Adding explicit parallel language constructs to Modelica is also discussed to some extent. GPUs can be used for general-purpose scientific and engineering computing, and their theoretical processing power has surpassed that of CPUs due to their highly parallel structure. GPUs are, however, only good at solving certain problems of a data-parallel nature. In this thesis we relate several contributions, by the author and co-workers, to each other. We conclude that the massively parallel GPU architectures are currently only suitable for a limited set of Modelica models. This might change with future GPU generations: CUDA, for instance, the main software platform used in the thesis for general-purpose computing on graphics processing units (GPGPU), is changing rapidly, and more features such as recursion, function pointers, and C++ templates are being added; the underlying hardware architecture, however, is still optimized for data-parallelism.
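The data-parallel pattern that suits GPUs can be illustrated with a small sketch: stepping many independent instances of the same equation system in lock-step. NumPy stands in for a GPU kernel here, and the model (x' = -kx) is an invented example, not one from the thesis.

```python
# Data-parallel sketch: the same operation applied to every instance at once,
# which is the workload shape that maps well onto GPU hardware.
import numpy as np

n_instances, steps, dt = 100_000, 1000, 1e-3
k = np.random.uniform(0.5, 2.0, n_instances)  # per-instance parameter
x = np.ones(n_instances)                      # initial states

for _ in range(steps):                        # explicit Euler, all at once
    x += dt * (-k * x)                        # same update on every lane

print(x[:3])
```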
|
758 |
Integration of Ontology Alignment and Ontology Debugging for Taxonomy Networks. Ivanova, Valentina January 2014 (has links)
Semantically enabled applications, such as ontology-based search and data integration, take the semantics of the input data into account in their algorithms. Such applications often use ontologies, which model the application domains in question, as well as alignments, which provide information about the relationships between the terms in the different ontologies. The quality and reliability of the results of such applications depend directly on the correctness and completeness of the ontologies and alignments they utilize. Traditionally, ontology debugging discovers defects in ontologies and alignments and provides means for improving their correctness and completeness, while ontology alignment establishes the relationships between the terms in the different ontologies, thus addressing the completeness of alignments. This thesis focuses on the integration of ontology alignment and debugging for taxonomy networks, which are formed by taxonomies, the most widely used kind of ontologies, connected through alignments. The contributions of this thesis include the following. To the best of our knowledge, we have developed the first approach and framework that integrate ontology alignment and debugging and allow debugging of modelling defects both in the structure of the taxonomies and in their alignments. As debugging modelling defects requires domain knowledge, we have developed algorithms that employ the domain knowledge intrinsic to the network to detect and repair modelling defects. Further, a system has been implemented, and several experiments with real-world ontologies have been performed to demonstrate the advantages of our integrated ontology alignment and debugging approach. For instance, in one of the experiments with the well-known ontologies and alignment from the Anatomy track of the Ontology Alignment Evaluation Initiative 2010, 203 modelling defects (concerning incomplete and incorrect information) were discovered and repaired.
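A toy sketch of one way a network's intrinsic domain knowledge can expose a modelling defect: a subsumption that holds in one taxonomy, mapped through the alignment, is not derivable in the other and becomes a repair candidate. The data and function names are hypothetical, not the thesis's algorithms.

```python
# Hypothetical detection of a candidate missing is-a relation in a taxonomy
# network: project known is-a edges through the alignment and flag edges
# absent from the other taxonomy.

is_a_1 = {("myocardium", "muscle")}      # taxonomy 1: (child, parent) edges
is_a_2 = set()                           # taxonomy 2 lacks the mapped edge
alignment = {"myocardium": "heart_muscle_tissue", "muscle": "muscle_tissue"}

def candidate_missing_is_a(is_a_src, is_a_dst, mapping):
    for child, parent in is_a_src:
        if child in mapping and parent in mapping:
            edge = (mapping[child], mapping[parent])
            if edge not in is_a_dst:
                yield edge               # a repair candidate for taxonomy 2

print(list(candidate_missing_is_a(is_a_1, is_a_2, alignment)))
# -> [('heart_muscle_tissue', 'muscle_tissue')]
```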
|
759 |
Scalable and Efficient Tasking for Dynamic Sensor Networks. Dang, Thanh Xuan 01 January 2011 (links)
Sensor networks, including opportunistic networks of sensor-equipped smartphones as well as networks of embedded sensors, can enable a wide range of applications including environmental monitoring, smart grids, intelligent transportation, and healthcare. In most real-world applications, to meet end-user requirements, the network operator needs to define and update the sensors' tasks dynamically, such as updating the parameters for sensor data collection or updating the sensors' code. Tasking is necessary to reduce the effort of programming sensor networks. However, it is challenging due to the dynamics and scale of the networks in terms of number of nodes, number of tasks, and sensing regions. In addition, tasking sensor networks must be efficient in terms of bandwidth, latency, energy consumption, and memory usage. This dissertation identifies and addresses the problems of scalability and efficiency in tasking sensor networks. The first challenge is to define a mechanism that efficiently represents multiple tasks and sensor groups, taking into account the heterogeneity and mobility of sensors deployed over a large geographical region. Another challenge, in sensor networks in general and embedded sensor networks in particular, is to design protocols that can not only efficiently disseminate tasks but also maintain a consistent view of the task to be performed among inherently unreliable and resource-limited sensors. We believe that a scalable and efficient tasking framework can greatly benefit the development and deployment of sensor network applications. Our thesis is that decoupling the task specification from the task implementation, using a spatial two-dimensional (2D) representation of a tasking region such as a map, enables scalable, efficient, and resource-adaptive tasking over heterogeneous mobile sensor networks, and that reducing the overhead of detecting inconsistencies across nodes enables scalable and efficient task dissemination and maintenance. We present the design, implementation, and evaluation of Zoom, a multiresolution tasking framework that efficiently encapsulates multiple tasks and sensor groups for sensor networks deployed in a large geographical region. The key ideas in Zoom are (i) decoupling task specification and task implementation to support heterogeneity, (ii) using maps to represent spatial sensor groups and tasks so as to scale with the number of sensor groups and sensing regions, and (iii) using image encoding techniques to reduce the map size and adapt to sensor platforms with different resource capabilities. We also present the design, implementation, and evaluation of our protocol, DHV, which efficiently disseminates task content and ensures that all nodes in the network have up-to-date task content. It achieves this by minimizing both the redundant information in each message and the number of transmitted messages. DHV has been included in the official distribution of TinyOS, a popular operating system for embedded sensor networks. As sensor networks continue to develop, they will evolve from dedicated, single-purpose systems into open, multi-purpose, large-scale systems whose nodes are retasked frequently to support multiple applications and multiple users. We believe that this work is an important step toward enabling seamless interaction between users and sensor networks and making sensor networks more widely adopted.
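In the spirit of hash-and-version difference detection (a simplified sketch, not DHV's actual wire format), two nodes can first compare a compact hash of their whole task-version vector and exchange detailed versions only on a mismatch:

```python
# Simplified consistency check: a cheap hash comparison in the common case,
# with per-task version comparison only when the summaries differ.
import hashlib

def summary(versions: dict) -> bytes:
    items = sorted(versions.items())
    return hashlib.sha1(repr(items).encode()).digest()

def differing_tasks(mine: dict, theirs: dict):
    if summary(mine) == summary(theirs):  # common case: one cheap check
        return []
    return [t for t in mine if mine[t] != theirs.get(t)]

a = {"task1": 3, "task2": 7}
b = {"task1": 3, "task2": 6}              # task2 is stale on node b
print(differing_tasks(a, b))              # -> ['task2']
```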
|
760 |
Robot Exercise Trainer : Intended for Treating Dementia. Larsson, Hanna, Pihl, Jacob January 2020 (has links)
Worldwide, about 35.7 million people were estimated to be affected by dementia in 2010. One way to treat dementia is through exercise, but human trainers are few and expensive. Robots can be mass-produced and work at any time of the day. This report describes research done towards developing a robot exercise coach intended for treating dementia. Three main problems for people with dementia were identified: memory, attention, and motivation. By using computer vision, the robot can help count repetitions, grade exercise correctness, and make sure that the user is still paying attention. The Kinect was used for skeleton tracking to count repetitions and provide video. For motivation, motivational models and flow theory were used to design the user's interaction with the robot and make it more enjoyable and engaging, with feedback believed to be an important part of this interaction. To provide extra feedback, the skeleton tracking was also used to make the robot mimic the user. To test which combination of feedback and interaction was most enjoyable, a user study was conducted with 11 subjects, each interacting with three different systems with varying levels of feedback. After interacting, each subject filled out a survey and was interviewed. The results showed evidence that repetition counting and exercise-correctness feedback, but no mimicking, is the most enjoyable combination, with a statistically significant difference for repetition counting at the 0.05 level. Younger people found the mimicking enjoyable but still preferred the system without it, and older people found it confusing. In future systems like this, repetition counting and exercise-correctness feedback should be seen as important parts of the interaction.
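As a hypothetical illustration of repetition counting from skeleton tracking, the sketch below counts one repetition per full swing of a joint angle between a "down" and an "up" threshold; the thresholds and angle stream are invented, not the report's Kinect pipeline.

```python
# Threshold-crossing repetition counter over a stream of joint angles
# (degrees), e.g. an elbow angle derived from tracked skeleton joints.

def count_reps(angles, down=60.0, up=150.0):
    reps, below = 0, False
    for a in angles:
        if a < down:
            below = True          # limb fully bent
        elif a > up and below:
            reps += 1             # extended again: one full repetition
            below = False
    return reps

stream = [160, 140, 90, 50, 80, 155, 150, 55, 170]  # fake elbow angles
print(count_reps(stream))                            # -> 2
```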
|