1 |
Netswap: Network-based Swapping for Server-Embedded Board Clusters. Errabelly, Sandeep. 05 July 2023.
Capital equipment costs and energy costs are the major cost drivers in datacenters. Prior works have explored various techniques, such as efficient scheduling algorithms and advanced power management, to maximize resource utilization and thereby reduce capital and energy costs. The project HEXO has explored heterogeneous-Instruction Set Architecture (ISA) server-embedded clusters to minimize these costs. HEXO's key idea is to migrate stateful virtual machines from high-performance x86-based servers to low-power, low-cost ARM-based embedded boards, reducing the server's resource congestion and thereby improving throughput and energy efficiency. However, embedded boards generally have significantly less onboard memory, typically in the range of 100 MB to 4 GB. Due to this limitation, applications with high memory demands cannot be migrated to embedded devices, which limits the scope of applications that can be used with heterogeneous-ISA server-embedded clusters such as HEXO. This thesis proposes Netswap, a mechanism that utilizes the server's free memory as remote memory for the embedded board. Netswap comprises three main components: the swap-out and swap-in mechanism, a bitmap-based Free Memory Manager, and the Netswap Remote Daemon. Experimental studies using micro- and macrobenchmarks reveal that Netswap improves the throughput and energy efficiency of server-embedded clusters by as much as 40% and 20%, respectively, over server-only baselines. / Master of Science / Datacenters have two major expenditures: capital costs and energy costs. The project HEXO addresses these expenditures by including small embedded devices in datacenters. These embedded devices are cheaper and consume less energy than a typical server, but they have limited onboard RAM. This memory limitation restricts HEXO's ability to run applications with higher memory demands. This thesis introduces Netswap, which solves this issue by utilizing the free memory available on the servers as secondary memory for the connected embedded devices. We discuss various design choices for efficiently implementing such a remote memory mechanism.
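The abstract only names the bitmap-based Free Memory Manager, so the following is a rough, illustrative sketch of the underlying idea: one bit per remote page slot on the server, set when the slot holds a swapped-out page. The class name, interface, and slot granularity are assumptions made for illustration and are not taken from the Netswap implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative sketch (not the thesis implementation): a bitmap tracks which
// remote page-sized slots on the server are free, one bit per slot.
class BitmapFreeMemoryManager {
public:
    explicit BitmapFreeMemoryManager(std::size_t num_slots)
        : bits_((num_slots + 63) / 64, 0), num_slots_(num_slots) {}

    // Find a free slot, mark it used, and return its index.
    std::optional<std::size_t> allocate() {
        for (std::size_t w = 0; w < bits_.size(); ++w) {
            if (bits_[w] == ~0ULL) continue;           // word fully allocated
            for (unsigned b = 0; b < 64; ++b) {
                std::size_t slot = w * 64 + b;
                if (slot >= num_slots_) return std::nullopt;
                if (!(bits_[w] & (1ULL << b))) {
                    bits_[w] |= (1ULL << b);           // mark slot as used
                    return slot;
                }
            }
        }
        return std::nullopt;                           // remote memory exhausted
    }

    // Return a slot to the free pool, e.g., when its page is swapped back in.
    void release(std::size_t slot) {
        bits_[slot / 64] &= ~(1ULL << (slot % 64));
    }

private:
    std::vector<std::uint64_t> bits_;
    std::size_t num_slots_;
};
```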
|
2 |
CRIU-RTX: Remote Thread eXecution using Checkpoint/Restore in Userspace. Noor Mohamed, Mohamed Husain. 21 July 2023.
Scaling up application performance on single high-end machines is increasingly becoming difficult due to the scalability challenges of processor interconnects, cache coherence protocols, and memory bandwidth. Significant prior work has addressed this problem by scaling out application threads across multiple nodes to exploit resources outside the single-machine boundary. Prior works have also leveraged heterogeneous instruction set architecture (ISA) systems to improve application performance as well as energy efficiency, a major cost driver in datacenters, by augmenting high-end servers with power-efficient embedded boards. Existing works, however, suffer from deployability challenges due to dependencies on the operating system or programming models that require non-trivial application modifications. We introduce CRIU-RTX, a userspace framework to scale out multi-threaded applications across multiple nodes. Integrated with HetMigrate, a prior work on migrating processes across heterogeneous-ISA systems, CRIU-RTX can suspend a subset of threads in a process and resume their execution on different nodes, including, but not limited to, heterogeneous-ISA nodes. CRIU-RTX implements distributed shared memory in userspace, thereby allowing application threads to access distributed memory transparently without any operating system dependency. Our experimental evaluations show 21% to 43% performance gains while scaling out applications across x86-64 servers, and energy efficiency gains of up to 18% while scaling out across a cluster of x86-64 server and ARM64 embedded boards. Since CRIU-RTX does not depend on operating system modifications, it can be easily deployed on a diverse set of machines, including, but not limited to, ISA-different machines running the stock Linux operating system. / Master of Science / In what is commonly referred to as "Moore's Law", Gordon Moore predicted that the number of transistors on a chip would double every two years. However, this law no longer holds true, leading to a shift in computer research and development. To meet the increasing demands for faster and cheaper servers, researchers began exploring alternative computer designs. Data centers have started adopting servers with diverse architectures to enhance the cost-to-performance ratio, resulting in heterogeneous environments. Distributed execution refers to the process of running computational tasks or executing software across multiple interconnected systems or nodes. Instead of relying on a single machine or processor, the workload is distributed among a network of computers, allowing for parallel processing and improved performance. Prior works in this direction had difficulty with adoption due to customized hardware or operating system requirements. This thesis introduces CRIU-RTX, a userspace framework to scale out application threads without operating system dependency. We implemented a distributed shared memory system in userspace to allow application threads running in scaled-out execution to access distributed memory as if they were running on the same machine. Our evaluations of CRIU-RTX show significant improvements in performance and energy efficiency.
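The abstract says CRIU-RTX implements distributed shared memory in userspace without stating the mechanism. One plausible Linux building block for that kind of design is userfaultfd, which lets a userspace handler resolve page faults. The sketch below is an assumption-laden illustration of that pattern only, not CRIU-RTX code: fetch_remote_page is a hypothetical stand-in for a network fetch, error handling is omitted, and unprivileged use of userfaultfd may require vm.unprivileged_userfaultfd=1 on newer kernels.

```cpp
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical stand-in for the network fetch a real DSM would perform;
// here we fabricate page contents locally so the sketch is self-contained.
static void fetch_remote_page(std::uint64_t /*addr*/, char* buf, std::size_t len) {
    std::memset(buf, 42, len);
}

int main() {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t len  = 16 * page;

    // Region standing in for distributed shared memory; no pages are
    // populated until the fault handler installs them.
    char* region = static_cast<char*>(mmap(nullptr, len, PROT_READ | PROT_WRITE,
                                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));

    int uffd = static_cast<int>(syscall(SYS_userfaultfd, O_CLOEXEC));
    uffdio_api api{};
    api.api = UFFD_API;
    ioctl(uffd, UFFDIO_API, &api);

    uffdio_register reg{};
    reg.range.start = reinterpret_cast<unsigned long>(region);
    reg.range.len   = len;
    reg.mode        = UFFDIO_REGISTER_MODE_MISSING;      // trap first-touch faults
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    // Application thread: touches "remote" memory as if it were local.
    std::thread app([&] {
        std::printf("app thread read %d from DSM page 3\n", region[3 * page]);
    });

    // Handler role (here: the main thread): resolve a single fault.
    pollfd pfd{uffd, POLLIN, 0};
    poll(&pfd, 1, -1);

    uffd_msg msg{};
    read(uffd, &msg, sizeof(msg));
    if (msg.event == UFFD_EVENT_PAGEFAULT) {
        std::uint64_t fault = msg.arg.pagefault.address & ~(std::uint64_t(page) - 1);
        std::vector<char> buf(page);
        fetch_remote_page(fault, buf.data(), page);       // would be an RPC in a real DSM

        uffdio_copy copy{};
        copy.dst  = fault;
        copy.src  = reinterpret_cast<unsigned long>(buf.data());
        copy.len  = page;
        copy.mode = 0;                                    // install page, wake faulting thread
        ioctl(uffd, UFFDIO_COPY, &copy);
    }

    app.join();
    munmap(region, len);
    close(uffd);
}
```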
|
3 |
Cooperating heterogeneous systems: A blackboard-based meta approach. Schwartz, David Gary. January 1993.
No description available.
|
4 |
Cross-ISA Execution Migration of Unikernels: Build Toolchain, Memory Alignment, and VM State Transfer Techniques. Mehrab, A K M Fazla. 12 December 2018.
Data centers are composed of resource-rich, expensive server machines. A server overloaded with work offloads some jobs to other servers; otherwise, its throughput drops. On the other hand, low-end embedded computers are low-power, cheap, OS-capable devices. We propose a system that uses these embedded devices alongside the servers and migrates some jobs from the server to the boards to increase throughput when the server is overloaded. Datacenters usually run workloads inside virtual machines (VMs), but these embedded boards are not capable of running full-fledged VMs. In this thesis, we propose to use lightweight VMs, called unikernels, which can run on these low-end embedded devices. Another problem is that the most efficient versions of these boards have different instruction set architectures (ISAs) than the servers. The ISA difference between the servers and the embedded boards, and the migration of the entire unikernel between them, makes the migration a non-trivial problem. This thesis proposes a way to provide unikernels with migration capabilities so that it becomes possible to offload workloads from the server to the embedded boards. This thesis describes a toolchain development process for building migratable unikernels for native applications. It also describes the alignment of memory components between unikernels for different ISAs, so that memory referencing remains valid and consistent after migration. Moreover, this thesis presents an efficient VM state transfer method so that workloads experience minimal execution-time overhead and downtime due to the migration. / Master of Science / Cloud computing providers run data centers which are composed of thousands of server machines. Servers are robust, scalable, and thus capable of executing many jobs efficiently. At the same time, they are expensive to purchase and maintain. However, these servers may become overloaded by the jobs and take more time to finish their execution. In this situation, we propose a system which runs low-cost, low-power single-board computers in the data centers to help the servers, in the considered scenarios, reduce execution time by transferring jobs from the server to the boards. Cloud providers run services inside virtual machines (VMs), which provide isolation from other services. As these boards are not capable of running traditional VMs due to their limited resources, we run lightweight VMs, called unikernels, on them. So if the servers are overloaded, some jobs running inside unikernels are offloaded to the boards. Later, when some of the server's resources are freed, these jobs are migrated back to the server. The development of this back-and-forth migration system for unikernels is composed of several modules. This thesis discusses the detailed design and implementation of a few of these modules, such as the unikernel build environment and the unikernel's execution state transfer during migration.
|
5 |
Testing of Heterogeneous Systems. Ghazi, Nauman. January 2014.
Context: A system of systems often exhibits heterogeneity, for instance in implementation, hardware, process, and verification. We define a heterogeneous system as a system comprised of multiple systems (a system of systems) in which at least one subsystem exhibits heterogeneity with respect to the others. The system-of-systems approach taken in the development of heterogeneous systems gives rise to various challenges due to continuous changes in configurations and multiple interactions between the functionally independent subsystems. The challenges posed to testing of heterogeneous systems are mainly related to interoperability, conformance, and large regression test suites. Furthermore, the inherent complexity of heterogeneous systems also poses challenges to the specification, selection, and execution of tests. Objective: The main objective of this licentiate thesis is to provide insight into the state of the art in testing heterogeneous systems. Moreover, we also aimed to investigate different test techniques used to test heterogeneous systems in industrial settings and their usefulness, as well as to identify and prioritize different information sources that can help practitioners define a generic search space for the test case selection process. Method: The findings presented in this thesis are obtained through a controlled experiment, a systematic literature review (SLR), a case study, and an exploratory survey. The purpose of the systematic literature review was to investigate the existing state of the art in testing heterogeneous systems and to identify research gaps. The results from the SLR further laid the foundation for action research conducted through an exploratory survey to compare different test techniques. We also conducted an industrial case study to investigate the relevant data sources for search space initiation to prioritize and specify test cases in the context of heterogeneous systems. Results: Based on our literature review, we found that testing of heterogeneous systems is considered a problem of integration and system testing. It has been observed that multiple interactions between the system and subsystems result in a testing challenge, especially when the configurations change continuously. It is also observed that the current literature targets the problem of testing heterogeneous systems with multiple test objectives, resulting in different test methods being employed to address domain-specific testing challenges. Using the exploratory survey, we found three test techniques to be most relevant in the context of testing heterogeneous systems. However, the technique most frequently mentioned by practitioners is manual exploratory testing, which is not a well-researched topic in the context of heterogeneous systems. Moreover, multiple information sources for the test selection process were identified through the case study and the survey. Conclusion: Companies engaged in the development of heterogeneous systems encounter huge challenges due to multiple interactions between the system and subsystems. However, the conclusions we draw from the research studies included herein show a gap between literature and industry. Search-based testing is widely discussed in the literature but is the least used test technique in industrial practice. Moreover, for the test selection process there are no frameworks that take into account all the information sources that we investigated. Therefore, to fill this gap there is a need for an optimized test selection process based on these information sources.
There is also a need to study different test techniques identified through our SLR and survey and compare these techniques on real heterogeneous systems.
|
6 |
Using Workload Characterization to Guide High Performance Graph Processing. Hassan, Mohamed Wasfy Abdelfattah. 24 May 2021.
Graph analytics represent an important application domain widely used in many fields such as web graphs, social networks, and Bayesian networks. The sheer size of graph data sets, combined with the irregular nature of the underlying problem, poses a significant challenge to the performance, scalability, and power efficiency of graph processing. With the exponential growth in the size of graph datasets, there is an ever-growing need for faster, more power-efficient graph solvers. The computational needs of graph processing can take advantage of the FPGAs' power efficiency and customizable architecture paired with CPUs' general-purpose processing power and sophisticated cache policies. CPU-FPGA hybrid systems have the potential to support performant and scalable graph solvers if both devices can work coherently to make up for each other's deficits.
This study aims to optimize graph processing on heterogeneous systems through interdisciplinary research that would impact both the graph processing community, and the FPGA/heterogeneous computing community. On one hand, this research explores how to harness the computational power of FPGAs and how to cooperatively work in a CPU-FPGA hybrid system. On the other hand, graph applications have a data-driven execution profile; hence, this study explores how to take advantage of information about the graph input properties to optimize the performance of graph solvers.
The introduction of High Level Synthesis (HLS) tools allowed FPGAs to be accessible to the masses but they are yet to be performant and efficient, especially in the case of irregular graph applications. Therefore, this dissertation proposes automated frameworks to help integrate FPGAs into mainstream computing. This is achieved by first exploring the optimization space of HLS-FPGA designs, then devising a domain-specific performance model that is used to build an automated framework to guide the optimization process. Moreover, the architectural strengths of both CPUs and FPGAs are exploited to maximize graph processing performance via an automated framework for workload distribution on the available hardware resources. / Doctor of Philosophy / Graph processing is a very important application domain, which is emphasized by the fact that many real-world problems can be represented as graph applications. For instance, looking at the internet, web pages can be represented as the graph vertices while hyper links between them represent the edges. Analyzing these types of graphs is used for web search engines, ranking websites, and network analysis among other uses. However, graph processing is computationally demanding and very challenging to optimize. This is due to the irregular nature of graph problems, which can be characterized by frequent indirect memory accesses. Such a memory access pattern is dependent on the data input and impossible to predict, which renders CPUs' sophisticated caching policies useless to performance.
With the rise of heterogeneous computing that enabled using hardware accelerators, a new research area was born, attempting to maximize performance by utilizing the available hardware devices in a heterogeneous ecosystem. This dissertation aims to improve the efficiency of utilizing such heterogeneous systems when targeting graph applications. More specifically, this research focuses on the collaboration of CPUs and FPGAs (Field Programmable Gate Arrays) in a CPU-FPGA hybrid system. Innovative ideas are presented to exploit the strengths of each available device in such a heterogeneous system, as well as addressing some of the inherent challenges of graph processing. Automated frameworks are introduced to efficiently utilize the FPGA devices, in addition to distributing and scheduling the workload across multiple devices to maximize the performance of graph applications.
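As a toy illustration of the workload-distribution idea described above, the sketch below predicts a per-device runtime for each graph kernel from simple linear models over graph properties and assigns the kernel to the faster device. The model form, coefficients, and kernel names are invented for illustration; the dissertation's actual performance models and framework are not reproduced here.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical per-device cost model: predicted time = a*vertices + b*edges + c.
struct CostModel {
    double a, b, c;
    double predict(double vertices, double edges) const {
        return a * vertices + b * edges + c;
    }
};

struct GraphKernel {
    std::string name;
    double vertices;
    double edges;
};

int main() {
    // Illustrative coefficients only; a real framework would fit these
    // from profiling runs on each device.
    CostModel cpu {2.0e-7, 1.5e-7, 0.5};   // strong caches, flexible control flow
    CostModel fpga{1.0e-7, 0.8e-7, 4.0};   // pipelined datapath, setup overhead

    std::vector<GraphKernel> kernels = {
        {"bfs",      1.0e6, 1.5e7},        // modest graph
        {"pagerank", 5.0e7, 1.0e9},        // large graph
        {"cc",       5.0e4, 2.0e5},        // tiny graph
    };

    for (const auto& k : kernels) {
        double t_cpu  = cpu.predict(k.vertices, k.edges);
        double t_fpga = fpga.predict(k.vertices, k.edges);
        std::printf("%-8s -> %s (cpu %.2fs, fpga %.2fs)\n", k.name.c_str(),
                    t_cpu <= t_fpga ? "CPU" : "FPGA", t_cpu, t_fpga);
    }
}
```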
|
7 |
Power-Performance Modeling and Adaptive Management of Heterogeneous Mobile Platforms. January 2018.
Nearly 60% of the world population uses a mobile phone, which is typically powered by a system-on-chip (SoC). While the mobile platform capabilities range widely, responsiveness, long battery life and reliability are common design concerns that are crucial to remain competitive. Consequently, state-of-the-art mobile platforms have become highly heterogeneous by combining a powerful SoC with numerous other resources, including display, memory, power management IC, battery and wireless modems. Furthermore, the SoC itself is a heterogeneous resource that integrates many processing elements, such as CPU cores, GPU, video, image, and audio processors. Therefore, CPU cores do not dominate the platform power consumption under many application scenarios.
Competitive performance requires higher operating frequencies, which lead to larger power consumption. In turn, power consumption increases the junction and skin temperatures, which have adverse effects on device reliability and user experience. As a result, allocating the power budget among the major platform resources and controlling temperature have become fundamental considerations for mobile platforms. Dynamic thermal and power management algorithms address this problem by putting a subset of the processing elements or shared resources into sleep states, or by throttling their frequencies. However, an ad hoc approach could easily cripple performance if it slows down a performance-critical processing element. Furthermore, mobile platforms run a wide range of applications with time-varying workload characteristics, unlike early generations, which supported only limited functionality. As a result, there is a need for adaptive power and performance management approaches that consider the platform as a whole, rather than focusing on a subset. Towards this need, our specific contributions include (a) a framework to dynamically select the Pareto-optimal frequency and active cores for heterogeneous CPUs, such as the ARM big.LITTLE architecture, (b) a dynamic power budgeting approach for allocating optimal power consumption to the CPU and GPU using performance sensitivity models for each PE, (c) an adaptive GPU frame-time sensitivity prediction model to aid power management algorithms, and (d) an online learning algorithm that constructs adaptive run-time models for non-stationary workloads. / Dissertation/Thesis / Doctoral Dissertation, Electrical Engineering, 2018
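As a rough illustration of contribution (a), selecting Pareto-optimal frequency and active-core configurations, the sketch below filters a set of candidate operating points down to the non-dominated ones (no other point is at least as fast and at least as frugal, and strictly better in one). The structure, fields, and numbers are hypothetical; the dissertation's framework derives such points from power-performance models at run time.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// A candidate operating point for a heterogeneous CPU cluster.
struct Config {
    int    big_cores, little_cores;
    int    freq_mhz;
    double perf;      // predicted throughput (higher is better)
    double power_w;   // predicted power (lower is better)
};

// Keep only configurations that are not dominated by any other configuration.
std::vector<Config> pareto_front(const std::vector<Config>& cfgs) {
    std::vector<Config> front;
    for (const auto& c : cfgs) {
        bool dominated = std::any_of(cfgs.begin(), cfgs.end(), [&](const Config& o) {
            return (o.perf >= c.perf && o.power_w <= c.power_w) &&
                   (o.perf >  c.perf || o.power_w <  c.power_w);
        });
        if (!dominated) front.push_back(c);
    }
    return front;
}

int main() {
    // Hypothetical measurements; a runtime governor would refresh these online.
    std::vector<Config> cfgs = {
        {4, 0, 2100, 10.0, 4.5}, {2, 2, 1800, 8.0, 2.8},
        {0, 4, 1400,  4.0, 1.1}, {4, 4, 2100, 11.0, 6.0},
        {4, 0, 1800,  9.0, 5.0},   // dominated by the 2100 MHz big-core config
    };
    for (const auto& c : pareto_front(cfgs))
        std::printf("big=%d little=%d f=%dMHz perf=%.1f power=%.1fW\n",
                    c.big_cores, c.little_cores, c.freq_mhz, c.perf, c.power_w);
}
```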
|
8 |
Scheduling hard real-time tasks in heterogeneous multiprocessor platforms subject to energy and temperature constraints. Valentin, Eduardo Bezerra. 29 September 2017.
The power wall is a barrier to improvement in the processor design process due to the power consumption of components. The production of energy-optimal systems demands knowledge of different disciplines. The usage of heterogeneous multicore platforms is appealing for recent applications, e.g., hard real-time systems, because of the potential reduction in energy consumption offered by such platforms. Hard real-time systems are present in life-critical environments, and reducing the energy consumption of such systems is an onerous process. Scheduling becomes particularly challenging when trying to improve system utilization and minimize system energy consumption and peak temperature on such platforms, especially subject to hard real-time constraints.
Therefore, we propose a study to effectively answer the pertinent research question: “How to offer users timing correctness and guarantees of hard real-time systems executed on heterogeneous multicore systems with energy and temperature constraints?” Finding optimal solutions to this question still involves several open research questions.
The main aim of this thesis is to propose an energy optimization method for hard real-time systems on heterogeneous multicore platforms, demonstrating that it is possible to compute timing correctness and guarantees in a timely manner using a sufficient and necessary condition, accounting for energy, temperature, preemption, precedence, and shared-resource constraints as well as architectural interference. The proposal is a twofold approach. First, we investigate the process of finding the optimal task-to-core and frequency-to-task assignments by applying exact schedulability tests for heterogeneous multicore platforms. Second, the outcome of the optimization analysis shall be used as a reference by the online scheduler. We believe that we have achieved the main objective of this research by combining: (a) schedulability analysis from hard real-time systems, (b) representative mathematical formulations based on integer linear programming, covering the technological characteristics of modern processors and using a classical combinatorial formulation (the Multilevel Generalized Assignment Problem), and (c) robust exact implicit-enumeration algorithmic strategies from combinatorial optimization, such as branch-and-cut and branch-and-price.
The systematic literature review on the research subject reveals that the field has open questions to be answered. For instance, to the knowledge of the author, only five works in the state-of-the-art literature deal with the problem by providing optimal solutions. Typically, the existing approaches focus on either heuristics or approximation algorithms. Also, only one work proposes evaluating schedulability in this scenario with an exact test. The typical formulation in the specialized literature is a 0/1 integer linear programming model that considers a continuous processor frequency domain and determines a single operating frequency per processor. One of the hypotheses tested in this research is that stronger feasibility analysis offers tighter bounds for the problem. We believe that this can be observed, for example, in the results produced by solvers for fixed-priority schedulers, by means of an analysis based on a comparative study. When applying less accurate schedulability tests, such as utilization-based tests, the solvers take longer to converge to optimal solutions than solvers that apply exact schedulability tests based on response-time analysis. Another hypothesis tested in this research is that practical instances of the problem are solvable to optimality in a timely fashion. We have experimented, by means of a comparative study, with finding feasible solutions for workloads for fixed-priority schedulers with up to 50 tasks distributed on four processors with seven different available frequencies. For independent hard real-time tasks scheduled using the EDF policy, we found optimal distributions of up to 90 tasks on four processors with seven different available frequencies. In both cases, the solutions were found within 30 min of execution time. Similarly, for a dependent-task workload, we optimally distributed 22 tasks from an automotive control hard real-time application on four processors with seven different available frequencies, with two shared resources and 23 precedence constraints, within 1.5 h. We consider a few hours in the design phase a price worth paying in this context.
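For a concrete sense of the kind of formulation the abstract refers to, the following is a simplified, illustrative 0/1 integer program for assigning each task to one processor and frequency level. The notation is ours and this is not the thesis's exact Multilevel Generalized Assignment formulation: here E_{i,p,f} is the energy of task i on processor p at frequency level f, C_{i,p,f} its worst-case execution time at that frequency, T_i its period, and U_p^max a utilization bound for processor p's scheduler. The thesis argues that exact response-time-based schedulability tests, rather than such utilization bounds, yield tighter results.

```latex
% Illustrative 0/1 assignment model (our notation, not the thesis's exact MGAP formulation).
% x_{i,p,f} = 1 iff task i is assigned to processor p at frequency level f.
\begin{align}
\min\;& \sum_{i \in \mathcal{T}} \sum_{p \in \mathcal{P}} \sum_{f \in \mathcal{F}_p}
        E_{i,p,f}\, x_{i,p,f}
        && \text{(total energy)} \\
\text{s.t.}\;& \sum_{p \in \mathcal{P}} \sum_{f \in \mathcal{F}_p} x_{i,p,f} = 1
        && \forall i \in \mathcal{T} \quad \text{(each task placed exactly once)} \\
& \sum_{i \in \mathcal{T}} \sum_{f \in \mathcal{F}_p} \frac{C_{i,p,f}}{T_i}\, x_{i,p,f}
        \le U_p^{\max}
        && \forall p \in \mathcal{P} \quad \text{(utilization-based schedulability)} \\
& x_{i,p,f} \in \{0, 1\}
        && \forall i \in \mathcal{T},\; p \in \mathcal{P},\; f \in \mathcal{F}_p
\end{align}
```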
|
9 |
Usage of third party components in Heterogeneous systems: An empirical study. Raavi, Jaya Krishna. January 2016.
Context: The development of complex systems of systems leads to high development cost, uncontrollable software quality, and low productivity. Thus, component-based software development has been used to improve the development effort and cost of software. Heterogeneous systems are systems of systems that consist of functionally independent sub-systems, with at least one sub-system exhibiting heterogeneity with respect to the other systems. The context of this study is to investigate the usage of third-party components in heterogeneous systems. Objectives: This study investigates the usage of third-party components in heterogeneous systems in order to accomplish the following objectives: identify different types of third-party components; identify challenges faced while integrating third-party components in heterogeneous systems; investigate differences in the test design for various third-party components; and identify what practitioners learn from various third-party components. Methods: We conducted a systematic literature review following Kitchenham's guidelines to identify the third-party components used, the challenges faced while integrating third-party components, and test design techniques. Qualitative interviews were conducted to complement and supplement the findings from the SLR and to further provide guidelines to practitioners using third-party components. The studies obtained from the SLR were analyzed in relation to the quality criteria using narrative analysis. The data obtained from the interviews were analyzed using thematic analysis. Results: 31 primary studies were obtained from the systematic literature review (SLR). 3 types of third-party components, 12 challenges, and 6 test design techniques were identified from the SLR. From the analysis of the interviews, a total of 21 challenges were identified, which complemented the SLR results. In addition, the interviews were used to investigate the test design techniques used for testing heterogeneous systems that contain third-party components. The interviews also provided 10 recommendations for practitioners using different types of third-party components in product development. Conclusions: Commercial off-the-shelf (COTS) and open software systems (OSS), rather than in-house software, were the third-party components mainly used in heterogeneous systems, according to the interview and SLR results. 21 challenges were identified from the SLR and interview results. The test design for testing heterogeneous systems containing different third-party components varies due to the non-availability of source code, the dependencies of the subsystems, and the competence of the component. From the analysis of the obtained results, the author also proposes guidelines for practitioners based on the type of third-party components used for product development.
|
10 |
Designing a Modern Skeleton Programming Framework for Parallel and Heterogeneous Systems. Ernstsson, August. January 2020.
Today's society is increasingly software-driven and dependent on powerful computer technology. Therefore, it is important that advancements in the low-level processor hardware are made available for exploitation by a growing number of programmers of differing skill levels. However, as we are approaching the end of Moore's law, hardware designers are finding new and increasingly complex ways to increase the accessible processor performance. It is getting more and more difficult to effectively target these processing resources without expert knowledge in parallelization, heterogeneous computation, communication, synchronization, and so on. To ensure that the software side can keep up, advanced programming environments and frameworks are needed to bridge the widening gap between hardware and software. One such example is the pattern-centric skeleton programming model and in particular the SkePU project. The work presented in this thesis first redesigns the SkePU framework based on modern C++ variadic template metaprogramming and state-of-the-art compiler technology. It then explores new ways to improve performance: by providing new patterns, improving the data access locality of existing ones, and using both static and dynamic knowledge about program flow. The work combines novel ideas with practical evaluation of the approach on several applications. The advancements also include the first skeleton API that allows variadic skeletons, new data containers, and finally an approach to make skeleton programming more customizable without compromising universal portability. / Additional research funders: EU H2020 project EXA2PRO (801015); SeRC.
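To give a flavor of the pattern-centric skeleton idea discussed above, the sketch below shows a minimal variadic Map skeleton in modern C++: the user supplies only the elementwise function, while the skeleton owns the iteration (and, in a real framework, would also own the choice of backend such as sequential, OpenMP, CUDA, or OpenCL behind the same interface). This is an illustrative sketch of the concept only, not the SkePU API.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Minimal illustration of a variadic Map skeleton (not the SkePU API).
template <typename Func>
class Map {
public:
    explicit Map(Func f) : f_(f) {}

    // Apply the user function elementwise over any number of input containers.
    template <typename Out, typename... In>
    void operator()(std::vector<Out>& out, const std::vector<In>&... in) const {
        for (std::size_t i = 0; i < out.size(); ++i)
            out[i] = f_(in[i]...);       // expand one element from each input
    }

private:
    Func f_;
};

template <typename Func>
Map<Func> make_map(Func f) { return Map<Func>(f); }

int main() {
    std::vector<float> a{1, 2, 3, 4}, b{10, 20, 30, 40}, out(4);

    auto saxpy = make_map([](float x, float y) { return 2.0f * x + y; });
    saxpy(out, a, b);                    // out[i] = 2*a[i] + b[i]

    for (float v : out) std::printf("%g ", v);
    std::printf("\n");
}
```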
|