51

A Modular Platform for Adaptive Heterogeneous Many-Core Architectures

Atef, Ahmed Kamaleldin 18 December 2023
Multi-/many-core heterogeneous architectures are shaping current and upcoming generations of compute-centric platforms, which are widely used in everything from mobile and wearable devices to high-performance cloud computing servers. Heterogeneous many-core architectures seek to achieve an order of magnitude higher energy efficiency as well as computing performance scaling by replacing homogeneous and power-hungry general-purpose processors with multiple heterogeneous compute units supporting multiple core types and domain-specific accelerators. The shift from homogeneous architectures to complex heterogeneous systems has been heavily adopted by chip designers and the silicon industry for more than a decade. Recent silicon chips are based on a heterogeneous SoC that combines a scalable number of heterogeneous processing units of different types (e.g. CPU, GPU, custom accelerator). This shift in the computing paradigm is associated with several system-level design challenges related to the integration of and communication between a highly scalable number of heterogeneous compute units as well as SoC peripherals and storage units. Moreover, the increasing design complexity makes the production of heterogeneous SoC chips a monopoly of big market players due to the increasing development and design costs. Accordingly, recent initiatives towards agile hardware development, based on open-source tools and microarchitectures, aim to democratize silicon chip production for academic and commercial usage. Agile hardware development aims to reduce development costs by providing an ecosystem for open-source hardware microarchitectures and hardware design processes. As a result, heterogeneous many-core development and customization become less complex and less time-consuming than with conventional design processes.
To provide a modular and agile many-core development approach, this dissertation proposes a development platform for heterogeneous and self-adaptive many-core architectures consisting of a scalable number of heterogeneous tiles that maintain design regularity while supporting heterogeneity. The proposed platform hides the integration complexities by supporting modular tile architectures for general-purpose processing cores with multiple instruction set architectures (multi-ISAs) and custom hardware accelerators. By leveraging field-programmable gate arrays (FPGAs), the self-adaptive feature of the many-core platform can be achieved using dynamic and partial reconfiguration (DPR) techniques. This dissertation realizes the proposed modular and adaptive heterogeneous many-core platform through three main contributions. The first contribution proposes and realizes a many-core architecture for heterogeneous ISAs. It provides a modular and reusable tile-based architecture for several heterogeneous ISAs based on the open-source RISC-V ISA. The modular tile-based architecture features a configurable number of processing cores with different RISC-V ISAs and different memory hierarchies. To increase the level of heterogeneity and support the integration of custom hardware accelerators, a novel hybrid memory/accelerator tile architecture is developed and realized as the second contribution. The hybrid tile is a modular and reusable tile that can be configured at run-time to operate either as a scratchpad shared memory between compute tiles or as an accelerator tile hosting local hardware accelerator logic. The hybrid tile is designed and implemented to be seamlessly integrated into the proposed tile-based platform. The third contribution deals with the self-adaptation features by providing a reconfiguration management approach that controls the DPR process internally through the (RISC-V-based) processing cores.
The internal reconfiguration process relies on a novel DPR controller, targeting the FPGA design flow for RISC-V-based SoCs, to change the types and functionalities of compute tiles at run-time.
52

Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects

Potluri, Sreeram 18 September 2014
No description available.
53

Multi-objective resource management for many-core systems

Martins, André Luís Del Mestre 19 March 2018
Many-core systems integrate several cores in a single die to provide high-performance computing in multiple market segments.
The newest technology nodes introduce restrictive power caps that result in the utilization wall (also known as dark silicon), i.e., the on-chip power dissipation prevents the use of all resources at full performance simultaneously. The workload of many-core systems includes real-time (RT) applications, which add application throughput as another constraint to meet. Also, dynamic workloads generate valleys and peaks of resource utilization over time. This scenario, complex high-performance systems subject to power and performance constraints, creates the need for multi-objective resource management (RM) able to dynamically adapt the system goals while respecting the constraints. For RT applications, related works apply a design-time analysis of the expected workload to ensure throughput constraints. To address this limitation (reliance on design-time decisions), this thesis proposes a hierarchical Runtime Energy Management (REM) for RT applications, the first work to link the execution of RT applications and RM under a power cap without design-time analysis of the application set. REM employs different mapping and DVFS (Dynamic Voltage and Frequency Scaling) heuristics for RT and non-RT tasks to save energy. Besides not considering RT applications, related works do not consider workload variation and propose single-objective RMs. To tackle this second limitation (single-objective RMs), this thesis presents a hierarchical adaptive multi-objective resource management (MORM) for many-core systems under a power cap. MORM addresses dynamic workloads with peaks and valleys of resource utilization. MORM can dynamically shift the goals to prioritize energy or performance according to the workload behavior. Both RMs (REM and MORM) are multi-objective approaches. This thesis employs the Observe-Decide-Act (ODA) paradigm as the design methodology to implement REM and MORM.
Observing consists of characterizing the cores and integrating hardware monitors to provide accurate and fast power-related information for an efficient RM. Actuation configures the system actuators at runtime to enable the RMs to follow the multi-objective decisions. Decision corresponds to REM and MORM, which share the Observing and Actuation infrastructure. REM and MORM stand out from related works regarding scalability, comprehensiveness, and accurate power and energy estimation. For REM, evaluations on many-core systems with up to 144 cores show energy savings from 15% to 28% while keeping timing violations below 2.5%. For MORM, results show it can drive applications to dynamically follow distinct objectives. Compared to a state-of-the-art RM targeting performance, MORM speeds up the workload valley by 11.56% and the workload peak by up to 49%.
54

Composite thermal capacitors for transient thermal management of multicore microprocessors

Green, Craig Elkton 06 June 2012
While 3D stacked multi-processor technology offers the potential for significant computing advantages, these architectures also face the significant challenge of small, localized hotspots with very large heat fluxes due to the placement of asymmetric cores, heterogeneous devices and performance driven layouts. In this thesis, a new thermal management solution is introduced that seeks to maximize the performance of microprocessors with dynamically managed power profiles. To mitigate the non-uniformities in chip temperature profiles resulting from the dynamic power maps, solid-liquid phase change materials (PCMs) with an embedded heat spreader network are strategically positioned near localized hotspots, resulting in a large increase in the local thermal capacitance in these problematic areas. Theoretical analysis shows that the increase in local thermal capacitance results in an almost twenty-fold increase in the time that a thermally constrained core can operate before a power gating or core migration event is required. Coupled to the PCMs are solid state coolers (SSCs) that serve as a means for fast regeneration of the PCMs during the cool down periods associated with throttling events. Using this combined PCM/SSC approach allows for devices that operate with the desirable combination of low throttling frequency and large overall core duty cycles, thus maximizing computational throughput. The impact of the thermophysical properties of the PCM on the device operating characteristics has been investigated from first principles in order to better inform the PCM selection or design process. Complementary to the theoretical characterization of the proposed thermal solution, a prototype device called a "Composite Thermal Capacitor (CTC)" that monolithically integrates micro heaters, PCMs and a spreader matrix into a Si test chip was fabricated and tested to validate the efficacy of the concept. 
A prototype CTC was shown to increase allowable device operating times by over 7X and address heat fluxes of up to ~395 W/cm². Various methods for regenerating the CTC have been investigated, including air, liquid, and solid state cooling, and operational duty cycles of over 60% have been demonstrated.
55

Coordinated system level resource management for heterogeneous many-core platforms

Gupta, Vishakha 24 August 2011
A challenge posed by future computer architectures is the efficient exploitation of their many and sometimes heterogeneous computational cores. This challenge is exacerbated by the multiple facilities for data movement and sharing across cores resident on such platforms. To answer the question of how systems software should treat heterogeneous resources, this dissertation describes an approach that (1) creates a common manageable pool for all the resources present in the platform, and then (2) provides virtual machines (VMs) with multiple 'personalities', flexibly mapped to and efficiently run on the heterogeneous underlying hardware. A VM's personality is its execution context on the different types of available processing resources usable by the VM. We provide mechanisms for making such platforms manageable and evaluate coordinated scheduling policies for mapping different VM personalities on heterogeneous hardware. Towards that end, this dissertation contributes technologies that include (1) restructuring hypervisor and system functions to create high performance environments that enable flexibility of execution and data sharing, (2) scheduling and other resource management infrastructure for supporting diverse application needs and heterogeneous platform characteristics, and (3) hypervisor level policies to permit efficient and coordinated resource usage and sharing. Experimental evaluations on multiple heterogeneous platforms, such as one comprising x86-based cores with attached NVIDIA accelerators and others with asymmetric elements on chip, demonstrate the utility of the approach and its ability to efficiently host diverse applications and resource management methods.
56

Efficient hardware and software assist for many-core performance

Oh, Jungju 13 January 2014
In recent years, the number of available cores in a processor has been increasing rapidly while the pace of performance improvement of an individual core has lagged. This has led application developers to extract more parallelism across a number of cores to make their applications run faster. However, writing a parallel program that scales well with increasing core counts is challenging. Consequently, many parallel applications suffer from performance bugs caused by scalability limiters. We expect core counts to continue to increase for the foreseeable future, and hence addressing scalability limiters is important for better performance on future hardware. In this thesis, I propose both software frameworks and hardware improvements that I developed to address three important scalability limiters: load imbalance, barrier latency and increasing on-chip packet latency. First, I introduce a debugging framework for load imbalance called LIME. The LIME framework uses profiling, statistical analysis and control flow graph analysis to automatically determine the nature of load imbalance problems and pinpoint the code where the problems are introduced. Second, I address the scalability problem of barriers, which have become costly and for which scalable performance is difficult to achieve. To address this problem, I propose a transmission line (TL) based hardware barrier support, called TLSync, that is orders of magnitude faster than a software barrier implementation while supporting many (tens of) barriers simultaneously using a single chip-spanning network. Third and lastly, I focus on the increasing packet latency in on-chip networks, and propose a hybrid interconnect where a low-latency TL based interconnect is synergistically used with a high-throughput switched interconnect. Also, a new adaptive packet steering policy is created to judiciously use the limited throughput available on the low-latency TL interconnect.
57

Approche logicielle pour améliorer la fiabilité d’applications parallèles implémentées dans des processeurs multi-cœur et many-cœur / Software approach to improve the reliability of parallel applications implemented on multi-core and many-core processors

Vargas Vallejo, Vanessa Carolina 28 April 2017
The large computing capacity, great flexibility, low power consumption, intrinsic redundancy and high performance provided by multi/many-core processors make them ideal for overcoming the new challenges in computing systems. However, the degree of integration of these devices increases their sensitivity to the effects of natural radiation. Consequently, manufacturers and industrial and university partners are working together to improve the characteristics of these devices, which would allow their use in critical embedded systems. In this context, the work done throughout this thesis aims at evaluating the impact of SEEs (Single Event Effects) on parallel applications running on multi-core and many-core processors, and at developing and validating a software approach, called N-MoRePar, to improve system reliability. The methodology used for evaluation was based on multiple case studies. The different scenarios implemented consider a wide range of system configurations in terms of multi-processing mode, programming model, memory model, and resources used. For the experimentation, two COTS devices were selected: the Freescale PowerPC P2041 quad-core built in 45nm SOI technology, and the KALRAY MPPA-256 many-core processor built in 28nm CMOS technology. The case studies were evaluated through fault injection and neutron radiation test campaigns. The obtained results serve as useful guidelines to developers for choosing the most reliable system configuration according to their requirements. Furthermore, the evaluation results of the proposed N-MoRePar fault-tolerant approach, based on redundancy and partitioning criteria, support wider use of COTS multi/many-core processors in systems that require high dependability.
58

Co-scheduling for large-scale applications : memory and resilience / Ordonnancement concurrent d’applications à grande échelle : mémoire et résilience

Pottier, Loïc 18 September 2018
This thesis explores co-scheduling problems in the context of large-scale applications with two main focuses: the memory side, in particular the cache memory, and the resilience side. With the recent advent of many-core architectures such as chip multiprocessors (CMP), the number of processing units is increasing. In this context, the benefits of co-scheduling techniques have been demonstrated. The main idea behind co-scheduling is to execute applications concurrently rather than in sequence in order to improve the global throughput of the platform. But sharing resources often generates interference. With the rising number of processing units accessing the same last-level cache, interference among co-scheduled applications becomes critical. In addition, with the increasing number of processors, the probability of a failure increases too. Resilience must be taken into account, especially for co-scheduling, because failure-prone resources might be shared between applications. On the memory side, we focus on interference in the last-level cache; one solution for reducing it is cache partitioning. Extensive simulations demonstrate the usefulness of co-scheduling when our efficient cache partitioning strategies are deployed. We also investigate the same problem on a real cache-partitioned chip multiprocessor, using the Cache Allocation Technology recently provided by Intel. Second, still on the memory side, we study how to model and schedule task graphs on new many-core architectures such as the Knights Landing architecture. These architectures offer a new level in the memory hierarchy through a new on-package high-bandwidth memory.
Current approaches usually do not take this new memory level into account, yet new scheduling algorithms and data partitioning schemes are needed to take advantage of this deep memory hierarchy. On the resilience side, we explore the impact of failures on co-scheduling performance. The co-scheduling approach has been demonstrated in a fault-free context, but large-scale computer systems are confronted by frequent failures, and resilience techniques must be employed for large applications to execute efficiently. Indeed, failures may create severe imbalance between applications and significantly degrade performance. We aim at minimizing the expected completion time of a set of co-scheduled applications in a failure-prone context by redistributing processors.
59

RA-LPEL : a Resource-Aware Light-weight Parallel Execution Layer for reactive stream processing networks on the SCC many-core tiled architecture

Karavadara, Nilesh January 2016
In computing, the available computing power has continuously fallen short of the demanded computing performance. As a consequence, performance improvement has been the main focus of processor design. However, due to a phenomenon called the 'Power Wall', it has become infeasible to build faster processors by just increasing the processor's clock speed. One of the resulting trends in hardware design is to integrate several simple and power-efficient cores on the same chip. This design shift poses challenges of its own. In the past, programs automatically became faster with increasing clock frequency, without modification. This is no longer true with many-core architectures. To achieve maximum performance, programs have to run concurrently on more than one core, which forces the general computing paradigm to become increasingly parallel to leverage maximum processing power. In this thesis, we focus on Reactive Stream Programs (RSPs). In stream processing, the system consists of computing nodes, which are connected via communication streams. These streams simplify concurrency management on modern many-core architectures due to their implicit synchronisation. An RSP is a stream processing system that implements a reactive system. RSPs work in tandem with their environment, and the load imposed by the environment may vary over time. This provides a unique opportunity to increase performance per watt. The research contribution of this thesis focuses on the design of an execution layer to run RSPs on tiled many-core architectures, using Intel's Single-chip Cloud Computer (SCC) processor as a concrete experimentation platform. Further, we have developed a Dynamic Voltage and Frequency Scaling (DVFS) technique for RSPs deployed on many-core architectures.
In contrast to many other approaches, our DVFS technique does not require the capability of controlling the power settings of individual computing elements, thus making it applicable to modern many-core architectures in which power settings can be changed only per power island. The experimental results confirm that the proposed DVFS technique can effectively improve energy efficiency, i.e. increase the performance per watt, for RSPs.
60

Enhancing Task Assignment in Many-Core Systems by a Situation Aware Scheduler

Meier, Tobias, Ernst, Michael, Frey, Andreas, Hardt, Wolfram 17 July 2017
The resource demand on embedded devices is constantly growing. This is caused by the sheer explosion of software-based functions in embedded systems, which are growing far faster than the resources of single-core and multi-core embedded processors. As one of the limitations is the computing power of the processors, we need to explore ways to use this resource more efficiently. We identified that during the run-time of embedded devices, the resource demand of the software functions changes continually depending on the device situation. To enable an embedded device to take advantage of this dynamic resource demand, the allocation of the software functions to the processor must be handled by a scheduler that is able to evaluate the resource demand of the software functions in relation to the device situation. This marks a change in embedded devices from statically defined software systems to dynamic software systems. Beyond that, we can increase the efficiency even further by extending the approach from a single device to a distributed or networked system (many-core system). However, existing approaches to dynamic resource allocation are focused on individual devices and leave the optimization potential of many-core systems untouched. Our concept will extend the existing Hierarchical Asynchronous Multi-Core Scheduler (HAMS) concept for individual devices to many-core systems. This extension introduces a dynamic situation-aware scheduler for many-core systems which takes the current workload of all devices and the system situation into account. With our approach, the resource efficiency of an embedded many-core system can be increased. The following paper will explain the architecture and the expected results of our concept.
