1 |
Mapping parallelism to heterogeneous processors
Chandramohan, Kiran (January 2016)
Most embedded devices are based on heterogeneous Multiprocessor System on Chips (MPSoCs). These contain a variety of processors like CPUs, micro-controllers, DSPs, GPUs and specialised accelerators. The heterogeneity of these systems helps in achieving good performance and energy efficiency but makes programming inherently difficult. There is no single programming language or runtime to program such platforms. This thesis makes three contributions to these problems. First, it presents a framework that allows code in Single Program Multiple Data (SPMD) form to be mapped to a heterogeneous platform. The mapping space is explored, and it is shown that the best mapping depends on the metric used. Next, a compiler framework is presented which bridges the gap between the high-level programming model of OpenMP and the heterogeneous resources of MPSoCs. It takes OpenMP programs and generates code which runs on all processors. It delivers programming ease while exploiting heterogeneous resources. Finally, a compiler-based approach to runtime power management for heterogeneous cores is presented. Given an externally provided budget, the approach generates heterogeneous, partitioned code that attempts to give the best performance within that budget.
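The observation that the best mapping depends on the metric can be illustrated with a minimal Python sketch. It is not the framework from the thesis: the `Mapping` record, the candidate names, the metric table, and every number below are hypothetical, chosen only to show that runtime, energy, and a power budget can each favour a different mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    name: str        # hypothetical label for how the SPMD work is split
    time_s: float    # made-up runtime in seconds
    energy_j: float  # made-up energy in joules


# Illustrative values only; these are NOT measurements from the thesis.
CANDIDATES = [
    Mapping("cpu-only", time_s=4.0, energy_j=8.0),
    Mapping("cpu+gpu", time_s=1.5, energy_j=12.0),
    Mapping("cpu+dsp", time_s=2.5, energy_j=5.0),
]

# Each metric maps a candidate to a cost; lower is better.
METRICS = {
    "time": lambda m: m.time_s,
    "energy": lambda m: m.energy_j,
    "energy_delay": lambda m: m.energy_j * m.time_s,
}


def best_mapping(candidates, metric):
    """Return the candidate with the lowest cost under the named metric."""
    return min(candidates, key=METRICS[metric])


def best_within_power_budget(candidates, budget_w):
    """Return the fastest candidate whose average power fits the budget, or None."""
    feasible = [m for m in candidates if m.energy_j / m.time_s <= budget_w]
    return min(feasible, key=lambda m: m.time_s) if feasible else None
```

With these made-up numbers, minimising runtime selects `cpu+gpu` while minimising energy selects `cpu+dsp`, and a 3 W budget rules out the fastest candidate.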
|
2 |
Coordinated power management in heterogeneous processors
Paul, Indrani (08 June 2015)
164 pages
Directed by Dr. Sudhakar Yalamanchili
With the end of Dennard scaling, the scaling of device feature size by itself no longer guarantees sustaining the performance improvement predicted by Moore’s Law. As industry moves to increasingly small feature sizes, performance scaling will become dominated by the physics of the computing environment and in particular by the transient behavior of interactions between power delivery, power management and thermal fields. Consequently, performance scaling must be improved by managing interactions between physical properties, which we refer to as processor physics, and system level performance metrics, thereby improving the overall efficiency of the system.
The industry shift towards heterogeneous computing is in large part motivated by energy efficiency. While such tightly coupled systems benefit from reduced latency and improved performance, they also give rise to new management challenges due to phenomena such as physical asymmetry in thermal and power signatures between the diverse elements and functional asymmetry in performance. Power-performance tradeoffs in heterogeneous processors are determined by coupled behaviors between major components due to i) on-die integration, ii) the programming model and iii) processor physics. To this end, this thesis demonstrates the need for coordinated management of functional and physical resources of a heterogeneous system across all major compute and memory elements. It shows that the interactions among performance, power delivery and different types of coupling phenomena are not an artifact of an architecture instance, but are fundamental to the operation of many-core and heterogeneous architectures. Managing such coupling effects is a central focus of this dissertation. This awareness has the potential to exert significant influence over the design of future power and performance management algorithms.
The high-level contributions of this thesis are i) in-depth examination of characteristics and performance demands of emerging applications using hardware measurements and analysis from state-of-the-art heterogeneous processors and high-performance GPUs, ii) analysis of the effects of processor physics such as power and thermals on system level performance, iii) identification of a key set of run-time metrics that can be used to manage these effects, and iv) development and detailed evaluation of online coordinated power management techniques to optimize system level global metrics in heterogeneous CPU-GPU-memory processors.
|
3 |
Designing a Software Defined Radio to Run on a Heterogeneous Processor
Fayez, Almohanad Samir (13 May 2011)
Software Defined Radios (SDRs) are radio implementations in software, as opposed to the classic method of using discrete electronics. Considering the various classes of radio applications, ranging from mobile handsets to cellular base stations, SDRs cover a wide range of power and computational needs. As a result, computing heterogeneity, in terms of Field-Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), and General Purpose Processors (GPPs), is needed to balance the computing and power needs of such radios. Whereas SDR represents radio implementation, Cognitive Radio (CR) represents a layer of intelligence and reasoning that drives reconfiguration of an SDR to suit an application's needs. Realizing CR requires a new dimension for radios: dynamically creating new radio implementations at runtime so they can respond to changing channel and/or application needs.
This thesis explores the use of integrated GPP and DSP based processors for realizing SDR and CR applications. With such processors a GPP realizes the mechanism driving radio reconfiguration, and a DSP is used to implement the SDR by performing the signal processing necessary. This thesis discusses issues related to implementing radios in this computing environment and presents a sample solution for integrating both processors to create SDR-based applications.
The thesis presents a sample application running on a Texas Instruments (TI) OMAP3530 processor, utilizing its GPP and DSP cores, on a platform called the Beagleboard. For the application, the Center for Wireless Telecommunications' (CWT) Public Safety Cognitive Radio (PSCR) is ported, and an Android-based touch-screen interface is used for user interaction. In porting the PSCR to the Beagleboard, USB bandwidth and memory access latency were the main system bottlenecks. Latency measurements of these interfaces are presented in the thesis to highlight those bottlenecks and can be used to drive GPP/DSP-based system design using the Beagleboard. / Master of Science
|
4 |
Coordinating the Design and Management of Heterogeneous Datacenter Resources
Guevara, Marisabel Alejandra (January 2014)
Heterogeneous design presents an opportunity to improve energy efficiency but raises a challenge in management. Whereas prior work separates the two, we coordinate heterogeneous design and management. We present a market-based resource allocation mechanism that navigates the performance and power trade-offs of heterogeneous architectures. Given this management framework, we explore a design space of heterogeneous processors and show a 12x reduction in response time violations when equipping a datacenter with three processor types over a homogeneous system that consumes the same power. To better understand trade-offs in large heterogeneous design spaces, we explore dozens of design strategies and present a risk taxonomy that classifies the reasons why a deployed system may underperform relative to design targets. We propose design strategies that explicitly mitigate risk, such as a strategy that minimizes the coefficient of variation in performance. In our experiments, we find that risk-aware design accounts for more than 70% of the strategies that produce systems with the best service quality. We also present a new datacenter management mechanism that fairly allocates processors to latency-sensitive applications. Tasks express value for performance using sophisticated piecewise-linear utility functions. With fairness in market allocations, we show how datacenters can mitigate envy amongst latency-sensitive users. We quantify the price of fairness and detail efficiency-fairness trade-offs. Finally, we extend the market to fairly allocate heterogeneous processors. / Dissertation
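The risk-mitigating strategy mentioned above, minimising the coefficient of variation in performance, can be sketched briefly in Python. This is an illustration under stated assumptions, not the dissertation's method: the function names and the idea of comparing designs by performance samples collected across workload scenarios are hypothetical, and the population standard deviation is used for simplicity.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean of the samples."""
    mu = mean(samples)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(samples) / mu


def least_risky_design(perf_by_design):
    """Pick the design whose performance varies least relative to its mean.

    perf_by_design maps a (hypothetical) design name to performance samples
    observed under different workload scenarios.
    """
    return min(perf_by_design, key=lambda d: coefficient_of_variation(perf_by_design[d]))
```

Two designs with the same mean performance can differ sharply in this measure. One that delivers 10 units in every scenario has a coefficient of variation of 0, while one that alternates between 5 and 15 has a coefficient of 0.5.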
|
5 |
A Dynamically Configurable Discrete Event Simulation Framework for Many-Core System-on-Chips
Barnes, Christopher J. (January 2010)
Indiana University-Purdue University Indianapolis (IUPUI) / Industry trends indicate that many-core heterogeneous processors will be the next-generation answer to Moore's law and reduced power consumption. Thus, both academia and industry are focused on the challenges presented by many-core heterogeneous processor designs. In many cases, researchers use discrete event simulators to research and validate new computer architecture innovations. However, there is a lack of dynamically configurable discrete event simulation environments for the testing and development of many-core heterogeneous processors. To fulfill this need we present Mhetero, a retargetable framework for cycle-accurate simulation of heterogeneous many-core processors along with the cycle-accurate simulation of their associated network-on-chip communication infrastructure. Mhetero is the result of research into dynamically configurable and highly flexible simulation tools with which users are free to produce custom instruction sets and communication methods in a highly modular design environment. In this thesis we will discuss our approach to dynamically configurable discrete event simulation and present several experiments performed using the framework to exemplify how Mhetero, and similarly constructed simulators, may be used for future innovations.
|
6 |
Sistema de seleção automática de conteúdo televisivo escalável baseado em rede de sensores. / Automatic scalable TV recommendation system based on sensors network.
Foina, Aislan Gomide (02 December 2011)
Using Radio-Frequency Identification (RFID) technology, heterogeneous processor architectures, and new trends in Digital TV (DTV) and television over IP networks (IPTV), a system was developed that automatically assembles, in real time, a personalized TV program schedule based on the profile of the group of users in front of a given television set. Video-on-Demand (VoD), IPTV and DTV applications allow each viewer to watch programs at any time and thus build a personalized programming guide. An RFID system makes it possible to identify the people near the television. Combining these technologies with a profile-analysis subsystem, the data supplied by viewers when they subscribe to the service, and a middleware layer to manage the data, it is possible to build a system that automatically chooses which programs and which advertisements are shown on the TV set. The choice is based on the profiles of the viewers in front of the television at that moment and on the programs available at that instant. The RFID tags used to measure the audience were mobile phones equipped with Bluetooth, which allow viewers to be identified simultaneously by radio. The recommendation algorithm is hybrid, with content-based and collaborative components. The use of the new heterogeneous processors required the development of parallel algorithms that use SIMD instructions, accelerators and GPUs. The systems that existed in this area at the time of this research (2011) are limited to identifying users through input typed on the TV remote control, and identify only one person at a time.
The radio-based technology proposed in this research allows several people to be identified simultaneously, which requires the development of standards for a complete system based on groups of different profiles. The architecture of the resulting system is based on the Cell BE processor and on CPU+GPU architectures, so that the execution time of the algorithm is minimized.
|
8 |
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterogeneous Processors
Prasad, Ashwin (01 1900)
MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program’s execution time. Today’s computer systems have tremendous computing power in the form of traditional CPU cores and also throughput-oriented accelerators such as graphics processing units (GPUs). Thus, an approach that maps the control flow dominated regions of a MATLAB program to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this work, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. Our compiler identifies data parallel regions of the program and composes them into kernels. The kernel composition step eliminates a number of intermediate arrays which are otherwise required and also reduces the size of the scheduling and mapping problem the compiler needs to solve subsequently. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically, and the amount of data transfer needed is minimized. A heuristic technique to ensure that memory accesses on the CPU exploit locality and those on the GPU are coalesced is also presented. In order to ensure that data transfers required for dependences across basic blocks are performed, we propose a data flow analysis step and an edge-splitting strategy.
Thus our compiler automatically handles kernel composition, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfers.
Additionally, we address the problem of identifying what variables can coexist in GPU memory simultaneously under the GPU memory constraints. We formulate this problem as that of identifying maximal cliques in an interference graph. We approximate the interference graph using an interval graph and develop an efficient algorithm to solve the problem. Furthermore, we present two program transformations that optimize memory accesses on the GPU using the software managed scratchpad memory available in GPUs.
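Maximal cliques are easy to enumerate once an interference graph is treated as an interval graph, which the following Python sketch shows with a standard endpoint sweep. It is not MEGHA's algorithm. The function names, the representation of each variable as a single closed live range `(start, end)`, and the per-variable `sizes` table are assumptions made for this example.

```python
def maximal_cliques(live_ranges):
    """Enumerate maximal cliques of the interval graph of closed live ranges.

    live_ranges maps a variable name to (start, end) with start <= end.
    Two variables interfere when their ranges share at least one point.
    """
    events = []
    for name, (start, end) in live_ranges.items():
        events.append((start, 0, name))  # kind 0: starts sort before ends at a tie
        events.append((end, 1, name))    # kind 1: range ends
    events.sort()

    active, cliques, grew = set(), [], False
    for _, kind, name in events:
        if kind == 0:
            active.add(name)
            grew = True
        else:
            # The active set is maximal right before the first end that
            # follows one or more starts.
            if grew:
                cliques.append(frozenset(active))
                grew = False
            active.remove(name)
    return cliques


def peak_demand(live_ranges, sizes):
    """Largest total size of variables that are live at the same time."""
    totals = (sum(sizes[v] for v in clique) for clique in maximal_cliques(live_ranges))
    return max(totals, default=0)
```

Under these assumptions, a set of variables can coexist in GPU memory only if it is contained in some maximal clique whose total size fits the capacity, so comparing `peak_demand` against the available memory indicates whether any transfers or evictions are needed at all.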
We have prototyped the proposed compiler using the Octave system. Our experiments using this implementation show a geometric mean speedup of 12X on the GeForce 8800 GTS and 29.2X on the Tesla S1070 over baseline MATLAB execution for data parallel benchmarks. Experiments also reveal that our method provides up to 10X speedup over hand written GPUmat versions of the benchmarks. Our method also provides a speedup of 5.3X on the GeForce 8800 GTS and 13.8X on the Tesla S1070 compared to compiled MATLAB code running on the CPU.
|