41 |
Optimal Network Topologies and Resource Mappings for Heterogeneous Networks-on-Chip / Chung, Haera 01 January 2013 (has links)
Communication has become a bottleneck for modern microprocessors and multi-core chips because metal wires do not scale. The problem worsens as the number of components increases and chips grow larger. Traditional System-on-Chip (SoC) interconnect architectures are based on shared-bus communication, which can carry only one communication transaction at a time, limiting communication bandwidth and scalability. Networks-on-Chip (NoCs) were proposed as a promising solution for designing large and complex SoCs. The NoC paradigm provides better scalability and reusability for future SoCs; however, long-distance multi-hop communication over traditional metal wires suffers from both high latency and high power consumption. A radical solution to this challenge is to add long-range, low-power, high-bandwidth single-hop links between distant cores. The use of optical or on-chip RF wireless links has been explored in this context. However, all previous work has focused on regular mesh-based metal-wire fabrics augmented with only one or two additional link types for long-distance communication.
To tackle these challenges, this thesis addresses the following main research questions: (1) What library of different link types would represent an optimum in the design space? (2) How would these links be used to design an application-specific NoC architecture? (3) How would applications use the resulting NoC architecture efficiently? We hypothesize that networks with a higher degree of heterogeneity, i.e., three or more link types, will improve network throughput and consume less energy compared to traditional NoC architectures. To verify this hypothesis and address the research challenges, we design and analyze optimal heterogeneous networks under different realistic traffic models, considering different cost and performance trade-offs, within a comprehensive, technology-agnostic simulation framework that uses metaheuristic optimization techniques. As opposed to related work, our heterogeneous links can be placed anywhere in the network, which allows us to explore the entire search space. The resulting application-specific networks are then analyzed with complex-network techniques, such as community detection and small-worldness, to understand how heterogeneous link types are used to improve the NoC's performance and cost. Next, we use the application-specific networks as target architectures for other applications; by finding optimal resource allocations, we evaluate how well our new NoCs perform for applications they were not designed for.
Our results show that there is an optimal number of heterogeneous link types for each set of constraints, and that networks with three or more heterogeneous link types provide significantly higher throughput along with lower energy consumption compared to both homogeneous-link-type and regular 2D mesh networks under three different traffic scenarios. Our evolved networks with three technology-driven link types, namely metal wires, wireless, and optical links, provide 15% more throughput and consume fourteen times less energy than a homogeneous-link-type network. When ten abstract link types are used in the design, we obtain 12% more throughput and 52% less energy consumption compared to networks with the three technology-driven link types. This shows that heterogeneous NoC designs based on traditional metal wires, wireless, and optical links occupy a non-optimal spot in the overall design space. Our results further show that heterogeneous NoCs scale significantly better in terms of performance and cost than mesh networks. We found that network communities evolve robustly and that heterogeneous link types efficiently establish inter- and intra-subnet connections depending on their link-type properties. We also show that mapping an application onto our application-specific NoC architectures provides on average 45% more throughput at 70% less energy consumption compared to regular 2D mesh networks. The NoCs are therefore well suited not only to the application they were designed for, but also to a broad range of other applications.
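To make the kind of search described above concrete, the following sketch runs a toy simulated-annealing loop that places a handful of long-range links of different types on top of a 4x4 mesh. It is purely illustrative and not taken from the thesis: the link-type latency and energy parameters, the cost model, the energy budget, and the function names (anneal, avg_latency, cost) are all assumptions made for this example.

```python
# Illustrative sketch only: a toy simulated-annealing search that assigns
# heterogeneous link types to long-range node pairs of a small NoC under an
# energy budget. All parameters are invented for illustration.
import math, random

N = 16                                   # 4x4 network, nodes 0..15
LINK_TYPES = {                           # hypothetical (latency, energy) per link type
    "wire":     (3.0, 1.0),
    "wireless": (1.0, 4.0),
    "optical":  (1.0, 6.0),
}
MESH = [(i, i + 1) for i in range(N) if (i + 1) % 4] + [(i, i + 4) for i in range(N - 4)]

def avg_latency(extra):
    """Average shortest-path latency (Dijkstra) over mesh wires plus extra long-range links."""
    edges = [(u, v, LINK_TYPES["wire"][0]) for u, v in MESH] + \
            [(u, v, LINK_TYPES[t][0]) for (u, v), t in extra.items()]
    adj = {n: [] for n in range(N)}
    for u, v, w in edges:
        adj[u].append((v, w)); adj[v].append((u, w))
    total = 0.0
    for src in range(N):
        dist = {n: math.inf for n in range(N)}; dist[src] = 0.0
        todo = set(range(N))
        while todo:
            u = min(todo, key=dist.get); todo.remove(u)
            for v, w in adj[u]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
        total += sum(dist.values())
    return total / (N * (N - 1))

def cost(extra, energy_budget=40.0):
    """Latency plus a penalty for exceeding the (invented) link-energy budget."""
    energy = sum(LINK_TYPES[t][1] for t in extra.values())
    return avg_latency(extra) + max(0.0, energy - energy_budget)

def anneal(steps=2000, num_links=8):
    pairs = [(u, v) for u in range(N) for v in range(u + 2, N)]
    current = {p: random.choice(list(LINK_TYPES)) for p in random.sample(pairs, num_links)}
    cur_cost = cost(current)
    best, best_cost = dict(current), cur_cost
    for step in range(steps):
        cand = dict(current)
        del cand[random.choice(list(cand))]                      # mutate: replace one link
        free = [p for p in pairs if p not in cand]
        cand[random.choice(free)] = random.choice(list(LINK_TYPES))
        cand_cost = cost(cand)
        temp = max(1e-3, 1.0 - step / steps)                     # cooling schedule
        if cand_cost < cur_cost or random.random() < math.exp((cur_cost - cand_cost) / temp):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = dict(current), cur_cost
    return best, best_cost

if __name__ == "__main__":
    topology, c = anneal()
    mix = {t: sum(1 for lt in topology.values() if lt == t) for t in LINK_TYPES}
    print(f"best cost {c:.2f}, link-type mix {mix}")
```

A real framework, as the abstract notes, would evaluate candidate topologies under application-specific traffic and far richer cost and technology models.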
|
42 |
Designing heterogeneous many-core processors to provide high performance under limited chip power budget / Woo, Dong Hyuk 04 October 2010 (has links)
This thesis describes the efficient design of a future many-core processor that can provide higher performance under a limited chip power budget. To achieve this goal, the thesis first develops an analytical framework within which computer architects can estimate the achievable performance improvement of different many-core architectures given the same power budget. This study shows that a future many-core processor needs (1) energy-efficient parallel cores and (2) a high-performance sequential core. Based on these observations, the thesis proposes an energy-efficient, broad-purpose acceleration layer that can be snapped on top of a conventional general-purpose processor. In addition to these energy-efficient parallel cores, the thesis also proposes architectural techniques to further boost the performance of sequential computation while the parallel cores are idle. In particular, it develops low-cost architectural techniques that enhance the memory performance of a host core by utilizing the idle parallel cores. This idea is evaluated in two different system architectures: one with the aforementioned acceleration layer and the other with an emerging integrated CPU-GPU chip.
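The flavor of such an analytical framework can be sketched with a Hill/Marty-style extension of Amdahl's law combined with Pollack's rule. The sketch below is only an illustration of this class of model, not the thesis's framework; the power budget, parallel fraction, and function names are assumptions for the example.

```python
# Illustrative sketch only: a Hill/Marty-style power-constrained speedup model.
# All constants (budget, parallel fraction, Pollack's-rule performance) are
# invented for this example, not taken from the thesis.
import math

POWER_BUDGET = 64.0      # abstract power units available to the whole chip
PAR_FRACTION = 0.95      # parallelizable fraction of the workload (Amdahl's f)

def seq_perf(resources):
    """Pollack's rule: sequential performance grows roughly with sqrt(resources)."""
    return math.sqrt(resources)

def asymmetric_speedup(big_core_power, f=PAR_FRACTION, budget=POWER_BUDGET):
    """One big core plus as many single-unit small cores as the remaining budget allows."""
    small_cores = max(0.0, budget - big_core_power)
    big = seq_perf(big_core_power)
    serial_time = (1.0 - f) / big                  # serial phase runs on the big core
    parallel_time = f / (big + small_cores)        # every core helps in the parallel phase
    return 1.0 / (serial_time + parallel_time)

def symmetric_speedup(core_power, f=PAR_FRACTION, budget=POWER_BUDGET):
    """Homogeneous design: identical cores of the same size."""
    cores = budget / core_power
    perf = seq_perf(core_power)
    return 1.0 / ((1.0 - f) / perf + f / (perf * cores))

if __name__ == "__main__":
    best_asym = max(range(1, int(POWER_BUDGET)), key=asymmetric_speedup)
    best_sym = max(range(1, int(POWER_BUDGET)), key=symmetric_speedup)
    print(f"asymmetric: big-core power {best_asym}, speedup {asymmetric_speedup(best_asym):.1f}")
    print(f"symmetric:  core power {best_sym}, speedup {symmetric_speedup(best_sym):.1f}")
```

Even this toy model reproduces the two observations above: the best designs pair one larger, high-performance sequential core with many small, energy-efficient parallel cores.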
|
43 |
Managing XML data in a relational warehouse: on query translation, warehouse maintenance, and data staleness / Kanna, Rajesh. January 2001 (has links) (PDF)
Thesis (M.S.)--University of Florida, 2001. / Title from first page of PDF file. Document formatted into pages; contains x, 75 p.; also contains graphics. Vita. Includes bibliographical references (p. 71-74).
|
44 |
Shared resource management for efficient heterogeneous computing / Lee, Jaekyu 13 January 2014 (has links)
The demand for heterogeneous computing, driven by its performance and energy efficiency, has made on-chip heterogeneous chip multi-processors (HCMPs) the mainstream computing platform, as the recent trend across a wide spectrum of platforms shows, from smartphone application processors to desktop and low-end server processors. The performance of on-chip GPUs is not yet comparable to that of discrete GPU cards, but vendors are integrating ever more powerful GPUs, and this trend will continue in upcoming processors.
In this architecture, several system resources are shared between CPUs and GPUs. Sharing system resources enables easier and cheaper data transfer between CPUs and GPUs, but it also causes resource contention between cores. The resource sharing problem has existed since the homogeneous (CPU-only) chip multi-processor (CMP) was introduced; however, resource sharing in HCMPs shows different aspects because of the different nature of CPU and GPU cores. To solve the resource sharing problem in HCMPs, we consider efficient shared resource management schemes, in particular tackling the problem in the shared last-level cache and the on-chip interconnection network.
In this thesis, we propose four resource-sharing mechanisms:
First, we propose an efficient cache sharing mechanism that exploits the different characteristics of CPU and GPU cores to effectively share cache space between them. Second, we propose adaptive virtual channel partitioning for the on-chip interconnection network to isolate inter-application interference. By partitioning virtual channels between CPUs and GPUs, we can prevent interference while guaranteeing quality of service (QoS) for both core types. Third, we propose a dynamic frequency control mechanism to share system resources efficiently. When both core types are active, the degree of resource contention, as well as system throughput, is affected by the operating frequencies of the CPUs and GPUs; the proposed mechanism seeks operating frequencies for both core types that reduce resource contention while improving system throughput. Finally, we propose a second cache sharing mechanism that exploits GPU-semantic information. The programming and execution models of GPUs are more restrictive and simpler than those of CPUs, and programmers are asked to provide more information to the hardware. By exploiting these characteristics, GPUs can use the cache more energy-efficiently, and a simpler but more effective cache partitioning scheme can be enabled for HCMPs.
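As a concrete illustration of the family of techniques the first mechanism builds on (not the thesis's actual policy), the sketch below runs a conventional utility-based way-partitioning loop over invented CPU and GPU miss-rate curves; the curves, the tie-breaking rule, and the function names are assumptions for this example.

```python
# Illustrative sketch only: greedy utility-based way partitioning of a shared
# last-level cache between CPU and GPU cores. Miss-rate curves are invented.
ASSOC = 16   # ways in the shared last-level cache

# Hypothetical sampled misses-per-kilo-instruction as a function of allocated ways.
cpu_mpki = [80, 52, 38, 30, 25, 22, 20, 19, 18, 17, 17, 16, 16, 16, 15, 15, 15]
gpu_mpki = [60, 58, 57, 56, 56, 55, 55, 55, 55, 54, 54, 54, 54, 54, 54, 54, 54]

def utility(curve, ways):
    """Misses saved by growing an allocation from `ways` to `ways + 1`."""
    return curve[ways] - curve[ways + 1]

def partition(cpu_curve, gpu_curve, assoc=ASSOC):
    """Hand out ways one at a time to whichever core type saves more misses.
    GPU workloads often hide misses via massive multithreading, so ties go to the CPU."""
    cpu_ways, gpu_ways = 0, 0
    for _ in range(assoc):
        if utility(gpu_curve, gpu_ways) > utility(cpu_curve, cpu_ways):
            gpu_ways += 1
        else:
            cpu_ways += 1
    return cpu_ways, gpu_ways

if __name__ == "__main__":
    print("CPU/GPU ways:", partition(cpu_mpki, gpu_mpki))
```

With curves like these, almost all ways end up with the CPU because the GPU gains little from extra cache capacity, which is the intuition behind treating CPU and GPU cores differently when sharing the last-level cache.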
|
45 |
Accelerating Java on Embedded GPU / P. Joseph, Iype 10 March 2014 (has links)
Multicore CPUs (Central Processing Units) and GPUs (Graphics Processing Units) are omnipresent in today’s market-leading smartphones and tablets. With CPUs and GPUs getting more complex, maximizing hardware utilization is becoming problematic. The challenges faced in GPGPU (General-Purpose computing using GPU) computing on embedded platforms differ from their desktop counterparts because of memory and computational limitations. This thesis evaluates the performance and energy efficiency achieved by offloading Java applications to an embedded GPU. The existing solutions in the literature address various techniques and benefits of offloading Java to desktop- or server-grade GPUs, not to embedded GPUs. Our research is focused on providing a framework for accelerating Java programs on embedded GPUs. Our experiments were conducted on a Freescale i.MX6Q SabreLite board, which encompasses a quad-core ARM Cortex-A9 CPU and a Vivante GC2000 GPU that supports the OpenCL 1.1 Embedded Profile. We successfully accelerated Java code and reduced energy consumption by employing two approaches, namely JNI-OpenCL and JOCL, a popular Java binding for OpenCL. These approaches can easily be implemented on other platforms by embedded Java programmers to exploit the computational power of GPUs. Our results show up to an 8-fold increase in performance efficiency and a 3-fold decrease in energy consumption compared to embedded CPU-only execution of the Java program. To the best of our knowledge, this is the first work on accelerating Java on an embedded GPU.
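Both approaches ultimately wrap the same OpenCL sequence: build a kernel, copy buffers to the device, enqueue work, and read the result back. The sketch below is not from the thesis and uses the pyopencl binding only to keep the examples on this page in a single language; the JNI-OpenCL and JOCL paths issue the equivalent OpenCL 1.1 calls from Java, and the vector-add kernel is a placeholder workload.

```python
# Illustrative sketch only: the generic OpenCL offload pattern that Java bindings
# such as JOCL or a hand-written JNI-OpenCL layer wrap.
import numpy as np
import pyopencl as cl

KERNEL = """
__kernel void vadd(__global const float *a, __global const float *b, __global float *c) {
    int i = get_global_id(0);
    c[i] = a[i] + b[i];
}
"""

ctx = cl.create_some_context()            # on the SabreLite this would select the embedded GPU
queue = cl.CommandQueue(ctx)
program = cl.Program(ctx, KERNEL).build()

a = np.random.rand(1 << 20).astype(np.float32)
b = np.random.rand(1 << 20).astype(np.float32)
mf = cl.mem_flags
a_dev = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_dev = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
c_dev = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

program.vadd(queue, a.shape, None, a_dev, b_dev, c_dev)   # enqueue the kernel
c = np.empty_like(a)
cl.enqueue_copy(queue, c, c_dev)                          # read the result back
assert np.allclose(c, a + b)
```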
|
46 |
A performance evaluation of dynamic transport switching for multi-transport devices / Wang, Lei, January 2006 (has links) (PDF)
Thesis (M.S.)--Brigham Young University. Dept. of Computer Science, 2006. / Includes bibliographical references (p. 199-200).
|
47 |
Statically configured heterogeneous SMT processor / Vellore Suriyakumar, Avinankumar. January 2009 (has links)
Thesis (M.S.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Computer Science, 2009. / Includes bibliographical references.
|
48 |
Pervasive hypermedia / Anderson, Kenneth M. January 1997 (has links)
Thesis (Ph. D., Information and Computer Science)--University of California, Irvine, 1997. / Includes bibliographical references.
|
49 |
A hardware/software codesign for the chemical reactivity of BRAMS / Carlos Alberto Oliveira de Souza Junior 05 June 2017 (has links)
Several critical human activities depend on weather forecasting, among them transportation, health, work, safety, and agriculture. Such activities require computational solutions for weather forecasting through numerical models. These numerical models must be accurate and fast for computers to process. In this project, we aim at migrating a small part of the software of Brazil's weather forecasting model, BRAMS (Brazilian developments on the Regional Atmospheric Modelling System), to a heterogeneous system composed of Intel Xeon processors coupled to a reprogrammable circuit (FPGA) via the PCIe bus. According to studies in the literature, the chemistry term of the mass continuity equation is the most computationally demanding part; it requires solving several linear systems of the form Ax = b. We therefore implemented these computations in hardware, providing a portable and highly parallel design in the OpenCL language. The OpenCL framework also allowed us to couple our circuit to the BRAMS legacy code written in Fortran 90. Although the development tools presented several problems, the designed solution proved viable through the exploration of parallelism techniques; its performance, however, remained below what we expected.
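As a host-side illustration of the computation being offloaded (not the FPGA design itself), the sketch below solves a batch of small dense systems Ax = b with Gaussian elimination and partial pivoting; the system sizes, values, and function name are invented for the example.

```python
# Illustrative sketch only: the chemistry term reduces to many small dense solves
# of Ax = b. This host-side NumPy version shows the computation an OpenCL/FPGA
# kernel would accelerate; sizes and data are invented for the example.
import numpy as np

def solve_batch(A, b):
    """Solve a batch of small linear systems with Gaussian elimination and partial pivoting.
    A has shape (batch, n, n); b has shape (batch, n)."""
    A = A.astype(np.float64)
    x = b.astype(np.float64)
    batch, n, _ = A.shape
    for k in range(n):                                   # forward elimination
        piv = np.abs(A[:, k:, k]).argmax(axis=1) + k     # partial-pivot row per system
        for i in range(batch):                           # swap pivot rows
            A[i, [k, piv[i]]] = A[i, [piv[i], k]]
            x[i, [k, piv[i]]] = x[i, [piv[i], k]]
        factors = A[:, k + 1:, k] / A[:, k:k + 1, k]     # elimination factors
        A[:, k + 1:, :] -= factors[:, :, None] * A[:, k:k + 1, :]
        x[:, k + 1:] -= factors * x[:, k:k + 1]
    for k in range(n - 1, -1, -1):                       # back substitution
        x[:, k] = (x[:, k] - (A[:, k, k + 1:] * x[:, k + 1:]).sum(axis=1)) / A[:, k, k]
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 8, 8)) + 8 * np.eye(8)   # 1000 well-conditioned 8x8 systems
    b = rng.standard_normal((1000, 8))
    x = solve_batch(A, b)
    print("max residual:", np.abs(np.einsum("bij,bj->bi", A, x) - b).max())
```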
|