31

A new hierarchical clustering model for speeding up the reconciliation of XML-based, semistructured data in mediation systems

Charnyote Pluempitiwiriyawej. January 2001 (has links) (PDF)
Thesis (Ph. D.)--University of Florida, 2001. / Title from first page of PDF file. Document formatted into pages; contains xii, 122 p.; also contains graphics. Vita. Includes bibliographical references (p. 114-121).
32

ADACORE: Achieving Energy Efficiency via Adaptive Core Morphing at Runtime

Kurella, Nithesh 23 November 2015 (has links)
Heterogeneous multicore processors offer an energy-efficient alternative to homogeneous multicores. Typically, a heterogeneous multicore is a system with more than one core in which all cores share a single ISA but differ in one or more microarchitectural configurations. A carefully designed multicore system consists of cores with diverse power and performance profiles. During execution, an application runs on the core that offers the best trade-off between performance and energy efficiency. Since the resource needs of an application may vary over time, so does the optimal core choice. Moving a thread from one core to another involves transferring the entire processor state and warming up the cache. Frequent migration leads to large performance overhead, negating any benefit of migration; infrequent migration, on the other hand, leads to missed opportunities. Reducing the overhead of migration is therefore integral to harnessing the benefits of heterogeneous multicores.

This work proposes AdaCore, a novel core architecture that pushes the heterogeneity exploited across a heterogeneous multicore into a single core. AdaCore primarily addresses resource bottlenecks in workloads: the design adaptively matches resource demands by reconfiguring on-chip resources at a fine granularity. Adaptive core morphing yields core configurations with diverse power and performance profiles within a single core through adaptive voltage, frequency, and resource reconfiguration. The proposed architecture thus provides energy savings while improving performance with low-overhead in-core reconfiguration. This thesis further compares AdaCore with a standard out-of-order core capable of Dynamic Voltage and Frequency Scaling (DVFS) designed to achieve energy efficiency. The results presented in this thesis indicate that the proposed scheme improves the performance/Watt of applications, on average, by 32% over a static out-of-order core and by 14% over DVFS, and improves IPS²/Watt by 38% over the static out-of-order core.
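For readers unfamiliar with the two efficiency metrics quoted above, the sketch below computes performance/Watt (IPS/Watt) and IPS²/Watt for two hypothetical core configurations; IPS²/Watt weights raw performance more heavily, so it rewards configurations that stay fast while saving power. All instruction counts, runtimes, and power figures here are invented for illustration.

```python
def efficiency_metrics(instructions, seconds, watts):
    """Return (IPS/Watt, IPS^2/Watt) for one execution interval."""
    ips = instructions / seconds            # instructions per second
    return ips / watts, ips ** 2 / watts

# Hypothetical big out-of-order configuration vs. a morphed, down-sized one.
big = efficiency_metrics(instructions=8e9, seconds=2.0, watts=20.0)
morphed = efficiency_metrics(instructions=8e9, seconds=2.4, watts=12.0)
print(f"big:     IPS/W = {big[0]:.2e}, IPS^2/W = {big[1]:.2e}")
print(f"morphed: IPS/W = {morphed[0]:.2e}, IPS^2/W = {morphed[1]:.2e}")
```

In this toy example the morphed configuration trades 20% of its speed for a 40% power reduction and comes out ahead on both metrics, which is the kind of trade-off the adaptive morphing aims to find at runtime.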
33

Maximizing I/O Bandwidth for Out-of-Core HPC Applications on Homogeneous and Heterogeneous Large-Scale Systems

Alturkestani, Tariq 30 September 2020 (has links)
Out-of-core simulation systems often produce a massive amount of data that cannot fit in the aggregate fast memory of the compute nodes, and they also need to read these data back for computation. As a result, I/O data movement can be a bottleneck in large-scale simulations. Advances in memory architecture have made it feasible and affordable to integrate hierarchical storage media on large-scale systems, from traditional Parallel File Systems (PFSs) through intermediate fast disk technologies (e.g., node-local and remote-shared NVMe and SSD-based Burst Buffers) up to CPU main memory and GPU High Bandwidth Memory (HBM). However, while adding more and faster storage media increases I/O bandwidth, it pressures the CPU, which becomes responsible for managing and moving data between these layers of storage. Simulation systems are thus vulnerable to being blocked by I/O operations. The Multilayer Buffer System (MLBS) proposed in this research demonstrates a general and versatile method for overlapping I/O with computation that helps ameliorate the strain on the processors through asynchronous access. The main idea consists in decoupling I/O operations from computational phases, using dedicated hardware resources to perform expensive context switches. MLBS monitors I/O traffic in each storage layer, allowing fair utilization of shared resources. By continually prefetching up and down across all hardware layers of the memory and storage subsystems, MLBS transforms the original I/O-bound behavior of the evaluated applications and shifts it closer to a memory-bound or compute-bound regime. The evaluation on the Cray XC40 Shaheen-2 supercomputer with a representative I/O-bound application, seismic inversion, shows that MLBS outperforms state-of-the-art PFSs, i.e., Lustre, Data Elevator, and DataWarp, by 6.06X, 2.23X, and 1.90X, respectively. On the IBM-built Summit supercomputer, using 2048 compute nodes equipped with a total of 12288 GPUs, MLBS achieves up to 1.4X speedup over the reference PFS-based implementation. MLBS is also demonstrated on applications from cosmology and combustion, and on classic out-of-core computational physics and linear algebra routines.
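The core MLBS idea, decoupling I/O from compute by prefetching through a buffer layer on dedicated resources, can be sketched in a few lines. The chunk size, buffer depth, and stand-in compute kernel below are invented; the real system manages multiple hardware storage layers rather than a single in-memory queue.

```python
import threading
from queue import Queue

CHUNK = 1 << 20  # 1 MiB read granularity (arbitrary choice)

def prefetcher(path, buf):
    """Producer: stream chunks from slow storage into a bounded buffer."""
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            buf.put(chunk)       # blocks when the buffer layer is full
    buf.put(None)                # end-of-stream sentinel

def process(path):
    buf = Queue(maxsize=4)       # the in-memory "buffer layer"
    threading.Thread(target=prefetcher, args=(path, buf), daemon=True).start()
    checksum = 0
    while (chunk := buf.get()) is not None:
        checksum = (checksum + sum(chunk)) % (1 << 64)  # stand-in compute
    return checksum
```

While the consumer works on one chunk, the prefetch thread is already reading the next ones, so the slow read latency is hidden behind computation, which is the behavioral shift from I/O-bound toward compute-bound described above.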
34

HETEROGENEOUS COMPUTING AND LOAD BALANCING TECHNIQUES FOR MONTE CARLO SIMULATION IN A DISTRIBUTED ENVIRONMENT

Deshpande, Isha Sanjay 08 September 2011 (has links)
No description available.
35

A MapReduce Framework for Heterogeneous Computing Architectures

Elteir, Marwa Khamis 01 June 2013 (has links)
Nowadays, an increasing number of computational systems are equipped with heterogeneous compute resources, i.e., resources of different architectures. This applies at the level of a single chip, a single node, and even supercomputers and large-scale clusters. With their impressive price-to-performance ratio as well as power efficiency compared to traditional multicore processors, graphics processing units (GPUs) have become an integral part of these systems. GPUs deliver high peak performance; however, efficiently exploiting their computational power requires exploring a multi-dimensional space of optimization methodologies, which is challenging even for the well-trained expert. The complexity of this multi-dimensional space arises not only from the traditionally well-known but arduous task of architecture-aware GPU optimization at design and compile time, but also from the partitioning and scheduling of the computation across these heterogeneous resources. Even with programming models like the Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL), the developer still needs to manage data transfer between host and device and vice versa, orchestrate the execution of several kernels, and, more arduously, optimize the kernel code. In this dissertation, we aim to deliver a transparent parallel programming environment for heterogeneous resources by leveraging the power of the MapReduce programming model and the OpenCL programming language. We propose a portable, architecture-aware framework that efficiently runs an application across heterogeneous resources, specifically AMD GPUs and NVIDIA GPUs, while hiding complex architectural details from the developer. To further enhance performance portability, we explore approaches for asynchronously and efficiently distributing the computation across heterogeneous resources. When applied to benchmarks and representative applications, our proposed framework significantly enhances performance, including up to a 58% improvement over traditional approaches to task assignment and up to a 45-fold improvement over state-of-the-art MapReduce implementations. / Ph. D.
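As background, a toy word count shows the map/shuffle/reduce structure the dissertation's framework exposes to developers; the actual framework executes the map and reduce phases as OpenCL kernels across AMD and NVIDIA GPUs, which this pure-Python sketch does not attempt to model.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):                     # map: document -> [(key, 1), ...]
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):                     # group intermediate values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):          # reduce: (key, [v, ...]) -> (key, v)
    return key, sum(values)

docs = ["the gpu runs the kernel", "the host moves data"]
grouped = shuffle(chain.from_iterable(map(map_phase, docs)))
print(dict(reduce_phase(k, v) for k, v in grouped.items()))
# {'the': 3, 'gpu': 1, 'runs': 1, 'kernel': 1, 'host': 1, 'moves': 1, 'data': 1}
```

The appeal of the model is visible even in the toy: the developer writes only the two small per-key functions, while partitioning, grouping, and scheduling are the framework's problem, which is exactly the part this dissertation moves onto heterogeneous GPUs.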
36

Load balancing of irregular parallel applications on heterogeneous computing environments

Janjic, Vladimir January 2012 (has links)
Large-scale heterogeneous distributed computing environments (such as Computational Grids and Clouds) offer the promise of access to a vast amount of computing resources at a relatively low cost. To ease application development and deployment on such complex environments, high-level parallel programming languages exist that need to be supported by sophisticated runtime systems. One of the main problems these runtime systems need to address is dynamic load balancing, ensuring that no resources in the environment are underutilised or overloaded with work. This thesis deals with the problem of obtaining good speedups for irregular applications on heterogeneous distributed computing environments. It focuses on work-stealing techniques that can be used for load balancing during the execution of irregular applications. It specifically addresses two problems that arise during work stealing: where thieves should look for work during the application execution, and how victims should respond to steal attempts. In particular, we describe and implement a new Feudal Stealing algorithm, and we describe and implement new granularity-driven task selection policies in the SCALES simulator, a work-stealing simulator developed for this thesis. In addition, we present a comprehensive evaluation of the Feudal Stealing algorithm and the granularity-driven task selection policies using simulations of a large class of regular and irregular parallel applications on a wide range of computing environments. We show how the Feudal Stealing algorithm and the granularity-driven task selection policies bring significant improvements in the speedups of irregular applications compared to state-of-the-art work-stealing algorithms. Furthermore, we present the implementation of the task selection policies in the Grid-GUM runtime system [AZ06] for Glasgow Parallel Haskell (GpH) [THLPJ98], in addition to the implementation in SCALES, together with an evaluation of this implementation on a large set of synthetic applications.
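For context, a minimal random work-stealing loop is sketched below: each worker executes tasks from its own deque and, when empty, steals the oldest task from a randomly chosen victim. The thesis's Feudal Stealing algorithm replaces this random victim selection with a smarter policy; the sketch shows only the baseline structure such policies plug into, with invented task counts.

```python
import random
from collections import deque

class Worker:
    def __init__(self, tasks):
        self.tasks = deque(tasks)

    def try_steal(self, workers):
        victims = [w for w in workers if w is not self and w.tasks]
        if victims:
            victim = random.choice(victims)            # victim-selection policy
            self.tasks.append(victim.tasks.popleft())  # steal the oldest task
            return True
        return False

def run(workers):
    done = 0
    while any(w.tasks for w in workers):
        for w in workers:
            if w.tasks:
                w.tasks.pop()          # execute newest local task
                done += 1
            else:
                w.try_steal(workers)   # idle: go look for work
    return done

print(run([Worker(range(n)) for n in (8, 0, 3)]))  # -> 11 tasks executed
```

The two questions the thesis studies map directly onto the two marked lines: which victim to choose (`random.choice` here), and how the victim responds, i.e., which and how many tasks it gives up (`popleft` of a single oldest task here).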
37

Dynamic superscalar grid for technical debt reduction

Killian, Rudi January 2018 (has links)
Thesis (MTech (Information Technology))--Cape Peninsula University of Technology, 2018. / Organizations and private individuals look to technology advancements to increase their ability to make informed decisions, with technology adoption motivated by an innate need for value generation. The technology currently heralded as the future platform for value addition is popularly termed cloud computing. The move to cloud computing, however, may conceivably accelerate the obsolescence cycle for currently retained Information Technology (IT) assets, where obsolescence means the inability to repurpose or scale an information system resource for needed functionality. The incapacity to reconfigure, grow, or shrink an IT asset, be it hardware or software, is a well-known narrative of technical debt. Emergent technical debt is professed to be all but inevitable in light of Moore's Law, as technology must inexorably advance. Of more imminent concern, however, is that major accelerating factors of technical debt are deemed to be non-holistic conceptualization and design conventions. Should management of IT assets fail to address technical debt continually, the technology platform would predictably require replacement. The unrealized value, the functional and fiscal loss, and the resultant e-waste generated by technical debt are meaningfully unattractive. Historically, the cloud milieu evolved from the grid and clustering paradigms, which allowed for information sourcing across multiple and often dispersed computing platforms. Parallel operations in distributed computing environments are inherently value-adding, as they enhance the effective use of resources and the efficiency of data handling. The predominant information processing solutions that implement parallel operations in distributed environments are abstracted constructs, styled as High Performance Computing (HPC) or High Throughput Computing (HTC). Regardless of the underlying distributed environment, the archetypes of HPC and HTC differ radically in standard implementation. The foremost contrasting factors, parallelism granularity, failover, and locality in data handling, have recently been the subject of greater academic discourse towards a possible fusion of the two technologies. In this research we uncover probable platforms of future technical debt and subsequently recommend redeployment alternatives. The suggested alternatives take the form of scalable grids, which should align with the contemporary nature of individual information processing needs. The potential of grids as efficient and effective information sourcing solutions across geographically dispersed heterogeneous systems is envisioned to reduce or delay aspects of technical debt. As part of an experimental investigation to test the plausibility of these concepts, artefacts are designed to generically implement HPC and HTC. The design features exposed by the experimental artefacts could provide insights towards an amalgamation of HPC and HTC.
38

A hardware/software codesign for the chemical reactivity of BRAMS / Um coprojeto de hardware/software para a reatividade química do BRAMS

Souza Junior, Carlos Alberto Oliveira de 05 June 2017 (has links)
Several critical human activities depend on weather forecasting, among them transportation, health, work, safety, and agriculture. Such activities require computational solutions for weather forecasting through numerical models, which must be accurate and fast for computers to process. In this project, we aim to migrate a small part of the software of Brazil's weather forecasting model, BRAMS (Brazilian developments on the Regional Atmospheric Modelling System), to a heterogeneous system composed of Intel Xeon processors coupled to a reprogrammable circuit (FPGA) via the PCIe bus. According to studies in the literature, the chemistry term of the mass continuity equation is the most computationally demanding part; it requires solving several linear systems of the form Ax = b. We therefore implemented these systems in hardware, providing a portable and highly parallel design in the OpenCL language. The OpenCL framework also allowed us to couple our circuit to BRAMS's legacy Fortran90 code. Although the development tools presented several problems, the designed solution proved viable through the exploration of parallel techniques; its performance, however, was below what we expected.
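Since the chemistry term reduces to many small dense solves of Ax = b, a plain-Python Gaussian elimination with partial pivoting illustrates the numerical kernel involved; this is only a reference sketch, not the OpenCL/FPGA implementation the thesis develops.

```python
def solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]          # partial pivoting
        for r in range(col + 1, n):                  # eliminate below pivot
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

print(solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))   # -> [0.8, 1.4]
```

The elimination loop over rows is what a hardware design can parallelize: for a fixed pivot column, every row update is independent, which is why such solvers map naturally onto the deeply pipelined, parallel fabric of an FPGA.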
39

Optimal Network Topologies and Resource Mappings for Heterogeneous Networks-on-Chip

Chung, Haera 01 January 2013 (has links)
Communication has become a bottleneck for modern microprocessors and multi-core chips because metal wires don't scale. The problem becomes worse as the number of components increases and chips become bigger. Traditional Systems-on-Chip (SoC) interconnect architectures are based on shared-bus communication, which can carry only one communication transaction at a time, limiting communication bandwidth and scalability. Networks-on-Chip (NoCs) were proposed as a promising solution for designing large and complex SoCs. The NoC paradigm provides better scalability and reusability for future SoCs; however, long-distance multi-hop communication through traditional metal wires suffers from both high latency and high power consumption. A radical solution to this challenge is to add long-range, low-power, high-bandwidth single-hop links between distant cores. The use of optical or on-chip RF wireless links has been explored in this context. However, all previous work has focused on regular mesh-based metal-wire fabrics extended with only one or two additional link types for long-distance communication. In this thesis we address the following main research questions: (1) What library of different link types would represent an optimum in the design space? (2) How would these links be used to design an application-specific NoC architecture? (3) How would applications use the resulting NoC architecture efficiently? We hypothesize that networks with a higher degree of heterogeneity, i.e., three or more link types, will improve network throughput and consume less energy than traditional NoC architectures. To verify our hypothesis and address these research challenges, we design and analyze optimal heterogeneous networks under different realistic traffic models, considering different cost and performance trade-offs in a comprehensive, technology-agnostic simulation framework that uses metaheuristic optimization techniques. As opposed to related work, our heterogeneous links can be placed anywhere in the network, which allows us to explore the entire search space. The resulting application-specific networks are then analyzed using complex-network techniques, such as community detection and small-worldness, to understand how heterogeneous link types are used to improve the NoC's performance and cost. Next, we use the application-specific networks as a target architecture for other applications. The goal is to evaluate the performance of our new NoCs on applications they were not designed for by finding optimal resource allocations. Our results show that there is an optimal number of heterogeneous link types for each set of constraints, and that networks with three or more heterogeneous link types provide significantly higher throughput along with lower energy consumption than both homogeneous-link-type and regular 2D mesh networks under three different traffic scenarios. Our evolved networks with three different technology-driven link types, namely metal wires, wireless, and optical links, provide 15% more throughput and fourteen times less energy consumption than a homogeneous-link-type network. When ten different abstract link types are used in the design, 12% more throughput and 52% less energy consumption are obtained compared to networks with three technology-driven link types. This shows that heterogeneous NoC designs based on traditional metal wires, wireless, and optical links occupy a non-optimal spot in the design space. Our results further show that heterogeneous NoCs scale up significantly better in terms of performance and cost than mesh networks. We uncovered that network communities evolve robustly and that heterogeneous link types efficiently establish inter- and intra-subnet connections depending on their link-type properties. We also show that mapping an application onto our application-specific NoC architecture provides on average 45% more throughput at 70% less energy consumption than regular 2D mesh networks. The NoCs are therefore not only good for the application they were designed for, but also for a broad range of other applications.
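To give a feel for the design-space search, the toy below assigns one of three abstract link types to each edge of a tiny network and hill-climbs on a bandwidth objective under an energy budget. The link-type properties, budget, and objective are all invented stand-ins for the thesis's realistic traffic models and metaheuristic optimizer.

```python
import random

LINK_TYPES = [(1.0, 1.0), (4.0, 2.5), (8.0, 6.0)]  # (bandwidth, energy) per link
EDGES = [(a, b) for a in range(4) for b in range(a + 1, 4)]  # tiny 4-node clique
BUDGET = 20.0                                       # total energy budget

def score(assign):
    bw = sum(LINK_TYPES[t][0] for t in assign)
    energy = sum(LINK_TYPES[t][1] for t in assign)
    # maximize bandwidth, heavily penalizing any energy over budget
    return bw - 10.0 * max(0.0, energy - BUDGET)

assign = [random.randrange(len(LINK_TYPES)) for _ in EDGES]
for _ in range(2000):                               # naive hill climbing
    i = random.randrange(len(EDGES))
    old, best = assign[i], score(assign)
    assign[i] = random.randrange(len(LINK_TYPES))
    if score(assign) < best:
        assign[i] = old                             # revert a losing move
print(assign, score(assign))
```

Even in this deliberately tiny instance the optimum is a mix of link types rather than all-fast or all-cheap links, mirroring the thesis's finding that there is an optimal degree of heterogeneity for each set of constraints.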
40

Designing heterogeneous many-core processors to provide high performance under limited chip power budget

Woo, Dong Hyuk 04 October 2010 (has links)
This thesis describes the efficient design of a future many-core processor that can provide higher performance under a limited chip power budget. To achieve this goal, the thesis first develops an analytical framework within which computer architects can estimate the achievable performance improvement of different many-core architectures given the same power budget. From this study, the thesis finds that a future many-core processor needs (1) energy-efficient parallel cores and (2) a high-performance sequential core. Based on these observations, the thesis proposes an energy-efficient, broad-purpose acceleration layer that can be snapped on top of a conventional general-purpose processor. In addition to these energy-efficient parallel cores, the thesis also proposes architectural techniques to further boost the performance of sequential computation while the parallel cores are idle. In particular, it develops low-cost architectural techniques to enhance the memory performance of a host core by utilizing the idle parallel cores. This idea is evaluated in two different system architectures: one with the aforementioned acceleration layer and the other with an emerging integrated CPU-GPU chip.
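A sketch in the spirit of Hill-and-Marty-style analytical models (not the thesis's exact framework) illustrates the kind of estimate such a framework produces: the speedup of an asymmetric many-core that pairs one large sequential core with many small parallel cores under a fixed resource budget. Here the budget is expressed in abstract "base core equivalents" (BCEs) rather than Watts, and the sqrt(r) performance scaling is an assumption of that model.

```python
def asymmetric_speedup(f, budget, big_bces):
    """f: parallel fraction of the program; a core built from r BCEs is
    assumed to deliver sqrt(r) times the performance of a 1-BCE core."""
    seq_perf = big_bces ** 0.5           # big core runs the serial part
    small_cores = budget - big_bces      # remaining budget buys 1-BCE cores
    par_perf = seq_perf + small_cores    # big core also helps in parallel phase
    return 1.0 / ((1 - f) / seq_perf + f / par_perf)

for big in (1, 4, 16, 64, 256):
    print(big, round(asymmetric_speedup(f=0.95, budget=256, big_bces=big), 1))
```

Sweeping the size of the big core shows speedup rising and then collapsing once the big core consumes the whole budget, which is the analytical argument for pairing one strong sequential core with many energy-efficient parallel cores.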
