• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 95
  • 13
  • 9
  • 7
  • 6
  • 5
  • 4
  • 3
  • 2
  • 2
  • 1
  • Tagged with
  • 177
  • 177
  • 54
  • 36
  • 35
  • 33
  • 31
  • 25
  • 25
  • 22
  • 22
  • 20
  • 19
  • 18
  • 18
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Parallelization of a thermal elastohydrodynamic lubricated contacts simulation using OpenMP

Alrheis, Ghassan January 2020 (has links)
Datorer med flera kärnor som delar på ett gemensamt minne (SMP) har blivit normen sedan Moore's lag har slutat gälla. För att utnyttja den prestanda flera kärnor erbjuder så behöver mjukvaruingenjören skriva programmen så att de explicit utnyttjar flera kärnor. För mindre projekt är det lätt att detta bortses från vilket skapar program som endast utnyttjar en kärna. Detta gör att det i sådana fall finns stora vinningar genom att parallellisera koden. Det här examensarbetet har förbättrat prestandan på ett beräkningstungt simuleringsprogram, skrivit att utnyttja endast en kärna, genom att hitta områden i koden som är lämpliga att parallellisera. Dessa områden har identifierats med Intel's Vtune Amplifier och utförts med OpenMP. Arbetet har också bytt ut en speciell beräkningsrutin som var särskilt krävande, speciellt för större problem. Slutresultatet är ett beräkningsprogram som ger samma resultat som det ursprungliga programmet men betydligt snabbare och med mindre datorresurser. Programmet kommer att användas i framtida forskningsprojekt. / Multi-core Shared Memory Parallel (SMP) systems became the norm ever since the performance trend prophesied by Moore’s law ended. Correctly utilizing the performance benefits these systems offer usually requires a conscious effort from the software developer’s side to enforce concurrency in the program. This is easy to disregard in small software projects and can lead to great amounts of unused potential parallelism in the produced code. This thesis attempted to improve the perfor- mance of a computationally demanding Thermal Elastohydrodynamic Lubrication (TEHL) simula- tion written in Fortran by finding such parallelism. The parallelization effort focused on the most demanding parts of the program identified using Intel’s VTune Amplifier and was implemented using OpenMP. The thesis also documents an algorithm change that led to further improvements in terms of execution time and scalability with respect to problem size. The end result is a faster, lighter and more efficient TEHL simulator that can further support the research in its domain.
82

Multi-core Architectures for Feed-forward Neural Networks

Hasan, Md. Raqibul 05 June 2014 (has links)
No description available.
83

High-Performance Sparse Matrix-Multi Vector Multiplication on Multi-Core Architecture

Singh, Kunal 15 August 2018 (has links)
No description available.
84

Multi-Core Implementation of F-16 Flight Surface Control System Using GA Based Multiple Model Reference Adaptive Control Algorithm

Wang, Xiaoru 24 May 2011 (has links)
No description available.
85

Event List Organization and Management on the Nodes of a Many-Core Beowulf Cluster

Dickman, Thomas J. 21 October 2013 (has links)
No description available.
86

ADVANCEMENT OF OPERATING SYSTEM TO MANAGE CRITICAL RESOURCES IN INCREASINGLY COMPLEX COMPUTER ARCHITECTURE

Ding, Xiaoning 28 September 2010 (has links)
No description available.
87

Transforming and Optimizing Irregular Applications for Parallel Architectures

Zhang, Jing 12 February 2018 (has links)
Parallel architectures, including multi-core processors, many-core processors, and multi-node systems, have become commonplace, as it is no longer feasible to improve single-core performance through increasing its operating clock frequency. Furthermore, to keep up with the exponentially growing desire for more and more computational power, the number of cores/nodes in parallel architectures has continued to dramatically increase. On the other hand, many applications in well-established and emerging fields, such as bioinformatics, social network analysis, and graph processing, exhibit increasing irregularities in memory access, control flow, and communication patterns. While multiple techniques have been introduced into modern parallel architectures to tolerate these irregularities, many irregular applications still execute poorly on current parallel architectures, as their irregularities exceed the capabilities of these techniques. Therefore, it is critical to resolve irregularities in applications for parallel architectures. However, this is a very challenging task, as the irregularities are dynamic, and hence, unknown until runtime. To optimize irregular applications, many approaches have been proposed to improve data locality and reduce irregularities through computational and data transformations. However, there are two major drawbacks in these existing approaches that prevent them from achieving optimal performance. First, these approaches use local optimizations that exploit data locality and regularity locally within a loop or kernel. However, in many applications, there is hidden locality across loops or kernels. Second, these approaches use "one-size-fits-all'' methods that treat all irregular patterns equally and resolve them with a single method. However, many irregular applications have complex irregularities, which are mixtures of different types of irregularities and need differentiated optimizations. To overcome these two drawbacks, we propose a general methodology that includes a taxonomy of irregularities to help us analyze the irregular patterns in an application, and a set of adaptive transformations to reorder data and computation based on the characteristics of the application and architecture. By extending our adaptive data-reordering transformation on a single node, we propose a data-partitioning framework to resolve the load imbalance problem of irregular applications on multi-node systems. Unlike existing frameworks, which use "one-size-fits-all" methods to partition the input data by a single property, our framework provides a set of operations to transform the input data by multiple properties and generates the desired data-partitioning codes by composing these operations into a workflow. / Ph. D.
88

Energy-aware Thread and Data Management in Heterogeneous Multi-Core, Multi-Memory Systems

Su, Chun-Yi 03 February 2015 (has links)
By 2004, microprocessor design focused on multicore scaling"increasing the number of cores per die in each generation "as the primary strategy for improving performance. These multicore processors typically equip multiple memory subsystems to improve data throughput. In addition, these systems employ heterogeneous processors such as GPUs and heterogeneous memories like non-volatile memory to improve performance, capacity, and energy efficiency. With the increasing volume of hardware resources and system complexity caused by heterogeneity, future systems will require intelligent ways to manage hardware resources. Early research to improve performance and energy efficiency on heterogeneous, multi-core, multi-memory systems focused on tuning a single primitive or at best a few primitives in the systems. The key limitation of past efforts is their lack of a holistic approach to resource management that balances the tradeoff between performance and energy consumption. In addition, the shift from simple, homogeneous systems to these heterogeneous, multicore, multi-memory systems requires in-depth understanding of efficient resource management for scalable execution, including new models that capture the interchange between performance and energy, smarter resource management strategies, and novel low-level performance/energy tuning primitives and runtime systems. Tuning an application to control available resources efficiently has become a daunting challenge; managing resources in automation is still a dark art since the tradeoffs among programming, energy, and performance remain insufficiently understood. In this dissertation, I have developed theories, models, and resource management techniques to enable energy-efficient execution of parallel applications through thread and data management in these heterogeneous multi-core, multi-memory systems. I study the effect of dynamic concurrent throttling on the performance and energy of multi-core, non-uniform memory access (NUMA) systems. I use critical path analysis to quantify memory contention in the NUMA memory system and determine thread mappings. In addition, I implement a runtime system that combines concurrent throttling and a novel thread mapping algorithm to manage thread resources and improve energy efficient execution in multi-core, NUMA systems. In addition, I propose an analytical model based on the queuing method that captures important factors in multi-core, multi-memory systems to quantify the tradeoff between performance and energy. The model considers the effect of these factors in a holistic fashion that provides a general view of performance and energy consumption in contemporary systems. Finally, I focus on resource management of future heterogeneous memory systems, which may combine two heterogeneous memories to scale out memory capacity while maintaining reasonable power use. I present a new memory controller design that combines the best aspects of two baseline heterogeneous page management policies to migrate data between two heterogeneous memories so as to optimize performance and energy. / Ph. D.
89

A Scalable Approach to Multi-core Prototyping

Newcomb, Jamie David 22 April 2008 (has links)
In recent years, multi-core processors and multi-processor networks have grown in popularity as a solution to the limits on increasing clock speed, rising power consumption, and the nanometer manufacturing processes. Multi-core processors and multi-processor networks are seen as the next step in the advancement of computational capabilities by way of concurrent processing. However, parallel software design is difficult due to the immaturity of scalable architectures and software development environments for multi-core hardware. How should processors effectively and quickly pass information, with as little overhead as possible? What kind of communication architecture is best suited for parallelism? How can large-scale architectures be quickly produced, verified and properly utilized by software? Using commercially available FPGA development boards, Xilinx tools and components, this thesis offers a light-weight solution to these questions for effective, low-overhead, low-latency multi-core communication and fast prototyping of multi-processor networks for scalable processor arrays. / Master of Science
90

Accelerating Hardware Simulation on Multi-cores

Nanjundappa, Mahesh 04 June 2010 (has links)
Electronic design automation (EDA) tools play a central role in bridging the productivity gap for designing complex hardware systems. However, with an increase in the size and complexity of today's design requirements, current methodologies and EDA tools are unable to effectively mitigate the further widening of productivity gap. It is estimated that testing and verification takes 2/3rd of the total development time of complex hardware systems. Functional simulation forms the main stay of testing and verification process and is the most widely used technique for testing and verification. Most of the simulation algorithms and their implementations are designed for uniprocessor systems that cannot easily leverage the parallelism in multi-core and GPU platforms. For example, logic simulation often uses levelized sequential algorithms, whereas the discrete-event simulation frameworks for Verilog, VHDL and SystemC employ concurrency in the form of multi-threading to given an illusion of the inherent parallelism present in circuits. However, the discrete-event model of computation requires a global notion of an event-queue, which makes improving its simulation performance via parallelization even more challenging. This work investigates automatic parallelization of simulation algorithms used to simulate hardware models. In particular, we focus on parallelizing the simulation of hardware designs described at the RTL using SystemC/HDL with examples to clearly describe the parallelization. Even though multi-cores and GPUs other parallelism, efficiently exploiting this parallelism with their programming models is not straightforward. To overcome this, we also focus our research on building intelligent translators to map simulation applications onto multi-cores and GPUs such that the complexity of the low-level programming models is hidden from the designers. / Master of Science

Page generated in 0.0349 seconds