Global ETD Search

81	Parallelization of a thermal elastohydrodynamic lubricated contacts simulation using OpenMP Alrheis, Ghassan January 2020 Computers with multiple cores that share a common memory (SMP) have become the norm since Moore's law ceased to hold. To exploit the performance that multiple cores offer, the software engineer must write programs that explicitly use multiple cores. In smaller projects this is easily overlooked, which produces programs that use only one core. In such cases there are large gains to be had from parallelizing the code. This thesis improved the performance of a computationally heavy simulation program, written to use only one core, by finding regions of the code suitable for parallelization. These regions were identified with Intel's VTune Amplifier and parallelized with OpenMP. The work also replaced a particular computational routine that was especially demanding, particularly for larger problems. The end result is a program that gives the same results as the original but runs considerably faster and uses fewer computing resources. The program will be used in future research projects. / Multi-core Shared Memory Parallel (SMP) systems have been the norm ever since the performance trend prophesied by Moore’s law ended. Correctly utilizing the performance benefits these systems offer usually requires a conscious effort on the software developer’s part to enforce concurrency in the program. This is easy to disregard in small software projects and can leave a great deal of potential parallelism unused in the resulting code. This thesis attempted to improve the performance of a computationally demanding Thermal Elastohydrodynamic Lubrication (TEHL) simulation written in Fortran by finding such parallelism. The parallelization effort focused on the most demanding parts of the program, identified using Intel’s VTune Amplifier, and was implemented using OpenMP. 
The thesis also documents an algorithm change that led to further improvements in terms of execution time and scalability with respect to problem size. The end result is a faster, lighter and more efficient TEHL simulator that can further support the research in its domain. Computer and Information Sciences Data- och informationsvetenskap
82	Multi-core Architectures for Feed-forward Neural Networks Hasan, Md. Raqibul 05 June 2014 No description available. Electrical Engineering
83	High-Performance Sparse Matrix-Multi Vector Multiplication on Multi-Core Architecture Singh, Kunal 15 August 2018 No description available. Computer Science SpMM SpMDM Sparse Dense Matrix Multiplication Multi-core Many-Core
84	Multi-Core Implementation of F-16 Flight Surface Control System Using GA Based Multiple Model Reference Adaptive Control Algorithm Wang, Xiaoru 24 May 2011 No description available. Electrical Engineering Multi-core F-16 Flight Surface Control Model Reference Adaptive Control Genetic Algorithm
85	Event List Organization and Management on the Nodes of a Many-Core Beowulf Cluster Dickman, Thomas J. 21 October 2013 No description available. Computer Engineering Time Warp pending event lists multi-core threads Beowulf clusters parallel computing
86	ADVANCEMENT OF OPERATING SYSTEM TO MANAGE CRITICAL RESOURCES IN INCREASINGLY COMPLEX COMPUTER ARCHITECTURE Ding, Xiaoning 28 September 2010 No description available. Computer Science Operating System Caching Prefetching Multi-core Buffer Cache Scalability
87	Exploring the Boundaries of Operating System in the Era of Ultra-fast Storage Technologies Ramanathan, Madhava Krishnan 24 May 2023 Storage hardware is evolving at a rapid pace to keep up with the exponential rise of data consumption. Recently, ultra-fast storage technologies such as nanosecond-scale byte-addressable Non-Volatile Memory (NVM) and microsecond-scale SSDs have been commercialized. However, the OS storage stack has not been evolving fast enough to keep up with this new ultra-fast storage hardware. Hence, the latency due to user-kernel context switches caused by system calls and hardware interrupts is no longer negligible, as it was presumed to be in the era of slower, high-latency hard disks. Further, the OS storage stack is not designed with multi-core scalability in mind; so with CPU core counts continuously increasing, the OS storage stack, particularly the Virtual Filesystem (VFS) and filesystem layers, is increasingly becoming a scalability bottleneck. Applications bypass the kernel completely (a kernel-bypass storage stack) to keep the storage stack from becoming a performance and scalability bottleneck. But this comes at the cost of programmability, isolation, safety, and reliability. Moreover, scalability bottlenecks in the filesystem cannot be addressed by simply moving the filesystem to userspace. Overall, while designing a kernel-bypass storage stack looks obvious and promising, there are several critical challenges in programmability, performance, scalability, safety, and reliability that need to be addressed to bypass the traditional OS storage stack. This thesis proposes a series of kernel-bypass storage techniques designed particularly for fast memory-centric storage. First, this thesis proposes a scalable persistent transactional memory (PTM) programming model to address the programmability and multi-core scalability challenges. Next, this thesis proposes techniques to make the PTM memory safe and fault tolerant. 
Further, this thesis also proposes a kernel-bypass programming framework to port legacy DRAM-based in-memory database applications to run on persistent memory-centric storage. Finally, this thesis explores an application-driven approach to address the CPU-side and storage-side bottlenecks in deep learning model training by proposing a kernel-bypass programming framework to move compute closer to the storage. Overall, the techniques proposed in this thesis will be a strong foundation for applications to adopt and exploit the emerging ultra-fast storage technologies without being bottlenecked by the traditional OS storage stack. / Doctor of Philosophy / Storage hardware is evolving at a rapid pace to keep up with the exponential rise of data consumption. Recently, ultra-fast storage technologies such as nanosecond-scale byte-addressable Non-Volatile Memory (NVM) and microsecond-scale SSDs have been commercialized. The Operating System (OS) has been the gateway for applications to access and manage the storage hardware. Unfortunately, the OS storage stack, which was designed for slower storage technologies (e.g., hard disk drives), becomes a performance, scalability, and programmability bottleneck for the emerging ultra-fast storage technologies. This has created a large gap between storage hardware advancements and the system software support for such emerging storage technologies. Consequently, applications are constrained by the limitations of the OS storage stack when they intend to explore these emerging storage technologies. In this thesis, we propose a series of novel kernel-bypass storage stack designs to address the performance, scalability, and programmability limitations of the conventional OS storage stack. The kernel-bypass storage stack proposed in this thesis is carefully designed with ultra-fast modern storage hardware in mind. 
Application developers can leverage the kernel-bypass techniques proposed in this thesis to develop new applications or port the legacy applications to use the emerging ultra-fast storage technologies without being constrained by the limitations of the conventional OS storage stack. Kernel-bypass storage stack Non-volatile Memory Concurrency Multi-core Scalability Operating System Computational Storage
88	Energy-aware Thread and Data Management in Heterogeneous Multi-Core, Multi-Memory Systems Su, Chun-Yi 03 February 2015 By 2004, microprocessor design focused on multicore scaling (increasing the number of cores per die in each generation) as the primary strategy for improving performance. These multicore processors typically equip multiple memory subsystems to improve data throughput. In addition, these systems employ heterogeneous processors such as GPUs and heterogeneous memories like non-volatile memory to improve performance, capacity, and energy efficiency. With the increasing volume of hardware resources and system complexity caused by heterogeneity, future systems will require intelligent ways to manage hardware resources. Early research to improve performance and energy efficiency on heterogeneous, multi-core, multi-memory systems focused on tuning a single primitive or at best a few primitives in the systems. The key limitation of past efforts is their lack of a holistic approach to resource management that balances the tradeoff between performance and energy consumption. In addition, the shift from simple, homogeneous systems to these heterogeneous, multicore, multi-memory systems requires in-depth understanding of efficient resource management for scalable execution, including new models that capture the interchange between performance and energy, smarter resource management strategies, and novel low-level performance/energy tuning primitives and runtime systems. Tuning an application to control available resources efficiently has become a daunting challenge; managing resources in automation is still a dark art since the tradeoffs among programming, energy, and performance remain insufficiently understood. 
In this dissertation, I have developed theories, models, and resource management techniques to enable energy-efficient execution of parallel applications through thread and data management in these heterogeneous multi-core, multi-memory systems. I study the effect of dynamic concurrent throttling on the performance and energy of multi-core, non-uniform memory access (NUMA) systems. I use critical path analysis to quantify memory contention in the NUMA memory system and determine thread mappings. In addition, I implement a runtime system that combines concurrent throttling and a novel thread mapping algorithm to manage thread resources and improve energy efficient execution in multi-core, NUMA systems. In addition, I propose an analytical model based on the queuing method that captures important factors in multi-core, multi-memory systems to quantify the tradeoff between performance and energy. The model considers the effect of these factors in a holistic fashion that provides a general view of performance and energy consumption in contemporary systems. Finally, I focus on resource management of future heterogeneous memory systems, which may combine two heterogeneous memories to scale out memory capacity while maintaining reasonable power use. I present a new memory controller design that combines the best aspects of two baseline heterogeneous page management policies to migrate data between two heterogeneous memories so as to optimize performance and energy. / Ph. D. Thread Management Multi-core Processors Performance Modeling and Analysis Power-Aware Computing Heterogeneous Memory Data Management
89	A Scalable Approach to Multi-core Prototyping Newcomb, Jamie David 22 April 2008 In recent years, multi-core processors and multi-processor networks have grown in popularity as a solution to the limits on increasing clock speed, rising power consumption, and the nanometer manufacturing processes. Multi-core processors and multi-processor networks are seen as the next step in the advancement of computational capabilities by way of concurrent processing. However, parallel software design is difficult due to the immaturity of scalable architectures and software development environments for multi-core hardware. How should processors effectively and quickly pass information, with as little overhead as possible? What kind of communication architecture is best suited for parallelism? How can large-scale architectures be quickly produced, verified and properly utilized by software? Using commercially available FPGA development boards, Xilinx tools and components, this thesis offers a light-weight solution to these questions for effective, low-overhead, low-latency multi-core communication and fast prototyping of multi-processor networks for scalable processor arrays. / Master of Science multi-gigabit Aurora Xilinx Field programmable gate arrays multi-processor array multi-core
90	Accelerating Hardware Simulation on Multi-cores Nanjundappa, Mahesh 04 June 2010 Electronic design automation (EDA) tools play a central role in bridging the productivity gap for designing complex hardware systems. However, with an increase in the size and complexity of today's design requirements, current methodologies and EDA tools are unable to effectively mitigate the further widening of the productivity gap. It is estimated that testing and verification take two-thirds of the total development time of complex hardware systems. Functional simulation forms the mainstay of the testing and verification process and is the most widely used technique for testing and verification. Most simulation algorithms and their implementations are designed for uniprocessor systems and cannot easily leverage the parallelism in multi-core and GPU platforms. For example, logic simulation often uses levelized sequential algorithms, whereas the discrete-event simulation frameworks for Verilog, VHDL and SystemC employ concurrency in the form of multi-threading to give an illusion of the inherent parallelism present in circuits. However, the discrete-event model of computation requires a global notion of an event queue, which makes improving its simulation performance via parallelization even more challenging. This work investigates automatic parallelization of simulation algorithms used to simulate hardware models. In particular, we focus on parallelizing the simulation of hardware designs described at the RTL using SystemC/HDL, with examples to clearly describe the parallelization. Even though multi-cores and GPUs offer parallelism, efficiently exploiting this parallelism with their programming models is not straightforward. To overcome this, we also focus our research on building intelligent translators to map simulation applications onto multi-cores and GPUs such that the complexity of the low-level programming models is hidden from the designers. 
/ Master of Science POSIX threads Discrete Event Simulation (DES) CUDA SystemC Simulation GPGPU Multi-core simulation Threading Building Blocks
