  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Approches de parallélisation automatique et d'ordonnancement pour la co-simulation de modèles numériques sur processeurs multi-coeurs / Automatic parallelization and scheduling approaches for co-simulation of numerical models on multi-core processors

Saidi, Salah Eddine 18 April 2018
When designing cyber-physical systems, engineers have to integrate models from different modeling environments in order to simulate the whole system and estimate its overall performance. If some parts of the system are available, they can be connected to the simulation in a Hardware-in-the-Loop (HiL) approach. In this case, the simulation has to be performed in real time, with the models periodically reacting to the real components. Growing requirements on simulation accuracy and on its validity domain call for more complex models. With such models, it becomes hard to ensure fast or real-time execution without using multiprocessor architectures. FMI (Functional Mock-up Interface), a standard for model exchange and co-simulation, offers new opportunities for multi-core execution of models. One goal of this thesis is the extraction of potential parallelism in a set of interconnected multi-rate models. We build on the RCOSIM approach, which allows the parallelization of FMI models. In the first part of the thesis, improvements are proposed to overcome the limitations of RCOSIM. We propose new algorithms to handle multi-rate models and schedule them on multi-core processors. The improvements also handle specific constraints such as mutual exclusion and real-time constraints. Second, we propose algorithms for the allocation and scheduling of co-simulations, taking into account different constraints. These algorithms aim at accelerating the execution of the co-simulation or ensuring its real-time execution in a HiL approach. The proposed solutions have been tested on synthetic co-simulations and validated against an industrial use case.
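The allocation problem this abstract describes can be illustrated with a minimal sketch. It is not the thesis's algorithm and not RCOSIM: it assumes each model has an integer period and a fixed cost per step, ignores data dependencies, mutual exclusion and deadlines, and simply spreads the work of one hyperperiod over the cores with a greedy "largest work first" rule. The names `hyperperiod`, `greedy_allocate` and the example models are hypothetical.

```python
import heapq
from math import lcm  # multi-argument lcm needs Python 3.9+


def hyperperiod(periods):
    """Smallest interval after which a multi-rate schedule repeats."""
    return lcm(*periods)


def greedy_allocate(models, n_cores):
    """Assign models to cores, heaviest first, always to the least-loaded core.

    models: dict mapping a (hypothetical) model name to (period, cost_per_step).
    Returns the hyperperiod and a dict core -> (load, [model names]).
    """
    h = hyperperiod([period for period, _ in models.values()])
    # Work of each model over one hyperperiod: number of steps * cost per step.
    work = {name: (h // period) * cost for name, (period, cost) in models.items()}

    # Min-heap of (load, core id, assigned models); core ids are unique,
    # so the lists are never compared.
    heap = [(0, core, []) for core in range(n_cores)]
    heapq.heapify(heap)
    for name in sorted(work, key=work.get, reverse=True):
        load, core, assigned = heapq.heappop(heap)
        assigned.append(name)
        heapq.heappush(heap, (load + work[name], core, assigned))

    return h, {core: (load, assigned) for load, core, assigned in heap}


models = {
    "engine": (1, 3),      # period 1, cost 3 per step
    "gearbox": (2, 2),
    "thermal": (4, 4),
    "controller": (2, 1),
}
h, allocation = greedy_allocate(models, n_cores=2)
```

A real-time variant would additionally have to check that the load of every core fits within the hyperperiod, which this sketch does not attempt.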
82

Parallelization of a thermal elastohydrodynamic lubricated contacts simulation using OpenMP

Alrheis, Ghassan January 2020
Multi-core Shared Memory Parallel (SMP) systems have become the norm since the performance trend predicted by Moore's law ended. Correctly exploiting the performance benefits these systems offer usually requires a conscious effort from the software developer to introduce concurrency into the program. This is easy to disregard in small software projects and can leave large amounts of potential parallelism unused in the resulting code. This thesis attempted to improve the performance of a computationally demanding Thermal Elastohydrodynamic Lubrication (TEHL) simulation written in Fortran by finding such parallelism. The parallelization effort focused on the most demanding parts of the program, identified using Intel's VTune Amplifier, and was implemented using OpenMP. The thesis also documents an algorithm change that led to further improvements in execution time and in scalability with respect to problem size. The end result is a faster, lighter and more efficient TEHL simulator that can further support research in its domain.
83

Multi-core Architectures for Feed-forward Neural Networks

Hasan, Md. Raqibul 05 June 2014
No description available.
84

High-Performance Sparse Matrix-Multi Vector Multiplication on Multi-Core Architecture

Singh, Kunal 15 August 2018
No description available.
85

Multi-Core Implementation of F-16 Flight Surface Control System Using GA Based Multiple Model Reference Adaptive Control Algorithm

Wang, Xiaoru 24 May 2011
No description available.
86

Event List Organization and Management on the Nodes of a Many-Core Beowulf Cluster

Dickman, Thomas J. 21 October 2013
No description available.
87

ADVANCEMENT OF OPERATING SYSTEM TO MANAGE CRITICAL RESOURCES IN INCREASINGLY COMPLEX COMPUTER ARCHITECTURE

Ding, Xiaoning 28 September 2010
No description available.
88

Exploring the Boundaries of Operating System in the Era of Ultra-fast Storage Technologies

Ramanathan, Madhava Krishnan 24 May 2023
Storage hardware is evolving at a rapid pace to keep up with the exponential rise of data consumption. Recently, ultra-fast storage technologies such as nanosecond-scale byte-addressable Non-Volatile Memory (NVM) and microsecond-scale SSDs have been commercialized. However, the OS storage stack has not been evolving fast enough to keep up with this new ultra-fast storage hardware. Hence, the latency due to user-kernel context switches caused by system calls and hardware interrupts is no longer negligible, as was presumed in the era of slower, high-latency hard disks. Further, the OS storage stack was not designed with multi-core scalability in mind; with CPU core counts continuously increasing, the OS storage stack, particularly the Virtual Filesystem (VFS) and filesystem layers, is increasingly becoming a scalability bottleneck. Applications bypass the kernel completely (a kernel-bypass storage stack) to keep the storage stack from becoming a performance and scalability bottleneck. But this comes at the cost of programmability, isolation, safety, and reliability. Moreover, scalability bottlenecks in the filesystem cannot be addressed by simply moving the filesystem to userspace. Overall, while designing a kernel-bypass storage stack looks obvious and promising, there are several critical challenges in programmability, performance, scalability, safety, and reliability that need to be addressed in order to bypass the traditional OS storage stack. This thesis proposes a series of kernel-bypass storage techniques designed particularly for fast memory-centric storage. First, this thesis proposes a scalable persistent transactional memory (PTM) programming model to address the programmability and multi-core scalability challenges. Next, this thesis proposes techniques to make the PTM memory safe and fault tolerant. Further, this thesis also proposes a kernel-bypass programming framework to port legacy DRAM-based in-memory database applications to run on persistent memory-centric storage. Finally, this thesis explores an application-driven approach to address the CPU-side and storage-side bottlenecks in deep learning model training by proposing a kernel-bypass programming framework that moves compute closer to the storage. Overall, the techniques proposed in this thesis provide a strong foundation for applications to adopt and exploit the emerging ultra-fast storage technologies without being bottlenecked by the traditional OS storage stack. / Doctor of Philosophy / Storage hardware is evolving at a rapid pace to keep up with the exponential rise of data consumption. Recently, ultra-fast storage technologies such as nanosecond-scale byte-addressable Non-Volatile Memory (NVM) and microsecond-scale SSDs have been commercialized. The Operating System (OS) has been the gateway through which applications access and manage the storage hardware. Unfortunately, the OS storage stack, which was designed for slower storage technologies (e.g., hard disk drives), becomes a performance, scalability, and programmability bottleneck for the emerging ultra-fast storage technologies. This has created a large gap between storage hardware advancements and the system software support for such emerging storage technologies. Consequently, applications are constrained by the limitations of the OS storage stack when they intend to explore these emerging storage technologies. In this thesis, we propose a series of novel kernel-bypass storage stack designs to address the performance, scalability, and programmability limitations of the conventional OS storage stack. The kernel-bypass storage stack proposed in this thesis is carefully designed with ultra-fast modern storage hardware in mind. Application developers can leverage the kernel-bypass techniques proposed in this thesis to develop new applications or port legacy applications to use the emerging ultra-fast storage technologies without being constrained by the limitations of the conventional OS storage stack.
89

Transforming and Optimizing Irregular Applications for Parallel Architectures

Zhang, Jing 12 February 2018
Parallel architectures, including multi-core processors, many-core processors, and multi-node systems, have become commonplace, as it is no longer feasible to improve single-core performance through increasing its operating clock frequency. Furthermore, to keep up with the exponentially growing desire for more and more computational power, the number of cores/nodes in parallel architectures has continued to dramatically increase. On the other hand, many applications in well-established and emerging fields, such as bioinformatics, social network analysis, and graph processing, exhibit increasing irregularities in memory access, control flow, and communication patterns. While multiple techniques have been introduced into modern parallel architectures to tolerate these irregularities, many irregular applications still execute poorly on current parallel architectures, as their irregularities exceed the capabilities of these techniques. Therefore, it is critical to resolve irregularities in applications for parallel architectures. However, this is a very challenging task, as the irregularities are dynamic, and hence, unknown until runtime. To optimize irregular applications, many approaches have been proposed to improve data locality and reduce irregularities through computational and data transformations. However, there are two major drawbacks in these existing approaches that prevent them from achieving optimal performance. First, these approaches use local optimizations that exploit data locality and regularity locally within a loop or kernel. However, in many applications, there is hidden locality across loops or kernels. Second, these approaches use "one-size-fits-all" methods that treat all irregular patterns equally and resolve them with a single method. However, many irregular applications have complex irregularities, which are mixtures of different types of irregularities and need differentiated optimizations.
To overcome these two drawbacks, we propose a general methodology that includes a taxonomy of irregularities to help us analyze the irregular patterns in an application, and a set of adaptive transformations to reorder data and computation based on the characteristics of the application and architecture. By extending our adaptive data-reordering transformation on a single node, we propose a data-partitioning framework to resolve the load imbalance problem of irregular applications on multi-node systems. Unlike existing frameworks, which use "one-size-fits-all" methods to partition the input data by a single property, our framework provides a set of operations to transform the input data by multiple properties and generates the desired data-partitioning codes by composing these operations into a workflow. / Ph. D.
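To make the idea of a data-reordering transformation concrete, here is a minimal sketch of one generic technique: sorting an indirect access pattern so that memory is touched in increasing address order, then restoring the original result order. This is an illustrative assumption, not one of the transformations defined in the dissertation, and the function name `reorder_gather` is hypothetical. In plain Python it only demonstrates that the transformation preserves results; any locality benefit would show up in a compiled or GPU implementation.

```python
def reorder_gather(data, indices):
    """Gather data[indices[i]] for every i, visiting `data` in sorted index order.

    The result is identical to [data[j] for j in indices]; only the order in
    which `data` is read changes (monotonically increasing positions).
    """
    # Permutation that sorts the irregular access pattern.
    order = sorted(range(len(indices)), key=indices.__getitem__)

    # Regularized pass: accesses to `data` are now monotone.
    gathered_sorted = [data[indices[i]] for i in order]

    # Scatter the values back to the positions the caller expects.
    result = [None] * len(indices)
    for pos, i in enumerate(order):
        result[i] = gathered_sorted[pos]
    return result


data = [10, 20, 30, 40]
indices = [3, 0, 2, 0, 1]
gathered = reorder_gather(data, indices)
```

The sort itself costs time, so such a reordering only pays off when the sorted pattern is reused across many passes over the data.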
90

Energy-aware Thread and Data Management in Heterogeneous Multi-Core, Multi-Memory Systems

Su, Chun-Yi 03 February 2015
By 2004, microprocessor design focused on multicore scaling (increasing the number of cores per die in each generation) as the primary strategy for improving performance. These multicore processors are typically equipped with multiple memory subsystems to improve data throughput. In addition, these systems employ heterogeneous processors such as GPUs and heterogeneous memories like non-volatile memory to improve performance, capacity, and energy efficiency. With the increasing volume of hardware resources and system complexity caused by heterogeneity, future systems will require intelligent ways to manage hardware resources. Early research to improve performance and energy efficiency on heterogeneous, multi-core, multi-memory systems focused on tuning a single primitive or at best a few primitives in the systems. The key limitation of past efforts is their lack of a holistic approach to resource management that balances the tradeoff between performance and energy consumption. In addition, the shift from simple, homogeneous systems to these heterogeneous, multicore, multi-memory systems requires in-depth understanding of efficient resource management for scalable execution, including new models that capture the interchange between performance and energy, smarter resource management strategies, and novel low-level performance/energy tuning primitives and runtime systems. Tuning an application to control available resources efficiently has become a daunting challenge; automating resource management is still a dark art, since the tradeoffs among programming, energy, and performance remain insufficiently understood. In this dissertation, I have developed theories, models, and resource management techniques to enable energy-efficient execution of parallel applications through thread and data management in these heterogeneous multi-core, multi-memory systems.
I study the effect of dynamic concurrent throttling on the performance and energy of multi-core, non-uniform memory access (NUMA) systems. I use critical path analysis to quantify memory contention in the NUMA memory system and determine thread mappings. In addition, I implement a runtime system that combines concurrent throttling and a novel thread mapping algorithm to manage thread resources and improve energy efficient execution in multi-core, NUMA systems. In addition, I propose an analytical model based on the queuing method that captures important factors in multi-core, multi-memory systems to quantify the tradeoff between performance and energy. The model considers the effect of these factors in a holistic fashion that provides a general view of performance and energy consumption in contemporary systems. Finally, I focus on resource management of future heterogeneous memory systems, which may combine two heterogeneous memories to scale out memory capacity while maintaining reasonable power use. I present a new memory controller design that combines the best aspects of two baseline heterogeneous page management policies to migrate data between two heterogeneous memories so as to optimize performance and energy. / Ph. D.
