41

Measuring and Improving the Potential Parallelism of Sequential Java Programs

Van Valkenburgh, Kevin 25 September 2009 (has links)
No description available.
42

Evaluation of High Performance Financial Messaging on Modern Multi-core Systems

Marsh, Gregory J. 25 August 2010 (has links)
No description available.
43

GePSeA: A General-Purpose Software Acceleration Framework for Lightweight Task Offloading

Singh, Ajeet 14 August 2009 (has links)
Hardware-acceleration techniques continue to be used to boost the performance of scientific codes. To do so, software developers identify portions of these codes that are amenable to offloading and map them to hardware accelerators. However, offloading such tasks to specialized hardware accelerators is non-trivial, and these accelerators can add significant cost to a computing system. Consequently, this thesis proposes a framework called GePSeA (General-Purpose Software Acceleration Framework), which uses a small fraction of the computational power of multi-core architectures to offload complex application-specific tasks. Specifically, GePSeA provides a lightweight process that acts as a helper agent to the application, executing application-specific tasks asynchronously and efficiently. GePSeA is not meant to replace hardware accelerators but to extend them. It provides several utilities, called core components, that offload tasks onto a core or onto special-purpose hardware when available, in a way that is transparent to the application. Examples of such core components include a reliable communication service, distributed lock management, global memory management, dynamic load distribution, and network protocol processing. We then apply the GePSeA framework to two applications: mpiBLAST, an open-source computational biology application, and a file-transfer application based on Reliable Blast UDP (RBUDP). We observe significant speed-ups for both applications. / Master of Science
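The helper-agent idea can be sketched in a few lines: a lightweight worker pulls application-specific tasks from a queue and runs them asynchronously while the main computation continues. This is only an illustrative sketch — a checksum stands in for protocol processing, and a Python thread stands in for a dedicated core; GePSeA's actual core components and interfaces are not reproduced here.

```python
import threading, queue, zlib

class HelperAgent:
    """Minimal sketch of a GePSeA-style helper agent: a lightweight
    worker that executes application-specific tasks asynchronously
    while the main computation proceeds."""
    def __init__(self):
        self.tasks = queue.Queue()
        self.results = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            task_id, fn, args = self.tasks.get()
            if fn is None:                  # shutdown sentinel
                break
            self.results.put((task_id, fn(*args)))

    def offload(self, task_id, fn, *args):
        """Submit a task and return immediately."""
        self.tasks.put((task_id, fn, args))

    def shutdown(self):
        self.tasks.put((None, None, None))
        self.worker.join()

# The application offloads checksumming (a stand-in for network
# protocol processing) and keeps computing; the result is collected later.
agent = HelperAgent()
agent.offload("chk", zlib.crc32, b"payload bytes")
# ... main computation would run here ...
task_id, value = agent.results.get()
agent.shutdown()
print(task_id, value)
```

The application never blocks on the offloaded work until it actually needs the result, which is the point of the helper-agent pattern.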
44

An FPGA-based Target Acquisition System

Marschner, Alexander R. 09 January 2008 (has links)
This work describes the development of an image processing algorithm, its implementation both as a strictly hardware design and as a multi-core software design, and a side-by-side comparison of the two implementations. In the course of creating the multi-core software design, several improvements are made to the OpenFire soft-core microprocessor used to build the multi-core network. The hardware and multi-core software implementations of the image processing algorithm are compared side by side on an FPGA-based test platform. Results show that although the strictly hardware implementation leads in terms of lower power consumption and very low area consumption, the multi-core software implementation is simpler to modify and program. / Master of Science
45

Knowledgebase basiertes Scheduling für hierarchisch asynchrone Multi-Core Scheduler im Systembereich Automotive und Avionik / Knowledgebase-based Scheduling for Hierarchical Asynchronous Multi-Core Schedulers in the Automotive and Avionics Domains

Hanti, Thomas 24 October 2019 (has links)
In current automotive electric/electronic (E/E) systems, as well as in aircraft avionics, a multitude of functions exists to assist the driver or pilot. The quantity and distribution of these assistance functions has increased continuously over the last few years, and an end to this trend is not in sight. Emerging technologies such as fully autonomous driving and the rising autonomy levels of Unmanned Aerial Vehicles (UAVs) push the current E/E architecture towards its limits. The classic, statically configured E/E concept therefore faces a new challenge: integrating a variety of new, additional functions while delivering the same functionality, determinism, and reliability as in the past. With the classic static concept, this requirement can only be met by integrating a dedicated electronic control module (ECM) for each new function. Since this approach entails high costs and additional installation space, it is no longer reasonable, and optimized solutions are being sought. One strategy is to dissolve the static configuration and evolve into a semi-statically configured system using multi-core processor technology. This approach is described in the Hierarchical Asynchronous Multi-Core System (HAMS) with its statically generated Knowledgebase (KB) for multi-core electronic control modules. This dissertation presents the concepts and ideas behind HAMS and discusses how they can already be implemented in today's domains. To this end, functions are allocated to distinct phases which reflect, among other things, the current driving or flight state. Additionally, functions are logically linked to identify contrary activation states. From these inputs and the configuration of the multi-core ECM, the concept for the HAMS Knowledgebase is developed. With the Knowledgebase, the system can reconfigure itself semi-statically at runtime without violating the deterministic behavior expected of statically configured ECMs in the automotive and avionics domains. A concluding evaluation realizes the concept on a real example and demonstrates its advantages and limits.
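The Knowledgebase idea above can be illustrated with a small sketch: precomputed, offline-verified schedules keyed by operating phase, plus pairs of functions with contrary activation states that must never be active together. All phase names, function names, and the two-core layout are invented for illustration; the actual HAMS Knowledgebase format is not reproduced here.

```python
# Illustrative knowledgebase: operating phase -> precomputed core schedule.
# Every schedule is assumed to have been generated and verified offline,
# which is what preserves determinism despite runtime reconfiguration.
KNOWLEDGEBASE = {
    "taxi":   {"core0": ["brake_ctrl", "nav"], "core1": ["logging"]},
    "cruise": {"core0": ["autopilot", "nav"],  "core1": ["health_mon"]},
}
# Pairs of functions with contrary activation states.
CONTRARY = {("brake_ctrl", "autopilot")}

def reconfigure(phase):
    """Semi-static reconfiguration: select a precomputed schedule at
    runtime and check that no contrary pair is co-active."""
    schedule = KNOWLEDGEBASE[phase]
    active = {f for fns in schedule.values() for f in fns}
    for a, b in CONTRARY:
        assert not (a in active and b in active), f"{a}/{b} conflict"
    return schedule

print(reconfigure("cruise"))
```

The runtime system only ever switches between schedules that exist in the knowledgebase, so timing behavior in each phase remains as analyzable as in a fully static configuration.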
46

Application des architectures many core dans les systèmes embarqués temps réel / Implementing a Real-time Avionic application on a Many-core Processor

Lo, Moustapha 22 February 2019 (has links)
Traditional single-core processors are no longer sufficient to meet the growing performance needs of avionics functions. Multi-core and many-core processors have emerged in recent years, making it possible to integrate several functions and to benefit from the available performance per watt through resource sharing. However, not all multi-core and many-core processors satisfy avionics constraints: determinism matters more than raw computing power, because certification of such processors depends on mastering determinism. The aim of this thesis is to evaluate Kalray's many-core processor (MPPA-256) in an industrial aeronautics context. We chose the HMS (Health Monitoring System) maintenance function, which requires substantial bandwidth and a bounded response time. This function also exhibits parallelism: it processes vibration data from sensors that are functionally independent, so their processing can be parallelized across several cores. The particularity of this study is that it deploys an existing sequential function on a many-core architecture, from data acquisition through to the computation of health indicators, with a strong emphasis on the input/output data flow.
Our research led to five main contributions:
• Transformation of the existing batch algorithms into incremental ones that process data as soon as they arrive from the sensors.
• Management of the input flow of vibration samples through to the computation of health indicators: the availability of raw vibration data in the internal cluster, the moment they are consumed, and the estimation of the computational load.
• Minimally intrusive timing measurements directly on the MPPA-256, by adding timestamps to the data flow.
• A software architecture, based on a three-stage pipeline, that respects the real-time constraints even in the worst case.
• Illustration of the limits of the existing function: our experiments showed that contextual parameters of the helicopter, such as rotor speed, must be correlated with the health indicators to reduce false alarms.
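The first contribution — turning batch algorithms into incremental ones — can be illustrated with a toy health indicator: an RMS of vibration samples updated as each sample arrives, so no batch recomputation is needed. RMS is a stand-in chosen for brevity; the actual HMS indicators are more involved.

```python
class IncrementalRMS:
    """Sketch of converting a batch indicator into an incremental one:
    the RMS of a vibration stream is updated per sample in O(1),
    instead of recomputing over the whole buffer."""
    def __init__(self):
        self.n = 0
        self.sumsq = 0.0

    def update(self, sample):
        # Fold the new sample into the running sum of squares.
        self.n += 1
        self.sumsq += sample * sample
        return (self.sumsq / self.n) ** 0.5

rms = IncrementalRMS()
for s in [0.1, -0.3, 0.2]:     # samples arriving one by one
    latest = rms.update(s)
print(latest)
```

The same transformation — maintain running partial results and fold each arriving sample in — applies to any indicator expressible as a streaming aggregate.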
47

Network Coding Strategies for Multi-Core Architectures

Wunderlich, Simon 09 November 2021 (has links)
Random Linear Network Coding (RLNC) is a coding technique that can provide higher reliability and efficiency in wireless networks. Applying it in the fifth generation of cellular networks (5G) is now possible thanks to the softwarization approach of the 5G architecture. However, the complex computations necessary to encode and decode symbols in RLNC limit the achievable throughput and energy efficiency on today's mobile computers. Most computers, phones, TVs, and network equipment nowadays come with multiple, possibly heterogeneous (i.e., slow low-power and fast high-power) processing cores. Previous multi-core research focused on RLNC optimization for big data chunks, which are useful for storage; network operations, however, tend to use smaller packets (e.g., Ethernet MTUs of 1500 bytes) and code over smaller generations of packets. Latency is also an increasingly important performance aspect for the upcoming Tactile Internet, yet it has received little attention in RLNC optimization so far. The primary research question of my thesis is therefore how to optimize the throughput and delay of RLNC on today's most common computing architectures. By fully leveraging the resources of today's consumer electronics hardware, RLNC can be practically adopted in today's wireless systems with just a software update, improving network efficiency and user experience. I generally follow a constructive approach: I introduce algorithms and methods, then demonstrate their performance by benchmarking actual implementations on common consumer electronics hardware against the state of the art. Inspired by linear algebra parallelization methods used in high-performance computing (HPC), I developed an RLNC encoder/decoder that schedules matrix block tasks for multiple cores using a directed acyclic graph (DAG) based on data dependencies between the tasks.
A non-progressive variant works with pre-computed DAG schedules, which can be re-used to push throughput even higher. I also developed a progressive variant that can be used to minimize latency. Both variants achieve higher throughput than the fastest previously known RLNC decoder, with up to three times the throughput for small generation sizes and short packets. Unlike previous approaches, they can utilize all cores even on heterogeneous architectures. The progressive decoder greatly reduces latency while keeping throughput high, cutting latency by up to a factor of ten compared to the non-progressive variant. Progressive decoders need special low-delay codes to release packets early instead of waiting for more dependent packets from the network. I introduce Caterpillar RLNC (CRLNC), a sliding-window code that applies a fixed-size window to a stream of packets and can be implemented on top of a conventional generation-based RLNC decoder. CRLNC combines the resilience against packet loss and the fixed resource bounds (number of computations and memory) of conventional generation-based RLNC decoders with the low delay of an infinite sliding-window decoder.
The DAG RLNC coders and the Caterpillar RLNC method together provide a powerful toolset for practically enabling RLNC in 5G and other wireless systems, achieving the high throughput and low delay required by upcoming immersive and machine-control applications.
Contents:
1 Introduction
2 Background and Related Work
  2.1 Network Delay
  2.2 Network Coding Basics
  2.3 RLNC Optimization for Throughput
    2.3.1 SIMD Optimization
    2.3.2 Block Operation: Increasing Cache Efficiency with Subblocking
    2.3.3 Optimizing Matrix Computations
  2.4 Progressive RLNC Decoders
  2.5 Sliding Window RLNC
3 Optimized RLNC Parallelization with Scheduling Graphs
  3.1 Offline Directed Acyclic Graph (DAG) Scheduling
    3.1.1 Blocked LU Matrix Inversion
    3.1.2 Scheduling on a DAG
    3.1.3 Phase 1: DAG Recording
    3.1.4 Phase 2: DAG Schedule Execution
    3.1.5 DAG Scheduling vs. Conventional Multithreading
    3.1.6 Task Size Considerations
    3.1.7 Scheduling Strategies (First Task; Task Dependency; Data Locality; Combined Task Dependency and Data Locality)
  3.2 Online DAG Scheduling
    3.2.1 Online DAG Operation (Forward Elimination; Backward Substitution; Row Swapping)
    3.2.2 Scheduling on an Online DAG (Data Dependency Traversal; Online DAG Creation and Task Delegation)
    3.2.3 Optimizations (Stripe Optimization; Full Rows Optimization)
  3.3 Evaluation Setup
    3.3.1 Multicore Boards (ODROID-XU3; ODROID-XU4; ODROID-XU+E; Cubieboard 4; Raspberry Pi 2 Model B)
    3.3.2 Evaluation Parameters (Parameter Settings; Matrix Types)
    3.3.3 Performance Metrics (Throughput; Delay; Energy)
    3.3.4 Evaluation Methodology
  3.4 Evaluation Results
    3.4.1 Block Size b
    3.4.2 Comparison of Scheduling Strategies
    3.4.3 Single-Thread Throughput
    3.4.4 Multi-Thread Throughput
    3.4.5 Comparison of Multicore Boards
    3.4.6 Energy Consumption
    3.4.7 Online DAG vs. Offline DAG Throughput
    3.4.8 DAG vs. Progressive CD
    3.4.9 Delay
    3.4.10 Trading Throughput with Delay
    3.4.11 Sparse Coefficient Matrices in Online DAG
4 Sliding Window - Caterpillar RLNC (CRLNC)
  4.1 CRLNC Overview
  4.2 CRLNC Packet Format and Encoding
  4.3 CRLNC Decoding
    4.3.1 Shifting the Row Echelon Form (same sequence number: s_p = s_d; new packet: s_p > s_d; old packet: s_p < s_d)
    4.3.2 Larger Decoding Windows: w_d > w_e
    4.3.3 CRLNC Decoding Storage and Computing Requirements
  4.4 CRLNC Evaluation
    4.4.1 Performance Metrics (Packet Loss Probability; In-Order Packet Delay)
    4.4.2 Evaluation Methodology
  4.5 Evaluation Results
    4.5.1 Packet Loss Probability
    4.5.2 In-Order Packet Delay
    4.5.3 Tradeoffs for Larger Decoding Windows
    4.5.4 Computation Complexity
5 Summary and Conclusion
List of Publications
Bibliography
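The progressive-decoding idea can be sketched over GF(2), where coding coefficients are single bits and addition is XOR. The decoder below keeps its pivot rows in reduced row echelon form and performs a slice of the Gaussian elimination as each coded packet arrives, instead of one large batch at the end. This is only an illustration: the thesis works over larger fields with SIMD and parallelizes the elimination across cores via DAG-scheduled block tasks, none of which is shown here.

```python
import random

def encode(packets, rng):
    """Draw one random nonzero GF(2) linear combination of the generation."""
    g = len(packets)
    while True:
        coeffs = [rng.randint(0, 1) for _ in range(g)]
        if any(coeffs):
            break
    payload = 0
    for c, p in zip(coeffs, packets):
        if c:
            payload ^= p
    return coeffs, payload

class ProgressiveDecoder:
    """Progressive GF(2) RLNC decoder: pivot rows are kept fully
    reduced, so each arriving packet costs one reduction pass."""
    def __init__(self, g):
        self.g = g
        self.pivots = {}          # pivot column -> (coeff row, payload)

    def receive(self, coeffs, payload):
        coeffs = coeffs[:]
        # Reduce the incoming row against the (fully reduced) pivots.
        for col, (pc, pp) in self.pivots.items():
            if coeffs[col]:
                coeffs = [a ^ b for a, b in zip(coeffs, pc)]
                payload ^= pp
        for col, bit in enumerate(coeffs):
            if bit:               # innovative packet: becomes a new pivot
                # Eliminate the new pivot column from existing rows to
                # maintain reduced row echelon form.
                for c2, (pc, pp) in list(self.pivots.items()):
                    if pc[col]:
                        self.pivots[c2] = (
                            [a ^ b for a, b in zip(pc, coeffs)], pp ^ payload)
                self.pivots[col] = (coeffs, payload)
                return True
        return False              # linearly dependent, discarded

    def done(self):
        return len(self.pivots) == self.g

    def recover(self):
        # In RREF with full rank, each pivot row is a unit vector, so
        # its payload is the decoded packet for that position.
        return [self.pivots[col][1] for col in range(self.g)]

rng = random.Random(7)
packets = [0b1010, 0b0111, 0b1100]        # toy 4-bit "packets"
dec = ProgressiveDecoder(len(packets))
while not dec.done():
    dec.receive(*encode(packets, rng))
print(dec.recover())
```

Because the work is spread over packet arrivals, the last packet of a generation triggers only a small final step, which is exactly why the progressive variant lowers latency.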
48

Interaction of Hardware Transactional Memory and Microprocessor Microarchitecture

Diestelhorst, Stephan 10 July 2019 (has links)
Microprocessors have experienced a significant stall in single-thread performance since about 2004. Instead of significant annual performance improvements for a single core, it is easier to increase performance by providing multiple independent cores that the application programmer has to coordinate. Exposing concurrency to applications requires mechanisms to control it. Hardware Transactional Memory (HTM) is an abstraction that provides optimistic, fine-grained concurrency control through a simple application interface; it received significant research attention from 2004 to 2010, with initial publications in the mid-90s. The central thesis of my work is that detailed analysis and ISA modelling of HTM are necessary to understand actual implementation and usage challenges and to obtain more realistic results. Instead of overly complicating the design of HTM with features that would be extremely hard to implement correctly in a detailed microarchitecture and ISA proposal, I suggest that getting a baseline HTM specification and microarchitecture right is a challenge in itself. Yet, despite the complexity, there are interesting implementation options and extensions that can benefit applications using HTM, even though they are not on the trajectory taken by most papers published between 2004 and 2010.
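HTM's optimistic, fine-grained concurrency control can be illustrated in software with a versioned cell: a transaction reads a snapshot, computes without holding a lock, and commits only if no conflicting commit happened in the meantime, retrying otherwise. This is a deliberately simplified software emulation of the semantics; real HTM detects conflicts in hardware at cache-line granularity and exposes no version counter.

```python
import threading

class VersionedCell:
    """Software sketch of transactional-memory semantics: optimistic
    read, speculative compute, validate-and-commit, retry on conflict."""
    def __init__(self, value=0):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()    # guards only the commit point

    def transact(self, fn):
        while True:                      # retry loop on abort
            v0, snapshot = self.version, self.value
            new = fn(snapshot)           # speculative work, no lock held
            with self._lock:
                if self.version == v0:   # no conflicting commit happened
                    self.value, self.version = new, v0 + 1
                    return new
            # conflict detected: abort and retry

cell = VersionedCell(0)
threads = [
    threading.Thread(
        target=lambda: [cell.transact(lambda x: x + 1) for _ in range(1000)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.value)   # 4000: every increment commits exactly once
```

The retry loop makes the cost of conflicts visible, which is one reason baseline HTM behavior under contention is worth modelling carefully before adding more exotic features.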
49

Performance scalability of n-tier application in virtualized cloud environments: Two case studies in vertical and horizontal scaling

Park, Junhee 27 May 2016 (has links)
The prevalence of multi-core processors, together with recent advances in virtualization technologies, has enabled horizontal and vertical scaling within a physical node, achieving economical sharing of computing infrastructures as computing clouds. Through hardware virtualization, consolidated servers, each with a specific core allotment, run on the same physical node in dedicated Virtual Machines (VMs) to increase overall node utilization, which increases profit by reducing operational costs. Unfortunately, despite the conceptual simplicity of vertical and horizontal scaling in virtualized cloud environments, leveraging the full potential of this technology has presented significant scalability challenges in practice. One of the fundamental problems is performance unpredictability in virtualized cloud environments (ranked fifth among the top 10 obstacles to the growth of cloud computing). In this dissertation, we present two case studies, in vertical and horizontal scaling, that address this challenging problem. For the first case study, we describe concrete experimental evidence of an important source of performance variation: the mapping of virtual CPUs to physical cores. We then conduct an experimental comparative study of three major hypervisors (VMware, KVM, and Xen) with regard to their support of n-tier applications running on multi-core processors. For the second case study, we present an empirical study showing that memory thrashing caused by interference among consolidated VMs is a significant source of performance interference that hampers the horizontal scalability of n-tier applications. We then perform transient event analyses of fine-grained experimental data that link very short bottlenecks caused by memory thrashing to very long response time (VLRT) requests. Furthermore, we provide three practical techniques (VM migration, memory reallocation, and soft resource allocation) and show that they can mitigate the effects of performance interference among consolidated VMs.
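The first case study's finding — that the vCPU-to-physical-core mapping is an important source of performance variation — suggests deterministic pinning as one mitigation. The sketch below computes a pinning plan that spreads vCPUs across distinct physical cores before doubling up on SMT siblings. The 4-core/2-thread topology is invented for illustration and the dissertation's actual experiments are not reproduced by it.

```python
# Illustrative host topology: physical core -> its hardware threads.
TOPOLOGY = {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}

def pinning_plan(n_vcpus, topology):
    """Return hardware-thread IDs for n_vcpus, filling one thread per
    physical core first and only then using SMT siblings, so vCPUs
    avoid sharing a core until they must."""
    smt_depth = max(len(t) for t in topology.values())
    slots = []
    for level in range(smt_depth):       # level 0: distinct cores first
        for core in sorted(topology):
            if level < len(topology[core]):
                slots.append(topology[core][level])
    return slots[:n_vcpus]

print(pinning_plan(4, TOPOLOGY))   # one vCPU per physical core
print(pinning_plan(6, TOPOLOGY))   # two cores also take a sibling
```

On a Linux host, such a plan could then be applied with tools like `virsh vcpupin` or the `sched_setaffinity` system call.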
50

Towards Predictable Real-Time Performance on Multi-Core Platforms

Kim, Hyoseung 01 June 2016 (has links)
Cyber-physical systems (CPS) integrate sensing, computing, communication, and actuation capabilities to monitor and control operations in the physical environment. A key requirement of such systems is predictable real-time performance: the timing correctness of the system should be analyzable at design time with a quantitative metric and guaranteed at runtime with high assurance. This requirement of predictability is particularly important for safety-critical domains such as automobiles, aerospace, defense, manufacturing, and medical devices. The work in this dissertation focuses on the challenges arising from the use of modern multi-core platforms in CPS. Even today, multi-core platforms are rarely used in safety-critical applications, primarily because of the temporal interference caused by contention for resources shared among processor cores, such as caches, memory buses, and I/O devices. Such interference is hard to predict and can significantly increase task execution time, e.g., by up to 12 times on commodity quad-core platforms. To address the problem of ensuring timing predictability on multi-core platforms, we develop novel analytical and systems techniques in this dissertation. Our proposed techniques theoretically bound the temporal interference that tasks may suffer when accessing shared resources. They also include software primitives and algorithms for real-time operating systems and hypervisors that significantly reduce the degree of temporal interference. Specifically, we tackle the issues of cache and memory contention, locking and synchronization, interrupt handling, and access control for computational accelerators such as general-purpose graphics processing units (GPGPUs), all of which are crucial to achieving predictable real-time performance on a modern multi-core platform. Our solutions are readily applicable to commodity multi-core platforms and can be used not only for developing new systems but also for migrating existing applications from single-core to multi-core platforms.
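The flavor of such analytical techniques — bound the interference, then fold the bound into schedulability analysis — can be sketched with classic fixed-point response-time analysis extended by a per-job interference term. The task set and the interference bound below are invented for illustration; the dissertation derives such bounds analytically rather than assuming them.

```python
import math

def response_time(tasks, i, interference):
    """Interference-aware response-time analysis sketch: iterate
    R = C_i + I + sum over higher-priority j of ceil(R / T_j) * C_j
    until a fixed point, where I bounds extra delay from shared-resource
    contention. tasks = [(WCET, period), ...] in priority order,
    deadlines implicit (= period)."""
    C, T = tasks[i]
    R = C + interference
    while True:
        R_new = C + interference + sum(
            math.ceil(R / tasks[j][1]) * tasks[j][0] for j in range(i))
        if R_new == R:
            return R              # converged: worst-case response time
        if R_new > T:
            return None           # unschedulable with this bound
        R = R_new

tasks = [(1, 4), (2, 8)]          # (WCET, period), highest priority first
print(response_time(tasks, 1, interference=1))
```

Tightening the interference bound (e.g., by cache partitioning or memory bandwidth regulation) directly shrinks the computed response times, which is why bounding contention and reducing it go hand in hand.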
