  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
131

Exploiting parallelism of irregular problems and performance evaluation on heterogeneous multi-core architectures

Xu, Meilian 04 October 2012
In this thesis, we design, develop and implement parallel algorithms for irregular problems on heterogeneous multi-core architectures. Irregular problems exhibit random and unpredictable memory access patterns, poor spatial locality and input-dependent control flow. Heterogeneous multi-core processors vary in clock frequency, power dissipation, programming model (MIMD vs. SIMD), memory design and computing units (scalar versus vector units). This heterogeneity makes it challenging to design efficient parallel algorithms for irregular problems on such processors. Techniques for mapping tasks or data on traditional parallel computers cannot be used as-is on heterogeneous multi-core processors because of the varying hardware. To understand how efficiently future heterogeneous multi-core architectures run applications, we study several computation- and bandwidth-oriented irregular problems on one heterogeneous multi-core architecture, the IBM Cell Broadband Engine (Cell BE). The Cell BE consists of a general processor and eight specialized processors and addresses vector/data-level parallelism and instruction-level parallelism simultaneously. Through these studies on the Cell BE, we provide discussion and insight on the performance of the applications on heterogeneous multi-core architectures. Verifying these experimental results requires some performance modeling. Due to the diversity of heterogeneous multi-core architectures, theoretical performance models used for homogeneous multi-core architectures do not provide accurate results. Therefore, in this thesis we propose an analytical performance prediction model that considers multiple architectural features of heterogeneous multi-cores (such as DMA transfers, number of instructions and operations, the processor frequency and DMA bandwidth). We show that the execution time from our prediction model is comparable to the execution time of the experimental results for a complex medical imaging application.
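The abstract names the inputs of the prediction model (instruction counts, processor frequency, DMA transfers, DMA bandwidth) but does not give its formula. As a rough illustration only, the sketch below assumes a simple form: compute time is instructions times cycles per instruction divided by frequency, transfer time is bytes divided by bandwidth, and the two either overlap or run back to back. The function name, parameters and the overlap rule are hypothetical and are not taken from the thesis.

```python
def predicted_time(instructions, cycles_per_instruction, frequency_hz,
                   dma_bytes, dma_bandwidth_bytes_per_s, overlap=True):
    """Estimate execution time in seconds for one accelerator core.

    Assumed model (not the thesis's): compute and DMA are each a simple
    ratio; with overlap (e.g. double buffering) the slower one dominates,
    without overlap the two times add up.
    """
    if frequency_hz <= 0 or dma_bandwidth_bytes_per_s <= 0:
        raise ValueError("frequency and bandwidth must be positive")
    compute_s = instructions * cycles_per_instruction / frequency_hz
    transfer_s = dma_bytes / dma_bandwidth_bytes_per_s
    return max(compute_s, transfer_s) if overlap else compute_s + transfer_s


# Illustrative numbers only: 3.2e9 instructions at 1 cycle each on a
# 3.2 GHz core, moving 12.8 GB over a 25.6 GB/s DMA channel.
overlapped = predicted_time(3.2e9, 1.0, 3.2e9, 12.8e9, 25.6e9)
serialized = predicted_time(3.2e9, 1.0, 3.2e9, 12.8e9, 25.6e9, overlap=False)
```

Comparing the overlapped and serialized estimates against a measured run is one way such a model could be checked, which is the kind of comparison the abstract reports.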
132

New Techniques for Building Timing-Predictable Embedded Systems

Guan, Nan January 2013
Embedded systems are becoming ubiquitous in our daily life. Due to their close interaction with the physical world, embedded systems are typically subject to timing constraints. At design time, it must be ensured that the run-time behaviors of such systems satisfy the pre-specified timing constraints under any circumstance. In this thesis, we develop techniques to address the timing analysis problems brought by the increasing complexity of the underlying hardware and software on different levels of abstraction in embedded systems design. On the program level, we develop quantitative analysis techniques to predict cache hit/miss behaviors for tight WCET estimation, and study two commonly used replacement policies, MRU and FIFO, which cannot be analyzed adequately using the state-of-the-art qualitative cache analysis method. Our quantitative approach greatly improves the precision of WCET estimation and discloses interesting predictability properties of these replacement policies, which are concealed in the qualitative analysis framework. On the component level, we address the challenges raised by multi-core computing. Several fundamental problems in multiprocessor scheduling are investigated. In global scheduling, we propose an analysis method that rules out a large share of impossible system behaviors for better analysis precision, and establish conditions to guarantee the bounded responsiveness of computing tasks. In partitioned scheduling, we close a long-standing open problem by generalizing the famous Liu and Layland utilization bound from uniprocessor real-time scheduling to multiprocessor systems. We also propose to use cache partitioning for multi-core systems to avoid contention on shared caches, and solve the underlying schedulability analysis problem. On the system level, we present techniques to improve the Real-Time Calculus (RTC) analysis framework in both efficiency and precision. First, we have developed Finitary Real-Time Calculus to solve the scalability problem of the original RTC due to period explosion. The key idea is to maintain and operate on only a limited prefix of each curve that is relevant to the final results during the whole analysis procedure. We further improve the analysis precision of EDF components in RTC by precisely bounding the response time of each computation request.
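For readers unfamiliar with the Liu and Layland utilization bound mentioned above, the sketch below shows the classic uniprocessor form: n periodic tasks are schedulable under rate-monotonic priorities if their total utilization does not exceed n(2^(1/n) - 1). This is the well-known textbook test only; it does not reproduce the multiprocessor generalization developed in the thesis, and the function names are illustrative.

```python
def liu_layland_bound(n: int) -> float:
    """Classic uniprocessor rate-monotonic utilization bound for n tasks."""
    if n < 1:
        raise ValueError("need at least one task")
    return n * (2 ** (1 / n) - 1)


def passes_utilization_test(tasks) -> bool:
    """tasks is a list of (worst_case_execution_time, period) pairs.

    Returns True when total utilization is within the bound. The test is
    sufficient but not necessary: a task set that fails it may still be
    schedulable.
    """
    utilization = sum(wcet / period for wcet, period in tasks)
    return utilization <= liu_layland_bound(len(tasks))


light = passes_utilization_test([(1, 4), (1, 5)])   # utilization 0.45
heavy = passes_utilization_test([(3, 4), (2, 5)])   # utilization 1.15
```

The bound decreases toward ln 2 (about 0.693) as the number of tasks grows, which is why the test becomes more pessimistic for large task sets.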
134

Approche logicielle pour améliorer la fiabilité d’applications parallèles implémentées dans des processeurs multi-cœur et many-cœur / Software approach to improve the reliability of parallel applications implemented on multi-core and many-core processors

Vargas Vallejo, Vanessa Carolina 28 April 2017
The large computing capacity, great flexibility, low power consumption, intrinsic redundancy and high performance provided by multi/many-core processors make them ideal for overcoming the new challenges in computing systems. However, the degree of integration of these devices increases their sensitivity to the effects of natural radiation. Consequently, manufacturers and industrial and academic partners are working together to improve the characteristics of these devices, which would allow their use in critical embedded systems. In this context, the work done throughout this thesis aims at evaluating the impact of SEEs (Single Event Effects) on parallel applications running on multi-core and many-core processors, and at developing and validating a software approach, called N-MoRePar, to improve system reliability. The methodology used for the evaluation was based on multiple case studies. The scenarios implemented cover a wide range of system configurations in terms of multi-processing mode, programming model, memory model and resources used. For the experimentation, two COTS devices were selected: the Freescale PowerPC P2041 quad-core built in 45nm SOI technology, and the KALRAY MPPA-256 many-core processor built in 28nm CMOS technology. The case studies were evaluated through fault injection and neutron radiation test campaigns. The results obtained serve as guidelines for developers choosing the most reliable system configuration for their requirements. Furthermore, the evaluation results of the proposed N-MoRePar fault-tolerant approach, based on redundancy and partitioning criteria, support wider use of COTS multi/many-core processors in systems that require high dependability.
135

Support des communications dans des architectures multicœurs par l’intermédiaire de mécanismes matériels et d’interfaces de programmation standardisées / Communication support in multi-core architectures through hardware mechanisms and standardized programming interfaces

Rosa, Thiago Raupp da 08 April 2016
The application constraints driving the design of embedded systems constantly demand higher performance and power efficiency. To meet these constraints, current SoC platforms rely on replicating several processing cores while adding dedicated hardware accelerators to handle specific tasks. However, developing embedded applications is becoming a key challenge, since application workloads continue to grow and software technologies are not evolving as fast as hardware architectures, leaving a gap in the full system design. Indeed, the increased programming complexity can be associated with the lack of software standards that support heterogeneity, frequently leading to custom solutions. On the other hand, implementing a standard software solution for embedded systems might induce significant performance and memory usage overheads if it is not adapted to the architecture. Therefore, this thesis focuses on narrowing this gap by implementing hardware mechanisms in co-design with a standard programming interface for embedded systems. The main objectives are to increase programmability through the implementation of a standardized communication application programming interface (MCAPI), and to decrease the overheads imposed by the software implementation through the use of the developed hardware mechanisms. The contributions of the thesis comprise the implementation of MCAPI for a generic multi-core platform and dedicated hardware mechanisms to improve the communication connection phase and the overall performance of the data transfer phase. It is demonstrated that the proposed mechanisms can be exploited by the software implementation without increasing software complexity. Furthermore, performance estimations obtained using a SystemC/TLM simulation model of the reference multi-core architecture show that the proposed mechanisms provide significant gains in terms of latency (up to 97%), throughput (40x increase) and network traffic (up to 68%) while reducing processor workload for both characterization test cases and real application benchmarks.
136

LX-MCAPI : biblioteca de comunicação para suporte a programação paralela em sistemas multi-core

Ideguchi, Antonio Diogo Hidee 12 May 2016
Multi-core processors represent the industry's response to the physical barriers encountered in the development of computing processors over the last decades, and have brought new advances in computing system performance. Complex superscalar single-core processors with high clock frequencies gave way to processing units with two or more cores in a single package, generally with lower clock frequencies, allowing one or more execution threads per core. In this context, the existing programming models based on the serial and concurrent paradigms do not allow the real potential of the new hardware elements to be explored, creating a need for new programming methodologies that exploit the parallelism made available by multi-core processors. This work presents LX-MCAPI, a library based on modern IPC (Inter-Process Communication) and memory sharing mechanisms, developed on the hypothesis that message passing is a viable, flexible and scalable abstraction compared to conventional programming methods using shared memory on multi-core systems. LX-MCAPI offers a message-passing mechanism with zero-copy memory sharing between processes, and ready-to-use scalability patterns that facilitate the abstraction and construction of applications. It has performed well in terms of transmission latency and transfer rate in x86-64 and ARM environments.
137

Large Scale Graph Processing in a Distributed Environment

Upadhyay, Nitesh January 2017
Graph algorithms are used ubiquitously across domains. They exhibit parallelism, which can be exploited on parallel architectures such as multi-core processors and accelerators. However, real-world graphs are massive and cannot fit into the memory of a single machine. Such large graphs are partitioned and processed in a distributed cluster environment that consists of multiple GPUs and CPUs. Existing frameworks that facilitate large-scale graph processing on distributed clusters each have their own style of programming and require extensive involvement by the user in communication and synchronization. Adopting these frameworks is therefore an overhead for the programmer. Furthermore, these frameworks have been developed to target only CPU clusters and lack the ability to harness the GPU architecture. We provide a back-end framework for the graph Domain Specific Language Falcon, for large-scale graph processing on CPU and GPU clusters. The motivation for choosing this DSL as a front end is its shared-memory-based imperative programming model. Our framework generates Giraph code for CPU clusters. Giraph code runs on a Hadoop cluster and is known for scalable and fault-tolerant graph processing. For GPU clusters, our framework applies a set of optimizations to reduce computation and communication latency, and generates efficient CUDA code coupled with MPI. Experimental evaluations show the scalability and performance of our framework on both CPU and GPU clusters. The performance of the framework-generated code is comparable to manual implementations of various algorithms in distributed environments.
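Giraph follows the vertex-centric, bulk-synchronous style in which each vertex reads the messages sent to it in the previous superstep, updates its value, and sends messages to its neighbors. The sketch below simulates that style on a single machine for single-source shortest paths. It is plain Python, not Giraph or Falcon code and not the output of the thesis's framework; the function name and graph layout are illustrative, and edge weights are assumed to be non-negative.

```python
import math


def bsp_sssp(edges, source):
    """Single-source shortest paths in vertex-centric supersteps.

    edges maps each vertex to a list of (neighbor, weight) pairs.
    Returns a dict of shortest distances from source.
    """
    vertices = set(edges)
    for neighbors in edges.values():
        vertices.update(v for v, _ in neighbors)
    dist = {v: math.inf for v in vertices}

    inbox = {source: [0]}          # messages delivered at superstep start
    while inbox:                   # halt when no messages are in flight
        outbox = {}
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < dist[vertex]:
                dist[vertex] = best
                # Only vertices whose value improved send new messages.
                for neighbor, weight in edges.get(vertex, []):
                    outbox.setdefault(neighbor, []).append(best + weight)
        inbox = outbox             # barrier: next superstep sees the outbox
    return dist


graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
distances = bsp_sssp(graph, "a")
```

In a distributed setting the vertices would be partitioned across machines and the outbox exchange would become network communication, which is the part the abstract says existing frameworks leave largely to the user.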
138

Mitteilungen des URZ 3/2009

Clauß, Matthias, Müller, Thomas, Riedel, Wolfgang, Schier, Thomas, Vodel, Matthias 31 August 2009
Information from the University Computing Centre (Universitätsrechenzentrum, URZ)
139

Design and Implementation of an Audio Codec (AMR-WB) using Dataflow Programming Language CAL in the OpenDF Environment

Ali, Hazem, Patoary, Mohammad Nazrul Ishlam January 2010
Over the last three decades, computer architects have been able to increase the performance of single processors by, e.g., increasing clock speed, introducing cache memories and using instruction-level parallelism. However, because of power consumption and heat dissipation constraints, this trend is going to cease. In recent times, hardware engineers have instead moved to new chip architectures with multiple processor cores on a single chip. With multi-core processors, applications can complete more total work than with one core alone. To take advantage of multi-core processors, we have to develop parallel applications that assign tasks to different cores. On each core, pipeline, data and task parallelization can be used to achieve higher performance. Dataflow programming languages are attractive for achieving parallelism because of their high-level, machine-independent, implicitly parallel notation and because of their fine-grain parallelism. These features are essential for obtaining effective, scalable utilization of multi-core processors. In this thesis work we have parallelized an existing audio codec, Adaptive Multi-Rate Wide Band (AMR-WB), written in the C language for a single-core processor. The target platform is a multi-core ARM11 MP developer board. The final result of the effort is a working AMR-WB encoder implemented in CAL and running in the OpenDF simulator. The C specification of the AMR-WB encoder was analysed with respect to dataflow and parallelism. The final implementation was developed in the CAL Actor Language, with the goal of exposing the available parallelism (different dataflows) as well as removing unwanted data dependencies. Our thesis work discusses the mapping techniques and guidelines that we followed, which can be used in any future work on mapping C-based applications to CAL. We also propose solutions for some specific dependencies that were revealed in the AMR-WB encoder analysis and suggest further investigation of possible modifications to the encoder to enable a more efficient implementation on a multi-core target system.
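To make the dataflow idea concrete, the sketch below models actors as small units that fire when a token is available on their input queue and write a result token to their output queue. It is written in Python rather than CAL, does not use any OpenDF API, and the class and function names are hypothetical; it only illustrates why actors that share nothing but FIFO queues are easy to run in parallel.

```python
from collections import deque


class Actor:
    """A dataflow actor with one input FIFO and one output FIFO."""

    def __init__(self, action, inbox, outbox):
        self.action = action
        self.inbox = inbox
        self.outbox = outbox

    def fire(self):
        """Consume one token if available; return True when the actor fired."""
        if not self.inbox:
            return False
        self.outbox.append(self.action(self.inbox.popleft()))
        return True


def run(actors):
    """Naive sequential scheduler: keep firing until no actor can fire."""
    fired = True
    while fired:
        fired = False
        for actor in actors:
            if actor.fire():
                fired = True


# Two-stage pipeline: scale each sample, then add an offset.
samples, scaled, result = deque([1, 2, 3]), deque(), deque()
pipeline = [
    Actor(lambda x: x * 2, samples, scaled),
    Actor(lambda x: x + 1, scaled, result),
]
run(pipeline)
```

Because each actor touches only its own queues, a real runtime could place the two stages on different cores and let them run as a pipeline, which is the property the abstract relies on.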
140

Power Optimal Network-On-Chip Interconnect Design

Vikas, G 02 1900
A large part of today's multi-core chips is interconnect. Increasing communication complexity has made new interconnect strategies, such as Network on Chip, essential. Power dissipation in interconnects has become a substantial part of total power dissipation. Hence, techniques to reduce interconnect power have become a necessity. In this thesis, we present a design methodology that, for a Network-on-Chip scenario, gives the bus width of the interconnect links and the operating frequency of the routers that satisfy the required throughput while dissipating minimal switching power. We develop closed-form analytical expressions for the power dissipation, with bus width and frequency as variables, and then use the Lagrange multiplier method to arrive at the optimal values. To validate our methodology, we implement the router design in 90 nm technology and measure power for various combinations of bus width and frequency. We find that the experimental results are in good agreement with the predicted theoretical results. Further, we present the scenario of an Application Specific System on Chip (ASSoC), where the throughput requirements differ from link to link. We show that our analytical model also holds in this case. Finally, we present a modified version of the solution developed for the Chip Multi Processor (CMP) case that also solves the ASSoC scenario.
