Global ETD Search

101	An Interconnection Network for a Cache Coherent System on FPGAs Mirian, Vincent 12 January 2011 (has links) Field-Programmable Gate Arrays (FPGAs) systems now comprise many processing elements that are processors running software and hardware engines used to accelerate specific functions. To make the programming of such a system simpler, it is easiest to think of a shared-memory environment, much like in current multi-core processor systems. This thesis introduces a novel, shared-memory, cache-coherent infrastructure for heterogeneous systems implemented on FPGAs that can then form the basis of a shared-memory programming model for heterogeneous systems. With simulation results, it is shown that the cache-coherent infrastructure outperforms the infrastructure of Woods [1] with a speedup of 1.10. The thesis explores the various configurations of the cache interconnection network and the benefit of the cache-to-cache cache line data transfer with its impact on main memory access. Finally, the thesis shows the cache-coherent infrastructure has very little overhead when using its cache coherence implementation. FPGA Cache Coherence Interconnection Network Protocols Multiprocessor muti-thread Pthread (Posix Thread) programming model parrallel programming shared-memory model 0544
102	Scheduling Algorithms for Instruction Set Extended Symmetrical Homogeneous Multiprocessor Systems-on-Chip Montcalm, Michael R. 10 June 2011 (has links) Embedded system designers face multiple challenges in fulfilling the runtime requirements of programs. Effective scheduling of programs is required to extract as much parallelism as possible. These scheduling algorithms must also improve speedup after instruction-set extensions have occurred. Scheduling of dynamic code at run time is made more difficult when the static components of the program are scheduled inefficiently. This research aims to optimize a program’s static code at compile time. This is achieved with four algorithms designed to schedule code at the task and instruction level. Additionally, the algorithms improve scheduling using instruction set extended code on symmetrical homogeneous multiprocessor systems. Using these algorithms, we achieve speedups up to 3.86X over sequential execution for a 4-issue 2-processor system, and show better performance than recent heuristic techniques for small programs. Finally, the algorithms generate speedup values for a 64-point FFT that are similar to the test runs. Scheduling ILP System on Chip SoC Instruction level parallelism Integer Linear Program Custom Instruction Instruction Set Extension Multiprocessor
103	Architecture synthesis for adaptive multiprocessor systems on chip Ishebabi, Harold January 2010 (has links) This thesis presents methods for automated synthesis of flexible chip multiprocessor systems from parallel programs targeted at FPGAs to exploit both task-level parallelism and architecture customization. Automated synthesis is necessitated by the complexity of the design space. A detailed description of the design space is provided in order to determine which parameters should be modeled to facilitate automated synthesis by optimizing a cost function, the emphasis being placed on inclusive modeling of parameters from application, architectural and physical subspaces, as well as their joint coverage in order to avoid pre-constraining the design space. Given a parallel program and a set of an IP library, the automated synthesis problem is to simultaneously (i) select processors (ii) map and schedule tasks to them, and (iii) select one or several networks for inter-task communications such that design constraints and optimization objectives are met. The research objective in this thesis is to find a suitable model for automated synthesis, and to evaluate methods of using the model for architectural optimizations. Our contributions are a holistic approach for the design of such systems, corresponding models to facilitate automated synthesis, evaluation of optimization methods using state of the art integer linear and answer set programming, as well as the development of synthesis heuristics to solve runtime challenges. / Aktuelle Technologien erlauben es komplexe Multiprozessorsysteme auf einem Chip mit Milliarden von Transistoren zu realisieren. Der Entwurf solcher Systeme ist jedoch zeitaufwendig und schwierig. Diese Arbeit befasst sich mit der Frage, wie On-Chip Multiprozessorsysteme ausgehend von parallelen Programmen automatisch synthetisiert werden können. Die Implementierung der Multiprozessorsysteme auf rekonfigurierbaren Chips erlaubt es die gesamte Architektur an die Struktur eines vorliegenden parallelen Programms anzupassen. Auf diese Weise ist es möglich die aktuellen technologischen Unzulänglichkeiten zu umgehen, insbesondere die nicht weitersteigende Taktfrequenzen sowie den langsamen Zugriff auf Datenspeicher. Eine Automatisierung des Entwurfs von Multiprozessorsystemen ist notwendig, da der Entwurfsraum von Multiprozessorsystemen zu groß ist, um vom Menschen überschaut zu werden. In einem ersten Ansatz wurde das Syntheseproblem mittels linearer Gleichungen modelliert, die dann durch lineare Programmierungswerkzeuge gelöst werden können. Ausgehend von diesem Ansatz wurde untersucht, wie die typischerweise langen Rechenzeiten solcher Optimierungsmethoden durch neuere Methode aus dem Gebiet der Erfüllbarkeitsprobleme der Aussagenlogik minimiert werden können. Dabei wurde die Werkzeugskette Potassco verwendet, in der lineare Programme direkt in Logikprogramme übersetzt werden können. Es wurde gezeigt, dass dieser zweite Ansatz die Optimierungszeit um bis zu drei Größenordnungen beschleunigt. Allerdings lassen sich große Syntheseprobleme auf diese weise wegen Speicherbegrenzungen nicht lösen. Ein weiterer Ansatz zur schnellen automatischen Synthese bietet die Verwendung von Heuristiken. Es wurden im Rahmen diese Arbeit drei Heuristiken entwickelt, die die Struktur des vorliegenden Syntheseproblems ausnutzen, um die Optimierungszeit zu minimieren. Diese Heuristiken wurden unter Berücksichtigung theoretischer Ergebnisse entwickelt, deren Ursprung in der mathematische Struktur des Syntheseproblems liegt. Dadurch lassen sich optimale Architekturen in kurzer Zeit ermitteln. Die durch diese Dissertation offen gewordene Forschungsarbeiten sind u. a. die Berücksichtigung der zeitlichen Reihenfolge des Datenaustauschs zwischen parallelen Tasks, die Optimierung des logik-basierten Ansatzes, die Integration von Prozessor- und Netzwerksimulatoren zur funktionalen Verifikation synthetisierter Architekturen, sowie die Entwicklung geeigneter Architekturkomponenten. Multiprozessor rekonfigurierbar Synthese Parallelrechner Exploration Multiprocessor Reconfigurable High-Level Synthesis Parallel Programming Exploration Data processing Computer science
104	A Generalized Framework for Energy Savings in Real-Time Multiprocessor Systems Zeng, Gang, Yokoyama, Tetsuo, Tomiyama, Hiroyuki, Takada, Hiroaki 11 1900 (has links) No description available. embedded real-time systems energy-aware multiprocessor scheduling dynamic hardware resource configuration dynamic voltage frequency scaling dynamic power management
105	Global scheduling on temperature-constrained multiprocessor real-time systems Koo, Ja-Ryeong 10 October 2008 (has links) In this thesis, we study temperature-constrained multiprocessor real-time systems, where real-time guarantees must be met without exceeding safe temperature levels within the processors. We focus on Pfair scheduling algorithms, especially ERfair scheduling scheme (a work-conserving extension to Pfair scheduling) as our main multiprocessor real-time scheduling methodology. Then, we study the benefits of simple reactive speed scaling as described in the real-time multiprocessor systems. In this thesis, in support of the temperature-awareness, we extend the applicability of the reactive speed scaling to global scheduling schemes for multiprocessors. We propose temperature-aware scheduling and processor selection schemes motivated by existing (thermally non-optimal) ERfair scheduling in order to reduce thermal stress and therefore increase the processor utilization. Then, we show that the proposed algorithm and reactive scheme can enhance the processor utilization compared with any constant speed scheme on real-time multiprocessor systems. Additionally, we show how the maximum schedulable utilization (MSU) for partitioning heuristics can be determined on the temperature-constrained multiprocessor real-time systems. Temperature-constrained real-time system multiprocessor real-time scheduling reactive speed scaling temperature-aware scheduling schedulability analysis
106	Scheduling Algorithms for Instruction Set Extended Symmetrical Homogeneous Multiprocessor Systems-on-Chip Montcalm, Michael R. 10 June 2011 (has links) Embedded system designers face multiple challenges in fulfilling the runtime requirements of programs. Effective scheduling of programs is required to extract as much parallelism as possible. These scheduling algorithms must also improve speedup after instruction-set extensions have occurred. Scheduling of dynamic code at run time is made more difficult when the static components of the program are scheduled inefficiently. This research aims to optimize a program’s static code at compile time. This is achieved with four algorithms designed to schedule code at the task and instruction level. Additionally, the algorithms improve scheduling using instruction set extended code on symmetrical homogeneous multiprocessor systems. Using these algorithms, we achieve speedups up to 3.86X over sequential execution for a 4-issue 2-processor system, and show better performance than recent heuristic techniques for small programs. Finally, the algorithms generate speedup values for a 64-point FFT that are similar to the test runs. Scheduling ILP System on Chip SoC Instruction level parallelism Integer Linear Program Custom Instruction Instruction Set Extension Multiprocessor
107	Optimisation des transferts de données sur systèmes multiprocesseurs sur puce / Optimizing Data Transfers for Multiprocessor Systems on Chips Saidi, Selma 24 October 2012 (has links) Les systèmes multiprocesseurs sur puce, tel que le processeur CELL ou plus récemment Platform 2012, sont des architectures multicœurs hétérogènes constitués d'un processeur host et d'une fabric de calcul qui consiste en plusieurs petits cœurs dont le rôle est d'agir comme un accélérateur programmable. Les parties parallélisable d'une application, qui initialement est supposé etre executé par le host, et dont le calcul est intensif sont envoyés a la fabric multicœurs pour être exécutés. Ces applications sont en général des applications qui manipulent des tableaux trés larges de données, ces données sont stockées dans une memoire distante hors puce (off-chip memory) dont l 'accès est 100 fois plus lent que l 'accès par un cœur a une mémoire locale. Accéder ces données dans la mémoire off-chip devient donc un problème majeur pour les performances. une characteristiques principale de ces plateformes est une mémoire local géré par le software, au lieu d un mechanisme de cache, tel que les mouvements de données dans la hiérarchie mémoire sont explicitement gérés par le software. Dans cette thèse, l 'objectif est d'optimiser ces transfert de données dans le but de reduire/cacher la latence de la mémoire off-chip . / Multiprocessor system on chip (MPSoC) such as the CELL processor or the more recent Platform2012 are heterogeneous multi-core architectures, with a powerful host processor and a computation fabric, consisting of several smaller cores, whose intended role is to act as a general purpose programmable accelerator. Therefore computation-intensive (and parallelizable) parts of the application initially intended to be executed by the host processor are offloaded to the multi-cores for execution. These parts of the application are often data intensive, operating on large arrays of data initially stored in a remote off-chip memory whose access time is about 100 times slower than that of the cores local memory. Accessing data in the off-chip memory becomes then a main bottleneck for performance. A major characteristic of these platforms is a software controlled local memory storage rather than a hidden cache mechanism where data movement in the memory hierarchy, typically performed using a DMA (Direct Memory Access) engine, are explicitely managed by the software. In this thesis, we attempt to optimize such data transfers in order to reduce/hide the off-chip memory latency. Application data parallèles DMA Systemes multiprocesseurs sur puce Data parallel applications Direct Memory Access(DMA) Multiprocessor architecture
108	Desenvolvimento de uma arquitetura reconfigurável para o processamento de modelos no ambiente ABACUS Lima, Verônica Aparecida Lopes [UNESP] 31 August 2007 (has links) (PDF) Made available in DSpace on 2014-06-11T19:22:35Z (GMT). No. of bitstreams: 0 Previous issue date: 2007-08-31Bitstream added on 2014-06-13T20:29:09Z : No. of bitstreams: 1 lima_val_me_ilha.pdf: 399126 bytes, checksum: 5597e5f619ca9aa5e433432ef064a3bf (MD5) / Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) / O objetivo deste trabalho é o desenvolvimento de uma arquitetura reconfigurável estaticamente, de um elemento de processamento (MPH) para o ambiente de simulação de circuitos ABACUS. Este elemento de processamento consiste de um conjunto de unidades funcionais que podem ser relacionadas por meio de algumas palavras de controle armazenadas na ROM, e cuja interconexão pode ser alterada para que o hardware de processamento se adapte ao modelo do elemento de circuito a ser simulado. O projeto foi descrito em linguagem VHDL e simulado com o auxílio do software QUARTUS II. / The aim of this work is the development of a statically reconfigurable architecture, of a processing element (MPH) for the ABACUS circuit simulation environment. This processing element consists of a set of functional units that can be related by means of some control words stored in the ROM, and whose interconnection can be modified so that the processing hardware be adapted to the model of the circuit element to be simulated. The project was described in VHDL, and simulated with the aid of Quartus II software. Sistemas digitais reconfiguráveis Reconfigurable digital systems Circuit simulation Multiprocessor computer architectures
109	Multiprocessador em eletronica reconfiguravel para aplicações roboticas / Multiprocessor in reconfigurable electronics to robotical applications Castro, Eberval Oliveira 12 November 2007 (has links) Orientador: Marconi Kolm Madrid / Dissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação / Made available in DSpace on 2018-08-10T03:57:30Z (GMT). No. of bitstreams: 1 Castro_EbervalOliveira_M.pdf: 4698124 bytes, checksum: 0a0a438cbad7212bdba90f2c96875871 (MD5) Previous issue date: 2007 / Resumo: A solução de modelos dinâmicos de robôs em tempo real é um dos principais desafios da robótica. Este trabalho propõe um multiprocessador de quatro núcleos fortemente acoplados, o SMM-4 (Sistema Multiprocessado Monolítico), consistindo de uma arquitetura de processamento paralelo monolítica sintetizada em FPGA para aplicações em controle de sistemas robóticos. Uma análise quantitativa e qualitativa é realizada em contraste a sistemas uniprocessadores, evidenciando os ganhos obtidos através desta abordagem em FPGA. O SMM-4 foi desenvolvido no Laboratório de Sistemas Modulares Robóticos (LSMR/Unicamp) como uma das alternativas para o cálculo das equações dos modelos de robôs em tempo real / Abstract: The solution of robots¿ dynamic models in real-time is one of major challenges of the robotics. This work presents a strongly coupled quad-core multiprocessor ¿ the MMS-4 (Monolithic Multiprocessor System) ¿ consisting of a monolithical parallel processing architecture synthesized on FPGA for applications on robotic control systems. A quantitative and qualitative analysis is performed in contrast with uniprocessor systems for the purpose of evince the benefits obtained choosing this approach in FPGA. The MMS-4 was developed at Robotic Modular Systems Laboratory (LSMR/Unicamp) as an alternative to calculate the equations systems of robots¿ models on real-time / Mestrado / Automação / Mestre em Engenharia Elétrica Robótica Processamento paralelo (Computadores) Multiprocessadores Controle em tempo real Robotics Parallel processing FPGA Embedded multiprocessor Real-time systems
110	Scheduling Algorithms for Instruction Set Extended Symmetrical Homogeneous Multiprocessor Systems-on-Chip Montcalm, Michael R. January 2011 (has links) Embedded system designers face multiple challenges in fulfilling the runtime requirements of programs. Effective scheduling of programs is required to extract as much parallelism as possible. These scheduling algorithms must also improve speedup after instruction-set extensions have occurred. Scheduling of dynamic code at run time is made more difficult when the static components of the program are scheduled inefficiently. This research aims to optimize a program’s static code at compile time. This is achieved with four algorithms designed to schedule code at the task and instruction level. Additionally, the algorithms improve scheduling using instruction set extended code on symmetrical homogeneous multiprocessor systems. Using these algorithms, we achieve speedups up to 3.86X over sequential execution for a 4-issue 2-processor system, and show better performance than recent heuristic techniques for small programs. Finally, the algorithms generate speedup values for a 64-point FFT that are similar to the test runs. Scheduling ILP System on Chip SoC Instruction level parallelism Integer Linear Program Custom Instruction Instruction Set Extension Multiprocessor

Search results