Global ETD Search

71	Hardware-Software Co-Design for Sensor Nodes in Wireless Networks Zhang, Jingyao 11 June 2013 (has links) Simulators are important tools for analyzing and evaluating different design options for wireless sensor networks (sensornets) and hence, have been intensively studied in the past decades. However, existing simulators only support evaluations of protocols and software aspects of sensornet design. They cannot accurately capture the significant impacts of various hardware designs on sensornet performance. As a result, the performance/energy benefits of customized hardware designs are difficult to be evaluated in sensornet research. To fill in this technical void, in first section, we describe the design and implementation of SUNSHINE, a scalable hardware-software emulator for sensornet applications. SUNSHINE is the first sensornet simulator that effectively supports joint evaluation and design of sensor hardware and software performance in a networked context. SUNSHINE captures the performance of network protocols, software and hardware up to cycle-level accuracy through its seamless integration of three existing sensornet simulators: a network simulator TOSSIM, an instruction-set simulator SimulAVR and a hardware simulator GEZEL. SUNSHINE solves several sensornet simulation challenges, including data exchanges and time synchronization across different simulation domains and simulation accuracy levels. SUNSHINE also provides hardware specification scheme for simulating flexible and customized hardware designs. Several experiments are given to illustrate SUNSHINE's simulation capability. Evaluation results are provided to demonstrate that SUNSHINE is an efficient tool for software-hardware co-design in sensornet research. Even though SUNSHINE can simulate flexible sensor nodes (nodes contain FPGA chips as coprocessors) in wireless networks, it does not estimate power/energy consumption of sensor nodes. So far, no simulators have been developed to evaluate the performance of such flexible nodes in wireless networks. In second section, we present PowerSUNSHINE, a power- and energy-estimation tool that fills the void. PowerSUNSHINE is the first scalable power/energy estimation tool for WSNs that provides an accurate prediction for both fixed and flexible sensor nodes. In the section, we first describe requirements and challenges of building PowerSUNSHINE. Then, we present power/energy models for both fixed and flexible sensor nodes. Two testbeds, a MicaZ platform and a flexible node consisting of a microcontroller, a radio and a FPGA based co-processor, are provided to demonstrate the simulation fidelity of PowerSUNSHINE. We also discuss several evaluation results based on simulation and testbeds to show that PowerSUNSHINE is a scalable simulation tool that provides accurate estimation of power/energy consumption for both fixed and flexible sensor nodes. Since the main components of sensor nodes include a microcontroller and a wireless transceiver (radio), their real-time performance may be a bottleneck when executing computation-intensive tasks in sensor networks. A coprocessor can alleviate the burden of microcontroller from multiple tasks and hence decrease the probability of dropping packets from wireless channel. Even though adding a coprocessor would gain benefits for sensor networks, designing applications for sensor nodes with coprocessors from scratch is challenging due to the consideration of design details in multiple domains, including software, hardware, and network. To solve this problem, we propose a hardware-software co-design framework for network applications that contain multiprocessor sensor nodes. The framework includes a three-layered architecture for multiprocessor sensor nodes and application interfaces under the framework. The layered architecture is to make the design of multiprocessor nodes' applications flexible and efficient. The application interfaces under the framework are implemented for deploying reliable applications of multiprocessor sensor nodes. Resource sharing technique is provided to make processor, coprocessor and radio work coordinately via communication bus. Several testbeds containing multiprocessor sensor nodes are deployed to evaluate the effectiveness of our framework. Network experiments are executed in SUNSHINE emulator to demonstrate the benefits of using multiprocessor sensor nodes in many network scenarios. / Ph. D. Sensor networks multiprocessor sensor node Field programmable gate arrays simulator hardware-software co-design power/energy estimation testbeds
72	Dynamische Anwendungspartitionierung für heterogene adaptive Computersysteme: Dynamische Anwendungspartitionierung für heterogene adaptiveComputersysteme Rößler, Marko 21 May 2014 (has links) Die Dissertationsschrift stellt eine Methodik und die Infrastruktur zur Entwicklung von dynamisch verteilbaren Anwendungen für heterogene Computersysteme vor. Diese Computersysteme besitzen vielfältige Rechenwerke, die Berechnungen in den Domänen Software und Hardware realisieren. Als erster Schritt wird ein übergreifendes und integriertes Vorgehen für den Anwendungsentwurf auf Basis eines abstrakten “Single-Source” Ansatzes entwickelt. Durch die Virtualisierung der Rechenwerke wird die preemptive Verteilung der Anwendungen auch über die Domänengrenzen möglich. Die Anwendungsentwicklung für diese Computersysteme bedarf einer durchgehend automatisierten Entwurfsunterstützung. In der Arbeit wird der dazu vorgeschlagene Ansatz formalisiert und eine neuartige Unterbrechungspunktsynthese entwickelt, die ein hinsichtlich Zeit und Fläche optimiertes, präemptives Verhalten für beliebige Anwendungsbeschreibungen generiert. Das Verfahren wird beispielhaft implementiert und mittels einer FPGA- Prototypenplattform mit Linux-basierter Laufzeitumgebung anhand dreier Fallbeispiele unterschiedlicher Komplexität validiert und evaluiert. / This thesis introduces a methodology and infrastructure for the development of dynamically distributable applications on heterogeneous computing systems. Such systems execute computations using resources from both the hardware and the software domain. An integrated approach based on an abstract single-source design entry is developed that allows preemptive partitioning through virtualization of computing resources across the boundaries of differing computational domains. Application design for heterogeneous computing systems is a complex task that demands aid by electronic design automation tools. This work provides a novel synthesis approach for breakpoints that generates preemptive behaviour for arbitrary applications. The breakpoint scheme is computed for a minimal additional resource utilization and given timing constraints. The approach is implemented on an FPGA prototyping platform driven by a Linux based runtime environment. Evaluation and validation of the approach have been carried out using three different application examples. info:eu-repo/classification/ddc/620 ddc:620 info:eu-repo/classification/ddc/537 ddc:537
73	Towards the development of a reliable reconfigurable real-time operating system on FPGAs Hong, Chuan January 2013 (has links) In the last two decades, Field Programmable Gate Arrays (FPGAs) have been rapidly developed from simple “glue-logic” to a powerful platform capable of implementing a System on Chip (SoC). Modern FPGAs achieve not only the high performance compared with General Purpose Processors (GPPs), thanks to hardware parallelism and dedication, but also better programming flexibility, in comparison to Application Specific Integrated Circuits (ASICs). Moreover, the hardware programming flexibility of FPGAs is further harnessed for both performance and manipulability, which makes Dynamic Partial Reconfiguration (DPR) possible. DPR allows a part or parts of a circuit to be reconfigured at run-time, without interrupting the rest of the chip’s operation. As a result, hardware resources can be more efficiently exploited since the chip resources can be reused by swapping in or out hardware tasks to or from the chip in a time-multiplexed fashion. In addition, DPR improves fault tolerance against transient errors and permanent damage, such as Single Event Upsets (SEUs) can be mitigated by reconfiguring the FPGA to avoid error accumulation. Furthermore, power and heat can be reduced by removing finished or idle tasks from the chip. For all these reasons above, DPR has significantly promoted Reconfigurable Computing (RC) and has become a very hot topic. However, since hardware integration is increasing at an exponential rate, and applications are becoming more complex with the growth of user demands, highlevel application design and low-level hardware implementation are increasingly separated and layered. As a consequence, users can obtain little advantage from DPR without the support of system-level middleware. To bridge the gap between the high-level application and the low-level hardware implementation, this thesis presents the important contributions towards a Reliable, Reconfigurable and Real-Time Operating System (R3TOS), which facilitates the user exploitation of DPR from the application level, by managing the complex hardware in the background. In R3TOS, hardware tasks behave just like software tasks, which can be created, scheduled, and mapped to different computing resources on the fly. The novel contributions of this work are: 1) a novel implementation of an efficient task scheduler and allocator; 2) implementation of a novel real-time scheduling algorithm (FAEDF) and two efficacious allocating algorithms (EAC and EVC), which schedule tasks in real-time and circumvent emerging faults while maintaining more compact empty areas. 3) Design and implementation of a faulttolerant microprocessor by harnessing the existing FPGA resources, such as Error Correction Code (ECC) and configuration primitives. 4) A novel symmetric multiprocessing (SMP)-based architectures that supports shared memory programing interface. 5) Two demonstrations of the integrated system, including a) the K-Nearest Neighbour classifier, which is a non-parametric classification algorithm widely used in various fields of data mining; and b) pairwise sequence alignment, namely the Smith Waterman algorithm, used for identifying similarities between two biological sequences. R3TOS gives considerably higher flexibility to support scalable multi-user, multitasking applications, whereby resources can be dynamically managed in respect of user requirements and hardware availability. Benefiting from this, not only the hardware resources can be more efficiently used, but also the system performance can be significantly increased. Results show that the scheduling and allocating efficiencies have been improved up to 2x, and the overall system performance is further improved by ~2.5x. Future work includes the development of Network on Chip (NoC), which is expected to further increase the communication throughput; as well as the standardization and automation of our system design, which will be carried out in line with the enablement of other high-level synthesis tools, to allow application developers to benefit from the system in a more efficient manner. 621.39
74	Hardware and software co-design toward flexible terabits per second traffic processing / Co-conception matérielle et logicielle pour du traitement de trafic flexible au-delà du terabit par seconde Cornevaux-Juignet, Franck 04 July 2018 (has links) La fiabilité et la sécurité des réseaux de communication nécessitent des composants efficaces pour analyser finement le trafic de données. La diversification des services ainsi que l'augmentation des débits obligent les systèmes d'analyse à être plus performants pour gérer des débits de plusieurs centaines, voire milliers de Gigabits par seconde. Les solutions logicielles communément utilisées offrent une flexibilité et une accessibilité bienvenues pour les opérateurs du réseau mais ne suffisent plus pour répondre à ces fortes contraintes dans de nombreux cas critiques.Cette thèse étudie des solutions architecturales reposant sur des puces programmables de type Field-Programmable Gate Array (FPGA) qui allient puissance de calcul et flexibilité de traitement. Des cartes équipées de telles puces sont intégrées dans un flot de traitement commun logiciel/matériel afin de compenser les lacunes de chaque élément. Les composants du réseau développés avec cette approche innovante garantissent un traitement exhaustif des paquets circulant sur les liens physiques tout en conservant la flexibilité des solutions logicielles conventionnelles, ce qui est unique dans l'état de l'art.Cette approche est validée par la conception et l'implémentation d'une architecture de traitement de paquets flexible sur FPGA. Celle-ci peut traiter n'importe quel type de paquet au coût d'un faible surplus de consommation de ressources. Elle est de plus complètement paramétrable à partir du logiciel. La solution proposée permet ainsi un usage transparent de la puissance d'un accélérateur matériel par un ingénieur réseau sans nécessiter de compétence préalable en conception de circuits numériques. / The reliability and the security of communication networks require efficient components to finely analyze the traffic of data. Service diversification and through put increase force network operators to constantly improve analysis systems in order to handle through puts of hundreds,even thousands of Gigabits per second. Commonly used solutions are software oriented solutions that offer a flexibility and an accessibility welcome for network operators, but they can no more answer these strong constraints in many critical cases.This thesis studies architectural solutions based on programmable chips like Field-Programmable Gate Arrays (FPGAs) combining computation power and processing flexibility. Boards equipped with such chips are integrated into a common software/hardware processing flow in order to balance short comings of each element. Network components developed with this innovative approach ensure an exhaustive processing of packets transmitted on physical links while keeping the flexibility of usual software solutions, which was never encountered in the previous state of theart.This approach is validated by the design and the implementation of a flexible packet processing architecture on FPGA. It is able to process any packet type at the cost of slight resources over consumption. It is moreover fully customizable from the software part. With the proposed solution, network engineers can transparently use the processing power of an hardware accelerator without the need of prior knowledge in digital circuit design. Surveillance de trafic FPGA Architecture hétérogène Traitements haute performance Co-conception logicielle/matérielle Traffic monitoring FPGA Heterogeneous architecture High performance computing Hardware/software co-design 620
75	Implémentation d'algorithmes de reconnaissance biométrique par l'iris sur des architectures dédiées / Implementing biometric iris recognition algorithms on dedicated architectures Hentati, Raïda 02 November 2013 (has links) Dans cette thèse, nous avons adapté trois versions d'une chaine d'algorithmes de reconnaissance biométrique par l’iris appelés OSIRIS V2, V3, V4 qui correspondent à différentes implémentations de l’approche de J. Daugman pour les besoins d’une implémentation logicielle / matérielle. Les résultats expérimentaux sur la base de données ICE2005 montrent que OSIRIS_V4 est le système le plus fiable alors qu’OSIRIS_V2 est le plus rapide. Nous avons proposé une mesure de qualité de l’image segmentée pour optimiser en terme de compromis coût / performance un système de référence basé sur OSIRIS V2 et V4. Nous nous sommes ensuite intéressés à l’implémentation de ces algorithmes sur des plateformes reconfigurables. Les résultats expérimentaux montrent que l’implémentation matériel / logiciel est plus rapide que l’implémentation purement logicielle. Nous proposons aussi une nouvelle méthode pour le partitionnement matériel / logiciel de l’application. Nous avons utilisé la programmation linéaire pour trouver la partition optimale pour les différentes tâches prenant en compte les trois contraintes : la surface occupée, le temps d’exécution et la consommation d’énergie / In this thesis, we adapted three versions of a chain of algorithms for biometric iris recognition called OSIRIS V2, V3, V4, which correspond to different implementations of J. Daugman approach. The experimental results on the database ICE2005 show that OSIRIS_V4 is the most reliable when OSIRIS_V2 is the fastest. We proposed a measure of quality of the segmented image in order to optimize in terms of cost / performance compromise a reference system based on OSIRIS V2 and V4. We focused on the implementation of these algorithms on reconfigurable platforms. The experimental results show that the hardware / software implementation is faster than the software implementation. We propose a new method for partitioning hardware / software application. We used linear programming to find the optimal partition for different tasks taking into account the three constraints : the occupied area, execution time and energy consumption Vérification d'identité biométrique Iris Mesure de qualité FPGA Flot de conception Partitionnement matériel / logiciel Biometric identity verification Iris Quality measurement FPGA Design flow Hardware / software partitioning
76	High Performance FPGA-Based Computation and Simulation for MIMO Measurement and Control Systems Palm, Johan January 2009 (has links) <p>The Stressometer system is a measurement and control system used in cold rolling to improve the flatness of a metal strip. In order to achieve this goal the system employs a multiple input multiple output (MIMO) control system that has a considerable number of sensors and actuators. As a consequence the computational load on the Stressometer control system becomes very high if too advance functions are used. Simultaneously advances in rolling mill mechanical design makes it necessary to implement more complex functions in order for the Stressometer system to stay competitive. Most industrial players in this market considers improved computational power, for measurement, control and modeling applications, to be a key competitive factor. Accordingly there is a need to improve the computational power of the Stressometer system. Several different approaches towards this objective have been identified, e.g. exploiting hardware parallelism in modern general purpose and graphics processors.</p><p>Another approach is to implement different applications in FPGA-based hardware, either tailored to a specific problem or as a part of hardware/software co-design. Through the use of a hardware/software co-design approach the efficiency of the Stressometer system can be increased, lowering overall demand for processing power since the available resources can be exploited more fully. Hardware accelerated platforms can be used to increase the computational power of the Stressometer control system without the need for major changes in the existing hardware. Thus hardware upgrades can be as simple as connecting a cable to an accelerator platform while hardware/software co-design is used to find a suitable hardware/software partition, moving applications between software and hardware.</p><p>In order to determine whether this hardware/software co-design approach is realistic or not, the feasibility of implementing simulator, computational and control applications in FPGAbased hardware needs to be determined. This is accomplished by selecting two specific applications for a closer study, determining the feasibility of implementing a Stressometer measuring roll simulator and a parallel Cholesky algorithm in FPGA-based hardware.</p><p>Based on these studies this work has determined that the FPGA device technology is perfectly suitable for implementing both simulator and computational applications. The Stressometer measuring roll simulator was able to approximate the force and pulse signals of the Stressometer measuring roll at a relative modest resource consumption, only consuming 1747 slices and eight DSP slices. This while the parallel FPGA-based Cholesky component is able to provide performance in the range of GFLOP/s, exceeding the performance of the personal computer used for comparison in several simulations, although at a very high resource consumption. The result of this thesis, based on the two feasibility studies, indicates that it is possible to increase the processing power of the Stressometer control system using the FPGA device technology.</p> FPGA Field Programmable Gate Arrays Parallel Cholesky Decomposition Hardware/Software Co-Design Digital Systems MIMO Measurement and Control FPGA-based Simulation FPGA-based Computation
77	Software Techniques for Distributed Shared Memory Radovic, Zoran January 2005 (has links) <p>In large multiprocessors, the access to shared memory is often nonuniform, and may vary as much as ten times for some distributed shared-memory architectures (DSMs). This dissertation identifies another important nonuniform property of DSM systems: <i>nonuniform communication architecture</i>, NUCA. High-end hardware-coherent machines built from large nodes, or from chip multiprocessors, are typical NUCA systems, since they have a lower penalty for reading recently written data from a neighbor's cache than from a remote cache. This dissertation identifies <i>node affinity</i> as an important property for scalable general-purpose locks. Several software-based hierarchical lock implementations exploiting NUCAs are presented and evaluated. NUCA-aware locks are shown to be almost twice as efficient for contended critical sections compared to traditional lock implementations.</p><p>The shared-memory “illusion”' provided by some large DSM systems may be implemented using either hardware, software or a combination thereof. A software-based implementation can enable cheap cluster hardware to be used, but typically suffers from poor and unpredictable performance characteristics.</p><p>This dissertation advocates a new software-hardware trade-off design point based on a new combination of techniques. The two low-level techniques, fine-grain deterministic coherence and synchronous protocol execution, as well as profile-guided protocol flexibility, are evaluated in isolation as well as in a combined setting using all-software implementations. Finally, a minimum of hardware trap support is suggested to further improve the performance of coherence protocols across cluster nodes. It is shown that all these techniques combined could result in a fairly stable performance on par with hardware-based coherence.</p> Datorteknik synchronization distributed shared memory write permission cache nonuniform communication architecture node affinity locality hardware-software trade-off profiling flexibility trap-based memory architecture Datorteknik Computer engineering Datorteknik
78	Dresdner Arbeitstagung Schaltungs- und Systementwurf mehrer Autoren, Sammelband 11 June 2007 (has links) (PDF) Die jedes Frühjahr stattfindende »Dresdner Arbeitstagung Schaltungs- und Systementwurf« wird traditionell vom Fraunhofer-Institut für Integrierte Schaltungen, Institutsteil Entwurfsautomatisierung (EAS) und vom Sächsischen Arbeitskreis Informationstechnik des VDE Bezirksvereins Dresden ausgerichtet. Die Arbeitstagung hat bereits eine über 30-jährige Tradition und wird von Wissenschaftlern aus Forschungsinstituten und Ingenieuren aus der Industrie für einen regen fachlichen Austausch genutzt. Gegenstand der Tagung sind aktuelle Ergebnisse und neue Erkenntnisse aus Forschung und Entwicklung sowie Erfahrungsberichte und Problemdiskussionen auf dem Gebiet des Entwurfs analoger, digitaler und hybrider Systeme. Das Tagungsprogramm bietet den Teilnehmern wieder interessante Beiträge über neue Lösungen zum Entwurf komplexer Schaltungen und Systeme, die auch Themen wie Rekonfigurierbarkeit, Architekturen, Performance, Hardware-Software, Test und Optimierung behandeln. Begleitend zur Tagung wird von der Firma Mentor Graphics ein Workshop zum Thema »Advanced Verification Methodology« angeboten. Hier werden an einem Beispiel die Vorteile der zukünftigen Design Verifikation mit System Verilog und Assertions erläutert. Der vorliegende Tagungsband enthält die Langfassungen der Beiträge, für deren Form und Inhalt die Autoren verantwortlich sind. Als Veranstalter bedanken wir uns bei den Autoren für die Bereitstellung dieser Beiträge, die als Grundlage für die fachlichen Diskussionen dienen, und bei den Teilnehmern für ihr Interesse an unserer Arbeitstagung. Architektur Aufsatzsammlung Entwurf analoger Systeme Entwurf digitaler Systeme Entwurf hybrider Systeme Hardware-Software-Design Optimierung Performance Rekonfigurierbarkeit Test ddc:004 ddc:500 Informatik Schaltungsentwurf Systementwurf Technische Informatik
79	Exploring coordinated software and hardware support for hardware resource allocation Figueiredo Boneti, Carlos Santieri de 04 September 2009 (has links) Multithreaded processors are now common in the industry as they offer high performance at a low cost. Traditionally, in such processors, the assignation of hardware resources between the multiple threads is done implicitly, by the hardware policies. However, a new class of multithreaded hardware allows the explicit allocation of resources to be controlled or biased by the software. Currently, there is little or no coordination between the allocation of resources done by the hardware and the prioritization of tasks done by the software.This thesis targets to narrow the gap between the software and the hardware, with respect to the hardware resource allocation, by proposing a new explicit resource allocation hardware mechanism and novel schedulers that use the currently available hardware resource allocation mechanisms.It approaches the problem in two different types of computing systems: on the high performance computing domain, we characterize the first processor to present a mechanism that allows the software to bias the allocation hardware resources, the IBM POWER5. In addition, we propose the use of hardware resource allocation as a way to balance high performance computing applications. Finally, we propose two new scheduling mechanisms that are able to transparently and successfully balance applications in real systems using the hardware resource allocation. On the soft real-time domain, we propose a hardware extension to the existing explicit resource allocation hardware and, in addition, two software schedulers that use the explicit allocation hardware to improve the schedulability of tasks in a soft real-time system.In this thesis, we demonstrate that system performance improves by making the software aware of the mechanisms to control the amount of resources given to each running thread. In particular, for the high performance computing domain, we show that it is possible to decrease the execution time of MPI applications biasing the hardware resource assignation between threads. In addition, we show that it is possible to decrease the number of missed deadlines when scheduling tasks in a soft real-time SMT system. MT CMP harware priorities thread prioritization resource balancing load balancing powers simultaneous multithreading SMT hardware-software codesign performance characterization software-controlled prioritization 004
80	High Performance FPGA-Based Computation and Simulation for MIMO Measurement and Control Systems Palm, Johan January 2009 (has links) The Stressometer system is a measurement and control system used in cold rolling to improve the flatness of a metal strip. In order to achieve this goal the system employs a multiple input multiple output (MIMO) control system that has a considerable number of sensors and actuators. As a consequence the computational load on the Stressometer control system becomes very high if too advance functions are used. Simultaneously advances in rolling mill mechanical design makes it necessary to implement more complex functions in order for the Stressometer system to stay competitive. Most industrial players in this market considers improved computational power, for measurement, control and modeling applications, to be a key competitive factor. Accordingly there is a need to improve the computational power of the Stressometer system. Several different approaches towards this objective have been identified, e.g. exploiting hardware parallelism in modern general purpose and graphics processors. Another approach is to implement different applications in FPGA-based hardware, either tailored to a specific problem or as a part of hardware/software co-design. Through the use of a hardware/software co-design approach the efficiency of the Stressometer system can be increased, lowering overall demand for processing power since the available resources can be exploited more fully. Hardware accelerated platforms can be used to increase the computational power of the Stressometer control system without the need for major changes in the existing hardware. Thus hardware upgrades can be as simple as connecting a cable to an accelerator platform while hardware/software co-design is used to find a suitable hardware/software partition, moving applications between software and hardware. In order to determine whether this hardware/software co-design approach is realistic or not, the feasibility of implementing simulator, computational and control applications in FPGAbased hardware needs to be determined. This is accomplished by selecting two specific applications for a closer study, determining the feasibility of implementing a Stressometer measuring roll simulator and a parallel Cholesky algorithm in FPGA-based hardware. Based on these studies this work has determined that the FPGA device technology is perfectly suitable for implementing both simulator and computational applications. The Stressometer measuring roll simulator was able to approximate the force and pulse signals of the Stressometer measuring roll at a relative modest resource consumption, only consuming 1747 slices and eight DSP slices. This while the parallel FPGA-based Cholesky component is able to provide performance in the range of GFLOP/s, exceeding the performance of the personal computer used for comparison in several simulations, although at a very high resource consumption. The result of this thesis, based on the two feasibility studies, indicates that it is possible to increase the processing power of the Stressometer control system using the FPGA device technology. FPGA Field Programmable Gate Arrays Parallel Cholesky Decomposition Hardware/Software Co-Design Digital Systems MIMO Measurement and Control FPGA-based Simulation FPGA-based Computation

Search results