1501

Characterizing and Manipulating Intra-Die Performance Variation of FPGAs and its Application in Security

Cook, Hayden C 09 July 2024 (has links) (PDF)
Field Programmable Gate Arrays (FPGAs) are reconfigurable, high-performing devices that are often used in critical applications. However, like all semiconductors, FPGAs experience transistor aging that can lower performance and lead to device failures. Device aging also has several security implications. Therefore, understanding the mechanisms behind transistor aging is necessary to ensure the reliability of FPGAs. However, current aging studies either rely on simulation alone or are unable to isolate aging effects on specific elements within the FPGA. This dissertation uses the reconfigurability of FPGAs to develop novel aging techniques that allow for the targeted aging of specific areas of the FPGA fabric. This lets us manipulate the performance variation of a device, which enables several security applications. In addition, we use precise characterization methods that, when combined with our fine-grained aging techniques, allow us to isolate the effects of aging on individual paths and elements within the FPGA. This provides valuable insights into FPGA aging that can be used to develop new aging mitigation strategies. This dissertation comprises five major contributions. The first contribution uses thousands of short circuits to induce a non-uniform slowdown of an FPGA's programmable fabric. The second contribution demonstrates how modifier circuits can be inserted into a region of short circuits to age a targeted region more precisely, allowing us to manipulate performance variation at the tile level of an FPGA. The third contribution uses our targeted aging technique to demonstrate two security applications: a frequency watermark and the cloning of a ring oscillator physical unclonable function (RO PUF) on an FPGA. The fourth contribution uses carefully crafted stress circuits and precise characterization methods to isolate the effects of transistor aging on individual paths within the FPGA. The final contribution uses elements of our precise characterization techniques to create a more reliable configurable RO PUF (CRO PUF) for cryptographic key generation on FPGAs.
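To illustrate why targeted aging matters for an RO PUF, the following is a minimal sketch, not the dissertation's implementation. It assumes the common textbook scheme in which each response bit comes from comparing the frequencies of a disjoint pair of ring oscillators. The function names and the frequency values in MHz are hypothetical.

```python
def ro_puf_response(frequencies):
    """Derive one bit per disjoint oscillator pair: 1 if the first one is faster."""
    bits = []
    for i in range(0, len(frequencies) - 1, 2):
        bits.append(1 if frequencies[i] > frequencies[i + 1] else 0)
    return bits


def age_oscillator(frequencies, index, slowdown_mhz):
    """Return a copy in which one oscillator has been slowed (hypothetical aging model)."""
    aged = list(frequencies)
    aged[index] -= slowdown_mhz
    return aged


# Illustrative frequencies in MHz for four ring oscillators (made-up values).
fresh = [200.0, 199.0, 198.0, 201.0]
aged = age_oscillator(fresh, index=0, slowdown_mhz=2.0)

print(ro_puf_response(fresh))
print(ro_puf_response(aged))
```

In this toy model, slowing a single oscillator by more than the frequency gap within its pair flips the corresponding response bit, which is the basic reason deliberate, fine-grained aging can alter a PUF response.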
1502

Efficient Processing of Convolutional Neural Networks on the Edge: A Hybrid Approach Using Hardware Acceleration and Dual-Teacher Compression

Alhussain, Azzam 01 January 2024 (has links) (PDF)
This dissertation addresses the challenge of accelerating Convolutional Neural Networks (CNNs) for edge computing in computer vision applications by developing specialized hardware solutions that maintain high accuracy and perform real-time inference. Driven by open-source hardware design frameworks such as FINN and HLS4ML, this research focuses on hardware acceleration, model compression, and efficient implementation of CNN algorithms on AMD SoC-FPGAs using High-Level Synthesis (HLS) to optimize resource utilization and improve the throughput/watt of FPGA-based AI accelerators compared to traditional fixed-logic chips, such as CPUs, GPUs, and other edge accelerators. The dissertation introduces a novel CNN compression technique, "Two-Teachers Net," which utilizes PyTorch FX-graph mode to train an 8-bit quantized student model using knowledge distillation from two teacher models, improving the accuracy of the compressed model by 1%-2% compared to existing solutions for edge platforms. This method can be applied to any CNN model and dataset for image classification and seamlessly integrated into existing AI hardware and software optimization toolchains, including Vitis-AI, OpenVINO, TensorRT, and ONNX, without architectural adjustments. This provides a scalable solution for deploying high-accuracy CNNs on low-power edge devices across various applications, such as autonomous vehicles, surveillance systems, robotics, healthcare, and smart cities.
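The abstract does not say how the two teachers are combined, so the sketch below is only one plausible reading. It assumes the teachers' temperature-softened outputs are averaged into a single soft target and mixed with a standard cross-entropy term. All function names, the temperature, and the weighting `alpha` are hypothetical, and plain Python stands in for the PyTorch training loop.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def two_teacher_loss(student, teacher_a, teacher_b, label,
                     temperature=4.0, alpha=0.5):
    """Distillation loss for one sample (assumed form, not the dissertation's)."""
    # Assumption: the soft target is the plain average of both teachers' outputs.
    soft_a = softmax(teacher_a, temperature)
    soft_b = softmax(teacher_b, temperature)
    target = [(a + b) / 2 for a, b in zip(soft_a, soft_b)]
    # Soft term: match the student's softened output to the averaged target.
    soft_loss = kl_divergence(target, softmax(student, temperature)) * temperature ** 2
    # Hard term: ordinary cross-entropy against the ground-truth label.
    hard_loss = -math.log(softmax(student)[label])
    return alpha * soft_loss + (1 - alpha) * hard_loss


teachers = [5.0, 0.0, 0.0]
print(two_teacher_loss([5.0, 0.0, 0.0], teachers, teachers, label=0))
print(two_teacher_loss([0.0, 0.0, 5.0], teachers, teachers, label=0))
```

A student that agrees with both teachers and the label gets a smaller loss than one that disagrees, which is the behaviour any such combined loss should show.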
1503

Analysis of Improved µ-Law Companding Technique for OFDM Systems

Ali, N., Almahainy, R., Al-Shabili, A., Almoosa, N., Abd-Alhameed, Raed 07 1900 (has links)
A high peak-to-average power ratio (PAPR) of transmitted signals is a common problem in broadband telecommunication systems using an orthogonal frequency division multiplexing (OFDM) modulation scheme, as it increases transmitter power consumption. In consumer applications, where it impacts mobile terminal battery life and infrastructure running costs, this is a major factor in customer satisfaction. Companding techniques have recently been used to alleviate this high PAPR. In this paper, a companding scheme with an offset, amidst two nonlinear companding levels, is proposed to achieve better PAPR reduction while maintaining an acceptable bit error rate (BER) level, resulting in electronic products of higher power efficiency. The case studies cover the effect of companding on the OFDM signal with and without an offset. A novel closed-form approximation for the BER of the proposed companding scheme is also presented, and its accuracy is compared against simulation results. A method for choosing the best companding parameters is presented based on contour plots. Practical emulation of a real-time OFDM-based system has been implemented and evaluated using a Field Programmable Gate Array (FPGA).
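As background for the companding idea, here is a minimal sketch of conventional µ-law companding applied to one OFDM symbol. It is not the offset scheme proposed in the paper. The QPSK mapping, the 64 subcarriers, the value µ = 255, and all function names are assumptions chosen for illustration.

```python
import cmath
import math
import random


def ofdm_symbol(n_subcarriers=64, seed=1):
    """Build one OFDM time-domain symbol from random QPSK data via a naive IDFT."""
    rng = random.Random(seed)
    n = n_subcarriers
    qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
            for _ in range(n)]
    return [sum(qpsk[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n))
            / math.sqrt(n) for t in range(n)]


def papr_db(samples):
    """Peak-to-average power ratio in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def mu_law_compress(samples, mu=255.0):
    """Compress magnitudes with the mu-law curve, keeping each sample's phase."""
    peak = max(abs(s) for s in samples)
    out = []
    for s in samples:
        mag = peak * math.log1p(mu * abs(s) / peak) / math.log1p(mu)
        out.append(cmath.rect(mag, cmath.phase(s)))
    return out, peak


def mu_law_expand(samples, peak, mu=255.0):
    """Invert mu_law_compress at the receiver, given the same peak and mu."""
    out = []
    for s in samples:
        mag = peak / mu * math.expm1(abs(s) / peak * math.log1p(mu))
        out.append(cmath.rect(mag, cmath.phase(s)))
    return out


x = ofdm_symbol()
y, peak = mu_law_compress(x)
print(papr_db(x), papr_db(y))
```

The compressor leaves the peak unchanged while boosting small amplitudes, so the average power rises and the PAPR falls. The cost, which the paper's BER analysis addresses for its own scheme, is that channel noise is amplified when the receiver expands the signal.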
1504

Attacks and Vulnerabilities of Hardware Accelerators for Machine Learning: Degrading Accuracy Over Time by Hardware Trojans

Niklasson, Marcus, Uddberg, Simon January 2024 (has links)
The increasing application of Neural Networks (NNs) in various fields has heightened the demand for specialized hardware to enhance performance and efficiency. Field-Programmable Gate Arrays (FPGAs) have emerged as a popular choice for implementing NN accelerators due to their flexibility, high performance, and ability to be customized for specific NN architectures. However, the trend of outsourcing Integrated Circuit (IC) design to third parties has introduced new security vulnerabilities, particularly in the form of Hardware Trojans (HTs). These malicious alterations can severely compromise the integrity and functionality of NN accelerators. This study investigates a novel type of HT that degrades the accuracy of Convolutional Neural Network (CNN) accelerators over time. Two variants of the attack are presented: the Gradually Degrading Accuracy Trojan (GDAT) and the Suddenly Degrading Accuracy Trojan (SDAT), implemented in various components of the CNN accelerator. The approach leverages a sensitivity analysis to identify the most impactful targets for the trojan and evaluates the attack’s effectiveness based on stealthiness, hardware overhead, and impact on accuracy. The overhead of the attacks was found to be competitive when compared to other trojans, and the attacks have the potential to undermine trust and cause economic damage if deployed. Out of the components targeted, the memory component for the feature maps was identified as the most vulnerable to this attack, closely followed by the bias memory component. The feature map trojans resulted in a significant accuracy degradation of 78.16% with a 0.15% and 0.29% increase in Look-Up-Table (LUT) utilization for the SDAT and GDAT variants, respectively. In comparison, the bias trojans caused an accuracy degradation of 63.33% with a LUT utilization increase of 0.20% and 0.33% for the respective trojans. The power consumption overhead was consistent at 0.16% for both the attacks and trojan versions.
1505

Event Sequence Identification and Deep Learning Classification for Anomaly Detection and Predication on High-Performance Computing Systems

Li, Zongze 12 1900 (has links)
High-performance computing (HPC) systems continue growing in both scale and complexity. These large-scale, heterogeneous systems generate tens of millions of log messages every day. Effective log analysis for understanding system behaviors and identifying system anomalies and failures is highly challenging. Existing log analysis approaches use line-by-line message processing. They are not effective for discovering subtle behavior patterns and their transitions, and thus may overlook some critical anomalies. In this dissertation research, I propose a system log event block detection (SLEBD) method that accurately and automatically extracts the log messages belonging to a component or system event into an event block (EB). At the event level, we can discover new event patterns, the evolution of system behavior, and the interaction among different system components. To find critical event sequences, existing sequence mining methods are mostly based on the Apriori algorithm, which is compute-intensive and runs for a long time. I develop a novel, topology-aware sequence mining (TSM) algorithm that efficiently generates sequence patterns from the extracted event block lists. I also train a long short-term memory (LSTM) model to cluster sequences before specific events. With the generated sequence patterns and the trained LSTM model, we can predict whether an event is going to occur normally or not. To accelerate such predictions, I propose a design flow that converts recurrent neural network (RNN) designs into register-transfer level (RTL) implementations deployed on FPGAs. According to our experimental results, the FPGA, owing to its high parallelism and low power, achieves a greater speedup and better energy efficiency than the CPU and GPU.
1506

Analyse et amélioration de la qualité de services WEB multimédia et leurs mises en oeuvre sur ordinateur et sur FPGA

Al-Canaan, Amer January 2014 (has links)
Abstract: Web services, an outcome of technological advances in computer networks and in portable and fixed telecommunication devices, play an important role in daily life. The growing demand for multimedia Web services (MWS), in particular, increases the load on the Internet, on service providers and on Web servers. This load arises mainly because high-quality MWS require high data transfer rates and considerable payload sizes. The quality of service (QoS, by definition as perceived by the user) is influenced by several performance factors, such as processing time, propagation delay, response time, image resolution and compression efficiency. The research in this thesis is motivated by the persistent demand for new MWS and the need to maintain and improve their QoS. Firstly, we focus on the QoS of MWS implemented on desktop and laptop computers. We start by studying compatibility aspects in order to obtain MWS that function satisfactorily on different platforms. Secondly, we study the QoS of MWS implemented according to the SOAP protocol and the RESTful style. In particular, we study the compression ratio, which is one of the factors influencing the QoS. Thirdly, after studying MWS implemented on computers, we study the QoS of MWS implemented in hardware, in particular on FPGAs. We carry out a study and an implementation that identify the advantages of implementing MWS on FPGAs. The contributions of this thesis can be summarised as follows: 1. We introduce methods for the design and implementation of MWS on heterogeneous software platforms in different environments, such as Windows, OS X and Solaris. One of our objectives is to propose an approach that facilitates the integration of new MWS while ensuring compatibility among the platforms involved. This means that we identify the options that make it possible to offer a rich and varied set of MWS that can run on the different platforms. 2. We determine a list of relevant parameters that influence the QoS of MWS implemented according to the SOAP protocol and the REST style. 3. We build an analysis environment that quantifies the impact of each identified parameter on the QoS of MWS implemented with both the SOAP protocol and the RESTful style. The QoS obtained with SOAP and with REST are compared objectively. To ease the comparison, the same range of images and the same software platforms were used for both. The analysis was carried out on a wide range of images, which gives a realistic picture of the behaviour of real MWS. 4. We develop an analysis procedure to determine the correlation between the aspect ratio of an image and its compression ratio. Our results confirm that the compression ratio can be improved and optimised when the aspect ratio of an image is close to the golden ratio, which occurs in nature. Three compression libraries have been used, namely JPEG, JPEG2000 and DjVu. 5. Complementary to the four contributions above, which concern MWS on computers, we also study the design and implementation of MWS on FPGA. We justify the choice of FPGA by identifying its advantages over two other options: computers and ASICs. In order to confirm several of the identified advantages, we developed on FPGA an MWS with high performance and a high level of QoS, using freely available design tools, open-source code and a method based only on HDL. This approach will facilitate future extensions and add-on modules for MWS management and orchestration. 6. We update and adapt the open-source code and documentation of the Ethernet IP Core module for communication between the FPGA and the Ethernet port on the Nexys3 board, which facilitates the implementation of MWS on the Nexys3 board.
1507

Onduleur triphasé à structure innovante pour application aéronautique / Innovative three-phase Inverter structure for aircraft applications

Guepratte, Kevin 14 March 2011 (has links)
In aeronautics, the constraints are such that the mass of the filters can represent up to 50% of the total mass of the converter. In recent years, parallel multicell interleaved and magnetically coupled converters have improved converter performance (power density, efficiency, transient response, ...). Many interleaved filtering topologies exist, several of them based on interphase transformers; the main objective of this study is to find among these topologies those best suited to a three-phase 110 Veff / 400 Hz, 25 kVA voltage inverter. It is shown that the choice of magnetic material has a decisive impact on the weight, volume and losses of the converter. Parallelization multiplies the number of semiconductors. These new structures must simultaneously guarantee high efficiency, low mass and continuity of operation, even in the event of failure of a power semiconductor or its driver circuit. However, coupling the phases together imposes an inseparable link that can be harmful to the operation of the structure in the event of a fault. Solutions exist and are addressed in the study. Finally, the practical realization of a semi-industrial prototype of a three-phase converter using interphase transformers is presented. It is a three-phase avionics network inverter with neutral reconstruction for operation under unbalanced conditions. Experimental results show the advantage of an interleaved converter compared with a conventional solution.
1508

Reliable On Board Data Processing System for the ICEYE- 1 satellite

Korczyk, Jakub January 2016 (has links)
Recent developments in electronics for mobile devices have led to a decrease in the size and cost of autonomous complex embedded systems such as satellites. It is now possible to build a satellite more quickly and for only a fraction of previous costs by using Commercial Off The Shelf (COTS) components. Yet there are some obstacles that need to be overcome before a successful small satellite can be designed. Among these are the radiation environment, thermal issues, the overall system complexity and tight schedules. This thesis addresses these issues and proposes an overall approach for designing small satellites’ electronics. This approach can be summarised in 6 recommendations: (1) keep it simple; (2) use fast hardware iterations; (3) do not use space grade components; (4) use a single string design on the system level (no redundancy); (5) design with limited trust in the software; (6) use simple, accessible and easily updatable documentation. Following those recommendations, an on board data processing system, the Processing Board, has been designed for the ICEYE-1 satellite. The ICEYE-1 satellite is a fully commercial Synthetic Aperture Radar (SAR) satellite that will be launched in December 2017. The designed board has been manufactured and verified during airborne test campaigns.
1509

Arithmetic recodings for ECC cryptoprocessors with protections against side-channel attacks / Unités arithmétiques reconfigurables pour cryptoprocesseurs robustes aux attaques

Chabrier, Thomas 18 June 2013 (has links)
This PhD thesis focuses on the study, the hardware design, the theoretical and practical validation, and finally the comparison of different arithmetic operators for cryptosystems based on elliptic curves (ECC). The proposed solutions must be robust against some side-channel attacks and efficient in hardware, in terms of both execution speed and area. In the case of ECC, we want to protect the secret key, a large integer, used in the scalar multiplication. Our protection methods use number representations and the behaviour of algorithms to make some attacks more difficult. For instance, we randomly change some representations of the manipulated numbers by recoding certain internal values, while ensuring that the computed values are correct. Redundant representations such as the signed-digit representation and the double-base (DBNS) and multi-base (MBNS) number systems have been studied. A proposed method provides an on-the-fly MBNS recoding that operates in parallel with curve-level operations and at very high speed. All recoding techniques have been theoretically validated, simulated extensively in software, and finally implemented in hardware (FPGA and ASIC). A side-channel attack called a template attack is also carried out to evaluate the robustness of a cryptosystem using a redundant number representation. Finally, a study is conducted at the hardware level to give an ECC cryptosystem a regular behaviour of the operations computed during the scalar multiplication, so as to protect against some side-channel attacks.
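To make the idea of signed-digit recoding concrete, the sketch below uses the non-adjacent form (NAF), a well-known signed-digit representation, and not the randomized or multi-base recodings developed in the thesis. The scalar multiplication runs over an arbitrary additive group supplied by the caller, and plain integers stand in for elliptic-curve points. All function names are hypothetical.

```python
def naf(k):
    """Non-adjacent form of k >= 0: digits in {-1, 0, 1}, least significant first."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)  # 1 when k mod 4 == 1, -1 when k mod 4 == 3
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def from_digits(digits):
    """Recombine signed binary digits into the integer they represent."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def scalar_multiply(digits, point, add, negate, identity):
    """Signed double-and-add, scanning digits from most to least significant."""
    result = identity
    for d in reversed(digits):
        result = add(result, result)  # doubling step
        if d == 1:
            result = add(result, point)
        elif d == -1:
            result = add(result, negate(point))
    return result


digits = naf(29)
print(digits, from_digits(digits))
# Integers under addition stand in for curve points in this toy check.
print(scalar_multiply(digits, 5, lambda a, b: a + b, lambda a: -a, 0))
```

Because point negation is cheap on elliptic curves, signed digits reduce the number of additions without changing the result. Plain NAF is deterministic, so on its own it does not hide the key; protection of the kind discussed in the thesis comes from choosing among several valid representations of the same scalar.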
1510

Dynamic instruction set extension of microprocessors with embedded FPGAs

Bauer, Heiner 13 April 2017 (has links) (PDF)
Increasingly complex applications and recent shifts in technology scaling have created a large demand for microprocessors that can perform tasks more quickly and more energy-efficiently. Conventional microarchitectures exploit multiple levels of parallelism to increase instruction throughput and use application-specific instruction sets or hardware accelerators to increase energy efficiency. Reconfigurable microprocessors adopt the same principle of providing application-specific hardware, but with the significant advantage of post-fabrication flexibility. This offers not only similar gains in performance but also the flexibility to configure each device individually. This thesis explores the benefit of a tightly coupled, fine-grained reconfigurable microprocessor. In contrast to previous research, a detailed design space exploration of logical architectures for island-style field programmable gate arrays (FPGAs) was performed in the context of a commercial 22nm process technology. Other research projects either reused general-purpose architectures or spent little effort on designing and characterizing custom fabrics, which are critical to system performance and to the practicality of frequently proposed high-level software techniques. Here, detailed circuit implementations and a custom area model were used to estimate the performance of over 200 different logical FPGA architectures with single-driver routing. The results of this exploration revealed tradeoffs and trends similar to those described by previous studies. The number of lookup table (LUT) inputs and the structure of the global routing network were shown to have a major impact on the area-delay product. However, the results suggested a much larger region of efficient architectures than before. Finally, an architecture with 5-LUTs and 8 logic elements per cluster was selected. Modifications to the microprocessor, which was based on an industry-proven instruction set architecture, and to its software toolchain provided access to this embedded reconfigurable fabric via custom instructions. The baseline microprocessor was characterized with estimates from signoff data for a 28nm hardware implementation. A modified academic FPGA tool flow was used to transform Verilog implementations of custom instructions into a post-routing netlist with timing annotations. Simulation-based verification of the system was performed with a cycle-accurate processor model and diverse application benchmarks, ranging from signal processing and encryption to the computation of elementary functions. For these benchmarks, the extended instruction set achieved a significant increase in performance, with speedups from 3 to 15 relative to the baseline microprocessor. Except for one case, application speedup clearly outweighed the area overhead of the extended system, even though the modeled fabric architecture was primitive and contained no explicit arithmetic enhancements. Insights into fundamental tradeoffs of island-style FPGA architectures, the developed exploration flow, and a concrete cost model are relevant for the development of more advanced architectures. Hence, this work is a successful proof of concept and has laid the basis for further investigations into architectural extensions and physical implementations. Potential for further optimization was identified on multiple levels, and numerous directions for future research were described.
