Spelling suggestions: "subject:"highlevel bsynthesis"" "subject:"highlevel bioynthesis""
61 |
FPGA Accelerated Digital Image Correlation For Clamping Force MeasurementCsuvarszki, János Csanád January 2023 (has links)
Digital image correlation is a contactless optical method used for displacement and strain measurement which has become increasingly popular in the field of experimental mechanics. A specialized use case for the algorithm is to measure the clamping force in bolted joints, a crucial metric when considering the longevity and reliability of the constructs. However, in order to be able to measure the clamping force in real-time, the digital image correlation has to be carried out rapidly as the tightening of the bolts can happen in milliseconds. One approach to increase the speed of the process is hardware acceleration. This thesis presents and evaluates multiple variations of an Field Programmable Gate Arrays (FPGA)-accelerated digital image correlation framework. The goal of the project is to accelerate the image correlation to sufficient speeds so it can be used for highly dynamic and continuous tightenings, which can take 20 to 200 ms and 200 to 1000 ms or more to finish respectively. A baseline implementation was created based on an innovative digital image correlation framework. Strain calculation was altered for the specialized use of clamping force determination. Afterward, different parts of the framework were selected and optimized for hardware acceleration. The parts include both preprocessing and correlation steps. The targets for acceleration were optimized using techniques such as quantization and pipelining. The accelerators were created using high-level synthesis and the resulting implementations utilize both the processor and FPGA parts of a Zynq-7000 system-on-chip. Results show that all accelerators reduce the total execution time of the framework by varying degrees. Accelerators targeting the preprocessing parts such as Gaussian and B-spline filtering proved to be the most effective in speeding up the process achieving a 1,56 and 1,12 times speedup for the fixed-point and a 1,2 and 1,07 times speedup for the double floating-point versions respectively. A combined version containing multiple accelerators resulted in a 1,9 times average speedup. It can be concluded that the presented approach is not fast enough for all highly dynamic tightening processes, as the fastest execution speed achieved is above 100 ms, but could be used for continuous tightening depending on constructs. / Digital image correlation(DIC) är en kontaktlös optisk metod, använd för mätning av förskjutning och töjning, som blivit en allt mer populär inom experimentell mekanik. Ett användningsområde för algoritmen är att mäta klämkraften i skruvförband, en avgörande faktor för hållbarhet och tillförlitlighet i konstruktioner. Men för att mäta klämkraft i realtid, behöver DIC utföras väldigt snabbt då åtdragningsförloppet kan ske inom loppet av millisekunder. En metod för att öka hastigheten är hårdvaruacceleration. Denna avhandling presenterar och utvärderar ett flertal varianter av ett Field Programmable Gate Arrays (FPGA)-accelererat DIC ramverk. Avhandlingen syftar till att accelerera bildkorrelationen tillräckligt mycket för att kunna användas till dynamiska och kontinuerliga åtdragningar som tar 20 till 200 ms respektive 200 till 1000 ms eller mer. En referens-implementation skapades baserat på ett innovativt DIC ramverk. Beräkning av töjning anpassades för specialfallet: bestämmandet av klämkraft. Efter det valdes olika delar av ramverket ut och optimerades för hårdvaruacceleration. De valda delarna innehåller både preprocessor- och korrelationssteg. Delarna som valdes ut för acceleration optimerades med hjälp av tekniker som kvantisering och pipelining. Acceleratorerna skapades med hjälp av high-level synthesis och de resulterande implementationerna använder både processor och FPGA i en Zynq-7000 system-on-chip. Resultaten visar att alla acceleratorer reducerar ramverkets totala exekveringstid med varierande grad. Acceleratorer som riktar sig mot preprocessing som Gaussian och B-spline filtrering visade sig vara mest effektiva och resulterade i en 1.56 respektive 1.12 gånger snabbare exekveringstid för fixed point, och 1.2 respektive 1.07 gånger snabbare exikveringstid för double floating-point. En kombinerad version som innehöll flera acceleratorer resulterade i en 1.9 gånger snabbare genomsnittlig exekveringstid. Slutsatsen är att den presenterade metoden inte är tillräckligt snabb för alla dynamiska åtdragningsförlopp, då den snabbaste uppnådda exekveringstiden är över 100 ms. Men metoden skulle kunna användas för kontinuerliga åtdragningar beroende på konstruktionen.
|
62 |
Design space exploration using HLS in relation to code structuring / Utforskning av design space med HLS i förhållande till kodstruktureringDas, Debraj January 2022 (has links)
High Level Synthesis (HLS) is a methodology to translate a model developed in a high abstraction layer, e.g. C/C++/SystemC, that describes the algorithm into a Register-Transfer level (RTL) description like Verilog or VHDL. The resulting RTL description from the translation is subject to multiple user-controlled directives and an internal design space exploration algorithm specific to the toolchain used. HLS allow designers to focus on the behaviour of the design at a higher abstraction compared to the behavioural modelling available within the Hardware Description Language (HDL) as the compiler decides the movement of data and timing in the resulting design. Ericsson uses a legacy Advanced Peripheral Bus (APB) like interface called Memory/Register Interface (MIRI) interface for data movement in a subsystem of one of their Application-Specific Integrated Circuit (ASIC). The thesis attempts to upgrade the protocol to the more performant ARM Advanced Microcontroller Bus Architecture (AMBA) protocols’ Advanced High-performance Bus (AHB) or Advanced eXtensible Interface (AXI) interfaces. SystemC provides a host of functionalities to define the complete behaviour of the circuit at a high level of abstraction. This thesis will explore the effect of the structuring SystemC models on their synthesis, and perform design space exploration to understand the best design methodology to adopt in a SystemC model design and compare the models based on the final synthesis metrics like area, timing, and register counts. The toolchain for the thesis will be the Stratus HLS compiler developed by Cadence. Stratus supports all synthesizable constructs of SystemC. Most HLS research focuses on improving Design Space Exploration algorithms used internally in the HLS tools. However, designers can utilize algorithm structuring to provide the HLS engines with a better starting point. In this thesis, the Stratus toolchain will be used to experiment with different models with equivalent behaviour and performance. Thereafter, extract which constructs used in the models are optimal for allowing the internal design space exploration algorithm to perform in the best way possible. / HLS är en metod för att översätta en modell utvecklad på hög abstraktionsnivå t.ex. C/C++/SystemC som beskriver algoritmen på registeröverföringsnivå (RTL) som Verilog eller VHDL. Den resulterande RTL-beskrivningen utsätts för flera användarkontrollerade direktiv och en intern Design Space Exploration (DSE) algoritm, vilken är specifik för den verktygskedja som används. Detta gör det möjligt för en designer att fokusera på konstruktion beteende på en högre abstraktionsnivå jämfört med den beteendemodellering som finns tillgänglig inom det hårdvarubeskrivande språket (HDL:en) när kompilatorn bestämmer tidpunkten för utbytet av data i den resulterande designen. Ericsson använder ett äldre gränssnitt för Advanced Peripheral Bus (APB) som kallas Memory/Register Interface (MIRI), vilket är ett gränssnitt för utbyte av data i ett delsystem i en av deras Application-Specific Integrated Circuit (ASIC:ar). Avhandlingen försöker uppgradera protokollet till ett av de det mer högpresterande ARM Advanced Microcontroller Bus Architecture – protokollen Advanced High-Performance Bus (AHB) eller Advanced eXtensible Interface (AXI). SystemC tillhandahåller en mängd funktioner för att definiera kretsens fullständiga beteende vid en hög abstraktionsnivå. Denna avhandling utforskar effekten av strukturerade SystemC-modeller och deras syntesresultat samt konstruktionsrymden, för att förstå den bästa designmetodiken i ett SystemC-modelleringsdesignflöde och jämföra modellerna baserade på de slutliga syntesmätvärdena som storlek, timing, etc. Verktygskedjan för avhandlingen kommer att vara Stratus HLS -kompilatorn som utvecklats av Cadence. Stratus stöder alla syntetiserbara konstruktioner av SystemC. HLS-forskningen fokuserar främst på att förbättra Design Space Exploration, dvs de algoritmer som används internt i HLS-verktygen för att komma fram till lösningar. För att ge HLS -motorerna en bättre utgångspunkt. I denna avhandling kommer Stratus att användas för att utvärdera olika modeller med ekvivalent beteende och nästan samma prestanda efter Syntes, för att komma fram till vilka konstruktioner är optimala för att den interna DSE-algoritmen skall fungera bäst.
|
63 |
High Level Power Estimation and Reduction Techniques for Power Aware Hardware DesignAhuja, Sumit 14 June 2010 (has links)
The unabated continuation of the Moore's law has allowed the doubling of the number of transistors per unit area of a silicon die every 2 years or so. At the same time, an increasing demand on consumer electronics and computing equipments to run sophisticated applications has led to an unprecedented complexity of hardware designs. These factors have necessitated the abstraction level of design-entry of hardware systems to be raised beyond the Register-Transfer-Level (RTL) to Electronic System Level (ESL). However, power envelope on the designs due to packaging and other thermal limitations, and the energy envelope due to battery life-time considerations have also created a need for power/energy efficient design. The confluence of these two technological issues has created an urgent need for solving two problems: (i) How do we enable a power-aware design flow with a design entry point at the Electronic System Level? (ii) How do we enable power aware High Level Synthesis to automatically synthesize RTL implementation from ESL?
This dissertation distinguishes itself by addressing the following two issues: (i) Since power/energy consumption of electronic systems largely depends on implementation details, and high-level models abstract away from such details, power/energy estimation at such levels has not been addressed thoroughly. (ii) A lot of work has been done in applying various techniques on control-data-flow graphs (CDFG) to find power/area/latency pareto points during behavioral synthesis. However, high level C-based functional models of various compute-intensive components, which could be easily synthesized as co-processors, have many opportunities to reduce power. Some of these savings opportunities are traditional such as clock-gating, operand-isolation etc. The exploration of alternate granularities of these techniques with target applications in mind, opens the door for traditional power reduction opportunities at the high-level.
This work therefore concentrates on the aforementioned two areas of inadequacy of hardware design methodologies. Our proposed solutions include utilizing ESL simulation traces and mapping those to lower abstraction levels for power estimation, derivation of statistical power models using regression based learning for power estimation at early design stages, etc. On the HLS front, techniques that insert the power saving features during the synthesis process using exploration of granularity and scope of clock-gating, sequential clock-gating are proposed. Finally, this work shows how to marry two domains, that is estimation and reduction. In this regard, a power model is proposed, which helps in predicting power savings obtained using clock-gating and further guiding HLS to selectively insert clock-gating. / Ph. D.
|
64 |
FPGA Reservoir Computing Networks for Dynamic Spectrum SensingShears, Osaze Yahya 14 June 2022 (has links)
The rise of 5G and beyond systems has fuelled research in merging machine learning with wireless communications to achieve cognitive radios. However, the portability and limited power supply of radio frequency devices limits engineers' ability to combine them with powerful predictive models. This hinders the ability to support advanced 5G applications such as device-to-device (D2D) communication and dynamic spectrum sharing (DSS). This challenge has inspired a wave of research in energy efficient machine learning hardware with low computational and area overhead. In particular, hardware implementations of the delayed feedback reservoir (DFR) model show promising results for meeting these constraints while achieving high accuracy in cognitive radio applications. This thesis answers two research questions surrounding the applicability of FPGA DFR systems for DSS. First, can a DFR network implemented on an FPGA run faster and with lower power than a purely software approach? Second, can the system be implemented efficiently on an edge device running at less than 10 watts?
Two systems are proposed that prove FPGA DFRs can achieve these feats: a mixed-signal circuit, followed by a high-level synthesis circuit. The implementations execute up to 58 times faster, and operate at more than 90% lower power than the software models. Furthermore, the lowest recorded average power of 0.130 watts proves that these approaches meet typical edge device constraints. When validated on the NARMA10 benchmark, the systems achieve a normalized error of 0.21 compared to state-of-the-art error values of 0.15. In a DSS task, the systems are able to predict spectrum occupancy with up to 0.87 AUC in high noise, multiple input, multiple output (MIMO) antenna configurations compared to 0.99 AUC in other works. At the end of this thesis, the trade-offs between the approaches are analyzed, and future directions for advancing this study are proposed. / Master of Science / The rise of 5G and beyond systems has fuelled research in merging machine learning with wireless communications to achieve cognitive radios. However, the portability and limited power supply of radio frequency devices limits engineers' ability to combine them with powerful predictive models. This hinders the ability to support advanced 5G and internet-of-things (IoT) applications. This challenge has inspired a wave of research in energy efficient machine learning hardware with low computational and area overhead. In particular, hardware implementations of a low complexity neural network model, called the delayed feedback reservoir, show promising results for meeting these constraints while achieving high accuracy in cognitive radio applications. This thesis answers two research questions surrounding the applicability of field-programmable gate array (FPGA) delayed feedback reservoir systems for wireless communication applications. First, can this network implemented on an FPGA run faster and with lower power than a purely software approach? Second, can the network be implemented efficiently on an edge device running at less than 10 watts? Two systems are proposed that prove the FPGA networks can achieve these feats. The systems demonstrate lower power consumption and latency than the software models. Additionally, the systems maintain high accuracy on traditional neural network benchmarks and wireless communications tasks. The second implementation is further demonstrated in a software-defined radio architecture. At the end of this thesis, the trade-offs between the approaches are analyzed, and future directions for advancing this study are proposed.
|
65 |
Modélisation, exploration et estimation de la consommation pour les architectures hétérogènes reconfigurables dynamiquement / Model, exploration and estimation of consumption in dynamically reconfigurable heterogeneous architecturesBonamy, Robin 12 July 2013 (has links)
L'utilisation des accélérateurs reconfigurables, pour la conception de system-on-chip hétérogènes, offre des possibilités intéressantes d'augmentation des performances et de réduction de la consommation d'énergie. En effet, ces accélérateurs sont couramment utilisés en complément d'un (ou de plusieurs) processeur(s) pour permettre de décharger celui-ci (ceux-ci) des calculs intensifs et des traitements de flots de données. Le concept de reconfiguration dynamique, supporté par certains constructeurs de FPGA, permet d'envisager des systèmes beaucoup plus flexibles en offrant notamment la possibilité de séquencer temporellement l'exécution de blocs de calcul sur la même surface de silicium, réduisant alors les besoins en ressources d'exécution. Cependant, la reconfiguration dynamique n'est pas sans impact sur les performances globales du système et il est difficile d'estimer la répercussion des décisions de configuration sur la consommation d'énergie. L'objectif principal de cette thèse consiste à proposer une méthodologie d'exploration permettant d'évaluer l'impact des choix d'implémentation des différentes tâches d'une application sur un system-on-chip contenant une ressource reconfigurable dynamiquement, en vue d'optimiser la consommation d'énergie ou le temps d'exécution. Pour cela, nous avons établi des modèles de consommation des composants reconfigurables, en particulier les FPGAs, qui permettent d'aider le concepteur dans son design. À l'aide d'une méthodologie de mesure sur Virtex-5, nous montrons dans un premier temps qu'il est possible de générer des accélérateurs matériels de tailles variées ayant des performances temporelles et énergétiques diverses. Puis, afin de quantifier les coûts d'implémentation de ces accélérateurs, nous construisons trois modèles de consommation de la reconfiguration dynamique partielle. Finalement, à partir des modèles définis et des accélérateurs produits, nous développons un algorithme d'exploration des solutions d'implémentation pour un système complet. En s'appuyant sur une plate-forme de modélisation à haut niveau, celui-ci analyse les coûts d'implémentation des tâches et leur exécution sur les différentes ressources disponibles (processeur ou région configurable). Les solutions offrant les meilleures performances en fonction des contraintes de conception sont retenues pour être exploitées. / The use of reconfigurable accelerators when designing heterogeneous system-on-chip has the potential to increase performance and reduce energy consumption. Indeed, these accelerators are commonly a adjunct to one (or more) processor(s) and unload intensive computations and treatments. The concept of dynamic reconfiguration, supported by some FPGA vendors, allows to consider more flexible systems including the ability to sequence the execution of accelerators on the same silicon area, while reducing resource requirements. However, dynamic reconfiguration may impact overall system performance and it is hard to estimate the impact of configuration decisions on energy consumption.. The main objective of this thesis is to provide an exploration methodology to assess the impact of implementation choices of tasks of an application on a system-on-chip containing a dynamically reconfigurable resource, to optimize the energy consumption or the processing time. Therefore, we have established consumption models of reconfigurable components, particularly FPGAs, which assists the designer. Using a measurement methodology on Virtex-5, we first show the possibility to generate hardware accelerators of various sizes, execution time and energy consumption. Then, in order to quantify the implementation costs of these accelerators, we build three power models of the dynamic and partial reconfiguration. Finally, from these models, we develop an algorithm for the exploration of implementation and allocation possibilities for a complete system. Based on a high-level modeling platform, the implementation costs of the tasks and their performance on various resources (CPU or reconfigurable region) are analyzed. The solutions with the best characteristics, based on design constraints, are extracted.
|
66 |
Fast Code Exploration for Pipeline Processing in FPGA Accelerators / Exploração Rápida de Códigos para Processamento Pipeline em Aceleradores FPGARosa, Leandro de Souza 31 May 2019 (has links)
The increasing demand for energy efficient computing has endorsed the usage of Field-Programmable Gate Arrays to create hardware accelerators for large and complex codes. However, implementing such accelerators involve two complex decisions. The first one lies in deciding which code snippet is the best to create an accelerator, and the second one lies in how to implement the accelerator. When considering both decisions concomitantly, the problem becomes more complicated since the code snippet implementation affects the code snippet choice, creating a combined design space to be explored. As such, a fast design space exploration for the accelerators implementation is crucial to allow the exploration of different code snippets. However, such design space exploration suffers from several time-consuming tasks during the compilation and evaluation steps, making it not a viable option to the snippets exploration. In this work, we focus on the efficient implementation of pipelined hardware accelerators and present our contributions on speeding up the pipelines creation and their design space exploration. Towards loop pipelining, the proposed approaches achieve up to 100× speed-up when compared to the state-uf-the-art methods, leading to 164 hours saving in a full design space exploration with less than 1% impact in the final results quality. Towards design space exploration, the proposed methods achieve up to 9:5× speed-up, keeping less than 1% impact in the results quality. / A demanda crescente por computação energeticamente eficiente tem endossado o uso de Field- Programmable Gate Arrays para a criação de aceleradores de hardware para códigos grandes e complexos. Entretanto, a implementação de tais aceleradores envolve duas decisões complexas. O primeiro reside em decidir qual trecho de código é o melhor para se criar o acelerador, e o segundo reside em como implementar tal acelerador. Quando ambas decisões são consideradas concomitantemente, o problema se torna ainda mais complicado dado que a implementação do trecho de código afeta a seleção dos trechos de código, criando um espaço de projeto combinatorial a ser explorado. Dessa forma, uma exploração do espaço de projeto rápida para a implementação de aceleradores é crucial para habilitar a exploração de diferentes trechos de código. Contudo, tal exploração do espaço de projeto é impedida por várias tarefas que consumem tempo durante os passos de compilação a análise, o que faz da exploração de trechos de códigos inviável. Neste trabalho, focamos na implementação eficiente de aceleradores pipeline em hardware e apresentamos nossas contribuições para o aceleramento da criações de pipelines e de sua exploração do espaço de projeto. Referente à criação de pipelines, as abordagens propostas alcançam uma aceleração de até 100× quando comparadas às abordagens do estado-da-arte, levando à economia de 164 horas em uma exploração de espaço de projeto completa com menos de 1% de impacto na qualidade dos resultados. Referente à exploração do espaço de projeto, as abordagens propostas alcançam uma aceleração de até 9:5×, mantendo menos de 1% de impacto na qualidade dos resultados.
|
67 |
High-Level Synthesis Framework for Crosstalk Minimization in VLSI ASICsSankaran, Hariharan 31 October 2008 (has links)
Capacitive crosstalk noise can affect the delay of a switching signal or induce a glitch on a static signal causing timing violations or chip failure. Crosstalk noise depends on coupling parasitics, driver strength, signal timing characteristics, and signal transition patterns. Layout level crosstalk analysis techniques are generally pessimistic and computationally expensive for large designs due to lack of design flexibility at lower-levels of design hierarchy. The architectural decisions such as type of interconnect architecture, number of storage and execution units, network of communicating units, data bus width, etc., have a major impact on the quality of design attributes such as area, speed, power, and noise. To address all these concerns, we propose a high-level synthesis framework to optimize for worst-case crosstalk patterns on coupled nets, a floorplan driven high-level synthesis framework to minimize coupling capacitance, and an on-chip technique to dynamically detect and eliminate worst-case crosstalk pattern on bus-based macro-cell designs.
Due to Miller coupling effect, the switching activity pattern on adjacent nets may increase the effective capacitance seen by a victim net and thereby it may cause a worst-case signal delay on the victim net. However, signal activity pattern on coupled nets are dependent on data correlations which in turn depend on resource sharing. The resource sharing in turn depends on scheduling, allocation, and binding during high-level synthesis flow.
Therefore, we propose a Simulated Annealing (SA) based design space exploration of HLS design subspace, bus line re-ordering, and encoding subspaces to optimize for worst-case crosstalk pattern in bus-based macro-cell designs. We demonstrate that the proposed framework will aid layout level techniques in eliminating false positive violations. We also propose an SA based algorithm to explore floorplan and HLS subspaces to optimize coupling capacitances in bus-based macro-cell designs. We have integrated an RTL floorplanner in HLS flow to estimate coupling capacitances between bus lines. Crosstalk analysis using Cadence Celtic shows that the designs generated by the proposed framework results in less number of crosstalk violations compared to designs generated through traditional ASIC design flow. We also propose an on-chip crosstalk detection and elimination technique that dynamically detects and eliminates worst-case crosstalk pattern with minimum area penalty compared to other layout level techniques reported in the literature.
|
68 |
Parallel Hardware- and Software Threads in a Dynamically Reconfigurable System on a Programmable ChipRößler, Marko 06 December 2013 (has links) (PDF)
Today’s embedded systems depend on the availability of hybrid platforms, that contain heterogeneous computing resources such as programmable processors units (CPU’s or DSP’s) and highly specialized hardware cores. These platforms have been scaled down to integrated embedded system-on-chip. Modern platform FPGAs enhance such systems by the flexibility of runtime configurable silicon. One of the major advantages that arises is the ability to use hardware (HW) and software (SW) resources in a time-shared manner. Though the ability to dynamically assign computing resources based on decisions taken at runtime is given.
|
69 |
Implantation matérielle de chiffrements homomorphiques / Hardware implementation of homomorphic encryptionMkhinini, Asma 14 December 2017 (has links)
Une des avancées les plus notables de ces dernières années en cryptographie est sans contredit l’introduction du premier schéma de chiffrement complètement homomorphe par Craig Gentry. Ce type de système permet de réaliser des calculs arbitraires sur des données chiffrées, sans les déchiffrer. Cette particularité permet de répondre aux exigences de sécurité et de protection des données, par exemple dans le cadre en plein développement de l'informatique en nuage et de l'internet des objets. Les algorithmes mis en œuvre sont actuellement très coûteux en temps de calcul, et généralement implantés sous forme logicielle. Les travaux de cette thèse portent sur l’accélération matérielle de schémas de chiffrement homomorphes. Une étude des primitives utilisées par ces schémas et la possibilité de leur implantation matérielle est présentée. Ensuite, une nouvelle approche permettant l’implantation des deux fonctions les plus coûteuses est proposée. Notre approche exploite les capacités offertes par la synthèse de haut niveau. Elle a la particularité d’être très flexible et générique et permet de traiter des opérandes de tailles arbitraires très grandes. Cette particularité lui permet de viser un large domaine d’applications et lui autorise d’appliquer des optimisations telles que le batching. Les performances de notre architecture de type co-conception ont été évaluées sur l’un des cryptosystèmes homomorphes les plus récents et les plus efficaces. Notre approche peut être adaptée aux autres schémas homomorphes ou plus généralement dans le cadre de la cryptographie à base de réseaux. / One of the most significant advances in cryptography in recent years is certainly the introduction of the first fully homomorphic encryption scheme by Craig Gentry. This type of cryptosystem allows performing arbitrarily complex computations on encrypted data, without decrypting it. This particularity allows meeting the requirements of security and data protection, for example in the context of the rapid development of cloud computing and the internet of things. The algorithms implemented are currently very time-consuming, and most of them are implemented in software. This thesis deals with the hardware acceleration of homomorphic encryption schemes. A study of the primitives used by these schemes and the possibility of their hardware implementation is presented. Then, a new approach allowing the implementation of the two most expensive functions is proposed. Our approach exploits the high-level synthesis. It has the particularity of being very flexible and generic and makes possible to process operands of arbitrary large sizes. This feature allows it to target a wide range of applications and to apply optimizations such as batching. The performance of our co-design was evaluated on one of the most recent and efficient homomorphic cryptosystems. It can be adapted to other homomorphic schemes or, more generally, in the context of lattice-based cryptography.
|
70 |
Génération rapide d'accélerateurs matériels par synthèse d'architecture sous contraintes de ressources / High-level synthesis for fast generation of hardware accelerators under resource constraintsProst-Boucle, Adrien 08 January 2014 (has links)
Dans le domaine du calcul générique, les circuits FPGA sont très attrayants pour leur performance et leur faible consommation. Cependant, leur présence reste marginale, notamment à cause des limitations des logiciels de développement actuels. En effet, ces limitations obligent les utilisateurs à bien maîtriser de nombreux concepts techniques. Ils obligent à diriger manuellement les processus de synthèse, de façon à obtenir une solution à la fois rapide et conforme aux contraintes des cibles matérielles visées.Une nouvelle méthodologie de génération basée sur la synthèse d'architecture est proposée afin de repousser ces limites. L'exploration des solutions consiste en l'application de transformations itératives à un circuit initial, ce qui accroît progressivement sa rapidité et sa consommation en ressources. La rapidité de ce processus, ainsi que sa convergence sous contraintes de ressources, sont ainsi garanties. L'exploration est également guidée vers les solutions les plus pertinentes grâce à la détection, dans les applications à synthétiser, des sections les plus critiques pour le contexte d'utilisation réel. Cette information peut être affinée à travers un scénario d'exécution transmis par l'utilisateur.Un logiciel démonstrateur pour cette méthodologie, AUGH, est construit. Des expérimentations sont menées sur plusieurs applications reconnues dans le domaine de la synthèse d'architecture. De tailles très différentes, ces applications confirment la pertinence de la méthodologie proposée pour la génération rapide et autonome d'accélérateurs matériels complexes, sous des contraintes de ressources strictes. La méthodologie proposée est très proche du processus de compilation pour les microprocesseurs, ce qui permet son utilisation même par des utilisateurs non spécialistes de la conception de circuits numériques. Ces travaux constituent donc une avancée significative pour une plus large adoption des FPGA comme accélérateurs matériels génériques, afin de rendre les machines de calcul simultanément plus rapides et plus économes en énergie. / In the field of high-performance computing, FPGA circuits are very attractive for their performance and low consumption. However, their presence is still marginal, mainly because of the limitations of current development tools. These limitations force the user to have expert knowledge about numerous technical concepts. They also have to manually control the synthesis processes in order to obtain solutions both fast and that fulfill the hardware constraints of the targeted platforms.A novel generation methodology based on high-level synthesis is proposed in order to push these limits back. The design space exploration consists in the iterative application of transformations to an initial circuit, which progressively increases its rapidity and its resource consumption. The rapidity of this process, along with its convergence under resource constraints, are thus guaranteed. The exploration is also guided towards the most pertinent solutions thanks to the detection of the most critical sections of the applications to synthesize, for the targeted execution context. This information can be refined with an execution scenarion specified by the user.A demonstration tool for this methodology, AUGH, has been built. Experiments have been conducted with several applications known in the field of high-level synthesis. Of very differen sizes, these applications confirm the pertinence of the proposed methodology for fast and automatic generation of complex hardware accelerators, under strict resource constraints. The proposed methodology is very close to the compilation process for microprocessors, which enable it to be used even by users non experts about digital circuit design. These works constitute a significant progress for a broader adoption of FPGA as general-purpose hardware accelerators, in order to make computing machines both faster and more energy-saving.
|
Page generated in 0.0733 seconds