Global ETD Search

41	Analyse et modélisation de l'effet des marées sur les réseaux de nivellement hydrostatiques du CERN / Analysis and modelling of the effect of tides on CERN's hydrostatic levelling networks Boerez, Julien 21 February 2013 The surveyors of the Survey section of the European Organization for Nuclear Research (CERN) use the Hydrostatic Levelling System (HLS) to perform precise vertical alignment. HLS reaches micrometre-level precision, which allows it to be used for fundamental-physics experiments such as the Large Hadron Collider (LHC). HLS does measure the deformations that misalign any ground-based particle accelerator, but it also measures other phenomena with very specific characteristics. Among these, Earth tides account for by far the largest share of the signal. Their effect on HLS measurements is periodic and produces a long-base tilt that does not cause relative misalignment of the magnets that make up an accelerator. The objectives of this doctorate are to predict the effects that do not disturb the relative alignment of a particle accelerator and thereby correct HLS measurements for these signals. Horizontal and vertical tolerances for particle accelerators are becoming ever tighter. For example, the Compact Linear Collider (CLIC), currently at the feasibility-study stage, requires a 3σ alignment precision of 10 μm within a 200 m sliding window in the transverse and vertical directions. HLS is a candidate for this vertical alignment, but the tidal amplitude is about +/-20 μm over 200 m, so this long-base phenomenon must be taken into account for the instrumentation to meet CLIC's requirements.
This doctorate builds on earlier work on long-base inclinometers and describes the effects measured by HLS in order to classify them according to whether or not they misalign a particle accelerator. Finally, the tools and models for predicting the controllable effects are used to anticipate the various signals measured by the HLS sensors installed at CERN. [SDU:OTHER] Planet and Universe/Other. Keywords: Hydrostatic Levelling System (HLS); hydrostatic levelling; inclinometer; CERN; LHC; CLIC; vertical alignment; Earth tides; loading effects; local effects; cavity effects
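Entry 41 rests on the distinction between a long-base tilt, which moves all magnets together, and relative misalignment between neighbouring magnets. As a minimal sketch of that distinction, assuming the tidal effect across a 200 m window is well approximated by a straight line (the thesis itself uses predictive models rather than a blind fit), one can fit and remove a line from the sensor heights: the slope absorbs the tilt, and the residuals keep what would actually misalign the machine. The function name, sensor positions and values are hypothetical.

```python
# Hypothetical illustration: separate a long-base (straight-line) tilt from the
# relative misalignment left in HLS height readings along a window.

def remove_long_base_tilt(positions_m, heights_um):
    """Fit h = a + b*x by least squares; return (slope_um_per_m, residuals_um).

    Assumption: across the window the tidal signal is close to a straight-line
    tilt, so it ends up in the fitted line, while relative misalignment between
    sensors ends up in the residuals.
    """
    n = len(positions_m)
    if n < 2 or n != len(heights_um):
        raise ValueError("need at least two (position, height) pairs")
    mean_x = sum(positions_m) / n
    mean_h = sum(heights_um) / n
    sxx = sum((x - mean_x) ** 2 for x in positions_m)
    sxh = sum((x - mean_x) * (h - mean_h) for x, h in zip(positions_m, heights_um))
    slope = sxh / sxx
    intercept = mean_h - slope * mean_x
    residuals = [h - (intercept + slope * x) for x, h in zip(positions_m, heights_um)]
    return slope, residuals


if __name__ == "__main__":
    xs = [0.0, 50.0, 100.0, 150.0, 200.0]   # sensor positions in metres (hypothetical)
    tide = [-20.0 + 0.2 * x for x in xs]     # pure tilt: -20 um to +20 um over 200 m
    bump = [0.0, 0.0, 5.0, 0.0, 0.0]         # a 5 um local offset at 100 m
    heights = [t + b for t, b in zip(tide, bump)]
    slope, res = remove_long_base_tilt(xs, heights)
    print(f"fitted tilt: {slope:.3f} um/m")
    print("residuals (um):", [round(r, 2) for r in res])
```

With these numbers the fitted tilt is 0.2 µm/m and the residuals are -1, -1, 4, -1, -1 µm: the line also absorbs part of the genuine local offset, which is one reason a physical prediction of the tide is preferable to simply fitting it away.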
42	Multimediální přehrávač pro iOS / Multimedia Player for iOS Singh, Kevin January 2019 The diploma thesis "Multimedia Player for iOS" describes video formats and streaming protocols such as HLS, MP4, MPEG Transport Stream and DASH. It then covers DRM-protected content, VAST advertisements and the analytics tool Google Analytics. As part of the thesis, a framework was created in the Swift programming language. The framework is essentially a player that, besides playing interactive videos, can show ads and subtitles, change the video quality, support AirPlay and download content for offline playback. Playback of protected content could not be implemented because the SDK owner refused the developer's request for the FairPlay deployment SDK. A test app that integrates the framework was also created to demonstrate the functionality of the implemented features.
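In entry 42, HLS means HTTP Live Streaming. Its master playlist lists variant streams encoded at different bitrates, which is what lets a player change the video quality. The sketch below is written in Python rather than the thesis's Swift and is not taken from the thesis: it parses a master playlist and applies a deliberately naive rule (the highest bitrate that fits the measured throughput). The playlist contents and the selection rule are assumptions for illustration; production players use more elaborate adaptive-bitrate logic.

```python
import re

# KEY=VALUE pairs in an HLS attribute list; quoted values may contain commas.
_ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_master_playlist(text):
    """Return the variant streams of an HLS master playlist as a list of dicts."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an HLS playlist (missing #EXTM3U)")
    variants, pending = [], None
    for ln in lines[1:]:
        if ln.startswith("#EXT-X-STREAM-INF:"):
            attrs = dict(_ATTR_RE.findall(ln[len("#EXT-X-STREAM-INF:"):]))
            pending = {
                "bandwidth": int(attrs["BANDWIDTH"]),
                "resolution": attrs.get("RESOLUTION"),
            }
        elif pending is not None and not ln.startswith("#"):
            pending["uri"] = ln  # the URI line follows its EXT-X-STREAM-INF tag
            variants.append(pending)
            pending = None
    return variants


def choose_variant(variants, throughput_bps):
    """Pick the highest-bandwidth variant that fits, else fall back to the lowest."""
    fitting = [v for v in variants if v["bandwidth"] <= throughput_bps]
    if fitting:
        return max(fitting, key=lambda v: v["bandwidth"])
    return min(variants, key=lambda v: v["bandwidth"])


# Hypothetical master playlist with three quality levels.
PLAYLIST = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.4d401e,mp4a.40.2"
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
high/index.m3u8
"""
```

With a measured throughput of 3 Mbit/s, `choose_variant(parse_master_playlist(PLAYLIST), 3_000_000)` selects the 720p variant `mid/index.m3u8`.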
43	A High-Level Interface for Accelerating Spiking Neural Networks on the Edge with Heterogeneous Hardware: Enabling Rapid Prototyping of Training Algorithms and Topologies on Field-Programmable Gate Arrays Eidlitz Rivera, Kaspar Oscarsson January 2024 With the increasing use of machine learning by devices at the network's edge, computation is moving from data centers to these devices. This shift imposes strict energy requirements on the algorithms used and on the hardware that implements them. Neuromorphic spiking neural networks (SNNs) and heterogeneous systems on a chip (SoCs) show great potential for energy-efficient computing on the edge. This thesis describes the development of a high-level interface for accelerating SNNs on an FPGA–CPU SoC. The system builds on an existing open-source, low-level implementation and adapts it to a research-focused Python front end. The interface provides a productive environment for exploring and evaluating SNN algorithms and topologies through compatibility with industry-standard tools for numerical computing, data analysis and visualization, while still taking full advantage of FPGA-based hardware acceleration. The system is evaluated and showcased by analyzing the training of a small network to solve the XOR problem. As the project matures, future development could enable integration with commonly used machine learning libraries, further increasing its potential. Keywords: Field-Programmable Gate Arrays; FPGA; Spiking Neural Networks; SNN; High-Level Synthesis; HLS; System on a Chip; SoC; Machine Learning; Izhikevich; Computer Systems
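Entry 43 lists Izhikevich among its keywords. The Izhikevich model is a two-variable spiking-neuron model, v' = 0.04v² + 5v + 140 - u + I and u' = a(bv - u), with the reset v ← c, u ← u + d once v reaches 30 mV. A minimal floating-point sketch, assuming forward-Euler integration and the commonly used regular-spiking parameters, is shown below. The abstract does not state which parameters, time step or fixed-point format the FPGA implementation uses, so treat these values as placeholders.

```python
def izhikevich_spike_count(current, t_ms=1000.0, dt=0.5,
                           a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron with forward Euler and return its spike count.

    The default a, b, c, d are the standard 'regular spiking' parameters;
    current is a constant input I and dt is the time step in milliseconds.
    """
    v = c          # membrane potential (mV)
    u = b * v      # recovery variable
    spikes = 0
    for _ in range(int(t_ms / dt)):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
        if v >= 30.0:  # spike: reset the membrane and bump the recovery variable
            v = c
            u += d
            spikes += 1
    return spikes
```

With these parameters an input current of 0 leaves the neuron resting near -70 mV, while a constant current of 10 produces tonic spiking; a hardware version would typically replace the floating-point arithmetic with fixed-point updates.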
44	Command and control of Special Operations Forces missions in the US Northern Command Area of Responsibility McGregor, Otis W., III 03 1900 Approved for public release, distribution is unlimited / A well-thought-out, planned and rehearsed command and control organization for special operations in the US Northern Command Area of Responsibility is vital to successfully defending the Homeland. Currently, USNORTHCOM has no apportioned or assigned command and control structure for conducting special operations. This thesis analyzes three courses of action to fulfill this requirement: use the current USNORTHCOM battle staff command structure, including the integration of the Standing Joint Force Headquarters-North; rely on the newly formed US Special Operations Command's Joint Task Force structures; or establish a Theater Special Operations Command North assigned to USNORTHCOM. Based on this analysis and research, the thesis recommends that the Joint Staff direct the reorganization required to establish a Theater Special Operations Command North to exercise command and control of special operations forces conducting operations in the USNORTHCOM AOR. / Lieutenant Colonel, United States Army. Keywords: Special forces (Military science); United States; National security; Special Operations Forces (SOF); Homeland security (HLS); Homeland Defense (HLD); US Northern Command (USNORTHCOM); Command and control (C2); Joint task force (JTF); Threat to homeland; Posse comitatus
45	Evaluation of high-level synthesis tools for generation of Verilog code from MATLAB based environments Bäck, Carl January 2020 FPGAs are of interest in the signal processing domain because they can run algorithms at very high speed. One possible use case is sorting incoming data in a measurement system, for example with a histogram method. Developing code for FPGA applications usually requires knowledge of specialized hardware description languages, which is uncommon in the signal processing domain. High-level synthesis is an approach in which high-level languages such as MATLAB or C++ are used together with a code generation tool to generate FPGA-ready output directly. This thesis uses the development of a histogram as a test case to investigate the efficiency of three tools, HDL Coder in MATLAB, HDL Coder in Simulink and System Generator for DSP, compared with developing the same histogram directly in Verilog in Vivado. It also examines how to write and structure code in these tools so that it functions correctly. All three tools delivered an operating frequency comparable to a direct Verilog implementation and lower resource usage, and they reduced development time by 27% (HDL Coder in MATLAB), 45% (System Generator) and 64% (HDL Coder in Simulink), but at the cost of higher power consumption. Instructions for using all three tools have been collected and summarized. / In the input stage of a measurement system, it is of interest to use an FPGA to achieve high speed in the unavoidable data filtering and sorting algorithms that are run. One problem with FPGAs is that development demands specific knowledge of development languages and environments, which a person specialized in, for example, signal processing may lack entirely. HLS is a methodology in which high-level languages can be used for digital design by means of a tool for automatic code generation.
In this work, the development of a histogram has been used as a test case to evaluate the efficiency and design methodology of three HLS tools: HDL Coder for MATLAB, HDL Coder for Simulink and System Generator for DSP. Development in these tools has been compared with development of the same histogram in Vivado using the Verilog language. The work concludes that all tested tools deliver an operating frequency comparable to writing the histogram directly in Verilog, reduced resource usage, and development time reduced by 27% (HDL Coder for MATLAB), 45% (System Generator) and 64% (HDL Coder for Simulink), but with increased power consumption. A compilation of instructions for developing with the tools has also been produced. Keywords: HLS; System Generator for DSP; Histogram; Xilinx Zynq UltraScale+; FPGA design workflow; Hardware Description Language Coder; HDL Coder; Field Programmable Gate Arrays; Image processing; Electrical Engineering and Electronics
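Entry 45 uses a histogram as the benchmark for comparing HLS tools against hand-written Verilog. One common way to check generated HDL is against a bit-exact software reference model. The sketch below is such a model under stated assumptions: integer samples, equal-width bins over [lo, hi), and separate underflow/overflow counters. The abstract does not give the thesis's bin count, sample width or out-of-range handling, so these are illustrative choices.

```python
def histogram(samples, lo, hi, n_bins):
    """Count integer samples into n_bins equal-width bins over [lo, hi).

    Out-of-range samples go to separate underflow/overflow counters instead
    of being dropped. Integer-only arithmetic keeps the model bit-exact, so
    its counts can be compared one-to-one with an HDL simulation.
    """
    if hi <= lo or n_bins <= 0:
        raise ValueError("need hi > lo and n_bins > 0")
    counts = [0] * n_bins
    underflow = overflow = 0
    width = hi - lo
    for s in samples:
        if s < lo:
            underflow += 1
        elif s >= hi:
            overflow += 1
        else:
            counts[(s - lo) * n_bins // width] += 1  # bin index in 0..n_bins-1
    return counts, underflow, overflow
```

When `width` and `n_bins` are powers of two, the index computation reduces to a right shift, which is the form a hardware implementation would normally use.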
46	Database System Acceleration on FPGAs Moghaddamfar, Mehdi 30 May 2023 Relational database systems provide various services and applications with an efficient means for storing, processing, and retrieving their data. The performance of these systems has a direct impact on the quality of service of the applications that rely on them. Therefore, it is crucial that database systems are able to adapt and grow in tandem with the demands of these applications, ensuring that their performance scales accordingly. In the past, Moore's law and algorithmic advancements have been sufficient to meet these demands. However, with the slowdown of Moore's law, researchers have begun exploring alternative methods, such as application-specific technologies, to satisfy the more challenging performance requirements. One such technology is field-programmable gate arrays (FPGAs), which provide ideal platforms for developing and running custom architectures for accelerating database systems. The goal of this thesis is to develop a domain-specific architecture that can enhance the performance of in-memory database systems when executing analytical queries. Our research is guided by a combination of academic and industrial requirements that seek to strike a balance between generality and performance. The former ensures that our platform can be used to process a diverse range of workloads, while the latter makes it an attractive solution for high-performance use cases. Throughout this thesis, we present the development of a system-on-chip for database system acceleration that meets our requirements. The resulting architecture, called CbMSMK, is capable of processing the projection, sort, aggregation, and equi-join database operators and can also run some complex TPC-H queries. CbMSMK employs a shared sort-merge pipeline for executing all these operators, which results in an efficient use of FPGA resources.
This approach enables the instantiation of multiple acceleration cores on the FPGA, allowing it to serve multiple clients simultaneously. CbMSMK can process both arbitrarily deep and wide tables efficiently. The former is achieved through the use of the sort-merge algorithm which utilizes the FPGA RAM for buffering intermediate sort results. The latter is achieved through the use of KeRRaS, a novel variant of the forward radix sort algorithm introduced in this thesis. KeRRaS allows CbMSMK to process a table a few columns at a time, incrementally generating the final result through multiple iterations. Given that acceleration is a key objective of our work, CbMSMK benefits from many performance optimizations. For instance, multi-way merging is employed to reduce the number of merge passes required for the execution of the sort-merge algorithm, thus improving the performance of all our pipeline-breaking operators. Another example is our in-depth analysis of early aggregation, which led to the development of a novel cache-based algorithm that significantly enhances aggregation performance. 
Our experiments demonstrate that CbMSMK is, on average, 5 times faster than the state-of-the-art CPU-based database management system MonetDB.
Contents:
I Database Systems & FPGAs
1 INTRODUCTION: 1.1 Databases & the Importance of Performance; 1.2 Accelerators & FPGAs; 1.3 Requirements; 1.4 Outline & Summary of Contributions
2 BACKGROUND ON DATABASE SYSTEMS: 2.1 Databases [2.1.1 Storage Model; 2.1.2 Storage Medium]; 2.2 Database Operators [2.2.1 Projection; 2.2.2 Filter; 2.2.3 Sort; 2.2.4 Aggregation; 2.2.5 Join; 2.2.6 Operator Classification]; 2.3 Database Queries; 2.4 Impact of Acceleration
3 BACKGROUND ON FPGAS: 3.1 FPGA [3.1.1 Logic Element; 3.1.2 Block RAM (BRAM); 3.1.3 Digital Signal Processor (DSP); 3.1.4 IO Element; 3.1.5 Programmable Interconnect]; 3.2 FPGA Design Flow [3.2.1 Specifications; 3.2.2 RTL Description; 3.2.3 Verification; 3.2.4 Synthesis, Mapping, Placement, and Routing; 3.2.5 Timing Analysis; 3.2.6 Bitstream Generation and FPGA Programming]; 3.3 Implementation Quality Metrics; 3.4 FPGA Cards; 3.5 Benefits of Using FPGAs; 3.6 Challenges of Using FPGAs
4 RELATED WORK: 4.1 Summary of Related Work; 4.2 Platform Type [4.2.1 Accelerator Card; 4.2.2 Coprocessor; 4.2.3 Smart Storage; 4.2.4 Network Processor]; 4.3 Implementation [4.3.1 Loop-based Implementation; 4.3.2 Sort-based Implementation; 4.3.3 Hash-based Implementation; 4.3.4 Mixed Implementation]; 4.4 A Note on Quantitative Performance Comparisons
II Cache-Based Morphing Sort-Merge with KeRRaS (CbMSMK)
5 OBJECTIVES AND ARCHITECTURE OVERVIEW: 5.1 From Requirements to Objectives; 5.2 Architecture Overview; 5.3 Outline of Part II
6 COMPARATIVE ANALYSIS OF OPENCL AND RTL FOR SORT-MERGE PRIMITIVES ON FPGAS: 6.1 Programming FPGAs; 6.2 Related Work; 6.3 Architecture [6.3.1 Global Architecture; 6.3.2 Sorter Architecture; 6.3.3 Merger Architecture; 6.3.4 Scalability and Resource Adaptability]; 6.4 Experiments [6.4.1 OpenCL Sort-Merge Implementation; 6.4.2 RTL Sorters; 6.4.3 RTL Mergers; 6.4.4 Hybrid OpenCL-RTL Sort-Merge Implementation]; 6.5 Summary & Discussion
7 RESOURCE-EFFICIENT ACCELERATION OF PIPELINE-BREAKING DATABASE OPERATORS ON FPGAS: 7.1 The Case for Resource Efficiency; 7.2 Related Work; 7.3 Architecture [7.3.1 Sorters; 7.3.2 Sort-Network; 7.3.3 X:Y Mergers; 7.3.4 Merge-Network; 7.3.5 Join Materialiser (JoinMat)]; 7.4 Experiments [7.4.1 Experimental Setup; 7.4.2 Implementation Description & Tuning; 7.4.3 Sort Benchmarks; 7.4.4 Aggregation Benchmarks; 7.4.5 Join Benchmarks]; 7.5 Summary
8 KERRAS: COLUMN-ORIENTED WIDE TABLE PROCESSING ON FPGAS: 8.1 The Scope of Database System Accelerators; 8.2 Related Work; 8.3 Key-Reduce Radix Sort (KeRRaS) [8.3.1 Time Complexity; 8.3.2 Space Complexity (Memory Utilization); 8.3.3 Discussion and Optimizations]; 8.4 Architecture [8.4.1 MSM; 8.4.2 MSMK: Extending MSM with KeRRaS; 8.4.3 Payload, Aggregation and Join Processing; 8.4.4 Limitations]; 8.5 Experiments [8.5.1 Experimental Setup; 8.5.2 Datasets; 8.5.3 MSMK vs. MSM; 8.5.4 Payload-Less Benchmarks; 8.5.5 Payload-Based Benchmarks; 8.5.6 Flexibility]; 8.6 Summary
9 A STUDY OF EARLY AGGREGATION IN DATABASE QUERY PROCESSING ON FPGAS: 9.1 Early Aggregation; 9.2 Background & Related Work [9.2.1 Sort-Based Early Aggregation; 9.2.2 Cache-Based Early Aggregation]; 9.3 Simulations [9.3.1 Datasets; 9.3.2 Metrics; 9.3.3 Sort-Based Versus Cache-Based Early Aggregation; 9.3.4 Comparison of Set-Associative Caches; 9.3.5 Comparison of Cache Structures; 9.3.6 Comparison of Replacement Policies; 9.3.7 Cache Selection Methodology]; 9.4 Cache System Architecture [9.4.1 Window Aggregator; 9.4.2 Compressor & Hasher; 9.4.3 Collision Detector; 9.4.4 Collision Resolver; 9.4.5 Cache]; 9.5 Experiments [9.5.1 Experimental Setup; 9.5.2 Resource Utilization and Parameter Tuning; 9.5.3 Datasets; 9.5.4 Benchmarks on Synthetic Data; 9.5.5 Benchmarks on Real Data]; 9.6 Summary
10 THE FULL PICTURE: 10.1 System Architecture; 10.2 Benchmarks; 10.3 Meeting the Objectives
III Conclusion
11 SUMMARY AND OUTLOOK ON FUTURE RESEARCH: 11.1 Summary; 11.2 Future Work
BIBLIOGRAPHY; LIST OF FIGURES; LIST OF TABLES
info:eu-repo/classification/ddc/006 ddc:006
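The abstract of entry 46 notes that multi-way merging reduces the number of merge passes in sort-merge. With R initially sorted runs and a k-input merger, each pass divides the run count by k (rounding up), so the number of passes is ⌈log_k R⌉. The software sketch below shows this with Python's `heapq.merge`. It is a generic illustration of multi-way merge sort, not CbMSMK's hardware pipeline or the KeRRaS algorithm, and the run length and merge width are arbitrary parameters.

```python
import heapq


def merge_passes(n_runs, ways):
    """Number of merge passes needed to reduce n_runs sorted runs to one."""
    if ways < 2:
        raise ValueError("a merger needs at least 2 inputs")
    passes = 0
    while n_runs > 1:
        n_runs = -(-n_runs // ways)  # ceiling division
        passes += 1
    return passes


def sort_merge(data, run_length, ways):
    """Sort fixed-size runs, then merge `ways` runs at a time until one remains.

    Returns (sorted_list, number_of_merge_passes).
    """
    runs = [sorted(data[i:i + run_length]) for i in range(0, len(data), run_length)]
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + ways]))
                for i in range(0, len(runs), ways)]
        passes += 1
    return (runs[0] if runs else []), passes
```

For 64 initial runs, a 2-way merger needs `merge_passes(64, 2) == 6` passes while an 8-way merger needs `merge_passes(64, 8) == 2`; since every pass streams the whole table through the merger, fewer passes means proportionally less memory traffic.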
47	Enhancing Trust in Autonomous Systems without Verifying Software Stamenkovich, Joseph Allan 12 June 2019 The complexity of the software behind autonomous systems is growing rapidly, as is the range of tasks these systems perform. It is not unusual for such software to reach millions of lines of code, which adds to the verification challenge. The machine learning algorithms involved are often "black boxes" whose precise workings are not known to the developers applying them, and whose behavior is undefined when they encounter a scenario they were not trained on. With so much code, the possibility of bugs or malicious code is considerable. An approach is developed to monitor, and possibly override, the behavior of autonomous systems independently of the software controlling them. Application-isolated safety monitors are implemented in configurable hardware to ensure that the behavior of an autonomous system is limited to what is intended. The sensor inputs may be shared with the software, but the monitors' output is engaged only when the system violates its prescribed behavior. For each specific rule the system is expected to follow, a dedicated monitor processes the relevant sensor information. The behavior is defined in linear temporal logic (LTL), and the associated monitors are implemented in a field-programmable gate array (FPGA). An off-the-shelf drone is used to demonstrate the effectiveness of the monitors without any physical modification to the drone. Upon detection of a violation, appropriate corrective actions are persistently enforced on the autonomous system. / Master of Science / Autonomous systems are surprisingly vulnerable, not only to malicious hackers but also to design errors and oversights. The required code can quickly run into millions of lines, and the artificial decision-making algorithms can be inscrutable and fully dependent on the information they are trained on.
These factors make verifying the core software that runs our autonomous cars, drones and other systems prohibitively difficult by traditional means. Independent safety monitors are implemented to provide internal oversight for these autonomous systems. A semi-automatic design process efficiently creates error-free monitors from the safety rules that drones need to follow. These monitors remain separate and isolated from the software that normally controls the system, but they use the same sensor information. They are embedded in the circuitry and act as small, task-specific processors, each watching that a particular rule is not violated; if it is, the monitor takes control of the system and forces corrective behavior. The monitors are added to a consumer off-the-shelf (COTS) drone to demonstrate their effectiveness: for every monitored rule, an override is triggered when the rule is violated. As with any electronic component, their effectiveness depends on reliable sensor information and on the completeness of the rules that define them. Keywords: Autonomy; Runtime Verification; Field programmable gate arrays; Field Programmable Gate Array; Monitor; Formal Methods; UAS; Drone aircraft; Security; Linear Temporal Logic; LTL; High-Level Synthesis; HLS; monitor; model checking; drone; malware; assurance; robotics; firmware; hardware
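Entry 47 describes one hardware monitor per LTL rule, with corrective action enforced persistently once a violation is detected. For a bounded-response rule of the form G(p → F≤k q), "whenever p holds, q must hold within k steps", such a monitor reduces to a small state machine with one down-counter. The Python model below is a hedged sketch of that idea, not the thesis's generated monitor; the rule, the signal names `request`/`response` and the one-sample-per-step semantics are assumptions. Because a single `response` satisfies every earlier pending `request`, tracking only the oldest deadline is enough.

```python
class BoundedResponseMonitor:
    """Runtime monitor for G(request -> F[0,k] response) with a latched verdict.

    step() is called once per sample; it returns True once the rule has been
    violated and keeps returning True afterwards (persistent override).
    """

    def __init__(self, k):
        if k < 0:
            raise ValueError("k must be non-negative")
        self.k = k
        self.remaining = None   # steps left for the oldest pending request
        self.violated = False

    def step(self, request, response):
        if self.violated:
            return True                      # override stays engaged
        if self.remaining is not None:       # a request is pending
            if response:
                self.remaining = None        # satisfied in time
            else:
                self.remaining -= 1
                if self.remaining == 0:      # deadline reached without response
                    self.violated = True
                    return True
        if self.remaining is None and request and not response:
            if self.k == 0:
                self.violated = True         # response was required immediately
            else:
                self.remaining = self.k      # start the deadline counter
        return self.violated
```

Reading `request` as a hypothetical "link lost" flag and `response` as "landing started", `BoundedResponseMonitor(3)` reports a violation at the third step after link loss if landing has not begun, and stays in the override state from then on.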
48	Implementation of Bolt Detection and Visual-Inertial Localization Algorithm for Tightening Tool on SoC FPGA / Implementering av bultdetektering och visuell tröghetslokaliseringsalgoritm för åtdragningsverktyg på SoC FPGA Al Hafiz, Muhammad Ihsan January 2023 With the emergence of Industry 4.0, there is a pronounced emphasis on the need for greater flexibility in assembly processes. This transition is evident in bolt tightening: tools must now handle a variety of bolts and unpredictable tightening methods. Each bolt has distinct tightening parameters and requires a specific sequence to prevent issues such as bolt cross-talk or unbalanced force. This thesis introduces an approach that integrates advanced computing techniques with machine learning to address these challenges in bolt tightening. The primary objective is to provide edge computing for bolt detection and for precise localization of the tightening tool, using visual-inertial data and running entirely within a System-on-Chip (SoC) Field Programmable Gate Array (FPGA). The chosen approach combines visual information with motion data, enabling the tool to be localized quickly and precisely. All computation is performed inside the SoC FPGA. The key element for identifying different bolts is the YOLOv3-Tiny-3L model, which runs on the Deep-learning Processor Unit (DPU) implemented in the FPGA. In parallel, the thesis employs the Error-State Extended Kalman Filter (ESEKF) algorithm to fuse the visual and motion data. The ESEKF is accelerated through a full Register Transfer Level (RTL) implementation in the FPGA fabric. The empirical results show that the visual-inertial localization achieves a position root mean square error (RMSE) of 39.69 mm with a standard deviation of 9.9 mm.
Orientation estimation has a mean error of 4.8 degrees with a standard deviation of 5.39 degrees. Notably, the entire computation, from initial bolt detection to final localization, takes 113.1 milliseconds. This thesis demonstrates the feasibility of performing bolt detection and visual-inertial localization with edge computing on an SoC FPGA. The computation pipeline is significantly streamlined by exploiting the adaptability of the FPGA's programmable logic. This is a step towards more adaptable and error-resistant bolt tightening in industrial settings. / With the emergence of Industry 4.0, there is a pronounced emphasis on the need for greater flexibility in assembly processes. In bolt tightening, this transition is evident. Tools are now required to handle a variety of bolts and unpredictable tightening methods. Each bolt, having distinct tightening parameters, requires a specific sequence to prevent problems such as bolt cross-talk or unbalanced force. This degree project introduces an approach that integrates advanced computing techniques with machine learning to address these challenges in tightening. The primary goal is to offer edge computing for bolt detection and precise localization of tightening tools. It is realized by exploiting visual-inertial data, all encapsulated in a System-on-Chip (SoC) Field Programmable Gate Array (FPGA). The chosen approach combines visual information and motion detection, enabling the tool to be localized quickly and precisely. All computation takes place inside the SoC FPGA. The key element for identifying different bolts is the YOLOv3-Tiny-3L model, which runs on the Deep-learning Processor Unit (DPU) implemented in the FPGA.
In parallel, the thesis uses the Error-State Extended Kalman Filter (ESEKF) algorithm to fuse visual and motion data efficiently. The ESEKF is accelerated through a full Register Transfer Level (RTL) implementation in the FPGA fabric. We examined the empirical results and found that the visual-inertial localization exhibited a position Root Mean Square Error (RMSE) of 39.69 mm and a standard deviation of 9.9 mm. Orientation estimation gives a mean error of 4.8 degrees, with a standard deviation of 5.39 degrees. Notably, the entire computation, from initial bolt detection to final localization, executes in 113.1 milliseconds. This thesis demonstrates the feasibility of performing bolt detection and visual-inertial localization using edge computing within the SoC FPGA framework. The computation pipeline is considerably streamlined by exploiting the adaptability of the programmable logic in the FPGA. This development is a step towards a more adaptable and error-resistant screw-tightening procedure in industrial settings. Keywords: Bolt detection; Visual-Inertial localization; System-on-Chip (SoC); Field-Programmable Gate Array (FPGA); Machine learning; Perspective-n-Points; High-Level Synthesis (HLS); YOLO; Tightening tool; Computer Sciences; Computer Engineering
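Entry 48 fuses inertial and visual data with an ESEKF, where IMU measurements drive the prediction step and camera-based pose estimates (Perspective-n-Point appears among the keywords) drive the correction step. The ESEKF tracks a full 3D pose with orientation error states; the sketch below is a much simpler, plain linear Kalman filter in one dimension that only illustrates the same predict/update split. It is an assumption-laden teaching model, not the thesis's filter, and all names and noise values are hypothetical.

```python
def kf_predict(x, P, accel, dt, q):
    """Propagate state x = [position, velocity] using a measured acceleration.

    Transition F = [[1, dt], [0, 1]]; q is the variance of the acceleration noise.
    """
    p, v = x
    x_new = [p + v * dt + 0.5 * accel * dt * dt, v + accel * dt]
    (p00, p01), (p10, p11) = P
    # F P F^T, written out for the 2x2 case
    f00 = p00 + dt * (p01 + p10) + dt * dt * p11
    f01 = p01 + dt * p11
    f10 = p10 + dt * p11
    f11 = p11
    # Process noise for a white acceleration disturbance
    q00, q01, q11 = q * dt**4 / 4, q * dt**3 / 2, q * dt**2
    return x_new, [[f00 + q00, f01 + q01], [f10 + q01, f11 + q11]]


def kf_update(x, P, z, r):
    """Correct the state with a position measurement z of variance r (H = [1, 0])."""
    (p00, p01), (p10, p11) = P
    s = p00 + r                   # innovation variance
    k0, k1 = p00 / s, p10 / s     # Kalman gain
    y = z - x[0]                  # innovation
    x_new = [x[0] + k0 * y, x[1] + k1 * y]
    P_new = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
    return x_new, P_new
```

A typical loop calls `kf_predict` at the IMU rate and `kf_update` whenever a camera position arrives; after a few dozen updates of a stationary target, the position variance falls below the measurement variance `r`.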
49	Impact des transformations algorithmiques sur la synthèse de haut niveau : application au traitement du signal et des images / Impact of algorithmic transforms for High Level Synthesis (HLS): application to signal and image processing Ye, Haixiong 20 May 2014 This thesis addresses the impact of algorithmic optimizations on automatic high-level synthesis (HLS) for ASICs. These algorithmic optimizations are high-level transforms that, by their intrinsic nature, remain beyond the reach of modern compilers, even the most aggressively optimizing ones. The goal is to analyse the impact of high-level optimizations and transforms on the area, energy consumption and speed of the ASIC circuit. The three algorithms evaluated are non-recursive filters, recursive filters and a motion-detection algorithm. In each example, gains were achieved in speed, area and/or energy consumption. The most spectacular gain is a 12.6-fold reduction in energy while keeping the synthesized area under control and meeting the real-time execution constraint. To put the results (energy and speed) into perspective, an additional benchmark was run on an ST XP70 microprocessor with the VECx extension, an ARM Cortex processor with the Neon extension and an Intel Penryn processor with SSE extensions. / The thesis deals with the impact of algorithmic transforms on HLS for ASICs. These algorithmic transforms are high-level transforms that are beyond the capabilities of modern optimizing compilers. The goal is to analyse the impact of the high-level transforms on area, execution time and energy consumption. Three algorithms have been analyzed: non-recursive filters, recursive filters and a motion detection application. For each algorithm, the optimizations and transformations lead to speedups and area gains.
The most impressive gain is a 12.6-fold energy reduction, while the area remains constant and the execution time stays within the real-time constraint. To put the high-level transforms in perspective, a benchmark was also run on SIMD general-purpose processors: an ST XP70 microprocessor with the VECx extension, an ARM Cortex with the Neon extension and an Intel Penryn with the SSE extension. Keywords: High Level Synthesis (HLS); High Level Transform (HLT); FIR filter; IIR filter; Sigma Delta; Morphological filter; Catapult-C; Metaprogramming; Signal processing; Image processing
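Entry 49 argues that some algorithmic transforms lie beyond what optimizing compilers do on their own. A classic example for FIR filters, shown here only as an illustration and not necessarily one of the transforms studied in the thesis, is the N-tap box filter (moving sum): computed directly it needs up to N-1 additions per output, while the recursive form y[i] = y[i-1] + x[i] - x[i-N] needs two, with identical results. The sketch uses integer inputs so that both versions match exactly.

```python
def moving_sum_direct(x, n):
    """N-tap box FIR (moving sum) computed directly: up to n-1 additions per output."""
    return [sum(x[max(0, i - n + 1):i + 1]) for i in range(len(x))]


def moving_sum_recursive(x, n):
    """Same output via y[i] = y[i-1] + x[i] - x[i-n]: two additions per output.

    The n-sample delay line (x[i-n]) is still needed; only the adder tree disappears.
    """
    y, acc = [], 0
    for i, xi in enumerate(x):
        acc += xi
        if i >= n:
            acc -= x[i - n]
        y.append(acc)
    return y
```

For example, `moving_sum_recursive([1, 2, 3, 4], 2)` returns `[1, 3, 5, 7]`, the same as the direct form. With floating-point inputs the recursive form can accumulate rounding error, whereas in integer or fixed-point hardware the additions and subtractions cancel exactly.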
50	Synthèse automatique d'interfaces de communication matérielles pour la conception d'applications du domaine du traitement du signal / Automatic synthesis of hardware communication interfaces for the design of signal-processing applications Chavet, Cyrille 26 October 2007 Signal-processing (TDSI) applications are now widely used in fields ranging from automotive to wireless communications, including multimedia applications and telecommunications. The growing complexity of the implemented algorithms, together with the continual increase in data volumes and application throughput, often requires dedicated hardware accelerators. Typically, the architecture of a complex signal-processing component uses increasingly complex computing elements, memories and data-shuffling modules (interleavers/deinterleavers for Turbo-Codes, space-time redundancy blocks in OFDM/MIMO systems, ...); it favours point-to-point connections for communication between computing elements and must integrate several configurations and/or algorithms in a single architecture ((re)configurable systems). Today, the cost of these systems in terms of storage elements is very high, so designers try to minimize the size of these buffers in order to reduce the circuit's power consumption and total area while optimizing its performance. Within this overall problem, we focus on optimizing the communication interfaces between components. The problem can be seen as the synthesis of (1) interfaces for integrating virtual components (IP cores), (2) data-shuffling components (such as interleavers) that may have several operating modes, and (3) potentially configurable datapaths in high-level synthesis flows. We propose a design methodology that automatically generates a communication adapter (interface) called the Space-Time AdapteR (STAR).
Our design flow takes as input (1) timing diagrams (a constraints file), or (2) a C description of the data-shuffling rule (for example, an interleaving rule for Turbo-Codes) together with user constraints (throughput, latency, parallelism...), or (3) a set of scheduled and assigned CDFGs. The flow then formalizes these communication constraints as a Multi-Mode Resource Compatibility Graph (MMRCG), which enables efficient exploration of the architectural design space in order to generate a STAR component in register-transfer-level (RTL) VHDL for logic synthesis. The STAR architecture consists of a datapath (using FIFOs, LIFOs and/or registers) and finite-state machines that control the system. Spatial adaptation (a datum can be sent from any input port to one or more output ports) is performed by a tailored, optimized interconnection network. Temporal adaptation is performed by the storage elements, exploiting their operating semantics (FIFO, LIFO). The STAR component uses an LIS (Latency Insensitive System) interface with a clock-freezing mechanism that lets the data flow control the component. The proposed design flow generates architectures that can integrate several operating modes (for example, several frame lengths for an interleaver, or several configurations in a multi-mode architecture). The design flow is based on four tools:
- StarTor takes as input the C description of the interleaving algorithm and the user constraints (latency, throughput, communication interface, input/output parallelism...). It extracts the input/output data order by producing a trace from the functional description. It then generates the communication-constraints file used by STARGene.
- StarDFG takes as input a set of CDFGs generated by a high-level synthesis tool. These CDFGs must be scheduled and their computing elements must already be assigned. The tool extracts the order of the data exchanges and then generates the communication-constraints file used by STARGene.
- STARGene, based on a five-step flow, generates the STAR architecture: (1) construction, from the constraints file, of the MMRCG resource-compatibility graph for each operating mode of the design; (2) merging of the operating modes; (3) assignment of storage structures (FIFO, LIFO or register) on the MMRCG; (4) optimization of the architecture; and (5) generation of RTL VHDL that integrates the different communication modes. The constraints file used in the first step can come from StarTor, as described above, or be generated by a high-level synthesis tool such as GAUT, developed at the LESTER laboratory.
- StarBench generates a test bench from the communication constraints and validates the generated architectures by comparing their simulation results with the functional specification.
The experiments presented in the manuscript cover three use cases of the STAR flow. First, we used the STAR approach to integrate and interconnect IP blocks within a single architecture. This first, illustrative experiment demonstrates the validity of the approach and highlights the possibilities it offers for exploring the architectural design space.
In a second experiment, the STAR flow was used to generate an Ultra-Wide Band interleaver architecture. This is an industrial case study carried out in collaboration with STMicroelectronics. Using our flow, we showed that the number of memory points and the latency can be reduced compared with classical approaches based on memory banks. Moreover, with our flow, fewer structures need to be controlled than in the reference architecture, which was obtained with a commercial high-level synthesis tool. Currently, the total area of our interleaving architecture is about 14% smaller than that of the STMicroelectronics reference architecture. Finally, in a third series of experiments, we used the STAR model in a high-level synthesis flow targeting the generation of reconfigurable architectures. This approach was tested on multi-rate architectures (64- to 8-point FFT, 64- to 16-point FIR...) and multi-mode architectures (FFT and IFFT, DCT and matrix product...). These experiments showed the value of combining the STAR approach, for optimizing and generating the multiplexing and storage architecture, with the multi-configuration scheduling and assignment algorithms under study in GAUT (Caaliph Andriamissaina's thesis). In particular, we obtained area savings of up to 75% compared with a naive architecture and up to 40% compared with methodologies centred on operator reuse (SPACT-MR). [INFO:INFO_OH] Computer Science/Other. Keywords: High-level synthesis; HLS; communication interface; architecture; signal processing; multi-mode; interconnections; communication adaptation; ASIC; FPGA
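In entry 50, StarTor extracts the input/output data order from a C description of the interleaving rule. Once that order is known, one can compute the minimum latency and the peak number of values that must be held at the same time, which bounds the number of storage elements an adapter needs. The sketch below illustrates that reasoning with a simple lifetime count. It assumes a hypothetical row-column block interleaver with one datum in and one datum out per cycle, and it is not STAR's MMRCG-based method.

```python
def row_column_interleaver(rows, cols):
    """Output order of a block interleaver written row by row and read column by column."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def schedule(out_order):
    """Return (minimal latency, {item: departure cycle}).

    Assumptions: item i arrives at cycle i; the k-th output leaves at cycle
    latency + k; an item may leave in the cycle it arrives (pass-through).
    """
    latency = max(0, max(item - k for k, item in enumerate(out_order)))
    departure = {item: latency + k for k, item in enumerate(out_order)}
    return latency, departure


def storage_needed(departure):
    """Peak number of items held at once, i.e. storage elements required."""
    horizon = max(departure.values()) + 1
    peak = 0
    for t in range(horizon):
        # an item occupies storage from its arrival cycle until its departure cycle
        live = sum(1 for i, dep in departure.items() if i <= t < dep)
        peak = max(peak, live)
    return peak
```

For a 2×3 interleaver the output order is `[0, 3, 1, 4, 2, 5]`; under these assumptions output can start two cycles after the first input and at most two values are held at once, rather than buffering the whole six-value block.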
