Global ETD Search

931	Design and Programming Methods for Reconfigurable Multi-Core Architectures using a Network-on-Chip-Centric Approach Rettkowski, Jens 12 July 2022 (has links) A current trend in the semiconductor industry is the use of Multi-Processor Systems-on-Chip (MPSoCs) for a wide variety of applications such as image processing, automotive, multimedia, and robotic systems. Most applications gain performance advantages by executing parallel tasks on multiple processors due to the inherent parallelism. Moreover, heterogeneous structures provide high performance/energy efficiency, since application-specific processing elements (PEs) can be exploited. The increasing number of heterogeneous PEs leads to challenging communication requirements. To overcome this challenge, Networks-on-Chip (NoCs) have emerged as a scalable on-chip interconnect. Nevertheless, NoCs have to deal with many design parameters such as virtual channels, routing algorithms and buffering techniques to fulfill the system requirements. This thesis contributes to the state of the art of FPGA-based MPSoCs and NoCs. In the following, the three major contributions are introduced. As a first major contribution, a novel router concept is presented that efficiently utilizes communication times by performing sequences of arithmetic operations on the data that is transferred. The internal input buffers of the routers are exchanged with processing units that are capable of executing operations. Two different architectures of such processing units are presented. The first architecture provides multiply and accumulate operations, which are often used in signal processing applications. The second architecture, introduced as Application-Specific Instruction Set Routers (ASIRs), contains a processing unit capable of executing any operation and is hence not limited to multiply and accumulate operations. An internal processing core located in ASIRs can be developed in C/C++ using high-level synthesis.
The second major contribution comprises application and performance explorations of the novel router concept. Models that approximate the achievable speedup and the end-to-end latency of ASIRs are derived and discussed to show the benefits in terms of performance. Furthermore, two applications using an ASIR-based MPSoC are implemented and evaluated on a Xilinx Zynq SoC. The first application is an image processing algorithm consisting of a Sobel filter, an RGB-to-Grayscale conversion, and a threshold operation. The second application is a system that helps visually impaired people by navigating them through unknown indoor environments. A Light Detection and Ranging (LIDAR) sensor scans the environment, while Inertial Measurement Units (IMUs) measure the orientation of the user to generate an audio signal that makes the distance as well as the orientation of obstacles audible. This application consists of multiple parallel tasks that are mapped to an ASIR-based MPSoC. Both applications show the performance advantages of ASIRs compared to a conventional NoC-based MPSoC. Furthermore, dynamic partial reconfiguration in terms of relocation and security aspects are investigated. The third major contribution refers to development and programming methodologies of NoC-based MPSoCs. A software-defined approach is presented that combines the design and programming of heterogeneous MPSoCs. In addition, a Kahn-Process-Network (KPN)-based model is designed to describe parallel applications for MPSoCs using ASIRs. The KPN-based model is extended to support not only the mapping of tasks to NoC-based MPSoCs but also the mapping to ASIR-based MPSoCs. A static mapping methodology is presented that assigns tasks to ASIRs and processors for a given KPN-model. The impact of external hardware components such as sensors, actuators and accelerators connected to the processors is also discussed, which makes the approach of high interest for embedded systems.
info:eu-repo/classification/ddc/004 ddc:004
932	A Reconfigurable Device for GALS Systems Sciaraffa, Rocco January 2018 (has links) Globally Asynchronous Locally Synchronous (GALS) Field-Programmable Gate Arrays (FPGAs) are composed of standard synchronous reconfigurable logic islands that communicate with each other via asynchronous means. Past research into fully asynchronous FPGAs has demonstrated high throughput and reliability by adopting dual-rail encoding. GALS FPGAs have been proposed, relying on bundled-data encoding and fixed asynchronous communication between synchronous islands. This thesis proposes a new GALS FPGA architecture with a fully reconfigurable asynchronous fabric that relies on coarse-grained Configurable Logic Blocks (CLBs) to improve the communication capability of the device. Through dedicated datapath elements, asynchronous pipelines are efficiently mapped onto the device. The architecture is presented as well as the customized tool flow needed to compile Verilog for this new coarse-grained reconfigurable circuit. The main purpose of this thesis is to map communication-purpose user circuits onto the proposed asynchronous fabric and evaluate their performance. The benchmark circuits target the design of a Network-on-Chip (NoC) router and employ a two-phase bundled-data protocol. The results are obtained through simulation and compared with the performance of the same circuits on a classical fine-grained FPGA style. The proposed architecture achieves up to 3.2x higher throughput and 2.9x lower latency than the classical one. The results show that the coarse-grained style efficiently maps asynchronous communication circuits, and it may be the starting point for future reconfigurable GALS systems. Future work should focus on improving the back-end synthesis and evaluating the FPGA GALS system as a whole. / Globala Asynkrona Lokalt Synkrona (GALS) FPGAer består av standardiserade synkrona rekonfigurerbara logiska öar som kommunicerar med varandra på ett asynkront sätt.
Tidigare forskning om helt asynkrona FPGAer har demonstrerat att hög genomströmning och tillförlitlighet kan erhållas mha sk dual-rail kodning. GALS FPGA har också föreslagits, där man istället förlitar sig på kodad data och fast asynkron kommunikation mellan synkrona öar. Denna avhandling föreslår en ny GALS FPGA-arkitektur med en omkonfigurerbar asynkron struktur, bestående av sk Coarse-grained CLBs för att förbättra kommunikationsförmågan på enheten. Genom att datavägarna använder sig av dedikerade element, kan asynkrona pipelines mappas effektivt på enheten. Arkitekturen presenteras liksom det verktygsflöde som behövs för att kompilera Verilog för denna nya grovkornigt omkonfigurerbara krets. Huvudsyftet med denna avhandling är att mappa kommunikationskretsar på den föreslagna asynkrona strukturen och utvärdera dess prestanda. Referenskretsarna som används för utvärdering är en NoC router som använder sig av ett tvåfas kommunikationsprotokoll. Resultaten erhålls genom simulering och jämförs med prestanda av samma krets implementerad i en finkornig klassisk FPGA-stil. Den föreslagna arkitekturen uppnår ca 3.2x högre genomströmning och 2.9x lägre latens än den klassiska. Resultaten visar att en grovkornig stil kan mappa asynkrona kommunikationskretsar på ett effektivt sätt, och att det kan vara en bra utgångspunkt för framtida omkonfigurerbara GALS-system. Framtida arbete bör fokusera på att förbättra back-end-syntesen och att utvärdera FPGA GALS-systemet i sin helhet. FPGA GALS asynchronous coarse-grained NoC bundled-data FPGA GALS asynkron Coarse-Grained Reconfigurable NoC bundled data kommunikation Computer and Information Sciences Data- och informationsvetenskap
933	Algorithm Design and Optimization of Convolutional Neural Networks Implemented on FPGAs Du, Zekun January 2019 (has links) Deep learning has developed rapidly in recent years. It has been applied to many fields, which are the main areas of artificial intelligence. The combination of deep learning and embedded systems is a good direction in the technical field. This project designs a deep learning neural network algorithm that can be implemented on hardware, for example, an FPGA. The project is based on current research on deep learning neural networks and hardware features. The system uses PyTorch and CUDA as supporting tools. This project focuses on image classification based on a convolutional neural network (CNN). Many good CNN models can be studied, like ResNet, ResNeXt, and MobileNet. After applying these models to the design, an algorithm based on the MobileNet model is chosen. Models are compared on criteria such as floating point operations (FLOPs), number of parameters and classification accuracy. The selected MobileNet-based algorithm achieves a top-1 error of 5.5% in software on a 6-class data set. Furthermore, a hardware simulation of the MobileNet-based algorithm is carried out. The parameters are transformed from floating point numbers to 8-bit integers. The output numbers of each individual layer are truncated to fixed-bit integers to fit the hardware restrictions. A number handling method is designed to simulate how numbers change on hardware. Based on this simulation method, the top-1 error increases to 12.3%, which is acceptable. / Deep learning har utvecklats snabbt under den senaste tiden. Det har funnit applikationer inom många områden, som är huvudfälten inom Artificial Intelligence. Kombinationen av Deep Learning och inbyggda system är en god inriktning i det tekniska fältet. Syftet med detta projekt är att designa en Deep Learning-baserad Neural Network algoritm som kan implementeras på hårdvara, till exempel en FPGA.
Projektet är baserat på modern forskning inom Deep Learning Neural Networks samt hårdvaruegenskaper. Systemet är baserat på PyTorch och CUDA. Projektets fokus är bild klassificering baserat på Convolutional Neural Networks (CNN). Det finns många bra CNN modeller att studera, t.ex. ResNet, ResNeXt och MobileNet. Genom att applicera dessa modeller till designen valdes en algoritm med MobileNetmodellen. Valet av modell är baserat på faktorer så som antal flyttalsoperationer, antal modellparametrar och klassifikationsprecision. Den mjukvarubaserade versionen av den MobileNet-baserade algoritmen har top-1 error på 5.5 %. En hårdvarusimulering av MobileNet nätverket designades, i vilket parametrarna är konverterade från flyttal till 8-bit heltal. Talen från varje lager klipps till fixed-bit heltal för att anpassa nätverket till befintliga hårdvarubegränsningar. En metod designas för att simulera talförändringen på hårdvaran. Baserat på denna simuleringsmetod reduceras top-1 error till 12.3 %. Computer and Information Sciences Data- och informationsvetenskap
934	An Embedded System for Classification and Dirt Detection on Surgical Instruments Hallgrímsson, Guðmundur January 2019 (has links) The need for automation in healthcare has been rising steadily in recent years, both to increase efficiency and for freeing educated workers from repetitive, menial, or even dangerous tasks. This thesis investigates the implementation of two pre-determined and pre-trained convolutional neural networks on an FPGA for the classification and dirt detection of surgical instruments in a robotics application. A good background on the inner workings and history of artificial neural networks is given and expanded on in the context of convolutional neural networks. The Winograd algorithm for computing convolutional operations is presented as a method for increasing the computational performance of convolutional neural networks. A selection of development platform and toolchains is then made. A high-level design of the overall system is explained, before details of the high-level synthesis implementation of the dirt detection convolutional neural network are shown. Measurements are then made on the performance of the high-level synthesis implementation of the various blocks needed for convolutional neural networks. The main convolutional kernel is implemented both by using the Winograd algorithm and the naive convolution algorithm and comparisons are made. Finally, measurements on the overall performance of the end-to-end system are made and conclusions are drawn. The final product of the project gives a good basis for further work in implementing a complete system to handle this functionality in a manner that is both efficient in power and low in latency. Such a system would utilize the different strengths of general-purpose sequential processing and the parallelism of an FPGA and tie those together in a single system. 
/ Behovet av automatisering inom vård och omsorg har blivit allt större de senaste åren, både vad gäller effektivitet samt att befria utbildade arbetare från repetitiva, enkla eller till och med farliga arbetsmoment. Den här rapporten undersöker implementeringen av två tidigare för-definierade och för-tränade faltade neurala nätverk på en FPGA, för att klassificera och upptäcka föroreningar på kirurgiska verktyg. En bra bakgrund på hur neurala nätverk fungerar, och deras historia, presenteras i kontexten faltade neurala nätverk. Winograd algoritmen, som används för att beräkna faltningar, beskrivs som en metod med syfte att öka beräkningsmässig prestanda. Val av utvecklingsplattform och verktyg utförs. Systemet beskrivs på en hög nivå, innan detaljer om hög-nivå-syntesimplementeringen av förorenings-detekterings-nätverket visas. Mätningar görs sedan av de olika bygg-blockens prestanda. Kärnkoden med faltnings-algoritmen implementeras både med Winograd-algoritmen och med den traditionella, naiva, metoden, och utfallet för bägge metoderna jämförs. Slutligen utförs mätningar på hela systemets prestanda och slutsatser dras därav. Projektets slutprodukt kan användas som en bra bas för vidare utveckling av ett komplett system som både är effektivt angående effektförbrukning och har bra prestanda, genom att knyta ihop styrkan hos traditionella sekventiella processorer med parallelismen i en FPGA till ett enda system. Neural Network CNN FPGA PetaLinux Winograd High-level Synthesis Neuralt nätverk Faltade neurala nätverk FPGA PetaLinux Winograd Hög-nivå syntes Elektroteknik och elektronik
935	FPGA Implementation of Universal Access Transceiver (UAT) receiving unit for surveillance of small and general aircraft / FPGA-implementering av Universal Access Transceiver (UAT) mottagarenhet för övervakning av små och allmänna flygplan Chen, Baiheng January 2022 (has links) The Universal Access Transceiver (UAT) is one of the two datalinks available in the Automatic Dependent Surveillance-Broadcast (ADS-B) system to facilitate air traffic control and flight tracking of small and general-purpose aircraft. By allowing aircraft to be tracked passively through a radio broadcast of the aircraft position and flight information, flight safety can be ensured and air traffic order can be maintained. With the ADS-B initiative, surveillance is encouraged to cover not only residential areas but also remote regions where the infrastructure of a radar station is less likely to be available. Hence, a passive, low-power, compact and portable device that receives the radio signal and shares the extracted flight information with a control center is welcome, so that air traffic control and surveillance of nearby aircraft can be made possible without massive infrastructure cost. The aim of this thesis project is to develop a compact and portable ADS-B UAT receiver that uses an FPGA to demodulate the received UAT signal and extract valid UAT messages from it, as an extension to Skysense's earlier ADS-B 1090ES receiver product. The work presented herein mainly focuses on the development of the FPGA functions of the receiver, which comprise demodulating the digitized UAT signal and extracting the UAT payload message from the samples. This work demonstrates the design process and implementation of a 978 MHz UAT receiver using an Altera Cyclone IV FPGA. The final demonstrated design is capable of demodulating a sampled UAT signal and transferring the demodulated raw data bits to a processing unit through a UART interface. Simulation results and a synthesis report are presented together with an analysis.
/ Universal Access Transceiver (UAT) är en av de två datalänkar som finns tillgängliga i Automatic Dependent Surveillance-Broadcast (ADS-B)-systemet för att underlätta flygkontroll och spårning av små och allmännyttiga flygplan. Genom att tillåta att flygplan spåras passivt genom en radiosändning av flygplanets position och flyginformation kan flygsäkerheten garanteras och flygordningen upprätthållas. Med ADS-B-initiativet uppmuntras övervakningen att inte bara omfatta bostadsområden utan även avlägsna regioner där det är mindre sannolikt att en radarstations infrastruktur är tillgänglig. Därför välkomnas en passiv, energisnål, kompakt och bärbar enhet som tar emot radiosignalen och delar den extraherade flyginformationen till kontrollcentralen så att flygkontroll och övervakning till närbelägna flygplan kan göras möjlig utan enorma infrastrukturkostnader. Syftet med detta avhandlingsprojekt är att utveckla en kompakt och bärbar lösning med ADS-B UAT-mottagare med FPGA som tilläggsfunktion till produkten ADS-B 1090ES-mottagare från företaget Skysense AB. Det arbete som presenteras här fokuserar främst på utvecklingen av FPGA-funktioner hos mottagaren, vilka består av att demodulera den digitaliserade UAT-signalen och extrahera UAT-meddelandet om nyttolast från proverna. Detta arbete visar designprocessen för och genomförandet av en 978 MHz UAT-mottagare med Altera Cyclone IV FPGA. Den slutgiltiga demonstrerade konstruktionen kan demodulera en samplad UAT-signal och överföra de demodulerade rådatabitarna till en behandlingsenhet genom UART-gränssnittet. Simuleringsresultat och sammanfattande rapport presenteras tillsammans. FPGA ADS-B UAT Signal Processing Wireless Communication Digital Demodulation FPGA ADS-B UAT signalbehandling trådlös kommunikation digital demodulering Elektroteknik och elektronik
936	Evaluation of FPGA Partial Reconfiguration : for real-time Vision applications Guo, Guanghao January 2020 (has links) The usage of programmable logic resources in Field Programmable Gate Arrays, also known as FPGAs, has increased considerably in recent years due to the complexity of the algorithms, especially for some computer vision algorithms. For this reason, the hardware resources in the FPGA are sometimes not sufficient. Partial reconfiguration provides us with the possibility to solve this problem. Partial reconfiguration is a technique that can be used to reconfigure specific parts of the FPGA during run-time. By using this technique, we can reduce the need for programmable logic resources. This master thesis project aims to design a software framework for partial reconfiguration that can load a set of processing components/algorithms (e.g. object detection, optical flow, Harris-Corner detection etc) in the FPGA area without affecting real-time static components such as camera capture, basic image filtering and colour conversion, which are continuously running. Partial reconfiguration has been applied to two different video processing pipelines, a direct streaming architecture and a frame buffer streaming architecture respectively. The results show that the reconfiguration time is predictable and depends on the partial bitstream size, and that partial reconfiguration can be used in real-time applications when the partial bitstream size and the frequency of switching partial bitstreams are taken into account. / Användningen av programmerbara logiska resurser i Field Programmable Gate Arrayer, även känd som FPGA:er, har ökat mycket nyligen på grund av komplexiteten hos algoritmerna, speciellt för vissa datorvisningsalgoritmer. På grund av detta är det ibland inte tillräckligt med hårdvaruresurser i FPGA:n. Partiell omkonfiguration ger oss möjlighet att lösa detta problem.
Partiell omkonfigurering är en teknik som kan användas för att omkonfigurera specifika delar av FPGA:n under körtid. Genom att använda denna teknik kan vi minska behovet av programmerbara logiska resurser. Det här mastersprojektet syftar till att utforma ett programvaru-ramverk för partiell omkonfiguration som kan ladda en uppsättning processkomponenter / algoritmer (t.ex. objektdetektering, optiskt flöde, Harris-Corner detection etc) i FPGA- området utan att påverka statiska realtids-komponenter såsom kamerafångst, grundläggande bildfiltrering och färgkonvertering som körs kontinuerligt. Partiell omkonfiguration har tillämpats på två olika videoprocessnings-pipelines, en direkt-strömmande respektive en rambuffert-strömmande arkitektur. Resultatet visar att omkonfigurationstiden är förutsägbar och att partiell omkonfiguration kan användas i realtids-tillämpningar. FPGA Partial Reconfiguration Embedded System Computer Vision High-Level Synthesis FPGA Partiell rekonfigurering Inbyggda System Datorvision Högnivåsyntes Software Engineering Programvaruteknik Embedded Systems Inbäddad systemteknik
937	Offloading Workloads from CPU of Multiplayer Game Server to FPGA : SmartNIC implementation with UDP Communication / Avlastning av arbetsbelastningar från CPU till FPGA för multiplayer Game Server : SmartNIC-implementering med UDP Kommunikation Bao, Junwen January 2022 (has links) For multiplayer games, the performance of the server’s Central Processing Unit (CPU) is the main factor that limits the number of players on the server at the same time. Compared with the CPU, the Field-Programmable Gate Array (FPGA) architecture has no instruction set and no shared memory. Offloading some tasks from the CPU to the FPGA may help the CPU improve processing efficiency. This thesis explores which tasks on a CPU can be offloaded to an FPGA and how to design such a circuit system. The performance of the developed system also needs to be measured. We decided to offload communication tasks and data processing tasks to an FPGA. The result is that the FPGA server is available for work, the maximum number of users is 80, and the maximum network latency is 30-40 ms. The most important result is that an FPGA can be used as a multi-player server. One of the severe limitations of this design is the number of hardware resources. A 7-series FPGA is divided into several similar clock regions, which means the number of Flip-Flops (FFs) near the same clock edge is fixed. If more FFs are added to the same component, the routing delay cannot meet the set-up time requirements. Previously, people used the FPGA as a supporting accelerator to the server CPU. The CPU still works as a paramount communication link with one or several multi-connection parts and connects to the FPGA via the Peripheral Component Interconnect Express (PCIe) to use the FPGA to process data or pack/unpack Ethernet frames. We have designed and implemented a whole multi-connection server in a Hardware Description Language (HDL) and downloaded the resulting hardware to an FPGA.
/ I spel med flera spelare är serverns CPU-prestanda (Central Processing Unit) den viktigaste faktorn som begränsar antalet spelare som servern samtidigt kan hantera. Jämfört med CPU:n har en FPGA (Field-Programmable Gate Array) inga instruktioner och inget delat minne. Avlastning av vissa uppgifter från den CPU till FPGA:n kan hjälpa CPU:n att förbättra bearbetningseffektiviteten. I denna avhandling undersöks vilka uppgifter på en CPU som kan överföras till en FPGA och hur man utformar ett sådant kretsystem. Prestandan hos det utvecklade systemet måste också mätas. Vi har beslutat att avlasta kommunikationsuppgifter och databehandlingsuppgifter. till en FPGA. Resultatet är att FPGA-servern är tillgänglig för arbete, det maximala antalet användare är 80, och den maximala nätverksfördröjningen är 30-40 ms. Det viktigaste resultatet är att en FPGA kan användas som en server för flera spelare. En av de allvarliga begränsningarna med denna konstruktion är antalet hårdvaruresurser. En FPGA i 7-serien är uppdelad i flera liknande klockregioner, vilket innebär att antalet Flip Flop (FF)s nära en klocka är fast. Om man lägger till fler FF:er i samma komponent, kommer fördröjningen inte att uppfylla tidskraven för setup. Tidigare har folk använt sig av FPGA:n som en stödaccelerator till serverprocessorn. CPU:n fungerar fortfarande som en viktig kommunikationslänk med en eller flera anslutningar och ansluter till FPGA:n via Peripheral Component Interconnect Express (PCIe) för att använda FPGA:n till att bearbeta data och paketera/packa upp Ethernet-ramar. Vi har implementerat en hel server med flera anslutningar med hjälp av hårdvaruvarubeskrivande språk (HDL) och laddat ner den resulterande designen i en FPGA. FPGA UDP Multiple-connection Server Network Communication Integrated Circuit Design FPGA UDP server med flera anslutningar nätverkskommunikation Integrerad kretsdesign Elektroteknik och elektronik
938	FPGA vs. SIMD: Comparison for Main Memory-Based Fast Column Scan Nusrat, Jahan Lisa, Ungethüm, Annett, Habich, Dirk, Lehner, Wolfgang, Nguyen, Duy Anh Tuan, Kumar, Akash 23 March 2023 (has links) The ever-increasing growth of data demands reliable database systems with high throughput and low latency. Main memory-based column store database systems are the state of the art in this respect, whereby data (values) in relational tables are organized by columns rather than by rows. In such systems, a full column scan is a fundamental key operation and thus optimizing it is crucial. This has led to fast column scan techniques that are based on compact storage layouts and exploit intra-value parallelism. For this reason, we investigated different well-known fast column scan techniques using SIMD (Single Instruction Multiple Data) vectorization as well as using Field Programmable Gate Arrays (FPGAs). Moreover, we present selective results of our exhaustive evaluation. Based on this evaluation, we identify the best column scan technique for each implementation mechanism, FPGA and SIMD. Finally, we conclude this paper by mentioning some lessons learned for our ongoing research activities. info:eu-repo/classification/ddc/004 ddc:004 info:eu-repo/classification/ddc/620 ddc:620
939	Hardware Acceleration in the Context of Motion Control for Autonomous Systems / Hårdvaruacceleration i samband med rörelsekontroll för autonoma system Leslin, Jelin January 2020 (has links) State estimation filters are computationally intensive blocks used to calculate uncertain or unknown state values from noisy or unavailable sensor inputs in autonomous systems. The inputs to the actuators depend on these filters' outputs, and thus the filters have to be scheduled at very small time intervals. The aim of this thesis is to investigate the possibility of using hardware accelerators to perform this computation. To make a comparative study, three filters that predict 4, 8 and 16 states were developed and implemented on Arm real-time and application CPUs, NVIDIA Quadro and Turing GPUs, and Xilinx FPGA programmable logic. The execution time, the memory transfer time, and the total development time to realise the logic on CPU, GPU and FPGA are discussed. The CUDA development environment was used for the GPU implementation and Vivado HLS with the SDSoC environment was used for the FPGA implementation. The thesis concludes that a hardware accelerator is needed if the filter estimates 16 or more states, even if the processor is entirely dedicated to computing the filter logic. Otherwise, for 4- and 8-state filters, the processor shows performance similar to an accelerator. However, in a real-time environment the processor is the brain of the system, so it has to give instructions to many other functions in parallel. In such an environment, the instruction and data caches of the processor will be disturbed and the execution time of the filter will fluctuate from iteration to iteration. For this, the best- and worst-case processor timings are calculated and discussed. / Tillståndsberäkningsfilter är beräkningsintensiva block som används för att beräkna osäkra / okända tillståndsvärden från bullriga / ej tillgängliga sensoringångar i autonoma system.
Ingångarna till manöverdonen beror på filterens utgång och därför måste schemaläggningen av filtret ske med mycket små tidsintervall. Syftet med denna avhandling är att undersöka möjligheten att använda hårdvaruacceleratorer för att utföra denna beräkning. För att göra en jämförande studie utvecklades och implementerades 3 filter som förutsäger information om 4, 8 och 16 tillstånd i realtid med applikationsändamålen CPU, NVIDIA Quadro och Turing GPU, och Xilinx FPGA programmerbar logik. Exekvering, minnesöverföringstid och den totala utvecklingstiden för att förverkliga logiken i båda hårdvarorna diskuteras. CUDAs utvecklingsmiljö användes för GPU-implementeringen och Vivado HLS med SDSoc-miljö användes för FPGA-implementering. Avhandlingen drar slutsatsen att en hårdvaru-accelerator behövs om filtret uppskattar information om mer än 16 tillstånd även om processorn är helt dedikerad för beräkning av filterlogik. För 4 och 8 tillståndsfilter, visar processorn liknande prestanda som en accelerator. Men i realtid är processorn hjärnan i systemet; så den måste ge instruktioner till många andra funktioner parallellt. I en sådan miljö kommer processorns instruktioner och datacacher att störas och det kommer att bli en fluktuation i exekveringstiden för filtret för varje iteration. För detta beräknas och diskuteras de bästa och värsta fallstiderna. Hardware acceleration Computation offloading State estimation filter Autonomous systems FPGA GPU. Hårdvaruacceleration beräkningsavlastning tillståndsskattningsfilter autonoma system FPGA GPU. Elektroteknik och elektronik
940	Predictable Multiprocessor Platform for Safety-Critical Real-Time Systems Sigurðsson, Páll Axel January 2021 (has links) Multicore systems excel at providing concurrent execution of applications, giving true parallelism where all cores can execute sequences of machine instructions at the same time. However, multicore systems come with their own sets of problems, most notably when cores in a system (or core tiles) share hardware components such as memory modules or Input/Output (IO) peripherals. This increased level of complexity makes it especially difficult to design and verify safety-critical systems that require real-time operation, such as flight controllers in airplanes and airbag controllers in the automotive industry. Verifying that such systems are predictable is therefore essential, requiring methods for measuring and finding out the Worst-Case Execution Times (WCETs) and Best-Case Execution Times (BCETs). Additionally, the designer must ensure isolation between running applications (indicating that the platform is composable). This thesis work consists of designing a predictable Multiprocessor System-on-Chip (MPSoC) using Qsys and Quartus II, as well as providing methods and test benches that can support all claims made about the platform’s reported behavior. A shared-memory, loosely coupled multicore design was implemented, which can be horizontally scaled from 2 to 8 core tiles. A high-level Hardware Abstraction Layer (HAL) is written for the platform to simplify its use. Using Nios II/e processors as the logical cores in the platform’s core tiles gives predictable (mostly static) latencies when the platform is tested, showing no erratic or unexplained timing variations. However, due to the Round Robin (RR) nature of the arbitration logic in the Avalon Switch Fabric (ASF), composability was not fully achieved in the platform.
Groundwork for implementing Time-Division Multiplexing (TDM) arbitration logic is proposed and will ideally be fully implemented in future work. / Mångkärniga processorsystem utmärker sig när det kommer till samkörning mellan applikationer. De ger en sann parallellism, där alla kärnor kan köra processorinstruktioner samtidigt. Mångkärniga system kommer med sina egna problem, framför allt när kärnorna ska dela komponenter så som minnesmoduler och Input/Output tillbehör. Den ökade komplexiteten gör att det är extra svårt att designa och verifiera säkerhetskritiska system som kräver körning i realtid, så som flygkontrollers på flygplan och styrenheter för krockkudden i bilar. Verifiering av att systemen är förutsägbara är essentiellt, detta behöver metoder för att mäta och hitta den värsta möjliga exekveringstiden (WCET) och den bästa möjliga exekveringstiden (BCET). Utöver detta måste designern säkerställa att processerna som körs på kärnorna är isolerade ifrån varandra (komponerbara). Detta arbetet består av att designa ett förutsägbart mångkärnigt system på chip (MPSoC) med Qsys och Quartus II, samt att ge metoder och testbänkar som kan bevisa systemets hävdade beteende. Ett löst kopplat mångkärnigt system med delat minne implementerades, där systemets kärnor kan ökas horisontellt från 2 till 8 stycken. Ett Hardware Abstraction Layer (HAL) skapades för systemet för att simplifiera användningen. Användningen av Nios II/e som processorkärna gav förutsägbara exekveringstider när systemet testades och visade inga oförklarliga tids variationer. Däremot, på grund av att Avalon Switch Fabric (ASF) tilldelar access med Round Robin (RR), är systemet inte komponerbart. Basen för att implementera Time-Division Multiplexing (TDM) istället är föreslaget och kommer idealt implementeras som fortsatt arbete.
Composability FPGA Multicore processing Predictability Real-time systems System on chip FPGA Förutsägbarhet Komposibilitet Mångkärnig bearbetning Realtidssystem System på chip Elektroteknik och elektronik
