Global ETD Search

101	Crest Factor Reduction using High Level Synthesis Mahmood, Hassan January 2017 (has links) Modern wireless mobile communication technology has made noticeable improvements from the technologies in the past but is still plagued by poor power efficiency of power amplifiers found in today’s base stations. One of the factors that affect the power efficiency adversely comes from modern modulation techniques like orthogonal frequency division multiplexing which result in signals with high peak to average power ratio, also known as the crest factor. Crest factor reduction algorithms are used to solve this problem. However, the dominant method of hardware description for synthesis has been to start with writing register transfer level code which gives a very fixed implementation that may not be the optimal solution. This thesis project is focused on developing a peak cancellation crest factor reduction system, using a high-level language as the system design language, and synthesizing it using high-level synthesis. The aim is to find out if highlevel synthesis design methodology can yield increased productivity and improved quality of results for such designs as compared to the design methodology that requires the system to be implemented at the register transfer level. Design space exploration is performed to find an optimal design with respect to area. Finally, a few parameters are presented to measure the performance of the system, which helps in tuning it. The results of design space exploration helped in choosing the best possible implementation out of four different configurations. The final implementation that resulted from high-level synthesis had an area comparable to the previous register transfer level implementation. It was also concluded that, for this design, the high-level synthesis design methodology increased productivity and decreased design time. / Användning av högnivåsyntes för reduktion av toppfaktor Det har gjorts noterbara framsteg inom modern trådlös kommunikationsteknik för mobiltelefoni, men tekniken plågas fortfarande av dålig energieffektivitet hos förstärkarna i dagens basstationer. En faktor som påverkar energieffektiviteten negativt är om signaler har en stor skillnad mellan maximal effekt och medeleffekt. Kvoten mellan maximal effekt och medeleffekt kallas för toppfaktor, och en egenskap hos moderna moduleringstekniker, såsom ortogonal frekvensdelningsmodulering, är att de har en hög toppfaktor. Algoritmer för reducering av toppfaktor kan lösa det problemet. Den dominerande metoden för design av hårdvara är att skriva kod i ett hårdvarubeskrivande språk med abstraktionsnivån Register Transfer Level och sedan använda verktyg för att syntetisera hårdvara från koden. Resultatet är en specifik implementation som inte nödvändigtvis är den optimala lösningen. Det här examensarbetet är inriktat på att utveckla ett system för reducering av toppfaktor, baserat på algoritmen Peak Cancellation, genom att skriva kod i ett högnivåspråk och använda verktyg för högnivåsyntes för att syntetisera designen. Syftet är att ta reda på om högnivåsyntes som designmetod kan ge ökad produktivitet och ökad kvalitet, för den här typen av design, jämfört med den klassiska designmetoden med abstraktionsnivån Register Transfer Level. Verktyget för högnivåsyntes användes för att på ett effektivt sätt undersöka olika designalternativ för att optimera kretsytan. I rapporten presenteras ett antal parametrar för att mäta prestandan hos systemet, vilket ger information som kan användas för finjustering. Resultatet av undersökningen av designalternativ gjorde det möjligt att välja den bästa implementationen bland fyra olika konfigurationer. Den slutgiltiga implementationen hade en kretsyta som är jämförbar med en tidigare design som implementerats med hårdvarubeskrivande språk med abstraktionsnivån Register Transfer Level. En annan slutsats är att, för den här designen, så gav designmetoden med högnivåsyntes ökad produktivitet och minskad designtid. Peak to average power ratio Crest factor Peak cancellation crest factor reduction High level synthesis Design space exploration Object oriented programming SystemC Toppfaktor Algoritmen peak cancellation Högnivåsyntes SystemC Undersökningen av designalternativ Objektorienterad programmering Elektroteknik och elektronik
102	Deep Learning Model Deployment for Spaceborne Reconfigurable Hardware : A flexible acceleration approach Ferre Martin, Javier January 2023 (has links) Space debris and space situational awareness (SSA) have become growing concerns for national security and the sustainability of space operations, where timely detection and tracking of space objects is critical in preventing collision events. Traditional computer-vision algorithms have been used extensively to solve detection and tracking problems in flight, but recently deep learning approaches have seen widespread adoption in non-space related applications for their high accuracy. The performanceper-watt and flexibility of reconfigurable Field-Programmable Gate Arrays (FPGAs) make them a good candidate for deep learning model deployment in space, supporting in-flight updates and maintenance. However, the FPGA design costs of custom accelerators for complex algorithms remains high. The research focus of the thesis relies on novel high-level synthesis (HLS) workflows that allow the developer to raise the level of abstraction and lower design costs for deep learning accelerators, particularly for space-representative applications. To this end, four different hardware accelerators of convolutional neural network models for spacebased debris detection are implemented (ResNet, SqueezeNet, DenseNet, TinyCNN), using the open-source HLS tool NNgen. The obtained hardware accelerators are deployed to a reconfigurable module of the Zynq Ultrascale+ MPSoC programmable logic, and compared in terms of inference performance, resource utilization and latency. The tests on the target hardware show a detection accuracy over 95% for ResNet, DenseNet and SqueezeNet, and a localization intersection-over-union over 0.5 for the deep models, and over 0.7 for TinyCNN, for space debris objects at a range between 1km and 100km for a diameter of 1cm, or between 100km and 1000km for a diameter of 10cm. The obtained speed-ups with respect to software-only implementations lay between 3x and 32x for the different hardware accelerators. / Rymdskrot och rymdsituationstänksamhet (SSA) har blivit växande oro för nationell säkerhet och hållbarheten för rymdoperationer, där snabb upptäckt och spårning av rymdobjekt är avgörande för att förhindra kollisioner. Traditionella datorseendealgoritmer har använts omfattande för att lösa problem med upptäckt och spårning i flygning, men på senare tid har djupinlärningsmetoder fått stor användning inom icke rymdrelaterade applikationer på grund av sin höga noggrannhet. Prestandaper-watt och flexibiliteten hos omkonfigurerbara Field-Programmable Gate Arrays (FPGAs) gör dem till en bra kandidat för distribution av djupinlärningsmodeller i rymden, med stöd för uppdateringar och underhåll under flygning. Men FPGAdesignkostnaderna för anpassade acceleratorer för komplexa algoritmer är fortfarande höga. Forskningsfokus för avhandlingen ligger på nya högnivåsyntes (HLS) arbetsflöden som gör det möjligt för utvecklaren att höja abstraktionsnivån och sänka designkostnaderna för acceleratorer för djupinlärning, särskilt för tillämpningar i rymden. För detta har fyra olika hårdvaruacceleratorer för modeller av konvolutionsnätverk för upptäckt av rymdbaserat skrot implementerats (ResNet, SqueezeNet, DenseNet, TinyCNN), med hjälp av öppen källkod HLS-verktyget NNgen. De erhållna hårdvaruacceleratorerna distribueras till en omkonfigurerbar modul av Zynq Ultrascale+ MPSoC-programmerbar logik och jämförs med avseende på inferensprestanda, resursutnyttjande och latens. Testerna på målhardwaren visar en upptäktnoggrannhet på över 95% för ResNet, DenseNet och SqueezeNet, och en lokaliserings-intersektion-över-union på över 0,5 för de djupa modellerna och över 0,7 för TinyCNN för rymdskrotobjekt på en avstånd mellan 1 km och 100 km för en diameter på 1 cm eller mellan 100 km och 1000 km för en diameter på 10 cm. De erhållna hastighetsökningarna i förhållande till endast programvara ligger mellan 3x och 32x för de olika hårdvaruacceleratorerna. Space Situational Awareness Deep Learning Convolutional Neural Networks FieldProgrammable Gate Arrays System-On-Chip Computer Vision Dynamic Partial Reconfiguration High-Level Synthesis Rymdsituationstänksamhet Djupinlärning Konvolutionsnätverk System-On-Chip (SoC) Datorseende Dynamisk partiell omkonfigurering Högnivåsyntes. Elektroteknik och elektronik
103	High-Level-Synthese von Operationseigenschaften Langer, Jan 23 November 2011 (has links) In der formalen Verifikation digitaler Schaltkreise hat sich die Methodik der vollständigen Verifikation anhand spezieller Operationseigenschaften bewährt. Operationseigenschaften beschreiben das Verhalten einer Schaltung in einem festen Zeitintervall und können sequentiell miteinander verknüpft werden, um so das Gesamtverhalten zu spezifizieren. Zusätzlich beweist eine formale Vollständigkeitsprüfung, dass die Menge der Eigenschaften für jede Folge von Eingangssignalwerten die Ausgänge der zu verifizierenden Schaltung eindeutig und lückenlos determiniert. In dieser Arbeit wird untersucht, wie aus Operationseigenschaften, deren Vollständigkeit erfolgreich bewiesen wurde, automatisiert eine Schaltungsbeschreibung abgeleitet werden kann. Gegenüber der traditionellen Entwurfsmethodik auf Register-Transfer-Ebene (RTL) bietet dieses Verfahren zwei Vorteile. Zum einen vermeidet der Vollständigkeitsbeweis viele Arten von Entwurfsfehlern, zum anderen ähnelt eine Beschreibung mit Hilfe von Operationseigenschaften den in Spezifikationen häufig genutzten Zeitdiagrammen, sodass die Entwurfsebene der Spezifikationsebene angenähert wird und Fehler durch manuelle Verfeinerungsschritte vermieden werden. Das Entwurfswerkzeug vhisyn führt die High-Level-Synthese (HLS) einer vollständigen Menge von Operationseigenschaften zu einer Beschreibung auf RTL durch. Die Ergebnisse zeigen, dass sowohl die verwendeten Synthesealgorithmen, als auch die erzeugten Schaltungen effizient sind und somit die Realisierung größerer Beispiele zulassen. Anhand zweier Fallstudien kann dies praktisch nachgewiesen werden. / The complete verification approach using special operation properties is an accepted methodology for the formal verification of digital circuits. Operation properties describe the behavior of a circuit during a certain time interval. They can be sequentially concatenated in order to specify the overall behavior. Additionally, a formal completeness check proves that the sequence of properties consistently determines the exact value of the output signals for every valid sequence of input signal values. This work examines how a circuit description can be automatically derived from a set of operation properties whose completeness has been proven. In contrast to the traditional design flow at register-transfer level (RTL), this method offers two advantages. First, the prove of completeness helps to avoid many design errors. Second, the design of operation properties resembles the design of timing diagrams often used in textual specifications. Therefore, the design level is closer to the specification level and errors caused by refinement steps are avoided. The design tool vhisyn performs the high-level synthesis from a complete set of operation properties to a description at RTL. The results show that both the synthesis algorithms and the generated circuit descriptions are efficient and allow the design of larger applications. This is demonstrated by means of two case studies. info:eu-repo/classification/ddc/000 ddc:000 info:eu-repo/classification/ddc/006 ddc:006 info:eu-repo/classification/ddc/621.3 ddc:621.3
104	Calcul flottant haute performance sur circuits reconfigurables / High-performance floating-point computing on reconfigurable circuits Pasca, Bogdan Mihai 21 September 2011 (has links) De plus en plus de constructeurs proposent des accélérateurs de calculs à base de circuits reconfigurables FPGA, cette technologie présentant bien plus de souplesse que le microprocesseur. Valoriser cette flexibilité dans le domaine de l'accélération de calcul flottant en utilisant les langages de description de circuits classiques (VHDL ou Verilog) reste toutefois très difficile, voire impossible parfois. Cette thèse a contribué au développement du logiciel FloPoCo, qui offre aux utilisateurs familiers avec VHDL un cadre C++ de description d'opérateurs arithmétiques génériques adapté au calcul reconfigurable. Ce cadre distingue explicitement la fonctionnalité combinatoire d'un opérateur, et la problématique de son pipeline pour une précision, une fréquence et un FPGA cible donnés. Afin de pouvoir utiliser FloPoCo pour concevoir des opérateurs haute performance en virgule flottante, il a fallu d'abord concevoir des blocs de bases optimisés. Nous avons d'abord développé des additionneurs pipelinés autour des lignes de propagation de retenue rapides, puis, à l'aide de techniques de pavages, nous avons conçu de gros multiplieurs, possiblement tronqués, utilisant des petits multiplieurs. L'évaluation de fonctions élémentaires en flottant implique souvent l'évaluation en virgule fixe d'une fonction. Nous présentons un opérateur générique de FloPoCo qui prend en entrée l'expression de la fonction à évaluer, avec ses précisions d'entrée et de sortie, et construit un évaluateur polynomial optimisé de cette fonction. Ce bloc de base a permis de développer des opérateurs en virgule flottante pour la racine carrée et l'exponentielle qui améliorent considérablement l'état de l'art. Nous avons aussi travaillé sur des techniques de compilation avancée pour adapter l'exécution d'un code C aux pipelines flexibles de nos opérateurs. FloPoCo a pu ainsi être utilisé pour implanter sur FPGA des applications complètes. / Due to their potential performance and unmatched flexibility, FPGA-based accelerators are part of more and more high-performance computing systems. However, exploiting this flexibility for accelerating floating-point computations by manually using classical circuit description languages (VHDL or Verilog) is very difficult, and sometimes impossible. This thesis has contributed to the development of the FloPoCo software, a C++ framework for describing flexible FPGA-specific arithmetic operators. This framework explicitly separates the description of the combinatorial functionality of an arithmetic operator, and its pipelining for a given precision, operating frequency and target FPGA.In order to be able to use FloPoCo for designing high performance floating-point operators, we first had to design the optimized basic blocks. We first developed pipelined addition architectures exploiting the fast-carry lines present in modern FPGAs. Next, we focused on multiplication architectures. Using tiling techniques, we proposed novel architectures for large multipliers, but also truncated multipliers, based on the multipliers found in modern FPGA DSP blocks. We also present a generic FloPoCo operator which inputs the expression of a function, its input and output precisions, and builds an optimized polynomial evaluator for the fixed-point evaluation of this function. Using this building block we have designed floating-point operators for the square-root and exponential functions which significantly outperform existing operators. Finally, we also made use of advanced compilation techniques for adapting the execution of a C program to the flexible pipelines of our operators. FPGA Virgule flottante FloPoCo Chemin de données arithmétique Pipeline pour une fréquence donnée Addition pipelinée Additionneur rapide Multiplication Karatsuba-Offman Carré Multiplieur tronqué Multiplication par pavage Virgule fixe Approximation polynomiale Racine carrée flottante Exponentielle flottante Accumulation flottante Schéma d'évaluation de Horner Somme de carrés flottante Synthèse de haut niveau Nid de boucles parfait Multiplication de matrices Jacobi Dilemme du fabricant de table Méthode des différences tabulées Communications pipelinées FPGA Floating-point FloPoCo Arithmetic datapath Frequency-driven pipelining Pipelined addition Short-latency adder Multiplication Karatsuba-Offman Squarer Truncated multiplier Multiplication tiling Fixed-point Polynomial approximation Floating-point square root Floating-point exponential Floating-point accumulation Horner datapath Floating-point sum-of-products High-level synthesis Perfect loop nests Matrix-matrix multiply Jacobi stencil Table maker's dilemma Pipelined communications Pipelined communications

Page generated in 0.045 seconds