• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 906
  • 337
  • 177
  • 171
  • 72
  • 65
  • 55
  • 27
  • 25
  • 19
  • 15
  • 12
  • 10
  • 8
  • 5
  • Tagged with
  • 2147
  • 518
  • 461
  • 311
  • 302
  • 228
  • 226
  • 212
  • 184
  • 183
  • 176
  • 173
  • 167
  • 167
  • 164
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1001

Sistema embarcado para a manutenção inteligente de atuadores elétricos / Embedded systems for intelligent maintenance of electrical actuators

Bosa, Jefferson Luiz January 2009 (has links)
O elevado custo de manutenção nos ambientes industriais motivou pesquisas de novas técnicas para melhorar as ações de reparos. Com a evolução tecnológica, principalmente da eletrônica, que proporcionou o uso de sistemas embarcados para melhorar as atividades de manutenção, estas agregaram inteligência e evoluíram para uma manutenção pró-ativa. Através de ferramentas de processamento de sinais, inteligência artificial e tolerância a falhas, surgiram novas abordagens para os sistemas de monitoramento a serviço da equipe de manutenção. Os ditos sistemas de manutenção inteligente, cuja tarefa é realizar testes em funcionamento (on-line) nos equipamentos industriais, promovem novos modelos de confiabilidade e disponibilidade. Tais sistemas são baseados nos conceitos de tolerância a falhas, e visam detectar, diagnosticar e predizer a ocorrência de falhas. Deste modo, fornece-se aos engenheiros de manutenção a informação antecipada do estado de comportamento do equipamento antes mesmo deste manifestar uma falha, reduzindo custos, aumentando a vida útil e tornando previsível o reparo. Para o desenvolvimento do sistema de manutenção inteligente objeto deste trabalho, foram estudadas técnicas de inteligência artificial (redes neurais artificiais), técnicas de projeto de sistemas embarcados e de prototipação em plataformas de hardware. No presente trabalho, a rede neural Mapas Auto-Organizáveis foi adotada como ferramenta base para detecção e diagnóstico de falhas. Esta foi prototipada numa plataforma de sistema embarcado baseada na tecnologia FPGA (Field Programmable Gate Array). Como estudo de caso, uma válvula elétrica utilizada em dutos para transporte de petróleo foi definida como aplicação alvo dos experimentos. Através de um modelo matemático, um conjunto de dados representativo do comportamento da válvula foi simulado e utilizado como entrada do sistema proposto. Estes dados visam o treinamento da rede neural e visam fornecer casos de teste para experimentação no sistema. Os experimentos executados em software validaram o uso da rede neural como técnica para detecção e diagnóstico de falhas em válvulas elétricas. Por fim, também realizou-se experimentos a fim de validar o projeto do sistema embarcado, comparando-se os resultado obtidos com este aos resultados obtidos a partir de testes em software. Os resultados revelam a escolha correta do uso da rede neural e o correto projeto do sistema embarcado para desempenhar as tarefas de detecção e diagnóstico de falhas em válvulas elétricas. / The high costs of maintenance in industrial environments have motivated research for new techniques to improve repair activities. The technological progress, especially in the electronics field, has provided for the use of embedded systems to improve repair, by adding intelligence to the system and turning the maintenance a proactive activity. Through tools like signal processing, artificial intelligence and fault-tolerance, new approaches to monitoring systems have emerged to serve the maintenance staff, leading to new models of reliability and availability. The main goal of these systems, also called intelligent maintenance systems, is to perform in-operation (on-line) test of industrial equipments. These systems are built based on fault-tolerance concepts, and used for the detection, the diagnosis and the prognosis of faults. They provide the maintenance engineers with information on the equipment behavior, prior to the occurrence of failures, reducing maintenance costs, increasing the system lifetime and making it possible to schedule repairing stops. To develop the intelligent maintenance system addressed in this dissertation, artificial intelligence (neural networks), embedded systems design and hardware prototyping techniques were studied. In this work, the neural network Self-Organizing Maps (SOM) was defined as the basic tool for the detection and the diagnosis of faults. The SOM was prototyped in an embedded system platform based on the FPGA technology (Field Programmable Gate Array). As a case study, the experiments were performed on an electric valve used in a pipe network for oil transportation. Through a mathematical model, a data set representative of the valve behavior was obtained and used as input to the proposed maintenance system. These data were used for neural network training and also provided test cases for system monitoring. The experiments were performed in software to validate the chosen neural network as the technique for the detection and diagnosis of faults in the electrical valve. Finally, experiments to validate the embedded system design were also performed, so as to compare the obtained results to those resulting from the software tests. The results show the correct choice of the neural network and the correct embedded systems design to perform the activities for the detection and diagnosis of faults in the electrical valve.
1002

Cost-effective dynamic repair for FPGAs in real-time systems / Reparo dinâmico de baixo custo para FPGAs em sistemas tempo-real

Santos, Leonardo Pereira January 2016 (has links)
Field-Programmable Gate Arrays (FPGAs) são largamente utilizadas em sistemas digitais por características como flexibilidade, baixo custo e alta densidade. Estas características advém do uso de células de SRAM na memória de configuração, o que torna estes dispositivos suscetíveis a erros induzidos por radiação, tais como SEUs. TMR é o método de mitigação mais utilizado, no entanto, possui um elevado custo tanto em área como em energia, restringindo seu uso em aplicações de baixo custo e/ou baixo consumo. Como alternativa a TMR, propõe-se utilizar DMR associado a um mecanismo de reparo da memória de configuração da FPGA chamado scrubbing. O reparo de FPGAs em sistemas em tempo real apresenta desafios específicos. Além da garantia da computação correta dos dados, esta computação deve se dar completamente dentro do tempo disponível (time-slot), devendo ser finalizada antes do tempo limite (deadline). A diferença entre o tempo de computação dos dados e a deadline é chamado de slack e é o tempo disponível para reparo do sistema. Este trabalho faz uso de scrubbing deslocado dinâmico, que busca maximizar a probabilidade de reparo da memória de configuração de FPGAs dentro do slack disponível, baseado em um diagnóstico do erro. O scrubbing deslocado já foi utilizado com técnicas de diagnóstico de grão fino (NAZAR, 2015). Este trabalho propõe o uso de técnicas de diagnóstico de grão grosso para o scrubbing deslocado, evitando as penalidades de desempenho e custos em área associados a técnicas de grão fino. Circuitos do conjunto MCNC foram protegidos com as técnicas propostas e submetidos a seções de injeção de erros (NAZAR; CARRO, 2012a). Os dados obtidos foram analisados e foram calculadas as melhores posição iniciais do scrubbing para cada um dos circuitos. Calculou-se a taxa de Failure-in-Time (FIT) para comparação entre as diferentes técnicas de diagnóstico propostas. Os resultados obtidos confirmaram a hipótese inicial deste trabalho que a redução do número de bits sensíveis e uma baixa degradação do período do ciclo de relógio permitiram reduzir a taxa de FIT quando comparadas com técnicas de grão fino. Por fim, uma comparação entre as três técnicas propostas é feita, analisando o desempenho e custos em área associados a cada uma. / Field-Programmable Gate Arrays (FPGAs) are widely used in digital systems due to characteristics such as flexibility, low cost and high density. These characteristics are due to the use of SRAM memory cells in the configuration memory, which make these devices susceptible to radiation-induced errors, such as SEUs. TMR is the most used mitigation technique, but it has an elevated cost both in area as well as in energy, restricting its use in low cost/low energy applications. As an alternative to TMR, we propose the use of DMR associated with a repair mechanism of the FPGA configuration memory called scrubbing. The repair of FPGA in real-time systems present a specific set of challenges. Besides guaranteeing the correct computation of data, this computation must be completely carried out within the available time (time-slot), being finalized before a time limit (deadline). The difference between the computation time and the deadline is called the slack and is the time available to repair the system. This work uses a dynamic shifted scrubbing that aims to maximize the repair probability of the configuration memory of the FPGA within the available slack based on error diagnostic. The shifted scrubbing was already proposed with fine-grained diagnostic techniques (NAZAR, 2015). This work proposes the use of coarse-grained diagnostic technique as a way to avoid the performance penalties and area costs associated to fine-grained techniques. Circuits of the MCNC suite were protected by the proposed techniques and subject to error-injection campaigns (NAZAR; CARRO, 2012a). The obtained data was analyzed and the best scrubbing starting positions for each circuit were calculated. The Failure-in-Time (FIT) rates were calculated to compare the different proposed diagnostic techniques. The obtained results validated the initial hypothesis of this work that the reduction of the number of sensitive bits and a low degradation of the clock cycle allowed a reduced FIT rate when compared with fine-grained diagnostic techniques. Finally, a comparison is made between the proposed techniques, considering performance and area costs associated to each one.
1003

Development of electronic systems for ultrasonic particle manipulation

Wang, Han January 2015 (has links)
Demands to handle individual particles or particle agglomerates have been emerging in the fields of biology and chemistry, and particle trapping and manipulation with mechanical waves generated from ultrasound sources, known as “acoustic tweezing”, has gained great interest by researchers and been proved useful for its unique advantages. With an analogy to optical tweezing, research has demonstrated the possibility to use modulated acoustic fields generated by ultrasound arrays for trapping individual particles and groups of particles at length scales from hundreds of µm to a few mm. This thesis explores and demonstrates particle trapping and manipulation with electronically-controlled miniaturized ultrasound arrays (element pitch around 500 µm or less), focusing on the development of dexterous electronic systems. Generally, in acoustic manipulation applications, low voltage outputs with continuous mode operation are required to create stable acoustic energy potential “landscapes” for trapping without damaging particles or cells. The research work of this thesis is oriented towards integration of control electronics with miniaturized ultrasound arrays. Test fixtures have been carefully designed and fabricated for the characterization of transducer arrays developed by collaborating researchers and array-controlled particle manipulation experiments have been demonstrated with customized fluorescence microscopy equipment. Most importantly, this thesis has established two versions of prototype Field programmable gate array (FPGA) based electronics to drive ultrasound arrays. One is a computer-controlled 16-channel system, with adjustable output frequencies, phases and amplitudes. Another is a 40-channel switching electronics for manual controlled output switching or time-shared output multiplexing. The electronic systems that have been developed are highly scalable and easily adapted for different acoustic tweezing applications. In conclusion, this thesis has proposed prototype electronic toolkits as research platforms to explore diverse possibilities for acoustic tweezing with miniaturized ultrasound arrays.
1004

Detection algorithms and FPGA implementations for SC-FDMA uplink receivers

Hänninen, T. (Tuomo) 29 June 2018 (has links)
Abstract The demand in mobile broadband communications is increasing dramatically. It is expected that 1000 times more mobile-network capacity will be needed within 10 years. Multiple-input, multiple-output (MIMO) antenna configuration and spatial multiplexing are among the essential techniques for reaching the targets. This creates motivation for study of advanced receivers for combating inter-antenna interference (IAI) and inter-symbol interference (ISI). While various receiver structures have been extensively considered for MIMO receivers, the emphasis has been on those operating in downlink orthogonal frequency-division multiple access (OFDM) systems, wherein ISI is not a problem. Advanced receiver structures for single-carrier frequency-division multiple access (SC-FDMA) uplink systems were studied and analysed. Various receivers were compared via MATLAB simulations, with the objective being to gain solid understanding of how they perform in different channel environments. An efficient combination of IAI and ISI equalisation for SC-FDMA receivers is proposed. The proposed receiver architecture is shown to be a considerable improvement over the conventional linear minimum mean-square error (LMMSE) receiver. Several MIMO detector algorithms and their performance–complexity characteristics are presented. The K-best algorithm with a list size of 8 is shown to be the best option for practical MIMO detector implementation of this receiver in the 4x4 MIMO 64-level quadrature amplitude modulation (QAM) scenario. The second objective involved examining the implementation aspects of the 8-best receiver to achieve good understanding of the complexity of various implementation architectures. It emerged that avoiding the sorting operation in the 8-best list sphere detector (LSD) tree-search algorithm implementation is not recommendable in the 4x4 MIMO 64-QAM scenario. Several field-programmable gate array (FPGA) implementations were carried out, with a range of high-level synthesis (HLS) tools. It is shown that HLS tools have improved significantly and are especially favourable for prototyping of large designs. Additionally, the importance of FPGA technology selection is addressed. Smaller silicon technology should be exploited if base-station baseband processing power consumption is to be minimised. The potential performance or complexity-related gain with the latest FPGAs should be taken into account in comparison of the performance–complexity characteristics of the algorithms. Differences of a few tens of per cent in estimated complexity or performance between two algorithms are often below the threshold of what can be gained or lost in the practical implementation process. / Tiivistelmä Tiheään asuttujen kaupunkien uudet langattomat palvelut tarvitsevat tietoliikenneverkkoja, jotka mahdollistavat suuremman tiedonsiirtonopeuden ja kapasiteetin kuin sen, jonka nykyiset mobiiliverkot voivat tarjota. On arveltu, että mobiiliverkkojen kapasiteetin tarve tuhatkertaistuu seuraavan kymmenen vuoden aikana. Tuhatkertainen kapasiteetti on arvioitu saavutettavan kasvattamalla kolmea eri osa-aluetta kymmenkertaiseksi: taajuusspektrin määrä, spektrin käytön tehokkuus sekä tukiasematiheys. Tämä väitöskirja keskittyy spektrin käytön tehokkuuden kasvattamiseen. Moniantennitoteutus (multiple-input multiple-output, MIMO) on siinä välttämätön. MIMO-tekniikkaa hyödyntävien solukkojärjestelmien tukiasemavastaanottimissa tarvitaan melko monimutkainen kanavakorjain sekä ilmaisin, joiden algoritmien optimointi ja toteutus ymmärretään vielä sangen puutteellisesti. Väitöskirjatutkimuksen päätavoitteena on tutkia edistyksellisiä vastaanotinrakenteita, joilla saavutetaan LTE-A-standardin tavoitetiedonsiirtonopeus kohtuullisella kompleksisuudella. Työssä keskitytään ns. nousevaan siirtosuuntaan (uplink) eli päätelaitteesta tukiasemaan tapahtuvaan tiedonsiirtoon, jossa käytetään yhden kantoaallon taajuusjakomonikäyttötekniikkaa (single-carrier frequency-division multiple-access, SC-FDMA) ortognaalisen taajuusjakomonikäytön (orthogonal frequency division multiple access, OFDMA) sijaan. Eri vastaanotinrakenteita ja näiden ilmaisinalgoritmeja vertaillaan tietokonesimuloinnein MATLAB-ympäristössä. Väitöskirjassa ehdotetaan kaksiosaista vastaanotinrakennetta, jossa antennien välinen keskinäishäiriö (inter antenna interference, IAI) ja symbolien välinen keskinäisvaikutus (intersymbol interference, ISI) poistetaan kahdessa eri vaiheessa. Tietokoneimulaatiot osoittavat ko. rakenteen parantavan suorituskykyä huomattavasti perinteiseen lineaariseen keskineliövirheen minimoivaan (linear minimum mean square error, LMMSE) vastaanottimeen verrattuna. Nk. K parasta polkua valitsevan MIMO-ilmaisinalgoritmin listan koolla kahdeksan todetaan tarjoavan 4x4 MIMO 64-tasoisen kvadratuuriamplitudimodulaation (quadrature amplitude modulation, QAM) ympäristössä parhaan kompromissin suorituskyvyn ja kompleksisuuden suhteen. Käytännön toteutettavuuden kannalta keskitytään ohjelmoitavaan digitaalipiiritoteutukseen (field-programmable gate array, FGPA) ja ns. korkean tason synteesi (high-level synthesis, HLS) -työkalujen käyttöön vastaanottimen suunnittelussa. K parasta polkua valitsevan MIMO-ilmaisinalgoritmin arkkitehtuurivertailut osoittavat, että sinänsä vaativaa lajittelualgoritmia ei aina kannata yrittää välttää kirjallisuudessa aikaisemmin ehdotetulla ratkaisulla. Useita eri HLS työkaluja käytetään FPGA toteutuksissa ja todetaan että työkalut ovat kehittyneet huomattavasti viimeisen kahdeksan vuoden aikana. Lisäksi todetaan, että 16 nm viivanleveyden piireillä voidaan saavuttaa noin 15 % suurempi ilmaisunopeus ja 60 % pienempi tehonkulutus verrattuna 28 nm viivanleveyttä käyttäviin piireihin. Erityisesti potentiaali tehonkulutuksen minimoiseksi kannattaa hyödyntää, mikäli signaalinkäsittely näyttelee merkittävää roolia vastaanottimen kokonaistehonkulutuksessa. Kokonaisuutena todetaan, että toteutukseen liittyvät valinnat sekä vaikutus lopputulokseen, tulisi ottaa huomioon jo algoritmien valinnassa. Pieni ero kahden eri algoritmin suorituskyvyn välillä häviää helposti toteutusvaiheen ratkaisujen vaikutusten alle.
1005

Hardware Acceleration of Deep Convolutional Neural Networks on FPGA

January 2018 (has links)
abstract: The rapid improvement in computation capability has made deep convolutional neural networks (CNNs) a great success in recent years on many computer vision tasks with significantly improved accuracy. During the inference phase, many applications demand low latency processing of one image with strict power consumption requirement, which reduces the efficiency of GPU and other general-purpose platform, bringing opportunities for specific acceleration hardware, e.g. FPGA, by customizing the digital circuit specific for the deep learning algorithm inference. However, deploying CNNs on portable and embedded systems is still challenging due to large data volume, intensive computation, varying algorithm structures, and frequent memory accesses. This dissertation proposes a complete design methodology and framework to accelerate the inference process of various CNN algorithms on FPGA hardware with high performance, efficiency and flexibility. As convolution contributes most operations in CNNs, the convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution involves multiply and accumulate (MAC) operations with four levels of loops. Without fully studying the convolution loop optimization before the hardware design phase, the resulting accelerator can hardly exploit the data reuse and manage data movement efficiently. This work overcomes these barriers by quantitatively analyzing and optimizing the design objectives (e.g. memory access) of the CNN accelerator based on multiple design variables. An efficient dataflow and hardware architecture of CNN acceleration are proposed to minimize the data communication while maximizing the resource utilization to achieve high performance. Although great performance and efficiency can be achieved by customizing the FPGA hardware for each CNN model, significant efforts and expertise are required leading to long development time, which makes it difficult to catch up with the rapid development of CNN algorithms. In this work, we present an RTL-level CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs, in order to enable high-level fast prototyping of CNNs from software to FPGA and still keep the benefits of low-level hardware optimization. First, a general-purpose library of RTL modules is developed to model different operations at each layer. The integration and dataflow of physical modules are predefined in the top-level system template and reconfigured during compilation for a given CNN algorithm. The runtime control of layer-by-layer sequential computation is managed by the proposed execution schedule so that even highly irregular and complex network topology, e.g. GoogLeNet and ResNet, can be compiled. The proposed methodology is demonstrated with various CNN algorithms, e.g. NiN, VGG, GoogLeNet and ResNet, on two different standalone FPGAs achieving state-of-the art performance. Based on the optimized acceleration strategy, there are still a lot of design options, e.g. the degree and dimension of computation parallelism, the size of on-chip buffers, and the external memory bandwidth, which impact the utilization of computation resources and data communication efficiency, and finally affect the performance and energy consumption of the accelerator. The large design space of the accelerator makes it impractical to explore the optimal design choice during the real implementation phase. Therefore, a performance model is proposed in this work to quantitatively estimate the accelerator performance and resource utilization. By this means, the performance bottleneck and design bound can be identified and the optimal design option can be explored early in the design phase. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018
1006

A Flexible FPGA-Assisted Framework for Remote Attestation of Internet Connected Embedded Devices

Patten, Jared Russell 01 March 2018 (has links)
Embedded devices permeate our every day lives. They exist in our vehicles, traffic lights, medical equipment, and infrastructure controls. In many cases, improper functionality of these devices can present a physical danger to their users, data or financial loss, etc. Improper functionality can be a result of software or hardware bugs, but now more than ever, is often the result of malicious compromise and tampering, or as it is known colloquially "hacking". We are beginning to witness a proliferation of cyber-crime, and as more devices are built with internet connectivity (in the so called "Internet of Things"), security should be of the utmost concern. Embedded devices have begun to seamlessly merge with our daily existence. Therefore the need for security grows as it more directly affects the safety of our data, property, and even physical health. This thesis presents an FPGA-assisted framework for remote attestation, a security service that allows a remote device to prove to a verifying entity that it can be trusted. In other words, it presents a protocol by which a device (be it an insulin pump, vehicle, etc.) can prove to a user (or other entity) that it can be trusted - i.e. that it has not been "hacked". This is accomplished through executable code integrity verification and run-time monitoring. In essence, the protocol verifies that a device is running authorized and untampered software and makes it known to a verifier in a trusted fashion. We implement the protocol on a physical device to demonstrate its feasibility and to examine its performance impact.
1007

Neutron Beam Testing Methodology and Results for a Complex Programmable Multiprocessor SoC

Anderson, Jordan Daniel 01 March 2019 (has links)
The Xilinx Multiprocessor System-on-Chip (MPSoC) is a complex device that uses 16nm FinFET technology to combine multiple processors, a large amount of FPGA resources, and many I/O interfaces on a single chip die. These features make the MPSoC a high-performance and architecturally flexible device. The potential computing power makes the MPSoC ideal for many embedded applications including terrestrial and space applications. The MPSoC, however, does not have extensive radiation history as many other devices have. The extent of the effect that ionized particles may have on the MPSoC is not well established. To solve this problem, neutron radiation testing can be used to determine the device's susceptibility to single-event upsets (SEUs). Though this thesis is not intended to qualify the MPSoC for space, this work does provide useful neutron radiation test data that helps to characterize the susceptible nature of the device. This thesis summarizes the SEU results obtained from neutron testing on the UltraScale+ MPSoC ZU9EG device. A series of three neutron beam tests were performed on the MPSoC ZU9EG at Los Alamos National Laboratories (LANL). Testing was performed using a novel testing methodology to collect SEU counts on the programmable logic and the processing system simultaneously. These results show a 10.1× improvement of the programmable logic CRAM over the previous Xilinx UltraScale device series.
1008

Neutron Beam Testing Methodology and Results for a Complex Programmable Multiprocessor SoC

Anderson, Jordan Daniel 01 March 2019 (has links)
The Xilinx Multiprocessor System-on-Chip (MPSoC) is a complex device that uses 16nm FinFET technology to combine multiple processors, a large amount of FPGA resources, and many I/O interfaces on a single chip die. These features make the MPSoC a high-performance and architecturally flexible device. The potential computing power makes the MPSoC ideal for many embedded applications including terrestrial and space applications.The MPSoC, however, does not have extensive radiation history as many other devices have. The extent of the effect that ionized particles may have on the MPSoC is not well established. To solve this problem, neutron radiation testing can be used to determine the device's susceptibility to single-event upsets (SEUs). . Though this thesis is not intended to qualify the MPSoC for space, this work does provide useful neutron radiation test data that helps to characterize the susceptible nature of the device. This thesis summarizes the SEU results obtained from neutron testing on the UltraScale+ MPSoC ZU9EG device. A series of three neutron beam tests were performed on the MPSoC ZU9EG at Los Alamos National Laboratories (LANL). Testing was performed using a novel testing methodology to collect SEU counts on the programmable logic and the processing system simultaneously. These results show a $10.1 timess improvement of the programmable logic CRAM over the previous Xilinx UltraScale device series.
1009

The Hybrid Architecture Parallel Fast Fourier Transform (HAPFFT)

Palmer, Joseph M. 16 June 2005 (has links)
The FFT is an efficient algorithm for computing the DFT. It drastically reduces the cost of implementing the DFT on digital computing systems. Nevertheless, the FFT is still computationally intensive, and continued technological advances of computers demand larger and faster implementations of this algorithm. Past attempts at producing high-performance, and small FFT implementations, have focused on custom hardware (ASICs and FPGAs). Ultimately, the most efficient have been single-chipped, streaming I/O, pipelined FFT architectures. These architectures increase computational concurrency through the use of hardware pipelining. Streaming I/O, pipelined FFT architectures are capable of accepting a single data sample every clock cycle. In principle, the maximum clock frequency of such a circuit is limited only by its critical delay path. The delay of the critical path may be decreased by the addition of pipeline registers. Nevertheless this solution gives diminishing returns. Thus, the streaming I/O, pipelined FFT is ultimately limited in the maximum performance it can provide. Attempts have been made to map the Parallel FFT algorithm to custom hardware. Yet, the Parallel FFT was formulated and optimized to execute on a machine with multiple, identical, processing elements. When executed on such a machine, the FFT requires a large expense on communications. Therefore, a direct mapping of the Parallel FFT to custom hardware results in a circuit with complex control and global data movement. This thesis proposes the Hybrid Architecture Parallel FFT (HAPFFT) as an alternative. The HAPFFT is an improved formulation for building Parallel FFT custom hardware modules. It provides improved performance, efficient resource utilization, and reduced design time. The HAPFFT is modular in nature. It includes a custom front-end parallel processing unit which produces intermediate results. The intermediate results are sent to multiple, independent FFT modules. These independent modules form the back-end of the HAPFFT, and are generic, meaning that any prexisting FFT architecture may be used. With P back-end modules a speedup of P will be achieved, in comparison to an FFT module composed solely of a single module. Furthermore, the HAPFFT defines the front-end processing unit as a function of P. It hides the high communication costs typically seen in Parallel FFTs. Reductions in control complexity, memory demands, and logical resources, are achieved. An extraordinary result of the HAPFFT formulation is a sublinear area-time growth. This phenomenon is often also called superlinear speedup. Sublinear area-time growth and superlinear speedup are equivalent terms. This thesis will subsequently use the term superlinear speedup to refer to the HAPFFT's outstanding speedup behavior. A further benefit resulting from the HAPFFT formulation is reduced design time. Because the HAPFFT defines only the front-end module, and because the back-end parallel modules may be composed of any preexisting FFT modules, total design time for a HAPFFT is greatly reduced
1010

Using Source-to-Source Transformations to Add Debug Observability to HLS-Synthesized Circuits

Monson, Joshua Scott 01 March 2016 (has links)
This dissertation introduces a novel approach for exposing the internal, source-level expressions of circuits generated by high-level synthesis (HLS) for in-circuit debug. The approach uses source-to-source transformations to instrument specific source-level expressions with debug ports. These debug ports allow a user to connect a debugging instrument (e.g. an embedded logic analyzer) to record the activity of the expression corresponding to the debug port. This dissertation demonstrates that a debugging solution based on these source-to-source transformations is feasible and that individual debug ports can be added for a cost of a 1-2% increase in circuit area on average. It also introduces another transformation that permits pointer-valued expressions to be instrumented for debug. It is demonstrated that all pointers in the CHStone benchmarks can be instrumented for an average 4% increase in circuit area. The debug port transformations are demonstrated on two HLS tools – Vivado HLS and Legup. The architecture of the source-to-source compiler allowed the necessary adaptations for the second tool (Legup) to be implemented using a minimal amount of additional code. Due to limitations in the Legup compiler an additional optimization was added to reduce the latency overhead incurred by the debug ports. User manuals and other documentation from 10 additional C-based HLS tools is examined to determine whether they are amenable to debug instrumentation using the source-to-source transformations. Of the 10 additional HLS tools examined, 6 were amenable to the transformations, 3 were likely to be amenable, and 1 was not. This dissertation estimates the cost of a complete debugging solution (i.e. one with debug ports and a debugging instrument) and identifies a possible worst case bound for adding debug ports. Finally, this dissertation analyzes two different debugging instruments and determines which instrument would be best for most HLS circuit mapped to FPGAs. It then estimates the overhead of this debugging solution.

Page generated in 0.0478 seconds