• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 6
  • 6
  • 1
  • Tagged with
  • 15
  • 15
  • 8
  • 4
  • 4
  • 4
  • 4
  • 4
  • 3
  • 3
  • 2
  • 2
  • 2
  • 2
  • 2
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Analysis-Driven Design of Parallel Floating-Point Matrix Multiplication for Implementation in Reconfigurable Logic

Khayyat, Ahmad 06 August 2013 (has links)
The objective of this research is to design an efficient and flexible implementation of parallel matrix multiplication for FPGA devices by analyzing the computation and studying its design space. In order to adapt to the FPGA platform, the design employs blocking and parallelization. Blocked matrix multiplication enables processing arbitrarily large matrices using limited memory capacity, and reduces the bandwidth requirements across the device boundaries by reusing available elements. Exploiting the inherent parallelism in the matrix multiplication computation improves the performance and utilizes the available reconfigurable FPGA resources. The design is constructed by identifying the main design decisions and evaluating the alternatives for each one. The considered design decisions include the scheduling of block transfers, the scheduling of arithmetic operations in a block multiplication, the extent to which the parallelism is exploited, determining the block sizes and shapes, and the use of double buffers for storing matrix blocks. The choices offered by each decision are evaluated analytically in terms of their performance and utilization of FPGA resources. Based on this analysis, a detailed, flexible design that accommodates various alternative design choices is described. The design is optimized for matrices of floating-point elements, and for the FPGA target platform. Prior work is analyzed based on the considered design choices in order to identify the similarities and the differences. The proposed design is implemented using the VHDL hardware description language. The implementation is used to verify the correctness of the design and to confirm the analysis of the design decisions. Correctness is verified both by simulation using the ModelSim logic simulator, and in hardware through compiling the implementation using the Altera Quartus II CAD software and testing it on the Altera DE4 board, featuring a Stratix IV EP4SGX530C2 FPGA device. The implementation supports a range of parameters to facilitate the experimental evaluation of design choices. Experimental results show that the design scales linearly with respect to the consumed resources. Although increasing the system size reduces the maximum operating frequency, it also increases the parallelism, resulting in a higher performance. For instance, with 8 floating-point arithmetic units, the system runs at 320 MHz, which corresponds to a performance of 4 GFLOPS, whereas with 64 arithmetic units, it runs at 160 MHz, which corresponds to a performance of 16 GFLOPS. It is also shown that using a transfer schedule based on inner products reduces the transfer time by up to 50% compared to other schedules. Although using square blocks minimizes the number of required block multiplications, other non-square blocks minimize the transfer time, resulting in better total times. / Thesis (Ph.D, Electrical & Computer Engineering) -- Queen's University, 2013-08-03 12:46:13.484
2

Analysis-Driven Design of Parallel Floating-Point Matrix Multiplication for Implementation in Reconfigurable Logic

Khayyat, Ahmad 06 August 2013 (has links)
The objective of this research is to design an efficient and flexible implementation of parallel matrix multiplication for FPGA devices by analyzing the computation and studying its design space. In order to adapt to the FPGA platform, the design employs blocking and parallelization. Blocked matrix multiplication enables processing arbitrarily large matrices using limited memory capacity, and reduces the bandwidth requirements across the device boundaries by reusing available elements. Exploiting the inherent parallelism in the matrix multiplication computation improves the performance and utilizes the available reconfigurable FPGA resources. The design is constructed by identifying the main design decisions and evaluating the alternatives for each one. The considered design decisions include the scheduling of block transfers, the scheduling of arithmetic operations in a block multiplication, the extent to which the parallelism is exploited, determining the block sizes and shapes, and the use of double buffers for storing matrix blocks. The choices offered by each decision are evaluated analytically in terms of their performance and utilization of FPGA resources. Based on this analysis, a detailed, flexible design that accommodates various alternative design choices is described. The design is optimized for matrices of floating-point elements, and for the FPGA target platform. Prior work is analyzed based on the considered design choices in order to identify the similarities and the differences. The proposed design is implemented using the VHDL hardware description language. The implementation is used to verify the correctness of the design and to confirm the analysis of the design decisions. Correctness is verified both by simulation using the ModelSim logic simulator, and in hardware through compiling the implementation using the Altera Quartus II CAD software and testing it on the Altera DE4 board, featuring a Stratix IV EP4SGX530C2 FPGA device. The implementation supports a range of parameters to facilitate the experimental evaluation of design choices. Experimental results show that the design scales linearly with respect to the consumed resources. Although increasing the system size reduces the maximum operating frequency, it also increases the parallelism, resulting in a higher performance. For instance, with 8 floating-point arithmetic units, the system runs at 320 MHz, which corresponds to a performance of 4 GFLOPS, whereas with 64 arithmetic units, it runs at 160 MHz, which corresponds to a performance of 16 GFLOPS. It is also shown that using a transfer schedule based on inner products reduces the transfer time by up to 50% compared to other schedules. Although using square blocks minimizes the number of required block multiplications, other non-square blocks minimize the transfer time, resulting in better total times. / Thesis (Ph.D, Electrical & Computer Engineering) -- Queen's University, 2013-08-03 12:46:13.484
3

Real-Time Operating System Hardware Extension Core for System-on-Chip Designs

Best, Joel 08 January 2013 (has links)
This thesis presents a real-time operating system hardware extension core which supports the integration of hardware accelerators into real-time system-on-chip designs as hardware tasks. The hardware extension core utilizes reconfigurable logic to manage synchronization events, data transfers, and hardware task control. A reduction in interrupt latency, frequency, and execution time provides performance and predictability improvements for real-time applications. Required communication between the CPU and hardware accelerators is also reduced significantly. Compared to a software implementation, synthetic benchmarks of common synchronization tasks show up to a 41% increase in synchronization performance. Analysis of a test case design for audio encoding and encryption using three hardware accelerators shows results of a 2.89x throughput improvement in comparison to the use of software device driver tasks. Overall, this design simplifies the integration of hardware accelerators into real-time system-on-chip designs while improving the performance and predictability of these systems.
4

Geração de b-splines via FPGA / B-spline generation via FPGA

Silva, Luiz Marcelo Chiesse da 10 August 2012 (has links)
As b-splines são utilizadas em sistemas CAD/CAM/CAE para representar e definir curvas e superfícies complexas, sendo adotada pelos principais padrões da computação gráfica devido a características como representação matemática de forma compacta, flexibilidade e transformações afins. Em sistemas de aquisição de dados 3D e sistemas CAM-CNC integrados, a utilização da b-spline na transferência de informações geométricas e na reconstrução da superfície de objetos resulta em um significativo incremento na eficiência do processo, geralmente implementado em sistemas embarcados. Nestes sistemas embarcados, integrados no auxílio a máquinas de manufatura, a utilização de FPGAs é incipiente, sem circuitos para b-splines disponibilizados em lógica reconfigurável de circuito aberto (open core), razão pela qual este projeto propõe o desenvolvimento de um circuito de geração b-spline aberto, em um sistema embarcado FPGA, utilizando algoritmos adaptados para os circuitos, elaborados em linguagem Verilog HDL, padronizada para a síntese de circuitos em lógica reconfigurável. Os circuitos foram desenvolvidos, utilizando-se um barramento de dados padronizado em circuito aberto, nas seguintes implementações para processamento paralelo das b-splines: o BFEA, o método baseado em funções base fixas, ambos projetados para circuitos integrados, e o fast Cox-de Boor, desenvolvido para FPGAs. Foram comparados o tempo de execução e o consumo de recursos disponíveis no FPGA utilizado, entre cada implementação. Os resultados evidenciaram que os circuitos de funções base fixas apresentaram o processamento mais rápido para a geração de b-splines em um FPGA, com um tempo de execução em média 20% menor em relação às outras implementações. Os circuitos BFEA apresentaram a menor utilização de elementos lógicos, em média 50% menor em relação aos outros circuitos implementados. O circuito fast Cox-de Boor apresentou a melhor escalabilidade, devido à modularidade da implementação, com tempos de execução similares aos circuitos de funções base fixas. / The b-splines are used in CAD/CAM/CAE systems to represent and define complex curves and surfaces, being adopted by the main computer graphics standards due to features like compact mathematic representation, flexibility and affine transformations. In 3D acquisition systems and integrated CAM-CNC systems, the use of the b-spline in the geometric information data transfer and in the object surface reconstruction results in a increase in the process efficiency, generally implemented in embedded systems. In these embedded systems, integrated in the aid to manufacturing machines, the use of FPGAs is incipient, without available b-splines open core circuits in reconfigurable logic, the reason why this project propose the development of a b-spline generation open core circuit, in a FPGA embedded system, using adaptated algorithms for the circuits, made in Verilog HDL language, standardized for the circuit synthesis in reconfigurable logic. The circuits were developed, using an open core standardized data bus, in the following implementations of b-spline parallel processing: the BFEA, fixed basis functions based method, both designed for integrated circuits, and the fast Cox-de Boor, developed for FPGAs. The execution time and available resource consumption in the FPGA were compared, between each implementation. The results show that the fixed basis functions circuits presented the fastest processing for the b-splines generation in a FPGA, with a 20% mean execution time reduction in relation to the other implementations. The BFEA circuits presented the lowest logic elements use, in mean 50% fewer in relation to the other implemented circuits. The fast Cox-De Boor circuit presented the best scalability, due to the implementation modularity, with execution times similar to the fixed basis functions circuits.
5

Nouvelles Architectures Hybrides : Logique / Mémoires Non-Volatiles et technologies associées. / Novel Hybrid Logic / Non-Volatile memory Architectures and associated technologies

Palma, Giorgio 29 November 2013 (has links)
Les nouvelles approches de technologies mémoires permettront une intégration dite back-end, où les cellules élémentaires de stockage seront fabriquées lors des dernières étapes de réalisation à grande échelle du circuit. Ces approches innovantes sont souvent basées sur l'utilisation de matériaux actifs présentant deux états de résistance distincts. Le passage d'un état à l'autre est contrôlé en courant ou en tension donnant lieu à une caractéristique I-V hystérétique. Nos mémoires résistives sont composées d'argent en métal électrochimiquement actif et de sulfure amorphe agissant comme électrolyte. Leur fonctionnement repose sur la formation réversible et la dissolution d'un filament conducteur. Le potentiel d'application de ces nouveaux dispositifs n'est pas limité aux mémoires ultra-haute densité mais aussi aux circuits embarqués. En empilant ces mémoires dans la troisième dimension au niveau des interconnections des circuits logiques CMOS, de nouvelles architectures hybrides et innovantes deviennent possibles. Il serait alors envisageable d'exploiter un fonctionnement à basse énergie, à haute vitesse d'écriture/lecture et de haute performance telles que l'endurance et la rétention. Dans cette thèse, en se concentrant sur les aspects de la technologie de mémoire en vue de développer de nouvelles architectures, l'introduction d'une fonctionnalité non-volatile au niveau logique est démontrée par trois circuits hybrides: commutateurs de routage non volatiles dans un Field Programmable Gate Arrays, un 6T-SRAM non volatile, et les neurones stochastiques pour un réseau neuronal. Pour améliorer les solutions existantes, les limitations de la performances des dispositifs mémoires sont identifiés et résolus avec des nouveaux empilements ou en fournissant des défauts de circuits tolérants. / Novel approaches in the field of memory technology should enable backend integration, where individual storage nodes will be fabricated during the last fabrication steps of the VLSI circuit. In this case, memory operation is often based upon the use of active materials with resistive switching properties. A topology of resistive memory consists of silver as electrochemically active metal and amorphous sulfide acting as electrolyte and relies on the reversible formation and dissolution of a conductive filament. The application potential of these new memories is not limited to stand-alone (ultra-high density), but is also suitable for embedded applications. By stacking these memories in the third dimension at the interconnection level of CMOS logic, new ultra-scalable hybrid architectures becomes possible which exploit low energy operation, fast write/read access and high performance with respect to endurance and retention. In this thesis, focusing on memory technology aspects in view of developing new architectures, the introduction of non-volatile functionality at the logic level is demonstrated through three hybrid (CMOS logic ReRAM devices) circuits: nonvolatile routing switches in a Field Programmable Gate Array, nonvolatile 6T-SRAMs, and stochastic neurons of an hardware neural network. To be competitive or even improve existing solutions, limitations on the memory devices performances are identified and solved by stack engineering of CBRAM devices or providing faults tolerant circuits.
6

Geração de b-splines via FPGA / B-spline generation via FPGA

Luiz Marcelo Chiesse da Silva 10 August 2012 (has links)
As b-splines são utilizadas em sistemas CAD/CAM/CAE para representar e definir curvas e superfícies complexas, sendo adotada pelos principais padrões da computação gráfica devido a características como representação matemática de forma compacta, flexibilidade e transformações afins. Em sistemas de aquisição de dados 3D e sistemas CAM-CNC integrados, a utilização da b-spline na transferência de informações geométricas e na reconstrução da superfície de objetos resulta em um significativo incremento na eficiência do processo, geralmente implementado em sistemas embarcados. Nestes sistemas embarcados, integrados no auxílio a máquinas de manufatura, a utilização de FPGAs é incipiente, sem circuitos para b-splines disponibilizados em lógica reconfigurável de circuito aberto (open core), razão pela qual este projeto propõe o desenvolvimento de um circuito de geração b-spline aberto, em um sistema embarcado FPGA, utilizando algoritmos adaptados para os circuitos, elaborados em linguagem Verilog HDL, padronizada para a síntese de circuitos em lógica reconfigurável. Os circuitos foram desenvolvidos, utilizando-se um barramento de dados padronizado em circuito aberto, nas seguintes implementações para processamento paralelo das b-splines: o BFEA, o método baseado em funções base fixas, ambos projetados para circuitos integrados, e o fast Cox-de Boor, desenvolvido para FPGAs. Foram comparados o tempo de execução e o consumo de recursos disponíveis no FPGA utilizado, entre cada implementação. Os resultados evidenciaram que os circuitos de funções base fixas apresentaram o processamento mais rápido para a geração de b-splines em um FPGA, com um tempo de execução em média 20% menor em relação às outras implementações. Os circuitos BFEA apresentaram a menor utilização de elementos lógicos, em média 50% menor em relação aos outros circuitos implementados. O circuito fast Cox-de Boor apresentou a melhor escalabilidade, devido à modularidade da implementação, com tempos de execução similares aos circuitos de funções base fixas. / The b-splines are used in CAD/CAM/CAE systems to represent and define complex curves and surfaces, being adopted by the main computer graphics standards due to features like compact mathematic representation, flexibility and affine transformations. In 3D acquisition systems and integrated CAM-CNC systems, the use of the b-spline in the geometric information data transfer and in the object surface reconstruction results in a increase in the process efficiency, generally implemented in embedded systems. In these embedded systems, integrated in the aid to manufacturing machines, the use of FPGAs is incipient, without available b-splines open core circuits in reconfigurable logic, the reason why this project propose the development of a b-spline generation open core circuit, in a FPGA embedded system, using adaptated algorithms for the circuits, made in Verilog HDL language, standardized for the circuit synthesis in reconfigurable logic. The circuits were developed, using an open core standardized data bus, in the following implementations of b-spline parallel processing: the BFEA, fixed basis functions based method, both designed for integrated circuits, and the fast Cox-de Boor, developed for FPGAs. The execution time and available resource consumption in the FPGA were compared, between each implementation. The results show that the fixed basis functions circuits presented the fastest processing for the b-splines generation in a FPGA, with a 20% mean execution time reduction in relation to the other implementations. The BFEA circuits presented the lowest logic elements use, in mean 50% fewer in relation to the other implemented circuits. The fast Cox-De Boor circuit presented the best scalability, due to the implementation modularity, with execution times similar to the fixed basis functions circuits.
7

Twill: A Hybrid Microcontroller-FPGA Framework for Parallelizing Single- Threaded C Programs

Gallatin, Douglas S. 01 March 2014 (has links)
Increasingly System-On-A-Chip platforms which incorporate both micropro- cessors and re-programmable logic are being utilized across several fields ranging from the automotive industry to network infrastructure. Unfortunately, the de- velopment tools accompanying these products leave much to be desired, requiring knowledge of both traditional embedded systems languages like C and hardware description languages like Verilog. We propose to bridge this gap with Twill, a truly automatic hybrid compiler that can take advantage of the parallelism inherent in these platforms. Twill can extract long-running threads from single threaded C code and distribute these threads across the hardware and software domains to more fully utilize the asymmetric characteristics between processors and the embedded reconfigurable logic fabric. We show that Twill provides a sig- nificant performance increase on the CHStone benchmarks with an average 1.63 times increase over the pure hardware approach and an increase of 22.2 times on average over the pure software approach while reducing the area required by the reconfigurable logic by on average 1.73 times compared to the pure hardware approach.
8

Metodologia de projeto de sistemas dinamicamente reconfiguráveis. / Design methodologies of dynamically reconfigurable systems.

Leandro Kojima 20 April 2007 (has links)
FPGAs (Field Programmable Gate Arrays) dinamicamente reconfiguráveis (DR-FPGAs) são soluções promissoras para muitos sistemas embarcados devido a potencial redução de área de silício. Metodologias de projeto e ferramentas CAD relacionadas são ainda muito limitadas para auxiliarem os projetistas a encontrarem soluções dinamicamente reconfiguráveis para diferentes aplicações. Este trabalho propõe uma metodologia de projeto que combina modelos de alto nível em SystemC, técnicas de projeto de baixo nível e a metodologia de projeto modular da XILINX. SystemC foi utilizada para representar o comportamento de alto nível não temporizado e não-RTL, bem como o baixo nível RTL-DCS (Chaveamento Dinâmico de Circuitos). Um estudo de caso da Banda Base de um Controlador Bluetooth foi desenvolvido. Duas partições temporais foram testadas em nove diferentes DR-FPGAs. A exploração espacial mostrou que 33% das soluções investigadas atenderam a restrição da especificação de 625µs de tempo do quadro do pacote Bluetooth, deixando diferentes parcelas de recursos livres que podem ser explorados para acomodar outros módulos IP de sistemas mais complexos no mesmo dispositivo. / Dynamically Reconfigurable Field Programmable Gate Arrays (DR-FPGAs) are promising solutions for many embedded systems due to the potential silicon area reduction. Design methodologies and related CAD tools are still very limited to assist designers to encounter dynamically reconfigurable solutions for different applications. This work proposes a design methodology that combines high level SystemC models and design techniques with the low level modular design proposed by Xilinx. SystemC has been used to represent the high level untimed non-RTL behavior as well as the low level RTL-DCS (Dynamic Circuit Switching). A Bluetooth Baseband unit case study was performed. Two temporal-functional partitions were evaluated on nine different target DR-FPGAs. The design space exploration showed that 33% of the investigated solutions complied with the 625µs Bluetooth packet time frame specification leaving different amounts if free resources that may be explored to accommodate other IP modules of more complex systems on the same device.
9

Metodologia de projeto de sistemas dinamicamente reconfiguráveis. / Design methodologies of dynamically reconfigurable systems.

Kojima, Leandro 20 April 2007 (has links)
FPGAs (Field Programmable Gate Arrays) dinamicamente reconfiguráveis (DR-FPGAs) são soluções promissoras para muitos sistemas embarcados devido a potencial redução de área de silício. Metodologias de projeto e ferramentas CAD relacionadas são ainda muito limitadas para auxiliarem os projetistas a encontrarem soluções dinamicamente reconfiguráveis para diferentes aplicações. Este trabalho propõe uma metodologia de projeto que combina modelos de alto nível em SystemC, técnicas de projeto de baixo nível e a metodologia de projeto modular da XILINX. SystemC foi utilizada para representar o comportamento de alto nível não temporizado e não-RTL, bem como o baixo nível RTL-DCS (Chaveamento Dinâmico de Circuitos). Um estudo de caso da Banda Base de um Controlador Bluetooth foi desenvolvido. Duas partições temporais foram testadas em nove diferentes DR-FPGAs. A exploração espacial mostrou que 33% das soluções investigadas atenderam a restrição da especificação de 625µs de tempo do quadro do pacote Bluetooth, deixando diferentes parcelas de recursos livres que podem ser explorados para acomodar outros módulos IP de sistemas mais complexos no mesmo dispositivo. / Dynamically Reconfigurable Field Programmable Gate Arrays (DR-FPGAs) are promising solutions for many embedded systems due to the potential silicon area reduction. Design methodologies and related CAD tools are still very limited to assist designers to encounter dynamically reconfigurable solutions for different applications. This work proposes a design methodology that combines high level SystemC models and design techniques with the low level modular design proposed by Xilinx. SystemC has been used to represent the high level untimed non-RTL behavior as well as the low level RTL-DCS (Dynamic Circuit Switching). A Bluetooth Baseband unit case study was performed. Two temporal-functional partitions were evaluated on nine different target DR-FPGAs. The design space exploration showed that 33% of the investigated solutions complied with the 625µs Bluetooth packet time frame specification leaving different amounts if free resources that may be explored to accommodate other IP modules of more complex systems on the same device.
10

Parallel Genetic Algorithm Engine on an FPGA

La Spina, Mark 05 April 2010 (has links)
The field of FPGA design is ever-growing due to costs being lower than that of ASICs, as well as the time and cost of development. Creating programs to run on them is equally important as developing the devices themselves. Utilizing the increase in performance over software, as well as the ease of reprogramming the device, has led to complex concepts and algorithms that would otherwise be very time-consuming when implemented on software. One such focus has been towards a search and optimization algorithm called the genetic algorithm. The proposed approach is to take an existing application of the genetic algorithm on an FPGA, developed by Fernando et al. [1], and create several instances of it to make a parallel genetic algorithm engine. The genetic algorithm cores are interfaced with a controller module that will control the flow of data between them to implement the parallel execution. Both coarse-grained and fine-grained parallelism are tested and results collected to find the best performance when compared to the single core design. Initial experimental results show some improvement over the number of generations required to reach the optimal fitness level, as well as more significant improvement for the number of generations needed for the average fitness to reach the optimal level.

Page generated in 0.1586 seconds