1 |
Performance Analysis of Hardware/Software Co-Design of Matrix Solvers
Huang, Peng 28 November 2008
Solving systems of linear and nonlinear equations lies at the heart of many scientific and engineering applications, such as circuit simulation, electric power networks, and structural analysis. The exponentially increasing complexity of these applications and the high cost of supercomputing force us to explore affordable high-performance computing platforms. The ultimate goal of this research is to develop hardware-friendly parallel processing algorithms and to build cost-effective, high-performance parallel systems in hardware in order to enable the solution of large linear systems.
In this thesis, FPGA-based general hardware architectures of selected iterative and direct methods are discussed. Xilinx Embedded Development Kit (EDK) hardware/software (HW/SW) codesigns of these methods are also presented. For iterative methods, FPGA-based hardware architectures of Jacobi, combined Jacobi and Gauss-Seidel, and conjugate gradient (CG) are proposed. The convergence analysis of the LNS-based Jacobi processor demonstrates to what extent the hardware resource constraints and the additional conversion error affect the convergence of the Jacobi iterative method. Matlab simulations were performed to compare the performance of the three iterative methods in three ways: number of iterations for a given tolerance, number of iterations for different matrix sizes, and computation time for different matrix sizes. The simulation results indicate that the key to a fast implementation of the three methods is a fast implementation of matrix multiplication. The simulation results also show that CG takes fewer iterations for any given tolerance, but more computation time as the matrix size increases, compared with the other two methods, since matrix-vector multiplication is a more dominant factor in CG than in the other two methods. By implementing the matrix multiplications of the three methods in hardware with Xilinx EDK HW/SW codesign, the performance is significantly improved over a pure software PowerPC (PPC) based implementation. The EDK implementation results show that, in HW/SW codesign, CG takes less computation time than the other two methods for any matrix size, because matrix multiplications dominate the computation time of all three methods while CG requires fewer iterations to converge.
For direct methods, an FPGA-based general hardware architecture and a Xilinx EDK HW/SW codesign of WZ factorization are presented. Single-unit and scalable hardware architectures of WZ factorization are proposed and analyzed under different constraints. The results of Matlab simulations show that WZ factorization runs faster than LU factorization on parallel processors but slower on a single processor. The simulation results also indicate that the most time-consuming part of WZ factorization is the matrix update. By implementing the matrix update of WZ factorization in hardware with Xilinx EDK HW/SW codesign, the performance is also clearly improved over the pure software PPC-based implementation.
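The iteration-count comparison between Jacobi and CG described in this abstract can be illustrated with a small software sketch. The test matrix (tridiagonal, symmetric positive definite, strictly diagonally dominant), the right-hand side, the tolerance and all function names below are illustrative assumptions and are not taken from the thesis; the sketch only shows that both methods spend their time in matrix-vector products and that CG typically needs fewer of them.

```python
# Sketch only: Jacobi vs. conjugate gradient (CG) iteration counts on an
# assumed test system. Nothing here reproduces the thesis experiments.

def matvec(A, x):
    """Dense matrix-vector product, the dominant cost in both methods."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual_norm(A, x, b):
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    return dot(r, r) ** 0.5

def jacobi(A, b, tol=1e-8, max_iter=10_000):
    """Jacobi iteration; returns (solution, iterations used)."""
    n = len(b)
    x = [0.0] * n
    for k in range(1, max_iter + 1):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
        if residual_norm(A, x, b) < tol:
            return x, k
    return x, max_iter

def conjugate_gradient(A, b, tol=1e-8, max_iter=10_000):
    """Textbook CG for symmetric positive definite A; returns (solution, iterations)."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A*x for the zero initial guess
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:
            return x, k
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter

def tridiagonal_system(n):
    """Assumed test problem: A = tridiag(-1, 4, -1), b = all ones."""
    A = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
         for i in range(n)]
    return A, [1.0] * n

A, b = tridiagonal_system(20)
x_jacobi, iters_jacobi = jacobi(A, b)
x_cg, iters_cg = conjugate_gradient(A, b)
print("Jacobi iterations:", iters_jacobi)
print("CG iterations:    ", iters_cg)
```

Each CG iteration performs one matrix-vector product plus a few vector operations, which is why accelerating that product is the natural target for a hardware implementation.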
|
3 |
Academic Packing for Commercial FPGA Architectures
Haroldsen, Travis D. 01 July 2017
With a few exceptions, academic packing algorithms for FPGAs are typically applied solely to theoretical architectures. This has allowed the algorithms to focus on the basic components of packing while abstracting away many of the details dictated by real hardware. As commercially available FPGAs have advanced, however, the academic algorithms and architectures have diverged significantly from their commercial counterparts. In this dissertation, the RapidSmith 2 framework is presented. This framework accurately reflects the architecture of Xilinx FPGAs and provides support for integrating custom tools into the commercial CAD tools. Using this framework, the RSVPack packing algorithm is implemented. The RSVPack algorithm can accept a design synthesized using the commercial Xilinx CAD tools, pack designs which make use of the many features of commercial FPGA architectures, and return the packed designs to the Xilinx CAD tools to be placed and routed. This enables researchers to isolate the packing step from the commercial flow and evaluate different packing techniques while allowing the high-quality commercial tools to perform the remainder of the flow. Integrating the RSVPack algorithm into the commercial flow shows that RSVPack produces packings which lead to circuits with minimum clock periods within 10%, on average, of those of circuits generated using the pure Xilinx flow. Included in this work is a novel table-lookup-based algorithm which RSVPack uses to quickly determine the routability of a cluster. This algorithm is, on average, 5 times faster than the current academic alternatives. Finally, using RSVPack, this dissertation explores various techniques for improving the quality of packing for Xilinx circuits. Together, these results demonstrate the potential for academic research into FPGA CAD tools for commercial architectures.
|
4 |
Evaluation of Xilinx System Generator
Fandén, Petter January 2001
This Master’s Thesis is an evaluation of the software Xilinx System Generator (XSG) and its blockset for Matlab. XSG is an add-on module for Simulink, developed by Xilinx, that generates VHDL code directly from functions implemented in Matlab. The evaluation was made at Saab Avionics AB in Järfälla, north of Stockholm.
In order to investigate the performance of this new Simulink module, a model of a frequency estimator often used in digital radar receivers was implemented in Matlab using XSG. Engineers working at Saab Avionics implemented the same application directly in VHDL, without using Matlab and XSG. After code generation, the results were synthesised, analysed and compared.
The frequency estimator basically contains an FFT, a windowing function and a sorting algorithm used to enable the analysis of two real signals simultaneously. There were, however, problems during generation of the VHDL code, and the model had to be broken into smaller parts containing only a 16-point FFT. The comparison results in this report are based on models containing only this 16-point FFT, and they show a small advantage for the System Generator according to the resource usage report generated during synthesis.
Designing models for generation using the Xilinx Blockset can create a lot of wiring between components. The reason for this is that the System Generator and the Xilinx Blockset are still new tools that are not completely developed. Many components found in Simulink and Matlab could not be found in the Xilinx Blockset; this is, however, being improved. Other problems are long simulation times and errors during generation.
My opinion is that, when used for smaller systems and with further development, the System Generator can be a useful facility in designing digital electronics.
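As a software reference for the blocks named in this abstract, the following sketch combines a 16-point radix-2 FFT, a window function, and one well-known way of analysing two real signals with a single complex FFT (packing them as real and imaginary parts and separating the spectra afterwards). Whether the thesis uses this particular packing scheme, a Hann window, or this FFT structure is an assumption; the function names and test tones are hypothetical.

```python
import cmath
import math

N = 16  # FFT length of the reduced model mentioned in the abstract

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def hann(n):
    """Periodic Hann window (assumed; the abstract does not name the window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def two_real_spectra(a, b):
    """Spectra of two real signals from one complex FFT of a + j*b.

    Uses the conjugate symmetry of real-signal spectra:
    A[k] = (Z[k] + conj(Z[n-k])) / 2 and B[k] = (Z[k] - conj(Z[n-k])) / (2j).
    """
    n = len(a)
    z = fft([complex(ai, bi) for ai, bi in zip(a, b)])
    spec_a, spec_b = [], []
    for k in range(n):
        zc = z[-k % n].conjugate()
        spec_a.append((z[k] + zc) / 2)
        spec_b.append((z[k] - zc) / 2j)
    return spec_a, spec_b

def peak_bin(spectrum):
    """Index of the strongest bin among the unique bins 0..n/2 of a real signal."""
    half = len(spectrum) // 2
    return max(range(half + 1), key=lambda k: abs(spectrum[k]))

# Hypothetical test tones centred on bins 3 and 5.
w = hann(N)
sig_a = [math.cos(2 * math.pi * 3 * i / N) * w[i] for i in range(N)]
sig_b = [math.cos(2 * math.pi * 5 * i / N) * w[i] for i in range(N)]
spec_a, spec_b = two_real_spectra(sig_a, sig_b)
print("estimated bins:", peak_bin(spec_a), peak_bin(spec_b))
```

A bin index k corresponds to the frequency k * fs / N for a sample rate fs, so the estimator's resolution with a 16-point FFT is fs / 16.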
|
6 |
Developing a decentralized peripheral Profibus core for a Xilinx FPGA
Burger, Roelof Jacobus January 2010
The McTronX research group of the North-West University has over some years established a knowledge base in active magnetic bearing (AMB) systems. In 2009, the research group developed an AMB system that met industrial standards for robustness, reliability and economy. The digital control of the AMB system was implemented using a dedicated single-board computer and communication hardware that interfaces with the motor drive electronics, power amplifiers and sensor drive units of the AMB system. A Xilinx® field programmable gate array (FPGA), connected to the single-board computer, was used to control the AMB system. The AMB system was designed to be used in a helium blower application and to form a basis for AMB and digital control research.
A programmable logic controller (PLC) is connected to the controller to operate the AMB system. To establish communication between the PLC and the FPGA, the Fieldbus standard PROFIBUS DP was chosen as a robust industrial standard communication protocol. To reduce the cost of the entire system, the need arose to implement the PROFIBUS DP protocol on the current FPGA of the system.
This project involves the research, design, implementation, verification and validation of the PROFIBUS DP protocol on a Xilinx® Virtex®-5 FPGA. The PROFIBUS DP standard was researched, analyzed and developed in VHDL for the specific Xilinx® Virtex®-5 FPGA. The implemented protocol is used to establish a standardized PROFIBUS DP network between the PLC and the FPGA controller.
The basic protocol was first tested through simulation and later implemented in the real-time environment. Intensive verification and validation were done to ensure that the developed protocol conforms to the PROFIBUS DP standard and simultaneously meets the requirements and specifications of the AMB control system.
This dissertation documents the entire PROFIBUS implementation process, from standard analysis through to verification and validation of the developed protocol. In conclusion, the developed protocol is compared against a commercial off-the-shelf PROFIBUS PMC module. It was found that the VHDL-based PROFIBUS DP protocol not only competes well with the commercial PROFIBUS device, but also outperforms the device in various aspects. / Thesis (M.Ing. (Computer and Electronical Engineering))--North-West University, Potchefstroom Campus, 2011.
|
8 |
Enabling Gigabit IP for Embedded Systems
Tsakiris, Nicholas, n.tsakiris@internode.on.net January 2009
Any practical chip design effort needs a hardware platform for prototyping and implementing FPGA-based designs, whether they are written in VHDL or Verilog. Communication between the platform and a computer is a useful feature of many hardware solutions, as it allows regular data transmission between the two devices. Furthermore, communicating between the platform and a computer at high speed requires a specially constructed interface, one that the designer can modify as they choose.
There are a number of commercial packages which provide a hardware platform to perform this task; however, many of the available options have drawbacks. Some may require special hardware to connect to a computer using proprietary connectors or boards, which increases the cost and reduces the flexibility of any solution. Other options may have limited access to the internal structure of the interface, limiting the ability of the developer to modify the interface to suit their needs. There may be an extra cost to obtain the code for the interface, separate from the board, which can also tax design budgets.
This dissertation provides a solution in the form of a Gigabit Ethernet connection with a custom IP/network layer written in VHDL to facilitate the connection. With an increasing number of IP-enabled devices available such as IPTV and set top boxes, the ability to link hardware using Ethernet is very useful and so the development of a lean and capable network layer was considered a suitable focus for the project. The overall goal has been to provide an interface which is cheap, open, robust and efficient, retaining the flexibility a developer might require to modify the code to their needs.
After covering some basic background information about the project, the dissertation looks at the requirements of the board and interface, as well as the alternative interface solutions which were considered before deciding on Gigabit Ethernet. The protocols used in Ethernet are then covered, with an explanation of both the structure of each and its relevance to the implementation. The Finite State Machines which control operation of the interface are covered in depth, with an explanation of how they interconnect and how they fit into the data flow between the computer and the board. Error correction and reliability are discussed, as well as any remaining components critical to the operation of the interface.
Pipelining, the method of design which provides the speed required for Gigabit Ethernet, is covered along with the extra speed optimisation techniques used in the design, such as RAM swinging buffers. Testing and synthesis, which ensure the design is as robust as possible both in simulation and in real-world applications, are also covered. The final design was implemented on a Xilinx Spartan 3 FPGA (XC3S5000-5FG900C) and is capable of a maximum clock speed of 128.287 MHz, which is more than enough to satisfy the requirements of Gigabit Ethernet under a variety of network conditions. The interface code occupies 1,166 slices of logic on the FPGA (3% of the total amount of logic available), making it sufficiently compact to run large projects on the same chip. The core was tested on physical hardware and performed correctly at real Gigabit line speeds. Configuration of the computer, along with the method of connecting to the board and transferring data, is described, with an explanation of the code run on the computer to make this possible. Finally, the dissertation provides an example application through the use of JPEG2000 image compression/decompression.
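The claim that 128.287 MHz is sufficient for Gigabit Ethernet can be checked with a short calculation. The sketch below assumes a byte-wide (8-bit) datapath, as used by the standard GMII interface at 125 MHz; the datapath width of the design in the dissertation is not stated in the abstract, so that value is an assumption, and the variable names are illustrative.

```python
# Worked arithmetic, assuming an 8-bit datapath (as in GMII). The 128.287 MHz
# figure is from the abstract; the datapath width is an assumption.

LINE_RATE_BPS = 1_000_000_000   # Gigabit Ethernet line rate
DATAPATH_BITS = 8               # assumed: one byte transferred per clock cycle
F_MAX_HZ = 128.287e6            # maximum clock frequency reported after synthesis

# Clock frequency needed to move one byte per cycle at line rate.
required_clock_hz = LINE_RATE_BPS / DATAPATH_BITS

# Raw throughput the design could sustain if clocked at its maximum frequency.
peak_throughput_bps = F_MAX_HZ * DATAPATH_BITS

# Timing margin of the synthesised design over the required clock.
margin = F_MAX_HZ / required_clock_hz - 1

print(f"required clock:  {required_clock_hz / 1e6:.3f} MHz")
print(f"peak throughput: {peak_throughput_bps / 1e9:.3f} Gbit/s")
print(f"timing margin:   {margin * 100:.1f} %")
```

Under this assumption the design needs a 125 MHz clock, so the reported maximum leaves a margin of roughly 2.6 percent.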
|
9 |
Generátor paketů na platformě FPGA / Packet generator on the FPGA platform
Bari, Lukáš January 2017
The thesis deals with the theory and design of a network traffic generator on the FPGA platform. The VHDL hardware description language is used for the design. The work involves becoming acquainted with the development processes and design tools needed to create the overall project, as well as with the relevant FPGAs, the NetCOPE framework and the COMBO cards. Based on this information, a packet generator project for the Combo-80G card was designed, tested and implemented. The NetCOPE framework was used for the implementation.
|
10 |
Towards Trojan Detection from a Raw Bitstream
Simpson, Corey Ryan 23 March 2022
Many avenues exist for inserting malicious circuitry into an FPGA design, including compromised CAD tools, overwritten bitstream files, and post-deployment attacks. The proprietary nature of Xilinx bitstreams precludes validation of an implemented design. This thesis introduces the BitRec and IPRec projects in an effort to support trojan detection tools. BitRec provides a novel approach to mapping the Xilinx bitstream format to FPGA features in order to recreate the original design's netlist. BitRec supports the 7 Series, UltraScale and UltraScale+ architectures. IPRec then provides a novel approach to recognizing parameterizable IP within a flattened netlist, so that large sections of trusted circuitry do not need to be analyzed by a trojan detection tool.
|