Spelling suggestions: "subject:"coprocessor"" "subject:"koprocessor""
1 |
Acceleration Of Molecular Dynamics Simulation For Tersoff2 Potential Through Reconfigurable HardwareVargun, Bilgin 01 June 2012 (has links) (PDF)
In nanotechnology, Carbon Nanotubes systems are studied with Molecular Dynamics Simulation software programs investigating the properties of molecular structure. Computational loads are very complex in these kinds of software programs. Especially in three body simulations, it takes a couple of weeks for small number of atoms. Researchers use supercomputers to study more complex systems. In recent years, by the development of sophisticated Field Programmable Gate Array (FPGA) Technology, researchers design special purpose co-processor to accelerate their simulations. Ongoing researches show that using application specific digital circuits will have better performance with respect to an ordinary computer.
In this thesis, a new special co-processor, called TERSOFF2, is designed and implemented. Resulting design is a low cost, low power and high performance computing solution. It can solve same computation problem 1000 times faster. Moreover, an optimized digital mathematical elementary functions library is designed and implemented through thesis study. All of the work about digital circuits and architecture of co-processor will be given in the related chapter. Performance achievements will be at the end of thesis.
|
2 |
Acceleration Of Molecular Dynamics Simulation For Tersoff2 Potential Through Reconfigurable HardwareVargun, Bilgin 01 June 2012 (has links) (PDF)
In nanotechnology, Carbon Nanotubes systems are studied with Molecular Dynamics Simulation software programs investigating the properties of molecular structure. Computational loads are very complex in these kinds of software programs. Especially in three body simulations, it takes a couple of weeks for small number of atoms. Researchers use supercomputers to study more complex systems. In recent years, by the development of sophisticated Field Programmable Gate Array (FPGA) Technology, researchers design special purpose co-processor to accelerate their simulations. Ongoing researches show that using application specific digital circuits will have better performance with respect to an ordinary computer.
In this thesis, a new special co-processor, called TERSOFF2, is designed and implemented. Resulting design is a low cost, low power and high performance computing solution. It can solve same computation problem 1000 times faster. Moreover, an optimized digital mathematical elementary functions library is designed and implemented through thesis study. All of the work about digital circuits and architecture of co-processor will be given in the related chapter. Performance achievements will be at the end of thesis.
|
3 |
High Performance Elliptic Curve Cryptographic Co-processorLutz, Jonathan January 2003 (has links)
In FIPS 186-2, NIST recommends several finite fields to be used in the elliptic curve digital signature algorithm (ECDSA). Of the ten recommended finite fields, five are binary extension fields with degrees ranging from 163 to 571. The fundamental building block of the ECDSA, like any ECC based protocol, is elliptic curve scalar multiplication. This operation is also the most computationally intensive. In many situations it may be desirable to accelerate the elliptic curve scalar multiplication with specialized hardware.
In this thesis a high performance elliptic curve processor is developed which is optimized for the NIST binary fields. The architecture is built from the bottom up starting with the field arithmetic units. The architecture uses a field multiplier capable of performing a field multiplication over the extension field with degree 163 in 0. 060 microseconds. Architectures for squaring and inversion are also presented. The co-processor uses Lopez and Dahab's projective coordinate system and is optimized specifically for Koblitz curves. A prototype of the processor has been implemented for the binary extension field with degree 163 on a Xilinx XCV2000E FPGA. The prototype runs at 66 MHz and performs an elliptic curve scalar multiplication in 0. 233 msec on a generic curve and 0. 075 msec on a Koblitz curve.
|
4 |
High Performance Elliptic Curve Cryptographic Co-processorLutz, Jonathan January 2003 (has links)
In FIPS 186-2, NIST recommends several finite fields to be used in the elliptic curve digital signature algorithm (ECDSA). Of the ten recommended finite fields, five are binary extension fields with degrees ranging from 163 to 571. The fundamental building block of the ECDSA, like any ECC based protocol, is elliptic curve scalar multiplication. This operation is also the most computationally intensive. In many situations it may be desirable to accelerate the elliptic curve scalar multiplication with specialized hardware.
In this thesis a high performance elliptic curve processor is developed which is optimized for the NIST binary fields. The architecture is built from the bottom up starting with the field arithmetic units. The architecture uses a field multiplier capable of performing a field multiplication over the extension field with degree 163 in 0. 060 microseconds. Architectures for squaring and inversion are also presented. The co-processor uses Lopez and Dahab's projective coordinate system and is optimized specifically for Koblitz curves. A prototype of the processor has been implemented for the binary extension field with degree 163 on a Xilinx XCV2000E FPGA. The prototype runs at 66 MHz and performs an elliptic curve scalar multiplication in 0. 233 msec on a generic curve and 0. 075 msec on a Koblitz curve.
|
5 |
StreamWorks: An Energy-efficient Embedded Co-processor for Stream ComputingJanuary 2014 (has links)
abstract: Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movements.
This work presents StreamWorks, a multi-core embedded architecture for energy-efficient stream computing. The basic processing element in the StreamWorks architecture is the StreamEngine (SE) which is responsible for iteratively executing a stream kernel. SE introduces an instruction locking mechanism that exploits the iterative nature of the kernels and enables fine-grain instruction reuse. Each instruction in a SE is locked to a Reservation Station (RS) and revitalizes itself after execution; thus never retiring from the RS. The entire kernel is hosted in RS Banks (RSBs) close to functional units for energy-efficient instruction delivery. The dataflow semantics of stream kernels are captured by a context-aware dataflow execution mode that efficiently exploits the Instruction Level Parallelism (ILP) and Data-level parallelism (DLP) within stream kernels.
Multiple SEs are grouped together to form a StreamCluster (SC) that communicate via a local interconnect. A novel software FIFO virtualization technique with split-join functionality is proposed for efficient and scalable stream communication across SEs. The proposed communication mechanism exploits the Task-level parallelism (TLP) of the stream application. The performance and scalability of the communication mechanism is evaluated against the existing data movement schemes for scratchpad based multi-core architectures. Further, overlay schemes and architectural support are proposed that allow hosting any number of kernels on the StreamWorks architecture. The proposed oevrlay schemes for code management supports kernel(context) switching for the most common use cases and can be adapted for any multi-core architecture that use software managed local memories.
The performance and energy-efficiency of the StreamWorks architecture is evaluated for stream kernel and application benchmarks by implementing the architecture in 45nm TSMC and comparison with a low power RISC core and a contemporary accelerator. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2014
|
6 |
Development of an integrated co-processor based power electronic drive / by Robert D. HudsonHudson, Robert Dearn January 2008 (has links)
The McTronX research group at the North-West University is currently researching self-sensing techniques for Active Magnetic Bearings (AMB). The research is part of an ongoing effort to expand the knowledge base on AMBs in the School of Electrical, Electronic and Computer Engineering to support industries that make use of the technology. The aim of this project is to develop an integrated co-processor based power electronic drive with the emphasis placed on the ability of the co-processor to execute AMB self-sensing algorithms.
The two primary techniques for implementing self-sensing in AMBs are state estimation and modulation. This research focuses on hardware development to facilitate the implementation of the modulation method. Self-sensing algorithms require concurrent processing power and speed that are well suited to an architecture that combines a digital signal processor (DSP) and a field programmable gate array (FPGA). A comprehensive review of various power amplifier topologies shows that the pulse width modulation (PWM) switching amplifier is best suited for controlling the voltage and current required to drive the AMB coils. Combining DSPs and power electronics to form an integrated co-processor based power electronic drive requires detail attention to aspects of PCB design, including signal integrity and grounding.
A conceptual design is conducted and forms part of the process of compiling a subsystem development specification for the integrated drive, in conjunction with the McTronX Research Group. Component selection criteria, trade-off studies and various circuit simulations serve as the basis for this essential phase of the project. The conceptual design and development specification determines the architecture, functionality and interfaces of the integrated drive. Conceptual designs for the power amplifier, digital controller, electronic supply and mechanical layout of the integrated drive is provided.
A detail design is performed for the power amplifier, digital controller and electronic supply. Issues such as component selection, power supply requirements, thermal design, interfacing of the various circuit elements and PCB design are covered in detail. The output of the detail design is a complete set of circuit diagrams for the integrated controller.
The integrated drive is interfaced with existing AMB hardware and facilitates the successful implementation of two self-sensing techniques. The hardware performance of the integrated coprocessor based power electronic drive is evaluated by means of measurements taken from this experimental self-sensing setup. The co-processor performance is evaluated in terms of resource usage and execution time and performs satisfactorily in this regard.
The integrated co-processor based power electronic drive provided sufficient resources, processing speed and flexibility to accommodate a variety of self-sensing algorithms thus contributing to the research currently underway in the field of AMBs by the McTronX research group at the North-West University. / Thesis (M.Ing. (Electrical Engineering))--North-West University, Potchefstroom Campus, 2009.
|
7 |
Development of an integrated co-processor based power electronic drive / by Robert D. HudsonHudson, Robert Dearn January 2008 (has links)
The McTronX research group at the North-West University is currently researching self-sensing techniques for Active Magnetic Bearings (AMB). The research is part of an ongoing effort to expand the knowledge base on AMBs in the School of Electrical, Electronic and Computer Engineering to support industries that make use of the technology. The aim of this project is to develop an integrated co-processor based power electronic drive with the emphasis placed on the ability of the co-processor to execute AMB self-sensing algorithms.
The two primary techniques for implementing self-sensing in AMBs are state estimation and modulation. This research focuses on hardware development to facilitate the implementation of the modulation method. Self-sensing algorithms require concurrent processing power and speed that are well suited to an architecture that combines a digital signal processor (DSP) and a field programmable gate array (FPGA). A comprehensive review of various power amplifier topologies shows that the pulse width modulation (PWM) switching amplifier is best suited for controlling the voltage and current required to drive the AMB coils. Combining DSPs and power electronics to form an integrated co-processor based power electronic drive requires detail attention to aspects of PCB design, including signal integrity and grounding.
A conceptual design is conducted and forms part of the process of compiling a subsystem development specification for the integrated drive, in conjunction with the McTronX Research Group. Component selection criteria, trade-off studies and various circuit simulations serve as the basis for this essential phase of the project. The conceptual design and development specification determines the architecture, functionality and interfaces of the integrated drive. Conceptual designs for the power amplifier, digital controller, electronic supply and mechanical layout of the integrated drive is provided.
A detail design is performed for the power amplifier, digital controller and electronic supply. Issues such as component selection, power supply requirements, thermal design, interfacing of the various circuit elements and PCB design are covered in detail. The output of the detail design is a complete set of circuit diagrams for the integrated controller.
The integrated drive is interfaced with existing AMB hardware and facilitates the successful implementation of two self-sensing techniques. The hardware performance of the integrated coprocessor based power electronic drive is evaluated by means of measurements taken from this experimental self-sensing setup. The co-processor performance is evaluated in terms of resource usage and execution time and performs satisfactorily in this regard.
The integrated co-processor based power electronic drive provided sufficient resources, processing speed and flexibility to accommodate a variety of self-sensing algorithms thus contributing to the research currently underway in the field of AMBs by the McTronX research group at the North-West University. / Thesis (M.Ing. (Electrical Engineering))--North-West University, Potchefstroom Campus, 2009.
|
8 |
A Unifying Interface Abstraction for Accelerated Computing in Sensor NodesIyer, Srikrishna 31 August 2011 (has links)
Hardware-software co-design techniques are very suitable to develop the next generation of sensornet applications, which have high computational demands. By making use of a low power FPGA, the peak computational performance of a sensor node can be improved without significant degradation of the standby power dissipation. In this contribution, we present a methodology and tool to enable hardware/software co-design for sensor node application development. We present the integration of nesC, a sensornet programming language, with GEZEL, an easy-to-use hardware description language. We describe the hardware/software interface at different levels of abstraction: at the level of the design language, at the level of the co-simulator, and in the hardware implementation. We use a layered, uniform approach that is particularly suited to deal with the heterogeneous interfaces typically found on small embedded processors. We illustrate the strengths of our approach by means of a prototype application: the integration of a hardware-accelerated crypto-application in a nesC application. / Master of Science
|
9 |
CDMA Base Station Receive Co-Processor ArchitectureSanthosam, Charles L 02 1900 (has links)
Third generation mobile communication systems promise a greater data rate and new services to the mobile subscribers. 3G systems support up to 2 Mbps of data rate to a fixed subscriber and 144 Kbps of data rate to a fully mobile subscriber. Code Division Multiple Access (CDMA) is the air interface access scheme widely used in all the 3G communication systems. This access scheme has many inherent advantages m terms of noise immunity, security, coherent combining of multi path signals etc. But all these advantages come at the expense of higher complexity of the receivers. The receivers form the major portion of the processing involved in a base station. The heart of any CDMA receiver is the RAKE. The RAKE receiver separates the different multi-paths received by the antenna by using the properties of the Pseudo Random sequences. The phase and strength of each of these path signals is measured and are used by the coherent combiner, which de-rotates all the signals to a single reference and coherently combines them In general the Base station receivers make use of the top three multi-path signals ranked in terms of their signal energy Hence four RAKE fingers, each catering to single multi-path are needed for receiving a single code channel (3 for coherent combining and one for scanning). One such channel receiver requires a processing power of 860 MIPS (Mega Instructions Per Second). Some of the CDMA standards support up to 90 code channels at the same time. This means that the total processing power required at the base station is about 80 GIPS. This much of processing power will require large number of high end DSPs, which will be a very costly solution. In the current base station architectures these blocks are implemented using ASICs, which are specific to a particular standard and also the algorithms used for the different operations are fixed at the design time itself. This solution is not flexible and is not amenable for SDR (Software defined Radio) architectures for the Base stations.
This thesis proposes a Co-Processor solution, which can be attached to a generic DSP or any other processor. The processor can control the Co-Processor by programming its parameter registers using memory mapped register accesses. This co-processor implements only those blocks, which are compute intensive. This co-processor performs all chip-rate processing functions involved m a RAKE receiver. All the symbol-rate functions are implemented through software in the processor. This provides more choices m selecting the algorithms for timing recovery and scanning. The algorithms can be changed through software even after the base station is installed in the field.
All the inputs and outputs of the Co-Processor are passed through dual port RAMs with independent read and write clocks. This allows the Co-Processor and the processor to be running on two independent clocks. This memory scheme also increases the throughput as the reads and writes to these memories can happen simultaneously. This thesis introduces a concept of incorporating programmable PN/Gold code generators as part of the Co-Processor, which significantly reduces the amount of memory required to store the Scrambling and Spreading codes. The polynomial lengths as well as the polynomials of the code generator are programmable.
The input signal memory has a bus width equal to 4 times the bus width of the IQ signal bus width (4 * 24 = 96 bits) towards the Co-Processor to meet the huge data bandwidth requirement. This memory is arranged as word interleaved memory banks. This can supply one word per memory bank on each clock cycle as long as the accessed words fall in different memory banks. The number of banks is chosen as more than twice that of the number of Correlators/ Rake fingers. This gives more flexibility in choosing the address offsets to different Correlator inputs. This flexibility allows one to use different timing recovery schemes since the number of allowable address offsets for different Correlators is more.
The overall complexity of the solution is comparatively less with respect to the generic DSP based solution and much easier to modify for a different standard, when compared to the rigid ASIC based solution. The proposed solution is significantly different from the conventional way of designing the Base station with fixed ASICs and it clearly outweighs the solutions based on conventional approach in terms of flexibility, design complexity, design time and cost.
|
Page generated in 0.066 seconds