Global ETD Search

201	Performance and Energy Efficient Building Blocks for Network-on-Chip Architectures Vangal, Sriram R. January 2006 (has links) <p>The ever shrinking size of the MOS transistors brings the promise of scalable Network-on-Chip (NoC) architectures containing hundreds of processing elements with on-chip communication, all integrated into a single die. Such a computational fabric will provide high levels of performance in an energy efficient manner. To mitigate emerging wire-delay problem and to address the need for substantial interconnect bandwidth, packet switched routers are fast replacing shared buses and dedicated wires as the interconnect fabric of choice. With on-chip communication consuming a significant portion of the chip power and area budgets, there is a compelling need for compact, low power routers. While applications dictate the choice of the compute core, the advent of multimedia applications, such as 3D graphics and signal processing, places stronger demands for self-contained, low-latency floating-point processors with increased throughput. Therefore, this work focuses on two key building blocks critical to the success of NoC design: high performance, area and energy efficient router and floating-point processor architectures.</p><p>This thesis first presents a six-port four-lane 57 GB/s non-blocking router core based on wormhole switching. The router features double-pumped crossbar channels and destinationaware channel drivers that dynamically configure based on the current packet destination. This enables 45% reduction in crossbar channel area, 23% overall router area, up to 3.8X reduction in peak channel power, and 7.2% improvement in average channel power, with no performance penalty over a published design. In a 150nm six-metal CMOS process, the 12.2mm2 router contains 1.9 million transistors and operates at 1GHz at 1.2V. We next present a new pipelined single-precision floating-point multiply accumulator core (FPMAC) featuring a single-cycle accumulate loop using base 32 and internal carry-save arithmetic, with delayed addition techniques. Combined algorithmic, logic and circuit techniques enable multiply-accumulates at speeds exceeding 3GHz, with single-cycle throughput. Unlike existing FPMAC architectures, the design eliminates scheduling restrictions between consecutive FPMAC instructions. The optimizations allow removal of the costly normalization step from the critical accumulate loop and conditionally powered down using dynamic sleep transistors on long accumulate operations, saving active and leakage power. In addition, an improved leading zero anticipator (LZA) and overflow detection logic applicable to carry-save format is presented. In a 90nm seven-metal dual-VT CMOS process, the 2mm2 custom design contains 230K transistors. The fully functional first silicon achieves 6.2 GFLOPS of performance while dissipating 1.2W at 3.1GHz, 1.3V supply.</p><p>It is clear that realization of successful NoC designs require well balanced decisions at all levels: architecture, logic, circuit and physical design. Our results from key building blocks demonstrate the feasibility of pushing the performance limits of compute cores and communication routers, while keeping active and leakage power, and area under control.</p> / Report code: LiU-TEK-LIC-2006:36. Network-on-Chip Architectures floating-point units tiled-architectures crossbar routers multi-processor interconnection Computer engineering Datorteknik
202	Adaptation of OSE<sub>ck</sub> for an FPGA-Based Soft Processor Platform Staf, Daniel January 2007 (has links) <p>Integrated systems become larger and more complicated every day while time to market is shortened. Due to this, there is a need for flexible hardware platforms that use programmable logic not only for custom hardware but also for realizing embedded processors.</p><p>This thesis aims to select a suitable, FPGA targeted, soft processor core and adapt the real-time operating system OSE<sub>ck</sub> to run on the selected target. A study of possibilities to integrate setup and configuration of OSE<sub>ck</sub> into the processor’s IDE is also performed.</p><p>Studies of OSE<sub>ck</sub> and the two processor candidates MicroBlaze and Nios II have been performed. The processor study showed that MicroBlaze and Nios II have a very similar architecture and both are suitable to host OSE<sub>ck</sub>. MicroBlaze was chosen as target processor mainly because of more available documentation regarding operating system integration.</p><p>Performance and footprint was measured with OSE<sub>ck</sub> on MicroBlaze. The performance figures indicate that MicroBlaze can not be expected to have the same processing power as hard processors but works well as a control processor. To achieve high application performance, custom hardware accelerators can be connected. Integration investigations and tests have been performed with the goal of making an interface that conforms to the normal MicroBlaze design flow.</p><p>OSE<sub>ck</sub> has been successfully adapted to run on MicroBlaze and integration in the development environment is possible although some steps have to be done manually. Alternative integration options are discussed.</p> FPGA OSEck RTOS Soft Processor Nios II MicroBlaze Computer engineering Datorteknik
203	Transportsfordon för rullstolsburna Fredrik, Flink, Kristofer, Andersson, Torbjörn, Oxelborn January 2010 (has links) <p>A vehicle that carries a person sitting in a wheelchair has been rebuilt. The original vehicle was provided with a petrol engine, hydrostatic drive and a mechanical steering rod, making it noisy and hard to maneuver. The purpose with this project was to convert the vehicle to make it more environmental and easier to drive.A new design has been proposed where an electrical drive has replaced the noisy combustion engine and an electrical motor with joystick control has replaced the manual steering.</p><p>The project was divided into two parts. One mechanical part where the construction of the drive and steering were to be decided. The other part involved programming the microprocessor and implementation of all the necessary electrical for the control system of the vehicle. The rebuild of the machine to a complete electrical machine was successful and the performance is similar to the original construction but easier to drive, less noisy and more environmental.</p> rullstol handikappshjälpmedel mekatronik teknik eldrift terrängfordon processor inbyggda system motor hjälpmedel
204	WASP : Lightweight Programmable Ephemeral State on Routers to Support End-to-End Applications Martin, Sylvain 07 November 2007 (has links) We present WASP (World-friendly Active packets for ephemeral State Processing), a novel active networks architecture that enables ephemeral storage of information on routers in order to ease distributed application synchronisation and co-operation. We aimed at a design compatible with modern routers hardware and with network operators' goals. Our solution has to scale with the number of interfaces of the device and to support throughput of several Gbps. Throughout this thesis we searched for the best trade-off between features (platform exibility) and guarantees (platform safety), with as little performance sacri ce as possible. We picked the Ephemeral State Processing (ESP) router, developed by K. Calvert's team at University of Kentucky, as a starting point and extended it with our own virtual processor (VPU) to offer higher exibility to the network programmer. The VPU is a minimalist bytecode interpreter that manipulates the content of the "ephemeral state store" of the router according to a microprogram present in packets. It ultimately allows the microprogram to drop or forward the packet on any router, acting as remotely programmable filters around unmodified IP routing cores. We developed two implementations of WASP: a "reference" module for the Linux kernel, and, based on that prototype experience, a WASP filter application for the IXP2400 network processor that proves feasibility of our platform at higher speed. We extensively tested those two implementations against their ESP counterpart in order to estimate the overhead of our approach. High speed tests on the IXP were also performed to ensure WASP's robustness, and were actually rich in lessons for future development on programmable network devices. The nature of WASP makes it a platform of choice to detect properties of the network along a given path. Thanks to per-flow variables (even if ephemeral) and its ability to sustain custom processing at wire-speed, we can for instance implement lightweight measurement of QoS parameters or enforce application-specific congestion control. We have however opted -- in the context of this thesis -- for a focus on another use of the platform: using the ephemeral state to advertise and detect members of distributed applications (e.g. grid computing or peer-to-peer systems) in a purely decentralised way. To evaluate the benefits of this approach, we propose a model of a peer-to-peer community where peers try and join former neighbours, and we show through simulations how efficiency and quality of user experience evolve with the presence of more WASP routers in the network. Ephemeral State Network Processor Internet Service Discovery Programmable Networks Peer-to-Peer
205	Acceleration Of Molecular Dynamics Simulation For Tersoff2 Potential Through Reconfigurable Hardware Vargun, Bilgin 01 June 2012 (has links) (PDF) In nanotechnology, Carbon Nanotubes systems are studied with Molecular Dynamics Simulation software programs investigating the properties of molecular structure. Computational loads are very complex in these kinds of software programs. Especially in three body simulations, it takes a couple of weeks for small number of atoms. Researchers use supercomputers to study more complex systems. In recent years, by the development of sophisticated Field Programmable Gate Array (FPGA) Technology, researchers design special purpose co-processor to accelerate their simulations. Ongoing researches show that using application specific digital circuits will have better performance with respect to an ordinary computer. In this thesis, a new special co-processor, called TERSOFF2, is designed and implemented. Resulting design is a low cost, low power and high performance computing solution. It can solve same computation problem 1000 times faster. Moreover, an optimized digital mathematical elementary functions library is designed and implemented through thesis study. All of the work about digital circuits and architecture of co-processor will be given in the related chapter. Performance achievements will be at the end of thesis. TK Electronics 7800-8360
206	Acceleration Of Molecular Dynamics Simulation For Tersoff2 Potential Through Reconfigurable Hardware Vargun, Bilgin 01 June 2012 (has links) (PDF) In nanotechnology, Carbon Nanotubes systems are studied with Molecular Dynamics Simulation software programs investigating the properties of molecular structure. Computational loads are very complex in these kinds of software programs. Especially in three body simulations, it takes a couple of weeks for small number of atoms. Researchers use supercomputers to study more complex systems. In recent years, by the development of sophisticated Field Programmable Gate Array (FPGA) Technology, researchers design special purpose co-processor to accelerate their simulations. Ongoing researches show that using application specific digital circuits will have better performance with respect to an ordinary computer. In this thesis, a new special co-processor, called TERSOFF2, is designed and implemented. Resulting design is a low cost, low power and high performance computing solution. It can solve same computation problem 1000 times faster. Moreover, an optimized digital mathematical elementary functions library is designed and implemented through thesis study. All of the work about digital circuits and architecture of co-processor will be given in the related chapter. Performance achievements will be at the end of thesis.
207	NoGAP: Novel Generator of Accelerators and Processors Karlström, Per Axel January 2010 (has links) ASIPs are needed to handle the future demand of flexible yet highperformance embedded computing. The flexibility of ASIPs makes them preferable over fixed function ASICs. Also, a well designed ASIP, has a power consumption comparable to ASICs. However the cost associated with ASIP design is a limiting factor for a more wide spread adoption. A number of different tools have been proposed, promising to ease this design process. However all of the current state of the art tools limits the designer due to a template based design process. It blocks design freedoms and limits the I/O bandwidth of the template. We have therefore proposed the Novel Generator of Accelerator and Processors (NoGAP). NoGAP is a design automation tool for ASIP andaccelerator design that puts very few limits on what can be designed, yet NoGAP gives support by automating much of the tedious anderror prone tasks associated with ASIP design. This thesis will present NoGAP and much of its key concepts. Such as; the NoGAP-CL) which is a language used to implement processors in NoGAP, This thesis exposes NoGAP's key technologies, which include automatic bus and wire sizing, instruction decoder and pipeline management, how PC-FSMs can be generated, how an assembler can be generated, and how cycle accurate simulators can be generated. We have so far proven NoGAP's strengths in three extensive case studies, in one a floating point pipelined data path was designed, in another a simple RISC processor was designed, and finally one advanced RISC style DSP was designed using NoGAP. All these case studies points to the same conclusion, that NoGAP speeds up development time, clarify complex pipeline architectures, retains design flexibility, and most importantly does not incur much performance penalty, compared to hand optimized RTL code. We belive that the work presented in this thesis shows that NoGAP, using our proposed novel approach to micro architecture design, can have a significant impact on both academic and industrial hardware design. To our best knowledge NoGAP is the first system that has demonstrated that a template free processor construction framework can be developed and generate high performance hardware solutions. / NoGAP ADL ESL Processor Accelerator Compiler Electrical engineering Elektroteknik Computer engineering Datorteknik
208	Efficient Pairings on Various Platforms Grewal, Gurleen 30 April 2012 (has links) Pairings have found a range of applications in many areas of cryptography. As such, to utilize the enormous potential of pairing-based protocols one needs to efficiently compute pairings across various computing platforms. In this thesis, we give an introduction to pairing-based cryptography and describe the Tate pairing and its variants. We then describe some recent work to realize efficient computation of pairings. We further extend these optimizations and implement the O-Ate pairing on BN-curves on ARM and x86-64 platforms. Specifically, we extend the idea of lazy reduction to field inversion, optimize curve arithmetic, and construct efficient tower extensions to optimize field arithmetic. We also analyze the use of affine coordinates for pairing computation leading us to the conclusion that they are a competitive choice for fast pairing computation on ARM processors, especially at high security level. Our resulting implementation is more than three times faster than any previously reported implementation on ARM processors. ARM processor cryptography pairing-based cryptography pairings pairing computation O-Ate pairing Combinatorics and Optimization
209	Overlay Architectures for FPGA-Based Software Packet Processing Martin, Labrecque 16 June 2011 (has links) Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function custom-made ASIC chips. As communication protocols evolve rapidly, there is increasing interest in adapting features of the processing over time and, since software is the preferred way of expressing complex computation, we are interested in finding a platform to execute packet processing software with the best possible throughput. Because FPGAs are widely used in network equipment and they can implement processors, we are motivated to investigate executing software directly on the FPGAs. Off-the-shelf soft processors on FPGA fabric are currently geared towards performing embedded sequential tasks and, in contrast, network processing is most often inherently parallel between packet flows, if not between each individual packet. Our goal is to allow multiple threads of execution in an FPGA to reach a higher aggregate throughput than commercially available shared-memory soft multi-processors via improvements to the underlying soft processor architecture. We study a number of processor pipeline organizations to identify which ones can scale to a larger number of execution threads and find that tuning multithreaded pipelines can provide compact cores with high throughput. We then perform a design space exploration of multicore soft systems, compare single-threaded and multithreaded designs to identify scalability limits and develop processor architectures allowing threads to execute with as little architectural stalls as possible: in particular with instruction replay and static hazard detection mechanisms. To further reduce the wait times, we allow threads to speculatively execute by leveraging transactional memory. Our multithreaded multiprocessor along with our compilation and simulation framework makes the FPGA easy to use for an average programmer who can write an application as a single thread of computation with coarse-grained synchronization around shared data structures. Comparing with multithreaded processors using lock-based synchronization, we measure up to 57\% additional throughput with the use of transactional-memory-based synchronization. Given our applications, gigabit interfaces and 125 MHz system clock rate, our results suggest that soft processors can process packets in software at high throughput and low latency, while capitalizing on the FPGAs already available in network equipment. computer architecture soft processors FPGA packet processing network processor multithreaded transactional memory 0544
210	Overlay Architectures for FPGA-Based Software Packet Processing Martin, Labrecque 16 June 2011 (has links) Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function custom-made ASIC chips. As communication protocols evolve rapidly, there is increasing interest in adapting features of the processing over time and, since software is the preferred way of expressing complex computation, we are interested in finding a platform to execute packet processing software with the best possible throughput. Because FPGAs are widely used in network equipment and they can implement processors, we are motivated to investigate executing software directly on the FPGAs. Off-the-shelf soft processors on FPGA fabric are currently geared towards performing embedded sequential tasks and, in contrast, network processing is most often inherently parallel between packet flows, if not between each individual packet. Our goal is to allow multiple threads of execution in an FPGA to reach a higher aggregate throughput than commercially available shared-memory soft multi-processors via improvements to the underlying soft processor architecture. We study a number of processor pipeline organizations to identify which ones can scale to a larger number of execution threads and find that tuning multithreaded pipelines can provide compact cores with high throughput. We then perform a design space exploration of multicore soft systems, compare single-threaded and multithreaded designs to identify scalability limits and develop processor architectures allowing threads to execute with as little architectural stalls as possible: in particular with instruction replay and static hazard detection mechanisms. To further reduce the wait times, we allow threads to speculatively execute by leveraging transactional memory. Our multithreaded multiprocessor along with our compilation and simulation framework makes the FPGA easy to use for an average programmer who can write an application as a single thread of computation with coarse-grained synchronization around shared data structures. Comparing with multithreaded processors using lock-based synchronization, we measure up to 57\% additional throughput with the use of transactional-memory-based synchronization. Given our applications, gigabit interfaces and 125 MHz system clock rate, our results suggest that soft processors can process packets in software at high throughput and low latency, while capitalizing on the FPGAs already available in network equipment. computer architecture soft processors FPGA packet processing network processor multithreaded transactional memory 0544

Search results