Spelling suggestions: "subject:"borets og systemdesign""
1 |
Power optimized multipliersMathiassen, Stian January 2010 (has links)
<p>Power consumption becomes more important as more devices becomes embedded or battery dependant. Multipliers are generally complex circuits, consuming a lot of energy. This thesis uses Sand's multiplier generator, made for his master thesis, as a basis. It uses tree structures to perform the multiplication, but does not take power consumption into account when generating a multiplier. By adding power optimization to the generator, multipliers with low energy consumption could be made automatically. This thesis adds different reduction tree algorithms (Wallace, Dadda and Reduced Area) to the program, and an optimal algorithm might be found. After the multiplier tree generation, an optimization step is performed, trying to exploit the delay and activity characteristics of the generated multiplier. A simplified version of Oskuii's algorithm is used. To be able to compare the different algorithms with each other, a pre-layout power estimation routine was implemented. The estimator is also used by the post-generation optimization. Since accuracy is important in an estimator, the delay through a multiplier was also investigated. Taking the previous mentioned steps into account, we are able to get a 10% decrease in overall power reduction in a 0,18/0,15um CMOS technology, reported by "IC Compiler". Delay characteristics of a multiplier is also supplied, and can be used by other power estimators. This thesis shows how to achieve less power consumption in multipliers. It also shows that the delay model is important for estimation purposes, and how an estimator is used to optimize a multiplier. The findings in this thesis can be used as is, or be used as a basis for further study.</p>
|
2 |
Hardware-software intercommunication in reconfigurable systemsEndresen, Vegard Haugen January 2010 (has links)
<p>In this thesis hardware-software intercommunication in a reconfigurable system has been investigated based on a framework for run time reconfiguration. The goal has been to develop a fast and flexible link between applications running on an embedded processor and reconfigurable accelerator hardware in form of a Xilinx Virtex device. As a start the link was broken down into hardware and software components based on constraints from earlier work and a general literature search. A register architecture for reconfigurable modules, a reconfigurable interface and a backend bridge linking reconfigurable hardware with the system bus were identified as the main hardware components whereas device drivers and a hardware operating system were identified as software components. These components were developed in a bottom-up approach, then deployed, tested and evaluated. Synthesis and simulation results from this thesis suggest that a hybrid register architecture, a mix of shift based and addressable register architecture might be a good solution for a reconfigurable module. Such an architecture enables a reconfigurable interface with full duplex capability with an initially small area overhead compared to a full scale RAM implementation. Although the hybrid architecture might not be very suitable for all types of reconfigurable modules it can be a nice compromise when attempting to achieve a uniform reconfigurable interface. Backend bridge solutions were developed assuming the above hybrid reconfigurable interface. Three main types were researched: a software register backend, a data cache backend and an instruction and data cache backend. Performance evaluation shows that the instruction and data cache outperforms the other two with an average acceleration ratio of roughly 5-10. Surprisingly the data cache backend performs worst of all due to latency ratios and design choices. Aside from the BRAM component required for the cache backends, resource consumption was shown to be only marginally larger than a traditional software register solution. Caching using a controller in the backend-bridge can thus provide good speedup for little cost as far as BRAM resources are not scarce. A software-to-hardware interface has been created has been created through Linux character device driver and a hardware operating system daemon. While the device drivers provide a middleware layer for hardware access the HWOS separates applications from system management through a message queue interface. Performance testing shows a large increase in delay when involving the Linux device drivers and the HWOS as compared to calls directly from the kernel. Although this is natural, the software components are very important when providing a high performance platform. As additional work specialized cell handling for reconfigurable modules has been addressed in the context of a MPEG-4 decoder. Some light has also been shed on design of reconfigurable modules in Xilinx ISE which can radically improve development time and decrease complexity compared to a Xilinx Platform Studio flow. In the process of demonstrating run time reconfigurations it was discovered that a clock signal will resist being piped through bus macros. Also broken functionality has been shown when applying run time reconfiguration to synchronous designs using the framework for self reconfiguration.</p>
|
3 |
Construction of digital integer arithmetic : FPGA implementation of high throughput pipelined division circuitØvergaard, Johan Arthur January 2009 (has links)
<p>This assignment has been given by Defence Communication (DC) which is a division of Kongsberg Defence and Aerospace(KDA). KDA develops amongst other things military radio equipment for communication and data transfer. In this equipment there is use of digital logic that performes amongst other things integer and fixed point division. Current systems developed at KDA uses both application specific integrated circuit (ASIC) and field programmable gate arrays (FPGA) to implement the digital logic. In both these technologies it is implemented circuit to performed integer and fixed point division. These are designed for low latency implementations. For future applications it is desire to investigate the possibility of implementing a high throughput pipelined division circuit for both 16 and 64 bit operands. In this project several commonly implemented division methods and algorithms has been studied, amongst others digit recurrence and multiplicative algorithms. Of the studied methods, multiplicative methods early stood out as the best implementation. These methods include the Goldschmidt and Newton-Raphson method. Both these methods require and initial approximation towards the correct answer. Based on this, several methods for finding an initial approximation were investigated, amongst others bipartite and multipartite lookup tables. Of the two multiplicative methods, Newton-Raphsons method proved to give the best implementation. This is due to the fact that it is possible with Newton-Raphsons method to implement each stage with the same bit widths as the precision out of that stage. This means that each stage is only halve the size of the succeeding stage. Also since the first stages were found to be small compared to the last stage, it was found that it is best to use a rough approximation towards the correct value and then use more stages to achieve the target precision. To evaluate how different design choices will affect the speed, size and throughput of an implementation, several configurations were implemented in VHDL and synthesized to FPGAs. These implementations were optimized for high speed whit high pipeline depth and size, and low speed with low pipeline depth and size. This was done for both 16 and 64 bits implementations. The synthesizes showed that there is possible to achieve great speed at the cost of increased size, or a small circuit while still achieving an acceptable speed. In addition it was found that it is optimal in a high throughput pipelined division circuit to use a less precise initial approximation and instead use more iterations stages.</p>
|
4 |
Comparator-Based Switched-Capacitor Integrator for use in Delta-Sigma ModulatorTorgersen, Svend Bjarne January 2009 (has links)
<p>A comparator-based switched capacitor integrator for use in a Delta Sigma ADC has been designed. Basic theory about comparator-based circuits has been presented and design equations have been developed. The integrator had a targeted performance of a bandwidth of 1.5MHz with a SNR of 80dB. Due to the lack of a complete modulator feedback system, the integrator was simulated in open-loop. For the integrator not to saturate in open-loop, an overshoot calibration circuit was enabled during the simulation. This resulted in a severe deterioration of the integrated signal. The results are therefore significantly lower than expected, with a SNR of about 39dB but can be expected to be better in a closed-loop simulation. The power consumption of the implemented modules is 0.43mW. However, this is without several modules which were implemented as ideal.</p>
|
5 |
Low-power microcontroller coreEriksen, Stein Ove January 2009 (has links)
<p>Energy efficiency in embedded processors is of major importance in order to achieve longer operating time for battery operated devices. In this thesis the energy efficiency of a microcontroller based on the open source ZPU microprocessor is evaluated and improved. The ZPU microprocessor is a zero-operand stack machine originally designed for small size FPGA implementation, but in this thesis the core is synthesized for implementation with a 180nm technology library. Power estimation of the design is done both before and after synthesis in the design flow, and it is shown that power estimates based on RTL simulations (before synthesis) are 35x faster to obtain than power estimates based on gate-level simulations (after synthesis). The RTL estimates deviate from the gate-level estimates by only 15% and can provide faster design cycle iterations without sacrificing too much accuracy. The energy consumption of the ZPU microcontroller is reduced by implementing clock gating in the ZPU core and also implementing a tiny stack cache to reduce stack activity energy consumption. The result of these improvements show a 46% reduction in average power consumption. The ZPU architecture is also compared to the more common MIPS architecture, and the Plasma CPU of MIPS architecture is synthesized and simulated to serve as comparison to the ZPU microcontroller. The results of the comparison with the MIPS architecture shows that the ZPU needs on average 15x as many cycles and 3x as many memory accesses to complete the benchmark programs as the MIPS does.</p>
|
6 |
Multicell Battery monitoring and balancing with AVRBorgersen, Ole Johnny January 2009 (has links)
<p>Today Lithium Ion batteries are extensively used in all kinds of electronic equipment due to its superior properties. However, Lithium Ion batteries need to have all the individual cells monitored to ensure the safety and long life time. This master thesis' objective is to design a managing system for a ten cell Lithium Ion battery with an Atmel AVR microcontroller. The main challenge was to scale down the high voltage level a 10 cell battery has and still maintain accuracy when reading this voltage with the AVR. This was solved by using current sense monitors which can handle large common mode voltages. Hardware was made to show proof of concept. It was found that the scaling circuitry had an accuracy of 46mV. In competition with other single chip devices, some other methods have to be found. The design in this thesis is physically too large and too expensive to be of any commercial use. However some other methods worth looking into have been proposed in the last chapter.</p>
|
7 |
ULTRA LOW POWER APPLICATION SPECIFIC INSTRUCTION-SET PROCESSOR DESIGN : for a cardiac beat detector algorithmYassin, Yahya H. January 2009 (has links)
<p>High efficiency and low power consumption are among the main topics in embedded systems today. For complex applications, off-the-shelf processor cores might not provide the desired goals in terms of power consumption. By optimizing the processor for the application, or a set of applications, one could improve the computing power by introducing special purpose hardware units. The execution cycle count of the application would in this case be reduced significantly, and the resulting processor would consume less power. In this thesis, some research is done in how to optimize a software and hardware development for ultra low power consumption. A cardiac beat detector algorithm is implemented in ANSI C, and optimized for low power consumption, by using several software power optimization techniques. The resulting application is mapped on a basic processor architecture provided by Target Compiler Technologies. This processor is optimized further for ultra low power consumption by applying application specific hardware, and by using several hardware power optimization techniques. A general processor and the optimized processor has been mapped on a chip, using a 90 nm low power TSMC process. Information about power dissipation is extracted through netlist simulation, and the results of both processors have been compared. The optimized processor consume 55% less average power, and the duty cycle of the processor, i.e., the time in which the processor executes its task with respect to the time budget available, has been reduced from 14% to 2.8%. The reduction in the total execution cycle count is 81%. The possibilities of applying power gating, or voltage and frequency scaling are discussed, and it is concluded that further reduction in power consumption is possible by applying these power optimization techniques. For a given case, the average leakage power dissipation is estimated to be reduced by 97.2%.</p>
|
8 |
Construction of digital integer arithmetic : FPGA implementation of high throughput pipelined division circuitØvergaard, Johan Arthur January 2009 (has links)
This assignment has been given by Defence Communication (DC) which is a division of Kongsberg Defence and Aerospace(KDA). KDA develops amongst other things military radio equipment for communication and data transfer. In this equipment there is use of digital logic that performes amongst other things integer and fixed point division. Current systems developed at KDA uses both application specific integrated circuit (ASIC) and field programmable gate arrays (FPGA) to implement the digital logic. In both these technologies it is implemented circuit to performed integer and fixed point division. These are designed for low latency implementations. For future applications it is desire to investigate the possibility of implementing a high throughput pipelined division circuit for both 16 and 64 bit operands. In this project several commonly implemented division methods and algorithms has been studied, amongst others digit recurrence and multiplicative algorithms. Of the studied methods, multiplicative methods early stood out as the best implementation. These methods include the Goldschmidt and Newton-Raphson method. Both these methods require and initial approximation towards the correct answer. Based on this, several methods for finding an initial approximation were investigated, amongst others bipartite and multipartite lookup tables. Of the two multiplicative methods, Newton-Raphsons method proved to give the best implementation. This is due to the fact that it is possible with Newton-Raphsons method to implement each stage with the same bit widths as the precision out of that stage. This means that each stage is only halve the size of the succeeding stage. Also since the first stages were found to be small compared to the last stage, it was found that it is best to use a rough approximation towards the correct value and then use more stages to achieve the target precision. To evaluate how different design choices will affect the speed, size and throughput of an implementation, several configurations were implemented in VHDL and synthesized to FPGAs. These implementations were optimized for high speed whit high pipeline depth and size, and low speed with low pipeline depth and size. This was done for both 16 and 64 bits implementations. The synthesizes showed that there is possible to achieve great speed at the cost of increased size, or a small circuit while still achieving an acceptable speed. In addition it was found that it is optimal in a high throughput pipelined division circuit to use a less precise initial approximation and instead use more iterations stages.
|
9 |
Comparator-Based Switched-Capacitor Integrator for use in Delta-Sigma ModulatorTorgersen, Svend Bjarne January 2009 (has links)
A comparator-based switched capacitor integrator for use in a Delta Sigma ADC has been designed. Basic theory about comparator-based circuits has been presented and design equations have been developed. The integrator had a targeted performance of a bandwidth of 1.5MHz with a SNR of 80dB. Due to the lack of a complete modulator feedback system, the integrator was simulated in open-loop. For the integrator not to saturate in open-loop, an overshoot calibration circuit was enabled during the simulation. This resulted in a severe deterioration of the integrated signal. The results are therefore significantly lower than expected, with a SNR of about 39dB but can be expected to be better in a closed-loop simulation. The power consumption of the implemented modules is 0.43mW. However, this is without several modules which were implemented as ideal.
|
10 |
Low-power microcontroller coreEriksen, Stein Ove January 2009 (has links)
Energy efficiency in embedded processors is of major importance in order to achieve longer operating time for battery operated devices. In this thesis the energy efficiency of a microcontroller based on the open source ZPU microprocessor is evaluated and improved. The ZPU microprocessor is a zero-operand stack machine originally designed for small size FPGA implementation, but in this thesis the core is synthesized for implementation with a 180nm technology library. Power estimation of the design is done both before and after synthesis in the design flow, and it is shown that power estimates based on RTL simulations (before synthesis) are 35x faster to obtain than power estimates based on gate-level simulations (after synthesis). The RTL estimates deviate from the gate-level estimates by only 15% and can provide faster design cycle iterations without sacrificing too much accuracy. The energy consumption of the ZPU microcontroller is reduced by implementing clock gating in the ZPU core and also implementing a tiny stack cache to reduce stack activity energy consumption. The result of these improvements show a 46% reduction in average power consumption. The ZPU architecture is also compared to the more common MIPS architecture, and the Plasma CPU of MIPS architecture is synthesized and simulated to serve as comparison to the ZPU microcontroller. The results of the comparison with the MIPS architecture shows that the ZPU needs on average 15x as many cycles and 3x as many memory accesses to complete the benchmark programs as the MIPS does.
|
Page generated in 0.1 seconds