Spelling suggestions: "subject:"cow power,"" "subject:"cow lower,""
651 |
Design of Low-Power Reduction-Trees in Parallel MultipliersOskuii, Saeeid Tahmasbi January 2008 (has links)
<p>Multiplications occur frequently in digital signal processing systems, communication systems, and other application specific integrated circuits. Multipliers, being relatively complex units, are deciding factors to the overall speed, area, and power consumption of digital computers. The diversity of application areas for multipliers and the ubiquity of multiplication in digital systems exhibit a variety of requirements for speed, area, power consumption, and other specifications. Traditionally, speed, area, and hardware resources have been the major design factors and concerns in digital design. However, the design paradigm shift over the past decade has entered dynamic power and static power into play as well.</p><p>In many situations, the overall performance of a system is decided by the speed of its multiplier. In this thesis, parallel multipliers are addressed because of their speed superiority. Parallel multipliers are combinational circuits and can be subject to any standard combinational logic optimization. However, the complex structure of the multipliers imposes a number of difficulties for the electronic design automation (EDA) tools, as they simply cannot consider the multipliers as a whole; i.e., EDA tools have to limit the optimizations to a small portion of the circuit and perform logic optimizations. On the other hand, multipliers are arithmetic circuits and considering arithmetic relations in the structure of multipliers can be extremely useful and can result in better optimization results. The different structures obtained using the different arithmetically equivalent solutions, have the same functionality but exhibit different temporal and physical behavior. The arithmetic equivalencies are used earlier mainly to optimize for area, speed and hardware resources.</p><p>In this thesis a design methodology is proposed for reducing dynamic and static power dissipation in parallel multiplier partial product reduction tree. Basically, using the information about the input pattern that is going to be applied to the multiplier (such as static probabilities and spatiotemporal correlations), the reduction tree is optimized. The optimization is obtained by selecting the power efficient configurations by searching among the permutations of partial products for each reduction stage. Probabilistic power estimation methods are introduced for leakage and dynamic power estimations. These estimations are used to lead the optimizers to minimum power consumption. Optimization methods, utilizing the arithmetic equivalencies in the partial product reduction trees, are proposed in order to reduce the dynamic power, static power, or total power which is a combination of dynamic and static power. The energy saving is achieved without any noticeable area or speed overhead compared to random reduction trees. The optimization algorithms are extended to include spatiotemporal correlations between primary inputs. As another extension to the optimization algorithms, the cost function is considered as a weighted sum of dynamic power and static power. This can be extended further to contain speed merits and interconnection power. Through a number of experiments the effectiveness of the optimization methods are shown. The average number of transitions obtained from simulation is reduced significantly (up to 35% in some cases) using the proposed optimizations.</p><p>The proposed methods are in general applicable on arbitrary multi-operand adder trees. As an example, the optimization is applied to the summation tree of a class of elementary function generators which is implemented using summation of weighted bit-products. Accurate transistor-level power estimations show up to 25% reduction in dynamic power compared to the original designs.</p><p>Power estimation is an important step of the optimization algorithm. A probabilistic gate-level power estimator is developed which uses a novel set of simple waveforms as its kernel. The transition density of each circuit node is estimated. This power estimator allows to utilize a global glitch filtering technique that can model the removal of glitches in more detail. It produces error free estimates for tree structured circuits. For circuits with reconvergent fanout, experimental results using the ISCAS85 benchmarks show that this method generally provides significantly better estimates of the transition density compared to previous techniques.</p>
|
652 |
Low-power high-resolution delta-sigma ADC design techniquesWang, Tao 09 June 2014 (has links)
This dissertation presents a low-power high-resolution delta-sigma ADC. Two new architectural design techniques are proposed to reduce the power dissipation of the ADC. Compared to the conventional active adder, the direct charge transfer (DCT) adder greatly saves power by keeping the feedback factor of the active adder unity. However, the inherent delay originated from the DCT adder will cause instability to the modulator and complex additional branches are usually needed to stabilize the loop. A simple and power-efficient technique is proposed to absorb the delay from the DCT adder and the instability issue is therefore solved. Another proposed low-power design technique is to feed differentiated inverted quantization noise to the input of the last integrator. The modulator noise-shaping order with this proposed technique is effectively increased from two to three without adding additional active elements.
The delta-sigma ADC with the proposed architectural design techniques has been implemented in transistor-level and fabricated in 0.18 µm CMOS technology. Measurement results showed a SNDR of 99.3 dB, a DR of 101.3 dB and a SFDR of 112 dB over 20 kHz signal bandwidth, resulting in a very low figure-of-merit (FoM) in its application category. Finally, two new circuit ideas, low-power parasitic-insensitive switched-capacitor integrator for delta-sigma ADCs and switched-resistor tuning technique for highly linear Gm-C filter design are presented. / Graduation date: 2012 / Access restricted to the OSU Community at author's request from June 9, 2012 - June 9, 2014
|
653 |
Design of Low-Power Reduction-Trees in Parallel MultipliersOskuii, Saeeid Tahmasbi January 2008 (has links)
Multiplications occur frequently in digital signal processing systems, communication systems, and other application specific integrated circuits. Multipliers, being relatively complex units, are deciding factors to the overall speed, area, and power consumption of digital computers. The diversity of application areas for multipliers and the ubiquity of multiplication in digital systems exhibit a variety of requirements for speed, area, power consumption, and other specifications. Traditionally, speed, area, and hardware resources have been the major design factors and concerns in digital design. However, the design paradigm shift over the past decade has entered dynamic power and static power into play as well. In many situations, the overall performance of a system is decided by the speed of its multiplier. In this thesis, parallel multipliers are addressed because of their speed superiority. Parallel multipliers are combinational circuits and can be subject to any standard combinational logic optimization. However, the complex structure of the multipliers imposes a number of difficulties for the electronic design automation (EDA) tools, as they simply cannot consider the multipliers as a whole; i.e., EDA tools have to limit the optimizations to a small portion of the circuit and perform logic optimizations. On the other hand, multipliers are arithmetic circuits and considering arithmetic relations in the structure of multipliers can be extremely useful and can result in better optimization results. The different structures obtained using the different arithmetically equivalent solutions, have the same functionality but exhibit different temporal and physical behavior. The arithmetic equivalencies are used earlier mainly to optimize for area, speed and hardware resources. In this thesis a design methodology is proposed for reducing dynamic and static power dissipation in parallel multiplier partial product reduction tree. Basically, using the information about the input pattern that is going to be applied to the multiplier (such as static probabilities and spatiotemporal correlations), the reduction tree is optimized. The optimization is obtained by selecting the power efficient configurations by searching among the permutations of partial products for each reduction stage. Probabilistic power estimation methods are introduced for leakage and dynamic power estimations. These estimations are used to lead the optimizers to minimum power consumption. Optimization methods, utilizing the arithmetic equivalencies in the partial product reduction trees, are proposed in order to reduce the dynamic power, static power, or total power which is a combination of dynamic and static power. The energy saving is achieved without any noticeable area or speed overhead compared to random reduction trees. The optimization algorithms are extended to include spatiotemporal correlations between primary inputs. As another extension to the optimization algorithms, the cost function is considered as a weighted sum of dynamic power and static power. This can be extended further to contain speed merits and interconnection power. Through a number of experiments the effectiveness of the optimization methods are shown. The average number of transitions obtained from simulation is reduced significantly (up to 35% in some cases) using the proposed optimizations. The proposed methods are in general applicable on arbitrary multi-operand adder trees. As an example, the optimization is applied to the summation tree of a class of elementary function generators which is implemented using summation of weighted bit-products. Accurate transistor-level power estimations show up to 25% reduction in dynamic power compared to the original designs. Power estimation is an important step of the optimization algorithm. A probabilistic gate-level power estimator is developed which uses a novel set of simple waveforms as its kernel. The transition density of each circuit node is estimated. This power estimator allows to utilize a global glitch filtering technique that can model the removal of glitches in more detail. It produces error free estimates for tree structured circuits. For circuits with reconvergent fanout, experimental results using the ISCAS85 benchmarks show that this method generally provides significantly better estimates of the transition density compared to previous techniques.
|
654 |
Power and Energy Efficiency Evaluation for HW and SW Implementation of nxn Matrix Multiplication on Altera FPGAsRenbi, Abdelghani January 2009 (has links)
<p>In addition to the performance, low power design became an important issue in the design process of mobile embedded systems. Mobile electronics with rich features most often involve complex computation and intensive processing, which result in short battery lifetime and particularly when low power design is not taken in consideration. In addition to mobile computers, thermal design is also calling for low power techniques to avoid components overheat especially with VLSI technology. Low power design has traced a new era. In this thesis we examined several techniques to achieve low power design for FPGAs, ASICs and Processors where ASICs were more flexible to exploit the HW oriented techniques for low power consumption. We surveyed several power estimation methodologies where all of them were prone to at least one disadvantage. We also compared and analyzed the power and energy consumption in three different designs, which perform matrix multiplication within Altera platform and using state-of-the-art FPGA device. We concluded that NIOS II\e is not an energy efficient alternative to multiply nxn matrices compared to HW matrix multipliers on FPGAs and configware is an enormous potential to reduce the energy consumption costs.</p>
|
655 |
High performance continuous-time filters for information transfer systemsMohieldin, Ahmed Nader 30 September 2004 (has links)
Vast attention has been paid to active continuous-time filters over the years. Thus as the cheap, readily available integrated circuit OpAmps replaced their discrete circuit versions, it became feasible to consider active-RC filter circuits using large numbers of OpAmps. Similarly the development of integrated operational transconductance amplifier (OTA) led to new filter configurations. This gave rise to OTA-C filters, using only active devices and capacitors, making it more suitable for integration. The demands on filter circuits have become ever more stringent as the world of electronics and communications has advanced. In addition, the continuing increase in the operating frequencies of modern circuits and systems increases the need for active filters that can perform at these higher frequencies; an area where the LC active filter emerges. What mainly limits the performance of an analog circuit are the non-idealities of the used building blocks and the circuit architecture. This research concentrates on the design issues of high frequency continuous-time integrated filters.
Several novel circuit building blocks are introduced. A novel pseudo-differential fully balanced fully symmetric CMOS OTA architecture with inherent common-mode detection is proposed. Through judicious arrangement, the common-mode feedback circuit can be economically implemented.
On the level of system architectures, a novel filter low-voltage 4th order RF bandpass filter structure based on emulation of two magnetically coupled resonators is presented. A unique feature of the proposed architecture is using electric coupling to emulate the effect of the coupled-inductors, thus providing bandwidth tuning with small passband ripple.
As part of a direct conversion dual-mode 802.11b/Bluetooth receiver, a BiCMOS 5th order low-pass channel selection filter is designed. The filter operated from a single 2.5V supply and achieves a 76dB of out-of-band SFDR. A digital automatic tuning system is also implemented to account for process and temperature variations.
As part of a Bluetooth transmitter, a low-power quadrature direct digital frequency synthesizer (DDFS) is presented. Piecewise linear approximation is used to avoid using a ROM look-up table to store the sine values in a conventional DDFS. Significant saving in power consumption, due to the elimination of the ROM, renders the design more suitable for portable wireless communication applications.
|
656 |
CMOS inductively coupled power receiver for wireless microsensorsLazaro, Orlando 22 May 2014 (has links)
This research investigates how to draw energy from a distant emanating and alternating (i.e., AC) magnetic source and deliver it to a battery (i.e., DC). The objective is to develop, design, simulate, build, test, and evaluate a CMOS charger integrated circuit (IC) that wirelessly charges the battery of a microsystem. A fundamental challenge here is that a tiny receiver coil only produces mV's of AC voltage, which is difficult to convert into DC form. Although LC-boosted diode-bridge rectifiers in the literature today extract energy from similar AC sources, they can do so only when AC voltages are higher than what miniaturized coils can produce, unless tuned off-chip capacitors are available, which counters the aim of integration. Therefore, rather than rectify the AC voltage, this research proposes to rectify the current that the AC voltage induces in the coil. This way, the system can still draw power from voltages that fall below the inherent threshold limit of diode-bridge rectifiers. Still, output power is low because, with these low currents, small coils can only extract a diminutive fraction of the magnetic energy available, which is why investing battery energy is also part of this research. Ultimately, the significance of increasing the power that miniaturized platforms can output is higher integration and functionality of micro-devices, like wireless microsensors and biomedical implants.
|
657 |
From Slow to Ultra-fast MAS: Structural Determination of Type-Three Secretion System Bacterial Needles and Inorganic Materials by Solid-State NMRDemers, Jean-Philippe 23 April 2014 (has links)
No description available.
|
658 |
Low-power CMOS front-ends for wireless personal area networksPerumana, Bevin George 30 October 2007 (has links)
The potential of implementing subthreshold radio frequency circuits in deep sub-micron CMOS technology was investigated for developing low-power front-ends for wireless personal area network (WPAN) applications. It was found that the higher transconductance to bias current ratio in weak inversion could be exploited in developing low-power wireless front-ends, if circuit techniques are employed to mitigate the higher device noise in subthreshold region. The first fully integrated subthreshold low noise amplifier was demonstrated in the GHz frequency range requiring only 260 μW of power consumption. Novel subthreshold variable gain stages and down-conversion mixers were developed.
A 2.4 GHz receiver, consuming 540 μW of power, was implemented using a new subthreshold mixer by replacing the conventional active low noise amplifier by a series-resonant passive network that provides both input matching and voltage amplification. The first fully monolithic subthreshold CMOS receiver was also implemented with integrated subthreshold quadrature LO (Local Oscillator) chain for 2.4 GHz WPAN applications. Subthreshold operation, passive voltage amplification, and various low-power circuit techniques such as current reuse, stacking, and differential cross coupling were combined to lower the total power consumption to 2.6 mW.
Extremely compact resistive feedback CMOS low noise amplifiers were presented as a cost-effective alternative to narrow band LNAs using high-Q inductors. Techniques to improve linearity and reduce power consumption were presented. The combination of high linearity, low noise figure, high broadband gain, extremely small die area and low power consumption made the proposed LNA architecture a compelling choice for many wireless applications.
|
659 |
Τεχνικές μεταγλωττιστών και αρχιτεκτονικές επεξεργαστών για στατιστικές και δυναμικές εφαρμογέςΑλαχιώτης, Νικόλαος 19 July 2010 (has links)
Οι σημερινές εφαρμογές έχουν ολοένα και μεγαλύτερες ανάγκες επεξεργαστικής ισχύος προκειμένου να εκτελεστούν σε συντομότερο χρονικό διάστημα. Για να την ικανοποίηση αυτών των χρονικών περιορισμών απαιτείται η ανάπτυξη βελτιστοποιημένων τεχνικών σχεδιασμού.
Το αντικείμενο της παρούσας διατριβής σχετίζεται με την ανάπτυξη αρχιτεκτονικών και τεχνικών μεταφραστών με σκοπό την γρηγορότερη τροφοδότηση του επεξεργαστή με δεδομένα από την ιεραρχία μνήμης.
α) Μεθοδολογία επιτάχυνσης εκτέλεσης εφαρμογής πολλαπλασιασμού πινάκων
Παρουσιάζεται μία μεθοδολογία που βασίζεται στην τοπικότητα των δεδομένων με σκοπό την επιτάχυνση εκτέλεσης του πολλαπλασιασμού πινάκων. Μετά από διερεύνηση, παράγεται ο βέλτιστος τρόπος χρονοπρογραμματισμού των προσπελάσεων στη μνήμη λαμβάνοντας υπόψη την τοπικότητα των δεδομένων και τα μεγέθη των επιπέδων ιεραρχίας μνήμης. Ο χρόνος διερεύνησης είναι σύντομος καθώς απορρίπτονται όλες οι μη-βέλτιστες λύσεις. Η προτεινόμενη μεθοδολογία συγκρίνεται με άλλες υπάρχουσες και παρατηρείται αύξηση της απόδοσης μέχρι 55%.
β)Mεθοδολογία αποδοτικής υλοποίησης του Fast Fourier Transform (FFT)
Παρουσιάζεται μια νέα μεθοδολογία, που επιτυγχάνει βελτιωμένη απόδοση στην υλοποίηση του FFT, έχοντας ως γνώμονα την ελαχιστοποίηση των προσπελάσεων που πραγματοποιούνται στα δεδομένα. Η προτεινόμενη μεθοδολογία έχει σημαντικά πλεονεκτήματα. Πρώτον, την πλήρη αξιοποίηση της παραγωγής και της κατανάλωσης των αποτελεσμάτων των πεταλούδων του FFT αλγορίθμου, της επαναχρησιμοποίησης δεδομένων και της συμμετρίας των twiddle συντελεστών του FFT αλγορίθμου. Δεύτερον, η βέλτιστη λύση χρονοπρογραμματισμού βρίσκεται λαμβάνοντας υπόψη τόσο τον αριθμό των καταχωρητών, όσο και το μέγεθος της κρυφής μνήμης κάθε επιπέδου, αναζητώντας μόνο τον αριθμό του επιπέδου του tiling του FFT. Τρίτον, ο χρόνος μετάφρασης και το μέγεθος του πηγαίου κώδικα είναι πολύ μικροί συγκρινόμενοι με την SOA βιβλιοθήκη υλοποίησης του FFT αλγορίθμου, την FFTW. Η προτεινόμενη μεθοδολογία επιτυγχάνει αύξηση της απόδοσης μέχρι και 63% σε σχέση με την βιβλιοθήκη FFTW.
γ)Ανάπτυξη Αρχιτεκτονικών για Διαχείριση Μνήμης
Παρουσιάζεται μια αποσυζευγμένη αρχιτεκτονική επεξεργαστών με μια ιεραρχία μνήμης που αποτελείται μόνο από μνήμες scratch-pad, και μια κύρια μνήμη. Η αρχιτεκτονική αυτή εκμεταλλεύεται τα οφέλη των scratch-pad μνημών και τον παραλληλισμό μεταξύ της επεξεργασίας δεδομένων και υπολογισμού διευθύνσεων. Η αρχιτεκτονική συγκρίνεται στην απόδοση με την αρχιτεκτονική MIPS με cache και με scratch-pad ιεραρχίες μνήμης και παρουσιάζεται η υψηλότερη απόδοσή της. Τα πειραματικά αποτελέσματα δείχνουν ότι η απόδοση αυξάνεται μέχρι 3,7 φορές.
Στη συνέχεια γίνεται περαιτέρω έρευνα σε αρχιτεκτονικές με Scratch-pad μνήμες. Παρουσιάζεται μια αρχιτεκτονική που είναι σε θέση να παρέχει πληροφορίες για το ακριβές περιεχόμενο δεδομένων της scratch-pad, κατά τη διάρκεια της εκτέλεσης και μπορεί επίσης να εκτελέσει όλες τις απαραίτητες ενέργειες για την τοποθέτηση των νέων δεδομένων στη scratch-pad. Με αυτόν τον τρόπο, αξιοποιείται η επαναχρησιμοποίηση δεδομένων που εμφανίζεται τυχαία και δεν μπορεί να προσδιοριστεί από το μεταγλωττιστή. Συγκρίνεται με αρχιτεκτονική MIPS που περιέχει cache και με scratch-pad μνήμες και αναδεικνύεται η μεγαλύτερη απόδοσή της. Τα πειραματικά αποτελέσματα δείχνουν ότι η απόδοση αυξάνεται μέχρι 5 φορές έναντι των αρχιτεκτονικών με scratch-pad και 2.5 φορές έναντι των αρχιτεκτονικών με cache. / Modern applications have indence needs in processing power in order to be executed in short time. For satisfying the time limits, there have to be generated new techniques for optimizing the designs.
The object of the present thesis is about developing new compiler techniques and hardware architectures which aim to transfer data faster, from the memory hierarchy to the CPU.
a) Methdology for accelerating the execution of matrix multiplications
A new methodology using the standard MMM algorithm is presented, achieving improved performance by focusing on data locality (both temporal and spatial). This methodology finds the scheduling which conforms with the optimum memory management. The scheduling used for the tile level is different from the element level’s one, having better data locality, suited to the sizes of memory hierarchy. Its exploration time is short, because it searches only for the number of the level of tiling used for finding the best tile size for each cache level. Compared with the best existing related work, which we implemented, better performance up to 55%
β)Methodology for increasing performance on Fast Fourier Transform (FFT)
A new methodology is presented based on minimizing the memory accesses for FFT. It exploits, the production and comsumption of the FFT batterfly results and the reuse of data. The optimum scheduling solution is found taking into account the number of registers and the cache memory size. The compile time and source code size are short comparing to SOA library. The methodology performance gains are up to 63% comparing to FFTW library.
γ)Ανάπτυξη Αρχιτεκτονικών για Διαχείριση Μνήμης
A decoupled processors architecture with a memory hierarchy is presented consisting only of scratch–pad memories, and a main memory. This architecture exploits both the benefits of scratch-pad memories and the parallelism between address computation and application data processing. The architecture is compared in performance with the MIPS architecture with cache and with scratch-pad memory hierarchies and with the existing decoupled architectures showing its higher normalized performance. Experimental results show that the performance is increased up to 3.7 times.
Continuing, more research is done on Scratch-pad memories. We present an architecture that is able to provide information about the exact data contents of scratch-pad during execution and can also do all the necessary operations for placing the new data blocks in scratch-pad. Thereby, the temporal locality which occurs randomly and can not be identified by the compiler is exploited. It is compared with the MIPS architecture with cache and with scratch-pad memories showing its higher normalized performance. Experimental results show that the performance is increased up to 5 times compared to cache architectures and 2,5 times compared to existing scratch-pad architectures.
|
660 |
Κυκλώματα αριθμητικής υπολοίπων με χαμηλή κατανάλωση και ανοχή σε διακυμάνσεις παραμέτρωνΚουρέτας, Ιωάννης 01 October 2012 (has links)
Το αριθμητικό σύστημα υπολοίπων (RNS) έχει προταθεί ως ένας τρόπος για επιτάχυνση των αριθμητικών πράξεων του πολλαπλασιασμού και της πρόσθεσης. Ένα από τα σημαντικά πλεονεκτήματα της χρήσης του RNS είναι ότι οδηγεί σε κυκλώματα που έχουν το χαρακτηριστικό της χαμηλής κατανάλωσης.
Πιο συγκεκριμένα στην παρούσα διατριβή γίνεται μια αναλυτική μελέτη πάνω στην ταχύτητα διεξαγωγής της πράξης του πολλαπλασιασμού και της άθροισης. Ο λόγος που γίνεται αυτό είναι διότι οι εφαρμογές επεξεργασίας σήματος χρησιμοποιούν ιδιαιτέρως τις προαναφερθείσες πράξεις. Επίσης γίνεται μελέτη της ισχύος που καταναλώνεται κατά την επεξεργασία ενός σήματος με τη χρήση των προτεινόμενων αριθμητικών κυκλωμάτων. Ιδιαίτερη έμφαση δίνεται στη χρήση απλών αρχιτεκτονικών τις οποίες μπορούν τα εργαλεία σύνθεσης να διαχειριστούν καλύτερα παράγοντας βέλτιστα κυκλώματα.
Τέλος η διατριβή μελετά τα προβλήματα διακύμανσης των παραμέτρων του υλικού που αντιμετωπίζει η σύγχρονη τεχνολογία κατασκευής ολοκληρωμένων κυκλωμάτων. Συγκεκριμένα σε τεχνολογία μικρότερη των 90nm παρατηρείται το φαινόμενο ίδια στοιχεία κυκλώματος να συμπεριφέρονται με διαφορετικό τρόπο. Το φαινόμενο αυτό γίνεται σημαντικά πιο έντονο σε τεχνολογίες κάτω των 45nm. Η παρούσα διατριβή προτείνει λύσεις βασισμένες στην παραλληλία και την ανεξαρτησία των επεξεργαστικών πυρήνων που παρέχει το RNS, για να αντιμετωπίσει το συγκεκριμένο φαινόμενο. / The Residue Number System (RNS) has been proposed as a means to speed up the implementation of multiplication-addition intensive applications, commonly found in DSP. The main benefit of RNS is the inherent parallelism, which has been exploited to build efficient multiply-add structures, and more recently, to design low-power systems.
In particular, this dissertation deals with the delay complexity of the multiply-add operation (MAC). The reason for this is that DSP applications are MAC intensive and hence this dissertation proposes solutions to increase the speed of processing. Furthermore, the
study of the multiply-add operations is extended to power consumption matters. The dissertation focus on simple architectures such that EDA tools produce efficient in both power and delay, synthesized circuits.
Finally the dissertation deals with variability matters that came up as the vlsi technology shrinks below 90nm. Variability becomes unaffordable especially for the 45nm technology node. This dissertation proposes solutions based on parallelism and the independence of the RNS cores to derive variation-tolerant architectures.
|
Page generated in 0.0799 seconds