Global ETD Search

191	Layout optimization in ultra deep submicron VLSI design Wu, Di 16 August 2006 (has links) As fabrication technology keeps advancing, many deep submicron (DSM) effects have become increasingly evident and can no longer be ignored in Very Large Scale Integration (VLSI) design. In this dissertation, we study several deep submicron problems (eg. coupling capacitance, antenna effect and delay variation) and propose optimization techniques to mitigate these DSM effects in the place-and-route stage of VLSI physical design. The place-and-route stage of physical design can be further divided into several steps: (1) Placement, (2) Global routing, (3) Layer assignment, (4) Track assignment, and (5) Detailed routing. Among them, layer/track assignment assigns major trunks of wire segments to specific layers/tracks in order to guide the underlying detailed router. In this dissertation, we have proposed techniques to handle coupling capacitance at the layer/track assignment stage, antenna effect at the layer assignment, and delay variation at the ECO (Engineering Change Order) placement stage, respectively. More specifically, at layer assignment, we have proposed an improved probabilistic model to quickly estimate the amount of coupling capacitance for timing optimization. Antenna effects are also handled at layer assignment through a linear-time tree partitioning algorithm. At the track assignment stage, timing is further optimized using a graph based technique. In addition, we have proposed a novel gate splitting methodology to reduce delay variation in the ECO placement considering spatial correlations. Experimental results on benchmark circuits showed the effectiveness of our approaches. VLSI physical design routing coupling capacitance antenna effect process variation
192	Robust, Low Power, Discrete Gate Sizing Casagrande, Anthony Joseph 01 January 2015 (has links) Ultra-deep submicron circuits require accurate modeling of gate delay in order to meetaggressive timing constraints. With the lack of statistical data, variability due to the mechanical manufacturing process and its chemical properties poses a challenging problem. Discrete gate sizing requires (i) accurate models that take into account random parametric variation and (ii) a fair allocation of resources to optimize the solution. The proposed GTFUZZ gate sizing algorithm handles both tasks. Gate sizing is modeled as a resource allocation problem using fuzzy game theory. Delay is modeled as a constraint and power is optimized in this algorithm. In GTFUZZ, delay is modeled as a fuzzy goal with fuzzy parameters to capture the imprecision of gate delay early in the design phase when extensive empirical data is absent. Dynamic power is modeled as a fuzzy goal without varying coefficients. The fuzzy goals provide a flexible platform for multimetric optimization. The robust GTFUZZ algorithm is compared against fuzzy linear programming (FLP) and deterministic worst-case FLP (DWCFLP) algorithms. The benchmark circuits are first synthesized, placed, routed, and optimized for performance using the Synopsys University 32/28nm standard cell library and technology files. Operating at the optimized clock frequency, results show an average power reduction of about 20% versus DWCFLP and 9% against variation-aware gate sizing with FLP. Timing and timing yield are verified by both Synopsys PrimeTime and Monte Carlo simulations of the critical paths using HSPICE. Design Automation Gate Sizing Process Variation VLSI Computer Engineering
193	Scalable algorithms for software based self test using formal methods Prabhu, Mahesh 09 July 2014 (has links) Transistor scaling has kept up with Moore's law with a doubling of the number of transistors on a chip. More logic on a chip means more opportunities for manufacturing defects to slip in. This, in turn, has made processor testing after manufacturing a significant challenge. At-speed functional testing, being completely non-intrusive, has been seen as the ideal way of testing chips. However for processor testing, generating instruction level tests for covering all faults is a challenge given the issue of scalability. Data-path faults are relatively easier to control and observe compared to control-path faults. In this research we present a novel method to generate instruction level tests for hard to detect control-path faults in a processor. We initially map the gate level stuck-at fault to the Register Transfer Level (RTL) and build an equivalent faulty RTL model. The fault activation and propagation constraints are captured using Control and Data Flow Graphs of the RTL as a Liner Temporal Logic (LTL) property. This LTL property is then negated and given to a Bounded Model Checker based on a Bit-Vector Satisfiability Module Theories (SMT) solver. From the counter-example to the property we can extract a sequence of instructions that activates the gate level fault and propagates the fault effect to one of the observable points in the design. Other than the user supplying instruction constraints, this approach is completely automatic and does not require any manual intervention. Not all the design behaviors are required to generate a test for a fault. We use this insight to scale our previous methodology further. Underapproximations are design abstractions that only capture a subset of the original design behaviors. The use of RTL for test generation affords us two types of under-approximations: bit-width reduction and operator approximation. These are abstractions that perform reductions based on semantics of the RTL design. We also explore structural reductions of the RTL, called path based search, where we search through error propagation paths incrementally. This approach increases the size of the test generation problem step by step. In this way the SMT solver searches through the state space piecewise rather than doing the entire search at once. Experimental results show that our methods are robust and scalable for generating functional tests for hard to detect faults. / text Functional testing VLSI testing Software-based-self-test
194	Σχεδίαση αποκωδικοποιητή VLSI για κώδικες LDPC Τσατσαράγκος, Ιωάννης 12 April 2010 (has links) Η διόρθωση λαθών με κώδικες LDPC είναι μεγάλου ενδιαφέροντος σε σημαντικές νέες τηλεπικοινωνιακές εφαρμογές, όπως δορυφορικό Digital Video Broadcast (DVB) DVB-S2, IEEE 802.3an (10GBASE-T) και IEEE 802.16 (WiMAX). Οι κώδικες LDPC ανήκουν στην κατηγορία των γραμμικών μπλοκ κωδικών. Πρόκειται για κώδικες ελέγχου και διόρθωσης σφαλμάτων μετάδοσης, με κυριότερο χαρακτηριστικό τους τον χαμηλής πυκνότητας πίνακα ελέγχου ισοτιμίας (Low Density Parity Check), από τον οποίο και πήραν το όνομά τους. Η αποκωδικοποίηση γίνεται μέσω μιας επαναληπτικής διαδικασίας ανταλλαγής πληροφορίας μεταξύ δύο τύπων επεξεργαστικών μονάδων. Η υλοποίηση σε υλικό των LDPC αποκωδικοποιητών αποτελεί ένα ραγδαία εξελισσόμενο πεδίο για τη σύγχρονη επιστημονική έρευνα. Σκοπός της παρούσας διπλωματικής εργασίας υπήρξε ο σχεδιασμός, η υλοποίηση και η βελτιστοποίηση αρχιτεκτονικών αποκωδικοποιητών VLSI για κώδικες LDPC. Έχουν αναπτυχθεί διάφοροι αλγόριθμοι αποκωδικοποίησης, οι οποίοι είναι επαναληπτικοί. Μελετήθηκαν αρχιτεκτονικές βασισμένες σε δύο αλγόριθμους, τον log Sum-Product και τον Min-Sum. Ο πρώτος είναι θεωρητικά βέλτιστος, αλλά ο Min-Sum είναι αρκετά απλούστερος και έχει μεγαλύτερο πρακτικό ενδιαφέρον στα πλαίσια μιας ρεαλιστικής εφαρμογής. Συγκεκριμένα, αναπτύχθηκαν δύο αλγόριθμοι αποκωδικοποίησης, οι οποίοι χρησιμοποιούν ως δομικά στοιχεία, τους δύο προαναφερθέντες αλγορίθμους και τη φιλοσοφία του layered decoding. Η μελέτη μας επικεντρώθηκε σε κώδικες, η δομή των πινάκων ελέγχου ισοτιμίας των οποίων, προσφέρεται για υλοποίηση. Για αυτό το λόγο, χρησιμοποιήσαμε κώδικες του προτύπου WiMax 802.16e. Η συνεισφορά της παρούσας εργασίας έγκειται στο σχεδιασμό και την υλοποίηση αποδοτικών αρχιτεκτονικών σε επίπεδο επιφάνειας και ταχύτητας αποκωδικοποίησης (Mbps), καθώς και η διερεύνηση του σχετικού σχεδιαστικού χώρου, χρησιμοποιώντας ως σχεδιαστικές παραμέτρους, τον αλγόριθμο αποκωδικοποίησης, τη χρονοδρομολόγηση των πράξεων, το βαθμό παραλληλίας της αρχιτεκτονικής, το βάθος του pipelining και την αριθμητική αναπαράσταση των δεδομένων. Επιπλέον, είναι σημαντικό να αναφέρουμε πως, στα πλαίσια της σχεδίασης του LDPC αποκωδικοποιητή και με τη βοήθεια του εργαλείου Matlab, αναπτύχθηκαν παραμετρικά scripts για την παραγωγή του VHDL κώδικα. Οι δύο βασικές παράμετροι που χρησιμοποιήθηκαν ήταν το πλήθος των επεξεργαστικών μονάδων και το μήκος λέξης των δεδομένων. Τα scripts αυτά αποτέλεσαν ένα πολύ χρήσιμο εργαλείο κατά τη διαδικασία ανάπτυξης και βελτιστοποίησης της αρχιτεκτονικής, δίνοντας μας τη δυνατότητα να παράγουμε με αυτοματοποιημένο και γρήγορο τρόπο τον VHDL κώδικα, για τις επιμέρους μονάδες του αποκωδικοποιητή. Η υλοποίηση ενός μοντέλου αποκωδικοποιητή σε υλικό, μας δίνει τη δυνατότητα να διεξάγουμε ταχύτατες εξομοιώσεις, σε σχέση με αντίστοιχες υλοποιήσεις σε λογισμικό (π.χ. σε Matlab περιβάλλον). Διαθέτουμε, έτσι, ένα ισχυρό εργαλείο για τη μελέτη της επίδοσης διαφόρων ρεαλιστικών υλοποιήσεων αποκωδικοποιητών. Κατά τη διάρκεια της υλοποίησης, αξιοποιήθηκε αναπτυξιακό σύστημα βασισμένο σε virtex-4 fpga. / LDPC (low-density parity-check) codes are widely applied for error correction, in the development of highly efficient modern digital communication systems, as satellite Digital Video Broadcast (DVB) DVB-S2, IEEE 802.3an (10GBASE-T) and IEEE 802.16 (WiMax). LDPC codes are linear block codes, characterized by a sparse parity-check matrix. They are error detection and correction codes. The most typical decoding procedure is the message passing algorithm that implements the iterative exchange of node-generated messages between two types of processing units, called check and variable nodes. Hardware implementation of an LDPC decoder is a fast growing field for contemporary scientific research. This work presents the results of the design, implementation and optimization of a VLSI decoder for LDPC codes. Several iterative decoding algorithms have been developed. At this work we present architectures based on the log Sum-Product (Log-SP) and Min-Sum algorithm. Log-SP is theoretically optimal; however Min-Sum is substantially simpler and reduces the hardware complexity. Two alternative decoding algorithms have been developed, that use these two algorithms for the check-node LLR update, and the philosophy of layered decoding for the exchange of messages. Our study focused on WiMax 801.16e LDPC codes, whose form, based on permuted identity matrices, is suitable for a hardware realization. The contribution of this work lays within the design and implementation of area and decoding throughput efficient architectures, as well a detailed investigation of design space, using decoding algorithm, message exchange scheduling, pipelining and quantization schemes as design parameters. Furthermore, important to mention is, -the development of parametric Matlab scripts, in order to achieve easy and automated structural VHDL code production. The two key parameters are the number of the processing units and the data length. A hardware realization of a LDPC decoder, gives us a simulation tool that is much faster than corresponding software implementations (for example, a matlab implementation). During the implementation procedure, development board based in virtex-4 fpga has been used. Αποκωδικοποιητής Αρχιτεκτονικές 005.72 Decoder Architectures FPGA VLSI LDPC Layered decoding
195	A Constant Delay Logic Style - An Alternative Way of Logic Design Chuang, Pierce I Jen January 2010 (has links) High performance, energy efficient logic style has always been a popular research topic in the field of very large scale integrated (VLSI) circuits because of the continuous demands of ever increasing circuit operating frequency. The invention of the dynamic logic in the 80s is one of the answers to this request as it allows designers to implement high performance circuit block, i.e., arithmetic logic unit (ALU), at an operating frequency that traditional static and pass transistor CMOS logic styles are difficult to achieve. However, the performance enhancement comes with several costs, including reduced noise margin,charge-sharing noise, and higher power dissipation due to higher data activity. Furthermore, dynamic logic has gradually lost its performance advantage over static logic due to the increased self-loading ratio in deep-submicron technology (65nm and below) because of the additional NMOS CLK footer transistor. Because of dynamic logic's limitations and diminished speed reward, a slowly rising need has emerged in the past decade to explore new logic style that goes beyond dynamic logic. In this thesis a constant delay (CD) logic style is proposed. The constant delay characteristic of this logic style regardless of the logic expression makes it suitable in implementing complicated logic expression such as addition. Moreover, CD logic exhibits a unique characteristic where the output is pre-evaluated before the inputs from the preceding stage is ready. This feature enables performance advantage over static and dynamic logic styles in a single cycle, multi-stage circuit block. Several design considerations including appropriate timing window width adjustment to reduce power consumption and maintain sufficient noise margin to ensure robust operations are discussed and analyzed. Using 65nm general purpose CMOS technology, the proposed logic demonstrates an average speed up of 94% and 56% over static and dynamic logic respectively in five different logic expressions. Post layout simulation results of 8-bit ripple carry adders conclude that CD-based design is 39% and 23% faster than the static and dynamic-based adders respectively. For ultra-high speed applications, CD-based design exhibits improved energy, power-delay product, and energy-delay product efficiency compared to static and dynamic counterparts. VLSI adder constant delay logic style Electrical and Computer Engineering
196	Parallel VLSI Architectures for Multi-Gbps MIMO Communication Systems Sun, Yang January 2011 (has links) In wireless communications, the use of multiple antennas at both the transmitter and the receiver is a key technology to enable high data rate transmission without additional bandwidth or transmit power. Multiple-input multiple-output (MIMO) schemes are widely used in many wireless standards, allowing higher throughput using spatial multiplexing techniques. MIMO soft detection poses significant challenges to the MIMO receiver design as the detection complexity increases exponentially with the number of antennas. As the next generation wireless system is pushing for multi-Gbps data rate, there is a great need for high-throughput low-complexity soft-output MIMO detector. The brute-force implementation of the optimal MIMO detection algorithm would consume enormous power and is not feasible for the current technology. We propose a reduced-complexity soft-output MIMO detector architecture based on a trellis-search method. We convert the MIMO detection problem into a shortest path problem. We introduce a path reduction and a path extension algorithm to reduce the search complexity while still maintaining sufficient soft information values for the detection. We avoid the missing counter-hypothesis problem by keeping multiple paths during the trellis search process. The proposed trellis-search algorithm is a data-parallel algorithm and is very suitable for high speed VLSI implementation. Compared with the conventional tree-search based detectors, the proposed trellis-based detector has a significant improvement in terms of detection throughput and area efficiency. The proposed MIMO detector has great potential to be applied for the next generation Gbps wireless systems by achieving very high throughput and good error performance. The soft information generated by the MIMO detector will be processed by a channel decoder, e.g. a low-density parity-check (LDPC) decoder or a Turbo decoder, to recover the original information bits. Channel decoder is another very computational-intensive block in a MIMO receiver SoC (system-on-chip). We will present high-performance LDPC decoder architectures and Turbo decoder architectures to achieve 1+ Gbps data rate. Further, a configurable decoder architecture that can be dynamically reconfigured to support both LDPC codes and Turbo codes is developed to support multiple 3G/4G wireless standards. We will present ASIC and FPGA implementation results of various MIMO detectors, LDPC decoders, and Turbo decoders. We will discuss in details the computational complexity and the throughput performance of these detectors and decoders. MIMO LDPC Coding VLSI Turbo Coding Wireless ASIC FPGA
197	A formal, hierarchical design and validation methodology for VLSI Davie, Bruce S. January 1988 (has links) The high cost of fabricating VLSI circuits requires that they be validated, that is, shown to function correctly, before manufacture. The cost of design errors can be kept to a minimum if such validation occurs as early as possible; this is achieved by integrating validation into a hierarchical design procedure. In this thesis, a hierarchical approach to design, in which validation is performed between each pair of adjacent levels in the hierarchy, is developed. In order to adopt such an approach, a language is required for the formal description of hardware behaviour and structure. Therefore an important aspect of the development of the methodology, and a major theme of the thesis, is the development of languages to support the methodology. An enhanced version of CIRCAL, which enables large and abstract devices to be described concisely and supports formal reasoning about the behaviour of constructed systems, is presented. Specifications should accurately model the behaviour of real hardware and should be useful for design and validation; they should also be easy to write. In order to realise these goals, a number of specification techniques have been developed and a new language which enforces some of these techniques, thereby easing the specification task, is proposed. Ways in which a language may assist design have been investigated. Language constructs which restrict a designer, thereby removing some design decisions, have been developed. A simple correctness-preserving transformation is presented, illustrating another way in which a designer may be assisted by a formal language. Specification techniques play an important part in the validation task, as accurate and consistent modelling is vital in establishing the correctness of implementations. Techniques have also been developed which enable detailed implementations to be usefully compared with more abstract specifications. This is demonstrated in a large example, the specification, design and formal verification of a simple microprocessor. Finally, the concept of contextual constraints, restrictions on the environment in which a device may be placed, is introduced. A method of specifying such constraints has been developed, and it is shown that their formal treatment can provide assistance in specification, design and verification. 621.3192
198	VLSI Implementation of Lattice Reduction for MIMO Wireless Communication Systems Youssef, Ameer 31 December 2010 (has links) Lattice-Reduction has become a popular way of improving the performance of MIMO detectors. However, developing an efficient high-throughput VLSI implementation of LR has been a major challenge in the literature. This thesis proposes a hardware-optimized version of the popular LLL algorithm that reduces its complexity by 70% and achieves a fixed runtime while maintaining ML diversity. The proposed algorithm is implemented for 4x4 MIMO systems and uses a novel pipelined architecture that achieves a fixed low processing latency of 40 cycles, resulting in a fixed throughput that is independent of the channel correlation. The proposed LR core, fabricated in 0.13um CMOS, is the first fabricated and tested LR ASIC implementation in the literature. Test results show that the LR core achieves a maximum clock rate of 204 MHz, yielding a throughput of 510 Mbps, thus satisfying the aggressive throughput requirements of emerging 4G wireless standards, such as IEEE-802.16m and LTE-Advanced. MIMO Systems Lattice Reduction VLSI Implementation Wireless Communication 0544
199	VLSI Implementation of Lattice Reduction for MIMO Wireless Communication Systems Youssef, Ameer 31 December 2010 (has links) Lattice-Reduction has become a popular way of improving the performance of MIMO detectors. However, developing an efficient high-throughput VLSI implementation of LR has been a major challenge in the literature. This thesis proposes a hardware-optimized version of the popular LLL algorithm that reduces its complexity by 70% and achieves a fixed runtime while maintaining ML diversity. The proposed algorithm is implemented for 4x4 MIMO systems and uses a novel pipelined architecture that achieves a fixed low processing latency of 40 cycles, resulting in a fixed throughput that is independent of the channel correlation. The proposed LR core, fabricated in 0.13um CMOS, is the first fabricated and tested LR ASIC implementation in the literature. Test results show that the LR core achieves a maximum clock rate of 204 MHz, yielding a throughput of 510 Mbps, thus satisfying the aggressive throughput requirements of emerging 4G wireless standards, such as IEEE-802.16m and LTE-Advanced. MIMO Systems Lattice Reduction VLSI Implementation Wireless Communication 0544
200	Analytical Layer Planning for Nanometer VLSI Designs Chang, Chi-Yu 2012 August 1900 (has links) In this thesis, we proposed an intermediate sub-process between placement and routing stage in physical design. The algorithm is for generating layer guidance for post-placement optimization technique especially buffer insertion. This issue becomes critical in nowadays VLSI chip design due to the factor of timing, congestion, and increasingly non-uniform parasitic among different metal layers. Besides, as a step before routing, this layer planning algorithm accounts for routability by considering minimized overlap area between different nets. Moreover, layer directive information which is a crucial concern in industrial design is also considered in the algorithm. The core problem is formulated as nonlinear programming problem which is composed of objective function and constraints. The problem is further solved by conjugate gradient method. The whole algorithm is implemented by C++ under Linux operating system and tested on ISPD2008 Global Routing Contest Benchmarks. The experiment results are shown in the end of this thesis and confirm the effectiveness of our approach especially in routability aspect. layer planning buffer insertion routability analytical VLSI physical design

Search results