Global ETD Search

251	VLSI NMOS hardware design of a linear phase FIR low pass digital filter Chabbi, Charef January 1985 (has links) No description available. Integer Linear Programming FIR
252	Implementation of basic software tools to start a VLSI program at Ohio University with a high speed parallel multiplier as an example Choudhury, Niren Ch. January 1985 (has links) No description available. VLSI High Speed Parallel Multiplier UNIX Operating System
253	Parallel processing and VLSI design: A high speed efficient multiplier Dandu, Venkata Satyanarayana Raju January 1985 (has links) No description available. Very Large Scale Integrated (VLSI) High-Resolution Lithographic Computer Arithmetic
254	Global Routing in VLSI: Algorithms, Theory, and Computation Dickson, Chris 05 1900 (has links) <p> Global routing in VLSI (very large scale integration) design is one of the most challenging discrete optimization problems in computational theory and practice. In this thesis, we present a polynomial time approximation algorithm for the global routing problem based on an integer programming formulation. The algorithm features a theoretical approximation bound, while ensuring all the routing demands are concurrently satisfied.</p> <p> We provide both a serial and a parallel implementation, as well as develop several heuristics to improve the quality of the solution and reduce running time. Our computational tests on a well-known benchmark set show that, combined with certain heuristics, our new algorithms perform very well compared with other integer programming approaches.</p> / Thesis / Master of Science (MSc)
255	Diseño e implementación de arquitecturas para estructuras paralelas Pasciaroni, Alejandro 29 December 2015 (has links) Este trabajo de investigación explora el diseño e implementación de arquitecturas paralelas que permiten el procesamiento en paralelo de datos. Se consideró, como caso de estudio, el procesamiento en tiempo real del algoritmo del filtro de partículas para aquellas aplicaciones que requieren miles de ellas. En estos casos el algoritmo presenta un cuello de botella en el tiempo de ejecución debido al remuestreo, la única operación del algoritmo cuyo procesamiento no puede ser paralelizado en forma directa. El estudio tuvo como objetivos la revisión bibliográfica sobre los algoritmos de remuestreo e implentación del filtro de partículas y por último la proposición de arquitecturas digitales para un elemento de procesamiento para luego considerar arquitecturas con procesamiento distribuido. Se revisionaron las estrategias de paralelización del algoritmo de remuestreo y se llevó acabo una evaluación cualitativa y cuantitativa del comportamiento de las mismas. La estrategia seleccionada para las arquitecturas propuestas es el remuestreo distribuido que se basa en la distribución del remuestreo en grupos de partículas. De la evaluación se concluye que si se aumenta la cantidad de partículas por grupo se reduce el error en la estimación pero no sucede lo mismo si se aumenta la cantidad de grupos de igual cantidad de partículas. Se propusieron tres arquitecturas digitales basadas en el remuestreo distribuido. Las dos primeras arquitecturas se basan en el modelo computacional Dataflow y la tercera arquitectura es un arreglo de procesadores de propósito general que integran una arquitectura Single Instruction Multiple Data (SIMD). El primer diseño prioriza la tasa de procesamiento mientras que los otros dos el área de silicio requerida. Para reducir el área del elemento de procesamiento se recurrió a la multiplexación en tiempo de ciertos recursos computacionales. Se realizó un análisis comparativo en términos de tiempo de ejecución y área de silicio de las arquitecturas propuestas. Se observa que el multiplexado en tiempo de recursos resulta exitosa en la reducción del área total. Por otra parte a igual número de grupos de procesamiento instanciados resultará conveniente el Diseño 1 si se prioriza la tasa de procesamiento y el Diseño 2 si la prioridad es minimizar el área de silicio. El Diseño 3 no presenta ventaja respecto al Diseño 1 a pesar de disponer de un diseño regular y un elemento de procesamiento más versátil. / This research work explores the design and implentation of digital architectures that allows parallel data processing. The particle filtering in real time is considered as case study specially for those applications that requires thousands of particles. In those cases the algorithm presents a bottle neck in the execution time of the filter due to the resampling operation which can not be parallelized in a straight way. The study had as objectives the bibliographic revision of resampling algorithms and particle filter implementation and the proposition of digital architectures for processing elements that integrate a distributed processing architecture. The bibliographic revision of strategies to parallelize resampling algorithms was carried out. Further a quantitative and qualitative evaluation of the strategies was made. The distributed resampling strategy was choosen for the architecture implementations. This strategy is based on the distribution of the resampling operation into groups of particles. From the evalution it is concluded that: the estimation error of the filter is improved by increasing the number of particles per group. However, increasing the number of groups with equal quantity of particles does not reduce the error estimation. Three digital architectures were proposed based on distributed resampling. The two first architectures are based on the dataflow computational model and the third one is an array of general purpose processors that conforms a Single Instruction Multiple Data architecture (SIMD). First design is focused on maximizing the data processing rate meanwhile the two other designs are focused on reducing the required silicon area. In order to reduce the silicon area a time multiplexing of hardware resources was implemented. A comparison in terms of execution time and silicon area was carried out for the three proposed architectures. From this analysis is possible to observe taht the time multiplexing of hardware resources was successful in reducing the silicon area. Comparing Design 1 and Design 2 it is concluded that: for an equal number of processing groups instantiated Design 1 results more appropiate when data processing rate is important meanwhile Design 2 is the best option when the design goal is to reduce the silicon area. Finally Design 3 does not presents any advantage compared to Design 1 despite its more versatile processing element and its regular design. Ingeniería Circuitos integrados VLSI Arquitecturas digitales Filtro de partículas
256	High Speed Circuit Design Based on a Hybrid of Conventional and Wave Pipelining Sulistyo, Jos Budi 03 October 2005 (has links) The increasing capabilities of multimedia appliances demand arithmetic circuits with higher speed and reasonable power dissipation. A common technique to attain those goals is synchronous pipelining, which increases the throughput of a circuit at the expense of longer latency, and it is therefore suitable where throughput takes priority over latency. Two synchronous pipelining approaches, conventional pipelining and wave pipelining, are commonly employed. Conventional pipelining uses registers to divide the circuit into shorter paths and synchronize among sub-blocks, while wave pipelining uses the delay of combinational elements to perform those tasks. As wave pipelining does not introduce additional registers, in principle, it can attain a higher throughput and lower power consumption. However, its throughput is limited by delay variations, while delay balancing often leads to increased power dissipation. This dissertation proposes a hybrid pipelining method called HyPipe, which divides the circuit into sub-blocks using conventional pipelining, and applies wave pipelining to each sub-block. Each sub-block is derived from a single base circuit, leading to a better delay balance and greater throughput than with heterogeneous circuits. Another requirement for wave pipelining to achieve high speed is short signal rise and fall times. Since CMOS wide-NAND and wide-NOR gates exhibit long rise and fall times and large delay variations, they should be decomposed. We show that the straightforward decomposition using alternating levels of NAND and NOR gates results in large delay variations. Therefore, we propose a new decomposition method using only one gate type. Our method reduces delay variations by up to 39%, and it is appropriate for wave pipelining based on standard-cells or sea-of-gates. We laid out a 4x4 HyPipe multiplier as a proof of concept and performed a post-layout SPICE simulation. The multiplier achieves a throughput of 4.17 billion multiplications per second or a clock period of 2.52 four-load inverter delays, which is almost twice the speed of any existing multiplier in the open literature. When the supply voltage is reduced to 1.2 V from 1.8 V, its power consumption is reduced from 76.2 mW to 18.2 mW while performing 2.33 billion multiplications per second. / Ph. D. high-speed CMOS VLSI pipelining wave pipelining arithmetic circuits
257	Gigahertz-Range Multiplier Architectures Using MOS Current Mode Logic (MCML) Srinivasan, Venkataramanujam 18 December 2003 (has links) The tremendous advancement in VLSI technologies in the past decade has fueled the need for intricate tradeoffs among speed, power dissipation and area. With gigahertz range microprocessors becoming commonplace, it is a typical design requirement to push the speed to its extreme while minimizing power dissipation and die area. Multipliers are critical components of many computational intensive circuits such as real time signal processing and arithmetic systems. The increasing demand in speed for floating-point co-processors, graphic processing units, CDMA systems and DSP chips has shaped the need for high-speed multipliers. The focus of our research for modern digital systems is two fold. The first one is to analyze a relatively unexplored logic style called MOS Current Mode Logic (MCML), which is a promising logic technique for the design of high performance arithmetic circuits with minimal power dissipation. The second one is to design high-speed arithmetic circuits, in particular, gigahertz-range multipliers that exploit the many attractive features of the MCML logic style. A small library of MCML gates that form the core components of the multiplier were designed and optimized for high-speed operation. The three 8-bit MCML multiplier architectures designed and simulated in TSMC 0.18 mm CMOS technology are: 3-2-tree architecture with ripple carry adder (Architecture I), 4-2-tree design with ripple carry adder (Architecture II) and 4-2-tree architecture with carry look-ahead adders (Architecture III). Architecture I operates with a maximum throughput of 4.76 GHz (4.76 Billion multiplications per second) and a latency of 3.78 ns. Architecture II has a maximum throughput of 3.3 GHz and a latency of 3 ns and Architecture III has a maximum throughput of 2 GHz and a latency of 3 ns. Architecture I achieves the highest throughput among the three multipliers, but it incurs the largest area and latency, in terms of clock cycle count as well as absolute delay. Although it is difficult to compare the speed of our multipliers with existing ones, due to the use of different technologies and different optimization goals, we believe our multipliers are among the fastest found in contemporary literature. / Master of Science VLSI design MCML High-speed circuit design High-speed multipliers
258	Fast Approximation Framework for Timing and Power Analysis of Ultra-Low-Voltage Circuits Rafeei, Lalleh 07 May 2012 (has links) Ultra-Low-Voltage operation, which can be considered an extreme case of voltage scaling, can greatly reduce the power consumption of circuits. Despite the fact that Ultra-Low-Voltage operation has been proven to be very effective by several successful prototypes in recent years, there is no fast, effective, and comprehensive technique for designers to estimate power and delay of a design operating in the Ultra-Low-Voltage region. While some frameworks and mathematical models exist to estimate power or delay, certain limitations exist, such as being applicable to either power or delay, or within a certain region of transistor operation. This thesis presents a simulation framework that can quickly and accurately characterize a circuit from nominal voltage all the way down into the subthreshold region. The framework uses the nominal frequency and power of a target circuit, which can be obtained using gate-level or transistor-level simulation tools as well as normalized ring oscillator curves to predict delay and power characteristics at lower operating voltages. A specific contribution of this thesis is to introduce a weighted average method, which is a major improvement to a previously published form of this framework. Another contribution is that the amount of process variation in ULV regions of a circuit can be estimated using the proposed framework. The weighted averages framework takes into account the types of gates that are used in the circuit and critical path to give a more accurate power and timing characterization. Despite being many orders of magnitude lower than the nominal voltage, the errors are no greater than 11.27 percent for circuit delay, 16.96 percent for active energy, and 4.86 percent for leakage power for the weighted averages technique. This is in contrast to the original framework which has a maximum error of 39.75, 17.60, and 8.90 percent for circuit delay, active energy, and leakage power, respectively. To validate our framework, a detailed analysis is given in the presence of a variety of design parameters such as fanout, transistor widths, et cetera. In addition, we also validate our framework for a range of sequential benchmark circuits. / Master of Science subthreshold approximation framework power timing CMOS VLSI Ultra-Low-Voltage
259	On the Characterization of Library Cells Sulistyo, Jos Budi 01 September 2000 (has links) In this work, a simplified method for performing characterization of a standard cell is presented. The method presented here is based on Synopsys models of cell delay and power dissipation, in particular the linear delay model. This model is chosen as it allows rapid characterization with a modest number of simulations, while still achieving acceptable accuracy. Additionally, a guideline for developing standard cell libraries for use with Synopsys synthesis and simulation tools and Cadence Placement-and-Routing tools is presented. A cell layout library, built in accordance with the presented guidelines, was laid out, and a test chip, namely a dual 4-bit counter, was built using the library to demonstrate the suitability of the method. / Master of Science VLSI Cadence Characterization Synopsys Timing Model Power Estimation Standard Cell
260	Development of a Low-Power SRAM Compiler Jagasivamani, Meenatchi 11 September 2000 (has links) Considerable attention has been paid to the design of low-power, high-performance SRAMs (Static Random Access Memories) since they are a critical component in both hand-held devices and high-performance processors. A key in improving the performance of the system is to use an optimum sized SRAM. In this thesis, an SRAM compiler has been developed for the automatic layout of memory elements in the ASIC environment. The compiler generates an SRAM layout based on a given SRAM size, input by the user, with the option of choosing between fast vs. low-power SRAM. Array partitioning is used to partition the SRAM into blocks in order to reduce the total power consumption. Experimental results show that the low-power SRAM is capable of functioning at a minimum operating voltage of 2.1 V and dissipates 17.4 mW of average power at 20 MHz. In this report, we discuss the implementation of the SRAM compiler from the basic component to the top-level SKILL code functions, as well as simulation results and discussion. / Master of Science SRAM compiler static generator VLSI Memory low-power RAM

Search results