Global ETD Search

91	Generic low power reconfigurable distributed arithmetic processor Liu, Zhenyu January 2009 Higher performance, lower cost, ever-smaller integrated circuit components, and higher chip packaging density are ongoing goals of the microelectronics and computer industry. As these goals are achieved, however, power consumption and flexibility are increasingly becoming bottlenecks that new Very Large-Scale Integration (VLSI) design technology must address. Modern systems need more energy to deliver the computational capability that rising requirements demand, and these requirements drive changes in standards, not only in audio and video broadcasting but also in communication, such as wireless connections and network protocols. High flexibility and low power consumption pull in opposite directions, yet combining them in one system is the ultimate goal of designers. A generic domain-specific low-power reconfigurable processor for the distributed arithmetic algorithm is presented in this dissertation. This domain-specific reconfigurable processor is highly efficient in area, power and delay, approaching the performance of an ASIC design while retaining the flexibility of programmable platforms. The architecture not only supports the typical distributed arithmetic algorithms found in most still picture compression standards and video conferencing standards, but can also implement other distributed arithmetic algorithms found in digital signal processing, telecommunication protocols and automatic control. In this processor, a simple reconfigurable low power control unit is implemented with good performance in area, power and timing.
The generic nature of the architecture makes it applicable to any small or medium-sized finite state machine; such machines serve as control units that implement complex system behaviour and are found in almost all engineering disciplines. Furthermore, to map target applications efficiently onto the proposed architecture, a new algorithm is introduced that searches for the best set of common shared terms, keeping the area and power consumption of the implementation low. The software implementation of this algorithm is presented; it can be used not only for the architecture proposed in this dissertation but also for any implementation of adder-based distributed arithmetic algorithms. In addition, several low power design techniques are applied in the architecture, such as an unsymmetrical design style that includes unsymmetrical interconnection arrangement, unsymmetrical PTB selection and unsymmetrical mapping of basic computing units. All these design techniques achieve substantial power savings, and it is believed that they can be extended to other low power designs and architectures. The processor presented in this dissertation can be used to implement complex, high performance distributed arithmetic algorithms for communication and image processing applications at low cost in area and power compared with traditional methods. 621.3815
92	Analyzing the Impact of Radiation-induced Failures in All Programmable System-on-Chip Devices / Avaliação do impacto de falhas induzidas pela radiação em dispositivos sistemas-em-chip totalmente programáveis Tambara, Lucas Antunes January 2017 The recent advance of the semiconductor industry has allowed the integration of complex components and system architectures into a single silicon die. Nowadays, state-of-the-art FPGAs include not only the programmable logic fabric but also hard-core parts, such as general-purpose processors, dedicated processing blocks, interfaces to various peripherals, on-chip bus structures, and analog blocks. These new devices are commonly called All Programmable System-on-Chip (APSoC) devices. One of the major concerns about radiation effects on APSoCs is that radiation-induced errors may have different probability and criticality in their heterogeneous hardware parts at both the device and design levels. For this reason, this work performs an in-depth investigation of radiation effects on APSoCs and of the correlation between the sensitivity of hardware and software resources and overall system performance.
Several static and dynamic experiments were performed on different hardware parts of an APSoC to better understand the trade-offs between reliability and performance of each part separately. Results show that there is a trade-off between design cross section and performance to be analyzed when developing a system on an APSoC. It is therefore essential to take into account each available design option and all the system parameters involved, such as the execution time and the workload of the system, and not only its cross section. As an example, results show that it is possible to increase the performance of a system by up to 5,000 times by changing its architecture, with a small increase in cross section (up to 8 times), significantly increasing the operational reliability of the system. This work also proposes a reliability analysis flow based on fault injection for estimating the reliability trend of hardware-only designs, software-only designs, and hardware and software co-designs. It aims to accelerate the search for the design scheme with the best trade-off between performance and reliability among the possible ones. The methodology takes into account four groups of parameters: area resources and performance; the number of output errors and critical bits; radiation measurements, such as static and dynamic cross sections; and Mean Workload Between Failures. The obtained results show that the proposed flow is a suitable method for estimating the reliability trend of system designs on APSoCs before radiation experiments. Microelectronics Digital circuits Radiation Processor Radiation effects Fault injection
93	A Fortran List Processor (FLIP) Fugal, Karl A. 01 May 1970 A series of Basic Assembler Language subroutines was developed and made available to the FORTRAN IV language processor, making list processing possible in a flexible and easily understood way. The subroutines create and maintain list structures in the computer's core storage. They are sufficiently general to permit FORTRAN programmers to tailor list processing routines to their own individual requirements. List structure sizes are limited only by the amount of core storage available. (61 pages) processing language fortran list processor FLIP Mathematics Statistics and Probability
94	Exploration of non-volatile magnetic memory for processor architecture / Exploration d'architecture de processeur à technologie mémoire non volatile MRAM Senni, Sophiane 14 December 2015 With the downscaling of complementary metal-oxide semiconductor (CMOS) technology, designing dense and energy-efficient systems-on-chip (SoC) is becoming a real challenge. Concerning density, reducing the CMOS transistor size faces manufacturing constraints while the cost increases exponentially. Regarding energy, a significant increase in power density and dissipation obstructs further improvement in performance. This issue is mainly due to the growth of the leakage current of CMOS transistors, which leads to an increase in static energy consumption. In current SoCs, more and more area is occupied by embedded volatile memories, such as static random access memory (SRAM) and dynamic random access memory (DRAM). As a result, a significant proportion of total power is spent in memory systems. In the past two decades, alternative memory technologies have emerged with attractive characteristics to mitigate the aforementioned issues. Among these technologies, magnetic random access memory (MRAM) is a promising candidate, as it combines high density and very low static power consumption with performance competitive with SRAM and DRAM. Moreover, MRAM is non-volatile. This capability, if present in embedded memories, has the potential to add new features to SoCs to enhance energy efficiency and reliability. In this thesis, an area, performance and energy exploration of embedding MRAM technology in the memory hierarchy of a processor architecture is investigated.
A first fine-grain exploration was made at cache level for multi-core architectures. A second study evaluated the possibility of designing a non-volatile processor integrating MRAM at register level. Within the context of the internet of things, new features and the benefits brought by non-volatility were investigated. MRAM Embedded processor Memory hierarchy
95	OPTO-VLSI PROCESSING FOR RECONFIGURABLE OPTICAL DEVICES POH, Chung, chungp@student.ecu.edu.au January 2006 Wavelength Division Multiplexing (WDM) optical fibre transmission systems have the potential to realise high-capacity data rates exceeding 10 Tb/s. The ability to reconfigure optical networks is a desirable attribute for future metro applications, where light paths can be set up or taken down dynamically as required in the network. The use of microelectronics in conjunction with photonics enables intelligence to be added to the high-speed capability of photonics, thus realising reconfigurable optical devices which can revolutionise optical telecommunications and many more application areas. In this thesis, we investigate and demonstrate the capability of Opto-VLSI processors to realise a reconfigurable WDM optical device with multiple functions, namely optical multiband filtering, optical notch filtering, and Reconfigurable Optical Add-Drop Multiplexing (ROADM). We review the potential technologies available for tunable WDM components, and discuss their advantages and disadvantages. We also develop a simple yet effective algorithm that optimises the performance of Opto-VLSI processors, and demonstrate experimentally the multi-function WDM devices employing Opto-VLSI processors. Finally, the feasibility of Opto-VLSI-based WDM devices in meeting the stringent requirements of the optical communications industry is discussed. Opto-VLSI processor optimisation tunable add-drop notch equalisation
96	A Domain Specific DSP Processor / En domänspecifik DSP-processor Tell, Eric January 2001 This thesis describes the design of a domain specific DSP processor. The thesis is divided into two parts. The first part gives some theoretical background, describes the different steps of the design process (both for DSP processors in general and for this project) and motivates the design decisions made for this processor. The second part is a nearly complete design specification. The intended use of the processor is as a platform for hardware acceleration units. Support for this has, however, not yet been implemented. Computer engineering DSP processor design CPU design
97	Scheduling of Batch Processors in Semiconductor Manufacturing – A Review Mathirajan, M., Appa Iyer, Sivakumar 01 1900 This paper reviews the scheduling of batch processors (SBP) in semiconductor manufacturing (SM). It classifies SBP in SM into 12 groups. The suggested classification scheme organizes the SBP in SM literature and summarizes the current research results for different problem types. The classification results are presented based on various distributions, and the methodologies applied to SBP in SM are briefly highlighted. A comprehensive list of references is presented. It is hoped that this review will provide a source for other researchers and readers interested in SBP in SM research and help stimulate further interest. / Singapore-MIT Alliance (SMA) scheduling batch processor semiconductor manufacturing system review classification
98	The Named-State Register File Nuth, Peter R. 01 August 1993 This thesis introduces the Named-State Register File, a fine-grain, fully-associative register file. The NSF allows fast context switching between concurrent threads as well as efficient sequential program performance. The NSF holds more live data than conventional register files, and requires less spill and reload traffic to switch between contexts. This thesis demonstrates an implementation of the Named-State Register File and estimates the access time and chip area required for different organizations. Architectural simulations of large sequential and parallel applications show that the NSF can reduce execution time by 9% to 17% compared to alternative register files. multithreaded context switch register fully-associative sthread parallel processor
99	Asymmetric clustering using a register cache Morrison, Roger Allen 18 April 2006 Graduation date: 2006 / Conventional register files spread porting resources uniformly across all registers. This paper proposes a method called Asymmetric Clustering using a Register Cache (ACRC). ACRC utilizes a fast register cache that concentrates valuable register file ports on the most active registers, thereby reducing the total register file area and power consumption. A cluster of functional units and a highly ported register cache execute the majority of instructions, while a second cluster with a full register file having fewer read ports processes instructions with source registers not found in the register cache. An ‘in-cache’ marking system tracks the contents of the register cache and routes instructions to the correct cluster. This system utilizes logic similar to the ‘ready’ bit system found in wake-up and select logic, keeping the additional logic required to a minimum. When using a 256-entry register file, this design reduces the total register file area by an estimated 65% while exhibiting similar IPC performance compared to a non-clustered 8-way processor. As the feature size becomes smaller and processor clocks become faster, the number of clock cycles needed to access the register file will increase. Therefore, the smaller register file area requirement and consequently smaller register file delay of ACRC will lead to better IPC performance than conventional processors. register cluster cache architecture computer processor Cache memory Registers (Computers)
100	Modeling and Optimization of Delay and Power for Key Components of Modern High-performance Processors Safi, Elham 13 April 2010 In designing a new processor, computer architects consider a myriad of possible organizations and designs to decide which best meets the constraints on performance, power and cost for each particular processor. To identify practical designs, architects need to have insight into the physical-level characteristics (delay, power and area) of various components of modern processors implemented in recent fabrication technologies. During early stages of design exploration, however, developing physical-level implementations for various design options (often on the order of thousands) is impractical or undesirable due to time and/or cost constraints. In lieu of actual measurements, analytical and/or empirical models can offer reasonable estimates of these physical-level characteristics. However, existing models tend to be outdated for three reasons: (i) they have been developed based on old circuits in old fabrication technologies; (ii) the high-level designs of the components have evolved and older designs may no longer be representative; and (iii) the overall architecture of processors has changed significantly, and new components for which no models exist have been introduced or are being considered. This thesis studies three key components of modern high-performance processors: Counting Bloom Filters (CBFs), Checkpointed Register Alias Tables (RATs), and Compacted Matrix Schedulers (CMSs). CBFs optimize membership tests (e.g., whether a block is cached). RAT and CMS increase the opportunities for exploiting instruction-level parallelism; RAT is the core of the renaming stage, and CMS is an implementation for the instruction scheduler. Physical-level studies or models for these components have been limited or non-existent.
In addition to investigating these components at the physical level, this thesis (i) proposes a novel speed- and energy-efficient CBF implementation; (ii) studies how the number of RAT checkpoints affects its latency and energy, and overall processor performance; and, (iii) studies the CMS and its accompanying logic at the physical level. This thesis also develops empirical and analytical latency and energy models that can be adapted for newer fabrication technologies. Additionally, this thesis proposes physical-level latency and energy optimizations for these components motivated by design inefficiencies exposed during the physical-level study phase. Processor design Computer architecture Physical-level implementation 0544
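Entry 100 describes Counting Bloom Filters as structures that optimize membership tests, such as whether a block is cached. The abstract does not give an implementation, so the following is only a minimal software sketch of the general data structure, not the hardware design studied in the thesis. The class name, the default sizes, and the use of salted SHA-256 as the hash family are all assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose slots are counters, so items can also be removed.

    Illustrative sketch only: sizes and hashing scheme are assumptions.
    """

    def __init__(self, num_counters=64, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, item):
        # Derive num_hashes slot indexes by salting a single SHA-256 hash.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def add(self, item):
        # Increment every counter the item hashes to.
        for i in self._indexes(item):
            self.counters[i] += 1

    def remove(self, item):
        # Only meaningful for items that were previously added.
        for i in self._indexes(item):
            if self.counters[i] > 0:
                self.counters[i] -= 1

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.counters[i] > 0 for i in self._indexes(item))


cbf = CountingBloomFilter()
cbf.add("block_0x40")            # hypothetical cache block identifier
present_after_add = cbf.might_contain("block_0x40")
cbf.remove("block_0x40")
present_after_remove = cbf.might_contain("block_0x40")
```

A negative answer is always exact, while a positive answer may be a false positive caused by other items sharing the same counters. Using counters instead of single bits is what allows removal, at the cost of extra storage per slot.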

Search results