131

LEAST-RECENTLY-USED (LRU) CIRCUIT DESIGN FOR PRIORITIZED CACHE

Eaton, Ronald 01 December 2014 (has links)
In modern embedded systems, real-time applications are often executed on multi-core platforms that also run applications which are not real-time critical. It is well known that cache sharing among the cores of a multi-core system, or among concurrent threads running on a single CPU, can delay the execution of real-time applications, which makes worst-case execution time (WCET) prediction for those applications more difficult. An encouraging approach to this problem is the prioritized cache. Currently, prioritized caches are implemented at the architectural level using cache controllers. This thesis focuses on implementing two prioritized LRU (least-recently-used) replacement-policy circuits inside the cache circuit itself to support prioritized cache operation, which decreases cache latency. The circuits are implemented using the Synopsys 28 nm EDK, and based on these implementations the area and power overheads associated with the prioritized cache are investigated. Two prioritized LRU circuit designs are presented.
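
To make the replacement policy concrete, the following is a minimal behavioural sketch in Python of a prioritized LRU policy, not a reproduction of the thesis's circuit designs: within a cache set, the victim is the least-recently-used line among the lines with the lowest priority, so high-priority (real-time) lines are evicted only when no lower-priority line remains. The class and method names are illustrative.

    from collections import OrderedDict

    class PrioritizedLRUSet:
        """One cache set: evict the LRU line among the lowest-priority lines."""

        def __init__(self, ways):
            self.ways = ways
            self.lines = OrderedDict()   # tag -> priority, ordered LRU -> MRU

        def access(self, tag, priority=0):
            """Return True on a hit; on a miss, insert tag, evicting if the set is full."""
            if tag in self.lines:
                self.lines.move_to_end(tag)        # refresh recency on a hit
                return True
            if len(self.lines) >= self.ways:
                # victim = least-recently-used line among those with minimum priority
                min_prio = min(self.lines.values())
                victim = next(t for t, p in self.lines.items() if p == min_prio)
                del self.lines[victim]
            self.lines[tag] = priority              # insert as most-recently-used
            return False

    # usage sketch: low-priority traffic cannot evict the high-priority line
    s = PrioritizedLRUSet(ways=2)
    s.access("rt_data", priority=1)                 # real-time line
    for t in ("a", "b", "c"):
        s.access(t, priority=0)                     # background traffic
    assert "rt_data" in s.lines
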
132

Compilação para arquitetura reconfigurável (Compilation for a reconfigurable architecture)

Silva, Antonio Carlos Fernandes da. January 2009 (has links)
Advisor: Renata Spolon Lobato / Committee: Aleardo Manacero Junior / Committee: Jorge Luiz e Silva / Abstract (translated from the Portuguese): Reconfigurable computing appears as a viable alternative to the growing demand for performance in computing systems. Given the rapid growth of research in this area, tools that aid the development or migration of applications to architectures supporting this new paradigm become increasingly necessary. In this context, this work presents the development of a compiler for a reconfigurable architecture, built on the Phoenix framework, whose goal is to generate code for the Nios II. The Nios II is a virtual RISC processor that can run on an FPGA. The results obtained during this work demonstrate the compiler's viability and its usefulness for generating applications for reconfigurable platforms. / Abstract: Reconfigurable computing appears as a possible alternative to the growing demand for performance in computing systems. Due to the large amount of research in this area, tools that aid the development or migration of applications to architectures supporting this new paradigm become increasingly necessary. In this context, this work presents the development of a compiler for a reconfigurable architecture, based on the Phoenix framework, that aims to generate code for the Nios II, a virtual RISC processor that can be implemented on an FPGA. The results obtained during this work demonstrate its practicability and utility for generating applications for reconfigurable hardware. / Master's
133

A quantitative evaluation of data compression in the memory hierarchy

Kjelso, Morten January 1997 (has links)
This thesis explores the use of lossless data compression in the memory hierarchy of contemporary computer systems. Data compression may realise performance benefits by increasing the capacity of a level in the memory hierarchy and by improving the bandwidth between two levels in the memory hierarchy. Lossless data compression is already widely used in parts of the memory hierarchy. However, most of these applications are characterised by targeting inexpensive and relatively low-performance devices such as magnetic disk and tape devices. The consequence is that the benefits of data compression are not realised to their full potential. This research aims to understand how the benefits of data compression can be realised for levels of the memory hierarchy which have a greater impact on system performance and system cost. This thesis presents a review of data compression in the memory hierarchy and argues that main memory compression has the greatest potential to improve system performance. The review also identifies three key issues relating to the use of data compression in the memory hierarchy. Quantitative investigations are presented to address these issues for main memory data compression. The first investigation is into memory data, and shows that memory data from a range of Unix applications typically compresses to half its original size. The second investigation develops three memory compression architectures, taking into account the results of the previous investigation. Furthermore, the management of compressed data is addressed, and management methods are developed which achieve storage efficiencies in excess of 90% and typically complete allocation and deallocation operations with only a few memory accesses. The experimental work then culminates in a performance investigation. This shows that when memory resources are stretched, hardware-based memory compression can improve system performance by up to an order of magnitude. Furthermore, software-based memory compression can improve system performance by up to a factor of 2. Finally, the performance models and quantitative results contained in this thesis enable us to identify under what conditions memory compression offers performance benefits. This may help designers incorporate memory compression into future computer systems.
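
The compressibility measurements described above can be imitated with a small Python sketch, using zlib as a stand-in lossless compressor rather than the thesis's own compression schemes; the synthetic page contents below are illustrative assumptions, not the thesis's Unix memory traces.

    import os
    import zlib

    def compression_ratio(buf: bytes, level: int = 6) -> float:
        """Compressed size as a fraction of the original size (lower is better)."""
        return len(zlib.compress(buf, level)) / len(buf)

    # usage sketch on synthetic "memory pages": zero-filled and pointer-like data
    # compress far better than random data, which is roughly incompressible.
    pages = {
        "zero page": bytes(4096),
        "pointer-heavy page": b"\x00\x00\x7f\xfe" * 1024,
        "random page": os.urandom(4096),
    }
    for name, page in pages.items():
        print(f"{name}: compresses to {compression_ratio(page):.2f} of original size")
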
134

Architectures and limits of GPU-CPU heterogeneous systems

Wong, Henry Ting-Hei 11 1900 (has links)
As we continue to be able to put an increasing number of transistors on a single chip, the perpetual question of what the best processor to build with those transistors would be remains open. Past work has shown that heterogeneous multiprocessor systems provide benefits in performance and efficiency. This thesis explores heterogeneous systems composed of a traditional sequential processor (CPU) and highly parallel graphics processors (GPU). It presents a tightly-coupled heterogeneous chip multiprocessor architecture for general-purpose non-graphics computation and a limit study exploring the potential benefits of GPU-like cores for accelerating a set of general-purpose workloads. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with GMA X4500 GPU cores. Pangaea introduces a resource partitioning of the GPU, where 3D-graphics-specific hardware is removed to reduce area or to add more processing cores, and a three-instruction extension to the IA32 ISA that supports fast communication between CPU and GPU by building user-level interrupts on top of existing cache coherency mechanisms. By removing graphics-specific hardware on a 65 nm process, the area saved is equivalent to 9 GPU cores, while the power saved is equivalent to 5 cores. Our FPGA prototype shows thread spawn latency improvements from thousands of clock cycles to 26. A set of non-graphics workloads demonstrates speedups of up to 8.8x. This thesis also presents a limit study in which we measure how much algorithm parallelism can usefully be extracted from a set of general-purpose applications in the context of a heterogeneous system. We measure sensitivity to the sequential performance (register read-after-write latency) of the low-cost parallel cores, and to the latency and bandwidth of the communication channel between the two kinds of cores. Using these measurements, we propose system characteristics that maximize area and power efficiencies. As in previous limit studies, we find a high amount of parallelism. We show, however, that the potential speedup on GPU-like systems is low (2.2x - 12.7x) due to poor sequential performance. Communication latency and bandwidth have comparatively small performance effects (<25%). Optimal area efficiency calls for a parallel processor that is lower-cost than today's GPUs, while optimal power efficiency calls for one that is higher-performance. / Faculty of Applied Science / Department of Electrical and Computer Engineering / Graduate
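
The intuition behind the limit-study findings can be captured with a first-order, Amdahl-style model, sketched below in Python. This is an illustrative approximation under assumed parameter values, not the thesis's actual measurement methodology.

    def hetero_speedup(par_frac, slowdown, n_cores, comm_overhead=0.0):
        """
        First-order speedup from offloading the parallelisable fraction of a program
        to n_cores simple cores that run sequential code 'slowdown' times slower
        than the CPU, plus a fixed communication overhead (fraction of CPU runtime).
        """
        serial = 1.0 - par_frac
        parallel = par_frac * slowdown / n_cores
        return 1.0 / (serial + parallel + comm_overhead)

    # usage sketch: even with 90% parallelism and many cores, the poor sequential
    # performance of the simple cores caps the achievable speedup...
    print(hetero_speedup(par_frac=0.90, slowdown=5.0, n_cores=64))
    # ...while communication costs have a comparatively small additional effect.
    print(hetero_speedup(par_frac=0.90, slowdown=5.0, n_cores=64, comm_overhead=0.02))
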
135

Multiprocessor/Multicomputer Systems and Optimal Loading Techniques

Adams, Francis D. 01 July 1980 (has links) (PDF)
This report reviews multiprocessor/multicomputer systems and optimal loading techniques. It covers: 1. the interrelationship of multiprocessor/multicomputer (Multiple Instruction stream, Multiple Data stream, MIMD) systems and other architectures, by presenting a categorization of computer architectures; 2. a comparison of multiprocessor/multicomputer (MIMD) systems versus parallel processor (Single Instruction stream, Multiple Data stream, SIMD) systems; 3. multiprocessor/multicomputer problems, pitfalls, and new goals; and 4. an investigation of loading techniques through a review of particular MIMD executive designs.
136

Compiler-Directed Error Resilience for Reliable Computing

Liu, Qingrui 08 August 2018 (has links)
Error resilience has become as important as power and performance in modern computing architectures. There are various sources of errors that can paralyze real-world computing systems. Of particular interest to this dissertation are single-event errors. They can be the result of an energetic particle strike or an abrupt power outage that corrupts program state, leading to system failures. Specifically, energetic particle strikes are the major cause of soft errors, while an abrupt power outage can result in memory inconsistency in nonvolatile memory systems. Unfortunately, existing techniques to handle these single-event errors are either resource-consuming (e.g., hardware approaches) or heavyweight (e.g., software approaches). To address this problem, this dissertation identifies idempotent processing as an alternative recovery technique that handles system failures in an efficient and low-cost manner. This dissertation first proposes a compiler-directed lightweight methodology which leverages idempotent processing and state-of-the-art sensor-based detection to achieve soft-error resilience at low cost. It also introduces a lightweight soft-error-tolerant hardware design that redefines idempotent processing so that idempotent regions can be created, verified, and recovered from the processor's point of view. Furthermore, this dissertation proposes a series of compiler optimizations that significantly reduce the hardware and runtime overhead of idempotent processing. Lastly, it proposes a failure-atomic system integrated with idempotent processing to resolve another type of single-event error: failure-induced memory inconsistency in nonvolatile memory systems. / Ph. D. / Our computing systems are vulnerable to different kinds of errors, all of which can potentially crash real-world computing systems. This dissertation specifically addresses the challenges of single-event errors. Single-event errors can be caused by energetic particle strikes or by an abrupt power outage that corrupts program state, leading to system failures. Unfortunately, existing techniques to handle these single-event errors are expensive in terms of hardware or software. To address this problem, this dissertation leverages an interesting property of programs called idempotence. A region of code is idempotent if and only if it always generates the same output whenever the program jumps back to the region entry from any execution point within the region. We can therefore use idempotence as a low-cost recovery technique: system failures are recovered by jumping back to the beginning of the region in which the error occurred. This dissertation proposes solutions that incorporate the idempotence property for resilience against such single-event errors. Furthermore, it introduces a series of optimization techniques, with compiler and hardware support, that improve the efficiency and reduce the overheads of error resilience. We believe the techniques proposed in this dissertation can inspire future error-resilience research.
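
As an informal illustration of the idempotence property described above, the Python sketch below contrasts a region that can safely be re-executed from its entry after a detected error with one that cannot. It is a conceptual example only, not the dissertation's compiler analysis or hardware design; the function names and the error-detection hook are hypothetical.

    def saxpy_region(x, y, a, out):
        """Idempotent region: the inputs (x, y) are never overwritten, so re-executing
        from the region entry after a detected error always produces the same result."""
        for i in range(len(x)):
            out[i] = a * x[i] + y[i]

    def scale_in_place(x, a):
        """NOT idempotent: x is both read and written, so re-executing after a
        partial run would scale some elements twice."""
        for i in range(len(x)):
            x[i] = a * x[i]

    def run_with_reexecution(region, *args, error_detected=lambda: False):
        """Sketch of sensor-based recovery: if an error is flagged while the region
        runs, jump back to the region entry and execute it again."""
        region(*args)
        if error_detected():
            region(*args)          # safe only if the region is idempotent
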
137

The architecture and design of a high level language processor

Brees, Roger. January 1984 (has links)
Call number: LD2668 .T4 1984 B73 / Master of Science
138

PERFORMANCE OF HIERARCHICALLY FLEXIBLE ADAPTIVE COMPUTER ARCHITECTURE APPLIED TO SORTING PROBLEMS

Ferng, Ming-Jehn, 1958- January 1987 (has links)
In this thesis, existing models of adaptive computer architecture were modified to map actual sorting problems onto a "divide 'n' conquer" (DQ) coordinator-type configuration in which the number of child processors was expanded from three to four. Two hire/fire strategies, one based on the number of packets waiting in the queue and the other on the average turnaround time, were applied to maintain the hierarchical tree structure. More than 1200 simulation runs were analyzed and compared, showing that the first strategy was best at fast packet arrival rates and the second strategy was best at slow packet arrival rates. Comparing the hire/fire signal-generation policies, "fc-root" was best and "root-fp" was worst. When comparing the effect of variable weighting factors in the processors, using a smaller weighting factor in either the "partitioner" (for the first strategy) or the "f-computer" (for the second strategy) may improve system performance. (Abstract shortened with permission of author.)
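
The two hire/fire strategies can be pictured with a small decision sketch in Python. The thresholds, tolerance, and function names are hypothetical illustrations and are not taken from the thesis, which also models signal-generation points ("fc-root", "root-fp") not shown here.

    def hire_fire_by_queue(queue_len, hire_thresh=8, fire_thresh=1):
        """Strategy 1: decide from the number of packets waiting in the queue."""
        if queue_len > hire_thresh:
            return "hire"      # add another child processor to the tree
        if queue_len < fire_thresh:
            return "fire"      # release an idle child processor
        return "keep"

    def hire_fire_by_turnaround(avg_turnaround, target, tolerance=0.2):
        """Strategy 2: decide from the average packet turnaround time."""
        if avg_turnaround > target * (1 + tolerance):
            return "hire"
        if avg_turnaround < target * (1 - tolerance):
            return "fire"
        return "keep"
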
139

Evaluation of Instruction Prefetch Methods for Coresonic DSP Processor

Lind, Tobias January 2016 (has links)
With increasing demands on mobile communication transfer rates, the circuits in mobile phones must be designed for higher performance while maintaining low power consumption for increased battery life. One possible way to improve an existing architecture is to implement instruction prefetching. By predicting ahead of time which instructions will be executed, instructions can be prefetched from memory to increase performance, and instructions that will soon be executed again can be stored temporarily to avoid fetching them from memory multiple times. By creating a trace-driven simulator, the existing hardware can be simulated while running a realistic scenario, and different instruction prefetch methods can be implemented in this simulator to measure how they perform. It is shown that the execution time can be reduced by up to five percent and the number of memory accesses by up to 25 percent with a simple loop buffer and return stack. The execution time can be reduced even further with more complex methods such as branch target prediction and branch condition prediction.
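
A trace-driven evaluation of this kind can be sketched in a few lines of Python. The trace format, structure sizes, and hit rules below are simplifying assumptions for illustration and do not reflect the Coresonic DSP hardware or the thesis's simulator.

    from collections import deque

    def simulate(trace, loop_buffer_size=64, return_stack_depth=8):
        """Count instruction fetches served from a small loop buffer (recently fetched
        addresses) or from a return stack (predicted return addresses) versus fetches
        that must go to memory. Each trace entry is (kind, addr, target)."""
        loop_buffer = deque(maxlen=loop_buffer_size)
        return_stack = []
        memory_fetches = hits = 0
        for kind, addr, target in trace:
            if addr in loop_buffer or (return_stack and return_stack[-1] == addr):
                hits += 1                          # served without a memory access
            else:
                memory_fetches += 1
                loop_buffer.append(addr)
            if kind == "call" and len(return_stack) < return_stack_depth:
                return_stack.append(target)        # remember the return address
            elif kind == "ret" and return_stack:
                return_stack.pop()
        return memory_fetches, hits
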
140

Using machine-learning to efficiently explore the architecture/compiler co-design space

Dubach, Christophe January 2009 (has links)
Designing new microprocessors is a time-consuming task. Architects rely on slow simulators to evaluate performance, and a significant proportion of the design space has to be explored before an implementation is chosen. This process becomes even more time consuming when compiler optimisations are also considered: once the architecture is selected, a new compiler must be developed and tuned. What is needed are techniques that can speed up this whole process and develop a new optimising compiler automatically. This thesis proposes the use of machine-learning techniques to address architecture/compiler co-design. First, two performance models are developed and used to efficiently search the design space of a microarchitecture. These models accurately predict performance metrics such as cycles or energy, or a tradeoff of the two. The first model uses just 32 simulations to model the entire design space for new applications, an order of magnitude fewer than state-of-the-art techniques. The second model addresses offline training costs and predicts the average behaviour of a complete benchmark suite. Compared to the state of the art, it needs five times fewer training simulations when applied to the SPEC CPU 2000 and MiBench benchmark suites. Next, the impact of compiler optimisations on the design process is considered. This has the potential to change the shape of the design space and improve performance significantly. A new model is proposed that predicts the performance obtainable by an optimising compiler for any design point, without having to build the compiler. Compared to the state of the art, this model achieves a significantly lower error rate. Finally, a new machine-learning optimising compiler is presented that predicts the best compiler optimisation settings for any new program on any new microarchitecture. It achieves an average speedup of 1.14x over the default best gcc optimisation level, which represents 61% of the maximum speedup available, using just one profile run of the application.
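
The overall workflow of predicting the design space from a handful of simulations can be sketched in Python as below. A plain linear least-squares fit is used here only as a stand-in predictor; the thesis's models, features, and the simulator output are not reproduced, and the parameter ranges are made up for illustration.

    import numpy as np

    def fit_performance_model(X_sampled, y_sampled):
        """Fit a simple linear model, metric ≈ X @ w + b, from a few simulated design
        points (rows of X are architectural parameters, y the measured metric)."""
        X = np.column_stack([X_sampled, np.ones(len(X_sampled))])   # add bias term
        w, *_ = np.linalg.lstsq(X, y_sampled, rcond=None)
        return w

    def predict(w, X_new):
        X = np.column_stack([X_new, np.ones(len(X_new))])
        return X @ w

    # usage sketch: simulate only a few design points, then predict the rest
    rng = np.random.default_rng(0)
    design_space = rng.uniform(1, 8, size=(1000, 4))   # e.g. cache ways, issue width, ...
    sampled = design_space[:32]                        # the handful of simulated points
    y = sampled @ np.array([3.0, -1.0, 0.5, 2.0]) + 10 # stand-in for simulator output
    w = fit_performance_model(sampled, y)
    estimates = predict(w, design_space)               # predicted metric for all points
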
