161

Adaptive memory hierarchies for next generation tiled microarchitectures

Herrero Abellanas, Enric 05 July 2011
Processor performance and memory performance have improved at different rates over the last decades, limiting overall processor performance and creating the well-known "memory gap". Bridging this performance difference is an important research field, and new solutions must be proposed in order to build better processors in the future. Several solutions exist, such as caches, which reduce the impact of longer memory accesses and make up the system memory hierarchy. However, most existing memory hierarchy organizations were designed for single processors or traditional multiprocessors. Nowadays, the increasing number of available transistors has enabled the emergence of chip multiprocessors (CMPs), which have different constraints and require new, purpose-built memory systems able to manage memory resources efficiently. Therefore, in this thesis we focus on improving the performance and energy efficiency of the memory hierarchy of chip multiprocessors, ranging from caches to DRAM memories.

In the first part of this thesis we study traditional cache organizations such as shared or private caches and show that they behave well only for some applications, so an adaptive system is desirable. State-of-the-art techniques such as Cooperative Caching (CC) combine the benefits of both worlds, but CC requires a centralized coherence structure and has a high energy consumption. We therefore propose Distributed Cooperative Caching (DCC), a mechanism that provides coherence to chip multiprocessors and applies the concept of cooperative caching in a distributed way. Through the use of distributed directories we obtain a more scalable solution that, in addition, has a more flexible and energy-efficient tag allocation method.

We also show that applications make different uses of the cache and that an efficient allocation can take advantage of unused resources. We propose Elastic Cooperative Caching (ElasticCC), an adaptive cache organization able to redistribute cache resources dynamically depending on application requirements. One of the most important contributions of this technique is that adaptivity is fully managed by hardware and that all repartitioning mechanisms are based on distributed structures, allowing better scalability. ElasticCC is able not only to repartition cache space according to application requirements, but also to adapt dynamically to the different execution phases of each thread. Our experimental evaluation also shows that the cache partitioning provided by ElasticCC is efficient and almost matches the off-chip miss rate of a configuration that doubles the cache space.

Finally, we focus on the behavior of DRAM memories and memory controllers in chip multiprocessors. Although traditional memory schedulers work well for uniprocessors, we show that the new access patterns call for a redesign of parts of the DRAM subsystem. Several DRAM scheduler organizations have been proposed for multiprocessors; however, all of them must trade off memory throughput against fairness. We propose Thread Row Buffers (TRBs), an extended storage area in DRAM memories able to store a data row for each thread. This mechanism enables fair memory-access scheduling without hurting memory throughput. Overall, in this thesis we present new organizations for the memory hierarchy of chip multiprocessors that focus on the scalability of the proposed structures and on adaptivity to application behavior. Results show that the presented techniques provide better performance and energy efficiency than existing state-of-the-art solutions.
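To make the elastic repartitioning idea concrete, here is a minimal sketch in C of how a per-node controller could grow or shrink its private cache partition once per epoch. The way count, reuse thresholds, and field names are illustrative assumptions, not the actual ElasticCC hardware mechanism evaluated in the thesis.

    /* Sketch only: an assumed epoch-based repartitioning heuristic, not ElasticCC itself. */
    #include <stdio.h>

    #define WAYS 16                /* ways in a node's local cache bank (assumed) */

    struct node_stats {
        unsigned local_hits;       /* epoch hits to blocks this node allocated   */
        unsigned local_misses;     /* epoch misses issued by this node           */
        int private_ways;          /* ways currently reserved for local data     */
    };

    /* Grow the private partition when the node reuses its own data heavily;
     * shrink it, donating ways to the shared pool, when it does not. */
    static void repartition(struct node_stats *n)
    {
        unsigned accesses = n->local_hits + n->local_misses;
        if (accesses == 0)
            return;
        double reuse = (double)n->local_hits / accesses;
        if (reuse > 0.75 && n->private_ways < WAYS)        /* thresholds assumed */
            n->private_ways++;
        else if (reuse < 0.25 && n->private_ways > 1)
            n->private_ways--;
        n->local_hits = n->local_misses = 0;               /* start a new epoch  */
    }

    int main(void)
    {
        struct node_stats node = { .local_hits = 900, .local_misses = 100, .private_ways = 8 };
        repartition(&node);
        printf("private ways after epoch: %d\n", node.private_ways);
        return 0;
    }

A real implementation would make this decision in hardware with distributed counters, as the abstract describes; the sketch only conveys the shape of the per-epoch decision.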
162

The Design and Qualification of a Hydraulic Hardware-in-the-Loop Simulator

Driscoll, Scott Crawford 20 May 2005
The goal of this work was to design and evaluate a hydraulic Hardware-in-the-Loop (HIL) simulation system based around electric and hydraulic motors. The idea behind HIL simulation is to install real hardware within a physically emulated environment, so that genuine performance can be assessed without the expense of final assembly testing. In this case, coupled electric and hydraulic motors were used to create the physical environment emulation by imparting flows and pressures on test hardware. Typically, servo-valves are used for this type of hydraulic emulation, and one of the main purposes of this work was to compare the effectiveness of using motors instead of the somewhat standard servo-valve. Towards this end, a case study involving a Sauer Danfoss proportional valve and emulation of a John Deere backhoe cylinder was undertaken. The design of speed and pressure controllers used in this emulation is presented, and results are compared to data from a real John Deere backhoe and proportional valve. While motors have a substantially lower bandwidth than servo-valves due to their inertia, they have the ability to control pressure at zero and near-zero flows, which is fundamentally impossible for valves. The limitations and unique capabilities of motors are discussed with respect to characteristics of real hydraulic systems.
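As a rough illustration of the kind of pressure controller such an emulator needs, the following is a minimal sketch of a discrete PI loop that commands motor speed to track a pressure reference. The gains, limits, sample time, and the crude first-order plant are invented for illustration and are not the controllers designed in this thesis.

    /* Sketch only: discrete PI pressure tracking with simple anti-windup. */
    #include <stdio.h>

    struct pi_ctrl {
        double kp, ki;             /* gains (assumed values)      */
        double integ;              /* integrator state            */
        double out_min, out_max;   /* motor speed command limits  */
    };

    /* One control step: pressure error in, motor speed command out. */
    static double pi_step(struct pi_ctrl *c, double p_ref, double p_meas, double dt)
    {
        double err = p_ref - p_meas;
        c->integ += c->ki * err * dt;
        double u = c->kp * err + c->integ;
        if (u > c->out_max) { c->integ -= c->ki * err * dt; u = c->out_max; }  /* anti-windup */
        if (u < c->out_min) { c->integ -= c->ki * err * dt; u = c->out_min; }
        return u;
    }

    int main(void)
    {
        struct pi_ctrl c = { .kp = 12.0, .ki = 40.0, .out_min = 0.0, .out_max = 3000.0 };
        double p = 0.0;                              /* placeholder plant state    */
        for (int k = 0; k < 5; k++) {
            double u = pi_step(&c, 100.0, p, 0.001); /* track a 100-unit reference */
            p += 0.02 * (u - p);                     /* crude first-order response */
            printf("step %d: command %.1f, pressure %.1f\n", k, u, p);
        }
        return 0;
    }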
163

Software Design of A Cost/Performance Estimation Method for Hardware/Software Partitioning

Huang, Yau-Shian 01 October 2001
In the age of deep-submicron VLSI, we can integrate various system applications on a single chip. Such a system-on-chip design contains ASIC circuitry, a processor core together with its software components, and dedicated hardware modules. During system design we must choose the form of execution for the various system functions; this choice is called hardware/software partitioning. Different hardware/software partitionings affect the achievable cost and performance of the resulting system-chip designs. In this research we explore the research and software-design issues of an estimation method for hardware/software partitioning. It consists of the following tasks: software scheduling, hardware/software co-scheduling, and cost and performance estimation for hardware/software partitioning. For a given system description, a chosen hardware/software partitioning, and a set of allocated resources, we can perform the corresponding cost and performance estimation, whose results can be used directly in system design or called by a hardware/software partitioning optimization program. We designed experimental software for this estimation method and carried out a set of experiments based upon real and synthesized design cases.
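To show the shape of such an estimate, here is a minimal sketch, under strongly simplifying assumptions (additive latency, an area-based cost model, and an invented task set), of how a cost/performance figure can be computed for one candidate partition; it is not the estimation method designed in this thesis.

    /* Sketch only: evaluate latency and cost of one hardware/software partition. */
    #include <stdio.h>

    struct task {
        const char *name;
        double sw_time;   /* execution time on the processor core (ms) */
        double hw_time;   /* execution time as a hardware module (ms)  */
        double hw_area;   /* silicon cost if implemented in hardware   */
    };

    /* partition[i] == 1 places task i in hardware, 0 keeps it in software. */
    static void estimate(const struct task *t, const int *partition, int n,
                         double *latency, double *cost)
    {
        *latency = 0.0;
        *cost = 100.0;                    /* assumed fixed cost of the CPU core */
        for (int i = 0; i < n; i++) {
            if (partition[i]) {
                *latency += t[i].hw_time;
                *cost    += t[i].hw_area;
            } else {
                *latency += t[i].sw_time;
            }
        }
    }

    int main(void)
    {
        struct task tasks[] = {
            { "fft",    4.0, 0.5, 30.0 },
            { "decode", 2.0, 0.4, 25.0 },
            { "ui",     1.0, 0.9, 40.0 },
        };
        int partition[] = { 1, 0, 0 };    /* only "fft" moved to hardware */
        double latency, cost;
        estimate(tasks, partition, 3, &latency, &cost);
        printf("latency %.1f ms, cost %.1f units\n", latency, cost);
        return 0;
    }

An optimization program would call such an estimator once per candidate partition and search for the best latency/cost trade-off.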
164

The Design and Implementation of Hardware-based Packet Forwarding Mechanism on Web Cluster

Lee, Chih-Feng 29 July 2002
In recent years the Internet and web services have become the most popular platform and application of the client-server model, owing to the ubiquity of the network. Their growth has been extremely fast: many traditional services are migrating to the web stage by stage, and the load on servers keeps growing heavier, so server architectures must adapt accordingly. The web-cluster architecture best meets the requirements of scalability, reliability, and high performance and has been used extensively. We previously designed and implemented a mechanism termed the Content-aware Distributor, a kernel-level software module that effectively supports content-based routing. Building on that software-based Content-aware Distributor, this work offloads some highly repetitive, fixed tasks from the software module to a hardware module, expecting the hardware to share the software module's load and speed up packet processing. Guided by an analysis of the software module, we design and implement the hardware-based packet-forwarding mechanism by partitioning three major functions into three engines: the Analyze Engine, which identifies and analyzes packet headers and decides whether a packet should be sent to the upper layer or forwarded; the Lookup Engine, which looks up the entry in the table that stores the packet-modification data; and the Update Engine, which modifies the packet header as quickly as possible and transfers the packet to the send queue. We use an algorithm termed Patch to compute checksums quickly, which makes the modification independent of packet length. The design is implemented in Verilog HDL using EDA tools from Altera Corporation. Simulating and evaluating the processing of minimum-size packets at a 50 MHz system clock, our mechanism is twice as fast as the packet-receiving rate of two Fast Ethernet ports. These results show that our hardware mechanism not only shares the load of the upper layer but also speeds up packet forwarding.
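The abstract describes Patch only at a high level, so the sketch below shows the standard incremental one's-complement checksum update (in the style of RFC 1624), which has the same property of making header modification independent of packet length; it is an assumed software equivalent, not the thesis' hardware logic.

    /* Sketch only: patch a 16-bit Internet checksum when one header field changes,
     * using old and new field values alone (independent of packet length). */
    #include <stdio.h>
    #include <stdint.h>

    static uint16_t checksum_update(uint16_t old_cksum, uint16_t old_field,
                                    uint16_t new_field)
    {
        /* HC' = ~(~HC + ~m + m'), folded back into 16 bits (RFC 1624, eq. 3) */
        uint32_t sum = (uint16_t)~old_cksum;
        sum += (uint16_t)~old_field;
        sum += new_field;
        while (sum >> 16)
            sum = (sum & 0xFFFF) + (sum >> 16);
        return (uint16_t)~sum;
    }

    int main(void)
    {
        uint16_t cksum = 0xB1E6;   /* example header checksum (assumed value) */
        uint16_t patched = checksum_update(cksum, 0xC0A8 /* old field */,
                                                  0x0A00 /* rewritten field */);
        printf("patched checksum: 0x%04X\n", (unsigned)patched);
        return 0;
    }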
165

The hardware industry managing strategy in Taiwan.

Lee, Ing-Jhy 07 July 2003
The hardware industry is closely related to our daily life. Broadly, it includes large hardware and small hardware. Large hardware covers building, manufacturing, civil engineering, and so on, and is used widely around the world. Small hardware, by contrast, covers hardware products and is used on a smaller scale; door hardware is a small part of it. Door hardware is a traditional industry whose fortunes rise and fall with the building industry: when the economy grows, construction grows too. Around 1981 there were about ten hardware manufacturers in Taiwan; because of the economic recession and competition arising from Mainland China, only two remain today. Taiwan's industry ranks third in the world, and its production accounts for 3% of American imports. After several years of economic recession, I want to understand how Taiwan's hardware industry can keep its competitive advantage in the world, and this interest is why I chose this topic. The thesis discusses how management strategy is applied at "Zhy-Fu Corporation Ltd.".
166

Fault tolerance in distributed real-time computer systems

Baba, Mohd Dani January 1996
A distributed real-time computer system consists of several processing nodes interconnected by communication channels. In a safety-critical application, the real-time system should maintain timely and dependable services despite component failures or transient overloads due to changes in the application environment. When a component fails or an overload occurs, hard real-time tasks may miss their timing constraints, and it is desirable that the system degrade in a graceful, predictable manner. The approach adopted in this thesis is to integrate resource scheduling with a fault-tolerance mechanism. The thesis provides a basis for the modelling and design of an adaptive fault-tolerant distributed real-time computer system. The main issue is to determine a priori the worst-case timing response of the given hard real-time tasks. In this thesis, the worst-case timing responses of the hard real-time tasks of a distributed system using the Controller Area Network (CAN) communication protocol are evaluated to determine whether they can satisfy their timing deadlines. In a hard real-time system, task scheduling is the most critical problem, since the scheduling strategy must ensure that tasks meet their deadlines. Several fixed-priority scheduling schemes are evaluated to select the most efficient scheduler in terms of bus utilisation and access time. Static scheduling is used, as it can be considered the most appropriate for safety-critical applications because schedulability can easily be verified. Furthermore, for a typical industrial application, the hard real-time system has to be adaptable to accommodate changes in the system or in the application requirements. This goal of flexibility is achieved by integrating the static scheduler, using an imprecise-computation technique, with the fault-tolerance mechanism, which uses active redundant components.
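As an illustration of the kind of a priori analysis involved, the sketch below iterates the classic fixed-priority response-time equation often used for CAN message sets (after Tindell et al.): the queueing delay is found as a fixed point and the message's own transmission time is added. Jitter and bit-time terms are omitted and the message set is invented; this is not the specific analysis carried out in the thesis.

    /* Sketch only: worst-case response time of fixed-priority CAN messages. */
    #include <stdio.h>
    #include <math.h>

    struct msg {
        double C;   /* transmission time (ms) */
        double T;   /* period (ms)            */
    };

    /* Response time of message i, given higher-priority messages msgs[0..i-1]
     * and blocking B from one lower-priority frame already in transit. */
    static double response_time(const struct msg *m, int i, double B)
    {
        double w = B, prev = -1.0;
        while (w != prev) {                       /* fixed-point iteration */
            prev = w;
            w = B;
            for (int j = 0; j < i; j++)
                w += ceil(prev / m[j].T) * m[j].C;
        }
        return w + m[i].C;
    }

    int main(void)
    {
        struct msg set[] = { { 0.5, 10.0 }, { 0.5, 20.0 }, { 0.5, 50.0 } };
        for (int i = 0; i < 3; i++)
            printf("message %d: R = %.2f ms\n", i, response_time(set, i, 0.5));
        return 0;
    }

Each computed R is then compared against the message's deadline; if every R meets its deadline, the static schedule is accepted.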
167

RSA in hardware

Gillmore, Brooks Colin 21 February 2011
This report presents the RSA encryption and decryption schemes and discusses several methods for expediting the required computations, specifically the modular exponentiation operation at the heart of RSA. A hardware implementation of the CIOS (Coarsely Integrated Operand Scanning) algorithm for modular multiplication is attempted on a Xilinx Spartan-3 FPGA in the TLL-5000 development platform used at the University of Texas at Austin. The development of the hardware is discussed in detail, and some Verilog source code is provided for an implementation of modular multiplication. Source code is also provided for an RSA executable that runs on the TLL-6219 ARM-based development platform and is used to generate test vectors.
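For context, RSA's core operation reduces to the textbook square-and-multiply loop sketched below in C; the report's contribution is a hardware CIOS Montgomery multiplier for the modular products, which this plain-C model (using the GCC/Clang __uint128_t extension and toy key sizes) does not attempt to reproduce.

    /* Sketch only: left-to-right square-and-multiply modular exponentiation. */
    #include <stdio.h>
    #include <stdint.h>

    static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
    {
        return (uint64_t)(((__uint128_t)a * b) % m);   /* stand-in for CIOS */
    }

    static uint64_t powmod(uint64_t base, uint64_t exp, uint64_t mod)
    {
        uint64_t result = 1 % mod;
        base %= mod;
        while (exp) {
            if (exp & 1)
                result = mulmod(result, base, mod);    /* multiply step */
            base = mulmod(base, base, mod);            /* square step   */
            exp >>= 1;
        }
        return result;
    }

    int main(void)
    {
        /* Toy textbook RSA numbers: n = 3233 (61*53), e = 17, d = 2753 */
        uint64_t n = 3233, e = 17, d = 2753, msg = 65;
        uint64_t c = powmod(msg, e, n);
        printf("ciphertext %llu, decrypted %llu\n",
               (unsigned long long)c, (unsigned long long)powmod(c, d, n));
        return 0;
    }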
168

A High-Speed Reconfigurable System for Ultrasound Research

Wall, Kieran 13 December 2010
Many opportunities exist in medical ultrasound research for experimenting with novel designs, both of transducers and of signal-processing techniques. However, any experiment must have a reliable platform on which to develop these techniques. In my thesis work, I have designed, built, and tested a high-speed reconfigurable ultrasound beamforming platform. The complete receive-beamformer system described in this thesis consists of hardware, firmware, and software components. All of these components work together to provide a beamforming platform that is expandable, high-speed, and robust. The complexity of the operations being performed is hidden from the user by a simple, accessible software interface. Existing beamformer hardware is usually designed for real-time 2D image formation, often using serial processing. The platform I built uses parallel processing in order to process ultrasound images 100 times faster than conventional systems. Conventional hardware is locked to a single transducer or a small number of similar transducers, while my design can be reprogrammed on the fly to work with nearly any transducer type. The system is also expandable to handle any size of device, while conventional systems can only handle a fixed number of device channels. The software I have created interfaces with the hardware and firmware components to provide an easy way to make use of the system's reconfigurability. It also delivers a platform that can be simply expanded to host post-processing or signal-analysis software to further fulfill a researcher's needs. / Thesis (Ph.D., Physics, Engineering Physics and Astronomy), Queen's University, 2010.
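For readers unfamiliar with the core computation being parallelized, here is a minimal delay-and-sum receive-beamforming sketch for one focal point; the array geometry, sampling rate, transmit-path model, and nearest-sample delays are illustrative assumptions, not the thesis hardware design.

    /* Sketch only: delay-and-sum one focal point across a small linear array. */
    #include <stdio.h>
    #include <math.h>

    #define CHANNELS 8
    #define SAMPLES  2048

    /* Sum each channel's echo after delaying it by its assumed round-trip path
     * to the focal point; rf[c][s] holds the sampled echo for channel c. */
    static double beamform_sample(double rf[CHANNELS][SAMPLES],
                                  double focus_x, double focus_z,
                                  double pitch, double fs, double c_sound)
    {
        double out = 0.0;
        for (int ch = 0; ch < CHANNELS; ch++) {
            double elem_x = (ch - (CHANNELS - 1) / 2.0) * pitch;
            double dist   = sqrt((focus_x - elem_x) * (focus_x - elem_x)
                                 + focus_z * focus_z);
            int delay = (int)round((focus_z + dist) / c_sound * fs); /* nearest sample */
            if (delay >= 0 && delay < SAMPLES)
                out += rf[ch][delay];
        }
        return out;
    }

    int main(void)
    {
        static double rf[CHANNELS][SAMPLES];          /* zeroed placeholder data */
        double v = beamform_sample(rf, 0.0, 0.02, 0.0003, 40e6, 1540.0);
        printf("beamformed sample: %f\n", v);
        return 0;
    }

A parallel implementation evaluates this sum for many channels and focal points simultaneously, which is where the reported 100x speedup over serial processing comes from.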
169

The development and application of a method for producing software tools for computer systems design

Cavouras, J. C. January 1978
No description available.
170

A multiple processor system using microprocessors

Parsons, N. K. January 1978
No description available.
